by r_singh on 8/7/24, 9:16 AM with 3 comments
by r_singh on 8/7/24, 9:44 AM
I have been working on this for 3 long years. I have paying customers, mostly brands or agencies looking to collect data for a business intelligence use case. Sometimes university professors use it too. I have been serious about improving it over the last year, as demand has surged with the rise of AI. Bootstrapping a niche business has been a rollercoaster ride for me as a software programmer, but I've been fortunate to receive support from customers, and it has been fun to learn by doing different things along the way.
I never shared my project on HN because I know that many developers here don't appreciate web scraping projects, and honestly I understand their point of view. However, imho public data (especially UGC) should be accessible by machines for permitted use cases like research and analysis tools that attribute the source and don’t engage in malpractices like plagiarism or publishing duplicated copyrighted information. I've gotten good at bypassing bot detection tech along the way and plan to release general purpose HTML and MD APIs that work with almost any source, including sites like LinkedIn. They're currently in alpha. If you are interested in using either, please feel free to reach out to me at raunaq@unwrangle.com and I'll be happy to send an API key and docs your way.
I've been thinking a bit about building useful agents or a service around making agents undetectable as well but I haven't started yet because I think agents are still at a very experimental stage and monetising them would be a challenge for me as a solo bootstrapper.
The page linked is a reviews scraper. It scrapes all reviews for any Yelp listing and makes them available as a CSV to download or as JSON sent to a webhook. The product also offers a way to scrape search results from Yelp; here's a link to that API: https://docs.unwrangle.com/yelp-search-api/
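To give a rough idea of how the JSON and CSV outputs relate, here is a minimal sketch that flattens a reviews payload into CSV text. The payload shape (`reviews`, `author`, `rating`, `text`) and the function name are assumptions made up for illustration, not the actual webhook format; check the docs for the real field names.

```python
import csv
import io
import json

# Hypothetical payload shape; the real webhook JSON may use different keys.
SAMPLE_PAYLOAD = json.dumps({
    "reviews": [
        {"author": "A. Reader", "rating": 5, "text": "Great tacos, short wait."},
        {"author": "B. Diner", "rating": 3, "text": "Fine, but \"pricey\"."},
    ]
})


def reviews_json_to_csv(payload: str, fields=("author", "rating", "text")) -> str:
    """Flatten a JSON reviews payload into CSV text with a header row."""
    reviews = json.loads(payload).get("reviews", [])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    for review in reviews:
        # Missing keys become empty cells; keys outside `fields` are dropped.
        writer.writerow({name: review.get(name, "") for name in fields})
    return buffer.getvalue()


if __name__ == "__main__":
    print(reviews_json_to_csv(SAMPLE_PAYLOAD))
```

The `csv` module handles quoting of commas and quotes inside review text, which is the part most likely to go wrong when building CSV rows by hand.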
by server_man3000 on 8/7/24, 11:46 PM
Nice tool btw