Inspiration

Supacrawl was inspired by tools like Selenium and Puppeteer. We wanted to scale website crawling for data extraction while wrapping it in an easy-to-use API.

What it does

Supacrawl simplifies crawling pages for LLMs by turning websites into clean, easy-to-read Markdown. It’s a straightforward tool that gathers the web content you need, fast and hassle-free.

How we built it

We built Supacrawl with Next.js and use Puppeteer for website crawling. The service produces clean Markdown for LLMs and uses ElevenLabs to turn a webpage into audio automatically.
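
As a rough illustration of the HTML-to-Markdown step, here is a minimal TypeScript sketch. It is not Supacrawl's actual code: the function name `htmlToMarkdown` and the regex-based approach are assumptions made for this example, and a real service would more likely run a proper HTML parser over the page that Puppeteer renders.

```typescript
// Hypothetical helper for illustration only; not Supacrawl's real converter.
// It handles a small subset of HTML with regular expressions.
function htmlToMarkdown(html: string): string {
  return html
    // Drop script and style blocks, including their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Headings h1..h6 become "#" prefixes of matching depth.
    .replace(
      /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_match: string, level: string, text: string) =>
        `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`
    )
    // Links become [text](href).
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // Bold text becomes **text**.
    .replace(/<(strong|b)\b[^>]*>([\s\S]*?)<\/\1>/gi, "**$2**")
    // List items become "- item" lines.
    .replace(
      /<li\b[^>]*>([\s\S]*?)<\/li>/gi,
      (_match: string, item: string) => `- ${item.trim()}\n`
    )
    // Paragraph ends and line breaks become blank lines.
    .replace(/<\/p>|<br\s*\/?>/gi, "\n\n")
    // Strip any remaining tags.
    .replace(/<[^>]+>/g, "")
    // Collapse runs of three or more newlines.
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Example input: a heading, a paragraph with a link, and a script to discard.
const sample =
  '<h1>Title</h1><p>Hello <a href="https://example.com">world</a></p><script>alert(1)</script>';
const markdown = htmlToMarkdown(sample);
console.log(markdown);
```

The resulting Markdown string is the kind of text that could then be passed to an LLM or to a text-to-speech service.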

Challenges we ran into

Web scraping is very difficult to do at scale. Many websites have anti-scraping measures in place. In the future, we will improve access to more sites.

Accomplishments that we're proud of

We are proud of the moment the service actually returned the scraped data. This allowed us to use ElevenLabs to convert the contents of the webpage into audio.

What we learned

We learned about crawling websites at scale. This is important for projects that do bulk scraping, such as price comparison websites.

What's next for SupaCrawl

We will keep expanding the types of websites we can crawl. Some websites block requests, so we will work on new ways to access them.

Built With

  • elevenlabs
  • nextjs
  • vercel