What it does
This project is a concise Python notebook on web crawling. I chose The Biota of North America Program (BONAP) website as the scraping target because none of its content is loaded by JavaScript and its HTML and CSS are simple and easy to read.
Challenges I ran into
Time was a real constraint, since I had to cut down how long it took to scrape BONAP's website. At first, a single run took about 30 minutes to finish. Because my early development was so slow, I spent most of my time researching coding practices for improving web scraping and indexing performance.
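One common way to speed up a scraper whose runtime is dominated by waiting on network responses is to fetch pages concurrently instead of one at a time. The sketch below is not the notebook's actual code. It assumes the work is I/O-bound and uses the standard library's `ThreadPoolExecutor`. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` only simulates a request with a sleep so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str], max_workers: int = 8) -> Dict[str, str]:
    """Fetch every URL concurrently and return a {url: body} mapping."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its own body.
        bodies = pool.map(fetch, urls)
        return dict(zip(urls, bodies))


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request: sleeps to simulate latency."""
    time.sleep(0.2)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    pages = [f"page{i}" for i in range(8)]
    start = time.perf_counter()
    results = fetch_all(pages, fake_fetch, max_workers=8)
    elapsed = time.perf_counter() - start
    # Sequentially this would take about 8 * 0.2 s; with 8 threads it is close to 0.2 s.
    print(f"fetched {len(results)} pages in {elapsed:.2f}s")
```

On a real site, the worker count should stay modest and respect the server's limits; Scrapy handles this kind of concurrency internally through its own settings rather than through threads.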
What I learned
- I learned a lot about the differences between web scraping and web crawling.
- The most important knowledge I gained was how to work with Jupyter notebooks.
- I learned how to use the Python package Scrapy.
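The distinction between scraping and crawling can be sketched with the standard library alone: scraping extracts data from a page (here, its `<title>`), while crawling follows the page's links to discover more pages. This is a hypothetical illustration, not the project's Scrapy code. `PageParser`, `crawl`, and the `get_html` callback are assumed names, and the example URLs are placeholders rather than real BONAP pages.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict
from urllib.parse import urljoin, urlparse


class PageParser(HTMLParser):
    """Collects absolute link URLs (for crawling) and the page title (for scraping)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links like "/a" against the current page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url: str, get_html: Callable[[str], str], max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl of one site, scraping each page's title."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    titles: Dict[str, str] = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        parser = PageParser(url)
        parser.feed(get_html(url))
        titles[url] = parser.title.strip()  # scraping: extract data from this page
        for link in parser.links:  # crawling: queue unseen same-site links
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return titles
```

Passing `get_html` in as a function keeps the crawl logic independent of how pages are downloaded, so it can be tested with an in-memory dictionary of HTML strings before pointing it at a live site.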