Inspiration

I wanted to strengthen my NLP skills, learn web scraping, and learn how to build a custom dataset from scratch. I also wanted to explore the functionality of the spaCy NLP library.

What it does

Scrapes news articles from the web.
Uses the spaCy library to parse the news text data.
Performs NLP processing such as tokenization and lemmatization.
Performs dependency parsing.
Performs Named Entity Recognition (NER).
Visualizes both the dependency parse tree and the named entities.
Finally, builds a dataset from the unstructured, scraped text data.
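
The NLP and dataset steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`analyze`, `token_rows`, `entity_rows`, `write_dataset`), the CSV column names, and the choice of the `en_core_web_sm` model are all assumptions made for the example.

```python
import csv


def analyze(text, model="en_core_web_sm"):
    """Parse raw text with spaCy and return the resulting Doc.

    The model name is an assumption; it has to be installed separately
    (python -m spacy download en_core_web_sm).
    """
    import spacy  # imported lazily so the helpers below work without spaCy

    nlp = spacy.load(model)
    return nlp(text)


def token_rows(doc):
    """One row per token: surface form, lemma, dependency label, head word."""
    return [
        {"token": t.text, "lemma": t.lemma_, "dep": t.dep_, "head": t.head.text}
        for t in doc
    ]


def entity_rows(doc, source=""):
    """One row per named entity, ready to be written to a dataset file."""
    return [
        {
            "source": source,          # e.g. the URL the text was scraped from
            "entity": ent.text,
            "label": ent.label_,       # e.g. PERSON, ORG, GPE
            "start": ent.start_char,   # character offsets into the text
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


def write_dataset(rows, handle):
    """Write entity rows as CSV to an open text file handle."""
    fields = ["source", "entity", "label", "start", "end"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
```

With spaCy and a model installed, `write_dataset(entity_rows(analyze(article_text), url), f)` would write one CSV row per entity found in an article. For the visualizations, spaCy's `displacy.render(doc, style="dep")` and `displacy.render(doc, style="ent")` draw the dependency parse and the entities respectively.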

How we built it

I used the Python requests library for the web scraping and the spaCy library for the NLP functionality.
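
As a rough sketch of the scraping side, the snippet below fetches a page with requests and pulls the paragraph text out of the HTML. Only the use of requests comes from the project description; the helper names (`fetch_html`, `extract_text`, `ParagraphExtractor`) are hypothetical, and extracting `<p>` text with the standard library's `html.parser` is an assumption made to keep the example dependency-free, not necessarily how the project parses pages.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._in_p = False     # True while inside a <p> element
        self._chunks = []      # text pieces of the paragraph being read
        self.paragraphs = []   # finished paragraphs

    def _flush(self):
        # Join the collected pieces and normalize whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # tolerate a missing </p> before a new <p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def extract_text(html):
    """Return the paragraph text of an HTML page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()            # keep a final paragraph that was never closed
    return "\n".join(parser.paragraphs)


def fetch_html(url, timeout=10):
    """Download a page with requests and return its HTML as text."""
    import requests  # imported lazily so extract_text works without requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text
```

Calling `extract_text(fetch_html(url))` on a news article URL would then yield plain text that can be handed to spaCy.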

Challenges we ran into

Didn't really run into any in particular.

Accomplishments that we're proud of

I am happy that I can build a custom dataset from unstructured text data such as news articles.

What we learned

Web scraping and custom dataset creation.

What's next for Web-Scraper-NER-Dataset-Builder

I would like to expand the functionality, in particular by enabling it to scrape any type of data and by building a UI with a framework such as Streamlit. That way, anyone who wants to do web scraping, including developers and data scientists, would have an automated tool for the job.
