Web Scraper


Inspiration

Our inspiration comes from a summer internship at a bank, where we observed how employees had to check the news manually to track their clients' activities.

What it does

Our web scraper has a wide range of applications. For one, it can help financial institutions, governments, and investors keep track of people involved in money laundering, terror financing, and other illegal activities. Companies and institutions could also use it to market more effectively by tracking how consumer behavior changes over time and across places and cultures. Investors could use it to stay on top of news about the companies they are interested in, which makes their research more effective.

The web scraper pulls data from news websites, uses an NLTK filter to extract keywords, and compares the names occurring in the articles with the bank's user database (built with MongoDB).

How I built it

We used NewsAPI to fetch news articles from a wide range of sources. Once the articles are on our platform, we filter them with the NLTK library in Python to remove noise and extract the relevant information. To track money laundering and related activities, we compare the extracted information (in this case, the names of the people involved) against our MongoDB database, verifying and updating records so that banks and institutions can keep track of them.
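The comparison step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the names have already been extracted from an article (for example by NLTK), and it uses an in-memory list of dictionaries as a stand-in for the MongoDB collection. The field names `name` and `client_id` and the helper names are hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def match_clients(article_names, clients):
    """Return the client records whose name appears among the article's names."""
    wanted = {normalize_name(n) for n in article_names}
    return [c for c in clients if normalize_name(c["name"]) in wanted]


# Hypothetical client records, shaped like documents in a MongoDB collection.
clients = [
    {"name": "Jane Doe", "client_id": 1},
    {"name": "José  Alvarez", "client_id": 2},
]

# Names assumed to have been extracted from one news article.
article_names = ["jose alvarez", "John Smith"]

matches = match_clients(article_names, clients)
print([c["client_id"] for c in matches])
```

Normalizing both sides before comparing keeps differences in case, accents, or spacing from hiding a match. Exact matching after normalization is only one possible design; it will still miss nicknames, initials, and misspellings.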

Challenges I ran into

- Setting up NewsAPI
- Using NLTK to filter out names
- Building a MongoDB database
- Establishing a connection between MongoDB and the Python scraper
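For the NewsAPI setup in particular, a request and its response can be handled along these lines. This is a sketch under stated assumptions rather than the project's implementation: it targets NewsAPI's `/v2/everything` endpoint, the helper names are hypothetical, `YOUR_KEY` is a placeholder, and the sample response is made up to show the JSON shape. It uses only the standard library and makes no network call.

```python
import json
from urllib.parse import urlencode

NEWSAPI_ENDPOINT = "https://newsapi.org/v2/everything"


def build_request_url(query, api_key, page_size=20):
    """Build the GET URL for a keyword search (helper name is hypothetical)."""
    params = {
        "q": query,
        "language": "en",
        "pageSize": page_size,
        "apiKey": api_key,
    }
    return f"{NEWSAPI_ENDPOINT}?{urlencode(params)}"


def article_texts(response_body):
    """Turn a NewsAPI JSON response into one text string per article."""
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise ValueError(payload.get("message", "NewsAPI request failed"))
    texts = []
    for article in payload.get("articles", []):
        parts = [
            article.get("title"),
            article.get("description"),
            article.get("content"),
        ]
        # Some fields can be null in the response, so skip the empty ones.
        texts.append(" ".join(p for p in parts if p))
    return texts


# Made-up response body, used here instead of a live request.
sample = (
    '{"status": "ok", "totalResults": 1, "articles": ['
    '{"title": "Bank fined", "description": null, '
    '"content": "Regulators fined the bank."}]}'
)

print(build_request_url("money laundering", "YOUR_KEY"))
print(article_texts(sample))
```

In a real run, the URL would be fetched with `urllib.request.urlopen` or a similar HTTP client, and each resulting text string would then go through the NLTK filtering step.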

Accomplishments that I'm proud of



We are proud of building a web scraper that simplifies client tracking and takes less time than an employee checking manually. Our biggest accomplishment is figuring out the APIs and using them effectively to create our service. We are also very proud of having set up the MongoDB database and synced it with our code.

What I learned

We learnt how to call APIs and send GET requests to them. We also got a chance to use the NLTK library in Python to filter out the important keywords and get rid of noise and other junk. Finally, we learnt how to work efficiently as a team, dividing the workload and coming up with simple, real-life solutions to practical problems.

Interested in learning more about our project? Visit our GitHub page: https://github.com/Mujtaba521/WebScraper

Built With

Updates

khuranajapnit Singh started this project — Nov 24, 2019 08:49 AM EST
