Inspiration
Data sourcing is one of the main problems, and one of the most important parts, of a data science pipeline. This Python module aims to solve that problem.
What it does
It scrapes data from sites such as Kaggle, Indeed, IMDb, and ai-jobs.net.
How we built it
I built it using Selenium WebDriver and Python.
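As a rough illustration of the Selenium approach, here is a minimal sketch, not the project's actual code. It assumes Selenium 4 and a local Chrome install. The function names `scrape_texts` and `clean_texts`, the headless flag, and the CSS selector parameter are all hypothetical choices made for this example.

```python
def clean_texts(texts):
    """Strip whitespace and drop empty or missing strings from scraped text."""
    return [t.strip() for t in texts if t and t.strip()]


def scrape_texts(url, css_selector):
    """Load `url` in headless Chrome and return the text of matching elements.

    Assumption: Selenium 4 and Chrome are installed. Selenium is imported
    lazily so that `clean_texts` can be used without a browser.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless=new")  # run without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return clean_texts([el.text for el in elements])
    finally:
        driver.quit()  # always release the browser process
```

A call such as `scrape_texts(some_url, "h2")` would return the visible text of every `h2` element on the page. The right selector for each site has to be worked out by inspecting that site's markup.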
Challenges we ran into
One challenge was that Selenium did not work on my Linux machine. I had to figure out how to fix that, and once it was solved I updated the documentation on GitHub too.
Accomplishments that we're proud of
I scraped data cards from Kaggle, jobs from Indeed, reviews from IMDb, and AI jobs and salaries from ai-jobs.net.
What we learned
I learned how to scrape data and draw insights from it, including by creating word clouds.
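A word cloud is driven by word frequencies, so the counting step can be sketched with the standard library alone. This is an illustrative sketch under assumptions, not the project's code: the `word_frequencies` name, the small `STOPWORDS` set, and the sample review strings are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "in", "was"}


def word_frequencies(texts, top_n=10):
    """Return the `top_n` most frequent non-stopword words across `texts`."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)


reviews = ["The movie was great", "Great acting and a great plot"]
print(word_frequencies(reviews, top_n=3))
```

The resulting (word, count) pairs are the kind of input a word cloud renderer scales its font sizes from.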
What's next for Scraping data from sites like imdb, kaggle, Indeed, ai-jobs
I am thinking of building a generic web scraper that automatically scrapes data from any URL passed to it as an argument. That is my end goal with this project.
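One possible starting point for that idea is sketched below, using only the Python standard library. It is an assumption about how such a tool could look, not a planned design: the names `TextAndLinkParser`, `parse_html`, `scrape`, and `main` are hypothetical, and it only handles static HTML, so pages rendered by JavaScript would still need a browser driver such as Selenium.

```python
import argparse
from html.parser import HTMLParser
from urllib.request import urlopen


class TextAndLinkParser(HTMLParser):
    """Collect visible text chunks and link targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.texts.append(text)


def parse_html(html):
    """Parse an HTML string into a dict of text chunks and links."""
    parser = TextAndLinkParser()
    parser.feed(html)
    parser.close()
    return {"texts": parser.texts, "links": parser.links}


def scrape(url):
    """Download `url` and return its parsed text and links."""
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_html(html)


def main(argv=None):
    """Command-line entry point: the URL is the only argument."""
    arg_parser = argparse.ArgumentParser(description="Scrape text and links from a URL")
    arg_parser.add_argument("url", help="page to scrape")
    args = arg_parser.parse_args(argv)
    result = scrape(args.url)
    print(f"{len(result['texts'])} text chunks, {len(result['links'])} links")
```

Calling `main()` from a script's `__main__` guard would turn this into a command-line tool that takes the URL as its argument. A truly generic scraper would also need per-site handling for pagination, rate limits, and each site's terms of use.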