Inspiration
We wanted to challenge ourselves and try web scraping for the first time.
What it does
This project scrapes tweets and related information from Twitter for any topic the user chooses to search. The tweets are stored in dataframes and analyzed with histograms and scatterplots, and we calculate regression lines and correlations between certain factors. We also wrote code to show which user tweeted about the topic most often in our data and to search for tweets that contained bad words.
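As a rough illustration of two of the calculations mentioned above, the sketch below finds the most active user and computes a Pearson correlation with a least-squares line in plain Python. The record fields (`user`, `likes`, `retweets`), the sample values, and the helper names are all hypothetical; the project itself worked with pandas dataframes rather than these hand-written helpers.

```python
from collections import Counter
from math import sqrt

# Hypothetical sample records; the field names are assumptions for illustration.
tweets = [
    {"user": "alice", "likes": 10, "retweets": 2},
    {"user": "bob", "likes": 20, "retweets": 4},
    {"user": "alice", "likes": 30, "retweets": 6},
    {"user": "carol", "likes": 40, "retweets": 8},
]

def top_tweeter(records):
    """Return (user, count) for the user with the most tweets."""
    counts = Counter(r["user"] for r in records)
    return counts.most_common(1)[0]

def pearson_and_line(xs, ys):
    """Return (r, slope, intercept) for paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)          # Pearson correlation coefficient
    slope = sxy / sxx                  # least-squares regression slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

user, count = top_tweeter(tweets)
likes = [t["likes"] for t in tweets]
retweets = [t["retweets"] for t in tweets]
r, slope, intercept = pearson_and_line(likes, retweets)
```

In this made-up sample the retweet counts are an exact multiple of the like counts, so the correlation comes out at the maximum; real scraped data would be much noisier.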
We also kept a list of bad / fake news websites and a list of prominent news websites and compared the URLs in the tweets against them. From that we summarized how many URLs were in our data, how many referenced bad websites, and how many referenced prominent ones.
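A minimal sketch of how such a URL summary could be produced is shown below. The domain lists, the regular expression, and the function names are placeholders chosen for illustration, not the lists or code the project actually used.

```python
import re
from urllib.parse import urlparse

# Hypothetical domain lists; the project's actual lists are not shown here.
BAD_SITES = {"fakenews.example"}
PROMINENT_SITES = {"bignews.example"}

# Simple pattern for http(s) links; an assumption, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://\S+")

def domain_of(url):
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def summarize_urls(tweet_texts):
    """Count all URLs, and those pointing at bad or prominent sites."""
    summary = {"total": 0, "bad": 0, "prominent": 0}
    for text in tweet_texts:
        for url in URL_PATTERN.findall(text):
            summary["total"] += 1
            domain = domain_of(url)
            if domain in BAD_SITES:
                summary["bad"] += 1
            elif domain in PROMINENT_SITES:
                summary["prominent"] += 1
    return summary

sample = [
    "Read this https://www.fakenews.example/story",
    "Source: http://bignews.example/a and https://other.example/b",
    "No links here",
]
result = summarize_urls(sample)
```

Matching on the host name rather than the full URL means any page on a listed site is counted, which seems closest to the comparison described above.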
How we built it
We built this project in Python using a Jupyter notebook, Selenium, and ChromeDriver.
Challenges we ran into
One of the challenges we faced was getting the tweets to load, because we had to implement the scrolling ourselves. The page would stop scrolling either because it had reached the end or because the connection was bad.
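One common way to handle this kind of scrolling is sketched below, under the assumption that the page grows taller as new tweets load. The function name, parameters, and retry policy are hypothetical and not taken from the project; the only Selenium call used is the driver's `execute_script` method.

```python
import time

def scroll_until_stable(driver, pause=1.0, max_retries=3, max_scrolls=100):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object with Selenium's `execute_script` method.
    Returns the number of scrolls that loaded new content.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    retries = 0
    productive = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give new tweets time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new appeared: either the end of the results or a slow
            # connection, so retry a few times before giving up.
            retries += 1
            if retries >= max_retries:
                break
        else:
            retries = 0
            productive += 1
            last_height = new_height
    return productive
```

The retry counter is what separates a slow connection from the true end of the results: a single unchanged height is not treated as final. With a real browser, the function would be called with a Chrome driver instance after the search page has been opened.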
Accomplishments that we're proud of
We are proud that we were able to scrape data from a website that is constantly updating, and that we came up with ways to analyze the information we gathered.
What we learned
We learned how to scrape data from a website using Selenium and ChromeDriver. We wanted to try an alternative to using an API, and we learned how efficient and useful APIs actually are: many of the issues we faced in this project would not have happened if we had used one. We also learned how to plot different graphs using pandas and matplotlib.
What's next for Twitter Scraper
In the future, we want to learn more about machine learning and maybe do sentiment analysis on the tweets in our data. We may also switch to an API to gather more data and do a more thorough analysis.