Twitter Scraper

Inspiration

We wanted to challenge ourselves and try web scraping for the first time.

What it does

This project scrapes tweets and other information from Twitter. The user can search for any topic they want. The tweets are stored in dataframes and analyzed with histograms and scatter plots, and we calculated regression lines and correlations between certain factors. We also wrote code to show which user tweeted about the topic most often in our data, and to search for tweets that contained bad words.

We also had a list of bad or fake news websites and a list of prominent news websites, and we compared the links in the tweets against both. From there, we summarized how many URLs were in our data, how many referenced bad websites, and how many referenced prominent ones.
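A sketch of that URL comparison, using hypothetical domain lists (the project's own lists are not reproduced here). It pulls `http(s)` links out of each tweet's text with a simple regular expression, reduces each link to its host name with `urllib.parse.urlparse`, and counts matches against each list, treating subdomains (for example `uk.reuters.com`) as belonging to the listed site.

```python
import re
from urllib.parse import urlparse

# Hypothetical example lists; not the project's actual lists.
UNRELIABLE_SITES = {"fakenews.example"}
PROMINENT_SITES = {"nytimes.com", "bbc.co.uk", "reuters.com"}

URL_PATTERN = re.compile(r"https?://\S+")


def domain_of(url):
    """Lower-cased host name with any leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def matches(domain, sites):
    """True if domain is one of the sites or a subdomain of one."""
    return any(domain == site or domain.endswith("." + site) for site in sites)


def summarize_urls(texts):
    """Count all URLs, and how many point to unreliable or prominent sites."""
    summary = {"total_urls": 0, "unreliable": 0, "prominent": 0}
    for text in texts:
        for url in URL_PATTERN.findall(text):
            summary["total_urls"] += 1
            domain = domain_of(url)
            if matches(domain, UNRELIABLE_SITES):
                summary["unreliable"] += 1
            elif matches(domain, PROMINENT_SITES):
                summary["prominent"] += 1
    return summary


sample = [
    "Read this https://www.reuters.com/world/story",
    "Shocking!! http://fakenews.example/article and https://t.co/abc123",
    "No link here",
]
print(summarize_urls(sample))  # {'total_urls': 3, 'unreliable': 1, 'prominent': 1}
```

Twitter usually wraps links in `t.co` short URLs, so real matching may need the expanded link shown in the tweet rather than the raw `t.co` address; the sample above counts such a link as neither unreliable nor prominent.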

How we built it

We built this project in Python in a Jupyter notebook, using Selenium and ChromeDriver.
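A minimal sketch of the setup step: build a search URL for the user's topic and open it in Chrome through Selenium. The URL format (`https://twitter.com/search` with a `q` parameter, plus `f=live` for the "Latest" tab) is an assumption about Twitter's web search at the time, and `build_search_url` and `open_search` are hypothetical helper names. Selenium is imported inside `open_search`, so the URL helper can be tried without a browser or the `selenium` package installed.

```python
from urllib.parse import urlencode

# Assumption: Twitter's web search page accepted a query string of this form.
SEARCH_BASE = "https://twitter.com/search"


def build_search_url(topic, latest=True):
    """Build a search-page URL for a user-chosen topic (hypothetical helper)."""
    params = {"q": topic}
    if latest:
        params["f"] = "live"  # assumed parameter selecting the "Latest" tab
    return SEARCH_BASE + "?" + urlencode(params)


def open_search(topic):
    """Start Chrome via ChromeDriver and load the search page (hypothetical helper).

    Requires the selenium package and a ChromeDriver matching the installed Chrome.
    """
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get(build_search_url(topic))
    return driver  # the caller scrolls, extracts tweets, then calls driver.quit()


print(build_search_url("climate change"))
```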

Challenges we ran into

One of our challenges was getting the tweets to load. Twitter only loads more tweets as you scroll, so we had to implement the scrolling ourselves, and the page would stop scrolling either because it had reached the end or because the connection was bad.
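One way to handle both stopping cases is sketched below; it is not necessarily how the project did it. The loop scrolls with Selenium's `execute_script`, waits, and compares the page height before and after. Because a single unchanged height may only mean a slow connection, it stops after several consecutive scrolls with no growth, or after a hard cap. The parameter names and defaults are assumptions. The optional `on_scroll` callback is there because Twitter's timeline may drop tweets that scroll far out of view from the page, so collecting tweets after each scroll is the safer assumption. `StubDriver` is a hypothetical offline stand-in so the loop can be tried without a browser.

```python
import time


def scroll_to_load(driver, pause=2.0, max_idle_rounds=3, max_scrolls=200, on_scroll=None):
    """Scroll to the bottom until the page height stops growing; return the scroll count.

    Stops after max_idle_rounds consecutive scrolls with no new content
    (end of results or a stalled connection), or after max_scrolls in total.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    idle_rounds = 0
    scrolls = 0
    while idle_rounds < max_idle_rounds and scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the next batch of tweets time to load
        if on_scroll is not None:
            on_scroll(driver)  # e.g. collect the tweets currently on the page
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            idle_rounds += 1  # no growth this round
        else:
            idle_rounds = 0
            last_height = new_height
    return scrolls


class StubDriver:
    """Offline stand-in for a Selenium driver: page height follows `heights`, then stays put."""

    def __init__(self, heights):
        self.heights = list(heights)
        self.scrolled = 0

    def execute_script(self, script):
        if script.startswith("return"):
            return self.heights[min(self.scrolled, len(self.heights) - 1)]
        self.scrolled += 1
        return None


print(scroll_to_load(StubDriver([1000, 2000, 3000]), pause=0))  # prints 5
```

With a real session, the `webdriver.Chrome()` instance that has already loaded the search page is passed in place of `StubDriver`.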

Accomplishments that we're proud of

We are proud that we were able to scrape data from a website that is constantly updating, and that we came up with ways to analyze the information we gathered.

What we learned

We learned how to scrape data from a website using Selenium and ChromeDriver. We wanted to try an alternative to an API, and the many issues we ran into, which would not have happened with an API, showed us how efficient and useful APIs actually are. We also learned how to plot different graphs using pandas and Matplotlib.

What's next for Twitter Scraper

In the future, we want to learn more about machine learning and possibly run sentiment analysis on the tweets in our data. Alternatively, we may switch to an API to gather more data and analyze it more thoroughly.

Updates

Amy Wang started this project — Nov 08, 2020 01:03 PM EST
