Among many other things, news cycles are jam-packed with polls. Turn on the nightly broadcast and you'll be hit by a wave of numbers, from the percentage of millennials who believe in God to the number of adults between the ages of 65 and 80 who use Spotify. During election years in particular, political polls are all the rage: every other headline seems to be some variation of "[politician]'s popularity dropped by [x] points following [insert significant world event here]."
Major media outlets like CNN and Fox clearly think political polls make great headlines, but how accurate are they? How much faith should we put in polls, particularly political polls that aim to predict the outcome of major elections? Who conducts these polls, and how do they decide whom to ask? And how up to date are the numbers?
What it does
Given how politically charged this particular election year has been, from the COVID-19 outbreak to the Trump impeachment trials to the George Floyd movement, our team thought it'd be interesting to present our take on an alternative to political polls.
PolitiSense scrapes hundreds of thousands of headlines, posts, and comments from across the internet, gathering public opinion from tweets on Twitter and from posts in subreddits on Reddit, while also sifting through volumes of news headlines from major media outlets such as Fox and CNN. We then look at what people are saying about a particular political figure at a given time, analyzing sentiment based on word content and level of engagement (number of likes, retweets, upvotes, etc.), and aggregating the results into a single sentiment score.
Using the search feature, find out how the internet feels about a political figure in real time!
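To make the idea of a single sentiment score concrete, here is a minimal sketch of one way to combine per-post sentiment with engagement. It is not necessarily how PolitiSense computes its score: the `ScoredPost` fields, the `aggregate_sentiment` name, and the log-scaled weighting are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredPost:
    """One scraped item. Field names are hypothetical."""
    sentiment: float  # polarity in [-1, 1], as produced by a sentiment analyzer
    engagement: int   # likes + retweets + upvotes, however the source counts them


def aggregate_sentiment(posts):
    """Return the engagement-weighted mean sentiment of the given posts.

    Each weight is 1 + log1p(engagement): the log keeps a single viral post
    from dominating, and the +1 still lets posts with zero engagement count.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for post in posts:
        weight = 1.0 + math.log1p(max(post.engagement, 0))
        weighted_sum += weight * post.sentiment
        total_weight += weight
    if total_weight == 0.0:
        return 0.0  # nothing scraped: report a neutral score
    return weighted_sum / total_weight


posts = [
    ScoredPost(sentiment=0.8, engagement=1200),
    ScoredPost(sentiment=-0.4, engagement=3),
    ScoredPost(sentiment=0.1, engagement=0),
]
print(aggregate_sentiment(posts))
```

Because the result is a weighted mean of values in [-1, 1], it stays in that range no matter how many posts come in, which makes scores comparable across politicians with very different amounts of coverage.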
How we built it
- Flask for serving templates and running scraping tasks
- MongoDB to store scraped data
- Python and various libraries and APIs (Beautiful Soup, Pushshift, Tweepy, etc.) for scraping data
- NLTK for sentiment analysis
- Google Cloud for hosting the backend
Challenges we ran into
- Various frustrating front-end tasks (e.g. the search bar that wouldn't center)
- Getting backend functions up and running on Google Cloud
- Scraping data from a variety of sources on the internet
- Formatting data for sentiment analysis
- Handoff between front end and back end
- Trying to get a Twitter developer account, panicking because you googled it and read that the average developer account takes 19 days to get approved, but then realizing you still have a valid set of authentication tokens from a Twitter bot tutorial from the high school coding club five years ago
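As an illustration of the data formatting challenge above, here is a small sketch of the kind of cleanup scraped text tends to need before sentiment analysis. The `clean_text` function and its rules are assumptions for the example, not the project's actual preprocessing.

```python
import html
import re

# Patterns for noise that carries no sentiment of its own.
RETWEET_PREFIX_RE = re.compile(r"^RT\s+@\w+:\s*")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(raw):
    """Normalize a scraped tweet, comment, or headline into plain text."""
    text = html.unescape(raw)              # "&amp;" -> "&"
    text = RETWEET_PREFIX_RE.sub("", text) # drop a leading "RT @user:"
    text = URL_RE.sub(" ", text)           # drop links
    text = MENTION_RE.sub(" ", text)       # drop @mentions
    text = text.replace("#", "")           # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_text("RT @user: Great debate tonight! https://t.co/abc #Election2020 &amp; more"))
```

Casing and punctuation are deliberately left alone here, since some rule-based analyzers (VADER, which ships with NLTK, is one) treat capitalization and exclamation marks as intensity cues.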
Accomplishments that we're proud of
This is the first time our team has built a coherent end-to-end system, from front end to back end, so we're pretty excited about that. We're also proud of the search bar animation that plays when the webpage first loads, because god knows how long that took.
What we learned
We got exposure to various APIs, but mainly we learned how to smoothly connect the backend to the frontend to deliver a product.
What's next for PolitiSense
Our team thinks there's a lot more that can be done. Our minimum viable product implements only the barest of features: the ability to search one politician at a time, with data from only one day prior. In the future, we'd like to add:
- filters for adjusting the time range of the scraped data
- sentiment analysis of not just politicians, but also political issues and current events in general
- a general dashboard that breaks down how the aggregate sentiment score was reached (for example, the sentiment scores from Reddit vs. Twitter vs. CNN)
- more data sources (Facebook, Instagram, more subreddits, etc.)
- a live sentiment tracker for events such as political debates
- improved sentiment score calculation algorithms
- tidier code, because there were some questionable segments in there
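Two of the items above, the time-range filter and the per-source breakdown, could be sketched together roughly as follows. None of this exists in the project yet: the `(source, timestamp, sentiment)` record shape and the `breakdown_by_source` function are hypothetical, and the one-day default window simply mirrors the current "one day prior" behavior.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def breakdown_by_source(records, now, window=timedelta(days=1)):
    """Return the mean sentiment per source for records inside the time window.

    records: iterable of (source, timestamp, sentiment) tuples (hypothetical schema)
    now:     reference time; records older than now - window are skipped
    """
    cutoff = now - window
    totals = defaultdict(lambda: [0.0, 0])  # source -> [sentiment sum, count]
    for source, timestamp, sentiment in records:
        if timestamp < cutoff:
            continue
        totals[source][0] += sentiment
        totals[source][1] += 1
    return {source: total / count for source, (total, count) in totals.items()}


now = datetime(2020, 6, 1, 12, 0, tzinfo=timezone.utc)
records = [
    ("reddit", now - timedelta(hours=1), 0.5),
    ("reddit", now - timedelta(hours=2), -0.5),
    ("twitter", now - timedelta(hours=3), 0.25),
    ("cnn", now - timedelta(days=3), 1.0),
]
print(breakdown_by_source(records, now))
print(breakdown_by_source(records, now, window=timedelta(days=7)))
```

Sources with no records inside the window are left out of the result rather than reported as zero, so a dashboard can tell "no data" apart from "neutral".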