Election Day 2016: shock spread across the country as the electoral votes were tallied. How had polls, projections, newspapers, and experts all mispredicted the result? Perhaps they did not have a true sense of the political pulse of the country. Tweets reflect the thoughts, feelings, and ideas of people around the globe, and analyzing them may reveal otherwise overlooked information. On election day, Twitter is especially active, with users voicing opinions about their parties and favorite candidates. With so much data, our team was intrigued by the possibility of predicting the leading party in each state from the thoughts and feelings published in America's Twitter feeds.

What it does

Our project looks at over 400,000 tweets from Election Day (November 8th), 2016. Our data extraction process collects the tweets posted that day along with 32 features of each tweet, such as the user who posted it and where it was posted from. Using natural language processing (NLP) and a pre-trained neural network, we assign each tweet a political ideology score ranging from 0 to 1, where 1 indicates Democratic and 0 indicates Republican. We aggregate the tweets by state and hour to give an hour-by-hour view of how political sentiment on Twitter progressed throughout election day. Using the aggregated score for each state, we then predict the overall result of the election by predicting the allocation of electoral college votes. Our front end displays this data on an interactive map of the United States that updates over time.
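To make the aggregation step concrete, here is a minimal sketch of averaging per-tweet ideology scores by state and hour. It is not our actual pipeline code: the field names `state`, `created_at`, and `score` are hypothetical, and using a simple mean as the aggregate is an assumption made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_scores(tweets):
    """Return the mean ideology score for each (state, hour) bucket.

    Each tweet is a dict with hypothetical keys:
      "state"      - two-letter state code
      "created_at" - ISO 8601 timestamp string
      "score"      - ideology score in [0, 1], 1 = Democratic, 0 = Republican
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tweet in tweets:
        hour = datetime.fromisoformat(tweet["created_at"]).hour
        key = (tweet["state"], hour)
        sums[key] += tweet["score"]
        counts[key] += 1
    # Assumption: a plain average stands in for whatever aggregate is used.
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    sample = [
        {"state": "VT", "created_at": "2016-11-08T09:15:00", "score": 0.5},
        {"state": "VT", "created_at": "2016-11-08T09:45:00", "score": 1.0},
        {"state": "TX", "created_at": "2016-11-08T10:05:00", "score": 0.25},
    ]
    for (state, hour), mean in sorted(aggregate_scores(sample).items()):
        print(state, hour, mean)
```

Keying the result by `(state, hour)` keeps each hourly snapshot independent, which is the shape a time-stepped map needs.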

How we built it

First, we collected over 400,000 tweets from November 8th, 2016. Then we extracted and transformed this data, filtering on US location, political relevance, and several other features. Next, we focused on building a neural network model that could predict political ideology from text. To do so, we used a dataset of tweets from Congressmen and Congresswomen, with each member's political affiliation as the known label. We built and trained an LSTM neural network using TensorFlow, Keras, scikit-learn, and NLTK, and we tuned its parameters to optimize the result. On our test set (60/20/20 split), the network correctly predicts political ideology 75% of the time. Once the model was trained, we pickled it and ran it on the election day tweets, which we grouped by state and hour. Once the model had predicted the ideology of each tweet in every group, we normalized and scaled the results. The final result indicates, hour by hour, whether a state was more likely to vote Democratic or Republican based solely on its residents' tweets. Our front-end data visualization uses JavaScript, HTML, and D3.js to create a representation of each state's political affiliation that changes over time. The front end also uses the predicted affiliations to predict electoral college votes and an overall winner based on the current data.
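The last step, turning per-state scores into an electoral college prediction, can be sketched as follows. This is an illustration under stated assumptions, not our front-end code: the 0.5 cutoff, the winner-take-all rule for every state, and the function name `allocate_electoral_votes` are all assumptions, and the table covers only four states with their 2016 vote counts.

```python
# Illustrative subset of 2016 electoral vote counts; a full run would need all
# 50 states plus DC.
ELECTORAL_VOTES = {"CA": 55, "TX": 38, "FL": 29, "VT": 3}


def allocate_electoral_votes(state_scores, threshold=0.5):
    """Sum electoral votes per party from aggregated state scores.

    Assumptions for this sketch:
      - scores lie in [0, 1], with 1 = Democratic and 0 = Republican
      - a score above `threshold` awards the state to the Democratic party
      - every state is winner-take-all (this ignores the district-based
        allocation used by Maine and Nebraska)
      - a score exactly at the threshold falls to the Republican column
    """
    totals = {"Democratic": 0, "Republican": 0}
    for state, score in state_scores.items():
        party = "Democratic" if score > threshold else "Republican"
        totals[party] += ELECTORAL_VOTES[state]
    return totals


if __name__ == "__main__":
    # Made-up scores, used only to show the call.
    scores = {"CA": 0.7, "TX": 0.3, "FL": 0.45, "VT": 0.8}
    print(allocate_electoral_votes(scores))
```

Running this once per hourly snapshot gives a running tally that a map can display alongside each state's color.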

Challenges we ran into

One challenge during development was cleaning and pre-processing the data, since the formats were not standard. Specifically, we wanted to ensure that all the data we captured was appropriately stratified and relevant. Another challenge was the sheer volume of data we were handling and training our model on. To meet the time constraints, we parallelized the process. Finally, because political ideology is subjective, it was challenging to find an appropriate training dataset with accurate labels.

Accomplishments that we're proud of

We are extremely proud of jointly creating an application with real-world uses and of solving many complex, challenging problems along the way. It is exciting that in such a short time span we were able to create a neural network with 75% classification accuracy. Most importantly, however, we are proud of working together to create a beautiful visual representation of extremely nuanced and relevant data.

What we learned

Our team truly worked together, and every member, from our data processing specialist to our front-end developer, learned from and contributed to various aspects of the project. We discussed decisions, challenges, and goals as a team and learned everything from how to train a neural network to how many electoral college votes Vermont has.

What's next for Tweet to Vote

We hope to start pulling real-time Twitter data for the U.S. and continuously updating the map to represent the political climate of the nation. Although there are several challenges in doing so, we are excited to begin tackling the task.

Note: Please open in Firefox at 80% zoom
