As social media continues to gain importance in the world of politics, we realized that the massive trove of data available online holds the potential to provide meaningful analysis of opinions on legislation.
What it does
Our program analyzes a bill to identify its keywords, pulls Tweets from a range of influential political figures, filters and analyzes them, and feeds the resulting data into a trained neural network that outputs a confidence measure of the bill's strength.
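The keyword-identification step can be sketched as a simple frequency count over the bill's text. This is a minimal illustration, not the project's exact method; the stop-word list and scoring rule here are assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list; the real pipeline's filtering is an assumption here.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for",
              "shall", "be", "is", "this", "that", "by", "or"}

def extract_keywords(bill_text: str, top_n: int = 5) -> list:
    """Return the top_n most frequent non-stop-word terms in a bill's text."""
    words = re.findall(r"[a-z]+", bill_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

These keywords then drive the Tweet search and filtering described below.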
How we built it
Using a variety of scraping and parsing tools, we identified the handles of politically important Twitter users. Then, we experimented with Twitter's API and Microsoft Azure to pull Tweets and filter out irrelevant ones by keyword. Finally, we processed the Tweets from each user with Python's Natural Language Toolkit and used TensorFlow to create and train a neural network that takes in the output from the Toolkit.
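The keyword-filtering step over pulled Tweets could look like the sketch below. The Tweet dictionary shape and the "mentions at least one keyword" relevance rule are illustrative assumptions; the actual system did this filtering via Twitter's API and Azure.

```python
def is_relevant(tweet_text: str, keywords: list) -> bool:
    """Keep a Tweet if it mentions at least one bill keyword (case-insensitive)."""
    text = tweet_text.lower()
    return any(kw.lower() in text for kw in keywords)

def filter_tweets(tweets: list, keywords: list) -> list:
    # Each Tweet is assumed to be a dict with "user" and "text" fields.
    return [t for t in tweets if is_relevant(t["text"], keywords)]
```

The surviving Tweets per user are then handed to NLTK for language processing before reaching the TensorFlow model.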
Challenges we ran into
Time and time again, we were faced with scalability challenges as we pulled and analyzed hundreds of thousands of Tweets. Although we were often held back by API rate limits, we managed to optimize our code to produce analysis capable of updating in real time.
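One generic way to work within API rate limits is exponential backoff with a capped delay. This sketch is an assumption for illustration, not the exact optimization we used; the injectable `sleep` parameter just makes the retry behavior easy to test.

```python
import time

class RateLimitError(Exception):
    """Raised when the API reports it is rate-limiting us (e.g. HTTP 429)."""

def fetch_with_backoff(fetch, max_retries: int = 5, base_delay: float = 1.0,
                       sleep=time.sleep):
    """Retry fetch() on RateLimitError, doubling the wait each attempt (capped at 60s)."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RateLimitError:
            delay = min(base_delay * 2 ** attempt, 60.0)
            sleep(delay)
    raise RuntimeError("rate limit: retries exhausted")
```

Batching requests and caching already-pulled Tweets are complementary tactics when a hard per-window quota makes backoff alone insufficient.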
Accomplishments that we're proud of
With just a little prior experience in ML and none with Twitter's API, we were able to design, code, and train two neural networks and to identify the most important Tweets.
What we learned
We definitely learned that it's important to optimize code early on, because problems with scale can become almost impossible to solve otherwise.
What's next for PredictaBill
We hope to gather more data to train a more advanced, accurate neural network and to keep optimizing the code that currently limits how fast we can parse data, eventually providing a valuable new kind of analysis for the political sphere.