Young Americans are passionate about issues, yet often unaware of which politicians share their views. Many of them will cast their first ballots in the 2020 Presidential Election. We wanted to create an engine that helps them identify the politicians who champion their views and who will serve them best if elected.
What it does
The frontend is a website at link. The user inputs text (views or opinions) of any length, which is run through our classifier. The output tells the user how likely each candidate is to agree with their views.
How we built it
Challenges we ran into
A lot of data had to be manually sourced as it existed in vastly different forms. Hence, there wasn’t a blanket automation technique we could employ.
It took over 5 hours for our classifiers to train, so the training run was our bottleneck. Moreover, we couldn't tell whether a classifier would have high recall until after it had been trained. To work around this, we ran parallel training operations on different datasets, after confirming with the Google representatives that doing so wouldn't increase the runtime of any single training cycle. This made us more efficient overall.
The crawler script and tweet grabber program were our solution to the difficulty of manually curating and synthesizing training data.
The security key that allows us to handshake with Google Cloud and AutoML expires every hour. Currently, we update this key manually. There is an ongoing effort to automate this or to switch to a different authentication method.
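One way the manual key update could be automated is to cache the key and fetch a fresh one shortly before it expires. The sketch below is a minimal illustration of that idea, not our deployed code: `TokenCache`, `fetch`, and `margin_s` are hypothetical names, and `fetch` stands in for whatever call actually obtains a new key and reports its lifetime in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a short-lived key and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch is a hypothetical callable returning (key, lifetime in seconds).
        self._fetch = fetch
        # Refresh this many seconds before the key would actually expire.
        self._margin_s = margin_s
        # The clock is injectable so the refresh logic can be tested without waiting.
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid key, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            token, lifetime_s = self._fetch()
            self._token = token
            self._expires_at = now + lifetime_s
        return self._token
```

Every request to the classifier would then call `get()` instead of reading a hand-pasted key, so a one-hour lifetime would no longer require anyone to intervene.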
Accomplishments that we're proud of
Our best classifier has an accuracy and recall of 94%; we fed it good data! We were able to deliver an end-to-end solution to move society in the direction of political literacy.
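For readers unfamiliar with the two metrics, here is a small sketch of how accuracy and per-label recall are computed from true and predicted labels. The labels and predictions are made-up toy data for illustration only, not our results, and the function names are our own.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of all examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Fraction of examples that truly carry `label` and were predicted as `label`."""
    preds_for_label = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_label:
        return 0.0  # no examples of this label, so recall is undefined; report 0
    return sum(p == label for p in preds_for_label) / len(preds_for_label)


# Toy data: five statements, each truly attributed to candidate "a" or "b".
y_true = ["a", "a", "b", "b", "b"]
y_pred = ["a", "b", "b", "b", "a"]

print(accuracy(y_true, y_pred))     # 3 of 5 correct
print(recall(y_true, y_pred, "a"))  # 1 of 2 "a" statements recovered
```

High recall matters here because a candidate with low recall would rarely be surfaced even when the user's views match theirs.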
What we learned
How a machine learning model can be optimized to ensure convergence. The functioning of deep neural networks. The mind-boggling power and potential of NLP.
What's next for PoliMatch
We hope to further automate data collection and classifier input, speeding up both data sourcing and training. Without the time constraints of a hackathon, we may be able to train a new model on a new, larger dataset (training could run over the course of weeks). Our current classifier only supports ~10 candidates. With your support, we plan to double that number and include all running candidates!