Young Americans are passionate, yet unaware of the politicians who share the views they are passionate about. Many of them will cast their ballots for the first time in the 2020 Presidential Election. We wanted to create an engine that could help them identify the politicians who are ambassadors for their views and who will serve them best if elected.

What it does

The frontend is a website at link. The user inputs text (views or opinions) of any length, which is run through our classifier. The output tells the user how likely each candidate is to agree with their views.

How we built it

The classifier (multi-label) is a DNN-optimized machine learning model that we created using the Google AutoML Natural Language Processing and Google Cloud APIs. We fed the model large sets of data: sentences and phrases that each candidate has spoken in public. The data was sourced from interviews, town halls, and Twitter, as these sources generally contain the politicians' unscripted views, which is important for mitigating input bias. Collection was done with web crawlers and regex, and also manually. We experimented with both single-label and multi-label classification models, and found that the multi-label classifiers generated more insight in a similar runtime.

The backend consists of a single call to the Google AutoML API; the results are processed manually on the frontend.

The frontend was built with HTML, CSS, and JavaScript. All the effects were created manually. The frontend is optimized for simplicity, with a design that takes users fewer than 3 clicks to reach the information they want.
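As a rough illustration of the result-processing step, the sketch below turns a multi-label prediction response into a ranked list of per-candidate likelihoods. It is written in Python for brevity rather than in the JavaScript our frontend uses. The response shape (a `payload` list whose items carry `displayName` and `classification.score`) is an assumption made for this example, and `rank_candidates` and the `candidate_*` labels are hypothetical names, not our production code.

```python
def rank_candidates(response):
    """Return (label, percent) pairs sorted from most to least likely.

    Assumes `response` is a dict shaped like:
    {"payload": [{"displayName": str, "classification": {"score": float}}]}
    This shape is an assumption for the sketch.
    """
    scores = {}
    for item in response.get("payload", []):
        name = item["displayName"]
        score = item["classification"]["score"]
        # Keep the highest score if a label appears more than once.
        scores[name] = max(score, scores.get(name, 0.0))
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    # Multi-label scores are independent, so they need not sum to 100%.
    return [(name, round(score * 100)) for name, score in ranked]


# Hypothetical response with placeholder labels.
sample = {
    "payload": [
        {"displayName": "candidate_a", "classification": {"score": 0.82}},
        {"displayName": "candidate_b", "classification": {"score": 0.35}},
        {"displayName": "candidate_c", "classification": {"score": 0.91}},
    ]
}
ranking = rank_candidates(sample)
```

Because the model is multi-label, each score is read on its own as "how likely this candidate agrees", not as a share of a single total.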

Challenges we ran into

A lot of data had to be manually sourced as it existed in vastly different forms. Hence, there wasn’t a blanket automation technique we could employ.

It took over 5 hours for our classifiers to train, so the training run was our bottleneck. Moreover, we couldn't tell whether a classifier would have high recall until after it had been trained. Hence, we ran training operations in parallel on different datasets, after confirming with the Google representatives that this wouldn't increase the runtime of any single training cycle. This made the overall process more efficient.

The crawler script and tweet grabber programs were our solution to the difficulty of manually curating and synthesizing training data.
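To give a sense of the regex-based cleanup involved, here is a minimal Python sketch that strips URLs, mentions, and hashtag symbols from raw tweet text and emits de-duplicated (text, label) rows. The patterns, the `min_words` threshold, and the names `clean_tweet` and `to_training_rows` are assumptions for illustration; they are not the actual scripts we ran.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Remove URLs and @mentions, drop '#' symbols, and collapse whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")
    return WHITESPACE_RE.sub(" ", text).strip()


def to_training_rows(texts, label, min_words=4):
    """Build (text, label) rows, skipping very short and duplicate entries."""
    rows = []
    seen = set()
    for raw in texts:
        cleaned = clean_tweet(raw)
        if len(cleaned.split()) < min_words or cleaned in seen:
            continue
        seen.add(cleaned)
        rows.append((cleaned, label))
    return rows


# Made-up example texts with a placeholder label.
rows = to_training_rows(
    [
        "We need clean energy now https://t.co/abc",
        "We need clean energy now",
        "Too short",
    ],
    "candidate_a",
)
```

Sources such as interview transcripts and town halls came in other formats, so a cleanup like this would cover only the tweet portion of the data.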

The security key that allows us to handshake with Google Cloud and AutoML expires every hour. Currently, we update this key manually. There is an ongoing effort to automate this or to use a different authentication method.
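One possible way to automate the hourly refresh is to cache the key and fetch a new one shortly before it expires. The Python sketch below shows only that caching logic. `TokenCache` and `fetch_token` are hypothetical names, the 3600-second lifetime mirrors the one-hour expiry mentioned above, and the 300-second safety margin is an assumption. How the key is actually obtained is left to the caller-supplied function.

```python
import time


class TokenCache:
    """Cache a short-lived key and refresh it before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=3600,
                 margin_seconds=300, clock=time.monotonic):
        self._fetch_token = fetch_token  # caller-supplied; returns a new key
        self._lifetime = lifetime_seconds
        self._margin = margin_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid key, fetching a new one if the old one is near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token


# Demonstration with a fake clock and a fake fetch function.
calls = []
fake_now = [0.0]


def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}"


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()       # fetched immediately
fake_now[0] = 1000.0
second = cache.get()      # still cached
fake_now[0] = 3400.0
third = cache.get()       # inside the safety margin, so refreshed
```

Each request to the classifier would then call `cache.get()` instead of reading a manually pasted key.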

Accomplishments that we're proud of

Our best classifier has an accuracy and recall of 94%, because we fed it good data! We were able to deliver an end-to-end solution to move society in the direction of political literacy.

What we learned

How a machine learning model can be optimized to ensure convergence. The functioning of deep neural networks. The mind-boggling power and potential of NLP.

What's next for PoliMatch

We hope to further automate data collection and classifier input, increasing the overall pace of data sourcing and training. Without the time constraints of a hackathon, we may be able to train a new model on a new, larger dataset (one that can train over the course of weeks). Our current classifier only supports ~10 candidates. With your support, we plan to double that number and include all running candidates!
