Inspiration
What it does
The project classifies a news article into one of three political categories: Democratic, Republican, or Neutral. To make it easy to use, we built a Chrome extension: once you have opened an article, just click the extension to see which category it belongs to.
How we built it
The project consists of four main parts. The first is the user-facing part, the Chrome extension. It provides an easy way to use the API because it is always accessible while you browse. The extension extracts the relevant text from the page using Mercury Parser, removing unwanted content such as related stories and advertisements, and then calls an API hosted on AWS.

That API is the second part of the project. The server runs on Node.js; behind the scenes it invokes the Python classifier and returns the result.

The remaining two parts are data collection and training the model to predict the categories. For the data, we used speeches given by political leaders and extracted the important words. We used two sources: one was publicly available pre-processed data, and, to get data more specific to the political parties, we crawled http://www.ontheissues.org/default.htm for quotes from political leaders.

For learning, we explored n-gram and long short-term memory (LSTM) models. We wrote the n-gram classifier in Python using the NLTK library. We cleaned the data by removing symbols and stop words and then applied stemming to the words. We then calculated the term frequencies and bigram frequencies for the articles in each of the three categories. For a test text, we applied the same pre-processing and calculated the same values to get a similarity metric, and the prediction was made based on that metric.
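The n-gram pipeline described above (clean, remove stop words, stem, count terms and bigrams per category, then compare a test text by similarity) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code. The following are all assumptions: the tiny stop-word list, the crude suffix-stripping `stem` function (a stand-in for an NLTK stemmer so the example runs without extra downloads), the hypothetical `NGramClassifier` name, the use of cosine similarity as the "similarity metric", and the toy training sentences.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the project used NLTK, whose
# stopwords.words("english") list is much longer.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "we", "will", "with",
}


def stem(word: str) -> str:
    """Crude suffix stripper standing in for an NLTK stemmer (e.g. PorterStemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop symbols/digits and stop words, then stem each word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def ngram_profile(tokens: list[str]) -> Counter:
    """Term frequencies plus bigram frequencies, counted in one Counter."""
    features = Counter(tokens)                # unigram (term) counts
    features.update(zip(tokens, tokens[1:]))  # bigram counts, keyed by tuples
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class NGramClassifier:
    """Keeps one aggregated n-gram profile per category (hypothetical name)."""

    def __init__(self) -> None:
        self.profiles: dict[str, Counter] = {}

    def train(self, labeled_texts: list[tuple[str, str]]) -> None:
        # Add each text's term and bigram counts to its category's profile.
        for label, text in labeled_texts:
            profile = self.profiles.setdefault(label, Counter())
            profile.update(ngram_profile(preprocess(text)))

    def predict(self, text: str) -> str:
        """Return the category whose profile is most similar to the text."""
        query = ngram_profile(preprocess(text))
        return max(self.profiles, key=lambda label: cosine(query, self.profiles[label]))


# Toy, made-up training sentences purely for illustration.
clf = NGramClassifier()
clf.train([
    ("Democratic", "We will expand affordable health care and raise the minimum wage."),
    ("Republican", "We will cut taxes and reduce government regulation on small business."),
    ("Neutral", "The city council met on Tuesday to review the annual budget report."),
])
print(clf.predict("Raising the minimum wage and expanding health care coverage"))  # Democratic
```

Aggregating each category into a single profile means a prediction costs only three similarity computations per article. The crude stemmer merges fewer word forms than NLTK's would: here "raise" and "raising" stay separate features, which matters when labeled data is scarce.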
Challenges we ran into
A small amount of labeled data. The classifier needs more exploration to get better results.
Accomplishments that we're proud of
Building a Chrome extension, setting up AWS to train and run the classifiers, and learning n-gram and LSTM models, all to deliver a complete end-to-end product.
What we learned
N-gram classifiers, long short-term memory (LSTM) classifiers, AWS, building Chrome extensions
What's next for Political Bias
With more data, we could predict the categories more accurately. We could also filter out fake news and collect feedback from users to improve the classifier as it is used.