Conservative and Liberal Bias Detection in Political Media using Natural Language Processing and Neural Networks
Eliza Berman (eberman3) and Maia Mongado (mmongado)
Read our full written report here.
Introduction:
Biases infiltrate many aspects of spoken and written language. In the sphere of politics, media outlets, as well as individual journalists, often have strong political motivations that can lead to biased sentences in their published works. In our current polarized political climate, the quantity of media being produced from both conservative and liberal outlets has risen greatly. This has created an incredibly useful challenge for NLP researchers to automate the detection of political bias. In the paper that we are implementing, Political News Bias Detection using Machine Learning, the author Minh Vu describes the objectives of building a MultiLayer Perceptron model that characterizes articles by the percentages of liberal, conservative, and neutral sentences. This is an opinion classification problem. We believe this is a nontrivial area of research as the political biases that we are fed can greatly influence our own internalized biases moving forward, which can, in turn, cyclically feed us more political media with those biases. Therefore, it is important to be able to acknowledge and recognize biases present in the content we consume.
Related Work:
We are basing our work off of the paper, Political News Bias Detection using Machine Learning. This article details an implementation that takes as input a political article and outputs the percentages of liberal, conservative, and neutral sentences in the article.
Beyond this paper, which we are re-implementing in part, there are a number of related works relevant to this topic. For example, in the paper, Political Ideology Detection Using Recursive Neural Networks, the authors design a framework that uses RNNs (as opposed to location-invariant approaches such as bag of words) to determine the political ideology at the sentence level. Additionally, in the paper, Detecting Political Bias in News Articles Using Headline Attention, the authors use an attention-based approach to predict political ideology based on headline. By using an attention-based approach, the authors are able to pay more attention to “critical content.” These papers demonstrate different approaches to prediction, through RNNs, attention, MLPs and CNNs.
Data:
Our dataset is the Ideological Books Corpus (IBC), a 2013 dataset of 4062 sentences annotated with political ideology (conservative, liberal, neutral). It has 2025 liberal sentences, 1071 conservative, and 655 neutral. It was crowdsourced through Crowdflower. Each sentence is represented by a parse tree where annotated nodes have a label {conservative, neutral, liberal}. So, it will require a bit of preprocessing to get it into a form ready to be passed into our neural network - Iyyer et al., 2014 includes a Python script on how to access the sentences, phrases, and annotations, so we will most likely be making various modifications to this script.
Methodology:
We have found various papers on different neural networks trying to detect political biases. We are basing our work primarily off one that uses an MLP (rather than the traditional RNN for language learning) and fastText (similar to word2vec in that it obtains vector representations of words but has higher success with rarer words). It turns the words/sentences into vector representations with fastText then passes it to the MLP, which then outputs the classification of the sentence. The best results (81% accuracy) came with parameters (hidden_layer_sizes=(500, 20, 20, 20), max_iter=500, batch_size=32, warm_start=True, early_stopping=True), so we will most likely experiment with those in our architecture. We believe the hardest part about implementing our model will be determining MLP architecture - the paper we are primarily working with does not have too much specific info on what the actual MLP architecture they used is, so that will likely take a lot of experimenting with to determine how many layers we will use.
Metrics:
We plan to run experiments on our multilayer perceptron model by tuning the hyperparameters to see which combination yields the highest accuracy (measured by precision, recall, and F1 score).
In the context of our project, the notion of accuracy refers to the number of sentences that were correctly labeled with the appropriate political ideology. We will use precision, recall, and F1 score as metrics for measuring the accuracy of our model. Precision is the total correctly predicted positives divided by the total number of positives. Recall is the total correctly predicted positives divided by the total number of observations. F1 score is the weighted average of precision and recall.
The authors of the existing project planned to use the metrics that they collected during experiments to optimize the hyperparameters of the MLP. The parameters that they optimized were: hidden_layer_size, max_iter, batch_size, and early_stopping. They also used precision, recall, and F1 score.
Our base goal is to create a working architecture for our model that generates the desired metrics and correctly preprocesses and utilizes our data set. Our target goal is to achieve 70% accuracy on the multilayer perceptron. Our stretch goal is to implement RNN in addition to the multilayer perceptron so that we can compare performance across two different types of deep learning models.
Ethics:
What broader societal issues are relevant to your chosen problem space? In the US today (and internationally) politics affects every sector of society. More specifically, skewed information and bias has become a very pressing issue; most people depend on the news to get basic information about most national and international events, but the agenda and truthfulness of various news outlets are now constantly being called into question. This can lead people to cast misinformed votes, reaffirm historical prejudices, and generally plays a major role in the pipeline to political extremism - when people cannot trust their sources to be neutral, paranoia and hostility quickly set in. So, identifying bias in news sources is of the utmost importance to societal issues such as the right-left divide in this country, the growing prominence of the alt-right, and in general all major political issues. What is your dataset? Are there any concerns about how it was collected, or labeled? Is it representative? What kind of underlying historical or societal biases might it contain? Our dataset is the Ideological Books Corpus (IBC), a 2013 dataset of 4062 sentences annotated with political ideology (conservative, liberal, neutral). It has 2025 liberal sentences, 1071 conservative, and 655 neutral. So, right away, it seems to be skewed in the liberal direction, which could pose a problem in detecting conservative/neutral biases. Also, political ideology itself is tough to pin down - what is considered a conservative standpoint now might have been different even just eight years ago when the IBC was originally annotated. So it could have biases from the political environment in 2013 in its annotation. The annotations were also crowdsourced through Crowdflower, so the annotations might be inconsistent - after all, what seems like a moderately conservative viewpoint to one person may seem moderately liberal to another.
Division of labor:
Maia - I will be responsible for obtaining the dataset, possibly finding additional datasets, and preprocessing it to get in a form ready to pass through the fastText then MLP model. Eliza - I will be responsible for finding compute resources for the neural network and testing it on political news articles. Both - We will both be responsible for setting up the initial training pipeline, tuning hyperparameters and architecture, and doing the final research paper.
Log in or sign up for Devpost to join the conversation.