Inspiration

Social media platforms are great in understanding your preferences and recommending personalized content.

However, they are so good at what they do that they start showing only what you already agree with.

You will stop seeing updates from that friend with a different political view, religious belief or anything that really doesn’t match your existing opinions.

As Eric Schmidt once said:

It will be very hard for people to watch or consume something that has not in some sense been tailored for them.

This is called a filter bubble, where your timeline is filled with only things you agree with.

And ultimately, this a huge problem for our society.

No one can make an educated judgement about politics (or anything, really) by just hearing one side of the story.

And this led me to wonder:

How can I make sure we're hearing other viewpoints when we don't even know what we're missing?

How can I help social media to connect us all together?

How can I help people to read content that challenges their existing beliefs?

And ultimately, how I can help people to be more open-minded and create a more tolerant society?

The answer to all those questions is Unbiased.

What it does

Unbiased is a web browser extension that detects political bias in your Facebook news feed and automatically suggests alternative views.

How I built it

Unbiased has five main components:

  • Browser extension
  • API
  • Political news classifier
  • Political news stance classifier
  • Recommendation system

1. Browser extension

The browser extension was developed using Javascript and Chrome Extensions API.

It has two main files: script.js and background.js.

script.js

This script is responsible for monitoring changes in your news feed, and detecting which of the items are news. I did this by selecting all news feed items that have a link, which of course, may contain non-political content.

This is why we need to send the headline to the suggestions API, which will detect if the headline is related to politics or not.

As this is script doesn't have permission to run CORS requests, we need to send a message to the background.js script, which will then call the API.

background.js

After receiving a message with a headline, the background.js script calls the /suggestions endpoint. Finally, after receiving the response, it sends the response back to the main script, which will then render the templates with the alternative news returned from the API.

2. API

The API was built using Python 3 and Flask.

GET /suggestions?headline={headline}

The main endpoint called by the browser extension is the /suggestions endpoint.

It receives a query param headline and returns a list of alternative views: one that is central and other that is opposite from the headline sent.

An example request and response can be seen below.

Request

curl https://unbiased.us/suggestions?headline=Coronavirus+Is+Officially+A+Pandemic,+But+What+Does+That+Mean?

Response

{
  "headline": "Coronavirus Is Officially A Pandemic, But What Does That Mean?", 
  "suggestions": [
    {
      "domain": "economist.com", 
      "id": "6bce3f764437a8997a732b2a731db514dd3dea36", 
      "image": "https://www.economist.com/sites/default/files/20200208_IRD001.jpg", 
      "link": "https://www.economist.com/international/2020/02/05/scientists-are-racing-to-produce-a-vaccine-for-the-latest-coronavirus", 
      "similarity": 0.8283491563149442, 
      "stance": "central", 
      "title": "Run, don\u2019t walk - Scientists are racing to produce a vaccine for the latest coronavirus | International | The Economist"
    }, 
    {
      "domain": "slate.com", 
      "id": "466acd332a765c9df5a56c91faf6d9eeae34b4ca", 
      "image": "https://compote.slate.com/images/cbf40c69-4949-4638-b7a5-7ba81880bd39.jpeg?width=780&height=520&rect=1560x1040&offset=0x0", 
      "link": "https://slate.com/technology/2020/02/coronavirus-vaccine-possibility-sars-wuhan-research.html", 
      "similarity": 0.9121501602756262, 
      "stance": "left", 
      "title": "How close are we to a coronavirus vaccine?"
    }
  ]
}

There are two other endpoints available: /is_political and /political_stance.

GET /is_political?headline={headline}

Detect if the headline is related to politics or not.

{
  "political": true
}

GET /political_stance?headline={headline}

Return the headline political stance (left, center or right)

{
  "stance": "right"
}

3. Political news classifier

Data

I have used Webhose.io to download thousands of news in different categories.

Training

You can follow the training process in this Jupyter Notebook file.

Saving the model

After finish training our model, we store our model data in three files:

  1. political_classifier.pth

    Contains the model weights, saved using torch.save()

  2. political_classifier.cfg

    Contains a serialized config file of the model, so we don't need to hardcode the model params.

config = {
    'vocab_size': VOCAB_SIZE,
    'labels':  {
         0 : "Non-political",
         1 : "Political"
    },
    'ngrams': 2,
    'embeddings_dim': EMBED_DIM    
}
  1. political_classifier.vocab

    Contains the model words vocabulary.

4. Political Stance classifier

Data

I have used Webhose.io to download thousands of news from a list of 67 newspapers.

Each newspaper was manually classified into EXTREME_LEFT, LEFT, LEFT_CENTER, LEAST_BIASED, RIGHT_CENTER, RIGHT, EXTREME_RIGHT, and the journal stance was used as a proxy for each news stance.

Training

You can follow the training process in this Jupyter Notebook file.

Saving the model

Similar to the previous model, we have also stored our model in three files.

5. Recommendation system

The recommendation system module is responsible for suggesting alternative news based on a headline and a stance.

It works by comparing the embeddings (GloVE) of the headline with the embeddings of all the other news given a specific stance.

To compare, we use cosine similarity and we return the most similar news.

As computing the embeddings every time for all news would be slow, we built an in-memory cache of the embeddings so that the comparison can run faster.

Challenges I ran into

Initially, the idea was to have a more fine-grained classification of political stance (EXTREME_LEFT, LEFT, LEFT_CENTER, LEAST_BIASED, RIGHT_CENTER, RIGHT, EXTREME_RIGHT).

However, this proved to be difficult, especially due to the lack of enough training data.

Also, only using the headline isn't always enough to classify the stance correctly, as we don't have enough context.

Another challenge was the fact the extremes (both right and left), sometimes got mixed up. From my analysis, this was due to the fact that both sides tend to use strong and emotional words, and this was challenging for the classifier to detect.

Accomplishments that I'm proud of

  • Built application with real-world social impact
  • Trained model that automatically detect political news from its headline
  • Trained model that automatically detect political stance (left, center, right) from its headline
  • Built a recommendation system that suggests alternative political views

What I learned

  • Learned how to build and train machine learning models in PyTorch (I had previously only used TensorFlow)
  • How to detect political bias in news headlines (and how hard this is)

What's next for Unbiased

  • Expand to other social networks
  • Get more data to improve the classifiers
  • Improve the models by also passing part of the news content as a context
  • Improve the political stance classifier for the extreme views
  • Add more domains other than politics (fake news, hate speech, sexism, racism)

Built With

Share this project:

Updates