All of us live in a filter bubble. We tend to dismiss opinions that don't fit our worldview as the products of an idiot's mind. However, there can't be billions of idiots out there. Most people are not stupid; they simply have a different perspective. No one saw Trump and Brexit coming, and they showed us that filter bubbles are very real and that we should do something about them.
What it does
ReFrame lets you explore other opinions and sources for any story you're reading. Based on the topics mentioned, you are presented with a list of closely related articles. Unlike on social media, special emphasis is given to stories that conflict with the worldview presented in the article. ReFrame also highlights each article's ideological attitude towards the concepts mentioned in both articles, so you can freely explore different viewpoints.
How we built it
We use a Python server with asyncio to continuously scrape dozens of diverse news sites (10,509 articles and counting), currently focused on conservative and liberal English-language media. Each article is analyzed for entity concepts with DBpedia Spotlight. Ideological attitudes are estimated using entity-related sentiment analysis and an analysis of the subjectivity of the language. We use spaCy and TextBlob for both.
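To illustrate the crawling side, here is a minimal sketch of a concurrent asyncio crawler. It is not our actual server code: `fetch` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the names `crawl` and `max_concurrent` are assumptions made for this example. The semaphore caps the number of requests in flight, which is one simple way to stay polite towards news sites.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request; no network access."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def crawl(
    urls: Iterable[str],
    fetcher: Callable[[str], Awaitable[str]] = fetch,
    max_concurrent: int = 5,
) -> Dict[str, str]:
    """Fetch all URLs concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url: str):
        async with semaphore:  # wait here if too many requests are running
            try:
                return url, await fetcher(url)
            except Exception:
                # A failing site should not stop the rest of the crawl.
                return url, None

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    # Keep only the pages that were fetched successfully.
    return {url: html for url, html in pairs if html is not None}


if __name__ == "__main__":
    pages = asyncio.run(crawl(["https://example.com/a", "https://example.com/b"]))
    print(sorted(pages))
```

In a real crawler, `fetch` would be replaced by an asynchronous HTTP client, and the fetched HTML would be handed on to the analysis steps.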
Given an article to analyze, we perform these same steps. We then rank related articles by extracted keywords and concept overlap, and score each with an expected opinion conflict based on sentiment analysis, subjectivity, and the bias of the top-level news site.
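The ranking idea can be sketched roughly as follows. This is a simplified illustration, not our exact scoring: the `Article` fields, the Jaccard overlap, the bias scale from -1 to +1, and the weights are all assumptions chosen for the example, and keyword matching and subjectivity are left out for brevity. Articles that share concepts with the query rank first, and among those, articles whose sentiment or site bias differs more are boosted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Article:
    title: str
    concepts: Dict[str, float]  # concept -> sentiment polarity in [-1, 1]
    site_bias: float = 0.0      # hypothetical scale: -1 (liberal) to +1 (conservative)


def concept_overlap(a: Article, b: Article) -> float:
    """Jaccard overlap of the concepts mentioned in both articles."""
    union = a.concepts.keys() | b.concepts.keys()
    shared = a.concepts.keys() & b.concepts.keys()
    return len(shared) / len(union) if union else 0.0


def opinion_conflict(a: Article, b: Article) -> float:
    """Expected conflict in [0, 1] from sentiment gaps and site bias."""
    shared = a.concepts.keys() & b.concepts.keys()
    if not shared:
        return 0.0
    # Mean absolute sentiment difference on shared concepts, scaled to [0, 1].
    sentiment_gap = sum(abs(a.concepts[c] - b.concepts[c]) for c in shared) / (2 * len(shared))
    bias_gap = abs(a.site_bias - b.site_bias) / 2
    return 0.5 * sentiment_gap + 0.5 * bias_gap


def rank_related(query: Article, candidates: List[Article],
                 conflict_weight: float = 0.5) -> List[Article]:
    """Sort candidates by relatedness, boosted by expected opinion conflict."""
    def score(candidate: Article) -> float:
        overlap = concept_overlap(query, candidate)
        return overlap * (1 + conflict_weight * opinion_conflict(query, candidate))
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    query = Article("query", {"Brexit": 0.8, "EU": 0.5}, site_bias=1.0)
    others = [
        Article("agrees", {"Brexit": 0.8, "EU": 0.5}, site_bias=1.0),
        Article("unrelated", {"Football": 0.1}),
        Article("conflicts", {"Brexit": -0.8, "EU": -0.5}, site_bias=-1.0),
    ]
    for article in rank_related(query, others):
        print(article.title)
```

Multiplying the overlap by the conflict boost, rather than adding the two, keeps unrelated articles at the bottom no matter how strongly they disagree.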
Challenges we ran into
- We needed to think outside of our own filter bubble and research news sites that bring in articles from outside of people's filter bubbles
- Async is hard on your head but makes quite a difference in crawling speed
- Classifying ideology is not trivial
- Scrollbars do what they want
- UX of comparing multiple articles based on multiple topics
- How to find related but different articles
- How to update the server when the person with access is sleeping
- Preventing rate limiting
Accomplishments that we're proud of
- "borrowed" more than 1 GB of HTML from more than 38 news sites
- nice frontend design (!!!)
- relatively clean backend code (no idea how that could happen)
- fully asynchronous crawler
- it actually works in the wild