REALIGN

Problem

Online news has flooded our lives. In recent years the creation and consumption of content have reached levels never seen before. Sailing this ocean, it is harder than ever to find an authentic source.

In turbulent times like the COVID-19 infodemic, waves of fake news spread across the internet. Inside a filter bubble you are also less likely to receive news containing conflicting information, so you are highly exposed to the dangers of misinformation.

Solution

Our idea guides readers the way the sextant helped sailors navigate back in the days of celestial navigation. On open waters, sailors found their position by triangulation and by taking measurements relative to the stars.

Metaphorically, that is what you can do with REALIGN.

Product

REALIGN is a browser extension that helps you realign your knowledge with reliable news and maintain unbiased online reading habits.

What it does

Triangulation & Alignment

The final product (currently an MVP) is a Chrome extension that offers 3 relevant articles for every link. An AI chooses the 3 articles from the following categories:

trusted sources to keep you safe (ALIGNMENT via validated fix points)
opposing perspectives to discover more
similar news to help explore deeper

Our solution fights the spread of FAKE news not by judging, scoring, or hiding any content, but by offering alternative perspectives on the topic (based on REAL/valid information) in a user-friendly way.

Features

Our extension places an icon next to each article. When you hover the mouse over the icon, it expands and displays the title, domain, and category of 3 suggested articles as alternative readings.
The icon size dynamically represents the size of the cluster the article is most closely related to. This gives you a quick map of the topic, based on the link you are viewing.
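
As a rough illustration of the dynamic icon sizing, the sketch below maps a cluster's article count to an icon edge length. The logarithmic scaling, the pixel bounds, and the name `icon_size_px` are assumptions made for this example; the text does not say how the extension actually computes the size.

```python
import math


def icon_size_px(cluster_size: int, max_cluster_size: int,
                 min_px: int = 16, max_px: int = 48) -> int:
    """Map a cluster's article count to an icon edge length in pixels.

    Hypothetical helper: the bounds and the log scaling are assumptions.
    """
    if max_cluster_size < 1:
        raise ValueError("max_cluster_size must be at least 1")
    # Clamp so out-of-range counts still map inside [min_px, max_px].
    size = min(max(cluster_size, 0), max_cluster_size)
    # log1p compresses large clusters so small ones stay distinguishable.
    ratio = math.log1p(size) / math.log1p(max_cluster_size)
    return round(min_px + ratio * (max_px - min_px))
```

A logarithmic scale is one reasonable choice here because cluster sizes tend to vary by orders of magnitude, and a linear scale would shrink most icons to the minimum size.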

(See video demo)

How it works

Crawling articles - The backend crawls the web looking for articles on a whitelisted list of domains. We decide whether a page is an article based on its format (we use Mozilla's Readability.js, the library behind Firefox's Reader View, for this). If we find an article, we save the text content and the links in the main content.
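
The whitelist check and the link extraction can be sketched with the Python standard library alone. This is only an illustration: the domains in `WHITELISTED_DOMAINS` are placeholders, the helper names are hypothetical, and the Readability.js article detection is not reproduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Placeholder whitelist; the real domain list is not given in the text.
WHITELISTED_DOMAINS = {"example.com", "news.example.org"}


def is_whitelisted(url: str) -> bool:
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in WHITELISTED_DOMAINS)


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the <a href="..."> tags of a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the article's own URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list:
    """Return every link found in the given HTML, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

In a real crawler `extract_links` would run only on the main content that the article detector returns, so that navigation and footer links do not pollute the stored links.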

Classify articles - A trained ML model (PyTorch) cleans and interprets the text and title of the article. It looks for predefined categories to keep our recommendations on point.

Create a connection graph - The World Wide Web is basically a graph. We save the links between sites/domains found in articles. This information forms a graph where the domains are the nodes and the number of links between two domains is the edge weight.
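
A minimal sketch of such a domain graph, assuming the crawler yields pairs of an article URL and its outgoing links. The function names and the choice to ignore links within the same domain are assumptions for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lower-cased host name of a URL, or an empty string if it has none."""
    return (urlparse(url).hostname or "").lower()


def build_domain_graph(article_links) -> Counter:
    """Count links between domains.

    article_links: iterable of (article_url, [outgoing link URLs]).
    Returns a Counter keyed by (source_domain, target_domain); the count
    is the edge weight. Same-domain links are skipped (an assumption).
    """
    edges = Counter()
    for article_url, links in article_links:
        src = domain_of(article_url)
        for link in links:
            dst = domain_of(link)
            if src and dst and src != dst:
                edges[(src, dst)] += 1
    return edges
```

For example, an article on `a.test` that links twice to `b.test` produces the edge `("a.test", "b.test")` with weight 2.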

Recommendation - We look for articles in the same category that have a sufficient number and quality of links pointing to and from them. We choose 3 based on their distance from the domain of the original article in the connection graph.
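
The distance-based choice can be sketched as a breadth-first search over the domain graph. The text does not state the exact selection rule, so this example assumes that links are treated as undirected and that the three nearest reachable domains win; the link amount and quality filter is left out, and all names are hypothetical.

```python
from collections import deque


def hop_distances(edges, start: str) -> dict:
    """Breadth-first hop counts from `start`, treating links as undirected.

    edges: iterable of (source_domain, target_domain) pairs.
    """
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def recommend(candidates, edges, source_domain: str, k: int = 3) -> list:
    """Pick up to k candidate articles from the nearest other domains.

    candidates: list of (article_url, domain), already filtered by category.
    Choosing the nearest domains is an assumption of this sketch.
    """
    dist = hop_distances(edges, source_domain)
    reachable = [(dist[domain], url) for url, domain in candidates
                 if domain in dist and domain != source_domain]
    return [url for _, url in sorted(reachable)[:k]]
```

Picking the farthest domains instead, for instance to surface opposing perspectives, would only require reversing the sort order.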

Competition and differentiation

Other existing solutions rely heavily on users reporting and reviewing fake items; our product doesn't. Content curation tools serve different goals (they focus on professional content creators) and lack the ability to suggest balanced sources. Filter bubble elimination extensions focus mainly on politics and can't tell whether the sources the user reads contain misinformation. Large social platforms are joining the fight against false information, but they are not open source and they collect user data.

USP and impact on a personal level

The main differentiator is how REALIGN impacts user habits: unlike similar tools, we don't provide a centralized answer about what to believe and what not to. Of course, we suggest clean articles only, and beyond that we guide the user towards a better content diet.

The core of the product is the change of habit it can promote. You can get used to searching for balanced, quality content, so not only does the quality of the information you consume increase, but you can also become a better person with better arguments and decisions.

By adding more topics, the product can remain useful in the long term, well after COVID-19.

Business model

We ❤️ open source.

Our values - We believe that this service must be free and open in every single way. This means no price label, no advertisement, no data trade, no patents, and no secrets. Transparency, open-source, and social contribution first.

Made by you and me - We can’t do this alone and we won’t. One of our main challenges is to involve as much brainpower as possible. Contribution is the most anyone can pay.

Funded by the people, indirectly - We are not looking for direct backing from the users of our product. In the long term, we trust in NGOs and civil initiatives as patrons/maintainers whose mission is aligned with ours and who are willing to improve REALIGN together.

Impact on social level & benefits

Every perspective adds to the whole picture of information democracy

It works like a vaccine - Societies could be trained for the next generation of more sophisticated fake news by practicing unbiased reading habits.

Starts the conversation between social silos - People could understand arguments they don't agree with before they judge them.

Slows down media market polarization - This could be the evidence that there's still a market for balanced journalism.

Powers up research - Our databases could be a great asset to open scientific advancements.

Costs and responsibilities

Scale it - Scalability issues can be solved by smart people and money. To keep the infrastructure as cheap as possible, the whole system architecture is public. However, we still need machines: based on our current estimates, serving 10,000,000 pages/day would cost about 3k EUR/month on AWS or require about 10 machines on-premises.
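
To put the estimate in perspective, the arithmetic below converts the two quoted figures into an average request rate and a unit cost. The 30-day month and the assumption of evenly spread traffic are ours, not part of the original estimate.

```python
PAGES_PER_DAY = 10_000_000   # figure quoted in the estimate
MONTHLY_COST_EUR = 3_000     # figure quoted in the estimate (AWS)
DAYS_PER_MONTH = 30          # assumption
SECONDS_PER_DAY = 86_400

# Average load if traffic were spread evenly over the day (an assumption).
pages_per_second = PAGES_PER_DAY / SECONDS_PER_DAY

# Unit cost derived from the two quoted figures.
pages_per_month = PAGES_PER_DAY * DAYS_PER_MONTH
cost_per_million_pages_eur = MONTHLY_COST_EUR / pages_per_month * 1_000_000

print(f"{pages_per_second:.0f} pages/s on average")
print(f"{cost_per_million_pages_eur:.2f} EUR per million pages")
```

Under these assumptions the estimate works out to roughly 116 pages per second on average and about 10 EUR per million pages served. Real traffic peaks would be higher than the average.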

Keep it working - The internet changes fast, so our solution needs to keep pace and adapt. We count on the open-source community (and ourselves) to handle this through Issues and Pull Requests on GitHub. The additional cost is organizing and maintaining the community behind the codebase.

Execution plan

What we did during the weekend

Explored the problem and created a service and system concept
Reviewed already existing solutions and their approach
Created an English topic detector ML model for a couple of topics
Wrote a dummy crawler and downloaded about 150k articles from the internet
Parsed the articles and stored the links between the sites
Created a test Chrome extension to see it in action
Developed the brand of the solution with the help of DIS.CO
Designed the visual identity concept of the product: a generative dynamic visual identity (DVI) driven by the data gathered from the articles. This way the brand keeps changing along with the corpus of information available in the analyzed online articles.
Created a 100% custom presentation of the concept and branding. (See attached.)

What we have right now

Proof-of-concept snippets for crawling, categorizing, recommending, and showing articles
A thorough, feasible execution concept

Make it real

Make it work in a single language (which is not English)
First, we are looking for a European language (probably Hungarian) to start the project on a smaller scale.

Future

Make it multilingual
Increase the stakes step by step by handling multiple languages, a larger proportion of the internet, and involving foreign communities.

Challenges

At first we started crawling the whole internet. Based on empirical results and the resources we had, we decided in the end to focus on Hungarian news portals. By that time we had a well-trained AI for recognizing topics in English articles, which we ported to Hungarian.