Inspiration
I try to stay up to date with what is going on in the world, so I read a lot of news articles from various sources.
When consuming this type of content, there is one thing I have always particularly disliked: the author's unnecessary opinions intertwined with factual information. While this may be acceptable in an opinion essay or a personal blog, it is not what I want to see in a news article, which, in principle, ought to be as unbiased and objective as possible.
This inspired me to develop my own countermeasure: a browser extension that highlights and rewrites opinionated content in the article being read.
What it does
NewsFuse is a Chrome extension that highlights opinionated text on a page and translates it into neutral language. Translation can be turned off if the reader is interested only in the per-sentence evaluation of the article. Highlighting can be disabled so that the translated sentences blend seamlessly into the original text. NewsFuse can process the entire article with the Parse Paragraphs button, or individual elements in Grabber mode, where only the textual content of the clicked HTML element is processed.
How I built it
The solution consists of two components: a Chrome extension and a backend server.
The extension provides the user with an easy-to-use interface and extracts the textual content of the page, which is then sent to a self-hosted FastAPI backend responsible for the following:
- removing quotes from the processed text,
- splitting the text into sentences,
- evaluating how opinionated each sentence is,
- sending highly opinionated sentences to the Gemini API,
- processing the API response,
- returning sentences, classifications and translations back to the extension.
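The backend steps listed above can be sketched as a single function. This is a minimal, dependency-free sketch under stated assumptions, not the actual NewsFuse code: a regular-expression splitter stands in for NLTK's sentence tokenizer, "removing quotes" is interpreted as dropping passages between double quotation marks, and the bias classifier and the Gemini call are passed in as plain callables (`score` and `rewrite`). The names `process_text` and `SentenceResult` and the 0.5 threshold are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical cut-off: sentences scoring at or above it count as "highly opinionated".
OPINION_THRESHOLD = 0.5

# Assumption: "removing quotes" means dropping passages in straight or curly double quotes.
QUOTE_RE = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')

# Naive stand-in for nltk.sent_tokenize: split after ., ! or ? followed by whitespace.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


@dataclass
class SentenceResult:
    text: str
    score: float
    opinionated: bool
    translation: Optional[str] = None  # None means "not rewritten"


def process_text(
    text: str,
    score: Callable[[str], float],
    rewrite: Callable[[list[str]], list[Optional[str]]],
) -> list[SentenceResult]:
    # 1. Drop quoted passages, then collapse the whitespace they leave behind.
    unquoted = re.sub(r"\s+", " ", QUOTE_RE.sub("", text)).strip()

    # 2. Split the remaining text into sentences.
    sentences = [s for s in SENTENCE_RE.split(unquoted) if s]

    # 3. Score every sentence with the injected bias classifier.
    results = []
    for sentence in sentences:
        value = score(sentence)
        results.append(SentenceResult(sentence, value, value >= OPINION_THRESHOLD))

    # 4. Send only the highly opinionated sentences to the rewriter, in one batch.
    flagged = [r for r in results if r.opinionated]
    if flagged:
        rewrites = rewrite([r.text for r in flagged])
        # 5. Attach the rewrites; zip() stops early if the rewriter returns too few.
        for result, new_text in zip(flagged, rewrites):
            result.translation = new_text

    # 6. Sentences, classifications and translations for the extension.
    return results


# Toy usage: a keyword "classifier" and a word-swapping "rewriter" stand in for the real models.
demo = process_text(
    'The bill passed. It is a disgraceful decision. "Great," he said.',
    score=lambda s: 0.9 if "disgraceful" in s else 0.1,
    rewrite=lambda batch: [s.replace("disgraceful", "controversial") for s in batch],
)
for r in demo:
    print(r.opinionated, r.text, "->", r.translation)
```

Batching the flagged sentences into one `rewrite` call keeps the number of LLM requests per article low, which matters when each call has latency and cost.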
The browser extension then processes the backend response, wrapping every sentence in a span element styled to reflect its classification and translation result.
The backend processes the text using the NLTK package, while bias classification is delegated to a custom TensorFlow model (based on a pretrained BERT transformer) trained on the MBIC dataset.
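The extension does this in JavaScript against the live page; to keep all examples in one language, the following Python sketch only illustrates how backend results could be turned into span markup, including the two toggles described under What it does (highlighting and translation). The CSS class names, the `title` tooltip and the dict keys of the response are assumptions, not the extension's actual format.

```python
from html import escape
from typing import Optional


def wrap_sentence(
    text: str,
    opinionated: bool,
    translation: Optional[str] = None,
    highlight: bool = True,
    translate: bool = True,
) -> str:
    # Show the rewrite only if translation is enabled and a rewrite exists.
    shown = translation if (translate and translation) else text
    attrs = ""
    if highlight:
        # Hypothetical CSS class names; the real extension's styling is not documented.
        css_class = "nf-opinionated" if opinionated else "nf-neutral"
        attrs += f' class="{css_class}"'
    if shown != text:
        # Keep the original wording available as a hover tooltip (an assumed design choice).
        attrs += f' title="{escape(text)}"'
    return f"<span{attrs}>{escape(shown)}</span>"


def render(sentences: list[dict], highlight: bool = True, translate: bool = True) -> str:
    # Assumed response shape: [{"text": ..., "opinionated": ..., "translation": ...}, ...]
    return " ".join(
        wrap_sentence(s["text"], s["opinionated"], s.get("translation"), highlight, translate)
        for s in sentences
    )
```

With highlighting off and translation on, each span carries no class, so the rewritten sentences read as ordinary text.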
Challenges I ran into
Building this project brought some tricky moments, especially since it was my first time writing a Chrome extension. Here's what I found tough:
Figuring Out Chrome Extensions: Starting from scratch with Chrome extension development was a bit of a puzzle. Getting the hang of how everything works and fits together took some time.
Dealing with Sentences and Styles: Processing the backend response so that the right sentences are wrapped in a highlighting span proved quite challenging, as there are many exceptions and corner cases (different tags embedded within the textual content, additional styling, invisible script tags placed between sentences).
Engineering the right prompt: Phrasing the task for the LLM so that it replaced sentences correctly was quite difficult, especially since some sentences have no unopinionated counterpart or require complete reformulation.
Finding Good Training Data: As it turned out, there are not many high-quality datasets that evaluate sentences in terms of the writer's bias or opinions. Fortunately, the MBIC dataset was sufficient to train a model that was good enough to be used.
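One common way to cope with sentences split across tags is to map the page's visible text back to its individual text nodes. The sketch below is an assumed approach rather than the one NewsFuse uses, and it relies on Python's standard `html.parser` instead of the browser DOM: it skips `<script>` and `<style>` contents, records the character offset of every remaining text node, and reports which nodes a given sentence spans. `TextNodeCollector` and `nodes_for_sentence` are hypothetical names.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    # Tags whose contents are never visible text.
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.nodes: list[tuple[int, str]] = []  # (offset in visible text, node text)
        self._skip_depth = 0
        self._length = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Record only visible text, remembering where each node starts.
        if self._skip_depth == 0 and data:
            self.nodes.append((self._length, data))
            self._length += len(data)

    @property
    def text(self) -> str:
        return "".join(node for _, node in self.nodes)


def nodes_for_sentence(collector: TextNodeCollector, sentence: str) -> list[int]:
    # Locate the sentence in the visible text, then return every node overlapping it.
    start = collector.text.find(sentence)
    if start < 0:
        return []
    end = start + len(sentence)
    return [
        i for i, (offset, node) in enumerate(collector.nodes)
        if offset < end and offset + len(node) > start
    ]


collector = TextNodeCollector()
collector.feed("<p>The plan <b>failed</b> badly.<script>var x = 1;</script> Critics cheered.</p>")
collector.close()
print(collector.text)
print(nodes_for_sentence(collector, "The plan failed badly."))
```

On real pages, whitespace usually has to be normalized on both sides as well, since the sentence splitter and the markup rarely agree on spacing.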
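A sketch of one way to phrase the task and validate the answer, assuming the model is asked for a JSON array and may return `null` for sentences without a neutral counterpart. The prompt wording, `build_prompt` and `parse_response` are illustrative assumptions, not the prompt NewsFuse uses, and the actual Gemini call is left out.

```python
import json
import re
from typing import Optional

# Hypothetical prompt wording; {n} and {sentences} are filled in by build_prompt.
PROMPT_TEMPLATE = (
    "Rewrite each numbered sentence so that it keeps the facts but drops the author's opinions.\n"
    "Return only a JSON array of exactly {n} elements, in the same order.\n"
    "If a sentence has no neutral counterpart, put null in its place.\n\n"
    "{sentences}"
)

# Matches an optional Markdown code fence (with or without "json") around the answer.
FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$")


def build_prompt(sentences: list[str]) -> str:
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(sentences, 1))
    return PROMPT_TEMPLATE.format(n=len(sentences), sentences=numbered)


def parse_response(raw: str, originals: list[str]) -> list[Optional[str]]:
    # LLMs often wrap JSON in a code fence; strip it if present.
    cleaned = FENCE_RE.sub("", raw.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return [None] * len(originals)
    # Reject answers that do not line up one-to-one with the input sentences.
    if not isinstance(data, list) or len(data) != len(originals):
        return [None] * len(originals)
    # Keep non-empty strings; anything else means "no rewrite" for that sentence.
    return [item.strip() if isinstance(item, str) and item.strip() else None for item in data]
```

Returning `None` instead of guessing lets the extension keep the original sentence highlighted when the model declines to rewrite it or returns a malformed answer.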
Despite these challenges, I kept at it and managed to get everything working smoothly for the NewsFuse extension.
Accomplishments that I'm proud of
Primarily, that I managed to combine all the pieces into one working solution.
What I learned
How to write a Chrome extension 😋
What's next for NewsFuse
There is plenty of room for improvement for NewsFuse, from employing a more robust classification model that can distinguish different kinds of bias, to improving the functionality and UI of the extension itself. The aforementioned problem with wrapping sentences in some edge cases still exists and would likely have to be dealt with case by case. Furthermore, supporting other languages could be useful for reading local news that is rarely translated into English (although translating the entire page with Google Translate is a workaround). Finally, deploying the backend API to the cloud would let the extension be used without self-hosting the model, greatly increasing the tool's usability (though this would require some financing).
Built With
- chrome
- fastapi
- gemini
- javascript
- nltk
- python
- tensorflow
- vertexai