Inspiration
Twitter can be a harsh place :C
Millions of people use Twitter to share their voices and as their main news feed. That reach also makes it an effective vector for spreading misinformation and hate, and being exposed to either day after day can severely affect your well-being.
What it does
Detoxitweet leverages AI to analyse every tweet on any Twitter page the user navigates to, checking whether it contains fake information, is clickbait, or is toxic. It continuously scans the page and notifies the user of each tweet's safety. If the score spikes unreasonably high, the tweet is hidden, although the user can still choose to reveal it. In this way we protect the user while leaving them the flexibility to decide what they see.
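The hide-or-flag behaviour described above boils down to a threshold rule. A minimal sketch, assuming illustrative score names and cut-offs (the actual extension's thresholds and labels may differ):

```python
def moderate(scores, flag_at=0.5, hide_at=0.8):
    """Decide what the UI does with a tweet, given model scores in [0, 1].

    Below flag_at the tweet is left alone; between flag_at and hide_at
    it is labelled; above hide_at it is hidden behind a click-to-reveal.
    """
    worst = max(scores.values())
    if worst >= hide_at:
        return "hide"   # hidden, but the user can still reveal it
    if worst >= flag_at:
        return "flag"   # shown with a safety warning
    return "show"
```

For example, a tweet scoring `{"toxicity": 0.9, "fake_news": 0.1}` would be hidden, while one scoring `0.6` on toxicity would only be flagged.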
How we built it
Detoxitweet uses several AI models on the backend to detect whether a tweet is toxic or fake news. The frontend is a Chrome extension that uses various browser APIs to scrape the content of tweets and send it to our backend for analysis. Since this relies on AI, we had to make sure the technology is precise enough not to misinform the user, which we did by building on already-trained models and resources (GitHub is full of open-source AI).
So we have the following:
- the extension is written in JavaScript, which we use to monitor the Twitter feed and inject information into the Twitter page
- we have two backend servers, which we use as follows:
- for the "let's categorize comments by toxicity" part, we run IBM's open-source MAX-Toxic-Comment-Classifier in a Docker container, self-hosted on one of our machines (behind a Caddy reverse proxy, tunnelled with ngrok)
- for the fake-news part, we host a Flask (Python) server on Heroku that serves a pickled random forest model, inspired by a variety of fake-news detection repositories on GitHub. When we realised those detectors are not very precise on tweets, we also trained our own SVM model on Twitter data, similar to the clickbait detectors used for YouTube. Both pipelines tokenize the tweet and use word2vec-style embeddings to turn its content into data the models can process
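The tokenize-then-embed step both backends rely on can be sketched in pure Python. This is a toy stand-in, not our production pipeline: the embedding table, its dimensionality, and the `classify` threshold are all invented for illustration, where the real system uses trained word2vec vectors and the pickled random forest / SVM models.

```python
import re

# Toy 3-dimensional "embeddings" standing in for real word2vec vectors.
EMBEDDINGS = {
    "breaking": [0.9, 0.1, 0.0],
    "shocking": [0.8, 0.2, 0.1],
    "report":   [0.1, 0.7, 0.3],
    "study":    [0.0, 0.8, 0.4],
}
DIM = 3

def tokenize(text):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vectorize(text):
    """Average the embeddings of known tokens (word2vec-style averaging)."""
    vecs = [EMBEDDINGS[t] for t in tokenize(text) if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def classify(text, threshold=0.5):
    """Stand-in for the pickled model: flag tweets whose first toy
    embedding dimension (a made-up 'sensationalism' axis) is high."""
    return "suspect" if vectorize(text)[0] > threshold else "ok"
```

In the real deployment, `classify` would instead call `model.predict` on the averaged vector, with the model unpickled once at server start-up.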
Challenges we ran into
There were many challenges that we had to overcome:
- hosting a very large AI model on one of our own machines
- making sure the extension script fires at the right time and does not read random text (parsing the Twitter frontend)
- dealing with the fact that Twitter is a single-page application: navigation does not reload the page but manipulates the browser history, so our script has to detect content changes itself
- dealing with CORS and header problems, which forced us to put our toxicity backend behind a Caddy reverse proxy and tunnel it with ngrok
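The CORS fix can be sketched as a Caddyfile along these lines (the port numbers are placeholders, not our actual setup): Caddy answers preflight requests itself, attaches the headers the extension needs, and proxies everything else to the classifier container.

```
:8080 {
	# Answer preflight requests directly so they never hit the model server.
	@preflight method OPTIONS
	header Access-Control-Allow-Origin "*"
	header Access-Control-Allow-Methods "GET, POST, OPTIONS"
	header Access-Control-Allow-Headers "Content-Type"
	respond @preflight 204

	# Everything else goes to the toxicity classifier container.
	reverse_proxy localhost:5000
}
```

ngrok then tunnels port 8080 so the extension can reach the machine from anywhere.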
Accomplishments that we're proud of
We are proud to have made an actually functional application that helps filter out fake news and hate on a platform that is widely used.
What we learned
We learned how to dynamically alter the source code of a webpage through a Chrome extension. We also learned that AI models can be large enough to require special hosting.
What's next for Detoxitweet
Adding customization options and whitelisting is a natural next step. We could also improve on the existing AI by training our own models that work better in production, possibly backed by a dataset that grows continuously and keeps improving them.
Built With
- browser-api
- browser-extension
- decision-trees
- docker
- ibm-api
- javascript
- python