Welcome to ukr.ai!
Try our project out—it's best used on mobile.
Social media is the king of disinformation. Astroturfing elections through the widespread use of Twitter bots, creating narratives based on doctored images, or governments using widespread propaganda through mass media to seize control of regions—when controversy strikes, the internet is there to make sure the facts and fiction are intertwined.
In recent years, political parties have leveraged this to spur on disinformation campaigns. These massive information wars between nations have made it difficult to find news that is credible and trustworthy. When we found out that Russia had launched their attack on Ukraine, we knew that there would be widespread efforts to misconstrue the truth and further confuse the public.
This inspired us to create an AI that could constantly review this mass influx of data and learn to sort the facts from fiction to help counteract the damage done by disinformation and propaganda campaigns.
What it does
The UI is simple. There’s a text box where a user can enter some news they’ve heard, a link to a tweet, or a “fact” that doesn’t sit right. This message is parsed and analyzed by our fine-tuned GPT-3 AI model. Our application will tell you if the statement is misleading or not.
How we built it
AI and back-end
For our AI model, we leveraged OpenAI's GPT-3. To train our models, we needed a large dataset that could tell the AI if a statement was misleading or not. We stumbled across Twitter's Birdwatch, a periodically-updated dataset containing entries of tweet IDs and their respective trustworthiness. Using Birdwatch, we mapped each tweet ID to its text content through Twitter API lookup, determined if the tweet is relevant to the current conflict, parsed the text, and added the label to our training dataset containing statements and their truth value. After formatting our data for GPT-3, we fed it into our AI, which learned what kind of statement was true and what was false.
With a knowledgeable AI, the front-end could query our model with text or a Twitter link—if it was a link, we fetched the tweet's text from Twitter API—and responded with our AI's classification (i.e. misleading, not misleading, unknown) and levels of confidence.
Once we were finished with all of the code, we deployed it using Google Cloud Functions, allowing the front-end to have access to the event-driven serverless functions.
In the front-end, we developed a mobile-first responsive application that used the latest technologies. We were design-oriented from the start, by focusing on how the user would interact with the application. We featured a simple text box with clearly worded instructions within a modern design.
We also wanted to tackle the issue that our ML model would need time to "think", so we created interesting visuals such as a spinning 3D globe and animations to keep the user's attention as the API request to the server loaded.
We used Vue.js and Vite for a modular, easy-to-build application. We knew that we would have to populate content from the back-end to the UI, which Vue can do instantly. Vite has exceptional speed when building and prototyping applications, making development and deployment easier, as well as making testing faster.
We also leveraged Three.JS, one of the most popular libraries used for 3D graphics on the web. This library allowed for precise control over the globe design, and we loaded our own custom textures and animations to best suit our application design.
Accomplishments that we're proud of
- AI often returns proper response even with how limited our dataset is
- Obtaining Tweet text-label mappings using Birdwatch and Twitter API
- Fetching, formatting, cleaning up, and determining relevance of tweets
- Simple and elegant query and response between front-end and back-end
- Seamlessly implementing fluid graphics into the front-end
- Using Google Cloud Functions having no prior experience with serverless APIs
Challenges we ran into
- Properly training our AI with our own data set
- It was difficult to obtain a meaningful dataset from what we parsed from Twitter's Birdwatch fact-checking data
- Our dataset was very limited given how little time has passed since the initial invasion
- Difficult to filter out irrelevant tweets and false reports from Birdwatch's dataset
- Each time we implemented a new dataset into the AI, the fine-tuning process can be very time consuming (up to 18 minutes)
- Implementing 3D animations and graphics
- Making our application and animations responsive on different screen sizes
What we learned
- Training and fine-tuning a GPT-3 model to be able to classify claims as misleading or non-misleading
- Finding and labeling data points is challenging
- Our first time working with Google Cloud Functions to deploy the back-end, which allows front-end to make event-triggered function calls
- How to use Three.JS to create responsive and interactive 3D animations in the browser
- How to use Pandas to parse imbalanced .tsv/.csv data into .jsonl files that GPT-3 can use to optimize its model
- Configuring and setting up Python environments
What's next for ukr.ai
Our AI is in the early stages of the learning process. As more data points are released into Birdwatch's fact-checking dataset and more users submit feedback on our AI's performance, we will continuously retrain and improve our model which will substantially boost its accuracy.
- Python - Fast development with ML libraries
Libraries and APIs
- Twitter API
- Hugging Face's Transformers
- Google Cloud
- Pranav Grover
- Patrick Hu
- Tan Huynh
- Alex Rabinovich
- Stella Zhou
Log in or sign up for Devpost to join the conversation.