Inspiration

We recently learned about the concept and harmful impacts of disinformation in Civics class. It stuck with us how much disinformation is on the internet, how widespread it truly is, and how much it has affected elections and healthcare during the COVID crisis. On top of that, we've noticed more and more disinformation on social media lately. We often found ourselves too lazy to fact-check, and sometimes fell for disinformation spread across the internet.

What it does

Returns a pie chart with the percentage of sources that support or contradict a claim, and reports whichever verdict has the most sources behind it. For example: misinformation 20%, truth 45%, disinformation 35%.
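The percentage breakdown and majority verdict behind that chart can be sketched in a few lines of Python (an illustrative sketch; `summarize_sources` and the label names are ours, not the actual implementation):

```python
from collections import Counter

def summarize_sources(labels):
    """Aggregate per-source labels into percentages and a majority verdict.

    `labels` is one string per source checked, e.g. "truth",
    "misinformation", or "disinformation".
    """
    counts = Counter(labels)
    total = sum(counts.values())
    percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}
    verdict = counts.most_common(1)[0][0]  # label backed by the most sources
    return percentages, verdict

# Example: 9 truth, 7 disinformation, 4 misinformation out of 20 sources
labels = ["truth"] * 9 + ["disinformation"] * 7 + ["misinformation"] * 4
percentages, verdict = summarize_sources(labels)
# percentages -> {'truth': 45.0, 'disinformation': 35.0, 'misinformation': 20.0}
# verdict -> 'truth'
```

The resulting percentages can then be handed straight to whatever pie-chart library the frontend uses.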

How we built it

We used JavaScript and HTML for the website, and Python for the AI.

  1. Data Collection
    • Use news credibility datasets (e.g., FakeNewsNet, the LIAR dataset, or PolitiFact)
    • Scrape news headlines and articles from both trusted and questionable sources
    • Include metadata like source reputation, language style, and fact-checker labels

  2. Data Preprocessing
    • Text cleaning: remove stopwords, punctuation, and leftover HTML tags, which makes the text much easier to vectorize later
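The text-cleaning step might look something like this (a minimal sketch; the tiny STOPWORDS set stands in for a full list like NLTK's):

```python
import re
import string

# Illustrative stopword set; in practice we'd use a full list such as NLTK's.
STOPWORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in"}

def clean_text(raw_html):
    """Strip HTML tags, lowercase, then drop punctuation and stopwords."""
    text = re.sub(r"<[^>]+>", " ", raw_html)  # remove HTML tags
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

clean_text("<p>The vaccine is <b>dangerous</b>, experts say!</p>")
# -> 'vaccine dangerous experts say'
```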

  3. Feature Engineering
    • TF-IDF or Word2Vec for text representation (TF-IDF may work better here)
    • Metadata features: source reliability score, article length, presence of clickbait words
    • Top-level domain of the source (.gov, .edu, etc.)
    • Sentiment analysis: fake news often leans on strong, emotionally charged language
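The feature-engineering ideas above can be sketched with scikit-learn's TfidfVectorizer for the text side (the clickbait word list and the `metadata_features` helper are illustrative assumptions, not our final feature set):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

CLICKBAIT_WORDS = {"shocking", "unbelievable", "secret", "exposed"}  # illustrative
TRUSTED_TLDS = {".gov", ".edu"}

def metadata_features(url, article_text):
    """Hand-crafted features alongside TF-IDF: length, clickbait hits, TLD."""
    words = article_text.lower().split()
    return {
        "length": len(words),
        "clickbait_hits": sum(w.strip(".,!?") in CLICKBAIT_WORDS for w in words),
        "trusted_tld": int(any(url.endswith(tld) or tld + "/" in url
                               for tld in TRUSTED_TLDS)),
    }

corpus = ["Shocking secret the government hides",
          "CDC publishes new vaccination data"]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)  # sparse doc-term matrix
feats = metadata_features("https://www.cdc.gov/article", corpus[1])
```

The TF-IDF matrix and the metadata dictionary can then be concatenated into one feature vector per article.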

  4. Model Selection & Training
    • Use LSTM (Long Short-Term Memory) networks or BERT-based transformers for text analysis (we went with BERT as our base transformer)
    • Train the model on labeled data with a trustworthiness score as the output (we may have to label some data manually, depending on dataset quality)
    • Fine-tune with transfer learning using pre-trained NLP models like RoBERTa or SBERT
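Before fine-tuning BERT, a quick baseline helps sanity-check the labels and features. Here is a sketch of one (assuming scikit-learn; the toy articles and labels are made up), where the trustworthiness score is the predicted probability of the trustworthy class:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data: 1 = trustworthy, 0 = not. Real training would use a
# dataset like LIAR or FakeNewsNet.
texts = [
    "Peer-reviewed study finds vaccine reduces hospitalizations",
    "Official statistics released by the health ministry",
    "SHOCKING: doctors HATE this one secret cure",
    "Miracle drink destroys virus overnight, experts silenced",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Trustworthiness score = predicted probability of the trustworthy class
score = model.predict_proba(["Ministry publishes hospitalization statistics"])[0][1]
```

Once the baseline behaves sensibly, the same labeled data can be reused to fine-tune the BERT model.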

Challenges we ran into

Time. We all had classes on Saturday and other extracurriculars to attend, so finding the time to complete such a large project was difficult.

Training the AI and connecting it to the website.

  • We ran into many technical difficulties, both while training the AI and while hooking it up to the website.

Accomplishments that we're proud of

  • Successfully creating and training an AI from scratch
  • Building a website for the AI

What we learned

How to create and train an AI, as well as teamwork and collaboration on a large group coding project. We also learned how much fun building projects like this with a team can be.

What's next for VerifAI

  • Add a source verification feature
    • The user inputs a source and the AI verifies whether it is trustworthy
    • Cross-references data with multiple articles from the source
  • Trustworthiness score: the AI will provide a trustworthiness score for each source
  • Dynamic data sources, so the system can continuously adapt to new information
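One way the planned trustworthiness score could combine cross-referenced articles with source reputation (a purely hypothetical sketch; the 70/30 weighting and `source_trust_score` are made up for illustration):

```python
def source_trust_score(article_verdicts, source_reputation=0.5):
    """Hypothetical scoring: blend fact-check agreement across a source's
    articles (1 = verified true, 0 = false) with a prior reputation score,
    both on a 0-1 scale."""
    if not article_verdicts:
        return source_reputation  # no articles yet: fall back on reputation
    agreement = sum(article_verdicts) / len(article_verdicts)
    return round(0.7 * agreement + 0.3 * source_reputation, 2)

score = source_trust_score([1, 1, 0, 1], source_reputation=0.8)  # mostly-verified source
```

With dynamic data sources, the verdict list would grow over time, letting the score adapt as new articles are checked.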

Built With
