About the Project: Emakia – Observability for Digital Integrity

Inspiration

The digital space is awash with content — some enlightening, some harmful, and some deeply misleading. Emakia was born from a vision to empower individuals and communities with tools that critically evaluate the truthfulness, bias, and toxicity of online narratives. Inspired by the increasing polarization and manipulation in media ecosystems, I set out to build a modular and ethical AI framework that could identify harmful patterns across platforms like Reddit, MaxNews, and Twitter data stored in BigQuery.
The model's weights are learned through feedback-based tuning; this logistic formulation gives us the flexibility to tune sensitivity dynamically.
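As a minimal sketch of what such a logistic formulation could look like: the function below combines per-signal scores into a single probability. The feature names, weights, and bias here are illustrative assumptions, not the project's actual tuned values.

```python
import math

def logistic_score(features, weights, bias=0.0):
    """Combine signal scores (e.g. toxicity, bias, misinformation) into a
    0-1 probability. `weights` stands in for the feedback-tuned weights
    mentioned above; the values used here are purely illustrative."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical signals and weights; raising a weight sharpens sensitivity
# to that signal, which is what makes the formulation tunable.
score = logistic_score([0.8, 0.3, 0.6], [2.0, 1.0, 1.5])
```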
What it does
How I Built It
This project integrates multiple systems and principles:

- Streamlit Dashboard: A lightweight, interactive interface where users enter DQL queries and review analysis results.
- Dynatrace DQL API: Enables real-time observability by querying metrics and logs.
- Gemini-Powered Agents: Three language agents (ToxicityAnalyst, BiasAnalyst, and MisinformationAnalyst) built on Google's Gemini LLMs via parallel orchestration.
- BigQuery Integration: Tweets and public statements are ingested and scored using custom queries.
- MongoDB Storage: Analysis results are stored for auditing, longitudinal tracking, and potential retraining cycles.
- Modular Codebase: Split into input scrapers (Reddit, MaxNews, BigQuery), analysis agents, and storage utilities, ensuring scalability and flexibility.
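To illustrate the fan-out pattern behind the three analysts, here is a simplified sketch using stub functions and a thread pool. The real project routes these calls through Gemini via google.adk's parallel orchestration; the stub scores and agent functions below are assumptions for demonstration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Stub analysts standing in for the Gemini-backed agents; in the actual
# system each would issue an LLM call with its own specialized prompt.
def toxicity_analyst(text):
    return {"agent": "ToxicityAnalyst", "score": 0.1}

def bias_analyst(text):
    return {"agent": "BiasAnalyst", "score": 0.4}

def misinformation_analyst(text):
    return {"agent": "MisinformationAnalyst", "score": 0.2}

def analyze(text):
    """Fan the same input out to all three agents concurrently and
    collect their independent verdicts."""
    agents = [toxicity_analyst, bias_analyst, misinformation_analyst]
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(agent, text) for agent in agents]
        return [f.result() for f in futures]

results = analyze("sample post")
```

Running the agents concurrently keeps end-to-end latency close to the slowest single agent rather than the sum of all three.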
What I Learned
- How to orchestrate multiple LLM agents in parallel while preserving context and keeping evaluations consistent.
- The nuances of building privacy-aware observability systems with Dynatrace and integrating real-time pipelines.
- Managing credentials securely through secrets.toml and maintaining reproducibility without exposing sensitive tokens.
- How bias can manifest subtly in phrasing and narrative structure, requiring careful prompt tuning and interpretability work.
Challenges I Faced
- Token Scope & API Compatibility: Dynatrace's DQL queries required specific token scopes; simple connectivity worked but broke under full integration until the scopes were correctly matched.
- Agent Overlap & Contradictions: Bias and misinformation often intertwine, and agent outputs sometimes contradicted each other, requiring additional logic to reconcile and interpret results.
- BigQuery Credentials: Parsing multiline private keys from secrets required edge-case handling to prevent decode failures.
- MongoDB Insertion Errors: Handling nested structures and missing content required robust sanitization before writes.
- Scaling: Real-time sentiment and bias detection demands performance optimizations and thoughtful caching when scaling beyond test volumes.
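The multiline private-key issue above is a common pitfall: keys copied through environment variables or TOML often arrive with literal backslash-n sequences instead of real newlines, and the credentials parser then fails to decode the PEM block. A hedged sketch of one fix (the key material shown is fabricated filler, not a real key):

```python
def normalize_private_key(raw: str) -> str:
    """Convert literal '\\n' sequences into real newlines so the PEM
    block parses. This is one plausible handling, not necessarily the
    project's exact implementation."""
    return raw.replace("\\n", "\n")

# Illustrative escaped key as it might appear after a copy-paste or
# env-var round trip (contents are placeholder text, not a usable key).
raw_key = "-----BEGIN PRIVATE KEY-----\\nMIIEvPLACEHOLDER\\n-----END PRIVATE KEY-----\\n"
key = normalize_private_key(raw_key)
```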
What's Next for Toxicity and Misinformation Detection
Built With
- bigquery-loader (get-tweets-from-bigquery)
- custom-scraper-modules (reddit-scraper, maxnews-scraper)
- db-dtypes
- db-utils
- dynatrace
- dynatrace-dql-client
- google-cloud-bigquery
- google-cloud-platform (bigquery, vertex-ai)
- google-genai (gemini)
- google.adk (parallelagent, runner, sessionservice)
- google.oauth2
- maxnews-rss-or-rest
- minimax
- mongodb
- python
- python-dotenv
- sql
- store-result
- streamlit