Inspiration
The 2024 election cycle produced a tsunami of political misinformation, coordinated harassment, and manipulated media across Reddit, Twitter/X, and beyond. Existing moderation tools are text-only, reactive, and blind to the images and videos where the most dangerous content hides. We wanted to build something that could analyze content the way it actually spreads — across text, images, and video simultaneously — rather than treating each modality in isolation.
What it does
AIScan is a multimodal content moderation agent that analyzes political social media posts across three dimensions simultaneously:
- Toxicity — hate speech, threats, and personal attacks
- Political Bias — left / center / right lean detection
- Misinformation — factual inaccuracies and coordinated narratives
The core is a Google ADK `ParallelAgent` that spawns three specialized `LlmAgent` sub-agents — one per dimension — and aggregates their verdicts into a consensus ensemble result. Because the three calls run concurrently rather than back-to-back, total latency drops to roughly a third, and the ensemble of focused prompts is more robust than a single monolithic one.
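The writeup doesn't spell out the consensus rule, so here is a minimal stdlib sketch of one plausible aggregation policy. The `Verdict` type and the "any agent flags it" rule are assumptions for illustration, not AIScan's actual code:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    dimension: str   # "toxicity", "bias", or "misinfo"
    label: str       # e.g. "toxic" / "clean", or "left" / "center" / "right"
    confidence: float

def aggregate(verdicts: list[Verdict]) -> dict:
    """Combine the three sub-agent verdicts into one ensemble result.

    Policy (an assumption): a post is flagged if any dimension-specific
    agent flags it; ensemble confidence is the mean per-dimension score.
    """
    flagged = any(v.label not in ("clean", "center") for v in verdicts)
    return {
        "flagged": flagged,
        "confidence": sum(v.confidence for v in verdicts) / len(verdicts),
        "by_dimension": {v.dimension: (v.label, v.confidence) for v in verdicts},
    }

result = aggregate([
    Verdict("toxicity", "toxic", 0.91),
    Verdict("bias", "right", 0.64),
    Verdict("misinfo", "clean", 0.80),
])
print(result["flagged"])  # True: the toxicity agent alone is enough to flag
```

An "any-flag" policy favors recall; a production system might instead weight dimensions or require agreement between two agents.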
For media, Gemini Vision analyzes images and videos embedded in tweets (resolving t.co links via yt-dlp) to detect visual misinformation, manipulated media, and violent imagery. The agent handles all four modalities: text, images, videos, and live social feeds.
The Streamlit frontend offers five analysis modes: Reddit live feed, paste text, upload image, upload video, and batch BigQuery tweets with accuracy comparison against original labels.
How we built it
- Data Pipeline — Loaded the `tweets_hate_speech_detection` dataset into BigQuery as the `politics2024` project (78k+ rows).
- Fine-Tuning — Ran a supervised Vertex AI tuning job on `gemini-2.5-flash` using the labeled tweet data, specializing it in political harassment classification.
- ADK Agent Layer — Built a `ParallelAgent` orchestrating three `LlmAgent` workers. Each runs a focused prompt chain (toxicity / bias / misinfo) against the fine-tuned model endpoint via the Google GenAI SDK.
- Vision Layer — `gemini_vision_no_key.py` uses Application Default Credentials on Cloud Run; `gemini_vision_with_key.py` uses `st.secrets` locally. Both call Gemini's multimodal API with base64-encoded images or uploaded video files.
- Media Resolution — A multi-step resolver follows t.co redirects → tries direct image/video download → falls back to yt-dlp → falls back to OG image scraping. This handles the majority of real-world tweet media.
- Frontend — Streamlit app with the five analysis modes described above.
- Deployment — Containerized and deployed to Cloud Run via `gcloud run deploy`. ADC handles BigQuery auth automatically; Secret Manager injects the API key at runtime.
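The deploy step would look roughly like the following. The service name, region, and secret name here are placeholders, not the project's actual values:

```shell
# Hypothetical names; substitute your own service, region, and Secret Manager entry.
gcloud run deploy aiscan \
  --source . \
  --region us-central1 \
  --min-instances 1 \
  --set-secrets "GEMINI_API_KEY=gemini-api-key:latest"
```

`--set-secrets` maps a Secret Manager secret into the container as an environment variable at runtime, which is how the key stays out of the image.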
Challenges we ran into
- t.co media resolution — required a 4-step fallback chain (redirect → direct file → yt-dlp → OG image) to handle the variety of media hosting strategies Twitter uses.
- ADK session state across Streamlit reruns — required careful `InMemorySessionService` and `uuid`-based session management to avoid state bleed between analyses.
- Vertex AI fine-tune latency — the tuning job took significant time; we optimized by freezing early layers and only fine-tuning on classification head layers.
- Cloud Run cold starts — mitigated by setting `--min-instances 1` in production and keeping the container image lean.
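The 4-step fallback chain can be sketched generically: try each strategy in order and take the first hit, with a failed or crashing step simply falling through to the next. The step functions below are placeholders (the real ones would hit the network):

```python
from typing import Callable, Optional

# Each resolver step takes a URL and returns a local media path, or None on failure.
Step = Callable[[str], Optional[str]]

def resolve_media(url: str, steps: list[Step]) -> Optional[str]:
    """Try each resolution strategy in order; return the first success.

    Mirrors the chain: follow t.co redirect -> direct image/video
    download -> yt-dlp -> Open Graph image scrape.
    """
    for step in steps:
        try:
            result = step(url)
        except Exception:
            result = None  # a crashed step just falls through to the next one
        if result is not None:
            return result
    return None

# Placeholder steps for illustration only.
def direct_download(url): return None          # pretend the direct fetch fails
def via_ytdlp(url): return "/tmp/clip.mp4"     # pretend yt-dlp succeeds

print(resolve_media("https://t.co/abc", [direct_download, via_ytdlp]))  # /tmp/clip.mp4
```

Swallowing exceptions per step keeps one flaky strategy (e.g. a yt-dlp extractor error) from killing the whole resolution attempt.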
Accomplishments that we're proud of
- Built a fully functional parallel multi-agent system using Google ADK that analyzes content across three independent dimensions simultaneously, achieving ~3× latency reduction over sequential approaches.
- Created an end-to-end multimodal pipeline — from raw tweet URLs with t.co links all the way through image/video analysis via Gemini Vision — that catches harmful visual content text-only tools completely miss.
- Fine-tuned `gemini-2.5-flash` on 78k+ labeled tweets via Vertex AI, significantly improving domain-specific classification accuracy over general-purpose prompting.
- Deployed a production-ready live endpoint on Cloud Run with zero hardcoded secrets, proper ADC integration, and auto-scaling — not just a local demo.
- Built real-time accuracy benchmarking against BigQuery ground-truth labels, so the agent can measure its own performance at scale.
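The benchmarking step reduces to comparing predicted labels against the ground-truth column. A minimal sketch of the metric computation (the binary 0/1 label encoding is an assumption):

```python
def benchmark(predictions: list[int], labels: list[int]) -> dict:
    """Score predictions against ground-truth labels (1 = harmful, 0 = clean)."""
    if len(predictions) != len(labels):
        raise ValueError("prediction/label length mismatch")
    pairs = list(zip(predictions, labels))
    correct = sum(p == y for p, y in pairs)
    tp = sum(p == 1 and y == 1 for p, y in pairs)   # true positives
    fp = sum(p == 1 and y == 0 for p, y in pairs)   # false positives
    fn = sum(p == 0 and y == 1 for p, y in pairs)   # false negatives
    return {
        "accuracy": correct / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

print(benchmark([1, 0, 1, 1], [1, 0, 0, 1]))
```

Reporting precision and recall alongside accuracy matters here, since hate-speech datasets are typically imbalanced and raw accuracy alone can look deceptively good.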
What we learned
- ADK `ParallelAgent` dramatically reduces latency vs. sequential LLM calls — ideal for multi-dimensional classification tasks where each dimension is independent.
- Gemini Vision + yt-dlp is a powerful combo for real-world tweet media — t.co links often hide the most harmful visual content that text analysis completely misses.
- Application Default Credentials make Cloud Run ↔ BigQuery integration seamless with zero key management overhead.
- Fine-tuning on domain-specific labeled data significantly improves classification accuracy on political content compared to a general-purpose prompt.
- Using BigQuery labeled data as ground truth enables real-time accuracy benchmarking — the agent can compare its predictions against existing human labels at scale.
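The latency win from fanning out independent calls can be illustrated with plain `asyncio` as a stand-in for ADK's `ParallelAgent` (the sleeps simulate network-bound model calls):

```python
import asyncio
import time

async def classify(dimension: str, delay: float) -> str:
    # Stand-in for one LlmAgent call; real latency is network-bound.
    await asyncio.sleep(delay)
    return f"{dimension}: done"

async def main() -> list[str]:
    # Fan out all three dimensions at once, so total wall time tracks
    # the slowest single call rather than the sum of all three.
    return await asyncio.gather(
        classify("toxicity", 0.1),
        classify("bias", 0.1),
        classify("misinfo", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.1s total, not 0.3s
```

With three equally slow calls this is the source of the ~3× figure: concurrent wall time is max(delays) instead of sum(delays).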
What's next for AIScan – Multimodal Toxicity & Misinformation Agent
- Real-time streaming — Integrate with Twitter/X and Reddit streaming APIs for continuous, always-on monitoring rather than on-demand analysis.
- Multilingual support — Expand beyond English to detect toxicity and misinformation in Spanish, Hindi, Arabic, and other high-volume political discourse languages.
- Explainability layer — Add citation-backed reasoning so moderators can see why content was flagged, not just the verdict, building trust in AI-assisted moderation.
- Human-in-the-loop dashboard — Build a moderation queue where flagged content surfaces for human review, with the agent's confidence scores prioritizing the most ambiguous cases.
- Broader fine-tuning — Expand the training dataset beyond the 2024 election cycle to cover ongoing political events, improving generalization across news cycles.
- Audio modality — Extend Gemini's multimodal capabilities to analyze podcasts, voice tweets, and video narration for spoken misinformation and incitement.