Inspiration
The 2024 election cycle produced a tsunami of political misinformation, coordinated harassment, and manipulated media across Reddit, Twitter/X, and beyond. Existing moderation tools are text-only, reactive, and blind to the images and videos where the most dangerous content hides. We wanted to build something that could analyze content the way it actually spreads — across text, images, and video simultaneously — rather than treating each modality in isolation.
What it does
AIScan is a multimodal content moderation agent that analyzes political social media posts across three dimensions simultaneously:
- Toxicity — hate speech, threats, and personal attacks
- Political Bias — left / center / right lean detection
- Misinformation — factual inaccuracies and coordinated narratives
The core is a Google ADK `ParallelAgent` that spawns three specialized `LlmAgent` sub-agents — one per dimension — and aggregates their verdicts into a consensus ensemble result. Because the three calls run concurrently rather than back-to-back, total latency drops to roughly a third, and the ensemble of focused prompts is more robust than a single monolithic one.
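The writeup doesn't spell out the consensus rule, so here is a minimal stdlib sketch of one plausible aggregation policy. The `Verdict` type and the "any agent flags it" rule are assumptions for illustration, not AIScan's actual code:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    dimension: str   # "toxicity", "bias", or "misinfo"
    label: str       # e.g. "toxic" / "clean", or "left" / "center" / "right"
    confidence: float

def aggregate(verdicts: list[Verdict]) -> dict:
    """Combine the three sub-agent verdicts into one ensemble result.

    Policy (an assumption): a post is flagged if any dimension-specific
    agent flags it; ensemble confidence is the mean per-dimension score.
    """
    flagged = any(v.label not in ("clean", "center") for v in verdicts)
    return {
        "flagged": flagged,
        "confidence": sum(v.confidence for v in verdicts) / len(verdicts),
        "by_dimension": {v.dimension: (v.label, v.confidence) for v in verdicts},
    }

result = aggregate([
    Verdict("toxicity", "toxic", 0.91),
    Verdict("bias", "right", 0.64),
    Verdict("misinfo", "clean", 0.80),
])
print(result["flagged"])  # True: the toxicity agent alone is enough to flag
```

An "any-flag" policy favors recall; a production system might instead weight dimensions or require agreement between two agents.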
For media, Gemini Vision analyzes images and videos embedded in tweets (resolving t.co links via yt-dlp) to detect visual misinformation, manipulated media, and violent imagery. The agent handles all four modalities: text, images, videos, and live social feeds.
The Streamlit frontend offers five analysis modes: Reddit live feed, paste text, upload image, upload video, and batch BigQuery tweets with accuracy comparison against original labels.
How we built it
- Data Pipeline — Loaded the `tweets_hate_speech_detection` dataset into BigQuery as the `politics2024` project (78k+ rows).
- Fine-Tuning — Ran a supervised Vertex AI tuning job on `gemini-2.5-flash` using the labeled tweet data, specializing it in political harassment classification.
- ADK Agent Layer — Built a `ParallelAgent` orchestrating three `LlmAgent` workers. Each runs a focused prompt chain (toxicity / bias / misinfo) against the fine-tuned model endpoint via the Google GenAI SDK.
- Vision Layer — `gemini_vision_no_key.py` uses Application Default Credentials on Cloud Run; `gemini_vision_with_key.py` uses `st.secrets` locally. Both call Gemini's multimodal API with base64-encoded images or uploaded video files.
- Media Resolution — A multi-step resolver follows t.co redirects → tries direct image/video download → falls back to yt-dlp → falls back to OG image scraping. This handles the majority of real-world tweet media.
- Frontend — Streamlit app with the five analysis modes described above.
- Deployment — Containerized and deployed to Cloud Run via `gcloud run deploy`. ADC handles BigQuery auth automatically; Secret Manager injects the API key at runtime.
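The deploy step would look roughly like the following. The service name, region, and secret name here are placeholders, not the project's actual values:

```shell
# Hypothetical names; substitute your own service, region, and Secret Manager entry.
gcloud run deploy aiscan \
  --source . \
  --region us-central1 \
  --min-instances 1 \
  --set-secrets "GEMINI_API_KEY=gemini-api-key:latest"
```

`--set-secrets` maps a Secret Manager secret into the container as an environment variable at runtime, which is how the key stays out of the image.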
Challenges we ran into
- t.co media resolution — required a 4-step fallback chain (redirect → direct file → yt-dlp → OG image) to handle the variety of media hosting strategies Twitter uses.
- ADK session state across Streamlit reruns — required careful `InMemorySessionService` and `uuid`-based session management to avoid state bleed between analyses.
- Vertex AI fine-tune latency — the tuning job took significant time; we optimized by freezing early layers and only fine-tuning on classification head layers.
- Cloud Run cold starts — mitigated by setting `--min-instances 1` in production and keeping the container image lean.
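The 4-step fallback chain can be sketched generically: try each strategy in order and take the first hit, with a failed or crashing step simply falling through to the next. The step functions below are placeholders (the real ones would hit the network):

```python
from typing import Callable, Optional

# Each resolver step takes a URL and returns a local media path, or None on failure.
Step = Callable[[str], Optional[str]]

def resolve_media(url: str, steps: list[Step]) -> Optional[str]:
    """Try each resolution strategy in order; return the first success.

    Mirrors the chain: follow t.co redirect -> direct image/video
    download -> yt-dlp -> Open Graph image scrape.
    """
    for step in steps:
        try:
            result = step(url)
        except Exception:
            result = None  # a crashed step just falls through to the next one
        if result is not None:
            return result
    return None

# Placeholder steps for illustration only.
def direct_download(url): return None          # pretend the direct fetch fails
def via_ytdlp(url): return "/tmp/clip.mp4"     # pretend yt-dlp succeeds

print(resolve_media("https://t.co/abc", [direct_download, via_ytdlp]))  # /tmp/clip.mp4
```

Swallowing exceptions per step keeps one flaky strategy (e.g. a yt-dlp extractor error) from killing the whole resolution attempt.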
Accomplishments that we're proud of
- Built a fully functional parallel multi-agent system using Google ADK that analyzes content across three independent dimensions simultaneously, achieving ~3× latency reduction over sequential approaches.
- Created an end-to-end multimodal pipeline — from raw tweet URLs with t.co links all the way through image/video analysis via Gemini Vision — that catches harmful visual content text-only tools completely miss.
- Fine-tuned `gemini-2.5-flash` on 78k+ labeled tweets via Vertex AI, significantly improving domain-specific classification accuracy over general-purpose prompting.
- Deployed a production-ready live endpoint on Cloud Run with zero hardcoded secrets, proper ADC integration, and auto-scaling — not just a local demo.
- Built real-time accuracy benchmarking against BigQuery ground-truth labels, so the agent can measure its own performance at scale.
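The benchmarking step reduces to comparing predicted labels against the ground-truth column. A minimal sketch of the metric computation (the binary 0/1 label encoding is an assumption):

```python
def benchmark(predictions: list[int], labels: list[int]) -> dict:
    """Score predictions against ground-truth labels (1 = harmful, 0 = clean)."""
    if len(predictions) != len(labels):
        raise ValueError("prediction/label length mismatch")
    pairs = list(zip(predictions, labels))
    correct = sum(p == y for p, y in pairs)
    tp = sum(p == 1 and y == 1 for p, y in pairs)   # true positives
    fp = sum(p == 1 and y == 0 for p, y in pairs)   # false positives
    fn = sum(p == 0 and y == 1 for p, y in pairs)   # false negatives
    return {
        "accuracy": correct / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

print(benchmark([1, 0, 1, 1], [1, 0, 0, 1]))
```

Reporting precision and recall alongside accuracy matters here, since hate-speech datasets are typically imbalanced and raw accuracy alone can look deceptively good.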
What we learned
- ADK `ParallelAgent` dramatically reduces latency vs. sequential LLM calls — ideal for multi-dimensional classification tasks where each dimension is independent.
- Gemini Vision + yt-dlp is a powerful combo for real-world tweet media — t.co links often hide the most harmful visual content that text analysis completely misses.
- Application Default Credentials make Cloud Run ↔ BigQuery integration seamless with zero key management overhead.
- Fine-tuning on domain-specific labeled data significantly improves classification accuracy on political content compared to a general-purpose prompt.
- Using BigQuery labeled data as ground truth enables real-time accuracy benchmarking — the agent can compare its predictions against existing human labels at scale.
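The latency win from fanning out independent calls can be illustrated with plain `asyncio` as a stand-in for ADK's `ParallelAgent` (the sleeps simulate network-bound model calls):

```python
import asyncio
import time

async def classify(dimension: str, delay: float) -> str:
    # Stand-in for one LlmAgent call; real latency is network-bound.
    await asyncio.sleep(delay)
    return f"{dimension}: done"

async def main() -> list[str]:
    # Fan out all three dimensions at once, so total wall time tracks
    # the slowest single call rather than the sum of all three.
    return await asyncio.gather(
        classify("toxicity", 0.1),
        classify("bias", 0.1),
        classify("misinfo", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.1s total, not 0.3s
```

With three equally slow calls this is the source of the ~3× figure: concurrent wall time is max(delays) instead of sum(delays).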
What's next for AIScan – Multimodal Toxicity & Misinformation Agent
- Real-time streaming — Integrate with Twitter/X and Reddit streaming APIs for continuous, always-on monitoring rather than on-demand analysis.
- Multilingual support — Expand beyond English to detect toxicity and misinformation in Spanish, Hindi, Arabic, and other high-volume political discourse languages.
- Explainability layer — Add citation-backed reasoning so moderators can see why content was flagged, not just the verdict, building trust in AI-assisted moderation.
- Human-in-the-loop dashboard — Build a moderation queue where flagged content surfaces for human review, with the agent's confidence scores prioritizing the most ambiguous cases.
- Broader fine-tuning — Expand the training dataset beyond the 2024 election cycle to cover ongoing political events, improving generalization across news cycles.
- Audio modality — Extend Gemini's multimodal capabilities to analyze podcasts, voice tweets, and video narration for spoken misinformation and incitement.