Inspiration

The 2024 election cycle produced a tsunami of political misinformation, coordinated harassment, and manipulated media across Reddit, Twitter/X, and beyond. Existing moderation tools are text-only, reactive, and blind to the images and videos where the most dangerous content hides. We wanted to build something that could analyze content the way it actually spreads — across text, images, and video simultaneously — rather than treating each modality in isolation.

What it does

AIScan is a multimodal content moderation agent that analyzes political social media posts across three dimensions simultaneously:

  • Toxicity — hate speech, threats, and personal attacks
  • Political Bias — left / center / right lean detection
  • Misinformation — factual inaccuracies and coordinated narratives

The core is a Google ADK ParallelAgent that spawns three specialized LlmAgent sub-agents simultaneously — one per dimension — and aggregates their verdicts into a consensus ensemble result. Running in parallel rather than sequentially cuts latency by ~3× and makes the system more robust than a single monolithic prompt.
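The fan-out-and-merge pattern can be sketched without the ADK itself — here a thread pool stands in for ParallelAgent, and the three analyzer stubs and their (label, score) verdict shape are hypothetical stand-ins for the LlmAgent workers:

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_toxicity(text):
    # Stand-in for the toxicity LlmAgent: flags personal attacks.
    return ("toxic" if "attack" in text.lower() else "clean", 0.9)

def analyze_bias(text):
    # Stand-in for the political-bias LlmAgent.
    return ("center", 0.6)

def analyze_misinfo(text):
    # Stand-in for the misinformation LlmAgent.
    return ("no-misinfo", 0.7)

ANALYZERS = {
    "toxicity": analyze_toxicity,
    "bias": analyze_bias,
    "misinfo": analyze_misinfo,
}

def moderate(text):
    """Fan the post out to all three analyzers at once, then merge verdicts."""
    with ThreadPoolExecutor(max_workers=len(ANALYZERS)) as pool:
        futures = {dim: pool.submit(fn, text) for dim, fn in ANALYZERS.items()}
        return {dim: fut.result() for dim, fut in futures.items()}

verdicts = moderate("This tweet is a personal attack on the candidate")
```

Because the three dimensions are independent, the wall-clock cost is roughly the slowest single call rather than the sum of all three — the same reason the ParallelAgent layout beats a sequential chain.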

For media, Gemini Vision analyzes images and videos embedded in tweets (resolving t.co links via yt-dlp) to detect visual misinformation, manipulated media, and violent imagery. The agent handles four input types: text, images, videos, and live social feeds.


The Streamlit frontend offers five analysis modes: Reddit live feed, paste text, upload image, upload video, and batch BigQuery tweets with accuracy comparison against original labels.

How we built it

  1. Data Pipeline — Loaded the tweets_hate_speech_detection dataset (78k+ rows) into BigQuery under the politics2024 project.

  2. Fine-Tuning — Ran a supervised Vertex AI tuning job on gemini-2.5-flash using the labeled tweet data, specializing it in political harassment classification.

  3. ADK Agent Layer — Built a ParallelAgent orchestrating three LlmAgent workers. Each runs a focused prompt chain (toxicity / bias / misinfo) against the fine-tuned model endpoint via the Google GenAI SDK.

  4. Vision Layer — gemini_vision_no_key.py uses Application Default Credentials on Cloud Run; gemini_vision_with_key.py uses st.secrets locally. Both call Gemini's multimodal API with base64-encoded images or uploaded video files.

  5. Media Resolution — A multi-step resolver follows t.co redirects → tries direct image/video download → falls back to yt-dlp → falls back to OG image scraping. This handles the majority of real-world tweet media.

  6. Frontend — Streamlit app with the five analysis modes described above.

  7. Deployment — Containerized and deployed to Cloud Run via gcloud run deploy. ADC handles BigQuery auth automatically; Secret Manager injects the API key at runtime.
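Step 2's labeled tweets have to be converted into the chat-style JSONL that Vertex AI supervised tuning for Gemini expects — each record pairs a user turn with the desired model turn. A minimal sketch; the example rows, prompt wording, and output path are hypothetical:

```python
import json

rows = [
    {"tweet": "example hateful tweet", "label": "harassment"},
    {"tweet": "example benign tweet", "label": "clean"},
]

def to_tuning_example(row):
    # One JSONL record: the tweet as the user turn, the label as the model turn.
    return {
        "contents": [
            {"role": "user",
             "parts": [{"text": f"Classify this tweet: {row['tweet']}"}]},
            {"role": "model", "parts": [{"text": row["label"]}]},
        ]
    }

with open("tuning_data.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(to_tuning_example(row)) + "\n")
```

The resulting file is uploaded to Cloud Storage and referenced when launching the tuning job.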
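The credential split in step 4 reduces to a small selection rule. A sketch assuming Cloud Run is detected via its built-in K_SERVICE environment variable (which Cloud Run does set automatically) and the local key lives in a hypothetical GEMINI_API_KEY variable:

```python
def pick_auth_mode(env):
    """Choose ADC on Cloud Run, an explicit API key for local development."""
    if env.get("K_SERVICE"):        # Cloud Run sets K_SERVICE automatically
        return "adc"
    if env.get("GEMINI_API_KEY"):   # local dev falls back to an explicit key
        return "api_key"
    raise RuntimeError("no Gemini credentials available")

# e.g. mode = pick_auth_mode(os.environ)
```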
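The step 5 fallback chain is just "try each resolver in order, return the first hit". A framework-free sketch — the four step functions below are hypothetical stubs for the real redirect follower, direct downloader, yt-dlp call, and OG-image scraper:

```python
def resolve_media(url, steps):
    """Try each resolver in order; return the first non-None result."""
    for step in steps:
        result = step(url)
        if result is not None:
            return result
    return None

# Hypothetical stand-ins for the real four steps:
def follow_redirect(url):
    return None   # e.g. requests.head(url, allow_redirects=True)

def direct_download(url):
    return None   # fetch the file directly when the content type is media

def via_ytdlp(url):
    return {"type": "video", "src": url}   # yt-dlp extraction succeeds here

def og_image(url):
    return None   # scrape <meta property="og:image"> with BeautifulSoup

media = resolve_media("https://t.co/example",
                      [follow_redirect, direct_download, via_ytdlp, og_image])
```

Keeping each step as a plain callable makes it easy to append new fallbacks as Twitter's media hosting changes.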

Challenges we ran into

  • t.co media resolution required a 4-step fallback chain (redirect → direct file → yt-dlp → OG image) to handle the variety of media hosting strategies Twitter uses.

  • ADK session state across Streamlit reruns required careful InMemorySessionService and uuid-based session management to avoid state bleed between analyses.

  • Vertex AI fine-tune latency — the tuning job took significant time; we optimized by freezing early layers and fine-tuning only the classification head.

  • Cloud Run cold starts — mitigated by setting --min-instances 1 in production and keeping the container image lean.
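The session-isolation fix from the second challenge above can be sketched like this — a plain dict stands in for st.session_state, and the key name is hypothetical. The id survives Streamlit reruns but is rotated before each new analysis:

```python
import uuid

session_state = {}   # stand-in for st.session_state

def get_session_id(state):
    """Reuse one ADK session id across Streamlit reruns."""
    if "adk_session_id" not in state:
        state["adk_session_id"] = str(uuid.uuid4())
    return state["adk_session_id"]

def start_new_analysis(state):
    """Rotate the id before a fresh analysis so state never bleeds."""
    state["adk_session_id"] = str(uuid.uuid4())
    return state["adk_session_id"]
```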

Accomplishments that we're proud of

  • Built a fully functional parallel multi-agent system using Google ADK that analyzes content across three independent dimensions simultaneously, achieving ~3× latency reduction over sequential approaches.

  • Created an end-to-end multimodal pipeline — from raw tweet URLs with t.co links all the way through image/video analysis via Gemini Vision — that catches harmful visual content text-only tools completely miss.

  • Fine-tuned gemini-2.5-flash on 78k+ labeled tweets via Vertex AI, significantly improving domain-specific classification accuracy over general-purpose prompting.

  • Deployed a production-ready live endpoint on Cloud Run with zero hardcoded secrets, proper ADC integration, and auto-scaling — not just a local demo.

  • Built real-time accuracy benchmarking against BigQuery ground-truth labels, so the agent can measure its own performance at scale.
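That benchmarking step reduces to comparing predictions against BigQuery's ground-truth labels row by row. A minimal sketch with hypothetical column names — in the app the rows come back from a BigQuery query:

```python
def accuracy(rows):
    """rows: dicts with 'label' (BigQuery ground truth) and 'pred' (agent)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(r["label"] == r["pred"] for r in rows) / len(rows)

sample = [
    {"label": "toxic", "pred": "toxic"},
    {"label": "clean", "pred": "toxic"},
    {"label": "clean", "pred": "clean"},
    {"label": "toxic", "pred": "toxic"},
]
print(accuracy(sample))  # 0.75
```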

What we learned

  • ADK ParallelAgent dramatically reduces latency vs. sequential LLM calls — ideal for multi-dimensional classification tasks where each dimension is independent.

  • Gemini Vision + yt-dlp is a powerful combo for real-world tweet media — t.co links often hide the most harmful visual content that text analysis completely misses.

  • Application Default Credentials make Cloud Run ↔ BigQuery integration seamless with zero key management overhead.

  • Fine-tuning on domain-specific labeled data significantly improves classification accuracy on political content compared to a general-purpose prompt.

  • Using BigQuery labeled data as ground truth enables real-time accuracy benchmarking — the agent can compare its predictions against existing human labels at scale.

What's next for AIScan – Multimodal Toxicity & Misinformation Agent

  • Real-time streaming — Integrate with Twitter/X and Reddit streaming APIs for continuous, always-on monitoring rather than on-demand analysis.

  • Multilingual support — Expand beyond English to detect toxicity and misinformation in Spanish, Hindi, Arabic, and other high-volume political discourse languages.

  • Explainability layer — Add citation-backed reasoning so moderators can see why content was flagged, not just the verdict, building trust in AI-assisted moderation.

  • Human-in-the-loop dashboard — Build a moderation queue where flagged content surfaces for human review, with the agent's confidence scores prioritizing the most ambiguous cases.

  • Broader fine-tuning — Expand the training dataset beyond the 2024 election cycle to cover ongoing political events, improving generalization across news cycles.

  • Audio modality — Extend Gemini's multimodal capabilities to analyze podcasts, voice tweets, and video narration for spoken misinformation and incitement.

Built With

  • application-default-credentials
  • beautifulsoup4
  • bigquery
  • db-dtypes
  • gemini-2.5-flash
  • gemini-api
  • gemini-vision
  • google-adk
  • google-cloud-run
  • google-genai-sdk
  • pandas
  • praw
  • python
  • python-dotenv
  • requests
  • secret-manager
  • streamlit
  • vertex-ai
  • yt-dlp