Forging

Forging is built as a multi-agent system on top of Gemini 3's Interactions API — not a prompt wrapper.

  1. 2-Agent Pipeline with Interaction Chaining: An Observer agent analyzes gameplay video with thinking_level="high", generating 10-20 timestamped tips. Its interaction_id chains to a Validator agent that cross-checks each tip against the video, assigning confidence scores 1-10. Only tips scoring 8+ survive. The video is uploaded once via the File API and persists across the entire chain — no re-upload needed.
  2. Multimodal Video Understanding: Gemini 3 Pro watches gameplay frame-by-frame alongside parsed replay data. For CS2, it reads HUD elements, crosshair placement, and positioning. For AoE2, it tracks resources, unit compositions, and build orders. No game API required.
  3. Structured Output with response_schema: Native JSON schema enforcement ensures deterministic output format at every pipeline step — timestamps, categories, severity, reasoning — directly renderable in the UI.
  4. Extended Thinking: Both agents use high thinking levels for deep reasoning. The Observer reasons about gameplay patterns; the Validator reasons about whether each observation is actually visible in the video or a hallucination.
  5. Follow-up Chat: Chat chains from the Validator's interaction_id, inheriting full pipeline context (video + analysis) without re-sending anything.
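
The confidence gate in step 1 can be sketched in a few lines. This is a minimal illustration, not the production code: the `ValidatedTip` shape and the `surviving_tips` helper are hypothetical names, and the only detail taken from the pipeline itself is the 1-10 score with a cutoff of 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedTip:
    """One Observer tip after the Validator has scored it (hypothetical shape)."""
    timestamp: str    # position in the video, e.g. "01:10"
    text: str         # coaching advice shown to the player
    confidence: int   # Validator score, 1-10


MIN_CONFIDENCE = 8  # from the pipeline description: only tips scoring 8+ survive


def surviving_tips(tips: list[ValidatedTip],
                   threshold: int = MIN_CONFIDENCE) -> list[ValidatedTip]:
    """Keep tips at or above the threshold, preserving their original order."""
    for tip in tips:
        if not 1 <= tip.confidence <= 10:
            raise ValueError(f"confidence out of range: {tip.confidence}")
    return [tip for tip in tips if tip.confidence >= threshold]
```

Rejecting out-of-range scores up front, rather than silently dropping them, makes a misbehaving Validator response visible instead of just shrinking the tip list.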

Inspiration

I'm an active competitive player in both Age of Empires II and Counter-Strike 2. Like most players trying to climb ranks, I've spent countless hours watching my replays trying to figure out what I did wrong. The problem? I'm not good enough to spot my own mistakes. And hiring a coach at $20-50/hour isn't realistic for regular sessions.

When I saw what Gemini could do with long-form video understanding, the idea clicked: what if AI could watch my gameplay like a human coach would - understanding visual context, identifying patterns, and giving me actionable advice with exact timestamps?

What it does

Forging lets players upload their match replays or gameplay videos and receive AI-powered coaching. The system:

  • Analyzes full matches (up to 30 minutes, 700MB videos) without chunking
  • Generates timestamped coaching tips - click any tip to jump to that exact moment
  • Enables contextual chat - ask follow-up questions with full match context ("Why did I lose that fight?")
  • Works across game genres - currently supports CS2 (FPS) and Age of Empires II (RTS)
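
Click-to-jump needs each tip's timestamp as a seek position in seconds. Here is a minimal sketch of that conversion, assuming tips carry "MM:SS" or "HH:MM:SS" strings; the function names are hypothetical and the real timestamp format may differ.

```python
def timestamp_to_seconds(timestamp: str) -> int:
    """Convert "MM:SS" or "HH:MM:SS" into whole seconds."""
    parts = timestamp.strip().split(":")
    valid = len(parts) in (2, 3) and all(p.isascii() and p.isdigit() for p in parts)
    if not valid:
        raise ValueError(f"unrecognized timestamp: {timestamp!r}")
    seconds = 0
    for part in parts:
        # Each field is worth 60x the one after it.
        seconds = seconds * 60 + int(part)
    return seconds


def seek_position(timestamp: str, video_duration_s: int) -> int:
    """Seek target for the player, clamped so it never runs past the video's end."""
    return min(timestamp_to_seconds(timestamp), video_duration_s)
```

Clamping to the video length is a cheap guard against a tip whose timestamp points beyond the end of the recording.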

How we built it

The Stack

  1. Frontend: Next.js, React, TypeScript, Tailwind CSS.
  2. Backend: Python FastAPI.
  3. AI: Gemini 3 Pro via the Gemini API.
  4. Infrastructure: Google Cloud Run, Cloud Storage, Google Firestore.

Gemini Features Used

  1. File API: Upload 700MB, 30-minute match videos.
  2. Multimodal: Analyze video + replay data + chat together.
  3. Thinking Mode: Deep reasoning for both agents.
  4. Interactions API: Chain Observer → Validator with shared context.
  5. TTS: Coaching tips with voice-over.
  6. Structured Output: Reliable JSON for UI rendering.
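
To make the structured-output item concrete, here is a sketch of what a per-tip schema could look like, plus a defensive check on the backend side. The field names follow the ones mentioned earlier (timestamp, category, severity, reasoning); the severity enum values and the `parse_tip` helper are assumptions for illustration, and how the schema is passed to the model is left out.

```python
import json

# Hypothetical schema: field names come from the write-up, enum values are assumed.
TIP_SCHEMA = {
    "type": "object",
    "properties": {
        "timestamp": {"type": "string"},
        "category": {"type": "string"},
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "reasoning": {"type": "string"},
    },
    "required": ["timestamp", "category", "severity", "reasoning"],
}


def parse_tip(raw: str, schema: dict = TIP_SCHEMA) -> dict:
    """Parse one JSON tip and check required fields, string types, and enums."""
    tip = json.loads(raw)
    if not isinstance(tip, dict):
        raise ValueError("tip must be a JSON object")
    for name in schema["required"]:
        if name not in tip:
            raise ValueError(f"missing field: {name}")
    for name, spec in schema["properties"].items():
        if name not in tip:
            continue
        value = tip[name]
        if not isinstance(value, str):
            raise ValueError(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
    return tip
```

Even with schema enforcement on the model side, a small check like this keeps a malformed response from reaching the UI.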

Challenges we ran into

Hallucinations in Timestamps

Early versions would generate tips with timestamps where nothing relevant happened. The 2-agent architecture with explicit verification largely solved this - the Validator cross-checks every timestamp against the actual video. That said, hallucinations still slip through once in a while and need fixing: for example, when a nearby teammate throws a grenade, the system sometimes thinks you threw it.

Prompt engineering for game-specific analysis

Getting the AI to understand game-specific concepts (CS2 economy, AoE2 build orders) without being too verbose or missing key moments was a hard balance to strike.

Game-Specific vs Generic

Balancing game-specific knowledge (CS2 economy, AoE2 build orders) with a generic architecture was tricky. Solved with modular parsers and knowledge bases that plug into the same pipeline.
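
A rough sketch of the plug-in idea, assuming each game contributes a replay parser and a block of knowledge-base text. `GameModule`, `register`, and `build_context` are hypothetical names, and the two registered parsers are stand-ins that only report the replay size.

```python
from dataclasses import dataclass
from typing import Callable

ReplayParser = Callable[[bytes], dict]


@dataclass(frozen=True)
class GameModule:
    """Everything game-specific that the shared pipeline needs (hypothetical)."""
    name: str
    parse_replay: ReplayParser   # turns a raw replay file into structured data
    knowledge: str               # game-specific notes added to the prompt


REGISTRY: dict[str, GameModule] = {}


def register(module: GameModule) -> None:
    """Make a game available to the pipeline under its name."""
    REGISTRY[module.name] = module


def build_context(game: str, replay: bytes) -> dict:
    """Assemble the game-specific inputs; the rest of the pipeline stays generic."""
    module = REGISTRY.get(game)
    if module is None:
        raise KeyError(f"no module registered for {game!r}")
    return {
        "game": game,
        "replay": module.parse_replay(replay),
        "knowledge": module.knowledge,
    }


# Stand-in parsers: real ones would decode the replay format of each game.
register(GameModule("cs2", lambda data: {"size": len(data)}, "CS2 economy notes"))
register(GameModule("aoe2", lambda data: {"size": len(data)}, "AoE2 build order notes"))
```

With this shape, supporting another game means registering one more `GameModule`; nothing downstream of `build_context` has to change.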

Rate Limits During Development

Heavy video analysis + thinking mode burns through quotas fast. Implemented API key rotation and caching of Gemini file uploads for iterative testing.
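
Both workarounds fit in a short sketch. This assumes keys are rotated round-robin and uploads are cached by content hash; `KeyRotator`, `UploadCache`, and the injected `upload` callable are hypothetical, and the actual upload call is deliberately left out.

```python
import hashlib
import itertools
from typing import Callable, Iterator


class KeyRotator:
    """Hands out API keys round-robin so no single key absorbs every request."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle: Iterator[str] = itertools.cycle(keys)

    def next_key(self) -> str:
        return next(self._cycle)


class UploadCache:
    """Remembers the remote file id for each video, keyed by content hash."""

    def __init__(self, upload: Callable[[bytes], str]) -> None:
        self._upload = upload               # hypothetical uploader returning a file id
        self._file_ids: dict[str, str] = {}

    def get_or_upload(self, video: bytes) -> str:
        digest = hashlib.sha256(video).hexdigest()
        if digest not in self._file_ids:
            # Only the first request for a given video pays for the upload.
            self._file_ids[digest] = self._upload(video)
        return self._file_ids[digest]
```

For 700MB files a real version would hash the file in chunks rather than hold it in memory, and would need to drop cache entries once the remote file is no longer available.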

Accomplishments that we're proud of

Multi-Agent Verification Pipeline

The Observer → Validator architecture with confidence scoring dramatically reduced hallucinations. Getting there took a long time: I started from a single prompt and moved through Gemini 2.0 Flash → Gemini 2.5 Flash → Gemini 2.5 Pro → Gemini 3 Flash → Gemini 3 Pro, then experimented with a 4-agent pipeline plus an Orchestrator and a few more variants. I'm proud to have learned everything by doing. The biggest lesson was when to verify and against what: the replay parsers for Age of Empires II DE and Counter-Strike 2 are key! Tips that don't match video evidence get filtered out before reaching the user. This went from "AI sometimes makes things up" to "every tip has been cross-checked".

Game-Agnostic Architecture

The same pipeline analyzes both Counter-Strike 2 (FPS) and Age of Empires II (RTS) - two completely different game genres with different visual languages, strategies, and skill sets. Adding a new game requires only a parser and prompts, not new infrastructure.

Voice Coaching

Tips are read aloud using natural TTS, turning the analysis into a spoken coaching session you can listen to while rewatching your gameplay.

End-to-End Deployment

Fully deployed on Google Cloud (Cloud Run, Cloud Storage, Firestore) with a live demo anyone can use. Not just a prototype - a working product.

What we learned

  • Multi-agent systems need explicit verification - A second agent checking the first agent's work dramatically reduces hallucinations
  • Long context changes everything - Not having to chunk a 30-minute match preserves crucial temporal relationships
  • Structured output is underrated - Guaranteeing valid JSON from every response simplified the entire frontend integration
  • Game-agnostic is possible - The same architecture works for an FPS and an RTS with only a game-specific parser and prompts swapped in

What's next for Forging

  • Put it in the hands of more users ASAP.
  • Skill progression tracking - Compare your metrics across multiple games.
  • Team communication analysis - Analyze voice comms for team coordination.
  • Input analysis - Keyboard/mouse patterns and shortcut optimization.
  • More games: Valorant, League of Legends, Dota 2, Rocket League.
