🎮 Inspiration
Mechanics will only get you so far.
In tactical shooters like Valorant, tracking the economy, enemy utility, map rotations, and win conditions simultaneously creates serious cognitive overload, especially for neurodivergent players who may struggle with executive function in high-pressure situations.
As a first-year Computer Engineering student and active player (Iso/Breach main), I noticed a massive gap in the ecosystem. We have aim trainers for physical mechanics, but nothing that meaningfully trains or supports real-time strategic thinking inside a live match.
Google’s Project Astra demonstrated the future of AI: multimodal agents that “see what you see” and reason in real time. IGL-AI applies that exact paradigm to the split-second world of competitive esports.
While Project Astra helps you find your glasses, IGL-AI helps you win rounds.
I set out to build a Radiant-level tactical assistant capable of offloading the mental stack so players can focus purely on execution. That’s how IGL-AI (In-Game Leader AI) was born.
💡 What It Does
IGL-AI is a real-time multimodal tactical engine designed to function as a cognitive prosthetic.
Active Monitoring: A custom Java engine captures gameplay at 60 FPS using external pixel-based screen capture.
Real-Time Multimodal Reasoning: Visual frames are streamed to Google’s Gemini 3 Flash model, leveraging sub-second multimodal vision capabilities to interpret complex game states in real time.
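For context, here is roughly what a single frame-to-Gemini round trip can look like over the public `generateContent` REST endpoint. This is a minimal sketch, not the production pipeline: the model id is a placeholder, the class name is illustrative, and error handling and JSON escaping are omitted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class GeminiFrameClient {
    // Placeholder model id; substitute whichever Gemini Flash variant you target.
    private static final String ENDPOINT =
        "https://generativelanguage.googleapis.com/v1beta/models/"
        + "gemini-flash-latest:generateContent";

    private final HttpClient http = HttpClient.newHttpClient();
    private final String apiKey = System.getenv("GEMINI_API_KEY");

    /** Sends one JPEG frame plus a text prompt; returns the raw JSON response. */
    public String describeFrame(byte[] jpegFrame, String prompt) throws Exception {
        // NOTE: in real use the prompt must be JSON-escaped before interpolation.
        String body = """
            {"contents":[{"parts":[
              {"text":"%s"},
              {"inline_data":{"mime_type":"image/jpeg","data":"%s"}}
            ]}]}""".formatted(prompt, Base64.getEncoder().encodeToString(jpegFrame));

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(ENDPOINT))
            .header("Content-Type", "application/json")
            .header("x-goog-api-key", apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```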
Actionable Strategy: The system detects live contextual patterns (e.g., Spike planted, Resurrection used, player count changes) and filters them through a structured logic layer to generate optimal macro-level suggestions.
Audio Coaching: Instructions are delivered via a lightweight Text-to-Speech engine, allowing players to receive tactical input without diverting visual attention.
⚙️ Architecture & Tech Stack
The system is built with a strict focus on low latency, clean separation of concerns, and non-invasive design.
Interface Layer: Built using JavaFX, with java.awt.Robot handling external screen capture. The system operates strictly on pixel data and is intentionally designed to remain external and non-invasive (see the Technical Compliance Note below).
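A minimal sketch of what that capture loop looks like in plain Java. The class name and frame-budget handling are illustrative; the real engine also hands each frame to the compression and inference pipeline.

```java
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

public class ScreenCaptureLoop {
    private static final int TARGET_FPS = 60;

    public static void main(String[] args) throws Exception {
        Robot robot = new Robot();
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        long framePeriodMs = 1000L / TARGET_FPS;

        while (true) {
            long start = System.currentTimeMillis();

            // Pure pixel capture: no memory reads, no injection, no game APIs.
            BufferedImage frame = robot.createScreenCapture(screen);
            // ... hand the frame to the compression + Gemini pipeline ...

            // Sleep off whatever is left of this frame's time budget.
            long elapsed = System.currentTimeMillis() - start;
            Thread.sleep(Math.max(0, framePeriodMs - elapsed));
        }
    }
}
```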
Logic Gate Backend: To mitigate LLM hallucinations, I engineered a structured verification layer that determines the current game phase (Buy Phase, Mid-Round, Post-Plant, etc.) before transmitting prompts to Gemini. This prevents impossible or contextually invalid advice.
Sensor Layer: A dedicated validation module ensures agent-specific alignment (e.g., preventing advice inconsistent with the active character).
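The write-up describes these two guardrails at a high level, so the sketch below is a hypothetical illustration of how a phase gate plus an agent-consistency check can sit in front of the model output. The phase names mirror the ones above; the ability table and advice-category strings are made up for the example.

```java
import java.util.Map;
import java.util.Set;

public class TacticalGuardrail {
    enum GamePhase { BUY_PHASE, MID_ROUND, POST_PLANT }

    // Illustrative ability vocabulary; a real table would cover the full roster.
    private static final Map<String, Set<String>> AGENT_ABILITIES = Map.of(
        "Iso",    Set.of("double tap", "kill contract", "undercut"),
        "Breach", Set.of("flashpoint", "fault line", "rolling thunder"));

    /** Logic Gate: block advice categories that are impossible in the current phase. */
    static boolean phaseAllows(GamePhase phase, String category) {
        return switch (phase) {
            case BUY_PHASE  -> category.equals("economy") || category.equals("loadout");
            case MID_ROUND  -> category.equals("positioning") || category.equals("utility");
            case POST_PLANT -> category.equals("retake-or-hold");
        };
    }

    /** Sensor Layer: advice may only reference abilities of the active agent. */
    static boolean agentConsistent(String advice, String activeAgent) {
        String text = advice.toLowerCase();
        return AGENT_ABILITIES.entrySet().stream()
            .filter(e -> !e.getKey().equals(activeAgent))
            .flatMap(e -> e.getValue().stream())
            .noneMatch(text::contains);
    }
}
```

Only advice that passes both checks is forwarded to the output layer; anything else is discarded before the player ever hears it.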
AI Engine: Integrated Gemini 3 Flash for its low-latency multimodal reasoning capabilities, enabling near real-time state interpretation.
Output Layer: A localized Text-to-Speech system converts strategic outputs into concise, coach-like audio cues.
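The write-up doesn't name a specific TTS library, so treat this as one possible pure-Java wiring using FreeTTS, an assumption rather than the project's actual engine.

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class CoachVoice {
    private final Voice voice;

    public CoachVoice() {
        // FreeTTS needs to be told where its bundled voices live.
        System.setProperty("freetts.voices",
            "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
        voice = VoiceManager.getInstance().getVoice("kevin16");
        voice.allocate();
    }

    /** Speaks one short tactical cue; blocks until playback finishes. */
    public void say(String cue) {
        voice.speak(cue);
    }

    public void shutdown() {
        voice.deallocate();
    }
}
```

Usage is a single call per cue, e.g. `new CoachVoice().say("Spike down. Two alive. Play for the retake.");`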
🚧 Overcoming Challenges
Latency vs. Accuracy: A brilliant coaching tip is useless if it arrives five seconds late. By optimizing frame compression and using Gemini Flash, I reduced average round-trip inference latency to under two seconds.
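The compression step isn't detailed above; a common plain-Java approach, assumed here, is to re-encode each captured frame as a reduced-quality JPEG with ImageIO before upload, trading visual fidelity for a smaller payload.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

public final class FrameCompressor {
    /** Encodes a captured frame as JPEG at the given quality (0.0 to 1.0). */
    public static byte[] toJpeg(BufferedImage frame, float quality) throws IOException {
        // JPEG has no alpha channel; repaint onto an RGB image if the frame has one.
        if (frame.getType() != BufferedImage.TYPE_INT_RGB) {
            BufferedImage rgb = new BufferedImage(
                frame.getWidth(), frame.getHeight(), BufferedImage.TYPE_INT_RGB);
            rgb.getGraphics().drawImage(frame, 0, 0, null);
            frame = rgb;
        }

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // lower quality -> smaller upload -> less latency

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream imageOut = new MemoryCacheImageOutputStream(out)) {
            writer.setOutput(imageOut);
            writer.write(null, new IIOImage(frame, null, null), param);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }
}
```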
Contextual Hallucinations: Early iterations occasionally produced contradictory advice. This was resolved through the Logic Gate + Sensor Layer architecture, which hard-validates game phase and agent context before allowing output generation.
🏆 Proudest Accomplishments
- Architected and deployed a fully functional, low-latency multimodal AI system from scratch.
- Successfully grounded abstract visual game states using AI vision without relying on internal APIs or source code.
- Designed an accessibility-first tool focused on reducing cognitive overload — an overlooked dimension in esports technology.
- Built structured guardrails to meaningfully reduce LLM hallucination risk in real-time environments.
🚀 Roadmap
Post-Game Analysis: Allow users to upload full match VODs for round-by-round breakdown and performance grading.
Economy Modeling: Implement automated enemy credit tracking to teach buy-round prediction and economy management.
Strategic Pattern Learning: Future iterations will incorporate match history to refine contextual suggestions over time.
Technical Compliance Note
IGL-AI operates entirely through external pixel-based screen capture (similar to human observation). It does not read game memory, inject code, or interact with internal APIs. The system is architected to remain external and non-invasive and serves as a proof-of-concept exploring real-time multimodal grounding using Gemini 3.
Built With
- computervision
- gemini
- github
- google-cloud
- java
- multimodal
- promptengineering