🎮 Inspiration
Mechanics will only get you so far.
In tactical shooters like Valorant, tracking the economy, enemy utility, map rotations, and win conditions simultaneously creates serious cognitive overload, especially for neurodivergent players who may struggle with executive function in high-pressure situations.
As a first-year Computer Engineering student and active player (Iso/Breach main), I noticed a massive gap in the ecosystem. We have aim trainers for physical mechanics, but nothing that meaningfully trains or supports real-time strategic thinking inside a live match.
Google’s Project Astra demonstrated the future of AI: multimodal agents that “see what you see” and reason in real time. IGL-AI applies that exact paradigm to the split-second world of competitive esports.
While Project Astra helps you find your glasses, IGL-AI helps you win rounds.
I set out to build a Radiant-level tactical assistant capable of offloading the mental stack so players can focus purely on execution. That’s how IGL-AI (In-Game Leader AI) was born.
💡 What It Does
IGL-AI is a real-time multimodal tactical engine designed to function as a cognitive prosthetic.
Active Monitoring: A custom Java engine captures gameplay at 60 FPS using external pixel-based screen capture.
Real-Time Multimodal Reasoning: Visual frames are streamed to Google’s Gemini 3 Flash model, leveraging sub-second multimodal vision capabilities to interpret complex game states in real time.
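For context, here is roughly what a single frame-to-Gemini round trip can look like over the public `generateContent` REST endpoint. This is a minimal sketch, not the production pipeline: the model id is a placeholder, the class name is illustrative, and error handling and JSON escaping are omitted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class GeminiFrameClient {
    // Placeholder model id; substitute whichever Gemini Flash variant you target.
    private static final String ENDPOINT =
        "https://generativelanguage.googleapis.com/v1beta/models/"
        + "gemini-flash-latest:generateContent";

    private final HttpClient http = HttpClient.newHttpClient();
    private final String apiKey = System.getenv("GEMINI_API_KEY");

    /** Sends one JPEG frame plus a text prompt; returns the raw JSON response. */
    public String describeFrame(byte[] jpegFrame, String prompt) throws Exception {
        // NOTE: in real use the prompt must be JSON-escaped before interpolation.
        String body = """
            {"contents":[{"parts":[
              {"text":"%s"},
              {"inline_data":{"mime_type":"image/jpeg","data":"%s"}}
            ]}]}""".formatted(prompt, Base64.getEncoder().encodeToString(jpegFrame));

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(ENDPOINT))
            .header("Content-Type", "application/json")
            .header("x-goog-api-key", apiKey)
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```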
Actionable Strategy: The system detects live contextual patterns (e.g., Spike planted, Resurrection used, player count changes) and filters them through a structured logic layer to generate optimal macro-level suggestions.
Audio Coaching: Instructions are delivered via a lightweight Text-to-Speech engine, allowing players to receive tactical input without diverting visual attention.
⚙️ Architecture & Tech Stack
The system is built with a strict focus on low latency, clean separation of concerns, and non-invasive design.
Interface Layer: Built using JavaFX, with java.awt.Robot handling external screen capture. The system operates strictly on pixel data and is intentionally designed to remain external and non-invasive (see the Technical Compliance Note below).
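A minimal sketch of what that capture loop looks like in plain Java. The class name and frame-budget handling are illustrative; the real engine also hands each frame to the compression and inference pipeline.

```java
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.Toolkit;
import java.awt.image.BufferedImage;

public class ScreenCaptureLoop {
    private static final int TARGET_FPS = 60;

    public static void main(String[] args) throws Exception {
        Robot robot = new Robot();
        Rectangle screen = new Rectangle(Toolkit.getDefaultToolkit().getScreenSize());
        long framePeriodMs = 1000L / TARGET_FPS;

        while (true) {
            long start = System.currentTimeMillis();

            // Pure pixel capture: no memory reads, no injection, no game APIs.
            BufferedImage frame = robot.createScreenCapture(screen);
            // ... hand the frame to the compression + Gemini pipeline ...

            // Sleep off whatever is left of this frame's time budget.
            long elapsed = System.currentTimeMillis() - start;
            Thread.sleep(Math.max(0, framePeriodMs - elapsed));
        }
    }
}
```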
Logic Gate Backend: To mitigate LLM hallucinations, I engineered a structured verification layer that determines the current game phase (Buy Phase, Mid-Round, Post-Plant, etc.) before transmitting prompts to Gemini. This prevents impossible or contextually invalid advice.
Sensor Layer: A dedicated validation module ensures agent-specific alignment (e.g., preventing advice inconsistent with the active character).
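The write-up describes these two guardrails at a high level, so the sketch below is a hypothetical illustration of how a phase gate plus an agent-consistency check can sit in front of the model output. The phase names mirror the ones above; the ability table and advice-category strings are made up for the example.

```java
import java.util.Map;
import java.util.Set;

public class TacticalGuardrail {
    enum GamePhase { BUY_PHASE, MID_ROUND, POST_PLANT }

    // Illustrative ability vocabulary; a real table would cover the full roster.
    private static final Map<String, Set<String>> AGENT_ABILITIES = Map.of(
        "Iso",    Set.of("double tap", "kill contract", "undercut"),
        "Breach", Set.of("flashpoint", "fault line", "rolling thunder"));

    /** Logic Gate: block advice categories that are impossible in the current phase. */
    static boolean phaseAllows(GamePhase phase, String category) {
        return switch (phase) {
            case BUY_PHASE  -> category.equals("economy") || category.equals("loadout");
            case MID_ROUND  -> category.equals("positioning") || category.equals("utility");
            case POST_PLANT -> category.equals("retake-or-hold");
        };
    }

    /** Sensor Layer: advice may only reference abilities of the active agent. */
    static boolean agentConsistent(String advice, String activeAgent) {
        String text = advice.toLowerCase();
        return AGENT_ABILITIES.entrySet().stream()
            .filter(e -> !e.getKey().equals(activeAgent))
            .flatMap(e -> e.getValue().stream())
            .noneMatch(text::contains);
    }
}
```

Only advice that passes both checks is forwarded to the output layer; anything else is discarded before the player ever hears it.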
AI Engine: Integrated Gemini 3 Flash for its low-latency multimodal reasoning capabilities, enabling near real-time state interpretation.
Output Layer: A localized Text-to-Speech system converts strategic outputs into concise, coach-like audio cues.
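The write-up doesn't name a specific TTS library, so treat this as one possible pure-Java wiring using FreeTTS, an assumption rather than the project's actual engine.

```java
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class CoachVoice {
    private final Voice voice;

    public CoachVoice() {
        // FreeTTS needs to be told where its bundled voices live.
        System.setProperty("freetts.voices",
            "com.sun.speech.freetts.en.us.cmu_us_kal.KevinVoiceDirectory");
        voice = VoiceManager.getInstance().getVoice("kevin16");
        voice.allocate();
    }

    /** Speaks one short tactical cue; blocks until playback finishes. */
    public void say(String cue) {
        voice.speak(cue);
    }

    public void shutdown() {
        voice.deallocate();
    }
}
```

Usage is a single call per cue, e.g. `new CoachVoice().say("Spike down. Two alive. Play for the retake.");`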
🚧 Overcoming Challenges
Latency vs. Accuracy: A brilliant coaching tip is useless if it arrives five seconds late. By optimizing frame compression and using Gemini Flash, I reduced average round-trip inference latency to under two seconds.
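The compression step isn't detailed above; a common plain-Java approach, assumed here, is to re-encode each captured frame as a reduced-quality JPEG with ImageIO before upload, trading visual fidelity for a smaller payload.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

public final class FrameCompressor {
    /** Encodes a captured frame as JPEG at the given quality (0.0 to 1.0). */
    public static byte[] toJpeg(BufferedImage frame, float quality) throws IOException {
        // JPEG has no alpha channel; repaint onto an RGB image if the frame has one.
        if (frame.getType() != BufferedImage.TYPE_INT_RGB) {
            BufferedImage rgb = new BufferedImage(
                frame.getWidth(), frame.getHeight(), BufferedImage.TYPE_INT_RGB);
            rgb.getGraphics().drawImage(frame, 0, 0, null);
            frame = rgb;
        }

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // lower quality -> smaller upload -> less latency

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream imageOut = new MemoryCacheImageOutputStream(out)) {
            writer.setOutput(imageOut);
            writer.write(null, new IIOImage(frame, null, null), param);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }
}
```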
Contextual Hallucinations: Early iterations occasionally produced contradictory advice. This was resolved through the Logic Gate + Sensor Layer architecture, which hard-validates game phase and agent context before allowing output generation.
🏆 Proudest Accomplishments
- Architected and deployed a fully functional, low-latency multimodal AI system from scratch.
- Successfully grounded abstract visual game states using AI vision without relying on internal APIs or source code.
- Designed an accessibility-first tool focused on reducing cognitive overload — an overlooked dimension in esports technology.
- Built structured guardrails to meaningfully reduce LLM hallucination risk in real-time environments.
🚀 Roadmap
Post-Game Analysis: Allow users to upload full match VODs for round-by-round breakdown and performance grading.
Economy Modeling: Implement automated enemy credit tracking to teach buy-round prediction and economy management.
Strategic Pattern Learning: Future iterations will incorporate match history to refine contextual suggestions over time.
Technical Compliance Note
IGL-AI operates entirely through external pixel-based screen capture (similar to human observation). It does not read game memory, inject code, or interact with internal APIs. The system is architected to remain external and non-invasive and serves as a proof-of-concept exploring real-time multimodal grounding using Gemini 3.
Built With
- computervision
- gemini
- github
- google-cloud
- java
- multimodal
- promptengineering