Cognitive Oracle: Escape Room

Inspiration

The inspiration for Cognitive Oracle came from a simple question: What if the room you’re sitting in right now held a secret history? We wanted to bridge the gap between static photography and interactive storytelling. By leveraging the advanced spatial reasoning of Gemini 3 Pro, we realized we could transform any mundane environment—a kitchen, an office, or a laboratory—into a high-stakes, logically consistent escape room where the puzzles are as real as the objects in the photo.

What it does

Cognitive Oracle is a procedurally generated mystery engine that turns your surroundings into a game world.

  • Neural Scanning: The system performs a "Vision Pass" to identify every object, container, and architectural detail in an uploaded or AI-synthesized photo.
  • Cognitive Orchestration: Gemini 3 Pro acts as the "World Orchestrator," weaving a unique narrative and a series of complex puzzles based specifically on the detected objects.
  • Fact-Grounded Puzzles: Unlike traditional AI games that hallucinate solutions, our Oracle uses Google Search Grounding to find obscure, real-world technical or historical facts about the items in your room to build the solution logic.
  • Multimodal Immersion: The experience is fully narrated with high-quality AI speech and features a futuristic "Cyber-HUD" interface for deep environmental investigation.

How we built it

We built a sophisticated multi-model pipeline using the Google GenAI SDK:

  • Reasoning (Gemini 3 Pro): Handles the "heavy lifting" of logic, state management, and complex puzzle synthesis.
  • Performance Layer (Gemini 3 Flash): Powers rapid turn-based interactions and a Speculative Execution Cache that predicts and pre-computes the player's next likely moves.
  • Vision & Synthesis (Gemini 3 Flash Vision & 2.5 Flash Image): Extracts spatial telemetry from images and generates cinematic scene visuals for players without a camera.
  • Atmosphere (Gemini 2.5 Flash TTS): Transforms AI-generated descriptions into brisk, mysterious audio narration using the native "Charon" voice.
  • Frontend: A high-performance React 19 + TypeScript application styled with Tailwind CSS.
  • State Management: A custom IndexedDB storage layer to handle large Base64 "Reality Buffers" and game states without UI lag.

Challenges we ran into

One of our primary hurdles was managing the "Neural Sync" between high-level vision and logic. Performing deep vision analysis and complex logic synthesis simultaneously can occasionally hit rate limits or transient network errors. To solve this, we implemented a Robust Retry Wrapper with exponential backoff and a Sequential Task Queue that ensures the "Neural Sync" completes reliably before the game begins, providing a stable experience for the user.
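
A minimal sketch of the retry-and-queue pattern described above. The project's actual wrapper is not shown in this write-up, so the names `backoffDelayMs`, `withRetry`, and `runSequentially`, along with the default delays and retry count, are illustrative assumptions:

```typescript
// Hypothetical helpers: names and default values are assumptions for illustration.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Runs `task`, retrying on any thrown error up to `maxRetries` times. */
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries = 4,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}

/** Runs tasks strictly one after another, so only one request is in flight at a time. */
async function runSequentially<T>(
  tasks: Array<() => Promise<T>>,
): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    results.push(await withRetry(task));
  }
  return results;
}
```

In this sketch the vision pass and the puzzle synthesis would each be wrapped as one task and passed to `runSequentially`, so the second step starts only after the first has succeeded. A production version would likely retry only on rate-limit or network errors and add random jitter to the delay.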

Accomplishments that we're proud of

  • Speculative Navigation: We developed a "speculative hit" system that generates responses to suggested actions before the user even clicks them, creating a "zero-latency" feel in what is usually a high-latency LLM interaction.
  • Grounding Integrity: We required the AI to back puzzle solutions with real-world citations (via Google Search), which curbs hallucinated answers and makes the puzzles feel tangible.
  • Spatial UI: We created an interactive "Reality Feed" with zoom, pan, and scan effects that makes the user feel like an advanced investigator.
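
The "speculative hit" idea can be sketched as a cache of in-flight promises keyed by action text. This is an illustration under stated assumptions, not the project's implementation: the class name `SpeculativeCache`, its methods, and the `Generate` callback (a stand-in for the real model call) are all hypothetical.

```typescript
// Hypothetical sketch: `Generate` stands in for the real model call.
type Generate = (action: string) => Promise<string>;

class SpeculativeCache {
  private pending = new Map<string, Promise<string>>();
  private generate: Generate;

  constructor(generate: Generate) {
    this.generate = generate;
  }

  /** Start generating responses for suggested actions before the player picks one. */
  prefetch(actions: string[]): void {
    for (const action of actions) {
      if (this.pending.has(action)) continue;
      const response = this.generate(action);
      // Drop failed guesses so a later click triggers a fresh request.
      response.catch(() => this.pending.delete(action));
      this.pending.set(action, response);
    }
  }

  /** Return the pre-computed response on a hit; otherwise generate on demand. */
  resolve(action: string): Promise<string> {
    const hit = this.pending.get(action);
    if (hit !== undefined) return hit;
    return this.generate(action);
  }

  /** Discard stale guesses once the game state has changed. */
  clear(): void {
    this.pending.clear();
  }
}
```

After each turn, the suggested actions would be passed to `prefetch`. If the player clicks one of them, `resolve` returns a promise that may already have settled, which is where the near-instant feel comes from. Anything the player types that was not predicted falls through to a normal request, and `clear` would be called once the chosen action changes the game state.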

What we learned

We learned the immense value of Model Orchestration. No single model is perfect for every task; by splitting the work between Gemini 3 Pro (for complex reasoning) and Gemini 3 Flash (for speed and vision extraction), we achieved a balance of intelligence and responsiveness. We also mastered the browser-side handling of raw PCM audio data, which was crucial for integrating Gemini’s native TTS capabilities.
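
To illustrate the raw PCM handling mentioned above, here is a small sketch of the conversion step a browser needs before playback. It assumes the audio arrives as signed 16-bit little-endian mono PCM at 24 kHz and has already been decoded from Base64 into bytes. The function names and the default sample rate are assumptions for illustration, not details taken from the project.

```typescript
/** Convert signed 16-bit little-endian PCM bytes to Float32 samples in [-1, 1). */
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  // Two bytes per sample; a trailing odd byte is ignored.
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // `true` selects little-endian byte order.
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

/** Duration in seconds of a mono clip (24 kHz default is an assumption). */
function durationSeconds(sampleCount: number, sampleRate = 24000): number {
  return sampleCount / sampleRate;
}
```

In the browser, the resulting samples can be copied into a buffer created with `AudioContext.createBuffer(1, samples.length, sampleRate)` and played through an `AudioBufferSourceNode`.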

What's next for Cognitive Oracle: Escape Room

  • Veo Integration: Transforming static photos into 60 fps cinematic fly-throughs of the escape room to increase immersion.
  • Live API Co-op: Enabling multiplayer sessions where a team can speak directly to the Oracle in real time using native voice-to-voice interaction.
  • AR Overlay: A mobile version that uses the camera feed to overlay the escape room HUD directly onto the physical world in real time.
