The inspiration for Chronicles of the Unspoken came from a simple observation: by a frequently cited rule of thumb, only $7\%$ of the emotional meaning in human communication is carried by words. Traditional AI interactions ignore the remaining $93\%$, which is carried by tone of voice and body language. We wanted to move beyond the "text box" and create a world where NPCs don't just listen to your words but actually feel your presence. We were inspired by the concept of "Biometric Intelligence": using the camera and microphone as sensory organs for the AI, transforming Gemini 3 into a digital empath.

How We Built It

The project is built on a Native Multimodal architecture, leveraging Gemini 3's ability to process interleaved data streams.

The Sensory Engine: We used MediaPipe for local, high-speed hand tracking to minimize latency, combined with Gemini 3's Vision API to interpret complex micro-expressions and physical objects held by the player.

The Emotional State Engine: Gemini 3 acts as the "Game Master." It processes a continuous stream of audio and video frames to estimate the player's stress level using a weighted model:

$$P(\text{Stress}) = \omega_1 \cdot T + \omega_2 \cdot G + \omega_3 \cdot H$$

where $T$ is vocal tension, $G$ is erratic gesturing, $H$ is logical hesitation, and $\omega_1$, $\omega_2$, $\omega_3$ are their respective weights.

Frontend: A high-performance React application that uses Glassmorphism and real-time Canvas rendering to provide a "futuristic OS" feel.

What We Learned

We discovered that low latency is the key to immersion. In our bomb-defusal module, even a $500\text{ms}$ delay in gesture recognition broke the tension. We learned to optimize token usage by sending lower-resolution video frames for "motion detection" and high-resolution frames only when Gemini needed to "inspect" an object. Most importantly, we learned that AI can be trained to detect nuance, such as recognizing the difference between a player being "confused" and a player "lying" during an interrogation.

Challenges We Faced

The biggest challenge was Synchronous Multimodality. Managing a live audio stream, a video feed, and the model's reasoning logic without causing a bottleneck was difficult. We faced a "Data Flooding" issue where sending too many frames per second hit rate limits.

We solved this by implementing a Contextual Sampling algorithm:

The app tracks hand movement locally.

If a specific "Intent Gesture" (like a pinch) is detected, it triggers a "High-Priority Frame" to be sent to Gemini 3.

This reduced our data throughput by nearly $60\%$ without sacrificing the AI's perceived "intelligence."

The Future

Chronicles of the Unspoken is more than a game; it is a proof of concept for Intent-Based Accessibility. The same tech that lets a player defuse a bomb with a gesture can help someone with limited mobility control their environment through simple, intuitive micro-movements recognized by Gemini 3.
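As an illustration of the weighted stress model from the Emotional State Engine, here is a minimal sketch in Python (chosen for brevity; it is not the project's code). It assumes that $T$, $G$, and $H$ have already been normalized to the range 0 to 1 and that the weights are non-negative; dividing by the sum of the weights keeps the result in that same range so it can be read as a probability-like score. The names `StressWeights` and `stress_score`, and the example weights, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressWeights:
    """Hypothetical weights for the three signals (omega_1..omega_3)."""
    tension: float      # omega_1, applied to vocal tension T
    gesture: float      # omega_2, applied to erratic gesturing G
    hesitation: float   # omega_3, applied to logical hesitation H


def stress_score(t: float, g: float, h: float, w: StressWeights) -> float:
    """Weighted combination of T, G, H, normalized by the weight sum.

    Assumption: each input is already scaled to [0, 1]. With non-negative
    weights, the normalized result also stays in [0, 1].
    """
    for name, value in (("T", t), ("G", g), ("H", h)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weights = (w.tension, w.gesture, w.hesitation)
    if any(x < 0 for x in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return (w.tension * t + w.gesture * g + w.hesitation * h) / total


# Example with made-up weights that favor vocal tension.
example_weights = StressWeights(tension=0.5, gesture=0.3, hesitation=0.2)
example_score = stress_score(0.8, 0.5, 0.2, example_weights)
```

Normalizing by the weight sum is a design choice of this sketch: it lets the weights be tuned freely without the score drifting outside the 0 to 1 range.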
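The Contextual Sampling idea described under Challenges We Faced can be sketched as a small gating function that decides, for each locally tracked frame, whether anything is sent to the model at all. This is a sketch under stated assumptions, not the project's implementation: the class name `ContextualSampler`, the baseline interval for low-resolution frames, and the minimum gap between high-priority frames are all hypothetical, and the local gesture detector is represented by a plain boolean.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextualSampler:
    """Decides which frames are worth uploading (all thresholds hypothetical)."""
    baseline_interval_s: float = 2.0   # assumed spacing of low-res "motion" frames
    min_priority_gap_s: float = 0.5    # assumed debounce for a held gesture
    _last_sent_s: Optional[float] = field(default=None, init=False, repr=False)

    def decide(self, now_s: float, intent_gesture: bool) -> Optional[str]:
        """Return "high", "low", or None (skip) for the frame at time now_s.

        intent_gesture is the output of local hand tracking, e.g. a pinch.
        """
        if self._last_sent_s is None:
            elapsed = math.inf
        else:
            elapsed = now_s - self._last_sent_s

        # An intent gesture triggers a high-priority, high-resolution frame,
        # debounced so a held pinch does not flood the API.
        if intent_gesture and elapsed >= self.min_priority_gap_s:
            self._last_sent_s = now_s
            return "high"

        # Otherwise send an occasional low-resolution frame for context.
        if elapsed >= self.baseline_interval_s:
            self._last_sent_s = now_s
            return "low"

        return None  # frame stays local; nothing is uploaded


# Simulate 10 seconds at 30 fps with a pinch held from t = 3.0 s to t = 3.2 s.
sampler = ContextualSampler()
decisions = []
for i in range(300):
    t = i / 30.0
    pinching = 3.0 <= t <= 3.2
    decisions.append(sampler.decide(t, pinching))
sent = [d for d in decisions if d is not None]
```

In this simulation most frames are never uploaded, which is the effect the approach relies on; the actual savings depend on the thresholds chosen and how often the player gestures.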
