Inspiration
I’ve been helping my six-year-old daughter prepare for an international math competition. To practice, I search online for past sample papers, download the PDFs, print them out, and sit with her working through each question. When she gets stuck, I offer hints, guide her thinking, or help her read the problem so she can arrive at the solution herself.
Last month, though, I had a bit of a rude awakening. She told me—very directly—that she doesn’t like my teaching style. Among other things, she said I’m not friendly or polite when I teach her math. That stung. When I reflected on it, I realized I couldn’t remember ever being taught math by a teacher who was particularly warm or encouraging. Maybe that’s just a cultural difference from how I grew up. And since English isn’t my first language, I sometimes mispronounce words while helping her read the questions, which likely adds to her frustration.
The whole process was also inefficient. I had to sit beside her the entire time, unable to work on my own tasks because she frequently needed help with clues or problem-solving approaches. All of this pushed me to put on my “superhero Dad” hat and think of a better solution—one that could automate the coaching process using AI.
That’s how Coach Kangaroo was born.
What it does
Coach Kangaroo is an AI-powered, real-time math coach designed to help kindergarten and 1st-grade students prepare for international competitions like Math Kangaroo — while reducing the pressure on parents.
Instead of parents manually searching for past papers, printing PDFs, and sitting beside their child to provide hints, Coach Kangaroo transforms the experience into an interactive, AI-guided learning session.
Core Capabilities
- 📖 Understands math questions from competition-style practice papers.
- 🧠 Explains problems in child-friendly language, tailored to ages 5–7.
- 🪜 Provides step-by-step hints instead of direct answers, encouraging critical thinking.
- 🔁 Adapts dynamically based on the child’s responses.
- 😊 Maintains a polite, encouraging tone to build confidence and reduce math anxiety.
- 🔊 Offers reading assistance, helping children who struggle to read questions independently.
- ⏳ Enables independent practice, allowing parents to work in parallel instead of micromanaging each problem.
The Result
Coach Kangaroo acts like a patient, friendly math tutor — one that never gets tired, frustrated, or short on encouragement.
It transforms competition preparation from a stressful, parent-led drill into an engaging, confidence-building learning experience.
How we built it
Coach Kangaroo was built entirely using Google AI Studio and the Gemini API (Gemini 3 Flash, plus the Gemini 2.5 Flash native-audio model behind the live voice conversation).
Instead of writing traditional backend-heavy infrastructure, I leveraged AI Studio’s app-building environment to rapidly prototype and iterate using structured prompting.
Development Approach
- 🧩 Designed the app logic through a series of carefully engineered prompts
- 🎯 Prompted the model to respond in a child-friendly, polite, and encouraging tone
- 🪜 Structured prompts to ensure the AI gives hints instead of direct answers
- 🔁 Iteratively refined instructions based on real testing sessions with a 6-year-old user
- 🗣️ Configured the assistant to simplify language to a Kindergarten / Grade 1 comprehension level
- ⚡ Used AI Studio’s rapid iteration loop to test, adjust, and redeploy within minutes
Key Design Decisions
- Focused on guided reasoning rather than solution dumping
- Built guardrails to prevent the AI from immediately revealing answers
- Optimized responses to be short, clear, and confidence-building
- Designed the tone intentionally to feel like a patient, friendly tutor
By combining prompt engineering, structured reasoning flows, and live interaction capabilities inside AI Studio, Coach Kangaroo was developed quickly without needing complex infrastructure — allowing full focus on the learning experience.
Challenges we ran into
Building Coach Kangaroo involved several engineering challenges, most of them about balancing pedagogical rigor with low-latency, real-time interaction:
Deterministic vs. Generative Control: We had to prevent the LLM from simply blurting out answers. This required a deterministic Policy Engine that sits between the transcription and the model, forcing the coach into specific states (like COMPREHEND or PLAN) before it’s allowed to help with the math.
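The gating idea can be sketched as a small, pure state machine. This is a minimal sketch under stated assumptions, not the app's actual Policy Engine: READ, COMPREHEND and PLAN come from this write-up, while the SOLVE state, the event names and the instruction strings are hypothetical.

```typescript
// Minimal sketch of a deterministic coaching policy (names are hypothetical).
// The model only ever sees the instruction for the current state.
type CoachState = "READ" | "COMPREHEND" | "PLAN" | "SOLVE";

// Hypothetical events derived from transcription and UI signals.
type CoachEvent =
  | { kind: "readingComplete" }
  | { kind: "restatedQuestion"; ok: boolean }
  | { kind: "proposedPlan"; ok: boolean };

// Per-state instruction injected ahead of the model's next turn (hypothetical wording).
const INSTRUCTIONS: Record<CoachState, string> = {
  READ: "Help the child read the question. Do not discuss the math yet.",
  COMPREHEND: "Ask the child what the question is asking. Give no hints about the answer.",
  PLAN: "Ask the child how they might start. Give at most one small hint.",
  SOLVE: "Let the child work it out. Never state the final answer.",
};

// Pure transition function: the only way the coach can move forward.
// Events that do not belong to the current state are ignored.
function nextState(state: CoachState, event: CoachEvent): CoachState {
  switch (state) {
    case "READ":
      return event.kind === "readingComplete" ? "COMPREHEND" : state;
    case "COMPREHEND":
      return event.kind === "restatedQuestion" && event.ok ? "PLAN" : state;
    case "PLAN":
      return event.kind === "proposedPlan" && event.ok ? "SOLVE" : state;
    case "SOLVE":
      return state;
  }
}

// Builds the policy prefix that gates what the model is allowed to say.
function policyPrompt(state: CoachState): string {
  return `[POLICY state=${state}] ${INSTRUCTIONS[state]}`;
}
```

Because the transition function is pure, rules such as "no math help before COMPREHEND" can be unit-tested without calling the model at all.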
Gapless Audio Streaming: The Gemini Live API returns raw PCM chunks. Building a reliable playback queue using AudioContext with a precise nextStartTime cursor was critical to ensure the Kangaroo sounds like a continuous speaker rather than a series of disjointed clips.
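A nextStartTime cursor of this kind can be sketched in a few lines. This sketch assumes the chunks are 16-bit little-endian mono PCM at 24 kHz; `SAMPLE_RATE`, `pcm16ToFloat32`, `chunkDurationSec` and `PlaybackCursor` are hypothetical names, and the browser wiring is shown only as a comment so the core logic stays testable outside a browser.

```typescript
// Sketch: gapless playback of streamed PCM chunks via a nextStartTime cursor.
// Assumes 16-bit little-endian mono PCM at 24 kHz (adjust to the actual stream).
const SAMPLE_RATE = 24000;

// Decode raw PCM16 bytes into the [-1, 1) float samples Web Audio expects.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return out;
}

// Playback length of one mono PCM16 chunk, in seconds.
function chunkDurationSec(bytes: Uint8Array): number {
  return bytes.byteLength / 2 / SAMPLE_RATE;
}

// Tracks where the next chunk must start so chunks play back-to-back.
class PlaybackCursor {
  private nextStartTime = 0;

  // Returns the start time for a chunk and advances the cursor past it.
  // If the queue has drained (cursor is in the past), start "now" instead.
  schedule(currentTime: number, durationSec: number): number {
    const start = Math.max(this.nextStartTime, currentTime);
    this.nextStartTime = start + durationSec;
    return start;
  }
}

// Browser wiring for each incoming chunk (ctx is an AudioContext):
//   const samples = pcm16ToFloat32(chunk);
//   const buffer = ctx.createBuffer(1, samples.length, SAMPLE_RATE);
//   buffer.copyToChannel(samples, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buffer;
//   src.connect(ctx.destination);
//   src.start(cursor.schedule(ctx.currentTime, buffer.duration));
```

Scheduling each chunk against the cursor, rather than calling `start()` the moment a chunk arrives, is what removes the tiny gaps that otherwise make the voice sound chopped into clips.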
Heuristic Reading Tracking: Mapping "messy" real-time transcriptions (with stutters or background noise) back to the static question text to detect exactly where a 6-year-old is stuck. We implemented a sliding-window fuzzy matcher to update the currentWordIndex without being thrown off by noise.
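A sliding-window fuzzy matcher along these lines can be sketched with plain edit distance. This is a minimal sketch that assumes Levenshtein similarity as the fuzzy measure; `ReadingTracker`, `windowSize` and `threshold` are hypothetical names and values, and only `currentWordIndex` comes from the description above.

```typescript
// Sketch of a sliding-window fuzzy reading tracker (hypothetical parameters).
// Heard words are matched only against a few words just ahead of the cursor,
// so noise, "um"s and re-read words cannot drag the cursor around.

function normalize(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Classic Levenshtein edit distance using a single DP row.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      row[j] = Math.min(
        row[j] + 1, // deletion
        row[j - 1] + 1, // insertion
        prevDiag + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prevDiag = above;
    }
  }
  return row[b.length];
}

// 1.0 for identical words, approaching 0 for unrelated ones.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - editDistance(a, b) / maxLen;
}

class ReadingTracker {
  currentWordIndex = 0;
  private readonly words: string[];
  private readonly windowSize: number; // how far ahead a heard word may land
  private readonly threshold: number; // minimum similarity to accept a match

  constructor(questionText: string, windowSize = 4, threshold = 0.75) {
    this.words = questionText.split(/\s+/).map(normalize).filter((w) => w.length > 0);
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Feed one transcription fragment; returns the updated cursor.
  update(transcript: string): number {
    for (const raw of transcript.split(/\s+/)) {
      const heard = normalize(raw);
      if (!heard) continue;
      const end = Math.min(this.currentWordIndex + this.windowSize, this.words.length);
      for (let i = this.currentWordIndex; i < end; i++) {
        if (similarity(heard, this.words[i]) >= this.threshold) {
          this.currentWordIndex = i + 1; // advance past the matched word
          break;
        }
      }
    }
    return this.currentWordIndex;
  }

  get done(): boolean {
    return this.currentWordIndex >= this.words.length;
  }
}
```

For example, with the question "Tom has three red apples.", the fragment "um Tom has" moves the cursor to 2: "um" matches nothing in the window and is simply skipped.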
Race Conditions: Synchronizing the microphone stream with the websocket state. We utilized a sessionPromise pattern to ensure that no PCM data is sent before the session is fully resolved, preventing initialization errors.
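The sessionPromise idea fits in a few lines. In this sketch, `LiveSessionLike`, `sendAudio` and `MicBridge` are hypothetical stand-ins for the real Live API client, so the pattern can be shown without depending on any particular SDK signature.

```typescript
// Sketch of the "sessionPromise" pattern (session type is a hypothetical stand-in
// for whatever the Live API client's connect call resolves to).
interface LiveSessionLike {
  sendAudio(chunk: Uint8Array): void;
}

class MicBridge {
  private readonly sessionPromise: Promise<LiveSessionLike>;

  constructor(connect: () => Promise<LiveSessionLike>) {
    // Start connecting exactly once; every mic callback shares this promise.
    this.sessionPromise = connect();
  }

  // Called from the microphone callback, possibly before the socket is open.
  // Chaining on the promise guarantees no PCM is sent until the session resolves.
  onMicChunk(chunk: Uint8Array): Promise<void> {
    return this.sessionPromise.then((session) => session.sendAudio(chunk));
  }
}
```

Because `.then()` callbacks registered on the same promise run in registration order, chunks captured while the socket is still opening are delivered in order once it resolves, instead of being dropped or racing the handshake.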
Multimodal Snipping: Coordinating PDF.js for rendering and Gemini Vision for coordinate detection. We had to translate the model's 0-1000 normalized bounding boxes into exact pixel coordinates for the HTML5 Canvas to "snip" the math puzzles into clean visual quest cards.
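The coordinate translation itself is simple arithmetic. This sketch assumes the model returns each box as `[ymin, xmin, ymax, xmax]` on a 0-1000 scale (the convention used in Gemini's object-detection examples) and that PDF.js has already rendered the page to a canvas; `toPixelRect` and its `padding` parameter are hypothetical.

```typescript
// Sketch: map a 0-1000 normalized box onto a rendered PDF page canvas.
// Assumes the box order is [ymin, xmin, ymax, xmax].
type NormBox = [number, number, number, number];

interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function toPixelRect(box: NormBox, canvasWidth: number, canvasHeight: number, padding = 0): PixelRect {
  const [ymin, xmin, ymax, xmax] = box;
  // Scale 0-1000 to pixels (multiply first to avoid rounding drift),
  // widen by `padding`, and clamp to the canvas bounds.
  const left = Math.max(0, Math.floor((xmin * canvasWidth) / 1000) - padding);
  const top = Math.max(0, Math.floor((ymin * canvasHeight) / 1000) - padding);
  const right = Math.min(canvasWidth, Math.ceil((xmax * canvasWidth) / 1000) + padding);
  const bottom = Math.min(canvasHeight, Math.ceil((ymax * canvasHeight) / 1000) + padding);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// In the browser, the snip is then one drawImage call from the page canvas:
//   out.width = r.width; out.height = r.height;
//   out.getContext("2d")!.drawImage(page, r.x, r.y, r.width, r.height, 0, 0, r.width, r.height);
```

A few pixels of padding is a cheap safeguard in case a predicted box clips the edge of a diagram.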
Accomplishments that we're proud of
As a senior engineer, I’m particularly proud of how we pushed the boundaries of human-AI interaction design for a very specific and demanding audience: 6-year-olds.
Here are the specific accomplishments that make this app stand out:
The "Invisible" Reading Support: Most reading apps require a child to press a button for help. I'm proud of our Fuzzy Word Tracker. It uses a sliding-window matcher that can ignore stutters, "ums," and background noise to accurately pinpoint the child's progress. Triggering a sub-second, single-word audio prompt when a hesitation is detected makes the AI feel like a parent sitting right next to them.
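Hesitation detection can sit on top of a reading cursor like the one described earlier. This is a minimal sketch assuming a fixed silence threshold; `HesitationDetector`, `hesitationMs` and its default value are hypothetical, not the app's tuned settings.

```typescript
// Sketch of hesitation detection on top of a reading cursor (hypothetical threshold).
// If the cursor has not advanced for `hesitationMs`, prompt exactly one word:
// the one the child is stuck on. Each position is prompted at most once.
class HesitationDetector {
  private lastIndex = -1;
  private lastAdvanceAt = 0;
  private promptedIndex = -1;
  private readonly words: string[];
  private readonly hesitationMs: number;

  constructor(words: string[], hesitationMs = 2500) {
    this.words = words;
    this.hesitationMs = hesitationMs;
  }

  // Call periodically (e.g. on a timer) with the current cursor and time.
  // Returns the single word to speak, or null if no prompt is needed.
  check(cursor: number, nowMs: number): string | null {
    if (cursor !== this.lastIndex) {
      // The child made progress: restart the silence clock.
      this.lastIndex = cursor;
      this.lastAdvanceAt = nowMs;
      return null;
    }
    const stuck = nowMs - this.lastAdvanceAt >= this.hesitationMs;
    if (stuck && cursor < this.words.length && this.promptedIndex !== cursor) {
      this.promptedIndex = cursor;
      return this.words[cursor];
    }
    return null;
  }
}
```

Prompting each position only once leaves room for the child to try the word again before the coach steps in a second time.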
Deterministic Pedagogy: Generative AI is notoriously "chatty" and prone to just giving the answer. I'm proud of the Policy Engine we built. By forcing the LLM to transition through states (READ → COMPREHEND → PLAN), we’ve essentially hard-coded a Socratic teaching method into a probabilistic model. It’s a hybrid approach that provides the best of both worlds: LLM flexibility and pedagogical discipline.
The PDF-to-Quest Pipeline: The "Snipping" feature is a major win. We aren't just doing OCR; we are using Gemini Vision to behave like a human with a pair of scissors. By translating normalized coordinates (0-1000) from the model into exact pixel regions on a Canvas rendered by PDF.js, we transform a static, intimidating test paper into a series of bite-sized, gamified "Visual Quests."
Latency-First Audio Engineering: We solved the "clash" of two speakers. By implementing a precise nextStartTime cursor and a [SYSTEM_INTERRUPT] signal, we created a conversation that feels natural. When the child speaks, the AI doesn't just stop; it clears its internal audio buffers and resets its thinking state, ensuring it’s always reacting to the current moment.
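The "clear the buffers" step can be sketched as a queue that remembers every scheduled chunk. The class and field names are hypothetical; the only real API assumed is that a Web Audio `AudioBufferSourceNode` exposes a `stop()` method, which is all the `StoppableSource` interface requires.

```typescript
// Sketch of barge-in handling (structure is an assumption, not the app's code).
// Anything with stop() fits, e.g. an AudioBufferSourceNode in the browser.
interface StoppableSource {
  stop(): void;
}

class SpeakerQueue {
  private readonly active = new Set<StoppableSource>();
  nextStartTime = 0;
  pendingTurn = false; // true while the coach is mid-reply

  // Register a scheduled chunk so it can be cancelled later.
  track(source: StoppableSource): void {
    this.active.add(source);
  }

  // Called when a chunk finishes naturally (e.g. from its "ended" event).
  release(source: StoppableSource): void {
    this.active.delete(source);
  }

  // On interruption: silence everything already queued and reset the cursor,
  // so the next reply starts immediately and responds to what the child just said.
  // Returns how many queued chunks were stopped.
  interrupt(): number {
    const stopped = this.active.size;
    this.active.forEach((source) => source.stop());
    this.active.clear();
    this.nextStartTime = 0;
    this.pendingTurn = false;
    return stopped;
  }
}
```

Resetting `nextStartTime` matters as much as stopping the audio: otherwise the next reply would be scheduled after the cancelled one and start with an awkward pause.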
Multi-Modal Reinforcement: I love the visual feedback loops, like the text glowing green when reading is completed, and the "Judge Panel" for debugging. The green glow ensures that even if the audio is noisy, the child receives a clear, encouraging visual signal that they are on the right track.
What we learned
Working on Coach Kangaroo Live was a masterclass in building empathetic AI. When you're designing for a 6-year-old, the technical requirements shift from "efficiency" to "rhythm and encouragement."
Here’s what I learned during this build:
1. Pedagogy Must Be "Hard-Coded"
I learned that you can't rely on an LLM's "personality" alone to be a good teacher. Without our Policy Engine, the model would naturally gravitate toward being too helpful (giving the answer) or too vague. By mapping transcription events to a strict state machine (READ → COMPREHEND → PLAN), I learned how to use AI as a component within a larger, deterministic pedagogical framework.
2. Silence is Data
In adult apps, silence usually means "idle." For a child reading a math problem, silence is high-intent effort. Tracking the "Reading Cursor" (the sliding window of words) turned out to be the most reliable way to tell a child who is thinking from a child who is stuck. Responding with exactly one word, the word the child stopped on, hits a very specific sweet spot for supporting early readers without frustrating them.
3. Multi-Modal Reinforcement is Key
Audio is powerful, but for math, it isn't enough. I learned that the Visual-Audio loop is critical. When the Kangaroo says "Amazing reading!", the child also needs to see the green glow on the screen. This "double-hit" of dopamine helps anchor the learning and confirms the AI actually heard them correctly, which is vital for building trust in a voice interface.
4. LLM "Vibe" vs. Tool Use
I learned that the best way to handle correct answers is to take the decision-making out of the conversation. By using the signalCorrectAnswer tool, we moved the logic of "success" from the chat transcript into the UI layer. This allows the model to focus 100% on the celebration and energy, while the app handles the "boring" parts like switching questions and updating stats.
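The split of responsibilities can be sketched as a client-side tool handler. The call and response shapes are simplified stand-ins rather than the Gemini SDK's actual types, and everything except the tool name `signalCorrectAnswer` is hypothetical.

```typescript
// Sketch of moving "success" out of the transcript and into the UI layer.
// ToolCall/ToolResponse are simplified stand-ins, not the SDK's types.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResponse {
  id: string;
  name: string;
  response: { result: string };
}

interface QuizState {
  questionIndex: number;
  correctCount: number;
  totalQuestions: number;
}

// The model only decides *that* the answer was right; the app does the bookkeeping.
function handleToolCall(call: ToolCall, state: QuizState): [QuizState, ToolResponse] {
  if (call.name !== "signalCorrectAnswer") {
    return [state, { id: call.id, name: call.name, response: { result: "unknown tool" } }];
  }
  const finished = state.questionIndex >= state.totalQuestions - 1;
  const next: QuizState = {
    ...state,
    correctCount: state.correctCount + 1,
    questionIndex: finished ? state.questionIndex : state.questionIndex + 1,
  };
  // The reply tells the model what happened so it can celebrate, not grade.
  const result = finished
    ? "All questions done. Celebrate!"
    : "Recorded. The next question is on screen. Celebrate!";
  return [next, { id: call.id, name: call.name, response: { result } }];
}
```

Keeping the handler pure (state in, state out) means question switching and stats can be tested without a live session.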
5. The "Snipping" Metaphor
I learned that the most effective way to handle complex documents (like Math Kangaroo PDFs) is to think like a human. Instead of trying to parse the whole page into text, using Vision-guided Snipping creates a much better user experience. A picture of a puzzle is less intimidating and more engaging for a child than a block of extracted text.
In short, I learned that real-time AI coaching isn't about the model's intelligence—it's about the model's timing.
What's next for Coach Kangaroo
To keep the momentum going for Coach Kangaroo, we have several exciting technical and pedagogical milestones on the roadmap. The goal is to evolve from a "smart tutor" to a fully immersive "Math Quest Companion."
Here’s what’s in the pouch for the future:
1. "Vision-Live" Hybrid Integration
Right now, the coach "sees" the PDF snippets via pre-processing. The next step is to integrate the Live API's image streaming capabilities. This would allow a child to hold up their own physical workbook or a drawing to the camera. Coach Kangaroo could then say, "Ooh, I see you drew three circles! Let’s count those together," creating a bridge between the physical and digital worlds.
2. Procedural Visual Explanations
While the audio is great for Socratic coaching, some 6-year-olds are visual learners. We plan to build a Visual Solution Engine that generates real-time animations based on the tutor's state. If the coach is helping with a "sequencing" problem, the UI could procedurally animate jumping kangaroos or moving blocks to mirror the verbal hints, making abstract logic concrete.
3. The "Hopping Map" (Long-term Mastery)
We already have the Skill Model tracking mastery of addition, logic, and geometry. We want to wrap this in a persistent world map. As children master skills, they "hop" to new islands (e.g., Addition Atoll or Pattern Plains). This adds a long-term RPG-style progression layer that rewards consistent practice over time.
4. Adaptive Difficulty Scaling
By setting thinkingBudget and maxOutputTokens dynamically, we can adjust the Coach’s "intelligence." If a child is breezing through Level 2 problems, the app will automatically start snipping more complex Level 3 puzzles and prompt the Coach to ask more "Deep Thinking" questions that require multi-step reasoning.
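A simple version of this scaling rule might look like the following sketch. `maxOutputTokens` and `thinkingConfig.thinkingBudget` mirror Gemini generation-config fields; `pickConfig`, the five-question window, the thresholds and the token/budget numbers are hypothetical placeholders, not tested settings.

```typescript
// Sketch: choose the puzzle level and model budget from recent results.
// All thresholds and numbers are placeholders.
type Level = 1 | 2 | 3;

interface CoachConfig {
  level: Level;
  maxOutputTokens: number;
  thinkingConfig: { thinkingBudget: number };
}

// Harder levels get more thinking and longer replies for multi-step questions.
const THINKING_BUDGET: Record<Level, number> = { 1: 256, 2: 1024, 3: 2048 };
const MAX_OUTPUT_TOKENS: Record<Level, number> = { 1: 150, 2: 250, 3: 400 };

// recent: one entry per question, true if solved with little help, newest last.
function pickConfig(current: Level, recent: boolean[]): CoachConfig {
  const lastFive = recent.slice(-5);
  const rate = lastFive.length > 0 ? lastFive.filter(Boolean).length / lastFive.length : 0;
  let level: Level = current;
  if (lastFive.length === 5 && rate >= 0.8 && level < 3) {
    level = (level + 1) as Level; // breezing through: step up
  } else if (lastFive.length >= 3 && rate < 0.4 && level > 1) {
    level = (level - 1) as Level; // struggling: step down
  }
  return {
    level,
    maxOutputTokens: MAX_OUTPUT_TOKENS[level],
    thinkingConfig: { thinkingBudget: THINKING_BUDGET[level] },
  };
}
```

Requiring a full five-question window before stepping up keeps one lucky answer from jumping a child to harder puzzles.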
5. Parental "Quest Log"
A dashboard for parents that doesn't just show "Right/Wrong" percentages, but highlights qualitative growth. For example: "Today, your child showed great persistence in reading 3 complex sentences and correctly identified a logic pattern they struggled with last week."
Coach Kangaroo is just getting started. We’re moving toward a future where every child has a world-class, high-energy math mentor in their pocket that actually listens and cheers for them.
Keep hopping! 🦘✨
Built With
- canvas
- es6
- gemini-2.5-flash-native-audio-preview-12-2025
- gemini-3-flash-preview
- google-ai-studio
- mediadevices
- pdf.js
- react
- tailwind-css
- typescript
- vercel
- web-audio
- websockets