Inspiration
It started with my niece. She absolutely refused to brush her teeth — no amount of asking, pleading, or explaining worked. But one day, while playing pretend phone calls with her stuffed bear, she suddenly announced: "Bear says I should brush my teeth! OK!" and ran to the bathroom.
That moment stuck with me. A third party — even an imaginary one — could reach her in a way her parents couldn't. This isn't a coincidence. Research on Parasocial Relationships (Horton & Wohl, 1956) shows that children form genuine emotional bonds with characters, making them more receptive to guidance. And Self-Determination Theory (Ryan & Deci, 2000) explains why: when autonomy is preserved — when the child chooses to act rather than being told — motivation becomes intrinsic and lasting.
I thought: what if we could combine the power of a beloved character with the intelligence of a real-time AI that listens, adapts, and guides through questions? That's how Monti was born.
What It Does
Monti is a real-time AI voice companion for children aged 2–8, built for the Live Agents category. A friendly AI character literally calls the child's phone, creating a magical moment — "Monty is calling you!"
The conversation is fully voice-driven with no buttons needed. The child just talks naturally, and Monti:
- Listens with Voice Activity Detection — no tap-to-talk
- Responds with natural, age-adaptive speech in real time
- Handles interruptions gracefully via barge-in support
- Guides through Socratic questioning — never commands, always asks
- Celebrates when the child decides to act — with fanfare and a "pinky promise"
Example flow for a tooth-brushing scenario:
🐻 "Heeey Yuu-chan! What yummy food did you eat today?" 👧 "Curry!" 🐻 "Ohhh! What do you think happens to curry bits on your teeth?" 👧 "They stay there?" 🐻 "Right! What could you do about that?" 👧 "Brush my teeth!" 🐻 "Yaaay! It's a pinky promise with Monty! Go brush! 🎉"
The child decided on their own. No commands. No fear. Just guided discovery.
The Science Behind It
This isn't just a chatbot with a cute voice. Every conversation design choice is backed by developmental psychology:
| Principle | Application in Monti | Evidence |
|---|---|---|
| Montessori Education | Child discovers, AI guides | Cognitive flexibility d=0.61 (Lillard & Else-Quest, 2006) |
| Socratic Method | Questions, not answers | Critical thinking gains in 5-6 year-olds (Reznitskaya et al., 2009) |
| Self-Determination Theory | Autonomy over control | Lasting behavior change vs. temporary compliance (Ryan & Deci, 2000) |
| Growth Mindset | Praise effort, not ability | Process praise increases persistence 30%+ (Mueller & Dweck, 1998) |
| Personalization | Always calls child by name | Name recognition activates unique brain regions (Mayer, 2005) |
| Age-Adaptive Pacing | Slower speech for younger kids | Slower rate improves comprehension in young children (Zangl & Mills, 2007) |
How We Built It — 5 Days, Start to Finish
Day 1-2: Foundation & Gemini Live API
Built the Flutter app foundation and connected to Gemini's Live API via WebSocket. Got real-time audio streaming working — 16kHz PCM in, 24kHz PCM out. The first time Monty spoke back was magical.
Day 3: The Audio Pipeline Battle
The hardest part of the entire project. Three major issues:
- Echo barge-in — The speaker's audio leaked into the microphone, causing Gemini to think the child was interrupting. Solved with
_isSpeakingflag + calculated drain timing based on buffered PCM bytes minus elapsed playback time - Function calling failure — The Google AI preview model never called
end_conversation. Switched to Vertex AI GA model (gemini-live-2.5-flash-native-audio) which resolved it immediately - Premature goal detection — The model sometimes called
end_conversationon the first turn. Added both prompt-level and code-level guards (minimum 2 completed turns)
Day 4: Cloud Run & Character Generation
- Built a Cloud Run token server so the app never needs hardcoded credentials
- Added AI character generation using Gemini 3.1 Flash Image Preview — parents describe any character ("a friendly purple dinosaur") and get a flat emoji-style avatar instantly
- This is what makes Monti truly AI-native: both the conversation AND the characters are AI-generated
Day 5: Polish & Submission
- Age-adaptive system prompts (3-4: baby-talk with stretched vowels, 5-6: moderate, 7+: natural)
- detail.design UI techniques: page transitions, celebration screen with floating emojis, talking pulse animation
- Character asset images, incoming call simulation, bilingual support (EN/JP)
Google Cloud Architecture
Flutter App ──WebSocket──► Vertex AI (Gemini Live API)
│ • Native audio processing
│ • VAD & barge-in
│ • Function calling
│
├──── HTTP ──────────► Cloud Run
│ • Token server (service account auth)
│ • Character image generation
│ • Gemini 3.1 Flash proxy
Google Cloud services used:
- Vertex AI — Gemini Live 2.5 Flash Native Audio (GA) for real-time voice conversation
- Cloud Run — Backend token server + AI character generation proxy
- Gemini 3.1 Flash Image Preview — On-demand character avatar generation
What We Learned
- Vertex AI GA vs Google AI Preview — Function calling only worked reliably on the GA model. This single switch fixed our biggest blocker
- Audio timing is everything — The gap between "server finished sending data" and "speaker finished playing" was the root cause of 80% of our UX bugs
- Designing for children ≠ designing for adults — Slower pace, shorter sentences, emotional warmth, and name-calling matter far more than response accuracy
- AI-generated characters change the product — Moving from template selection to "describe your character" transformed Monti from an app into a platform
What's Next
- Parent dashboard with AI-generated conversation insights and progress tracking
- Community scenario marketplace — parents share custom missions
- Speech development analytics based on conversation patterns
- Multi-language expansion — currently EN/JP, targeting 24 languages supported by Gemini
- Rive character animations — replacing static images with dynamic expressions
Built With
- cloud-run
- dart
- elevenlabs
- flask
- flutter
- gemini-3.1-flash-image-preview
- gemini-live-api
- go-router
- google-cloud
- python
- riverpod
- vertex-ai
- websocket
Log in or sign up for Devpost to join the conversation.