Inspiration

Every kid deserves a creative companion who's always excited to see what they make. We noticed that children light up when their favorite characters "talk" to them, so we asked: what if Moana could actually watch your child draw, react to their artwork in real time, and turn their creation into a bedtime story? Parents struggle to keep kids creatively engaged without screens. Drawing Buddy solves this: the child draws on real paper with real crayons while an AI character voice guides, encourages, and tells stories. Physical play meets AI magic.

What it does

Drawing Buddy is an AI art companion that:

- Talks as Moana (or any character) using Higgs Audio voice cloning from a short reference clip
- Watches the child draw through a live camera feed, recognizing shapes, colors, and new elements
- Holds a real conversation: the child speaks, the AI listens and responds naturally, one sentence at a time
- Guides step by step: suggests what to draw next, asks questions, and celebrates every stroke
- Turns mistakes into magic: a wonky circle becomes "a mysterious portal," a crooked line becomes "a secret path"
- Creates a story at the end using everything the child drew, with the character names they invented during the session
- Sends parents a summary: what was drawn, what was said, and how engaged the child was

How we built it

The architecture connects three AI systems in a real-time loop: Vision → Brain → Voice.

- Camera feed → Gemini Vision: A live webcam captures the drawing. Every few seconds, a frame is sent to Google Gemini, which analyzes what the child has drawn, detecting new shapes, colors, and elements.
- Conversation brain → Gemini API: The story engine maintains a rolling conversation with strict rules: one short sentence at a time, always praise first, never talk over the child, follow their lead. It tracks drawing elements, character names, and session phases.
- Voice output → Higgs Audio TTS v2.5 via Eigen AI: The AI's text response is sent to Higgs Audio for voice cloning. Using a 5-second reference clip of Moana's voice, Higgs generates expressive speech that sounds like the character, with excitement for big moments, gentleness for encouragement, and drama for storytelling.
- Microphone → Web Speech API: The child's speech is transcribed in real time using the browser's built-in speech recognition, then fed back into the conversation loop so the AI responds to what the child actually said.
- Web interface: A FastAPI + WebSocket backend serves a responsive UI showing the live camera, the conversation chat, session phases, and parent controls.
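The "one short sentence at a time" rule can be backstopped in code as well as in the prompt. Here is a minimal sketch of a hypothetical post-processor (the function name, regex, and word limit are ours, not the actual implementation) that trims a model reply to a single short sentence before it reaches the voice pipeline:

```python
import re

def enforce_one_sentence(reply: str, max_words: int = 20) -> str:
    """Keep only the first sentence of a model reply so the AI
    never monologues; cap very long sentences as a safety net."""
    # Take everything up to the first sentence-ending punctuation mark.
    match = re.search(r"^.*?[.!?]", reply.strip(), flags=re.DOTALL)
    sentence = match.group(0) if match else reply.strip()
    words = sentence.split()
    if len(words) > max_words:
        sentence = " ".join(words[:max_words]) + "..."
    return sentence

print(enforce_one_sentence("Wow, I love that sun! Now let's add a boat."))
# -> Wow, I love that sun!
```

A guard like this catches the (frequent) case where the model ignores the prompt and produces a paragraph anyway.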

Challenges we faced

- Turn-taking was the hardest problem. Getting the AI to say ONE thing and then shut up and wait was surprisingly difficult. Early versions would dump multiple sentences or keep talking over the child. We solved this with a strict locking mechanism: the AI cannot speak again until the child responds.
- Voice cloning quality depends heavily on the reference clip. Background music, other speakers, or noise in the reference audio degrades the clone. Finding a clean 5-second clip of Moana speaking without the soundtrack was critical.
- Speech recognition with children's voices is imperfect. Kids speak softly, mumble, and use unexpected words. We had to filter out low-confidence transcriptions and show interim results for debugging without sending garbage to the AI.
- Camera detection of drawings is noisy. Small camera movements trigger false "drawing changed" events. We added frame-differencing thresholds and cooldown timers so the AI only reacts to real changes on the paper.
- Latency matters more with kids. Adults tolerate 2-3 second delays; kids don't. Every millisecond in the camera → Gemini → Higgs → speaker pipeline had to be optimized.
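The turn-taking lock can be sketched as a small gate object. This is an illustrative reconstruction under our own naming (class and method names are hypothetical, not the project's actual code): once the AI speaks, it is blocked until the speech-recognition callback reports a child response.

```python
import threading

class TurnTakingGate:
    """Speaking lock: the AI may not speak again until the child
    has responded. Sketch only; names are ours."""

    def __init__(self):
        self._child_turn = threading.Event()
        self._child_turn.set()  # the AI may open the conversation

    def try_speak(self, line: str) -> bool:
        """Return True and lock the gate if it is the AI's turn."""
        if not self._child_turn.is_set():
            return False  # still waiting for the child
        self._child_turn.clear()
        # in the real pipeline, `line` would be sent to TTS here
        return True

    def child_responded(self, transcript: str) -> None:
        """Called by the speech-recognition callback; unlocks the AI."""
        self._child_turn.set()

gate = TurnTakingGate()
gate.try_speak("What a great sun!")   # allowed: first turn
gate.try_speak("And now a boat!")     # blocked: child hasn't replied
gate.child_responded("I drew a boat!")
gate.try_speak("A boat! Amazing!")    # allowed again
```

Using an `Event` rather than a plain boolean also lets a background thread block on `wait()` until the child's turn ends.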

What we learned

- Higgs Audio's voice cloning from just a few seconds of reference audio is remarkably good; the emotional expressiveness makes it feel alive, not robotic
- The system prompt is 80% of the product; the code is just plumbing to get a great prompt talking through a great voice
- Kids don't care about perfect AI; they care about feeling heard and celebrated
- The "show me your drawing" moment, where the AI describes specific things it sees, is pure magic

What's next

- More character voices: let kids pick from Elsa, Spider-Man, Dora, or create original characters
- Story memory across sessions: "Remember yesterday when you drew Rex the dragon? Let's draw his castle today!"
- A dedicated tablet/camera device that clips onto a desk
- Collaborative drawing: two kids drawing together with the AI narrating their combined creation
- Publishing the stories as mini picture books with the child's actual drawings

Built with

- Higgs Audio TTS v2.5 (voice cloning via Eigen AI)
- Google Gemini API (vision + conversation)
- Python, FastAPI, WebSockets
- OpenCV (camera capture)
- Web Speech API (child's voice input)
