🥥 Coco — 2D Spatial Multimodal AI Learning Companion for Kids
Coco transforms passive screen time into active learning by turning videos into two-way, interactive experiences for kids — powered by 2D spatial input, speech, and AI feedback, all in a privacy-first, local-only environment.
🚀 Inspiration
"iPad kids" aren't the issue — passive content is.
Kids watch hours of videos every day, but rarely interact with what they see. There's no response loop, no thinking checkpoint, and no way for parents to know what was actually learned.
We wanted to build a bridge between entertainment and education.
What if the screen could pause, ask a question — and wait for the child's answer?
💡 What It Does
Coco turns flat videos into interactive classrooms.
At key moments:
- The video pauses
- A character asks a question
- The child responds by drawing on the video, speaking, or both
- Coco evaluates the response and gives instant feedback
Every moment becomes an opportunity to learn.
✨ Core Modules
🎬 Interactive Player
A safe, parent-controlled experience player.
How it works:
- Child selects an experience
- Video plays normally
- At checkpoints, video pauses
- A character asks a question
- Child:
  - Draws or circles directly on the video frame
  - Speaks into the microphone
- AI evaluates the response
- Feedback appears (correct / try again)
- Video continues
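The flow above can be read as a small state machine. The sketch below is one way to model it, not the actual player code: the phase names, event names, and the rule that "try again" returns to the same question are all assumptions made for illustration.

```typescript
// Hypothetical state machine for the checkpoint flow; every name is illustrative.
type PlayerState =
  | { phase: "playing" }
  | { phase: "asking"; checkpointId: string }
  | { phase: "evaluating"; checkpointId: string }
  | { phase: "feedback"; checkpointId: string; correct: boolean };

type PlayerEvent =
  | { type: "CHECKPOINT_REACHED"; checkpointId: string }
  | { type: "RESPONSE_SUBMITTED" }
  | { type: "EVALUATED"; correct: boolean }
  | { type: "RETRY" }
  | { type: "CONTINUE" };

function nextState(state: PlayerState, event: PlayerEvent): PlayerState {
  switch (state.phase) {
    case "playing":
      // The video pauses and the character asks its question.
      if (event.type === "CHECKPOINT_REACHED") {
        return { phase: "asking", checkpointId: event.checkpointId };
      }
      return state;
    case "asking":
      // The child has drawn, spoken, or both.
      if (event.type === "RESPONSE_SUBMITTED") {
        return { phase: "evaluating", checkpointId: state.checkpointId };
      }
      return state;
    case "evaluating":
      if (event.type === "EVALUATED") {
        return {
          phase: "feedback",
          checkpointId: state.checkpointId,
          correct: event.correct,
        };
      }
      return state;
    case "feedback":
      // Assumption: "try again" re-asks the same question.
      if (event.type === "RETRY" && !state.correct) {
        return { phase: "asking", checkpointId: state.checkpointId };
      }
      if (event.type === "CONTINUE") {
        return { phase: "playing" };
      }
      return state;
  }
}
```

Keeping the transitions in one pure function means events that arrive in the wrong phase (for example, a second submission while evaluation is running) are simply ignored.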
Key features:
- Transparent drawing canvas layered over video
- Speech input with live transcription
- Instant AI-based evaluation
- Kid-first UX (big buttons, high contrast, audio cues)
🎨 Creative Canvas (Studio)
A space for kids to create instead of consume.
- Draw scenes and characters
- Build simple story environments
- Generate AI-assisted storyboards
- Turn imagination into animated stories
This shifts kids from watching → making.
👨‍👩‍👧 Parent Portal
Not a gradebook — a guidance tool.
Includes:
- Watch time overview
- 7-day activity chart
- Experiences completed
- Concepts to reinforce (e.g., Counting, Fruits)
- Suggested recall prompts, like:
  - "Can you show me which one is bigger?"
  - "How many apples were there?"
All data is stored locally on-device using localStorage.
No cloud tracking. No behavioral profiling.
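As a rough illustration of how a local-only dashboard could work, the sketch below stores activity events as JSON under a single key and derives the 7-day chart from them. The key name `coco.activity`, the `ActivityEvent` shape, and the helper names are assumptions, not Coco's actual schema. In the browser, `window.localStorage` satisfies the `KeyValueStore` interface used here.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical event shape; the real schema is not described in this write-up.
interface ActivityEvent {
  date: string; // ISO day, e.g. "2024-05-07"
  minutesWatched: number;
  experienceId: string;
  completed: boolean;
}

const STORAGE_KEY = "coco.activity"; // hypothetical key name

function loadEvents(store: KeyValueStore): ActivityEvent[] {
  const raw = store.getItem(STORAGE_KEY);
  if (!raw) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // corrupted data: show an empty dashboard instead of crashing
  }
}

function recordEvent(store: KeyValueStore, event: ActivityEvent): void {
  const events = loadEvents(store);
  events.push(event);
  store.setItem(STORAGE_KEY, JSON.stringify(events));
}

// Minutes watched per day for the 7 days ending at `today` (inclusive), oldest first.
function sevenDayChart(
  events: ActivityEvent[],
  today: string
): { date: string; minutes: number }[] {
  const end = new Date(today + "T00:00:00Z");
  const days: { date: string; minutes: number }[] = [];
  for (let offset = 6; offset >= 0; offset--) {
    const day = new Date(end.getTime() - offset * 24 * 60 * 60 * 1000);
    const date = day.toISOString().slice(0, 10);
    const minutes = events
      .filter((e) => e.date === date)
      .reduce((sum, e) => sum + e.minutesWatched, 0);
    days.push({ date, minutes });
  }
  return days;
}
```

Because everything is computed from what is already on the device, nothing in this sketch needs a network request.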
🧠 Why Coco Is Different
| Most kids' platforms are | Coco is |
|---|---|
| Passive | 2D Spatial — kids draw on the video itself |
| Button-tap based | Multimodal — drawing + speech together |
| One-way | AI-evaluated — understanding intent, not taps |
| — | Privacy-first — local-only storage |
The video frame becomes a live learning canvas.
🛠️ How We Built It
Frontend
- Next.js (App Router)
- React
- Tailwind CSS
- Radix UI
Key components:
- Interactive video player
- Transparent drawing canvas overlay
- Microphone recording + waveform UI
- Parent dashboard powered by local analytics
Backend APIs
POST /api/evaluate-checkpoint
- Input: question, transcript, optional drawing image
- Output: correctness + feedback
POST /api/transcribe
- Audio → text
All AI services are environment-configured and provider-agnostic.
No vendor lock-in.
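To make the contract concrete, here is a hedged sketch of what the evaluate-checkpoint payload and a provider-agnostic evaluator could look like. The field names (`question`, `transcript`, `drawingImage`, `correct`, `feedback`), the `AI_PROVIDER` environment variable, and the `mock` provider are assumptions for illustration. Only the overall shape (question, transcript, and optional drawing in; correctness and feedback out) comes from the description above.

```typescript
// Assumed request/response shapes for POST /api/evaluate-checkpoint.
interface EvaluateRequest {
  question: string;
  transcript: string;
  drawingImage?: string; // e.g. a data URL exported from the canvas
}

interface EvaluateResponse {
  correct: boolean;
  feedback: string;
}

// Any AI vendor can sit behind this interface.
interface Evaluator {
  evaluate(req: EvaluateRequest): Promise<EvaluateResponse>;
}

// Validate an untrusted JSON body before handing it to a provider.
function parseEvaluateRequest(body: unknown): EvaluateRequest | null {
  if (typeof body !== "object" || body === null) return null;
  const { question, transcript, drawingImage } = body as Record<string, unknown>;
  if (typeof question !== "string" || typeof transcript !== "string") return null;
  let image: string | undefined;
  if (typeof drawingImage === "string") image = drawingImage;
  else if (drawingImage !== undefined) return null;
  return { question, transcript, drawingImage: image };
}

// Stand-in provider for local development; it does no real evaluation.
const mockEvaluator: Evaluator = {
  async evaluate(req) {
    return { correct: false, feedback: `Heard: "${req.transcript}"` };
  },
};

type EvaluatorFactory = () => Evaluator;

// Choose a provider from the environment; AI_PROVIDER is a hypothetical variable name.
function pickEvaluator(
  env: Record<string, string | undefined>,
  registry: Record<string, EvaluatorFactory>
): Evaluator {
  const name = env["AI_PROVIDER"] ?? "mock";
  const factory = registry[name];
  if (!factory) throw new Error(`Unknown AI provider: ${name}`);
  return factory();
}
```

A route handler would then call something like `pickEvaluator(process.env, { mock: () => mockEvaluator })`, with real providers registered under their own keys, so swapping vendors is a configuration change and not a code change.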
2D Spatial Interaction Layer
The core technical innovation:
- Canvas precisely layered above video using controlled z-index
- Drawing coordinates mapped to video frame space
- Drawings exported as images for AI evaluation
- Works across touch and mouse devices without breaking video controls
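The coordinate mapping can be sketched as a pure function. The version below assumes the video is displayed with `object-fit: contain` semantics (scaled to fit and centered, with letterbox bars on the remaining sides), which is a guess about the layout. The function and type names are illustrative.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Map a point in element space (CSS pixels from the overlay's top-left corner)
// to pixel coordinates in the intrinsic video frame.
// Returns null when the point lands on a letterbox bar outside the picture.
function toVideoFrame(point: Point, element: Size, video: Size): Point | null {
  // Uniform scale that fits the whole frame inside the element.
  const scale = Math.min(
    element.width / video.width,
    element.height / video.height
  );
  // Letterbox offsets: the scaled frame is centered in the element.
  const offsetX = (element.width - video.width * scale) / 2;
  const offsetY = (element.height - video.height * scale) / 2;

  const x = (point.x - offsetX) / scale;
  const y = (point.y - offsetY) / scale;

  if (x < 0 || y < 0 || x > video.width || y > video.height) return null;
  return { x, y };
}
```

In the browser, `point` would typically come from `event.clientX` and `event.clientY` minus the overlay's `getBoundingClientRect()` origin, and `video` from the element's `videoWidth` and `videoHeight`. For an 800×600 overlay showing a 1920×1080 video, the center point (400, 300) maps to roughly (960, 540) in frame space.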
🧩 Challenges We Faced
🎥 Video + Canvas Layering
Ensuring the drawing canvas felt native while preserving video controls required careful pointer-event and layering logic.
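One common way to handle this, shown here as a sketch and not as Coco's actual logic, is to let the overlay capture input only while a question is open. The phase names and the `zIndex` value are assumptions. `pointer-events` and `touch-action` are standard CSS properties.

```typescript
type Phase = "playing" | "asking" | "evaluating" | "feedback";

interface OverlayStyle {
  pointerEvents: "auto" | "none";
  touchAction: "auto" | "none";
  zIndex: number;
}

// Decide how the drawing canvas should behave in each phase.
function overlayStyleFor(phase: Phase): OverlayStyle {
  const drawing = phase === "asking";
  return {
    // While not drawing, clicks and taps fall through to the video controls.
    pointerEvents: drawing ? "auto" : "none",
    // While drawing, stop the browser from scrolling or zooming on touch.
    touchAction: drawing ? "none" : "auto",
    // Assumed stacking value: anything above the video element works.
    zIndex: 10,
  };
}
```

In React, the returned object can be spread into the canvas element's `style` prop.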
🌐 Latency & Sync
Checkpoint pauses had to align across devices. We implemented pre-buffering and deterministic pause timing.
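A sketch of what deterministic pause timing could look like: because the browser reports playback time in steps, a checkpoint timestamp is usually overshot, so the check below looks at the interval between two time readings instead of an exact match. The `Checkpoint` shape and function name are assumptions.

```typescript
interface Checkpoint {
  id: string;
  time: number; // seconds from the start of the video
}

// Return the earliest not-yet-completed checkpoint whose timestamp lies in
// (previousTime, currentTime], or null if none was crossed.
function checkpointCrossed(
  checkpoints: Checkpoint[],
  previousTime: number,
  currentTime: number,
  completed: ReadonlySet<string>
): Checkpoint | null {
  // A backwards seek crosses nothing.
  if (currentTime < previousTime) return null;
  const crossed = checkpoints
    .filter(
      (c) =>
        !completed.has(c.id) &&
        c.time > previousTime &&
        c.time <= currentTime
    )
    .sort((a, b) => a.time - b.time);
  return crossed[0] ?? null;
}
```

After pausing, setting the video's `currentTime` back to the checkpoint's `time` would land every device on the same frame regardless of how far its last time reading overshot.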
🔒 Kid-Safe Embedding
Strict filtering ensures only approved, child-safe content appears in experiences.
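The filtering rules themselves are not described here, so the following is only a minimal allowlist sketch under assumed rules: an experience is shown only if it has been explicitly approved and its video is served over HTTPS from a known host. The `Experience` shape and the example host are hypothetical.

```typescript
// Hypothetical experience record.
interface Experience {
  id: string;
  videoUrl: string;
  approved: boolean; // set by a parent or curator, never by the child
}

function isEmbeddable(
  experience: Experience,
  allowedHosts: ReadonlySet<string>
): boolean {
  if (!experience.approved) return false;
  try {
    const url = new URL(experience.videoUrl);
    return url.protocol === "https:" && allowedHosts.has(url.hostname);
  } catch {
    return false; // malformed URLs are never embedded
  }
}

// Usage: keep only experiences that pass every check.
function filterExperiences(
  experiences: Experience[],
  allowedHosts: ReadonlySet<string>
): Experience[] {
  return experiences.filter((e) => isEmbeddable(e, allowedHosts));
}
```

Defaulting to rejection (unapproved, unknown host, or unparsable URL) keeps the failure mode safe: a mistake hides content instead of exposing it.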
🎨 Designing for Kids
Text alone doesn't work. We learned to prioritize:
- Audio instructions
- Visual cues
- Immediate positive feedback
📚 What We Learned
- Kids learn more when they do, not when they watch.
- Parents want insight, not surveillance.
- 2D spatial interaction is powerful — even without AR/VR.
🔮 What's Next
🤖 AI Teacher Avatars
Custom characters (e.g., animals or mascots) that adapt tone and encouragement style.
🧠 Computer Vision Recognition
Move beyond coordinates to detect shapes (e.g., "draw a triangle").
🌍 Multi-language Support
Expand to non-English regions with localized voice and feedback.
📦 Open-Source Interactive Overlay
Release Coco's interactive layer so educators and creators can turn videos into interactive lessons.
🥥 Why Coco Matters
We don't want to remove screens.
We want to upgrade them.
Coco turns:
- Watching → Responding
- Consuming → Creating
- Screen time → Learning time
Coco is 2D spatial, multimodal AI — built for the next generation of active learners.
Built With
- ai-evaluation-api-(provider-agnostic)
- app-router
- blob-api
- canvas-api-(2d)
- client-side-state-management
- component-based-architecture
- conditional-rendering
- content
- dynamic-routing
- environment-based-configuration
- fetch-api
- filereader-api
- filtering
- gemini
- html5-video-api
- indexeddb
- javascript-(es6+)
- json
- localstorage
- mediarecorder-api
- next.js
- node.js
- openai
- pointer-events-api
- progressive-enhancement
- radix-ui
- react
- responsive-design
- rest-apis
- secure-embedding
- serverless-api-routes
- speech-to-text-api-(env-configured)
- tailwind-css
- touch-events-api
- typescript
- web-audio-api
- web-speech-api
- z-index-layer-management
