🥥 Coco — 2D Spatial Multimodal AI Learning Companion for Kids

Coco transforms passive screen time into active learning by turning videos into two-way, interactive experiences for kids — powered by 2D spatial input, speech, and AI feedback, all in a privacy-first, local-only environment.


🚀 Inspiration

"iPad kids" aren't the issue — passive content is.

Kids watch hours of videos every day, but rarely interact with what they see. There's no response loop, no thinking checkpoint, and no way for parents to know what was actually learned.

We wanted to build a bridge between entertainment and education.

What if the screen could pause, ask a question — and wait for the child's answer?


💡 What It Does

Coco turns flat videos into interactive classrooms.

At key moments:

  1. The video pauses
  2. A character asks a question
  3. The child responds by drawing on the video, speaking, or both
  4. Coco evaluates the response and gives instant feedback

Every moment becomes an opportunity to learn.


✨ Core Modules

🎬 Interactive Player

A safe, parent-controlled experience player.

How it works:

  1. Child selects an experience
  2. Video plays normally
  3. At checkpoints, video pauses
  4. A character asks a question
  5. Child:
    • Draws or circles directly on the video frame
    • Speaks into the microphone
  6. AI evaluates the response
  7. Feedback appears (correct / try again)
  8. Video continues

Key features:

  • Transparent drawing canvas layered over video
  • Speech input with live transcription
  • Instant AI-based evaluation
  • Kid-first UX (big buttons, high contrast, audio cues)
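The checkpoint flow above can be sketched as a small piece of player logic: given the video's current playback time, find the next question that is due. This is a minimal illustration — the `Checkpoint` shape and function names are assumptions, not Coco's actual code — and in the real player `dueCheckpoint` would be called from the `<video>` element's `timeupdate` handler.

```typescript
// Illustrative checkpoint model; field names are assumed, not Coco's schema.
interface Checkpoint {
  time: number;       // seconds into the video
  question: string;
  triggered: boolean; // ensures each checkpoint fires only once
}

// Return the first untriggered checkpoint whose timestamp has been reached.
// In the player this runs on every `timeupdate` event of the <video>.
function dueCheckpoint(
  checkpoints: Checkpoint[],
  currentTime: number
): Checkpoint | undefined {
  return checkpoints.find((c) => !c.triggered && currentTime >= c.time);
}

const checkpoints: Checkpoint[] = [
  { time: 12, question: "How many apples do you see?", triggered: false },
  { time: 40, question: "Circle the biggest fruit!", triggered: false },
];

const due = dueCheckpoint(checkpoints, 12.3);
if (due) {
  due.triggered = true; // pause the video and show the question overlay
}
```

When a checkpoint fires, the player pauses the video, shows the overlay, and only resumes after the AI evaluation step returns feedback.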

🎨 Creative Canvas (Studio)

A space for kids to create instead of consume.

  • Draw scenes and characters
  • Build simple story environments
  • Generate AI-assisted storyboards
  • Turn imagination into animated stories

This shifts kids from watching → making.

👨‍👩‍👧 Parent Portal

Not a gradebook — a guidance tool.

Includes:

  • Watch time overview
  • 7-day activity chart
  • Experiences completed
  • Concepts to reinforce (e.g., Counting, Fruits)
  • Suggested recall prompts, like:
    • "Can you show me which one is bigger?"
    • "How many apples were there?"

All data is stored locally on-device using localStorage.
No cloud tracking. No behavioral profiling.


🧠 Why Coco Is Different

Most kids' platforms are passive, button-tap based, and one-way. Coco is:

  • 2D Spatial — kids draw on the video itself
  • Multimodal — drawing + speech together
  • AI-evaluated — understanding intent, not taps
  • Privacy-first — local-only storage

The video frame becomes a live learning canvas.


🛠️ How We Built It

Frontend

  • Next.js (App Router)
  • React
  • Tailwind CSS
  • Radix UI

Key components:

  • Interactive video player
  • Transparent drawing canvas overlay
  • Microphone recording + waveform UI
  • Parent dashboard powered by local analytics

Backend APIs

  • POST /api/evaluate-checkpoint

    • Input: question, transcript, optional drawing image
    • Output: correctness + feedback
  • POST /api/transcribe

    • Audio → text

All AI services are environment-configured and provider-agnostic.
No vendor lock-in.
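The provider-agnostic setup can be sketched as an environment-driven switch plus the request/response shapes of `/api/evaluate-checkpoint`. The `AI_PROVIDER` variable name and the exact types are assumptions made for illustration.

```typescript
// Assumed env variable name `AI_PROVIDER`; not confirmed by the source.
const SUPPORTED = ["openai", "gemini"] as const;
type Provider = (typeof SUPPORTED)[number];

// Resolve which AI backend to use, defaulting to the first supported one.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  const name = (env.AI_PROVIDER ?? "openai").toLowerCase();
  if ((SUPPORTED as readonly string[]).includes(name)) return name as Provider;
  throw new Error(`Unsupported AI provider: ${name}`);
}

// Shapes of the /api/evaluate-checkpoint exchange described above
// (illustrative field names).
interface EvaluateRequest {
  question: string;
  transcript: string;
  drawingDataUrl?: string; // optional drawing image
}
interface EvaluateResponse {
  correct: boolean;
  feedback: string;
}

const provider = resolveProvider({ AI_PROVIDER: "gemini" });
```

In a real deployment each branch of the switch would call the matching SDK, so swapping providers is a config change rather than a code change.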

2D Spatial Interaction Layer

The core technical innovation:

  • Canvas precisely layered above video using controlled z-index
  • Drawing coordinates mapped to video frame space
  • Drawings exported as images for AI evaluation
  • Works across touch and mouse devices without breaking video controls
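The coordinate-mapping step above can be sketched as a pure function that converts a pointer position on the overlay into video-frame pixels, accounting for the letterbox bars that appear when the video is rendered with `object-fit: contain`. This is a minimal sketch under those assumptions, not Coco's exact math.

```typescript
interface Point { x: number; y: number }

// Map a client-space pointer position to video-frame pixel coordinates.
// `rect` is the overlay's bounding box (as from getBoundingClientRect());
// frameWidth/frameHeight are the video's intrinsic dimensions.
function toFrameSpace(
  clientX: number,
  clientY: number,
  rect: { left: number; top: number; width: number; height: number },
  frameWidth: number,
  frameHeight: number
): Point | null {
  // `object-fit: contain` scales the frame uniformly to fit the box.
  const scale = Math.min(rect.width / frameWidth, rect.height / frameHeight);
  const drawnW = frameWidth * scale;
  const drawnH = frameHeight * scale;
  const offsetX = (rect.width - drawnW) / 2;  // horizontal letterbox bar
  const offsetY = (rect.height - drawnH) / 2; // vertical letterbox bar
  const x = (clientX - rect.left - offsetX) / scale;
  const y = (clientY - rect.top - offsetY) / scale;
  // Points landing on the bars are outside the frame.
  if (x < 0 || y < 0 || x > frameWidth || y > frameHeight) return null;
  return { x, y };
}

// Example: a 1280×720 frame letterboxed inside a 640×480 container.
const p = toFrameSpace(
  320, 240,
  { left: 0, top: 0, width: 640, height: 480 },
  1280, 720
);
```

Normalizing to frame space keeps the exported drawing aligned with the video content regardless of screen size or orientation.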

🧩 Challenges We Faced

🎥 Video + Canvas Layering

Ensuring the drawing canvas felt native while preserving video controls required careful pointer-event and layering logic.

🌐 Latency & Sync

Checkpoint pauses had to align across devices. We implemented pre-buffering and deterministic pause timing.

🔒 Kid-Safe Embedding

Strict filtering ensures only approved, child-safe content appears in experiences.
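An allowlist-style filter like the one below is one way to enforce this; the field names and channel ids are invented for illustration and are not Coco's actual schema.

```typescript
// Illustrative content-source record; fields are assumed, not Coco's schema.
interface VideoSource {
  id: string;
  channelId: string;
  approvedForKids: boolean;
}

// Hypothetical allowlist of pre-vetted channels.
const APPROVED_CHANNELS = new Set(["edu-songs", "counting-fun"]);

// A video is embeddable only if it is both kid-approved and from an
// allowlisted channel — deny by default.
function isEmbeddable(v: VideoSource): boolean {
  return v.approvedForKids && APPROVED_CHANNELS.has(v.channelId);
}

const ok = isEmbeddable({
  id: "v1", channelId: "edu-songs", approvedForKids: true,
});
const blocked = isEmbeddable({
  id: "v2", channelId: "random-channel", approvedForKids: true,
});
```

Defaulting to "blocked" means new or unknown sources never reach kids until a parent or curator approves them.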

🎨 Designing for Kids

Text alone doesn't work. We learned to prioritize:

  • Audio instructions
  • Visual cues
  • Immediate positive feedback

📚 What We Learned

  • Kids learn more when they do, not when they watch.
  • Parents want insight, not surveillance.
  • 2D spatial interaction is powerful — even without AR/VR.

🔮 What's Next

🤖 AI Teacher Avatars

Custom characters (e.g., animals or mascots) that adapt tone and encouragement style.

🧠 Computer Vision Recognition

Move beyond coordinates to detect shapes (e.g., "draw a triangle").

🌍 Multi-language Support

Expand to non-English regions with localized voice and feedback.

📦 Open-Source Interactive Overlay

Release Coco's interactive layer so educators and creators can turn videos into interactive lessons.


🥥 Why Coco Matters

We don't want to remove screens.
We want to upgrade them.

Coco turns:

  • Watching → Responding
  • Consuming → Creating
  • Screen time → Learning time

Coco is 2D spatial, multimodal AI — built for the next generation of active learners.

Built With

  • ai-evaluation-api-(provider-agnostic)
  • app-router
  • blob-api
  • canvas-api-(2d)
  • client-side-state-management
  • component-based-architecture
  • conditional-rendering
  • content
  • dynamic-routing
  • environment-based-configuration
  • fetch-api
  • filereader-api
  • filtering
  • gemini
  • html5-video-api
  • indexeddb
  • javascript-(es6+)
  • json
  • localstorage
  • mediarecorder-api
  • next.js
  • node.js
  • openai
  • pointer-events-api
  • progressive-enhancement
  • radix-ui
  • react
  • responsive-design
  • rest-apis
  • secure-embedding
  • serverless-api-routes
  • speech-to-text-api-(env-configured)
  • tailwind-css
  • touch-events-api
  • typescript
  • web-audio-api
  • web-speech-api
  • z-index-layer-management