🥥 Coco — 2D Spatial Multimodal AI Learning Companion for Kids

Coco transforms passive screen time into active learning by turning videos into two-way, interactive experiences for kids — powered by 2D spatial input, speech, and AI feedback, all in a privacy-first, local-only environment.


🚀 Inspiration

"iPad kids" aren't the issue — passive content is.

Kids watch hours of videos every day, but rarely interact with what they see. There's no response loop, no thinking checkpoint, and no way for parents to know what was actually learned.

We wanted to build a bridge between entertainment and education.

What if the screen could pause, ask a question — and wait for the child's answer?


💡 What It Does

Coco turns flat videos into interactive classrooms.

At key moments:

  1. The video pauses
  2. A character asks a question
  3. The child responds by drawing on the video, speaking, or both
  4. Coco evaluates the response and gives instant feedback

Every moment becomes an opportunity to learn.


✨ Core Modules

🎬 Interactive Player

A safe, parent-controlled experience player.

How it works:

  1. Child selects an experience
  2. Video plays normally
  3. At checkpoints, video pauses
  4. A character asks a question
  5. Child:
    • Draws or circles directly on the video frame
    • Speaks into the microphone
  6. AI evaluates the response
  7. Feedback appears (correct / try again)
  8. Video continues

Key features:

  • Transparent drawing canvas layered over video
  • Speech input with live transcription
  • Instant AI-based evaluation
  • Kid-first UX (big buttons, high contrast, audio cues)
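The checkpoint flow above can be sketched as a small piece of player logic: given the video's current playback time, find the next question that is due. This is a minimal illustration — the `Checkpoint` shape and function names are assumptions, not Coco's actual code — and in the real player `dueCheckpoint` would be called from the `<video>` element's `timeupdate` handler.

```typescript
// Illustrative checkpoint model; field names are assumed, not Coco's schema.
interface Checkpoint {
  time: number;       // seconds into the video
  question: string;
  triggered: boolean; // ensures each checkpoint fires only once
}

// Return the first untriggered checkpoint whose timestamp has been reached.
// In the player this runs on every `timeupdate` event of the <video>.
function dueCheckpoint(
  checkpoints: Checkpoint[],
  currentTime: number
): Checkpoint | undefined {
  return checkpoints.find((c) => !c.triggered && currentTime >= c.time);
}

const checkpoints: Checkpoint[] = [
  { time: 12, question: "How many apples do you see?", triggered: false },
  { time: 40, question: "Circle the biggest fruit!", triggered: false },
];

const due = dueCheckpoint(checkpoints, 12.3);
if (due) {
  due.triggered = true; // pause the video and show the question overlay
}
```

When a checkpoint fires, the player pauses the video, shows the overlay, and only resumes after the AI evaluation step returns feedback.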

🎨 Creative Canvas (Studio)

A space for kids to create instead of consume.

  • Draw scenes and characters
  • Build simple story environments
  • Generate AI-assisted storyboards
  • Turn imagination into animated stories

This shifts kids from watching → making.

👨‍👩‍👧 Parent Portal

Not a gradebook — a guidance tool.

Includes:

  • Watch time overview
  • 7-day activity chart
  • Experiences completed
  • Concepts to reinforce (e.g., Counting, Fruits)
  • Suggested recall prompts, like:
    • "Can you show me which one is bigger?"
    • "How many apples were there?"

All data is stored locally on-device using localStorage.
No cloud tracking. No behavioral profiling.


🧠 Why Coco Is Different

Most kids' platforms are passive, button-tap based, and one-way. Coco is:

  • 2D Spatial — kids draw on the video itself
  • Multimodal — drawing + speech together
  • AI-evaluated — understanding intent, not taps
  • Privacy-first — local-only storage

The video frame becomes a live learning canvas.


🛠️ How We Built It

Frontend

  • Next.js (App Router)
  • React
  • Tailwind CSS
  • Radix UI

Key components:

  • Interactive video player
  • Transparent drawing canvas overlay
  • Microphone recording + waveform UI
  • Parent dashboard powered by local analytics

Backend APIs

  • POST /api/evaluate-checkpoint

    • Input: question, transcript, optional drawing image
    • Output: correctness + feedback
  • POST /api/transcribe

    • Audio → text

All AI services are environment-configured and provider-agnostic.
No vendor lock-in.
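The provider-agnostic setup can be sketched as an environment-driven switch plus the request/response shapes of `/api/evaluate-checkpoint`. The `AI_PROVIDER` variable name and the exact types are assumptions made for illustration.

```typescript
// Assumed env variable name `AI_PROVIDER`; not confirmed by the source.
const SUPPORTED = ["openai", "gemini"] as const;
type Provider = (typeof SUPPORTED)[number];

// Resolve which AI backend to use, defaulting to the first supported one.
function resolveProvider(env: Record<string, string | undefined>): Provider {
  const name = (env.AI_PROVIDER ?? "openai").toLowerCase();
  if ((SUPPORTED as readonly string[]).includes(name)) return name as Provider;
  throw new Error(`Unsupported AI provider: ${name}`);
}

// Shapes of the /api/evaluate-checkpoint exchange described above
// (illustrative field names).
interface EvaluateRequest {
  question: string;
  transcript: string;
  drawingDataUrl?: string; // optional drawing image
}
interface EvaluateResponse {
  correct: boolean;
  feedback: string;
}

const provider = resolveProvider({ AI_PROVIDER: "gemini" });
```

In a real deployment each branch of the switch would call the matching SDK, so swapping providers is a config change rather than a code change.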

2D Spatial Interaction Layer

The core technical innovation:

  • Canvas precisely layered above video using controlled z-index
  • Drawing coordinates mapped to video frame space
  • Drawings exported as images for AI evaluation
  • Works across touch and mouse devices without breaking video controls
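The coordinate-mapping step above can be sketched as a pure function that converts a pointer position on the overlay into video-frame pixels, accounting for the letterbox bars that appear when the video is rendered with `object-fit: contain`. This is a minimal sketch under those assumptions, not Coco's exact math.

```typescript
interface Point { x: number; y: number }

// Map a client-space pointer position to video-frame pixel coordinates.
// `rect` is the overlay's bounding box (as from getBoundingClientRect());
// frameWidth/frameHeight are the video's intrinsic dimensions.
function toFrameSpace(
  clientX: number,
  clientY: number,
  rect: { left: number; top: number; width: number; height: number },
  frameWidth: number,
  frameHeight: number
): Point | null {
  // `object-fit: contain` scales the frame uniformly to fit the box.
  const scale = Math.min(rect.width / frameWidth, rect.height / frameHeight);
  const drawnW = frameWidth * scale;
  const drawnH = frameHeight * scale;
  const offsetX = (rect.width - drawnW) / 2;  // horizontal letterbox bar
  const offsetY = (rect.height - drawnH) / 2; // vertical letterbox bar
  const x = (clientX - rect.left - offsetX) / scale;
  const y = (clientY - rect.top - offsetY) / scale;
  // Points landing on the bars are outside the frame.
  if (x < 0 || y < 0 || x > frameWidth || y > frameHeight) return null;
  return { x, y };
}

// Example: a 1280×720 frame letterboxed inside a 640×480 container.
const p = toFrameSpace(
  320, 240,
  { left: 0, top: 0, width: 640, height: 480 },
  1280, 720
);
```

Normalizing to frame space keeps the exported drawing aligned with the video content regardless of screen size or orientation.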

🧩 Challenges We Faced

🎥 Video + Canvas Layering

Ensuring the drawing canvas felt native while preserving video controls required careful pointer-event and layering logic.

🌐 Latency & Sync

Checkpoint pauses had to align across devices. We implemented pre-buffering and deterministic pause timing.

🔒 Kid-Safe Embedding

Strict filtering ensures only approved, child-safe content appears in experiences.
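An allowlist-style filter like the one below is one way to enforce this; the field names and channel ids are invented for illustration and are not Coco's actual schema.

```typescript
// Illustrative content-source record; fields are assumed, not Coco's schema.
interface VideoSource {
  id: string;
  channelId: string;
  approvedForKids: boolean;
}

// Hypothetical allowlist of pre-vetted channels.
const APPROVED_CHANNELS = new Set(["edu-songs", "counting-fun"]);

// A video is embeddable only if it is both kid-approved and from an
// allowlisted channel — deny by default.
function isEmbeddable(v: VideoSource): boolean {
  return v.approvedForKids && APPROVED_CHANNELS.has(v.channelId);
}

const ok = isEmbeddable({
  id: "v1", channelId: "edu-songs", approvedForKids: true,
});
const blocked = isEmbeddable({
  id: "v2", channelId: "random-channel", approvedForKids: true,
});
```

Defaulting to "blocked" means new or unknown sources never reach kids until a parent or curator approves them.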

🎨 Designing for Kids

Text alone doesn't work. We learned to prioritize:

  • Audio instructions
  • Visual cues
  • Immediate positive feedback

📚 What We Learned

  • Kids learn more when they do, not when they watch.
  • Parents want insight, not surveillance.
  • 2D spatial interaction is powerful — even without AR/VR.

🔮 What's Next

🤖 AI Teacher Avatars

Custom characters (e.g., animals or mascots) that adapt tone and encouragement style.

🧠 Computer Vision Recognition

Move beyond coordinates to detect shapes (e.g., "draw a triangle").

🌍 Multi-language Support

Expand to non-English regions with localized voice and feedback.

📦 Open-Source Interactive Overlay

Release Coco's interactive layer so educators and creators can turn videos into interactive lessons.


🥥 Why Coco Matters

We don't want to remove screens.
We want to upgrade them.

Coco turns:

  • Watching → Responding
  • Consuming → Creating
  • Screen time → Learning time

Coco is 2D spatial, multimodal AI — built for the next generation of active learners.

Built With

  • ai-evaluation-api-(provider-agnostic)
  • app-router
  • blob-api
  • canvas-api-(2d)
  • client-side-state-management
  • component-based-architecture
  • conditional-rendering
  • content
  • dynamic-routing
  • environment-based-configuration
  • fetch-api
  • filereader-api
  • filtering
  • gemini
  • html5-video-api
  • indexeddb
  • javascript-(es6+)
  • json
  • localstorage
  • mediarecorder-api
  • next.js
  • node.js
  • openai
  • pointer-events-api
  • progressive-enhancement
  • radix-ui
  • react
  • responsive-design
  • rest-apis
  • secure-embedding
  • serverless-api-routes
  • speech-to-text-api-(env-configured)
  • tailwind-css
  • touch-events-api
  • typescript
  • web-audio-api
  • web-speech-api
  • z-index-layer-management