Inspiration

Most homework help tools are still text boxes. That works for adults, but not for young kids who can't type fast, struggle to describe what they're looking at, or just want to talk to someone. We built Magic Homework Buddy for those kids — a tutor that listens, sees the page, and responds out loud in real time.

What it does

Magic Buddy is a real-time multimodal AI tutor powered by the Gemini Live API. A child points their camera at their homework, starts talking, and Buddy — a warm, encouraging AI tutor — sees the worksheet, hears the question, and answers with live voice.

Key features:

  • Live voice tutoring — bidirectional audio conversation with no silence gaps
  • Camera homework scan — continuous 1 FPS video feed so Buddy sees what the child is holding up
  • Live transcript — word-by-word chat bubbles stream in real time for both child and Buddy
  • Buddy Vision — AI tool-call powered catalog of detected learning items
  • Buddy Studio — type a topic, get a cartoonish educational illustration (Gemini image generation)
  • Session summary — an encouraging written recap after each session
  • Deployed on Google Cloud Run — live at https://magic-buddy-998069837739.us-central1.run.app

How we built it

  • Frontend: Angular 21 with standalone components, Angular Signals, and OnPush change detection
  • Styling: Tailwind CSS 4 with a neobrutalist design language
  • Gemini Live API: ai.live.connect() opens a BidiGenerateContent WebSocket for real-time audio + video
  • Audio pipeline: Web AudioWorklet nodes handle 16 kHz PCM mic input and 24 kHz PCM speaker output instead of relying on MediaRecorder, which keeps latency low and gives us precise control over the audio stream
  • Transcription: inputAudioTranscription and outputAudioTranscription stream words into the UI as they are spoken — before the turn ends
  • Tool calling: catalog_item(name, emoji, color) function lets Buddy update the Buddy Vision panel mid-conversation without interrupting audio
  • Image generation: gemini-2.5-flash-image generates illustrations via a separate REST call, completely independent of the live session
  • SSR + Cloud Run: Angular SSR on Express 5, containerised with Docker, deployed via Cloud Build to Google Cloud Run
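
As a rough illustration of the mic side of the audio pipeline above: Web Audio delivers samples as 32-bit floats in the range -1 to 1, while the 16 kHz PCM input is assumed here to be 16-bit signed integers. The helper name floatTo16BitPcm is hypothetical and is not taken from the project's code; this is a minimal sketch of the conversion step only.

```typescript
// Hypothetical helper (not from the project source): converts Web Audio
// float samples in [-1, 1] to 16-bit signed PCM, clamping out-of-range input.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (sample) => {
    // Clamp first so loud input saturates instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    return clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  });
}

// Example: one tiny frame of mic samples.
const pcmFrame = floatTo16BitPcm(new Float32Array([0, 1, -1, 2, -2]));
```

In a real worklet the resulting buffer would still need to be chunked and encoded before being sent over the live session; that part is omitted here.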

Challenges we ran into

  • Angular 21's template compiler rejects single-quoted string literals inside @if expressions within @for loops — worked around with a component helper method
  • Buddy's transcript was only showing the last few words because each outputTranscription.text chunk was replacing the previous one — fixed by appending (+=) instead of assigning
  • AudioWorklet processors must be registered and loaded correctly across both the SSR and client bundles — required careful path handling for the worklet scripts
  • Getting the Gemini Live session to handle both audio and 1 FPS JPEG video frames simultaneously without dropping the connection required careful stream management
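
The transcript bug described above comes down to assigning each chunk instead of appending it. A minimal sketch of the fix, assuming each transcription event carries an optional text field; the TranscriptBuffer class and its method names are hypothetical and only illustrate the idea:

```typescript
// Assumed shape of one transcription chunk (only the text field is used).
interface TranscriptChunk {
  text?: string;
}

// Hypothetical buffer: accumulates chunks for the current turn.
class TranscriptBuffer {
  private current = "";

  // Append (+=) rather than assign (=), so earlier words are kept.
  append(chunk: TranscriptChunk): string {
    this.current += chunk.text ?? "";
    return this.current;
  }

  // Return the finished turn and reset for the next one.
  endTurn(): string {
    const finished = this.current;
    this.current = "";
    return finished;
  }
}

const buddyTranscript = new TranscriptBuffer();
buddyTranscript.append({ text: "Let's count " });
const bubbleText = buddyTranscript.append({ text: "the apples!" });
```

With plain assignment, bubbleText would hold only "the apples!", which matches the symptom of seeing just the last few words.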

Accomplishments that we're proud of

  • True word-by-word streaming transcript using the Live API's inputAudioTranscription + outputAudioTranscription, so both the child's and Buddy's words appear in real time
  • A robust dual-AudioWorklet pipeline (not just MediaRecorder) that captures the mic at 16 kHz PCM and plays Buddy's voice at 24 kHz PCM
  • Tool calling mid-conversation: Buddy updates the UI catalog while still speaking, without any interruption to the audio stream
  • A complete, deployed, working app — live on Google Cloud Run — that a real child can use right now
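
One way the mid-conversation tool calling could be handled on the client is sketched below. Only the function name catalog_item and its three parameters (name, emoji, color) come from this write-up; the FunctionCall and FunctionResponse shapes and the handleToolCalls helper are simplified assumptions, not the SDK's exact types or the project's code.

```typescript
interface CatalogItem {
  name: string;
  emoji: string;
  color: string;
}

// Simplified, assumed shapes for a tool call and its response.
interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface FunctionResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}

// Hypothetical handler: updates the catalog and builds one response per call.
function handleToolCalls(calls: FunctionCall[], catalog: CatalogItem[]): FunctionResponse[] {
  const responses: FunctionResponse[] = [];
  for (const call of calls) {
    const { name, emoji, color } = call.args;
    const valid =
      call.name === "catalog_item" &&
      typeof name === "string" &&
      typeof emoji === "string" &&
      typeof color === "string";
    if (valid) {
      catalog.push({ name: name as string, emoji: emoji as string, color: color as string });
      responses.push({ id: call.id, name: call.name, response: { ok: true } });
    } else {
      // Unknown function or malformed arguments: report it back, do not throw.
      responses.push({ id: call.id, name: call.name, response: { error: "unsupported call" } });
    }
  }
  return responses;
}
```

Because the handler only touches UI state and returns small acknowledgements, it does not need to pause or restart audio playback, which fits the behaviour described above.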

What we learned

  • The Gemini Live API's BidiGenerateContent WebSocket is genuinely different from batch APIs — it requires rethinking the entire audio pipeline around streaming chunks, not complete utterances
  • Native audio output from the model (no separate TTS) makes conversations feel dramatically more natural
  • Angular Signals with OnPush change detection are a great fit for real-time streaming UI — minimal re-renders, fast updates
  • AudioWorklet is complex to set up but essential for low-latency audio at the sample rates the pipeline needs (16 kHz in, 24 kHz out)
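
To make the "streaming chunks, not complete utterances" point concrete, here is a sketch of a chunk queue that a playback worklet could drain one render block at a time. The PcmQueue class is hypothetical and is not the project's implementation; it assumes incoming audio has already been decoded to Float32 samples.

```typescript
// Hypothetical FIFO of decoded audio chunks, drained in fixed-size blocks.
class PcmQueue {
  private chunks: Float32Array[] = [];
  private offset = 0; // read position inside the first queued chunk

  push(chunk: Float32Array): void {
    if (chunk.length > 0) this.chunks.push(chunk);
  }

  // Fill `output` with queued samples; pad with silence on underrun.
  // Returns how many real samples were written.
  pull(output: Float32Array): number {
    let written = 0;
    while (written < output.length) {
      const head = this.chunks[0];
      if (head === undefined) break; // queue is empty
      const count = Math.min(head.length - this.offset, output.length - written);
      output.set(head.subarray(this.offset, this.offset + count), written);
      written += count;
      this.offset += count;
      if (this.offset >= head.length) {
        this.chunks.shift();
        this.offset = 0;
      }
    }
    output.fill(0, written);
    return written;
  }
}
```

Chunks of any size can arrive from the network while the audio thread keeps pulling fixed-size blocks, so playback never waits for a complete utterance.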

What's next for Magic Homework Buddy

  • Better turn-taking for noisy environments
  • Parent/teacher session dashboard with progress tracking
  • Guided lesson plans tailored to age and grade band
  • Stronger worksheet detection before the manual snapshot step

Built With

  • angular-21
  • angular-ssr
  • cloud
  • docker
  • express-5
  • gemini-flash-image-(gemini-2.5-flash-image)
  • gemini-live-api-(gemini-2.5-flash-native-audio-preview-12-2025)
  • google-cloud-run
  • tailwind-css-4
  • typescript
  • web-audioworklet