Inspiration

I've always struggled with morning routines. Every night I'd set ambitious goals — yoga, stretching, reading — but every morning I'd hit snooze and scroll my phone instead.

The problem wasn't motivation. It was accountability. Doing routines alone is just too hard.

That's when I thought: What if I had an AI coach who actually talks to me, sees what I'm doing, and keeps me on track? Not a silent app with notifications I ignore, but a real conversational partner.

When I discovered the Gemini Live API, with its real-time audio AND vision capabilities, I knew this was possible.

What it does

SparkWake is an AI morning coach that:

  • Talks with you in real time: natural voice conversation, not robotic commands
  • Sees you through your camera: verifies you actually did the routine
  • Takes action: plays YouTube videos and marks routines complete, all through voice commands
  • Supports interruption: you can cut off the AI anytime (barge-in), just like talking to a real person

Example flow:

AI: "Good morning! Ready for stretching?"
You: "Actually, can you play a yoga video?"
AI: "Sure!" → YouTube starts playing
You: *finishes yoga*
AI: "Show me a thumbs up!"
You: *thumbs up to camera*
AI: "Great job! Mission complete!"

How we built it

Tech Stack:

  • Gemini Live API (gemini-2.5-flash-native-audio-preview) — Real-time audio/video streaming via WebSocket
  • Next.js 15 — Frontend PWA with TypeScript
  • FastAPI — Backend for token management and data storage
  • Firebase — Auth + Firestore for user data
  • Cloud Run — Serverless backend hosting
  • Terraform — Infrastructure as Code

Key Implementation Details:

  • Real-time Audio: PCM16 audio streamed over WebSocket, with an AudioPlayer class managing the playback queue
  • Barge-in Handling: When the user speaks, the onInterrupted callback triggers AudioPlayer.clear() to stop the AI's speech immediately
  • Video Recognition: Canvas captures camera frames → Base64 JPEG → sent to the Live API for action verification
  • Tool Calling: Defined play_youtube, complete_routine, and skip_routine functions that the AI can invoke based on conversation context
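
To make the audio queue and barge-in behavior concrete, here is a minimal sketch of how a playback queue with clear() might be structured. It is not the project's actual AudioPlayer: the stopCurrent hook, the next() method, and the pending getter are assumptions added for illustration, and the Web Audio wiring is left out so the logic stays self-contained.

```typescript
// Convert signed 16-bit PCM samples to floats in [-1, 1), the range Web Audio buffers use.
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, (sample) => sample / 32768);
}

// Hypothetical stand-in for the project's AudioPlayer: a FIFO of decoded chunks.
class AudioPlayer {
  private queue: Float32Array[] = [];
  private stopCurrent: () => void;

  // stopCurrent is an assumed hook that halts the chunk currently playing
  // (in a browser, for example, by calling stop() on the active AudioBufferSourceNode).
  constructor(stopCurrent: () => void) {
    this.stopCurrent = stopCurrent;
  }

  // Decode an incoming PCM16 chunk and append it to the playback queue.
  enqueue(pcm: Int16Array): void {
    this.queue.push(pcm16ToFloat32(pcm));
  }

  // Return the next chunk to play, or undefined when the queue is empty.
  next(): Float32Array | undefined {
    return this.queue.shift();
  }

  // Barge-in: drop everything still queued and stop the chunk in flight.
  clear(): void {
    this.queue.length = 0;
    this.stopCurrent();
  }

  // Number of chunks waiting to be played.
  get pending(): number {
    return this.queue.length;
  }
}
```

In this sketch, the onInterrupted callback would simply call player.clear(), so any speech the AI had already streamed but not yet played is discarded.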

Challenges we ran into

  • Gemini Live API is bleeding-edge — Documentation changed frequently, so we always checked the latest SDK docs before implementing any feature.
  • Prompt engineering for consistent persona — Getting the AI to reliably act as a "friendly coach" and correctly trigger tool calls required many iterations.
  • Real-time cost optimization — Live API streaming is expensive; we optimized by connecting only during active routines and throttling video frames.
  • Rapid UI iteration — Used Stitch for AI-assisted UI design to quickly prototype and maintain a consistent design system.
  • Tool Response Flow: After the AI calls a tool, you must reply with sendToolResponse(), or the conversation hangs. It took a while to debug this WebSocket flow.
  • Ephemeral Token Management: Live API requires short-lived tokens. Built a backend endpoint to securely generate these without exposing API keys to the client.
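
The tool response issue above suggests one defensive pattern: always produce exactly one response per tool call, even for unknown tools or handlers that throw. The sketch below illustrates that idea only. The ToolCall and ToolResponse shapes, the handler bodies, and argument names such as query and routine_id are assumptions for illustration, not the project's code or a definitive description of the SDK's types.

```typescript
// Assumed shapes for a tool call and its response (illustrative only).
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ToolResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical handlers for the three tools named in the writeup.
const exampleHandlers = new Map<string, ToolHandler>([
  ["play_youtube", (args) => ({ status: "playing", query: String(args["query"] ?? "") })],
  ["complete_routine", (args) => ({ status: "completed", routine: String(args["routine_id"] ?? "") })],
  ["skip_routine", (args) => ({ status: "skipped", routine: String(args["routine_id"] ?? "") })],
]);

// Build one response per call so the model is never left waiting for a reply.
function buildToolResponses(
  calls: ToolCall[],
  handlers: Map<string, ToolHandler>,
): ToolResponse[] {
  return calls.map((call) => {
    const handler = handlers.get(call.name);
    if (handler === undefined) {
      // Unknown tool: still answer, with an error payload.
      return { id: call.id, name: call.name, response: { error: `unknown tool: ${call.name}` } };
    }
    try {
      return { id: call.id, name: call.name, response: handler(call.args) };
    } catch (err) {
      // A failing handler must not swallow the response either.
      const message = err instanceof Error ? err.message : String(err);
      return { id: call.id, name: call.name, response: { error: message } };
    }
  });
}
```

The returned array would then be handed to sendToolResponse() in the session's message handler. Reporting failures as payloads rather than throwing keeps the conversation moving, and the AI can tell the user that something went wrong.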

Accomplishments that we're proud of

  • True real-time conversation — Not turn-based chat, but actual flowing dialogue with interruption support
  • AI that takes action — Tool calling makes it feel like a real assistant, not just a chatbot
  • Vision verification works — AI actually recognizes actions like hand waves, book covers, made beds
  • Built in 1 week — From idea to working demo, including infrastructure

What we learned

  • Gemini Live API is powerful but requires careful architecture — WebSocket state management, audio buffering, and tool response handling all need attention
  • Multimodal AI changes UX paradigms — When AI can see AND hear, the interaction model is completely different from text chat
  • Persona matters — A friendly, encouraging coach persona makes users actually want to use the app
  • Cost optimization is real — Real-time streaming APIs can get expensive fast. Need to be smart about when to connect/disconnect
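
As an illustration of the frame throttling mentioned under cost optimization, a small time-based gate can decide whether a captured camera frame is worth sending. This is a sketch under assumptions: the FrameThrottle name and the one-second interval in the usage note are hypothetical, since the writeup does not state the actual frame rate used.

```typescript
// Hypothetical gate that lets at most one frame through per interval.
class FrameThrottle {
  private minIntervalMs: number;
  private lastSentAt = Number.NEGATIVE_INFINITY;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true if a frame captured at nowMs should be sent, and records the send time.
  shouldSend(nowMs: number): boolean {
    if (nowMs - this.lastSentAt < this.minIntervalMs) {
      return false; // Too soon after the previous frame: skip it.
    }
    this.lastSentAt = nowMs;
    return true;
  }

  // Forget the last send, for example when a new routine starts.
  reset(): void {
    this.lastSentAt = Number.NEGATIVE_INFINITY;
  }
}
```

A capture loop could create new FrameThrottle(1000) and call shouldSend(Date.now()) before encoding each canvas frame, so frames that would be dropped never get encoded at all. Passing the clock value in as an argument keeps the gate easy to test.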

What's next for SparkWake

  • Push notifications — Wake-up alarms that launch directly into AI coaching session
  • Weekly analytics — Gemini-powered insights on routine patterns and personalized recommendations
  • Social features — Morning routine challenges with friends
  • Multi-language support — Currently English-focused, planning Korean and more
  • Connection pooling — Optimize Live API usage for longer routines without continuous streaming

Built With

  • cloud-firestore
  • fastapi
  • firebase-authentication
  • firebase-hosting
  • gemini-2.5-flash
  • gemini-live-api
  • google-cloud-run
  • next.js-15
  • python-3.12
  • react-19
  • tailwind-css
  • terraform
  • typescript
  • websocket