Inspiration
I've always struggled with morning routines. Every night I'd set ambitious goals — yoga, stretching, reading — but every morning I'd hit snooze and scroll my phone instead.
The problem wasn't motivation. It was accountability. Doing routines alone is just too hard.
That's when I thought: What if I had an AI coach who actually talks to me, sees what I'm doing, and keeps me on track? Not a silent app with notifications I ignore, but a real conversational partner.
When I discovered the Gemini Live API, with its real-time audio and vision capabilities, I knew this was possible.
What it does
SparkWake is an AI morning coach that:
- Talks with you in real time: natural voice conversation, not robotic commands
- Sees you through your camera: verifies you actually did the routine
- Takes action: plays YouTube videos and marks routines complete, all through voice commands
- Supports interruption: you can cut off the AI anytime (barge-in), just like talking to a real person
Example flow:
AI: "Good morning! Ready for stretching?"
You: "Actually, can you play a yoga video?"
AI: "Sure!" → YouTube starts playing
You: *finishes yoga*
AI: "Show me a thumbs up!"
You: *thumbs up to camera*
AI: "Great job! Mission complete!"
How we built it
Tech Stack:
- Gemini Live API (gemini-2.5-flash-native-audio-preview) — Real-time audio/video streaming via WebSocket
- Next.js 15 — Frontend PWA with TypeScript
- FastAPI — Backend for token management and data storage
- Firebase — Auth + Firestore for user data
- Cloud Run — Serverless backend hosting
- Terraform — Infrastructure as Code
Key Implementation Details:
- Real-time Audio: PCM16 audio streams over the WebSocket, with an AudioPlayer class managing the playback queue
- Barge-in Handling: When the user speaks, the onInterrupted callback triggers AudioPlayer.clear() to stop the AI's speech immediately
- Video Recognition: Canvas captures camera frames → Base64 JPEG → sent to Live API for action verification
- Tool Calling: Defined play_youtube, complete_routine, skip_routine functions that AI can invoke based on conversation context
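The audio and barge-in items above can be illustrated with a minimal sketch. This is not SparkWake's actual code: the AudioSink interface, the pending getter, and the internals of AudioPlayer are assumptions made for illustration, and the sink stands in for whatever schedules chunks on a Web Audio context in the real app.

```typescript
// Hypothetical sketch. Only the names AudioPlayer and clear() come from the
// write-up; everything else here is an assumption.

/** Convert signed 16-bit PCM samples to floats in the range [-1, 1). */
function pcm16ToFloat32(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, (sample) => sample / 32768);
}

/** Stand-in for the real audio output (for example a Web Audio wrapper). */
interface AudioSink {
  play(chunk: Float32Array): Promise<void>; // resolves when the chunk finishes
  stop(): void; // cuts off whatever is currently playing
}

class AudioPlayer {
  private readonly queue: Float32Array[] = [];
  private readonly sink: AudioSink;
  private draining = false;

  constructor(sink: AudioSink) {
    this.sink = sink;
  }

  /** Number of chunks waiting behind the one currently playing. */
  get pending(): number {
    return this.queue.length;
  }

  /** Add a chunk and start playback if nothing is playing yet. */
  enqueue(chunk: Float32Array): void {
    this.queue.push(chunk);
    if (!this.draining) void this.drain();
  }

  /** Barge-in: drop everything queued and stop the chunk in flight. */
  clear(): void {
    this.queue.length = 0;
    this.sink.stop();
  }

  /** Play queued chunks one after another until the queue is empty. */
  private async drain(): Promise<void> {
    this.draining = true;
    try {
      let chunk = this.queue.shift();
      while (chunk !== undefined) {
        await this.sink.play(chunk);
        chunk = this.queue.shift();
      }
    } finally {
      this.draining = false;
    }
  }
}
```

In this sketch, an onInterrupted handler would simply call player.clear(), which empties the queue and stops the chunk currently playing, so the AI falls silent as soon as the user starts talking.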
Challenges we ran into
- Gemini Live API is bleeding-edge — Documentation changed frequently, so we always checked the latest SDK docs before implementing any feature.
- Prompt engineering for consistent persona — Getting the AI to reliably act as a "friendly coach" and correctly trigger tool calls required many iterations.
- Real-time cost optimization — Live API streaming is expensive; we optimized by connecting only during active routines and throttling video frames.
- Rapid UI iteration — Used Stitch for AI-assisted UI design to quickly prototype and maintain a consistent design system.
- Tool Response Flow: After the AI calls a tool, the client must reply with sendToolResponse(), or the conversation hangs. This WebSocket flow took a while to debug.
- Ephemeral Token Management: Live API requires short-lived tokens. Built a backend endpoint to securely generate these without exposing API keys to the client.
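The tool response issue above comes down to one rule: every tool call gets a response, even when the handler fails or the tool name is unknown. A minimal sketch of a dispatcher that enforces this follows. The FunctionCall and FunctionResponse shapes are loosely modeled on the Live API and are assumptions, handleToolCalls is a hypothetical helper, and the sendToolResponse parameter is a callback standing in for the SDK method so the example stays self-contained. Handlers are synchronous here for brevity; real ones would likely be async.

```typescript
// Hypothetical sketch: the types and handleToolCalls are illustrative
// assumptions, not the SDK's definitions or SparkWake's actual code.

interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface FunctionResponse {
  id: string;
  name: string;
  response: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => Record<string, unknown>;

/**
 * Run each requested tool and always send exactly one batch of responses,
 * so the model is never left waiting on a call that failed.
 */
function handleToolCalls(
  calls: FunctionCall[],
  handlers: Map<string, ToolHandler>,
  sendToolResponse: (responses: FunctionResponse[]) => void,
): void {
  const responses: FunctionResponse[] = [];
  for (const call of calls) {
    try {
      const handler = handlers.get(call.name);
      if (handler === undefined) throw new Error(`Unknown tool: ${call.name}`);
      responses.push({
        id: call.id,
        name: call.name,
        response: { result: handler(call.args) },
      });
    } catch (err) {
      // Report the failure back instead of staying silent.
      const message = err instanceof Error ? err.message : String(err);
      responses.push({ id: call.id, name: call.name, response: { error: message } });
    }
  }
  sendToolResponse(responses);
}
```

Registering play_youtube, complete_routine, and skip_routine in the handlers map keeps the dispatch logic in one place, and the catch branch means a bug in one handler surfaces as an error response rather than a stalled conversation.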
Accomplishments that we're proud of
- True real-time conversation — Not turn-based chat, but actual flowing dialogue with interruption support
- AI that takes action — Tool calling makes it feel like a real assistant, not just a chatbot
- Vision verification works — AI actually recognizes actions like hand waves, book covers, made beds
- Built in 1 week — From idea to working demo, including infrastructure
What we learned
- Gemini Live API is powerful but requires careful architecture — WebSocket state management, audio buffering, and tool response handling all need attention
- Multimodal AI changes UX paradigms — When AI can see AND hear, the interaction model is completely different from text chat
- Persona matters — A friendly, encouraging coach persona makes users actually want to use the app
- Cost optimization is real: real-time streaming APIs get expensive fast, so the app needs to be deliberate about when to connect and disconnect
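The cost lesson, together with the frame throttling mentioned under challenges, can be sketched as a small gate in front of the camera capture loop. FrameThrottle, its parameters, and the one-frame-per-second figure in the usage note are assumptions for illustration; the write-up does not state the actual interval used.

```typescript
// Hypothetical sketch: FrameThrottle and its interval are assumptions.

class FrameThrottle {
  private readonly minIntervalMs: number;
  private lastSentMs = Number.NEGATIVE_INFINITY;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  /**
   * Decide whether a captured frame should be sent.
   * Frames are dropped while no routine is active or when the previous
   * frame was sent less than minIntervalMs ago.
   */
  shouldSend(nowMs: number, routineActive: boolean): boolean {
    if (!routineActive) return false;
    if (nowMs - this.lastSentMs < this.minIntervalMs) return false;
    this.lastSentMs = nowMs;
    return true;
  }
}
```

A capture loop could create new FrameThrottle(1000) and call shouldSend(Date.now(), routineActive) before encoding each canvas frame, which skips the Base64 JPEG work as well as the upload for dropped frames.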
What's next for SparkWake
- Push notifications — Wake-up alarms that launch directly into AI coaching session
- Weekly analytics — Gemini-powered insights on routine patterns and personalized recommendations
- Social features — Morning routine challenges with friends
- Multi-language support — Currently English-focused, planning Korean and more
- Connection pooling — Optimize Live API usage for longer routines without continuous streaming
Built With
- cloud-firestore
- fastapi
- firebase-authentication
- firebase-hosting
- gemini-2.5-flash
- gemini-live-api
- google-cloud-run
- next.js-15
- python-3.12
- react-19
- tailwind-css
- terraform
- typescript
- websocket