SparkWake

Inspiration

I've always struggled with morning routines. Every night I'd set ambitious goals — yoga, stretching, reading — but every morning I'd hit snooze and scroll my phone instead.

The problem wasn't motivation. It was accountability. Doing routines alone is just too hard.

That's when I thought: What if I had an AI coach who actually talks to me, sees what I'm doing, and keeps me on track? Not a silent app with notifications I ignore, but a real conversational partner.

When I discovered the Gemini Live API, with real-time audio AND vision capabilities, I knew this was possible.

What it does

SparkWake is an AI morning coach that:

Talks with you in real time — natural voice conversation, not robotic commands
Sees you through your camera — verifies that you actually did the routine
Takes action — plays YouTube videos and marks routines complete, all through voice commands
Supports interruption — you can cut the AI off at any time (barge-in), just like talking to a real person

Example flow:

AI: "Good morning! Ready for stretching?"
You: "Actually, can you play a yoga video?"
AI: "Sure!" → YouTube starts playing
You: *finishes yoga*
AI: "Show me a thumbs up!"
You: *thumbs up to camera*
AI: "Great job! Mission complete!"

How we built it

Tech Stack:

Gemini Live API (gemini-2.5-flash-native-audio-preview) — Real-time audio/video streaming via WebSocket
Next.js 15 — Frontend PWA with TypeScript
FastAPI — Backend for token management and data storage
Firebase — Auth + Firestore for user data
Cloud Run — Serverless backend hosting
Terraform — Infrastructure as Code

Key Implementation Details:

Real-time Audio: PCM16 audio streamed over a WebSocket, with an AudioPlayer class managing the playback queue
Barge-in Handling: When the user starts speaking, an onInterrupted callback calls AudioPlayer.clear() to stop the AI's speech immediately
Video Recognition: Canvas captures camera frames → Base64 JPEG → sent to Live API for action verification
Tool Calling: Defined play_youtube, complete_routine, and skip_routine functions that the AI can invoke based on the conversation context
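To make the audio and barge-in items above more concrete, here is a minimal TypeScript sketch of a playback queue that plays the role the writeup assigns to AudioPlayer. It is not SparkWake's code: the names PcmPlaybackQueue, AudioSink and ScheduledChunk are hypothetical, the AudioSink interface stands in for Web Audio so the scheduling logic can run outside a browser, and the 24 kHz default assumes the Live API's 16-bit little-endian PCM output format.

```typescript
// Hypothetical sketch: gapless playback of PCM16 chunks, plus clear() for barge-in.
// In a browser, schedule() would wrap an AudioBufferSourceNode and its start()/stop().

interface ScheduledChunk {
  stop(): void;
}

interface AudioSink {
  /** Current clock in seconds, analogous to AudioContext.currentTime. */
  now(): number;
  /** Start playing the samples at the given clock time. */
  schedule(samples: Float32Array, sampleRate: number, startAt: number): ScheduledChunk;
}

/** Decode little-endian signed 16-bit PCM into floats in [-1, 1). */
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 32768;
  }
  return samples;
}

class PcmPlaybackQueue {
  private readonly sink: AudioSink;
  private readonly sampleRate: number;
  private nextStart = 0; // clock time at which the next chunk should begin
  // A real player would also drop finished chunks here (e.g. in an onended handler).
  private scheduled: ScheduledChunk[] = [];

  constructor(sink: AudioSink, sampleRate = 24000) {
    this.sink = sink;
    this.sampleRate = sampleRate;
  }

  /** Schedule a chunk to start exactly when the previous one ends, or now if idle. */
  enqueue(pcm: Uint8Array): number {
    const samples = pcm16ToFloat32(pcm);
    const startAt = Math.max(this.sink.now(), this.nextStart);
    this.scheduled.push(this.sink.schedule(samples, this.sampleRate, startAt));
    this.nextStart = startAt + samples.length / this.sampleRate;
    return startAt;
  }

  /** Barge-in: stop every scheduled chunk and reset the timeline. */
  clear(): void {
    for (const chunk of this.scheduled) chunk.stop();
    this.scheduled = [];
    this.nextStart = 0;
  }

  get pendingCount(): number {
    return this.scheduled.length;
  }
}
```

Scheduling chunks back to back on the audio clock hides the jitter between WebSocket messages, and clear() is what an interruption handler would call. Assuming the @google/genai JavaScript SDK, the interruption arrives as serverContent.interrupted on an incoming message.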

Challenges we ran into

Gemini Live API is bleeding-edge — Documentation changed frequently, so we always checked the latest SDK docs before implementing any feature.
Prompt engineering for consistent persona — Getting the AI to reliably act as a "friendly coach" and correctly trigger tool calls required many iterations.
Real-time cost optimization — Live API streaming is expensive; we optimized by connecting only during active routines and throttling video frames.
Rapid UI iteration — Used Stitch for AI-assisted UI design to quickly prototype and maintain a consistent design system.
Tool Response Flow: After the AI calls a tool, the client must send the result back with sendToolResponse(), or the conversation hangs. Debugging this WebSocket flow took a while.
Ephemeral Token Management: Connecting to the Live API from the browser calls for short-lived tokens, so we built a backend endpoint that generates them without exposing our API key to the client.
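The tool response challenge can be sketched as a dispatcher that always returns exactly one response per function call, even for unknown tools or handler errors, so the model is never left waiting. The FunctionCall and FunctionResponse interfaces are local stand-ins that mirror the id, name, args and response fields used by the Live API; the handler bodies and the argument names query and routine_id are hypothetical.

```typescript
// Local stand-ins mirroring the Live API's function-call shapes (id, name, args / response).
interface FunctionCall {
  id?: string;
  name?: string;
  args?: Record<string, unknown>;
}

interface FunctionResponse {
  id?: string;
  name?: string;
  response: Record<string, unknown>;
}

type ToolHandler = (args: Record<string, unknown>) => Record<string, unknown>;

// Hypothetical handlers; the real app would drive the YouTube player and update Firestore.
const handlers: Record<string, ToolHandler> = {
  play_youtube: (args) => ({ status: "playing", query: String(args["query"] ?? "") }),
  complete_routine: (args) => ({ status: "completed", routineId: String(args["routine_id"] ?? "") }),
  skip_routine: (args) => ({ status: "skipped", routineId: String(args["routine_id"] ?? "") }),
};

/** Return one response per call, including errors, so the conversation never stalls. */
function handleToolCall(calls: FunctionCall[]): FunctionResponse[] {
  return calls.map((call) => {
    const name = call.name ?? "";
    // hasOwnProperty guards against names like "toString" resolving to Object.prototype.
    if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
      return { id: call.id, name: call.name, response: { error: `unknown tool: ${name}` } };
    }
    try {
      return { id: call.id, name: call.name, response: handlers[name](call.args ?? {}) };
    } catch (err) {
      // Report failures to the model instead of silently dropping the call.
      return { id: call.id, name: call.name, response: { error: String(err) } };
    }
  });
}
```

Assuming the @google/genai JavaScript SDK, the result would be sent back from the message callback with session.sendToolResponse({ functionResponses }), echoing each call's id so the model can match responses to calls.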

Accomplishments that we're proud of

True real-time conversation — Not turn-based chat, but actual flowing dialogue with interruption support
AI that takes action — Tool calling makes it feel like a real assistant, not just a chatbot
Vision verification works — AI actually recognizes actions like hand waves, book covers, made beds
Built in 1 week — From idea to working demo, including infrastructure

What we learned

Gemini Live API is powerful but requires careful architecture — WebSocket state management, audio buffering, and tool response handling all need attention
Multimodal AI changes UX paradigms — When AI can see AND hear, the interaction model is completely different from text chat
Persona matters — A friendly, encouraging coach persona makes users actually want to use the app
Cost optimization is real — Real-time streaming APIs can get expensive fast, so it pays to be deliberate about when to connect and disconnect
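The connect/disconnect and frame-throttling lessons can be combined into one small gate object. This is a hypothetical sketch, not SparkWake's implementation: LiveUsageGate, its hooks and the one-frame-per-second default are assumptions, and connect/disconnect would wrap opening and closing the Live session.

```typescript
// Hypothetical gate: the Live session is open only during a routine,
// and camera frames pass at most once per minFrameIntervalMs while it is open.
type Hooks = { connect: () => void; disconnect: () => void };

class LiveUsageGate {
  private active = false;
  private lastFrameMs = Number.NEGATIVE_INFINITY;
  private readonly hooks: Hooks;
  private readonly minFrameIntervalMs: number;

  constructor(hooks: Hooks, minFrameIntervalMs = 1000) {
    this.hooks = hooks;
    this.minFrameIntervalMs = minFrameIntervalMs;
  }

  /** Open the session when a routine starts; repeated calls are no-ops. */
  startRoutine(): void {
    if (this.active) return;
    this.active = true;
    this.lastFrameMs = Number.NEGATIVE_INFINITY;
    this.hooks.connect();
  }

  /** Close the session as soon as the routine ends, so idle time is not billed. */
  endRoutine(): void {
    if (!this.active) return;
    this.active = false;
    this.hooks.disconnect();
  }

  /** True if a frame captured at nowMs should be encoded and sent. */
  shouldSendFrame(nowMs: number): boolean {
    if (!this.active) return false;
    if (nowMs - this.lastFrameMs < this.minFrameIntervalMs) return false;
    this.lastFrameMs = nowMs;
    return true;
  }
}
```

Each time the camera loop fires, the app would check shouldSendFrame(performance.now()) before drawing the video to a canvas and encoding it with canvas.toDataURL("image/jpeg"), so frames that would be dropped are never encoded at all.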

What's next for SparkWake

Push notifications — Wake-up alarms that launch directly into AI coaching session
Weekly analytics — Gemini-powered insights on routine patterns and personalized recommendations
Social features — Morning routine challenges with friends
Multi-language support — Currently English-focused, planning Korean and more
Connection pooling — Optimize Live API usage for longer routines without continuous streaming

Built With

cloud-firestore
fastapi
firebase-authentication
firebase-hosting
gemini-2.5-flash
gemini-live-api
google-cloud-run
next.js-15
python-3.12
react-19
tailwind-css
terraform
typescript
websocket

Updates

Minji Kim started this project — Mar 15, 2026 03:41 PM EDT
