Cringe Alert - AI Music Performance Coach
Inspiration
Let's be honest. I'm a programmer, not a guitarist. I picked up guitar for one reason: to impress girls. I learned everything from YouTube, and I'm not about to spend money on a real coach just to post a 60-second cover on Instagram.
But here's the problem: I'd record myself playing, watch it back, and know something was off... but not what. Was it my pitch? My timing? My strumming? No idea.
I just wanted one simple thing: to know if my cover was good enough to post, or if I'd embarrass myself in front of my crush.
Then I saw the Gemini 3 hackathon. I learned that Gemini 3 can understand video at a spatial-temporal level, not just "there's a guitar" but "at 0:47, your wrist tensed and your chord buzzed." That's when it clicked: I could build an AI that watches my cover the way a real teacher would, pointing at exact moments and telling me how to fix them.
That's Cringe Alert.
What it does
Cringe Alert is a feedback-card-driven coaching loop powered by Gemini 3:
- Upload a cover video (guitar, vocals, or both)
- Gemini 3 Pro analyzes your performance, identifies the song, scores you 0-100 (is it too cringe to post?), and generates specific feedback cards (pitch issues, timing problems, chord mistakes) with timestamps
- Fix issues one by one - each card has a "Fix this" button. Record a short clip targeting that issue, and Gemini 3 Flash judges whether you nailed it
- AI Coach guides you in real-time - a proactive Gemini 3 Flash coach connects via WebSocket, highlights your worst issues, seeks the video to problem spots, and cheers you on as you fix them
- Record a final take - Gemini 3 Pro compares your original vs. final performance and delivers a score improvement summary + an Instagram-worthiness verdict
Gemini 3 Integration
Gemini 3 is the entire brain of this app.
Gemini 3 Pro handles video analysis via the Files API, with thinking mode enabled for deeper reasoning and Google Search grounding to identify the song being covered. It generates structured JSON feedback with timestamps, categories (guitar/vocals/timing), severity levels, and actionable fixes. Native thought signatures are captured and stored per analysis. Pro also powers the final comparison, receiving the original performance context and judging the improvement.
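For illustration, here is a minimal sketch of what that analysis call can look like with the google-genai SDK. The model name matches the preview model listed under "Built With"; the file path, prompt, and waiting loop are placeholders of my own, not the app's exact code, and the real pipeline asks for the full structured feedback schema.

```python
import time

from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Upload the (already WebM->MP4 converted) cover via the Files API,
# then wait for server-side processing to finish before analysis.
video = client.files.upload(file="cover.mp4")  # placeholder path
while video.state and video.state.name == "PROCESSING":
    time.sleep(2)
    video = client.files.get(name=video.name)

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=[
        video,
        "Identify the song being covered, score the performance 0-100, and "
        "return JSON feedback items with timestamp, category "
        "(guitar/vocals/timing), severity, and a concrete fix.",
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # song identification
        thinking_config=types.ThinkingConfig(include_thoughts=True),
    ),
)
print(response.text)  # parsed into feedback cards by the backend
```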
Gemini 3 Flash powers two real-time features:
- Fix evaluation: watches short fix clips with thinking enabled and judges whether a specific issue was resolved, returning structured verdicts with tips
- AI Coach chat: a multi-turn streaming conversation over WebSocket with function calling (5 UI-control tools). The coach maintains full conversation history and proactively guides users through feedback items — seeking to timestamps, highlighting cards, and opening fix modals
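Below is a rough sketch of how that function-calling loop can be wired with the google-genai SDK. The two tool declarations (`seek_video`, `highlight_card`) are illustrative stand-ins for the five UI-control tools, the system prompt is invented, and streaming plus the WebSocket relay are omitted for brevity.

```python
from google import genai
from google.genai import types

client = genai.Client()

# Two of the UI-control tools, as illustrative stand-ins.
ui_tools = types.Tool(function_declarations=[
    types.FunctionDeclaration(
        name="seek_video",
        description="Seek the performance video to a timestamp in seconds.",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"seconds": types.Schema(type=types.Type.NUMBER)},
            required=["seconds"],
        ),
    ),
    types.FunctionDeclaration(
        name="highlight_card",
        description="Highlight a feedback card by its id.",
        parameters=types.Schema(
            type=types.Type.OBJECT,
            properties={"card_id": types.Schema(type=types.Type.STRING)},
            required=["card_id"],
        ),
    ),
])

# chats.create keeps the multi-turn history for us.
chat = client.chats.create(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(
        system_instruction="You are a proactive music coach. Guide the user "
                           "through their feedback cards, worst issues first.",
        tools=[ui_tools],
    ),
)

reply = chat.send_message("Walk me through my worst timing issue.")
for part in reply.candidates[0].content.parts:
    if part.function_call:
        # In the app, tool calls like this are relayed over the WebSocket
        # so the frontend can seek the video or highlight a card.
        print(part.function_call.name, dict(part.function_call.args))
    elif part.text:
        print(part.text)
```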
How I built it
- Backend: FastAPI (Python) handling video upload to Firebase Storage, format conversion (WebM to MP4), and streaming Gemini API calls via the google-genai SDK
- Frontend: React + TypeScript + Vite with Zustand for state management, TailwindCSS for UI, and TanStack Query for data fetching
- Storage: Firebase Storage for video blobs, Firestore for session persistence with full fix history
- Real-time: Server-Sent Events for streaming analysis results, WebSocket for bidirectional coach chat with tool calling
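As a rough sketch of the streaming piece: a FastAPI endpoint can forward Gemini's streamed chunks as SSE frames. The route path, query parameter, and prompt here are my own placeholders; the real endpoint streams the structured analysis for a stored session rather than raw text.

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from google import genai

app = FastAPI()
client = genai.Client()

@app.get("/analysis/stream")
def stream_analysis(file_name: str):
    """Stream Gemini's analysis of an already-uploaded Files API video as SSE."""
    def event_stream():
        video = client.files.get(name=file_name)
        stream = client.models.generate_content_stream(
            model="gemini-3-pro-preview",
            contents=[video, "Analyze this cover and return timestamped feedback."],
        )
        for chunk in stream:
            if chunk.text:
                yield f"data: {chunk.text}\n\n"  # one SSE frame per chunk
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```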
Note: This is a proof-of-concept built for the Gemini 3 hackathon. The architecture (temp file downloads, single-instance processing, simplified auth) is designed to demonstrate the concept, not for production scale.
Challenges
- Video format handling: Gemini seems to handle timestamps properly with MP4 files, but browsers record WebM, so I built a server-side ffmpeg conversion pipeline that runs before every analysis call (see the sketch after this list)
- Gemini Live API pivot: Originally I wanted users to talk to the coach by voice via the Gemini Live API, but I hit persistent bugs with tool calling (likely because Live only has Gemini 2.5 available). After burning hours debugging, I made the call to pivot to Gemini 3 Flash over WebSocket with text-based function calling. It ended up being more reliable and gave me finer control over the conversation flow
- Session persistence: Restoring a full session (videos with expired signed URLs, feedback items with fix statuses, comparison results) required regenerating fresh download URLs from the stored blob names on every load (sketched below)
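The conversion step is essentially a one-shot ffmpeg call per upload. A minimal sketch, with output codecs that are my guess at sensible defaults rather than the app's exact flags:

```python
import subprocess

def webm_to_mp4(src: str, dst: str) -> None:
    """Convert a browser-recorded WebM clip to MP4 before sending it to Gemini."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", src,
            "-c:v", "libx264",          # re-encode video to H.264
            "-c:a", "aac",              # re-encode audio to AAC
            "-movflags", "+faststart",  # moov atom up front for seekability
            dst,
        ],
        check=True,
    )

webm_to_mp4("take.webm", "take.mp4")
```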
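And regenerating download URLs on session restore can look roughly like this with the firebase-admin SDK; the bucket name and blob path are placeholders:

```python
from datetime import timedelta

import firebase_admin
from firebase_admin import storage

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key.
firebase_admin.initialize_app(options={"storageBucket": "my-project.appspot.com"})

def fresh_download_url(blob_name: str) -> str:
    """Regenerate a short-lived signed URL from a blob name stored in Firestore."""
    blob = storage.bucket().blob(blob_name)
    return blob.generate_signed_url(expiration=timedelta(hours=1), version="v4")

print(fresh_download_url("sessions/demo/original.mp4"))
```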
What I learned
Gemini 3's multimodal video understanding is genuinely impressive. It can identify songs, pinpoint timing issues at specific seconds, and distinguish between guitar and vocal problems. The combination of Pro for deep analysis and Flash for fast interactive feedback creates a coaching loop that actually feels responsive and useful.
Built With
- fastapi
- firebase
- firestore
- gemini-3-flash-preview
- gemini-3-pro-preview
- google-cloud-run
- google-genai
- python
- react
- sse
- tailwindcss
- typescript
- vite
- websocket
- zustand