Inspiration

As two young adults in our early 20s, we obviously love karaoke. However, we found it quite inaccessible. We also realized that typical karaoke platforms offer no objective way to measure performance, which makes it impossible to track improvement or compete meaningfully with friends. We wanted to make karaoke much more accessible by letting anyone upload ANY song, while turning vocal performance into something quantifiable that can be improved and compared in a fun, skill-based competition. People love chasing the thrill of a new high score, and now, with KaraokeJam, you can do just that while singing your heart out!

What it does

KaraokeJam transforms any audio file into a complete karaoke experience by automatically separating vocals from instrumentals, extracting lyrics with word-level timestamps, and analyzing the vocal melody. When you sing, the app captures your voice in real time, compares your pitch to the original using semitone-based accuracy scoring, and awards points with instant feedback. The interface is a neon-themed arcade screen with word-by-word lyric highlighting, a live pitch display, and persistent high score tracking.
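
As a rough illustration of how word-by-word highlighting can be driven by word-level timestamps, here is a minimal Python sketch. The timestamp layout, the sample words, and the function name active_word_index are assumptions made for this example, not the app's actual data model.

```python
from bisect import bisect_right

# Hypothetical word-level timestamps: (start_seconds, end_seconds, word).
WORDS = [
    (0.50, 0.90, "sing"),
    (0.95, 1.40, "your"),
    (1.45, 1.80, "heart"),
    (1.85, 2.30, "out"),
]


def active_word_index(words, playback_time):
    """Return the index of the word being sung at playback_time, or None."""
    starts = [start for start, _end, _text in words]
    # Last word whose start time is at or before the playback position.
    i = bisect_right(starts, playback_time) - 1
    if i < 0:
        return None  # playback has not reached the first word yet
    _start, end, _text = words[i]
    # Between two words (or after the last one) nothing is highlighted.
    return i if playback_time <= end else None
```

A UI loop could call active_word_index(WORDS, current_time) on every animation frame and highlight the returned word, if any.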

How we built it

The frontend uses React with TypeScript and the Web Audio API to capture microphone input and stream it via WebSocket for real-time pitch analysis. The FastAPI backend processes songs through Demucs for vocal separation, OpenAI Whisper for lyrics extraction, and Librosa for pitch detection. Supabase handles PostgreSQL database storage, audio file hosting, and authentication, while our scoring algorithm uses logarithmic semitone calculations to judge the accuracy of a singer's performance.
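
To make the "logarithmic semitone calculations" concrete, here is a small Python sketch. The semitone distance formula (12 times the base-2 logarithm of the frequency ratio) is standard; the points curve, the 2-semitone tolerance, and the function names are assumptions for illustration and may differ from our actual scoring.

```python
import math


def semitone_error(sung_hz, reference_hz):
    """Signed distance from the reference pitch in semitones (12 per octave)."""
    return 12.0 * math.log2(sung_hz / reference_hz)


def points_for_frame(sung_hz, reference_hz, max_points=100, tolerance=2.0):
    """Hypothetical curve: full points on pitch, zero at `tolerance` semitones off."""
    if sung_hz <= 0 or reference_hz <= 0:
        return 0  # unvoiced or silent frame, nothing to compare
    error = abs(semitone_error(sung_hz, reference_hz))
    # Linear falloff from max_points at 0 semitones to 0 at the tolerance.
    return round(max_points * max(0.0, 1.0 - error / tolerance))
```

Working in semitones rather than raw hertz means that being one semitone off costs the same whether the note is low or high, which keeps the scoring fair across vocal ranges.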

Challenges we ran into

The biggest challenge we faced was implementing real-time pitch detection with latency low enough for instant feedback. Initially, the lag was over 500 ms, which made the feedback feel disconnected from the singing. Getting microphone data from the browser to the backend, processing it, and returning pitch analysis fast enough required extensive experimentation with buffer sizes, WebSocket configurations, and encoding methods. After significant optimization, we achieved sub-100 ms round-trip latency using 4096-sample audio buffers encoded as base64 float32 arrays and sent via WebSocket.
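
The buffer encoding described above can be sketched in Python as follows. This is an illustration under stated assumptions, not our exact wire format: it assumes little-endian float32 samples (what a browser Float32Array produces on typical hardware), and the function names and the 48 kHz sample rate in the note below are hypothetical.

```python
import base64
import struct

BUFFER_SIZE = 4096  # samples per WebSocket message, as described above


def encode_buffer(samples):
    """Pack floats as little-endian float32 and base64-encode them."""
    raw = struct.pack(f"<{len(samples)}f", *samples)
    return base64.b64encode(raw).decode("ascii")


def decode_buffer(payload):
    """Reverse of encode_buffer: base64 text to a list of float32 samples."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload is not a whole number of float32 samples")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def buffer_duration_ms(num_samples, sample_rate_hz):
    """Length of audio held by one buffer, in milliseconds."""
    return 1000.0 * num_samples / sample_rate_hz
```

Each 4096-sample buffer is 16,384 bytes of raw float32 data, or 21,848 characters once base64-encoded. At an assumed 48 kHz sample rate, one buffer spans about 85 ms of audio.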

Accomplishments that we're proud of

We're incredibly proud of how seamless the entire system feels despite its complexity: AI vocal separation, speech-to-text transcription, pitch analysis, real-time WebSocket streaming, database management, and storage orchestration all work together flawlessly, and each song is processed in under 3 minutes. The real-time pitch detection delivers sub-100 ms latency while maintaining high accuracy, making the feedback feel genuinely responsive and game-like. We're also proud of the scoring system's balance: challenging enough to be competitive, yet forgiving enough that beginners can still earn points.

What we learned

We gained deep experience in full-stack development by integrating React's state management for real-time UI updates with FastAPI's async capabilities for WebSocket communication and background task processing. Working with specialized audio processing libraries like Demucs, Whisper, and Librosa taught us how to run AI models on music files efficiently. We also learned how to handle database and storage management with Supabase, especially when coordinating metadata, authentication, and large audio files across multiple storage buckets while maintaining data integrity throughout the processing pipeline.

What's next for KaraokeJam

We plan to establish a public music library of pre-processed songs available to all users, eliminating wait times and ensuring consistent quality across the platform. We want to integrate more accurate lyrics sources and improve word-level timestamp precision for even tighter synchronization. Global leaderboards for each song will let users compete worldwide and see how they stack up against the best performers. We also envision adding pitch visualization graphs, a practice mode with adjustable speed, mobile apps, and social features like recording, playback, and sharing. Together, these would transform KaraokeJam from a personal tool into a full-fledged competitive karaoke platform.
