Inspiration

Music has always been a powerful way to heal and connect, yet many hold back due to fear of judgment or lack of skill. We built Songwich to tear down these barriers, creating a digital sanctuary where anyone can destress and freely express themselves. By blending the fun of karaoke with cutting-edge AI, we turn "just singing" into a confidence-boosting creative journey where anxiety is replaced by joy. Whether you're releasing energy on a classic track or flowing over a freestyle beat, Songwich proves that your voice deserves to be heard. It’s more than just a game—it’s a reminder that sometimes, the best therapy is simply picking up the mic.

What it does

Songwich is a collection of AI-powered multiplayer singing party games designed to make musical expression fun, accessible, and stress-free. It features three distinct game modes: a classic karaoke experience, a creative freestyle rap challenge with real-time scoring, and a unique "Blind Mode" where players sing over AI-generated lyrics and beats. The platform uses AI to enhance players' vocals, generate custom backing tracks, and transcribe lyrics in real time for scoring. With a vibrant, modern interface, Songwich turns your browser into a personal recording studio where friends can compete, collaborate, and create music together.

How we built it

Frontend:

  • React - Builds the entire user interface.
  • TypeScript - Enforces type safety across the frontend.
  • Tailwind CSS - Styles the application with a modern, responsive, dark-mode aesthetic.
  • Socket.IO Client - Listens for global game-state updates, syncs timers, and updates voting results instantly across clients.
  • Vite - Provides fast Hot Module Replacement (HMR) during development and optimized production builds.
  • Framer Motion - Powers the clean page transitions and micro-animations.

Backend:

  • ElevenLabs - Handles voice changing/cloning and beat creation, and transcribes songs with its speech-to-text feature to obtain precise word timestamps.
  • Gemini - Generates random lyrics and serves as a fallback for beat generation.
  • Google Speech Recognition - Works alongside ElevenLabs speech-to-text for word recognition.
  • Flask - Serves the API routes and orchestrates game logic.
  • Flask-SocketIO - Manages per-room channels, real-time messaging, and low-latency game-state synchronization.
  • FFmpeg & Pydub - Handle the heavy lifting of audio processing: splicing vocals, mixing tracks, and converting formats.
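In Pydub, layering vocals over a beat is typically done with `AudioSegment.overlay`. The stdlib-only sketch below makes "mixing tracks" concrete by showing roughly what that amounts to at the sample level. It assumes both inputs are already decoded to mono 16-bit PCM at the same sample rate; real inputs would first be decoded and resampled, for example by FFmpeg. The helper name and default gains are hypothetical and are not taken from Songwich's code.

```python
from array import array


def db_to_gain(db: float) -> float:
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix_pcm16(vocals: array, beat: array,
              vocal_gain_db: float = 0.0, beat_gain_db: float = -6.0) -> array:
    """Mix two mono 16-bit PCM tracks sample by sample (hypothetical helper).

    The shorter track is treated as silence past its end, each track is scaled
    by its gain, and the sum is clipped to the 16-bit range so loud overlaps
    saturate instead of wrapping around.
    """
    gv, gb = db_to_gain(vocal_gain_db), db_to_gain(beat_gain_db)
    length = max(len(vocals), len(beat))
    mixed = array("h", [0]) * length  # zero-filled 16-bit output buffer
    for i in range(length):
        v = vocals[i] if i < len(vocals) else 0
        b = beat[i] if i < len(beat) else 0
        sample = round(v * gv + b * gb)
        mixed[i] = max(-32768, min(32767, sample))
    return mixed
```

Clipping is the simplest way to avoid overflow. Attenuating the beat (here by a default of -6 dB) leaves headroom for the vocals, so clipping happens less often.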

Challenges we ran into

Multiplayer State Synchronization: Managing the delicate balance between real-time responsiveness and server consistency was difficult. We differentiated between "hard sync" (global game phases) and "soft sync" (individual player interactions) to prevent network jitter from ruining the gameplay experience.
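The hard/soft split can be expressed with a single in-memory state object per room. This is a minimal sketch under that assumption. Phase changes carry a monotonically increasing version that every client must adopt. Per-player interactions, such as votes, are last-writer-wins with a client-supplied sequence number, so a late or duplicated packet cannot overwrite newer input. `RoomState`, `client_should_apply`, and the event shape are hypothetical, and the transport layer (Flask-SocketIO rooms and emits) is left out.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RoomState:
    """Hypothetical per-room state separating hard-synced and soft-synced data."""

    phase: str = "lobby"
    phase_version: int = 0
    # Soft state: latest (sequence number, payload) reported by each player.
    player_updates: dict[str, tuple[int, Any]] = field(default_factory=dict)

    def set_phase(self, phase: str) -> dict:
        """Hard sync: only the server advances the phase; every client adopts it."""
        self.phase = phase
        self.phase_version += 1
        # This event would be broadcast to the whole room (e.g. over Socket.IO).
        return {"type": "phase", "phase": phase, "version": self.phase_version}

    def apply_soft_update(self, player_id: str, seq: int, payload: Any) -> bool:
        """Soft sync: keep the newest update per player, drop stale or reordered ones."""
        last = self.player_updates.get(player_id)
        if last is not None and seq <= last[0]:
            return False
        self.player_updates[player_id] = (seq, payload)
        return True


def client_should_apply(local_version: int, event: dict) -> bool:
    """Client-side guard: ignore phase events not newer than the one already applied."""
    return event["version"] > local_version
```

Versioning the hard state lets a client that receives events out of order simply discard the stale ones. Soft updates never block anyone, so jitter only delays a single player's input.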

Resilient AI Fallback Pipelines: External APIs like ElevenLabs and Gemini are prone to rate limits or latency. We engineered a "waterfall" fallback system that attempts highly dynamic generation first, then simplified generation, and finally defaults to hardcoded assets, ensuring the game is always playable.
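The waterfall boils down to "try each tier in order, fall through on any failure." The sketch below shows that pattern under stated assumptions. It assumes each tier is a zero-argument callable, and it treats an exception or an empty result as a failure. `waterfall`, the logger name, and the two stand-in generators are hypothetical; the stand-ins replace real ElevenLabs or Gemini calls.

```python
import logging
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
log = logging.getLogger("songwich.fallback")  # hypothetical logger name


def waterfall(tiers: Sequence[tuple[str, Callable[[], T]]], default: T) -> tuple[str, T]:
    """Try each generation tier in order; return (tier_name, result) of the first success.

    A tier fails if it raises (rate limit, timeout, bad response) or returns an
    empty result. If every tier fails, the hardcoded default is returned, so the
    caller always gets something playable.
    """
    for name, attempt in tiers:
        try:
            result = attempt()
        except Exception as exc:  # broad on purpose: any API failure means "try the next tier"
            log.warning("tier %s failed: %s", name, exc)
            continue
        if result:
            return name, result
        log.warning("tier %s returned an empty result", name)
    return "hardcoded", default


# Stand-ins for real API calls (hypothetical, for illustration only).
def dynamic_beat() -> str:
    raise TimeoutError("beat generation timed out")


def simplified_beat() -> str:
    return "generated/simple_beat.mp3"
```

For example, `waterfall([("dynamic", dynamic_beat), ("simplified", simplified_beat)], "assets/default_beat.mp3")` returns `("simplified", "generated/simple_beat.mp3")`. In practice each tier should also enforce its own timeout, so a slow API falls through instead of stalling the round.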

Audio State Management & Autoplay Policies: Browsers aggressively block unprompted audio, which initially broke our seamless game starts. We developed a reactive audio engine that detects autoplay blocks and instantly presents a manual interaction trigger, guaranteeing the music always plays.

Mode 1 (Classic Karaoke): Phase Desynchronization. Players finishing songs at different times caused race conditions where the lobby would split phases. We solved this by implementing server-authoritative "waiting rooms" that hold early finishers until the backend confirms all clients are ready to transition.
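A server-authoritative waiting room is essentially a barrier. Below is a minimal sketch, assuming one barrier object per phase transition held in server memory. `PhaseBarrier` and its methods are hypothetical. Wiring it to Flask-SocketIO events is omitted, as is emitting the next phase to the room when a call returns `True`. The disconnect handling in `remove_player` is an extra assumption, not something described above.

```python
import threading
from typing import Iterable


class PhaseBarrier:
    """Hypothetical server-side "waiting room" for one phase transition.

    Early finishers are recorded, but the phase does not advance until every
    player still in the room has finished. mark_finished/remove_player return
    True exactly once: for the call that completes the set, i.e. the caller
    that should broadcast the next phase.
    """

    def __init__(self, player_ids: Iterable[str]):
        self._expected = set(player_ids)
        self._finished: set[str] = set()
        self._advanced = False
        self._lock = threading.Lock()  # socket handlers may run concurrently

    def mark_finished(self, player_id: str) -> bool:
        with self._lock:
            if player_id in self._expected:  # ignore unknown or late joiners
                self._finished.add(player_id)
            return self._try_advance()

    def remove_player(self, player_id: str) -> bool:
        """Drop a disconnected player so the lobby is not stuck waiting on them."""
        with self._lock:
            self._expected.discard(player_id)
            self._finished.discard(player_id)
            return self._try_advance()

    def _try_advance(self) -> bool:
        if not self._advanced and self._expected <= self._finished:
            self._advanced = True
            return True
        return False
```

Returning `True` only once means that duplicate "finished" messages can never trigger a second phase broadcast.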

Mode 2 (Blind Mode): Audio State & Autoplay Management. Coordinating AI-generated beats with browser security policies was difficult; audio would sometimes fail to play or loop unexpectedly. We built a reactive audio engine that tracks "is_playing" states globally and provides manual override triggers if the browser blocks the initial autoplay.

Mode 3 (Freestyle Rap): Network Latency vs. Scoring. The delay between a user speaking and the server analyzing the audio meant valid rhymes were often marked as "missed." We added a 5-second "Look-Ahead Buffer" to the detection logic, allowing the server to credit users for words spoken slightly before or after the visual target window.
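The look-ahead buffer amounts to widening each target's time window before matching transcribed words against it. This sketch assumes the transcription step yields words with start/end timestamps in seconds, as speech-to-text word timestamps do. It also assumes each target is a `(word, window_start, window_end)` tuple. `Word`, `score_targets`, and these data shapes are hypothetical. The 5-second default mirrors the buffer size mentioned above.

```python
import string
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the round
    end: float


def normalize(token: str) -> str:
    """Lowercase and strip surrounding punctuation so "Fire!" matches "fire"."""
    return token.lower().strip(string.punctuation)


def score_targets(targets: list[tuple[str, float, float]],
                  words: list[Word], buffer_s: float = 5.0) -> list[bool]:
    """Mark each (target_word, window_start, window_end) as hit or missed.

    A target is hit if a matching transcribed word overlaps the target window
    widened by buffer_s seconds on both sides. Each spoken word can satisfy at
    most one target (greedy, first match wins).
    """
    used: set[int] = set()
    hits: list[bool] = []
    for target, win_start, win_end in targets:
        lo, hi = win_start - buffer_s, win_end + buffer_s
        hit = False
        for i, w in enumerate(words):
            if i in used or normalize(w.text) != normalize(target):
                continue
            if w.end >= lo and w.start <= hi:  # the two intervals overlap
                used.add(i)
                hit = True
                break
        hits.append(hit)
    return hits
```

With a 5-second buffer, a target "fire" whose window is 11.0 to 12.0 s is credited for a "Fire" transcribed at 10.2 to 10.6 s. With `buffer_s=0`, the same word would be marked as missed. A wider buffer forgives more latency, at the cost of crediting words said well outside the intended moment.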

Accomplishments that we're proud of

It's Actually Fun: We didn't just make a tech demo; we made something we want to keep playing with our friends. It has that chaotic "late-night Discord" energy we were aiming for.

Taming Audio (DSP): First time using FFmpeg! We went from knowing nothing about audio streams to building a full pipeline that stitches vocals, mixes beats, and adds effects on the fly.

Invisible AI: The AI features don't feel like gimmicks. Generating a beat or lyrics feels like a natural part of the game loop, not just a "look what we can do" button.

It Looks Good: We put a lot of love into the UI. The glassmorphism and smooth animations make it feel like a polished product, not just a hackathon prototype.

What we learned

  • Real-Time Game Networking: We gained deep hands-on experience with WebSockets (Socket.IO), learning that "real-time" isn't magic—it requires careful state management, event broadcasting, and latency handling to make multiple clients feel perfectly synchronized.

  • Modular Architecture for Speed: Working under a deadline taught us the value of clean, modular code. By decoupling our frontend components (like RecordingStage vs. VotingPhase), we could build features in parallel without breaking each other's work or causing massive merge conflicts.

  • The Power of Polish: We learned that good design is a feature. Spending extra time on smooth Framer Motion transitions and a cohesive design system made the app feel significantly more responsive and trustworthy than if we had left it "janky but functional."

What's next for Songwich

"Logistical" Ideas

  • Mobile app
  • Training mode where you change your pitch
  • Global leaderboards for best rendition of a song
  • Public release, once we've added more game modes, fine-tuned existing functionality, and worked out the logistics of our API usage
  • An automatic pipeline that transcribes songs with ElevenLabs and populates a database with the song data
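The planned transcription pipeline could store word-level timestamps so that game modes can look up lyrics by time. Below is a minimal sketch using Python's built-in `sqlite3`. It assumes the transcription step has already produced a list of `{"text", "start", "end"}` dicts. That shape, the table layout, and the function names are all hypothetical, and the actual ElevenLabs response format is not reproduced here.

```python
import sqlite3

# Hypothetical schema: one row per song, one row per transcribed word.
SCHEMA = """
CREATE TABLE IF NOT EXISTS songs (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    artist TEXT
);
CREATE TABLE IF NOT EXISTS lyric_words (
    song_id INTEGER NOT NULL REFERENCES songs(id),
    position INTEGER NOT NULL,
    text TEXT NOT NULL,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL,
    PRIMARY KEY (song_id, position)
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def store_transcript(conn: sqlite3.Connection, title: str, artist: str,
                     words: list[dict]) -> int:
    """Insert a song and its timestamped words in one transaction; return the song id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO songs (title, artist) VALUES (?, ?)", (title, artist))
        song_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO lyric_words (song_id, position, text, start_s, end_s) "
            "VALUES (?, ?, ?, ?, ?)",
            [(song_id, i, w["text"], w["start"], w["end"]) for i, w in enumerate(words)],
        )
    return song_id


def lyrics_between(conn: sqlite3.Connection, song_id: int, t0: float, t1: float) -> list[str]:
    """Return the words that fall entirely within [t0, t1] seconds, in order."""
    rows = conn.execute(
        "SELECT text FROM lyric_words WHERE song_id = ? AND start_s >= ? AND end_s <= ? "
        "ORDER BY position",
        (song_id, t0, t1),
    ).fetchall()
    return [r[0] for r in rows]
```

Storing one row per word keeps time-range queries simple, such as showing the next line of lyrics during karaoke.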

"Feature-related" Ideas

  • Custom song integration: import any song you want as a local track and play through it.
  • Custom voice models (e.g., a game where friends pick different politician voice models and decide who sounds best).
  • More game modes for variety (e.g., a band mode where you sing as different AI band members, or even as the instruments).
