Inspiration
We were inspired by the gap between musical creativity and the barriers of music production. Many people have melodies in their heads but lack the tools, knowledge, or time to turn them into finished songs. We wanted to democratize music creation: anyone should be able to hum a tune and instantly get back a professionally produced track.
What it does
UnShazam lets users:
- Record 5-8 seconds of humming, singing, or beatboxing
- Have the app analyze musical features (tempo, key, rhythm, energy, melodic contour)
- Generate genre-matched lyrics based on a theme they provide
- Create a full 30-second song with AI-generated vocals and instrumentation
- Generate Neo-Retro style album artwork to accompany the track
- Play back, preview, and download their creation
How we built it
We built UnShazam as a React + Vite frontend application with an API orchestration layer that chains several AI services:
- Audio Capture & Analysis: Used the Web Audio API's AnalyserNode to extract musical features from the recorded audio (first sketch below)
- Lyrics Generation: Sent the audio analysis, genre, and theme to the Claude API to generate contextual lyrics (second sketch below)
- Music Generation: Combined all audio features and lyrics into a detailed prompt for ElevenLabs Music API
- Artwork Generation: Used Claude to write an image prompt from the lyrics, then sent it to Stable Diffusion for Neo-Retro style album covers
- State Management: Built a state machine that drives the generation pipeline (RECORDING → ANALYZING → GENERATING_LYRICS → GENERATING_MUSIC → GENERATING_ARTWORK → COMPLETE) (third sketch below)
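A minimal sketch of the capture-and-analysis step: record from the microphone and collect an RMS energy envelope from the AnalyserNode's time-domain data. The FFT size, polling interval, and energy metric are illustrative choices, not our exact production values.

```javascript
// Sketch: capture mic audio and estimate an energy envelope from time-domain
// samples. fftSize, the 50 ms polling interval, and RMS as the energy metric
// are illustrative choices.
async function analyzeHum(durationMs = 6000) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  analyser.fftSize = 2048;
  ctx.createMediaStreamSource(stream).connect(analyser);

  const buf = new Float32Array(analyser.fftSize);
  const energies = [];
  const start = performance.now();

  // Poll the analyser while the user hums, building an RMS energy envelope.
  while (performance.now() - start < durationMs) {
    analyser.getFloatTimeDomainData(buf);
    const rms = Math.sqrt(buf.reduce((s, x) => s + x * x, 0) / buf.length);
    energies.push(rms);
    await new Promise((r) => setTimeout(r, 50));
  }

  stream.getTracks().forEach((t) => t.stop());
  return { energies, sampleRate: ctx.sampleRate };
}
```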
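The lyrics step, sketched with the official @anthropic-ai/sdk; the model id and prompt wording here are placeholders, not our exact production prompt.

```javascript
// Sketch of the lyrics step. The model id and prompt wording are placeholders.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function generateLyrics(features, genre, theme) {
  const msg = await client.messages.create({
    model: "claude-3-5-sonnet-latest", // placeholder model id
    max_tokens: 512,
    messages: [{
      role: "user",
      content:
        `Write ${genre} lyrics for a 30-second song about "${theme}". ` +
        `Match these musical features: tempo ${features.tempo} BPM, ` +
        `key ${features.key}, energy ${features.energy}.`,
    }],
  });
  return msg.content[0].text; // first content block carries the lyrics
}
```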
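And a sketch of the pipeline state machine: each phase must resolve before the next begins, and the UI is driven from a single source of truth. The step functions passed in (record, analyze, and so on) are hypothetical stand-ins.

```javascript
// Sketch of the generation pipeline. State names match the pipeline above;
// the functions in `steps` (record, analyze, writeLyrics, ...) are hypothetical.
const PIPELINE = [
  "RECORDING", "ANALYZING", "GENERATING_LYRICS",
  "GENERATING_MUSIC", "GENERATING_ARTWORK",
];

async function runPipeline(steps, onStateChange) {
  for (let i = 0; i < steps.length; i++) {
    onStateChange(PIPELINE[i]); // tell the UI which phase is running
    await steps[i]();           // each phase finishes before the next starts
  }
  onStateChange("COMPLETE");
}
```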
Challenges we ran into
- Real-time Audio Analysis: Accurately extracting tempo and key from short, noisy recordings required implementing beat detection and pitch analysis algorithms (a pitch-detection sketch follows this list)
- API Coordination: Managing sequential and parallel API calls with proper error handling and retry logic across multiple services was complex (retry sketch below)
- Quality Control: Ensuring generated lyrics matched the intended genre style and that the music generation reflected the audio analysis required careful prompt engineering
- Browser Audio Limitations: Working within Web Audio API constraints while maintaining responsive UX during generation steps
- State Synchronization: Keeping the UI in sync with the async generation pipeline without race conditions (stale-result guard sketched below)
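For pitch, a common starting point is time-domain autocorrelation over the analyser buffer; a minimal sketch below. The search range is illustrative, and production code also needs windowing, interpolation, and octave-error handling.

```javascript
// Minimal autocorrelation pitch estimate over one buffer of time-domain
// samples. The 80-1000 Hz search range is an illustrative vocal range.
function estimatePitch(buf, sampleRate) {
  const minLag = Math.floor(sampleRate / 1000); // upper bound: 1000 Hz
  const maxLag = Math.floor(sampleRate / 80);   // lower bound: 80 Hz
  let bestLag = -1;
  let bestCorr = 0;

  for (let lag = minLag; lag <= maxLag; lag++) {
    let corr = 0;
    for (let i = 0; i + lag < buf.length; i++) corr += buf[i] * buf[i + lag];
    if (corr > bestCorr) { bestCorr = corr; bestLag = lag; }
  }
  return bestLag > 0 ? sampleRate / bestLag : null; // fundamental in Hz
}
```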
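Much of the API coordination reduces to a generic retry wrapper with exponential backoff; a sketch, with illustrative attempt counts and delays:

```javascript
// Generic retry wrapper with exponential backoff. The attempt count and base
// delay are illustrative values.
async function withRetry(fn, { attempts = 3, baseMs = 500 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries: surface the error
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
}

// Usage: wrap each service call so one flaky response doesn't kill the run.
// const lyrics = await withRetry(() => generateLyrics(features, genre, theme));
```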
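And for the race conditions, the pattern is a run token checked before each UI update, so results from an abandoned run are dropped rather than clobbering newer state; the names here are hypothetical.

```javascript
// Stale-result guard: each new generation bumps a token, and async results
// from an older run are ignored. Names are hypothetical.
let currentRun = 0;

async function startGeneration(steps, onUpdate) {
  const run = ++currentRun;         // token for this invocation
  for (const step of steps) {
    const result = await step();
    if (run !== currentRun) return; // a newer run started; drop this result
    onUpdate(result);
  }
}
```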
Accomplishments that we're proud of
- End-to-End Automation: Created a seamless pipeline from voice input to professional-sounding output without user intervention between steps
- Intelligent Feature Extraction: Implemented music theory-aware audio analysis that translates raw audio into meaningful musical descriptors
- Smart Prompt Engineering: Designed prompts that effectively communicate nuanced musical and visual requirements to AI models (prompt-builder sketch after this list)
- Responsive User Experience: Built intuitive UI with clear visual feedback for each generation step
- Versatile Output: The system works across multiple genres (Lo-fi Hip Hop, EDM, Jazz, Rock, etc.) and adapts to different musical inputs
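As a concrete example of that prompt engineering, the music prompt is assembled from the analysis and the lyrics; the template below is illustrative, not our exact production wording.

```javascript
// Illustrative template for the music-generation prompt; wording is a placeholder.
function buildMusicPrompt(features, genre, lyrics) {
  return [
    `Genre: ${genre}`,
    `Tempo: ${features.tempo} BPM, key: ${features.key}`,
    `Energy: ${features.energy}, melodic contour: ${features.contour}`,
    `Write a 30-second song with vocals singing these lyrics:`,
    lyrics,
  ].join("\n");
}
```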
What we learned
- Audio processing in the browser is surprisingly powerful with the Web Audio API, but accurate feature extraction still requires real signal-processing work
- Effective AI generation depends heavily on prompt quality and specificity
- Building a multi-service orchestration system requires careful state management and error handling
- The "reverse Shazam" concept opens up interesting possibilities for creative tools powered by AI
- User experience matters as much as technical capability; clear feedback during long-running operations is crucial
What's next for UnShazam
- Cloud Processing: Implement a backend service for more advanced audio analysis and generation, enabling higher-quality output
- Collaborative Features: Allow users to save, share, and remix songs within a community platform
- Customization Controls: Let users fine-tune generation parameters (BPM, energy level, vocal style, artwork theme)
- Real-time Genre Adaptation: Expand the genre library and implement intelligent genre recommendations based on the hummed melody
- Social Integration: Add sharing to social media with one-click export and embedded playback
- Model Fine-tuning: Train custom models on user preferences for personalized music generation
- Multi-language Lyrics: Support lyric generation in multiple languages
- Live Performance Mode: Enable real-time music generation during live sessions or improvisations
Built With
- claudeapi
- elevenlabs
- javascript
- mongodb
- node.js
- stablediffusion
- vite
- webaudioapi