Inspiration
We have always been fascinated by synesthesia, the phenomenon where experiencing one sense involuntarily triggers another. We wanted to build an experience that lets anyone "see" their music and instantly visualize the mood of their environment, and that led us to ask how the same idea could help deaf and hard-of-hearing people. The project also lowers the barrier to entry for music creation: a user can describe a song concept and create it without being able to hear it, then examine the generated snippets to understand how the song is structured and recognize its tempo.
How we built it
Our project is a full-stack application built with React and TypeScript, centered on two primary engines and a supporting backend:
- Audio Spectrogram Visualizer: The flagship feature runs entirely on the client side using the Web Audio API. We process audio in real time from three input modes: file upload, speaker/tab capture, and microphone input. The engine extracts 25 distinct audio features, such as energy, brightness, tempo, Zero-Crossing Rate (ZCR), and Root Mean Square (RMS), and plots each one in its own color so the different aspects of the audio are easy to tell apart (a minimal sketch of the feature extraction appears after this list).
- AI-Driven Music Generation Pipeline: We built a multi-step, automated backend pipeline. When a user describes a mood, we use the Vercel AI SDK to generate a song blueprint and per-section plans, and that blueprint is sent to the ElevenLabs API to synthesize the actual audio (a simplified sketch of the pipeline, including the storage step, follows this list).
- Backend & Storage: Once generated, the audio file is securely uploaded via UploadThing, and the track's metadata is persisted in a Neon PostgreSQL database managed by Drizzle ORM.
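To give a sense of how the visualizer's feature extraction works, here is a minimal sketch of computing two of the 25 features (RMS and zero-crossing rate) from a Web Audio `AnalyserNode` on each animation frame. The names and structure are illustrative, not copied from our actual engine.

```typescript
// Minimal sketch: derive RMS and zero-crossing rate from an AnalyserNode's
// time-domain buffer on every animation frame.
const audioCtx = new AudioContext();
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
// A source (microphone, <audio> element, or captured tab audio) would be
// connected here, e.g. source.connect(analyser);

const timeDomain = new Float32Array(analyser.fftSize);

function extractFeatures(): { rms: number; zcr: number } {
  analyser.getFloatTimeDomainData(timeDomain);

  let sumSquares = 0;
  let crossings = 0;
  for (let i = 0; i < timeDomain.length; i++) {
    sumSquares += timeDomain[i] * timeDomain[i];
    // Count sign changes between consecutive samples.
    if (i > 0 && Math.sign(timeDomain[i]) !== Math.sign(timeDomain[i - 1])) {
      crossings++;
    }
  }

  const rms = Math.sqrt(sumSquares / timeDomain.length);
  const zcr = crossings / timeDomain.length;
  return { rms, zcr };
}

function draw() {
  const { rms, zcr } = extractFeatures();
  // ...plot each feature in its own color on a <canvas>...
  requestAnimationFrame(draw);
}
```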
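And here is a rough sketch of the generation-and-storage pipeline, under some assumptions: the blueprint schema, the `tracks` table, and the `synthesizeWithElevenLabs` helper are illustrative placeholders (the concrete ElevenLabs call is omitted), and the UploadThing response fields may differ between versions.

```typescript
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import { UTApi } from "uploadthing/server";
import { neon } from "@neondatabase/serverless";
import { drizzle } from "drizzle-orm/neon-http";
import { pgTable, serial, text } from "drizzle-orm/pg-core";

// Illustrative blueprint: a song title plus per-section plans.
const blueprintSchema = z.object({
  title: z.string(),
  sections: z.array(
    z.object({ name: z.string(), mood: z.string(), plan: z.string() })
  ),
});

// Illustrative table; the real schema lives in our Drizzle schema file.
const tracks = pgTable("tracks", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  fileUrl: text("file_url").notNull(),
});

const db = drizzle(neon(process.env.DATABASE_URL!));
const utapi = new UTApi();

// Placeholder: synthesize audio from the blueprint via the ElevenLabs API.
// The concrete endpoint/SDK call is intentionally omitted in this sketch.
async function synthesizeWithElevenLabs(
  blueprint: z.infer<typeof blueprintSchema>
): Promise<Blob> {
  throw new Error("not implemented in this sketch");
}

export async function generateTrack(moodDescription: string) {
  // 1. The LLM "directs" the audio model by writing a structured blueprint.
  const { object: blueprint } = await generateObject({
    model: openai("gpt-4o"),
    schema: blueprintSchema,
    prompt: `Plan a short song for this mood: ${moodDescription}`,
  });

  // 2. Synthesize the audio for the blueprint.
  const audio = await synthesizeWithElevenLabs(blueprint);

  // 3. Upload the file via UploadThing, then persist metadata in Neon.
  const uploaded = await utapi.uploadFiles(
    new File([audio], `${blueprint.title}.mp3`, { type: "audio/mpeg" })
  );
  if (uploaded.error || !uploaded.data) throw new Error("Upload failed");

  await db.insert(tracks).values({
    title: blueprint.title,
    fileUrl: uploaded.data.url,
  });
}
```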
The challenges we faced
- Real-Time Digital Signal Processing (DSP): Extracting 25 features concurrently at 60 FPS without dropping frames was computationally intensive. We had to heavily optimize our Web Audio API audio worklets (a simplified worklet sketch follows this list).
- Pipeline Orchestration: Connecting multiple asynchronous services required robust error handling to ensure users wouldn't end up with orphaned audio files or broken database records.
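To illustrate the kind of optimization involved, here is a simplified `AudioWorkletProcessor` sketch that keeps per-sample work off the main thread and throttles messages to roughly 60 per second. It assumes the AudioWorklet global types are available (e.g. via `@types/audioworklet`); the class and message shape are illustrative, not our actual worklet.

```typescript
// feature-processor.ts, loaded with audioCtx.audioWorklet.addModule(...).
// Computes RMS per 128-sample render quantum inside the audio thread and
// posts a throttled stream of results back to the main thread for drawing.
class FeatureProcessor extends AudioWorkletProcessor {
  private lastPost = 0;

  process(inputs: Float32Array[][]): boolean {
    const channel = inputs[0]?.[0];
    if (channel) {
      let sumSquares = 0;
      for (let i = 0; i < channel.length; i++) {
        sumSquares += channel[i] * channel[i];
      }
      const rms = Math.sqrt(sumSquares / channel.length);

      // Throttle messages so the main thread only redraws ~60 times/second.
      if (currentTime - this.lastPost > 1 / 60) {
        this.port.postMessage({ rms });
        this.lastPost = currentTime;
      }
    }
    return true; // keep the processor alive
  }
}

registerProcessor("feature-processor", FeatureProcessor);
```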
What we learned
- Advanced Audio Analysis: We learned how to analyze frequency bins and time-domain data in depth using the Web Audio API, gaining a practical understanding of concepts like spectral flatness, tonnetz, and dynamic range (a spectral-flatness example follows this list).
- AI Tool Chaining: We discovered how to effectively chain different AI models together—using an LLM to "direct" an audio-generation model yields much more structured and coherent musical results than prompting the audio model directly.
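As an example of one of these concepts, spectral flatness is the geometric mean of the magnitude spectrum divided by its arithmetic mean: values near 1 indicate noise-like audio, values near 0 indicate tonal audio. Below is an illustrative helper computed from an `AnalyserNode`'s frequency bins; our engine's actual implementation differs.

```typescript
// Spectral flatness = geometric mean / arithmetic mean of the magnitude
// spectrum. The AnalyserNode reports bins in dB, so convert to linear first.
function spectralFlatness(analyser: AnalyserNode): number {
  const bins = new Float32Array(analyser.frequencyBinCount);
  analyser.getFloatFrequencyData(bins); // values in dBFS

  let logSum = 0;
  let sum = 0;
  for (let i = 0; i < bins.length; i++) {
    const magnitude = Math.pow(10, bins[i] / 20); // dB -> linear magnitude
    logSum += Math.log(magnitude + 1e-12); // epsilon avoids log(0)
    sum += magnitude;
  }

  const geometricMean = Math.exp(logSum / bins.length);
  const arithmeticMean = sum / bins.length;
  return arithmeticMean > 0 ? geometricMean / arithmeticMean : 0;
}
```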
Built With
- canvas-api
- drizzle-orm
- elevenlabs
- neon-postgresql
- openai
- playwright
- react
- scss-(openprops)
- typescript
- uploadthing
- vercel
- vercel-ai-sdk
- web-audio-api
- webspatial