Inspiration

One of our teammates produces music, and the rest of us always said we wanted to learn but never really knew how to start. Every time we opened a DAW, it felt overwhelming. Too many knobs, too many tracks, too many places to click. We had ideas, we had melodies in our heads, but the actual process of turning that into a song felt impossible.

We kept coming back to the same feeling: music shouldn’t feel inaccessible. It shouldn’t require years of technical training just to express a creative idea. We wanted something that gave people momentum instead of stopping them at the starting line.

That idea became the inspiration for this project. AI shouldn’t take over the creative process; it should support it, guide it, and help people develop their own artistic voice.


What it does

Mixr acts like a co-producer. You start by talking to an AI agent about your ideas, genre, mood and references. If you have a track you like, you can upload it. The system analyzes it and extracts BPM, key, scale, energy, and genre using Essentia.js and TensorFlow.js.

From that conversation and the reference analysis, the agent creates a SongSpec, which becomes the blueprint for your track. Then you move into the Studio:

  • It auto-creates a DAW session with tracks, instruments, effects and structure.
  • You can generate MIDI or audio loops.
  • You can type natural language commands like “make the chords more melancholic” or “swap the drums to a trap kit”.
  • You can ask for production advice, arrangement tips, EQ suggestions or mix help.

It gives you a real starting point and a real workflow, without having to know everything beforehand.
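
To make commands like “make the chords more melancholic” concrete, here is a minimal sketch of one way a mood word could be turned into parameter changes. It is not Mixr's actual implementation: the `ChordParams` shape, the `DESCRIPTORS` table and every numeric value in it are assumptions chosen for illustration. In the real system the agent makes these decisions, so a fixed lookup table like this would at most serve as a simple fallback.

```typescript
// Hypothetical parameter set for a chord track; not Mixr's actual schema.
interface ChordParams {
  mode: "major" | "minor";
  tempoScale: number;     // multiplier applied to the session BPM
  filterCutoffHz: number; // low-pass cutoff on the instrument
  reverbMix: number;      // 0 (dry) to 1 (fully wet)
}

type Adjustment = (p: ChordParams) => ChordParams;

// Assumed descriptor table: each mood word maps to a small parameter nudge.
const DESCRIPTORS: Record<string, Adjustment> = {
  melancholic: (p) => ({
    ...p,
    mode: "minor",
    tempoScale: p.tempoScale * 0.95,
    reverbMix: Math.min(1, p.reverbMix + 0.15),
  }),
  warm: (p) => ({ ...p, filterCutoffHz: Math.max(200, p.filterCutoffHz * 0.7) }),
  bright: (p) => ({ ...p, filterCutoffHz: Math.min(18000, p.filterCutoffHz * 1.4) }),
};

// Apply every known descriptor found in the command, in table order.
// Words that are not in the table are ignored, so unknown commands
// return the parameters unchanged.
function applyCommand(command: string, params: ChordParams): ChordParams {
  const words = new Set<string>(command.toLowerCase().match(/[a-z]+/g) ?? []);
  let next = params;
  for (const [word, adjust] of Object.entries(DESCRIPTORS)) {
    if (words.has(word)) next = adjust(next);
  }
  return next;
}
```

Each adjustment returns a new object instead of mutating its input, which keeps the previous state available if the user wants to undo a change.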


How we built it

We built the front end using Next.js 15, React, Tailwind CSS and ShadCN UI, and used Tone.js to power synthesis, scheduling and audio playback directly in the browser.

For audio analysis, we used:

  • Essentia.js (WASM) to extract tempo, key, loudness, energy and timbre features
  • TensorFlow.js MusiCNN to classify genre and mood
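
As an illustration of the classification step, the sketch below reduces per-patch tag scores to a short list of genre and mood labels. It assumes the classifier returns one array of scores per analyzed audio patch, with one score per tag; that output shape, the `topTags` helper and the label names in the usage note are assumptions for this example, not Mixr's actual code.

```typescript
interface TagScore {
  label: string;
  score: number;
}

// Average each tag's score across all patches, then return the k
// highest-scoring labels in descending order of mean score.
function topTags(patchScores: number[][], labels: string[], k: number): TagScore[] {
  if (patchScores.length === 0) return [];
  const means = labels.map((label, i) => {
    // Treat a missing entry as 0 so ragged input does not produce NaN.
    const sum = patchScores.reduce((acc, patch) => acc + (patch[i] ?? 0), 0);
    return { label, score: sum / patchScores.length };
  });
  return means.sort((a, b) => b.score - a.score).slice(0, k);
}
```

For example, calling `topTags(scores, ["rock", "electronic", "sad"], 2)` returns the two labels with the highest mean score over the whole track, which smooths out patches where the classifier is briefly unsure.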

The core brain of the system is a Google Gemini Agent, orchestrated with LangChain and LangGraph. It maintains state, handles audio insights, writes and reads SongSpec JSON files, and responds to natural language changes in the session.
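
Since the SongSpec is passed around as JSON, a typed shape with validation on read helps catch malformed agent output early. The sketch below is a guess at what such a spec could look like, built from the features named in this writeup (genre, mood, BPM, key, scale, structure). The field names and the `parseSongSpec` and `serializeSongSpec` helpers are hypothetical.

```typescript
type Scale = "major" | "minor";

interface Section {
  name: string; // e.g. "intro", "verse", "drop"
  bars: number;
}

// Hypothetical SongSpec shape; the real schema may differ.
interface SongSpec {
  title: string;
  genre: string;
  mood: string;
  bpm: number;
  key: string; // tonic, e.g. "A"
  scale: Scale;
  sections: Section[];
}

function isScale(v: string): v is Scale {
  return v === "major" || v === "minor";
}

// Parse and validate a SongSpec from JSON text, throwing on bad input.
function parseSongSpec(json: string): SongSpec {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("SongSpec must be a JSON object");
  }
  const obj = raw as Record<string, unknown>;

  const str = (field: string): string => {
    const v = obj[field];
    if (typeof v !== "string" || v.length === 0) {
      throw new Error(`SongSpec.${field} must be a non-empty string`);
    }
    return v;
  };

  const bpm = obj["bpm"];
  if (typeof bpm !== "number" || !(bpm > 0)) {
    throw new Error("SongSpec.bpm must be a positive number");
  }
  const scale = str("scale");
  if (!isScale(scale)) throw new Error("SongSpec.scale must be major or minor");

  const rawSections = obj["sections"];
  if (!Array.isArray(rawSections)) throw new Error("SongSpec.sections must be an array");
  const sections: Section[] = rawSections.map((item: unknown, i: number) => {
    const name = (item as Record<string, unknown> | null)?.["name"];
    const bars = (item as Record<string, unknown> | null)?.["bars"];
    if (typeof name !== "string" || typeof bars !== "number") {
      throw new Error(`SongSpec.sections[${i}] is malformed`);
    }
    return { name, bars };
  });

  return {
    title: str("title"), genre: str("genre"), mood: str("mood"),
    bpm, key: str("key"), scale, sections,
  };
}

// Write a SongSpec back out as pretty-printed JSON.
function serializeSongSpec(spec: SongSpec): string {
  return JSON.stringify(spec, null, 2);
}
```

Validating at the boundary means the rest of the session code can rely on the types instead of re-checking every field the agent produced.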

For generative audio samples, we integrated ElevenLabs.

We designed the flow so the conversation naturally transitions into production, and the AI acts as a collaborative assistant rather than a replacement producer.


Challenges we ran into

  • Running both Essentia.js and TensorFlow.js in the browser created performance and memory challenges, especially with larger audio files.
  • Getting reliable tempo and key detection across very different songs required fallback logic and manual overrides.
  • Mapping natural language like “more warm” or “more melancholic” into actual production changes took a lot of iteration.
  • Integrating multiple AI tools while keeping latency low was harder than expected.
  • Designing a UI that feels like a DAW but stays simple enough for newcomers was a balancing act.
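
The tempo fallback and manual override mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the logic Mixr ships: it assumes the detector's confidence has been normalized to a 0 to 1 range, and the thresholds, the BPM range and the 120 BPM default are illustrative values. The `resolveTempo` and `foldTempo` names are hypothetical.

```typescript
interface TempoEstimate {
  bpm: number;
  confidence: number; // assumed to be normalized to [0, 1]
}

interface TempoOptions {
  manualBpm?: number;     // user-supplied override; wins when present
  minBpm?: number;        // lower bound of the plausible range
  maxBpm?: number;        // upper bound; should be at least 2 * minBpm
  minConfidence?: number; // below this, the estimate is discarded
  defaultBpm?: number;    // used when nothing else is usable
}

interface ResolvedTempo {
  bpm: number;
  source: "manual" | "detected" | "default";
}

// Fold a positive tempo into [min, max) by doubling or halving, which
// corrects common half-time and double-time detections.
function foldTempo(bpm: number, min: number, max: number): number {
  let t = bpm;
  while (t < min) t *= 2;
  while (t >= max) t /= 2;
  return t;
}

function resolveTempo(estimate: TempoEstimate | null, opts: TempoOptions = {}): ResolvedTempo {
  const { manualBpm, minBpm = 70, maxBpm = 180, minConfidence = 0.5, defaultBpm = 120 } = opts;

  // 1. A valid manual override always takes priority.
  if (manualBpm !== undefined && Number.isFinite(manualBpm) && manualBpm > 0) {
    return { bpm: manualBpm, source: "manual" };
  }
  // 2. Otherwise use the detection if it is sane and confident enough.
  if (
    estimate !== null &&
    Number.isFinite(estimate.bpm) && estimate.bpm > 0 &&
    estimate.confidence >= minConfidence
  ) {
    return { bpm: Math.round(foldTempo(estimate.bpm, minBpm, maxBpm)), source: "detected" };
  }
  // 3. Fall back to a neutral default.
  return { bpm: defaultBpm, source: "default" };
}
```

Returning the `source` alongside the value lets the UI tell the user whether the tempo was detected or guessed, which is a natural place to offer the manual override.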

Accomplishments that we're proud of

  • We built a complete end-to-end flow: chat with an agent, analyze a reference track, generate a SongSpec, and automatically scaffold a full studio session.
  • We made music production feel approachable, even to team members who always wanted to try but didn’t know where to start.
  • We proved that a browser-based DAW with real analysis and generative tools is possible.
  • We created a system where creativity comes first and the technical work follows naturally.

What we learned

  • How to integrate advanced audio DSP and ML models directly in a web environment.
  • How to structure an AI agent that feels like a collaborator rather than an answer machine.
  • How hard it actually is to translate feelings into musical structure, and how rewarding it is when you get it right.
  • How much detail goes into building even the simplest parts of a DAW.
  • That lowering the barrier to entry doesn’t just help beginners; it inspires everyone in the room.

What's next for Mixr

  • Multi-track reference analysis (upload multiple songs and blend their features).
  • More AI-driven arrangement tools like automatic transitions, breakdowns and drops.
  • Deeper mixing tools that suggest EQ moves, compressors, spatial effects and gain staging.
  • Collaboration mode so multiple users can co-produce in the same session.
  • Exporting to professional DAWs like Ableton, FL Studio or Logic.
  • Mobile version for sketching ideas on the go.
  • A library of preset “production styles” created by real producers.

Built With

  • elevenlabs-api
  • essentia.js
  • github
  • google-gemini-api
  • javascript
  • json
  • langchain
  • langgraph
  • local-filesystem
  • lucide-react
  • musicnn
  • next.js
  • node.js
  • npm
  • react
  • shadcn-ui
  • tailwind-css
  • tensorflow.js
  • tensorflow/tfjs-node
  • tone.js
  • typescript
  • vercel
  • wasm
  • web-audio-api