Inspiration
Most AI music tools start with a text prompt. But that is not how real producers create music.
This idea came from our own experience trying to create music, and from watching close friends go through the same thing. Starting a beat or melody is exciting, but finishing it is much harder when you do not have access to mentors, engineers, expensive production courses, or experienced people who can give useful feedback. A lot of independent creators have ideas with real potential, but limited resources make the process slower, more confusing, and more frustrating than it should be.
Professional producers often have mentors, bandmates, engineers, and A&R teams helping them shape unfinished ideas into polished songs. Independent creators usually have to figure everything out alone.
That inspired us to build TrackSmith, an agentic AI music production assistant that lives inside a producer’s workflow. Instead of replacing creativity, TrackSmith supports it. It listens to music the user has already made, understands the track, gives feedback, generates ideas, separates stems, and writes results back into the production session.
Our goal was simple: give independent creators the kind of creative feedback loop that professional producers already have.
What it does
TrackSmith is an agentic AI music production assistant that lives inside a producer’s Digital Audio Workstation (DAW).
A user can drag in an audio or MIDI file, such as a beat, loop, sample, or unfinished idea. TrackSmith listens to the track, analyzes it, gives real-time creative feedback, generates musical ideas, separates stems, and brings the output back into the user’s workflow.
The system is designed to support the artist instead of replacing them. It helps producers understand what their track needs next, hear possible directions, and continue building on their own creative idea.
TrackSmith’s pipeline includes:
- Track analysis for BPM, key, chord progression, pitch movement, energy, and structure
- NVIDIA Nemotron reasoning to understand what those musical features mean creatively
- GWEN validation to check whether suggestions are musically useful
- MusicGen generation to create audible previews of new ideas
- Demucs stem separation to split tracks into drums, bass, melody, and harmony
- DAW workflow integration so the output can be used inside the producer’s actual session
How we built it
We built TrackSmith as a multi-stage agentic pipeline where each model has a specific job.
The user begins by uploading or dragging in an audio or MIDI file. From there, TrackSmith analyzes the musical structure of the file, including BPM, key, chord progression, energy, pitch contour, and how the track changes over time.
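As a small illustration of the BPM step: once an audio front end has detected onset or beat timestamps, tempo falls out of the median inter-onset interval, folded into a musically sensible range. A hedged sketch, not the exact detector we use:

```python
# Sketch: derive BPM from a list of detected beat timestamps (in seconds).
# Onset detection itself would come from an audio library; this only shows
# the interval-to-tempo arithmetic.

from statistics import median

def bpm_from_onsets(onsets, lo=70.0, hi=180.0):
    """Median inter-onset interval -> BPM, folded into [lo, hi)."""
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    bpm = 60.0 / median(intervals)
    # Beat trackers often report half- or double-time, so halve or double
    # the estimate until it lands in the target tempo range.
    while bpm < lo:
        bpm *= 2
    while bpm >= hi:
        bpm /= 2
    return round(bpm, 1)

# Beats every 0.5 s -> 120 BPM.
print(bpm_from_onsets([0.0, 0.5, 1.0, 1.5, 2.0]))  # → 120.0
```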
We use NVIDIA Nemotron as the reasoning layer. Instead of only detecting musical features, Nemotron helps interpret what those features mean together. For example, it can reason about the mood, tension, and direction of a fast minor-key track instead of only labeling its tempo and key.
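The hand-off to the reasoning layer can be sketched as prompt construction: the extracted features are serialized into a chat-style request that asks the model to interpret them together. The wording below is illustrative and no network call is shown; the endpoint and model configuration are deployment details.

```python
# Sketch: pack analysis output into a chat-style prompt for the reasoning
# layer. Illustrative only -- the actual system prompt and request plumbing
# are assumptions, and no network call is made here.

import json

def build_reasoning_messages(features: dict) -> list:
    """Turn extracted musical features into a message list for the LLM."""
    system = (
        "You are a music production mentor. Given objective features of a "
        "track, describe its mood, tension, and direction, and suggest one "
        "concrete next step for the producer."
    )
    user = "Track features:\n" + json.dumps(features, indent=2)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_reasoning_messages(
    {"bpm": 140, "key": "A minor", "energy": "rising", "bars": 16}
)
print(messages[1]["content"])
```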
After that, the analysis passes through GWEN, our internal validation layer. GWEN checks whether the analysis is coherent and whether the suggestions would actually make sense to a producer.
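The flavor of those checks can be sketched as a rule list over the analysis/suggestion pair. The specific rules below are hypothetical; they only illustrate the idea of rejecting suggestions that contradict what was measured.

```python
# Sketch: GWEN-style sanity rules over an analysis + suggestion pair.
# The rule set is hypothetical and illustrative, not our actual validator.

VALID_MODES = {"major", "minor"}

def validate(analysis: dict, suggestion: dict) -> list:
    """Return a list of problems; an empty list means the suggestion passes."""
    problems = []
    if not 40 <= analysis.get("bpm", 0) <= 240:
        problems.append("implausible BPM in analysis")
    if analysis.get("mode") not in VALID_MODES:
        problems.append("unknown mode in analysis")
    # A suggested tempo change should stay near the track's existing feel.
    track_bpm = analysis.get("bpm", 0)
    if abs(suggestion.get("bpm", track_bpm) - track_bpm) > 20:
        problems.append("suggested BPM strays too far from the track")
    return problems

print(validate({"bpm": 140, "mode": "minor"}, {"bpm": 142}))  # → []
```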
Once validated, the pipeline moves into MusicGen, which generates an audible preview of where the track could go next. We also integrated Demucs for stem separation, allowing users to break the track into individual layers like drums, bass, melody, and harmony.
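Because MusicGen conditions generation on a free-form text description, the hand-off from validation to generation reduces to rendering one. The template below is our own illustrative wording, not a fixed MusicGen format:

```python
# Sketch: render validated analysis into the text description that
# conditions MusicGen. The template wording is illustrative; MusicGen
# accepts a free-form text prompt, so the pipeline just has to produce one.

def musicgen_prompt(analysis: dict, direction: str) -> str:
    """Combine track features with a suggested creative direction."""
    return (
        f"{analysis['genre']} track at {analysis['bpm']} BPM in "
        f"{analysis['key']}, {direction}, continuing the existing groove"
    )

prompt = musicgen_prompt(
    {"genre": "lo-fi hip hop", "bpm": 84, "key": "F major"},
    "adding a warm pad under the drums",
)
print(prompt)
```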
We also built the system with security in mind using NemoClaw. TrackSmith handles unreleased music, samples, and project files, so the agent should not have uncontrolled access to the user’s computer. NemoClaw lets us define strict policy rules for filesystem access, network access, and execution. The agent can read and write only from approved locations and call only the local endpoints it needs.
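To give a feel for those policy rules, here is a hypothetical sketch of what such a policy could look like. The schema and paths below are illustrative assumptions, not NemoClaw’s actual syntax; they just mirror the rules described above: approved filesystem locations, a short allow-list of local endpoints, and no arbitrary execution.

```yaml
# Hypothetical policy sketch -- illustrates the kinds of rules described
# above, not NemoClaw's actual schema.
filesystem:
  read:
    - ~/TrackSmith/projects/**       # approved project files only
  write:
    - ~/TrackSmith/output/**         # stems and previews land here
network:
  allow:
    - http://127.0.0.1:8080/analyze  # local analysis endpoint
    - http://127.0.0.1:8080/generate
execution:
  allow: []                          # no arbitrary shell commands
```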
Challenges we ran into
One of the biggest challenges was making multiple models work together as one coherent pipeline. Analysis, reasoning, validation, generation, and stem separation all produce different types of output, so we had to make sure each stage passed useful information to the next.
Latency was another major challenge. Music generation and source separation are computationally expensive. On consumer hardware, this type of workflow can take several minutes, which breaks the creative flow. A producer needs feedback quickly; otherwise the tool stops feeling like a collaborator and starts feeling like a slow batch processor.
We also had to think carefully about privacy and trust. Producers are protective of unreleased music, and for good reason. If an AI tool requires them to upload private tracks to a random cloud server, many would not use it. That pushed us to design TrackSmith around local execution, controlled access, and auditability.
Another challenge was explaining the product clearly. For musicians, TrackSmith is simple: drop in a track and get useful creative help. For technical judges, the value is the full agentic workflow: Nemotron reasoning, GWEN validation, MusicGen generation, Demucs separation, and NemoClaw-controlled execution.
Accomplishments that we're proud of
We are proud that TrackSmith is not just a prompt-to-music toy. It works with music the user has already made and helps them continue their own idea.
We are also proud of building an end-to-end pipeline where each step has a clear purpose. Nemotron analyzes and reasons, GWEN validates, MusicGen generates, Demucs separates stems, and NemoClaw keeps the agent inside safe boundaries.
Another accomplishment is that TrackSmith focuses on a creative domain that agentic AI often ignores. A lot of agentic AI projects are built around enterprise workflows like tickets, code review, documents, or operations. We wanted to apply agentic AI to something more personal and creative: helping independent artists finish music.
We are especially proud of the security design. OpenClaw gives an agent the ability to act, but NemoClaw makes those actions safe enough for a real studio. TrackSmith can assist the producer without gaining uncontrolled access to their files, system, or network.
What we learned
We learned that music production is not just about generating audio. A useful AI music assistant needs context, reasoning, validation, and control.
Generation by itself is not enough. Producers need feedback that understands what they already made. They need suggestions that fit the track. They need editable outputs like stems. And they need to trust that their private work stays protected.
We also learned how important speed is in creative tools. If a model takes too long, it interrupts the artist’s flow. Hardware acceleration is not just a technical improvement here; it changes whether the product feels usable.
Most importantly, we learned that the best creative AI tools should not take over the process. They should help artists stay inside it.
What's next for TrackSmith
Next, we want to make TrackSmith even more deeply integrated into DAWs like FL Studio, Ableton, and Logic Pro.
We want the assistant to become more conversational and context-aware, so producers can ask questions like:
- “How can I make this chorus hit harder?”
- “What should I add after this drop?”
- “Can you separate the drums and give me a variation?”
- “Why does this loop feel repetitive?”
- “How can I make this sound more cinematic or more club-ready?”
We also want to improve genre-specific feedback, real-time arrangement suggestions, and more controllable generation, so producers can guide the AI without giving up creative control.
Long term, TrackSmith could become a full creative assistant for independent artists: secure, local, fast, and built directly into the way music is actually made.
Built With
- asus-dgx10
- cuda
- demucs
- fl-studio
- javascript
- midi.js
- musicgen
- nemoclaw
- nemotron
- nvidia
- openclaw
- python
- react
- typescript
- yaml