Inspiration

Most assistive mixing systems are web-based: users upload their tracks and receive a finished mix back, which offers little control and no effective workflow. We wanted to integrate an assistant that applies per-track gain and pan settings to create a demo mix directly inside a DAW. We chose Audiotool for our project and use its recently released SDK for the task.

Impact

Democratises music production by giving creators an AI mixing assistant inside their existing workflow, which reduces friction and speeds up the process.

How we built it

This script automates the process of applying audio mixing settings (post-gain and panning) to the first eight channels of an Audiotool project. It uses an ONNX AI model (AImix_model.onnx) to generate these settings based on audio samples.

The biggest challenge in our case was that the system needs to access all tracks and apply effects back to all of them, so a typical single-track VST solution would not have worked. We list our approach and some of the workarounds for SDK limitations below.

1. Polyfill localStorage

Since the script runs in a Node.js environment, which lacks the browser's localStorage object, a simple in-memory mock is created. This is necessary to satisfy the Audiotool SDK, which may have a dependency on localStorage.
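
A minimal sketch of such a polyfill, assuming a Map as the backing store (the property set mirrors the browser Storage interface; exactly which subset the SDK touches is an assumption):

```ts
// In-memory localStorage mock: satisfies the SDK's dependency in Node.js.
// Nothing is persisted; data lives only for the lifetime of the process.
const store = new Map<string, string>();

(globalThis as any).localStorage = {
  getItem: (key: string) => store.get(key) ?? null,
  setItem: (key: string, value: string) => { store.set(key, String(value)); },
  removeItem: (key: string) => { store.delete(key); },
  clear: () => store.clear(),
  key: (index: number) => [...store.keys()][index] ?? null,
  get length() { return store.size; },
};
```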

2. Environment Variable Loading

The script robustly loads configuration from a .env file.

  • It searches for the .env file in both the current working directory and the script's parent directory.
  • It uses the dotenv library for initial loading.
  • A custom function, manualEnvInject, provides a fallback that parses the .env file manually (sketched below). It is designed to handle various text encodings and formats, ensuring that variables like AT_PAT (Audiotool Personal Access Token) and AT_PROJECT_URL are loaded correctly.
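
A simplified sketch of this loading logic; the directory list, the regex, and the BOM handling are illustrative, and the real manualEnvInject tolerates more encodings than shown here:

```ts
import dotenv from 'dotenv';
import * as fs from 'node:fs';
import * as path from 'node:path';

// Search the working directory and the script's parent directory
// (__dirname assumes a CommonJS build).
const searchDirs = [process.cwd(), path.dirname(__dirname)];

// First pass: normal dotenv loading.
for (const dir of searchDirs) {
  dotenv.config({ path: path.join(dir, '.env') });
}

// Fallback: re-read the file, strip a possible UTF-8 BOM, and inject
// any KEY=VALUE pairs that dotenv missed.
function manualEnvInject(dir: string): void {
  const envPath = path.join(dir, '.env');
  if (!fs.existsSync(envPath)) return;
  const text = fs.readFileSync(envPath, 'utf8').replace(/^\uFEFF/, '');
  for (const line of text.split(/\r?\n/)) {
    const match = line.match(/^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/);
    if (!match) continue; // skip comments and malformed lines
    const [, key, raw] = match;
    if (process.env[key] === undefined) {
      process.env[key] = raw.trim().replace(/^['"]|['"]$/g, '');
    }
  }
}

searchDirs.forEach(manualEnvInject);
```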

3. Argument Parsing

This section prepares the parameters required for the script's operation.

  • It sets a TARGET_COUNT of 8, meaning it will only modify the first eight mixer channels.
  • Helper functions parse comma-separated strings from environment variables into arrays of numbers; they tolerate different delimiters and whitespace (see the sketch after this list).
  • It retrieves the AT_PAT, AT_PROJECT_URL, and an optional AT_NEXUS_MODULE path from the environment variables and validates that the required ones are present.
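
A sketch of these helpers; the function name and the exact delimiter set are illustrative:

```ts
const TARGET_COUNT = 8; // only the first eight mixer channels are modified

// Tolerant list parser: accepts commas, semicolons, or whitespace as
// delimiters and drops anything that is not a finite number.
function parseNumberList(raw: string | undefined): number[] {
  if (!raw) return [];
  return raw
    .split(/[,;\s]+/)
    .map((token) => Number(token.trim()))
    .filter((n) => Number.isFinite(n));
}

const AT_PAT = process.env.AT_PAT;
const AT_PROJECT_URL = process.env.AT_PROJECT_URL;
const AT_NEXUS_MODULE = process.env.AT_NEXUS_MODULE; // optional override

if (!AT_PAT || !AT_PROJECT_URL) {
  throw new Error('AT_PAT and AT_PROJECT_URL must be set');
}
```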

4. Audiotool SDK Loading and Project Connection

This part of the script handles connecting to the Audiotool platform.

  • discoverAudiotoolCandidates: This function intelligently locates the installed Audiotool SDK (@audiotool/nexus) by checking common package names and the project's package.json.
  • loadAudiotoolModule: It imports the discovered SDK module (both steps are sketched after this list).
  • openSyncedProject: It establishes a connection to the Audiotool project specified by AT_PROJECT_URL. The function is compatible with multiple versions of the SDK's API for authentication and project synchronisation.
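
A sketch of the discovery and loading steps; the candidate order and the package.json scan are simplified, and the connection step (openSyncedProject) is omitted here because its SDK calls vary between versions:

```ts
import * as fs from 'node:fs';
import * as path from 'node:path';

// Prefer an explicit AT_NEXUS_MODULE override, then the known package
// name, then any "@audiotool/*" dependency listed in package.json.
function discoverAudiotoolCandidates(): string[] {
  const candidates: string[] = [];
  if (process.env.AT_NEXUS_MODULE) candidates.push(process.env.AT_NEXUS_MODULE);
  candidates.push('@audiotool/nexus');

  const pkgPath = path.join(process.cwd(), 'package.json');
  if (fs.existsSync(pkgPath)) {
    const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
    for (const name of Object.keys(pkg.dependencies ?? {})) {
      if (name.startsWith('@audiotool/')) candidates.push(name);
    }
  }
  return [...new Set(candidates)];
}

// Try candidates in order until one imports successfully.
async function loadAudiotoolModule(): Promise<any> {
  for (const name of discoverAudiotoolCandidates()) {
    try {
      return await import(name);
    } catch {
      // fall through to the next candidate
    }
  }
  throw new Error('Could not locate the Audiotool SDK');
}
```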

5. Applying the Mix

This is the core logic where the script modifies the Audiotool project.

  • applyMixToFirstEight: This function queries the project for all mixer channels, sorts them by their index, and selects the first eight.
  • It then iterates through these channels and applies the postGain and pan values generated by the AI model.
  • The modifications are performed using the nexus.modify function, which ensures changes are synced with the Audiotool server.
  • The script includes detailed logging for each step, indicating which channel is being modified and the values being applied (a simplified sketch follows).
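
A simplified sketch of this step. The channel query and the field names (queryMixerChannels, index, postGain, pan) are assumptions for illustration, and the exact signature of nexus.modify may differ:

```ts
async function applyMixToFirstEight(
  nexus: any,
  project: any,
  postGains: number[],
  pans: number[],
): Promise<void> {
  // Hypothetical query: fetch all mixer channels, sort by index, keep 8.
  const channels = (await project.queryMixerChannels())
    .sort((a: any, b: any) => a.index - b.index)
    .slice(0, TARGET_COUNT);

  for (const [i, channel] of channels.entries()) {
    console.log(`channel ${channel.index}: postGain=${postGains[i]}, pan=${pans[i]}`);
    // nexus.modify batches the change and syncs it with the Audiotool server.
    await nexus.modify(() => {
      channel.postGain = postGains[i];
      channel.pan = pans[i];
    });
  }
}
```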

6. Main Process

The main function orchestrates the entire workflow:

  1. It reads the audio sample file paths from the downloads directory.
  2. It calls the runModel function (from run-model.ts), passing the AI model, a music genre, the audio file paths, and a list of instruments. This function returns the POSTGAINS and PANS arrays.
  3. It loads the Audiotool SDK and connects to the project.
  4. It calls applyMixToFirstEight to apply the generated settings.
  5. Finally, it ensures the connection to the project is properly closed.

The script is executed by calling main(), with top-level error handling to catch any exceptions that may occur during the process.
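
A condensed sketch of this orchestration, reusing the helpers sketched in the sections above. The genre and instrument values are hypothetical examples, and the signatures of openSyncedProject and project.close are assumptions:

```ts
import * as fs from 'node:fs';
import * as path from 'node:path';
import { runModel } from './run-model';

// openSyncedProject is described in section 4; its signature is assumed.
declare function openSyncedProject(nexus: any, pat: string, url: string): Promise<any>;

const GENRE = 'electronic';                              // hypothetical example
const INSTRUMENTS = ['kick', 'snare', 'bass', 'synth'];  // one label per track

async function main(): Promise<void> {
  // 1. Collect the audio samples to analyse.
  const files = fs
    .readdirSync('downloads')
    .map((f) => path.join('downloads', f));

  // 2. Run inference; runModel returns the POSTGAINS and PANS arrays.
  const { POSTGAINS, PANS } = await runModel('AImix_model.onnx', GENRE, files, INSTRUMENTS);

  // 3. Load the SDK and connect to the project.
  const nexus = await loadAudiotoolModule();
  const project = await openSyncedProject(nexus, AT_PAT!, AT_PROJECT_URL!);

  try {
    // 4. Apply the generated settings.
    await applyMixToFirstEight(nexus, project, POSTGAINS, PANS);
  } finally {
    // 5. Always close the project connection.
    await project.close();
  }
}

main().catch((err) => {
  console.error('mixing assistant failed:', err);
  process.exit(1);
});
```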

We train the AI model from https://github.com/csteinmetz1/automix-toolkit/tree/main on the MedleyDB and Mixing Secrets research datasets. Once it is trained, we export an ONNX checkpoint, which enables easy integration with JavaScript and is much smaller than the original model.
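
For completeness, a sketch of how the ONNX checkpoint can be loaded from Node.js with onnxruntime-node; the input name, tensor shape, and output layout are assumptions that depend on how the model was exported:

```ts
import * as ort from 'onnxruntime-node';

async function runOnnxMix(
  samples: Float32Array, // all tracks, concatenated
  tracks: number,
  frames: number,
) {
  const session = await ort.InferenceSession.create('AImix_model.onnx');
  const input = new ort.Tensor('float32', samples, [1, tracks, frames]);
  // 'audio' is an assumed input name; check the exported model's graph.
  return session.run({ audio: input });
}
```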

Challenges we ran into

Adapting the backend and frontend

We initially found it difficult to keep track of the order in which samples appear in the project because of API limitations, and also to import and track instrument names.

Accomplishments that we're proud of

We successfully built the backend overnight, and we had a great team with varied skills in which everyone could take on independent tasks.

What we learned

We learnt how to work with the Audiotool API and discovered that we work together very well as a team.

What's next for Mixing Assistant

We want to add support for text conditioning in both the model and the UI. We also want to expand the backend to dynamically support any number of tracks (the model currently limits this to 16). Further, and a bit more challenging, we would like to support more audio effects beyond gain and panning.
