✨ Inspiration

For every music producer, game developer, or video editor, there's a universally shared pain point: the endless hunt for that "perfect" sound. The one you can hear in your head—a specific "whoosh," a "thud," or a "bleep"—but can't find by digging through countless folders or typing keyword after keyword online.

We believed the solution was already within us: the person who knows the sound best is you, and the best tool for expressing it is your voice. What if you could simply imitate the sound you're imagining with your voice, and AI could match it to real sounds from existing sound collections?

What it does

soundslike is a desktop app (and website: thatsoundslike.me) that turns your voice into an audio search engine. It's incredibly simple:

  1. Imitate: Open the app and imitate the sound you're looking for into your microphone.

  2. Discover: soundslike instantly analyzes your voice and finds the most similar sounds from a massive library. No more manual searching!

🛠️ How we built it

We believe in the power of AI to enable new applications and lower the barrier to entry for music production. However, two things have concerned us:

  • There has been an over-fixation on typed text as the only way to interface with AI.
  • There has been a trend of opaqueness around AI training data and little consideration for the interests of musicians.

We set out to build a solution that 1) uses an interface more natural to music production, the human voice, and 2) respects the user's privacy and the licensing of audio data.

soundslike consists of two main parts:

  1. AI processing in the frontend: We adapted our award-winning query-by-vocal-imitation model to a portable, lightweight format fast enough to run on any platform, including in the browser. This lets us extract a vector from the user's recording on-device, without ever sending the voice to a server. The vector captures what the user might be imitating without retaining characteristics that could reveal their identity.

  2. Similarity search in the backend: We indexed tens of thousands of sound vectors from the Freesound FSD50K dataset, which contains permissively licensed, user-submitted sounds along with their licenses. Our backend searches for the vectors most similar to the one submitted by the user, then returns Freesound URLs that the user can listen to and, potentially, download for their own use. Optionally, the user can submit anonymized feedback about the quality of the retrieved sounds, which helps us improve the model!
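The two steps above can be sketched end-to-end. This is a minimal illustration, not our actual implementation: a hand-rolled spectral encoder (a pooled log-magnitude spectrum) stands in for the trained on-device model, random arrays stand in for FSD50K clips, and a brute-force dot product stands in for the backend's vector search. All function names and dimensions here are illustrative assumptions.

```python
import numpy as np

EMBED_DIM = 64  # illustrative; the real model's embedding size may differ


def embed(audio: np.ndarray) -> np.ndarray:
    """Stand-in encoder: pool a log-magnitude spectrum into a fixed-size,
    centered, L2-normalized vector. The real app runs a trained
    query-by-vocal-imitation model on-device instead."""
    spec = np.abs(np.fft.rfft(audio))                   # magnitude spectrum
    spec = spec[: spec.size - spec.size % EMBED_DIM]    # trim to a multiple of EMBED_DIM
    vec = np.log1p(spec).reshape(EMBED_DIM, -1).mean(axis=1)  # pool into bands
    vec -= vec.mean()                  # center so cosine acts like correlation
    return vec / (np.linalg.norm(vec) + 1e-9)


def top_k(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Brute-force cosine search: with unit vectors, similarity is a dot product."""
    scores = index @ query
    return np.argsort(scores)[::-1][:k]


# Toy "library": random audio clips stand in for indexed FSD50K sounds.
rng = np.random.default_rng(0)
library = [rng.standard_normal(4096) for _ in range(100)]
index = np.stack([embed(clip) for clip in library])

# A "vocal imitation" of clip 42: a quieter, noisier version of it.
imitation = 0.9 * library[42] + 0.1 * rng.standard_normal(4096)
results = top_k(embed(imitation), index)
print(results)  # results[0] should be 42, the imitated clip
```

At tens of thousands of vectors, a brute-force dot product over a pre-normalized matrix is still fast; an approximate nearest-neighbor index only becomes worthwhile at much larger scales.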

🧑‍🏫 What we learned

We learned a lot! As researchers, we were quite unfamiliar with the complexities of getting an app deployed. Not only are web technologies complex, but there were many variables we didn't even know could be an issue: the delay in engaging a microphone in the browser, how noisy real recordings can be, and how many fallbacks were needed to keep the app functional in real circumstances.
