When we heard about the capabilities of AssemblyAI, we immediately started thinking about applications that do not support voice commands. We thought it would be really cool to take one of those applications and let you control it with your voice, making it more accessible.

What it does

Hey Spotify is a new front end that runs alongside Spotify and lets you control it with your voice. You can play songs and playlists, skip forward and back in your song queue, add songs to your queue, and more. Essentially, it makes Spotify much easier to use because all you need is your voice. It also provides a clean, easy-to-read interface that should be much more accessible to everyone than the cluttered, small design featured in Spotify.

How we built it

We built it completely in C#, using WPF for the front end. For speech to text, we used AssemblyAI's real-time streaming transcription, which required us to connect over WebSockets. We also used NAudio to capture a stream from the microphone so that we could send audio to AssemblyAI in real time for processing. Finally, we wrote quite a few commands that hook into Spotify's REST API so that we could control Spotify using our voice. This also required us to come up with ways to parse the transcribed text intelligently so that our system could respond reliably to different phrasings of the same request.
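
To give a feel for what parsing different forms of text can look like, here is a minimal sketch in Python (not our C# code). The `Command` type, the `parse_command` function, and the specific phrasings it accepts are all hypothetical and chosen only for illustration; the real command set and rules may differ.

```python
import re
from typing import NamedTuple, Optional


class Command(NamedTuple):
    """Hypothetical parsed command: an action plus an optional search query."""
    action: str
    query: Optional[str] = None


# Leading filler we choose to ignore (an assumption for this sketch).
_FILLER = re.compile(r"^(?:hey spotify|please|can you|could you)[,\s]+")


def parse_command(transcript: str) -> Optional[Command]:
    """Map a transcript to a Command, or None if nothing matches."""
    text = transcript.lower().strip().rstrip(".!?")

    # Strip filler phrases repeatedly, e.g. "hey spotify, please play ...".
    while True:
        stripped = _FILLER.sub("", text)
        if stripped == text:
            break
        text = stripped

    if re.fullmatch(r"(?:skip|next)(?: (?:this )?(?:song|track))?", text):
        return Command("next")
    if re.fullmatch(r"(?:go back|previous)(?: (?:a )?(?:song|track))?", text):
        return Command("previous")

    match = re.fullmatch(
        r"(?:add|queue)(?: up)? (.+?)(?: to (?:my |the )?queue)?", text
    )
    if match:
        return Command("queue", match.group(1))

    match = re.fullmatch(r"play (.+)", text)
    if match:
        return Command("play", match.group(1))

    return None
```

For example, `parse_command("Hey Spotify, play Daft Punk")` and `parse_command("please play daft punk.")` both reduce to the same `play` command, which is the kind of tolerance to phrasing we were after.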

Challenges we ran into

Getting hooked into the AssemblyAI API was difficult at first. Because we were working in C#, which had no documentation for real-time transcription, we had to figure it out ourselves. This involved connecting over WebSockets and sending correctly formatted data. Another issue was getting the real-time microphone data. For this, we had to use NAudio and set the correct sample rate, channel count, and so on to get buffers that we then encoded and packaged up as JSON for AssemblyAI to process.
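
The encode-and-package step can be sketched as follows. This is an illustrative Python version, not our C# code, and it makes several assumptions: 16 kHz, mono, 16-bit PCM audio; base64 as the encoding; and `audio_data` as the JSON field name. Check the current AssemblyAI documentation for the exact message format before relying on any of these. The helper names are hypothetical.

```python
import base64
import json
import struct

SAMPLE_RATE = 16000  # assumed sample rate in Hz
CHUNK_MS = 100       # assumed chunk length in milliseconds


def make_silent_chunk(sample_rate: int = SAMPLE_RATE, chunk_ms: int = CHUNK_MS) -> bytes:
    """Stand-in for a microphone buffer: 16-bit little-endian mono silence."""
    sample_count = sample_rate * chunk_ms // 1000
    return struct.pack("<%dh" % sample_count, *([0] * sample_count))


def package_chunk(pcm_bytes: bytes) -> str:
    """Base64-encode raw PCM bytes and wrap them in a JSON message.

    The "audio_data" key is an assumption about the expected message shape.
    """
    encoded = base64.b64encode(pcm_bytes).decode("ascii")
    return json.dumps({"audio_data": encoded})


# Each message like this would be sent as one text frame over the WebSocket.
message = package_chunk(make_silent_chunk())
```

In the real application the buffer comes from the audio capture callback instead of `make_silent_chunk`, but the packaging step has the same shape.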

Also, when recording the video, it turned out that screen recording takes a lot of resources on the laptop we used. As a result, the application is slower in the video than in normal use, where responses are pretty much instant. We spent as much time as possible reducing the impact of the recording so that the video shows our product closer to how it really performs.

Accomplishments that we're proud of

We were really proud of our overall algorithm for processing the text given to us by AssemblyAI. While not shown in the video due to time constraints, it can handle quite different forms of input and respond reliably. We were also really proud of our algorithm for figuring out which song you wanted when the transcription went wrong. We start with a list of songs that might be the one you asked for, and the first result in that list is not always the right one. So we wrote some code that adds tolerance to help figure out what you really wanted. This means that saying "play around the world by that funk" would still get you the right song even though the transcript said "that funk" instead of "daft punk".
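
One simple way to add this kind of tolerance is to re-rank the candidate songs by how similar each one is to the transcribed request. The sketch below is in Python and uses the standard library's `difflib.SequenceMatcher`; it is an illustration of the idea, not the algorithm we actually shipped. The `pick_best` function, the candidate list, and the `min_score` threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Song = Tuple[str, str]  # (title, artist)


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_best(query: str, candidates: List[Song], min_score: float = 0.6) -> Optional[Song]:
    """Pick the candidate whose "title by artist" text is closest to the query.

    Returns None when even the best candidate scores below min_score
    (an arbitrary threshold chosen for this sketch).
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda song: similarity(query, f"{song[0]} by {song[1]}"))
    if similarity(query, f"{best[0]} by {best[1]}") < min_score:
        return None
    return best


# Illustrative search results; the wanted song is deliberately not first.
candidates = [
    ("Around the World", "Red Hot Chili Peppers"),
    ("All Around the World", "Justin Bieber"),
    ("Around the World", "Daft Punk"),
]
choice = pick_best("around the world by that funk", candidates)
```

Here "that funk" is still much closer to "daft punk" than to the other artists, so the Daft Punk track wins even though it was not the first result.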

We were also really proud of the UI and interaction. Everything is really clean and very accessible, which is what we set out to do.

Finally, we were really proud that we only listen for input when requested (i.e., we are not always recording). Always-on listening is a major privacy concern that we have experienced with devices such as the Amazon Alexa, and we always want to promote privacy. So our application only records and sends data to be processed when you have explicitly pushed the button. Otherwise, the microphone is doing nothing and we cannot "spy" on you.

What we learned

We got a better understanding of working with different APIs in C#, which is sometimes not the most fun (especially as you get into async). We also learned how to read microphone input in real time, which is trickier in C# than you would think. Finally, we had never hooked into an API using a WebSocket before, so it was really neat to get that working.

What's next for Hey Spotify

We would really like to flesh out the app even more. First off, we want to add the ability to create and modify playlists, including adding and removing songs. We also want to add more displays to our UI, such as a prominent queue display that makes it easy to see what you have coming up.
