Inspiration
You may still remember Talking Tom, the app that took the internet by storm in 2010. Instead of a cat's voice, what if we tried to mimic the human voice with the sound of a piano? Of course, the idea of a piano talking back to you might be creepy. But as a group of four outgoing musicians in quarantine, we have grown comfortable with the idea of talking to a piano. In fact, many composers have drawn musical ideas from the way people and nature speak.
For example, the French composer Messiaen transcribed birdsong for the piano. In today's era of digital music, we take this analogy between speech and music to a more rigorous, algorithmic level by converting human speech into a piano composition.
What it does
We designed an elegant, modern, user-friendly web app that converts audio to piano music. The user first uploads a recording of their voice and sets the parameters. Then, in real time, our system processes the recording and converts it into piano music. After the conversion, the user can access three different representations of their new piano voice: they can listen to the piano audio, watch an animation of the piano roll, and view the sheet music.
How I built it
We use a technique called the short-time Fourier transform (a fast Fourier transform applied over short, sliding windows) to map the human voice into a spectrogram. A spectrogram shows sound intensity (bright/dark colors) at each frequency, or pitch (y-axis), as a function of time (x-axis). We then map the most prominent frequency combinations to keys on a piano keyboard to produce piano music. Now let’s look at the structural design of this software.
We coded the algorithm in Python. From the spectrogram, we take a time window and pick the most prominent frequencies. Then we convert these frequencies to piano keys, and we set each key’s volume based on the sound intensity. We repeat this process for each time step in the audio. As output, we generate a MIDI file, an MP3, a PDF of the sheet music, and a cool piano roll visualization of the music.
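As a rough illustration of the spectrogram step, here is a minimal sketch using NumPy. It is not our production code: the function name `spectrogram`, the Hann window, and the default `window_size` and `hop_size` values are assumptions chosen for the example.

```python
import numpy as np

def spectrogram(samples, sample_rate, window_size=2048, hop_size=512):
    """Return (times, freqs, magnitudes) for a mono signal.

    Hypothetical helper: slides a Hann window over the signal and takes
    an FFT of each windowed frame (a short-time Fourier transform).
    """
    samples = np.asarray(samples, dtype=float)
    if len(samples) < window_size:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(window_size)
    n_frames = 1 + (len(samples) - window_size) // hop_size
    frames = np.stack([
        samples[i * hop_size : i * hop_size + window_size] * window
        for i in range(n_frames)
    ])
    # One row per time step, one column per frequency bin.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    # Time stamp of each frame, taken at the center of its window.
    times = (np.arange(n_frames) * hop_size + window_size / 2) / sample_rate
    return times, freqs, magnitudes

# Example: one second of a pure 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
times, freqs, mags = spectrogram(tone, sample_rate, window_size=1024, hop_size=256)
peak_freqs = freqs[mags.argmax(axis=1)]  # strongest frequency in each frame
```

A longer window gives finer frequency resolution but blurs timing, which is the trade-off behind choosing the time window and the number of frequency bins.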
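The frequency-to-key step can be sketched as follows. This is an illustration under stated assumptions, not our exact code: the names `freq_to_midi` and `frame_to_notes` are hypothetical, pitch uses the standard equal-temperament mapping (A4 = 440 Hz = MIDI note 69), and the linear scaling from intensity to MIDI velocity is just one possible choice.

```python
import math

A4_FREQ = 440.0                  # standard concert pitch
A4_MIDI = 69                     # MIDI note number of A4
PIANO_LOW, PIANO_HIGH = 21, 108  # MIDI range of an 88-key piano (A0 to C8)

def freq_to_midi(freq_hz):
    """Nearest MIDI note for a frequency, or None if it is off the piano."""
    if freq_hz <= 0:
        return None
    note = round(A4_MIDI + 12 * math.log2(freq_hz / A4_FREQ))
    return note if PIANO_LOW <= note <= PIANO_HIGH else None

def frame_to_notes(freqs, magnitudes, max_notes=3):
    """Map one spectrogram frame to up to max_notes (midi_note, velocity) pairs."""
    max_magnitude = max(magnitudes, default=0.0)
    if max_magnitude <= 0:
        return []
    # Several frequency bins can land on the same key; keep the loudest.
    loudest = {}
    for freq, mag in zip(freqs, magnitudes):
        note = freq_to_midi(freq)
        if note is not None:
            loudest[note] = max(loudest.get(note, 0.0), mag)
    strongest = sorted(loudest.items(), key=lambda kv: kv[1], reverse=True)
    # Assumed mapping: scale intensity linearly into MIDI velocity 1..127.
    return [
        (note, min(127, max(1, round(127 * mag / max_magnitude))))
        for note, mag in strongest[:max_notes]
        if mag > 0
    ]

# Example frame: C4 is loudest, and two bins near A4 collapse onto one key.
freqs = [220.0, 261.63, 440.0, 445.0, 880.0]
magnitudes = [0.2, 1.0, 0.6, 0.4, 0.1]
notes = frame_to_notes(freqs, magnitudes, max_notes=2)
print(notes)  # [(60, 127), (69, 76)]
```

Here `max_notes` plays the role of the number of simultaneous notes allowed at each time step.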
Challenges I ran into
There were many challenges that we had to overcome, including picking a suitable time window, choosing the number of frequency bins, and converting intensity to key volume. One interesting challenge was that the raw output contains many repeated notes. To make the music more playable by a human, we group similar repeated notes into a single long note. The amount of note grouping and the number of simultaneous notes are tunable parameters that the user can adjust on our website.
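One way to sketch the note-grouping idea is shown below. It is a simplified illustration, not our exact implementation: the function name `merge_repeated_notes` and the `max_gap` parameter (how many time steps a pitch may drop out and still count as the same note) are assumptions made for the example.

```python
def merge_repeated_notes(frames, max_gap=1):
    """Merge repeats of the same pitch across time steps into long notes.

    frames: list of sets of MIDI note numbers, one set per time step.
    max_gap: hypothetical tuning knob; a pitch may be absent for up to
             this many time steps and still be merged into one note.
    Returns (pitch, start_step, length_in_steps) tuples sorted by start.
    """
    active = {}  # pitch -> [start_step, last_step_seen]
    notes = []
    for i, frame in enumerate(frames):
        # Close any note that has been absent for more than max_gap steps.
        for pitch in list(active):
            start, last = active[pitch]
            if pitch not in frame and i - last > max_gap:
                notes.append((pitch, start, last - start + 1))
                del active[pitch]
        # Extend notes that are still sounding, and start new ones.
        for pitch in frame:
            if pitch in active:
                active[pitch][1] = i
            else:
                active[pitch] = [i, i]
    # Flush whatever is still sounding at the end of the audio.
    for pitch, (start, last) in active.items():
        notes.append((pitch, start, last - start + 1))
    return sorted(notes, key=lambda n: (n[1], n[0]))

# Example: C4 (60) repeats with a one-step dropout, then E4 (64) follows.
frames = [{60}, {60}, set(), {60}, {64}, {64}, set(), set(), {64}]
grouped = merge_repeated_notes(frames, max_gap=1)
ungrouped = merge_repeated_notes(frames, max_gap=0)
print(grouped)    # [(60, 0, 4), (64, 4, 2), (64, 8, 1)]
print(ungrouped)  # [(60, 0, 2), (60, 3, 1), (64, 4, 2), (64, 8, 1)]
```

Raising `max_gap` merges more aggressively, giving fewer and longer notes. Setting it to 0 merges only notes in directly adjacent time steps.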
Accomplishments that I'm proud of
We accomplished most of what we set out to do and learned a lot in the process. As passionate musicians, we are also proud of connecting music with technology.
What I learned
We learned about audio signal processing in depth. We also learned how to connect a front end to a back-end web server.
What's next for Talking Piano
For future development, we plan to add real-time online recording, more instruments besides the piano, and more visualizations, such as a piano keyboard. We hope our project will inspire composers and other musicians to explore novel methods of music composition.

