I've always had a side hobby of composing music, but I've never really had the resources to get my works performed. Since I've got the technical background, I thought it might be cool to try to generate synthetic performances from scratch.
What it does
It reads sheet music for English choral pieces (in MusicXML format) and generates a WAV file performance of the piece.
How I built it
It uses Python with the music21 library to parse the MusicXML files. Google Cloud Text-to-Speech creates a speech sample for each word in the piece, and MATLAB then autotunes those TTS samples to fit the music. The Montreal Forced Aligner generates timing information for the phonemes in each audio sample, which is used to build a "recipe" that the MATLAB stage follows when performing the auto-tuning.
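To make the recipe step concrete, here's a minimal sketch of how note data from the score and phoneme timings from a forced aligner might be merged into a flat structure for a pitch-correction stage to consume. Everything here is illustrative: the function names, the tuple layouts, and the recipe fields are my own assumptions, not the project's actual format (the real aligner output would come from MFA TextGrid files, pre-parsed here into plain tuples to keep the sketch self-contained).

```python
# Hypothetical sketch: merge score notes with per-word phoneme timings
# into a per-phone "recipe" a pitch-correction script could follow.

def midi_to_hz(midi):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi - 69) / 12)

def build_recipe(notes, phonemes):
    """notes: list of (midi_pitch, duration_sec, lyric) per sung note.
    phonemes: list of (word, phone, start_sec, end_sec) from the aligner,
    with times relative to each word's own TTS clip.
    Returns one entry per (note, phone) pair: the target pitch plus the
    slice of the TTS sample to stretch onto that note."""
    by_word = {}
    for word, phone, start, end in phonemes:
        by_word.setdefault(word, []).append((phone, start, end))
    recipe = []
    for midi, dur, lyric in notes:
        segs = by_word.get(lyric, [])
        total = sum(end - start for _, start, end in segs) or 1.0
        for phone, start, end in segs:
            recipe.append({
                "phone": phone,
                "target_hz": round(midi_to_hz(midi), 2),
                "src_start": start,
                "src_end": end,
                # stretch each phone proportionally so the word fills the note
                "out_dur": dur * (end - start) / total,
            })
    return recipe
```

For example, a one-second A4 on the lyric "la", whose TTS clip aligned as an 0.1 s "L" followed by an 0.3 s "AA", would yield two entries targeting 440 Hz with output durations of 0.25 s and 0.75 s.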
Challenges I ran into
Working with sound is incredibly complicated, so all of the voices currently sound distorted and chipmunky. Gluing together all the different components in this project also caused plenty of problems; for a while, most pieces of music I tested would fail outright. At this point I've worked out maybe 70% of the big problems, but there's still a ton left to do.
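The "chipmunky" artifact is the classic symptom of pitch-shifting by plain resampling: speeding a sample up scales *every* frequency, including the vocal-tract formants that give a voice its character, not just the pitch. This tiny NumPy sketch (not the project's MATLAB code) shows the effect on a pure tone:

```python
import numpy as np

sr = 8000                            # sample rate in Hz
t = np.arange(sr) / sr               # one second of audio
tone = np.sin(2 * np.pi * 220 * t)   # a 220 Hz "voice"

def dominant_hz(signal, sr):
    """Return the strongest frequency component via the FFT."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.fft.rfftfreq(len(signal), 1 / sr)[np.argmax(spectrum)]

# Shift the pitch up an octave by playing the samples twice as fast.
# Every frequency in the signal doubles, formants included.
shifted = tone[::2]

print(dominant_hz(tone, sr))     # ~220 Hz
print(dominant_hz(shifted, sr))  # ~440 Hz
```

The usual fix is a phase-vocoder or PSOLA-style method that moves the pitch while leaving the spectral envelope (the formants) roughly in place, which is much harder to get right.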
Accomplishments that I'm proud of
I made a computer sing music from scratch.
What's next for Ensemble
In the future I'd like to reimplement Ensemble as a pure WaveNet model, like Google's TTS voices. The main barrier is that a training dataset for such an architecture doesn't exist, so I plan to construct one myself. That should lead to much more natural-sounding performances.