I've always had a side hobby of composing music, but I've never really had the resources to get my works performed. Since I've got the technical background, I thought it might be cool to try to generate synthetic performances from scratch.

What it does

It reads sheet music for English choral pieces (MusicXML) and generates a WAV file performance of the piece.

How I built it

It uses Python with the music21 library to parse the MusicXML files. Google Cloud Text-to-Speech creates a speech sample for each word in the piece, and MATLAB then autotunes those TTS samples to fit the music. The Montreal Forced Aligner generates timing information for the phonemes in each audio sample, which is used to build a recipe that MATLAB follows to perform the autotuning.
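The core arithmetic behind that autotuning recipe is computing, for each aligned phoneme, how far the TTS sample's detected pitch must be shifted to land on the target note from the score. A minimal Python sketch of that step (the `midi_to_hz` helper, the segment tuple format, and the recipe fields are illustrative assumptions, not the project's actual recipe format):

```python
import math

def midi_to_hz(midi_note: int) -> float:
    # A4 (MIDI 69) is 440 Hz; each semitone is a factor of 2**(1/12)
    return 440.0 * 2 ** ((midi_note - 69) / 12)

def autotune_recipe(segments):
    """Build a pitch-correction recipe.

    Each segment is (start_s, end_s, detected_hz, target_midi):
    the phoneme's time span from forced alignment, the pitch the
    TTS engine actually produced, and the note the score demands.
    """
    recipe = []
    for start, end, detected_hz, target_midi in segments:
        target_hz = midi_to_hz(target_midi)
        recipe.append({
            "start": start,
            "end": end,
            # ratio a resampling/PSOLA stage would apply to hit the note
            "shift_ratio": target_hz / detected_hz,
            "shift_semitones": 12 * math.log2(target_hz / detected_hz),
        })
    return recipe

# Example: a phoneme spoken at 200 Hz that should land on C4 (MIDI 60)
recipe = autotune_recipe([(0.00, 0.35, 200.0, 60)])
```

Large shift ratios are one plausible source of the "chipmunky" artifacts mentioned below: naive resampling changes duration and formants along with pitch.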

Challenges I ran into

Working with sound is incredibly complicated, so all of the voices currently sound very distorted and chipmunky. Gluing together all the different components in this project also caused lots of problems; for a while, most pieces of music I tested would fail outright. At this point I've got maybe 70% of the big problems worked out, but there's still a ton left to do.

Accomplishments that I'm proud of

I made a computer sing music from scratch.

What's next for Ensemble

In the future I'd like to reimplement Ensemble as a pure WaveNet, like Google's TTS. The main barrier is that a training dataset for such an architecture does not exist, so I plan to construct the dataset myself. This ought to lead to much more natural-sounding performances.

Built With

Python, music21, Google Cloud Text-to-Speech, MATLAB, Montreal Forced Aligner
