What it does
Our app contains a guided tutorial that lets users record themselves pronouncing each of the phonemes of English. These recordings can then be stitched together to produce speech in the user's own voice.
How we built it
We first wrote a Python script that converts text to phonemes and stitches recordings of the individual sounds into a single coherent, fluent clip. We experimented with different methods of overlapping, cross-fading, and trimming silences so that the sounds within a word flow together as naturally as possible. Then we built a web-based interface that lets users create and upload voice profiles quickly and easily. Users can then form sentences in their own voice or in voices uploaded by other users.
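The stitching step can be sketched as follows. This sketch assumes each phoneme recording has already been decoded into a list of mono float samples (file I/O is omitted). The names `trim_silence`, `crossfade`, and `stitch`, the silence threshold, the 16 kHz sample rate, and the linear fade are illustrative assumptions, not necessarily what our script used.

```python
from typing import Dict, List

Clip = List[float]  # mono samples, assumed to lie in [-1.0, 1.0]


def trim_silence(clip: Clip, threshold: float = 0.02) -> Clip:
    """Drop leading and trailing samples whose magnitude is below `threshold`."""
    start = 0
    while start < len(clip) and abs(clip[start]) < threshold:
        start += 1
    end = len(clip)
    while end > start and abs(clip[end - 1]) < threshold:
        end -= 1
    return clip[start:end]


def crossfade(a: Clip, b: Clip, overlap: int) -> Clip:
    """Join `a` and `b`, linearly fading `a` out while `b` fades in over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head = a[: len(a) - overlap]
    tail = b[overlap:]
    mixed = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # weight of `b`, rising from near 0 to near 1
        mixed.append(a[len(a) - overlap + i] * (1 - t) + b[i] * t)
    return head + mixed + tail


def stitch(phonemes: List[str], recordings: Dict[str, Clip], overlap: int = 160) -> Clip:
    """Concatenate trimmed phoneme clips, cross-fading at each boundary.

    The default overlap of 160 samples is about 10 ms at an assumed 16 kHz sample rate.
    """
    out: Clip = []
    for p in phonemes:
        clip = trim_silence(recordings[p])
        out = crossfade(out, clip, overlap) if out else clip
    return out
```

Each cross-fade shares `overlap` samples between neighbouring clips, so the stitched result is shorter than the sum of the trimmed clips. Trimming first matters: without it, the silent padding around each recording would be what gets blended, and words would sound choppy.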
Challenges we ran into
In the beginning we were unsure how to map letters to sounds, but then we realized that we could use the International Phonetic Alphabet (IPA) to reduce all of English to fewer than 50 sound clips. After that, the main challenge was building a web interface to make the program more usable. We wanted to turn it into a social media platform for sound bytes, but unfortunately ran out of time.
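Converting text to phonemes can be sketched as a dictionary lookup. This assumes a pronunciation lexicon that maps each word to a list of IPA symbols. The three-word `PRONUNCIATIONS` table below is a hypothetical stand-in; a real system would need a full lexicon, such as the CMU Pronouncing Dictionary converted from ARPAbet to IPA. The helper `required_clips` (also hypothetical) shows how the set of distinct symbols determines which recordings the tutorial must collect.

```python
import re
from typing import Dict, List, Set

# Hypothetical pronunciation lexicon: word -> IPA phoneme symbols.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "hello": ["h", "ə", "l", "oʊ"],
    "world": ["w", "ɜː", "l", "d"],
    "speak": ["s", "p", "iː", "k"],
}


def text_to_phonemes(text: str, lexicon: Dict[str, List[str]] = PRONUNCIATIONS) -> List[str]:
    """Lowercase the text, split it into words, and concatenate each word's phonemes."""
    phonemes: List[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(lexicon[word])
    return phonemes


def required_clips(lexicon: Dict[str, List[str]]) -> Set[str]:
    """Every distinct IPA symbol the lexicon uses needs exactly one recorded clip."""
    return {p for symbols in lexicon.values() for p in symbols}


print(text_to_phonemes("Hello, world!"))
```

Because clips are keyed by phoneme rather than by letter or word, any word in the lexicon can be spoken once its symbols have been recorded. That property is what keeps the recording set small.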
Accomplishments that we're proud of
We are proud that we have a working program that generates unique voices from our own speech, and that we have a fully functional web app that we could scale into a bigger platform. We experimented with a lot of new technologies along the way, and we are proud of the new development tools we learned.
What we learned
We learned about the MERN stack (MongoDB, Express, React, Node.js), and we also learned how to make compromises under time pressure. Although we wanted to add more features, we decided to prioritize having a working product.
What's next for speaker
We want to add proper authentication and the ability to search for and add friends. From there, users could make voice packs from their friends' recordings, letting them dictate text in a familiar voice.