Figures: WaveGlow architecture; WaveGlow constrained cost function; Tacotron 2 architecture; WaveGlow sample distribution

TallTale

As kids, we loved having stories read to us, and the value of reading a good book has never faded. Between schoolwork, social lives, and other obligations, however, reading time has slipped behind everything else. Even with the rise of audiobooks, listeners often complain that the books are too expensive or that the narrator's voice is annoying. We developed TallTale to fix these problems, spread knowledge more widely, and bring back the love of a good story.

To clone human voices, we used NVIDIA's implementation of Tacotron 2, a recurrent sequence-to-sequence network with attention that maps text to mel-spectrograms. The predicted mel-spectrograms are then fed into WaveGlow [https://github.com/NVIDIA/waveglow], a flow-based generative network that combines ideas from Glow and WaveNet. WaveGlow synthesizes the mel-spectrograms into high-fidelity audio clips.
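WaveGlow turns a mel-spectrogram into audio by running a stack of invertible layers backwards. The pure-Python toy below is a minimal sketch of one affine coupling layer, the building block WaveGlow interleaves with invertible 1x1 convolutions: half of the input passes through unchanged and, together with the mel conditioning, produces a scale and shift for the other half, so the layer can be inverted exactly and its log-determinant is simply the sum of the log-scales. It assumes a hypothetical conditioner, `make_toy_net`, which is a fixed random linear map standing in for WaveGlow's WaveNet-like network, and it uses a short made-up conditioning vector instead of an upsampled mel-spectrogram. It illustrates the technique under those assumptions and is not NVIDIA's code.

```python
import math
import random

# Toy affine coupling layer in the style of WaveGlow (illustrative only).
# ASSUMPTION: make_toy_net is a hypothetical stand-in for WaveGlow's
# WaveNet-like conditioner; it is a fixed random linear map, not NVIDIA's code.


def make_toy_net(half, cond_dim, seed=0):
    """Return a hypothetical conditioner f(x_a, mel) -> (log_s, t)."""
    rng = random.Random(seed)
    n_in = half + cond_dim
    w_s = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(half)]
    w_t = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(half)]

    def net(x_a, mel):
        inp = list(x_a) + list(mel)
        log_s = [sum(w * v for w, v in zip(row, inp)) for row in w_s]
        t = [sum(w * v for w, v in zip(row, inp)) for row in w_t]
        return log_s, t

    return net


def coupling_forward(x, mel, net):
    """Audio -> latent (training direction): x_a is copied, x_b is scaled and shifted."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = net(x_a, mel)
    y_b = [math.exp(ls) * xb + tt for ls, xb, tt in zip(log_s, x_b, t)]
    log_det = sum(log_s)  # log|det J| of this layer, added to the log-likelihood
    return x_a + y_b, log_det


def coupling_inverse(y, mel, net):
    """Latent -> audio (synthesis direction): undo the scale and shift exactly."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = net(y_a, mel)  # y_a equals x_a, so log_s and t are recomputed exactly
    x_b = [(yb - tt) * math.exp(-ls) for ls, yb, tt in zip(log_s, y_b, t)]
    return y_a + x_b


# Round trip on a made-up 8-sample "audio" frame and 3-value "mel" vector.
net = make_toy_net(half=4, cond_dim=3)
audio = [0.1, -0.2, 0.3, 0.05, -0.4, 0.25, 0.0, 0.15]
mel = [1.0, 0.5, -0.3]
z, log_det = coupling_forward(audio, mel, net)
recovered = coupling_inverse(z, mel, net)
print(max(abs(a - b) for a, b in zip(audio, recovered)))  # max round-trip error
```

At synthesis time, WaveGlow samples the latent from a zero-mean Gaussian and runs every layer in the inverse direction; the round trip in the last lines only checks that this toy layer inverts exactly.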

Updates

Jorge Nario started this project — Apr 14, 2019 01:53 AM EDT
