As kids we always loved having a story read to us, and the importance of reading a good book has never faded. However, between schoolwork, social lives, and other obligations, time for reading has become secondary to everything else. Even with the rise of audiobooks, people often complain that the books are too expensive or that the narrator's voice is annoying. We developed TallTale to fix these problems, increase the spread of knowledge, and bring back the love of a good story.

To achieve our goal of cloning human voices, we implemented NVIDIA's Tacotron2, a sophisticated RNN sequence-to-sequence predictor that maps text to mel-spectrograms. This output is then fed into a WaveNet-inspired, flow-based vocoder known as "WaveGlow" [https://github.com/NVIDIA/waveglow]. WaveGlow is responsible for synthesizing the mel-spectrograms into high-fidelity audio clips.
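
The two-stage flow described above (text to mel-spectrogram, then mel-spectrogram to audio) can be sketched roughly as follows. This is only an illustration of how data moves through the pipeline, not our actual code: `fake_tacotron2` and `fake_waveglow` are hypothetical stand-ins that return placeholder data with plausible shapes, and the constants (80 mel channels, a hop of 256 samples per frame, a 22050 Hz sample rate, 5 frames per character) are assumptions chosen for the example.

```python
import math
import random
from typing import List

# Assumed values for illustration only; the real models define their own.
N_MEL_CHANNELS = 80   # mel bands per spectrogram frame
HOP_LENGTH = 256      # audio samples produced per spectrogram frame
SAMPLE_RATE = 22050   # audio samples per second


def text_to_sequence(text: str) -> List[int]:
    """Hypothetical text front end: one integer ID per character."""
    return [ord(ch) for ch in text.lower()]


def fake_tacotron2(sequence: List[int], frames_per_symbol: int = 5) -> List[List[float]]:
    """Stand-in for stage 1 (NOT the real model).

    Returns a placeholder mel-spectrogram: a list of frames, each holding
    N_MEL_CHANNELS values. A real model predicts these from the text.
    """
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    n_frames = len(sequence) * frames_per_symbol
    return [
        [rng.uniform(-4.0, 0.0) for _ in range(N_MEL_CHANNELS)]
        for _ in range(n_frames)
    ]


def fake_waveglow(mel: List[List[float]]) -> List[float]:
    """Stand-in for stage 2 (NOT the real vocoder).

    Emits HOP_LENGTH samples per frame; here just a 220 Hz sine tone,
    whereas a real vocoder conditions the waveform on the spectrogram.
    """
    n_samples = len(mel) * HOP_LENGTH
    return [
        math.sin(2.0 * math.pi * 220.0 * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


def synthesize(text: str) -> List[float]:
    """Run both stages in order: text -> mel-spectrogram -> waveform."""
    sequence = text_to_sequence(text)
    mel = fake_tacotron2(sequence)
    return fake_waveglow(mel)


if __name__ == "__main__":
    audio = synthesize("Once upon a time")
    print(f"{len(audio)} samples, about {len(audio) / SAMPLE_RATE:.2f} s of audio")
```

The point of the sketch is the hand-off between stages: the first stage only decides what the speech should sound like in spectrogram form, while the second stage is solely responsible for turning those frames into a waveform, which is why the two models can be trained and swapped independently.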

