Public TTS services are not very good, and we couldn't find large public datasets to train state-of-the-art algorithms. Parts of Google Voice are trained on 1,900 hours of private data. Power to the people!

What it does

Uses YouTube videos with hand-made (human-written) closed captions to create audio snippets paired with correct English text.
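The first step is turning a caption track into timed sentences. The write-up doesn't say which caption format was downloaded, so this is a minimal sketch that assumes SRT-style cues (`00:00:01,000 --> 00:00:03,500` followed by the text); the function names are hypothetical, not the project's actual code.

```python
import re
from typing import List, Tuple

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,250".
# (SRT as the caption format is an assumption for this sketch.)
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert an hh:mm:ss,mmm timestamp to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, caption_text) for every cue in an SRT file."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                groups = match.groups()
                start = to_ms(*groups[:4])
                end = to_ms(*groups[4:])
                # Everything after the timing line is the caption text;
                # multi-line captions are joined with spaces.
                caption = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if caption:
                    cues.append((start, end, caption))
                break
    return cues


sample = """1
00:00:01,000 --> 00:00:03,500
Hello and welcome

2
00:00:04,000 --> 00:00:06,250
to the channel.
"""
print(parse_srt(sample))
# [(1000, 3500, 'Hello and welcome'), (4000, 6250, 'to the channel.')]
```

Each tuple is one candidate training pair: a time range in the audio plus the text spoken in it.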

How we built it

Python, the Google YouTube API, PyDub
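Cutting the audio is just slicing by each cue's timestamps. With PyDub this is roughly `AudioSegment.from_file(path)[start_ms:end_ms].export(out_path, format="wav")`. The sketch below does the same thing with the standard-library `wave` module so it runs without PyDub or ffmpeg; it assumes the audio has already been extracted to an uncompressed WAV file, and `cut_snippets` and the `snippet_XXXXX.wav` naming are hypothetical.

```python
import os
import wave
from typing import List, Tuple


def cut_snippets(
    wav_path: str, cues: List[Tuple[int, int, str]], out_dir: str
) -> List[Tuple[str, str]]:
    """Write one WAV file per (start_ms, end_ms, text) cue; return (path, text) pairs."""
    os.makedirs(out_dir, exist_ok=True)
    pairs = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for n, (start_ms, end_ms, text) in enumerate(cues):
            # Convert milliseconds to frame indices, clamped to the file length.
            first = min(total, start_ms * rate // 1000)
            last = min(total, end_ms * rate // 1000)
            if last <= first:
                continue  # cue lies outside the audio or has no duration
            src.setpos(first)
            frames = src.readframes(last - first)
            path = os.path.join(out_dir, f"snippet_{n:05d}.wav")
            with wave.open(path, "wb") as dst:
                # Same channels, sample width and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            pairs.append((path, text))
    return pairs
```

Writing the `(path, text)` pairs to a CSV or JSON manifest afterwards gives a dataset in the shape most speech toolkits expect.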

Challenges we ran into

Python's (lack of) type safety, Python in general, the lab machines' lack of video decoders, sleep, and annoying people

Accomplishments that we're proud of

A running, easy-to-use product, and 7,000 sentences with audio snippets and accurate English text after just 3 hours of running (on one slow-ass machine).

What we learned

The Google API is powerful, and this is actually totally possible.

What's next for SpeechFrenzy

DRAIN YOUTUBE!! Then: minor post-processing of the data, error evaluation, building a huge dataset and publishing it, and expanding to audiobooks (highly illegal), movies (even more highly illegal), and other languages (totally possible as well).
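As an illustration of the kind of "minor post-processing" that could come first, here is a sketch that drops cues unlikely to be clean speech: sound annotations like `[Music]`, cues that are too short or too long, and cues with implausibly many characters per second (a hint that the caption is out of sync). The filter rules and every threshold are hypothetical choices for this example, not something the project measured.

```python
import re
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text)

# Whole-cue sound annotations such as "[Music]" or "(applause)".
NON_SPEECH = re.compile(r"^\s*[\[(][^\])]*[\])]\s*$")


def filter_cues(
    cues: List[Cue],
    min_ms: int = 1000,               # hypothetical threshold
    max_ms: int = 15000,              # hypothetical threshold
    max_chars_per_sec: float = 25.0,  # hypothetical threshold
) -> List[Cue]:
    """Keep only cues that plausibly contain speech matching their text."""
    kept = []
    for start, end, text in cues:
        duration = end - start
        if NON_SPEECH.match(text):
            continue  # annotation, not spoken words
        if not (min_ms <= duration <= max_ms):
            continue  # too short to be useful or too long to align well
        # Far more characters than anyone can say in this time suggests
        # the caption timing does not match the audio.
        if len(text) / (duration / 1000) > max_chars_per_sec:
            continue
        kept.append((start, end, text))
    return kept
```

Running this before cutting the audio avoids writing snippets that would be thrown away anyway.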

Built With

  • uglypython