Inspiration
Public text-to-speech (TTS) services are not very good, and we couldn't find large public datasets for training state-of-the-art algorithms. Aspects of Google Voice are trained on 1,900 hours of private data. Power to the people!
What it does
Uses YouTube videos and their hand-made (human-written) closed captions to create audio snippets paired with correct English text.
How we built it
Python, Google YouTube API, PyDub
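Cutting a snippet amounts to slicing the audio track at a caption's start and end time. In PyDub that is millisecond slicing of an `AudioSegment` (for example `AudioSegment.from_file(path)[start_ms:end_ms]`). The sketch below swaps in Python's standard-library `wave` module instead, so it runs without PyDub or an external decoder; it assumes the audio has already been decoded to WAV, and `cut_wav` is a hypothetical helper.

```python
import io
import wave


def cut_wav(wav_bytes: bytes, start_ms: int, end_ms: int) -> bytes:
    """Return the [start_ms, end_ms) slice of a WAV file as new WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert milliseconds to frame indices, clamped to the file length.
        start = min(total, start_ms * rate // 1000)
        end = min(total, end_ms * rate // 1000)
        src.setpos(start)
        frames = src.readframes(max(0, end - start))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Same channels, sample width, and rate; the frame count in the
        # header is corrected automatically when the writer closes.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()
```

Combined with caption timings, each `(start_ms, end_ms, text)` triple yields one audio snippet plus its transcript.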
Challenges we ran into
Python's (lack of) type safety, Python in general, the lack of video decoding capabilities on the lab machines, sleep, and annoying people
Accomplishments that we're proud of
A running, easy-to-use product, and 7,000 sentences with audio snippets and accurate English text after just 3 hours of running (on one slow-ass machine).
What we learned
The Google API is powerful, and this is actually totally possible.
What's next for SpeechFrenzy
DRAIN YOUTUBE!! Then: minor post-processing of the data, error evaluation, building a huge dataset and publishing it, and expanding to audiobooks (highly illegal), movies (even more highly illegal), and other languages (totally possible as well).
Built With
- uglypython