Inspiration
The original inspiration was to explore generative models such as GANs, prompted by the rise of ChatGPT. I later found that GANs such as DCGAN and Pix2Pix are designed for images, while Tacotron is a generative model for speech. This project helps people create voice-over content: it greatly reduces the manual effort needed to record a voice, and the cloned audio sounds natural.
What it does
It is a web tool for people who create digital content with voice-overs. A client may want to make a video that needs a human voice but may not be good at voice-over themselves. They then have two options: (1) hire a voice-over actor, or (2) use text-to-speech (TTS). Hiring an actor can be expensive in both money and time, and TTS often does not sound natural.
With our application, a voice-over artist can clone their own natural-sounding voice with our Tacotron model so that clients can use it. The artist makes a one-time effort: recording 50-100 given sentences on the website and submitting the recordings.
Once the model is trained, it is made available on the website for generation. Clients generate speech on the same website; each time they generate with an artist's voice, coins are debited from the client's balance and credited to that voice artist.
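The coin flow described above can be sketched roughly as follows. This is only an illustration with in-memory objects, not the project's actual code: the `Account` class, the `charge_generation` function, the `InsufficientCoins` exception and the price of 3 coins are all hypothetical names and values chosen for the example. In the real application the balances live in MongoDB.

```python
from dataclasses import dataclass


class InsufficientCoins(Exception):
    """Raised when a client cannot afford a generation (hypothetical name)."""


@dataclass
class Account:
    # Hypothetical in-memory stand-in for a user record and its balance.
    user_id: str
    coins: int


def charge_generation(client: Account, artist: Account, price: int) -> None:
    """Debit `price` coins from the client and credit them to the artist."""
    if price <= 0:
        raise ValueError("price must be positive")
    if client.coins < price:
        # Check before changing anything, so a failed charge leaves
        # both balances untouched.
        raise InsufficientCoins(
            f"{client.user_id} has only {client.coins} coins"
        )
    client.coins -= price
    artist.coins += price


client = Account("client-1", coins=10)
artist = Account("artist-1", coins=0)
charge_generation(client, artist, price=3)
print(client.coins, artist.coins)  # 7 3
```

With a real database, the debit and the credit would need to be applied together (for example inside a transaction) so that a failure between the two steps cannot lose coins.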

How we built it
- The model is NVIDIA's Tacotron implementation, which I trained on my own voice after preprocessing the recordings
- User authentication is handled by Auth0
- The frontend is built with React and the backend with Flask
- The database is MongoDB, which stores the users and their balances
Challenges we ran into
- The biggest challenge was that training requires CUDA, so I couldn't train the model on my PC and had to use Google Colab. Because training runs in Google Colab, I cannot train models from my local Flask API. In the future I will deploy this on cloud infrastructure so that anyone can access it
Built With
- ai
- flask
- gans
- javascript
- ml
- python
- react