Use Text to Speech or Speech to Text

Inspiration

The recent development of agents, particularly voice agents, inspired us to explore the backbone of such systems: speech recognition. Its accuracy matters because transcription errors carry through to everything the agent does afterwards.

What it does

It transcribes audio recorded through a microphone.

How we built it

We used transfer learning: we loaded the pretrained whisper-large-v3 model from Hugging Face and used Gradio to build the UI.
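A minimal sketch of this setup, assuming the Hugging Face `transformers` ASR pipeline, Gradio 4.x (which takes `sources=[...]` on `gr.Audio`), and ffmpeg on the PATH for decoding recorded files. The names `build_transcriber`, `build_app`, and `MODEL_ID` are hypothetical, not taken from the project's code. Here the microphone recording is handed to the pipeline as a file path, which sidesteps manual array handling.

```python
# Hypothetical sketch: Whisper ASR behind a Gradio microphone input.
# Assumes `transformers`, `torch`, `gradio` (4.x) and ffmpeg are installed.

MODEL_ID = "openai/whisper-large-v3"  # Hugging Face Hub model ID


def build_transcriber(model_id: str = MODEL_ID):
    """Load a pretrained Whisper checkpoint as an ASR pipeline."""
    # Imported lazily so this module can be loaded without the dependency.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def build_app(transcriber):
    """Wrap the transcriber in a minimal Gradio interface."""
    import gradio as gr

    def transcribe(audio_path):
        if audio_path is None:  # nothing has been recorded yet
            return ""
        # The pipeline decodes the file (via ffmpeg) and returns {"text": ...}.
        return transcriber(audio_path)["text"]

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(sources=["microphone"], type="filepath"),
        outputs="text",
        title="Speech to Text with Whisper",
    )


if __name__ == "__main__":
    build_app(build_transcriber()).launch()
```

Loading the model once, outside the callback, avoids reloading the large checkpoint on every request.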

Challenges we ran into

There were few challenges; the hardest part was capturing audio from the microphone and passing the audio array and its sampling rate to the model.
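With `gr.Audio(type="numpy")`, Gradio hands the callback a `(sample_rate, samples)` tuple, usually with int16 samples and a second axis for stereo. The sketch below shows one way to turn that into the `{"raw": ..., "sampling_rate": ...}` dict accepted by the `transformers` ASR pipeline; the helper name `to_whisper_input` is hypothetical. It assumes the pipeline resamples to Whisper's 16 kHz when given a different `sampling_rate` (recent `transformers` versions do this, using torchaudio), so no resampling is done here.

```python
import numpy as np


def to_whisper_input(audio):
    """Convert Gradio's (sample_rate, samples) tuple into a pipeline input dict.

    Hypothetical helper: scales signed integer PCM to roughly [-1.0, 1.0],
    downmixes multi-channel audio to mono, and keeps the original sample
    rate so the pipeline can resample if needed.
    """
    sample_rate, samples = audio
    samples = np.asarray(samples)
    if np.issubdtype(samples.dtype, np.signedinteger):
        # e.g. int16: divide by 32768 so the full range maps to [-1.0, 1.0)
        scale = float(-np.iinfo(samples.dtype).min)
        samples = samples.astype(np.float32) / scale
    else:
        samples = samples.astype(np.float32)
    if samples.ndim > 1:
        # Shape (n_samples, n_channels): average channels into one mono track
        samples = samples.mean(axis=1, dtype=np.float32)
    return {"raw": samples, "sampling_rate": int(sample_rate)}
```

Inside the Gradio callback, the result would then be passed straight to the pipeline, e.g. `transcriber(to_whisper_input(audio))["text"]`.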

Accomplishments that we're proud of

We built a Gradio app that captures audio from a microphone and transcribes it.

What we learned

Transfer learning, automatic speech recognition (ASR), and building Gradio apps

What's next for Use Text to Speech or Speech to Text

Adding text-to-speech and LLM capabilities to build a voice-to-voice assistant.

Built With

gradio
transformer
whisper

Updates

Siddhartha Shrestha started this project — Aug 15, 2024 07:59 AM EDT
