Inspiration

Recent progress in AI agents, especially voice agents, inspired us to explore the backbone of such systems: speech recognition. Its accuracy largely determines the accuracy of the whole system.

What it does

It transcribes audio recorded through a microphone.

How we built it

We applied transfer learning by loading the pretrained whisper-large-v3 model from Hugging Face, and used Gradio to build the UI.
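
A minimal sketch of how this setup might look, not our exact code. It assumes the Hub ID `openai/whisper-large-v3`, the `transformers` speech recognition pipeline, and the Gradio 4+ `gr.Audio(sources=[...])` signature. The helper names `load_asr`, `make_transcriber`, and `build_demo` are hypothetical and exist only for illustration.

```python
import numpy as np

# Assumed Hugging Face Hub ID for whisper-large-v3.
MODEL_ID = "openai/whisper-large-v3"


def load_asr(model_id: str = MODEL_ID):
    """Load the pretrained checkpoint as a speech recognition pipeline."""
    # Imported lazily so the rest of the module works without transformers.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def make_transcriber(asr):
    """Wrap an ASR callable so it accepts Gradio's (sample_rate, samples) tuple."""

    def transcribe(audio):
        if audio is None:  # nothing was recorded
            return ""
        sample_rate, samples = audio
        # Assumes mono int16 samples; scale them into the [-1, 1] float range.
        samples = samples.astype(np.float32) / 32768.0
        result = asr({"sampling_rate": sample_rate, "raw": samples})
        return result["text"].strip()

    return transcribe


def build_demo(asr):
    """Build a small Gradio UI: microphone in, transcript out."""
    import gradio as gr

    return gr.Interface(
        fn=make_transcriber(asr),
        # Gradio 4+ signature; Gradio 3.x used source="microphone" instead.
        inputs=gr.Audio(sources=["microphone"], type="numpy"),
        outputs="text",
    )
```

To try it, something like `build_demo(load_asr()).launch()` should start the app locally. The first call downloads the model weights, which are several gigabytes.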

Challenges we ran into

We did not run into many challenges. The hardest part was capturing audio from the microphone and passing the sample array and sampling rate to the model.
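
The conversion step can be sketched roughly as follows. This is an illustration under assumptions, not our exact code: it assumes Gradio hands over a `(sample_rate, samples)` tuple with integer samples shaped `(n_samples,)` or `(n_samples, n_channels)`, and that the `transformers` ASR pipeline accepts a dict with `sampling_rate` and `raw` keys. The function name `prepare_audio` is hypothetical.

```python
import numpy as np


def prepare_audio(audio):
    """Turn a (sample_rate, samples) tuple into the dict form an ASR pipeline accepts."""
    sample_rate, samples = audio
    samples = np.asarray(samples)

    # Downmix multi-channel recordings to mono by averaging the channels.
    if samples.ndim > 1:
        samples = samples.mean(axis=1)

    # The model expects floating-point samples, not raw integers.
    samples = samples.astype(np.float32)

    # Peak-normalize into [-1, 1]; skip silent or empty clips to avoid dividing by zero.
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak > 0:
        samples = samples / peak

    return {"sampling_rate": int(sample_rate), "raw": samples}
```

The returned dict can then be passed straight to the pipeline object. Keeping the sampling rate next to the array matters because Whisper works on 16 kHz audio, and the pipeline needs the original rate to resample correctly.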

Accomplishments that we're proud of

We built a Gradio app that records audio from a microphone and transcribes it.

What we learned

Transfer learning, automatic speech recognition (ASR), and building Gradio apps.

What's next for Use Text to Speech or Speech to Text

Adding text-to-speech and LLM capabilities to build a voice-to-voice assistant.

Built With

  • gradio
  • transformers
  • whisper