Inspiration
Recent progress in AI agents, especially voice agents, inspired us to explore the backbone of such systems: speech recognition. Its accuracy largely determines the accuracy of the overall system.
What it does
It transcribes audio recorded through a microphone.
How we built it
We applied transfer learning by loading the pretrained whisper-large-v3 model from Hugging Face, and used Gradio for the UI.
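A minimal sketch of that setup, not our exact code. It assumes the Hub id `openai/whisper-large-v3`, the `transformers` `pipeline` helper, and Gradio 4 or later (older releases spell the argument `source="microphone"`). For brevity it lets Gradio pass a file path, which the pipeline decodes with ffmpeg. The helper names `load_asr`, `make_transcribe`, `build_app` and `main` are illustrative only.

```python
MODEL_ID = "openai/whisper-large-v3"  # assumed Hub id for whisper-large-v3


def load_asr(model_id: str = MODEL_ID):
    """Load a pretrained speech-recognition pipeline (downloads weights on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("automatic-speech-recognition", model=model_id)


def make_transcribe(asr):
    """Wrap an ASR callable so it maps a recorded file path to plain text."""

    def transcribe(audio_path):
        if not audio_path:  # nothing was recorded
            return ""
        return asr(audio_path)["text"].strip()

    return transcribe


def build_app(transcribe):
    """Build a Gradio UI with a microphone input and a text output."""
    import gradio as gr  # imported lazily: only needed for the UI

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(sources=["microphone"], type="filepath"),
        outputs=gr.Textbox(label="Transcript"),
    )


def main():
    """Load the model, build the UI, and start the local web server."""
    build_app(make_transcribe(load_asr())).launch()
```

Calling `main()` starts the app. Because the model is passed into `make_transcribe`, the UI logic can be tried with a stand-in function before downloading the full checkpoint.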
Challenges we ran into
We did not run into many challenges. The hardest part was taking the audio from the microphone and passing the sample array and sampling rate to the model.
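The hand-off can be sketched roughly as follows. This is an illustration under assumptions, not our exact code: it assumes Gradio delivers the recording as a `(sampling_rate, samples)` tuple (the `type="numpy"` mode of `gr.Audio`), usually as int16 and possibly with two channels, and that the `transformers` speech-recognition pipeline accepts a dict with `sampling_rate` and `raw` keys. The helper name `to_asr_input` is hypothetical.

```python
import numpy as np


def to_asr_input(audio):
    """Convert a (sampling_rate, samples) tuple into a pipeline-style dict."""
    sampling_rate, samples = audio
    samples = np.asarray(samples)

    # Work out the scale from the original dtype before any averaging,
    # because the channel mean below turns integers into floats.
    if np.issubdtype(samples.dtype, np.integer):
        scale = float(np.iinfo(samples.dtype).max) + 1.0  # 32768 for int16
    else:
        scale = 1.0  # float input is assumed to be in [-1, 1] already

    # Assumed layout for multi-channel audio: (n_samples, n_channels).
    if samples.ndim > 1:
        samples = samples.mean(axis=1)  # downmix to mono

    samples = samples.astype(np.float32) / scale
    return {"sampling_rate": int(sampling_rate), "raw": samples}
```

With a loaded pipeline `asr`, the transcript would then be `asr(to_asr_input(audio))["text"]`. Whisper expects 16 kHz audio; as far as we know the pipeline resamples when the dict carries a different rate, which may require `torchaudio` to be installed.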
Accomplishments that we're proud of
We built a Gradio app that records audio from a microphone and transcribes it.
What we learned
Transfer learning, automatic speech recognition (ASR), and building Gradio apps.
What's next for Use Text to Speech or Speech to Text
Adding text-to-speech and LLM capabilities to build a voice-to-voice assistant.
Built With
- gradio
- transformers
- whisper