Inspiration

Inspiration The inspiration for this project came from the growing need for natural and accessible human-computer interaction. With the rise of voice assistants like Alexa and Siri, I wanted to explore the possibility of creating an AI-powered chatbot that could engage users in seamless voice-to-voice communication. My goal was to make conversations feel intuitive and dynamic, breaking the barrier between typed input and natural speech.

Additionally, I was motivated by the advancements in ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) technologies, coupled with the power of modern language models. The idea of combining these elements into a single application to create meaningful, real-time interactions fascinated me.

What it does

The idea of combining these elements into a single application to create meaningful, real-time interactions fascinated me.

How we built it

Installed required libraries (gradio, openai-whisper, gtts, etc.) and dependencies like ffmpeg. Configured the Groq API key and tested initial API interactions. Speech-to-Text Module: Implemented ASR using OpenAI’s Whisper model. Preprocessed audio input files for Whisper to ensure compatibility and accuracy.

Natural Language Processing: Used Groq's conversational AI (Llama-based model) to generate meaningful and context-aware responses.

Text-to-Speech Module: Converted the AI-generated text responses into audio files using gTTS. Utilized BytesIO for efficient in-memory audio file handling. User Interface: Designed a user-friendly interface with Gradio to accept audio inputs and return text and voice outputs. Configured live mode for real-time interaction.

Testing and Debugging: Ran extensive tests to ensure accurate transcriptions and meaningful responses. Debugged audio file conversion issues by leveraging FFmpeg for robust handling.

Deployment: Launched the application on Google Colab with a public gradio.live URL.

Challenges we ran into

Colab’s transient environment made debugging more challenging, especially for persistent file storage and sharing.

Initial problems occurred with certain audio formats and sample rates, requiring FFmpeg configuration for compatibility.

Accomplishments that we're proud of

We didi it. This project was an exciting journey into the world of voice-enabled AI. I am proud of the seamless integration of ASR, NLP, and TTS to create a fully functional voice-to-voice chatbot.

What we learned

Automatic Speech Recognition (ASR): I explored OpenAI's Whisper model for its capabilities in transcribing speech with remarkable accuracy, even in noisy environments. Text-to-Speech (TTS): Using Google Text-to-Speech (gTTS), I learned how to convert text responses into audio that feels natural and engaging. API Integration: Working with the Groq API to leverage advanced language models taught me about efficient API integration and managing API keys securely.

What's next for the Branickes

Next step is to make wireless homes dependent and voice-sensitive.

Built With

Share this project:

Updates