Inspiration
I was inspired by Deepgram's AI voice agent, which makes interacting with an AI simple. It also supports multiple interaction points or nodes, capturing audio in chunks, and sending that audio to Deepgram for processing.
What it does
TalkMeNow transcribes the user's audio into text, reads what the user said, and returns that same text as the response. The text is then converted back to speech and played for the user. All of this happens in real time.
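To make the flow concrete, here is a minimal sketch of that loop in TypeScript. The names transcribe, synthesize, and play are hypothetical placeholders for the real STT, TTS, and playback calls, not actual Deepgram APIs:

```typescript
// Hypothetical sketch of TalkMeNow's speech -> text -> speech loop.
// transcribe, synthesize, and play are placeholder names, not real APIs.
declare function transcribe(audio: Blob): Promise<string>; // speech-to-text
declare function synthesize(text: string): Promise<Blob>;  // text-to-speech
declare function play(speech: Blob): Promise<void>;        // audio playback

async function handleUtterance(audio: Blob): Promise<void> {
  const text = await transcribe(audio); // what the user said
  const reply = text;                   // echo the same text back as the response
  const speech = await synthesize(reply);
  await play(speech);
}
```

Echoing the transcript back keeps the round trip short, which is what makes the real-time goal feasible.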
How we built it
I built it with Next.js and TypeScript. Because these tools make it easy to organize components, I was able to capture audio and play it back on screen using native browser APIs. All of this ran locally.
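A minimal sketch of the browser-side capture and playback, using only standard Web APIs (getUserMedia, MediaRecorder, and an Audio element). This illustrates the approach, not the project's exact code:

```typescript
// Capture microphone audio with MediaRecorder and replay it in the browser.
// Uses only standard Web APIs; everything runs locally.
async function recordAndReplay(durationMs = 3000): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = () => {
    // Join the recorded chunks and play them back through an audio element.
    const blob = new Blob(chunks, { type: recorder.mimeType });
    const audio = new Audio(URL.createObjectURL(blob));
    void audio.play();
    stream.getTracks().forEach((track) => track.stop()); // release the mic
  };

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
}
```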
Challenges we ran into
The hardest part was getting the audio back and testing the connection to the Deepgram API. Whenever I tried to send the recorded audio, the request failed with an error. I managed to open the microphone, but getting a real-time response proved complicated.
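For the real-time path, Deepgram documents a live-streaming WebSocket endpoint. Below is a sketch of how recorded chunks could be streamed from the browser; the endpoint URL, query parameter, response shape, and the "token" subprotocol trick for passing the API key are based on Deepgram's documentation at the time of writing, so verify them against the current docs:

```typescript
// Stream MediaRecorder chunks to Deepgram's live transcription endpoint.
// Endpoint and auth scheme per Deepgram's docs; apiKey is supplied by the caller.
function streamToDeepgram(stream: MediaStream, apiKey: string): WebSocket {
  const socket = new WebSocket(
    "wss://api.deepgram.com/v1/listen?punctuate=true",
    ["token", apiKey] // browsers can't set headers, so the key rides the subprotocol
  );

  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });

  socket.onopen = () => {
    // Send small chunks so transcripts come back in near real time.
    recorder.ondataavailable = (event) => {
      if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
        socket.send(event.data);
      }
    };
    recorder.start(250); // emit a chunk every 250 ms
  };

  socket.onmessage = (message) => {
    const result = JSON.parse(message.data as string);
    const transcript = result.channel?.alternatives?.[0]?.transcript;
    if (transcript) console.log("Transcript:", transcript);
  };

  socket.onerror = (err) => console.error("Deepgram socket error:", err);
  return socket;
}
```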
Accomplishments that we're proud of
Understanding how a voice agent API works: first it captures audio, transcribes it into text, and interprets what the text says; then it returns the text response, converts it back to speech, and plays it back. All in record time.
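The last step of that loop is turning the text response back into audio. Here is a sketch of calling Deepgram's text-to-speech endpoint and playing the result in the browser; the /v1/speak URL and the model name are taken from Deepgram's docs and may change, so treat them as assumptions to check:

```typescript
// Convert a text reply to speech via Deepgram's TTS endpoint and play it.
// URL and model name per Deepgram's docs at the time of writing; verify
// against current documentation. apiKey is supplied by the caller.
async function speak(text: string, apiKey: string): Promise<void> {
  const response = await fetch(
    "https://api.deepgram.com/v1/speak?model=aura-asteria-en",
    {
      method: "POST",
      headers: {
        Authorization: `Token ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text }),
    }
  );
  if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);

  // The endpoint returns raw audio; wrap it in a Blob and play it.
  const audioBlob = await response.blob();
  const audio = new Audio(URL.createObjectURL(audioBlob));
  await audio.play();
}
```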
What we learned
I learned how to connect to and consume an API, even though it didn't work 100%. I relied on built-in browser tools, and I kept the code simple to reduce complexity and make it faster to build and easier to understand.
What's next for TalkMeNow
The goal is to keep going and do much better: in particular, to make the code and the processing pipeline more efficient, and to get the API integration working correctly.