Inspiration

Social anxiety only gets better by being exposed to more social interactions. Yet, people with social anxiety fear social interactions. This vicious loop is something that is hard to overcome, and existing solutions like cognitive behavioral therapy are expensive and demanding at $100-$250/session. We thought AI could help.

What it does

BanterBox allows you to speak with a human-like AI voice agent in predefined, 'anxiety-inducing' scenarios. These scenarios are often awkward or challenging situations for people with social anxiety - for instance, accidentally spilling coffee on someone. When you're done with the conversation, BanterBox offers detailed line-by-line feedback on your conversation skills and presents you with detailed scores on a radar plot.

How we built it

Architecture: Microphone → WebSocket → Deepgram STT →
Transcript Accumulation → GPT-4o Orchestrator → Response Generation → ElevenLabs TTS → Audio Playback

We separated our AI avatar into 3 separate models, using Deepgram speech to text, GPT-4o for the "brain" of the model, and ElevenLabs text-to-speech for a humanlike voice. We first tried using OpenAI's multimodality for speech-to-text, but we realized it was very inconsistent, leading us to pivot to use Deepgram STT. We also used WebSockets to stream audio in real time to simulate real-life conversations.

Challenges we ran into

The main challenge was to create an optimal way to simulate real-life conversations without having the AI model interrupt us in the middle. We have tried various ways, such as using the inbound Deepgram endpoint feature, but the sentences kept cutting off midway, where GPT generated two responses. Thus, we have added an "append" feature in Deepgram, where we add all our voices together until a 2-second silence has been reached. While it makes the AI model slower than actual human conversations, the difference was negligible enough. Another challenge was that we tried to integrate the HeyGen real-life avatar, but they required TURN/ICE setup, which we did not have time for, pivoting to dynamic pre-generated images instead.

Accomplishments that we're proud of

We are proud of creating a complex multimodal system that allows human-to-ai conversations seamlessly. We are also proud that our projects work end-to-end without any severe issues.

What we learned

We learned how to use limited resources, such as the free-tier ElevenLabs API and DeepGram API, to deliver an end-to-end product that serves its purpose. We also learned how to architect systems and design complex backends.

What's next for BanterBox

We will expand BanterBox to help socially anxious people overcome more common and anxiety-inducing situations, such as a job interview or a first date. We will also plan to add a custom prompting feature where socially-anxious users can create custom simulations of situations they want to practice in.

We are a good-willed, disruptive company. We plan to change how human communication training happens globally. Instead of boring, expensive sessions with therapists and coaches, we deliver inexpensive, AI-based simulations with line-by-line feedback.

Built With

Share this project:

Updates