Inspiration

Ever wished you could chat with Elon Musk or swap stories with Dwayne Johnson? Learn from Zuckerberg, or have Bezos on your personal board of advisors?

They say you're the average of the five people around you. What if you could curate those five people, and learn and have fun with whoever you wish?

Yesterday, while thinking of ideas, we were wondering what a simulated world would look like. What would virtual curated communities look like?

Well, we made it real.

Banter Room is a place where you can jump on and facetime with your new besties!

Can't wait to show it to y'all!

Here's a demo video on Loom (since we couldn't attach it in DevPost): https://www.loom.com/share/788b19f92d31439ea3957a1d343fd5dd

What it does

Creates audio and video streams with AI avatars of your choice, with cloned voices! Have fun with your besties, learn from the best, just like a Zoom meeting!

How we built it

  • Groq + Llama for the reasoning/responses
  • ElevenLabs for cloned voices of famous personalities
  • Whisper (running locally) for transcription
  • OctoAI for fun AI avatars
  • Streamlit for the UI
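The stack above boils down to one loop per conversational turn: mic audio is transcribed by Whisper, a persona-prompted Llama reply comes back via Groq, and ElevenLabs speaks it in the cloned voice. Here's a minimal sketch of that turn; the helper names are ours, and the actual Groq/ElevenLabs calls are passed in as stubs so the sketch stays self-contained:

```python
def build_messages(persona: str, history: list[dict], user_text: str) -> list[dict]:
    """Assemble the chat payload sent to Llama for a given persona."""
    system = (
        f"You are {persona}. Stay in character and keep replies short and "
        "conversational, as if on a video call."
    )
    return [{"role": "system", "content": system}, *history,
            {"role": "user", "content": user_text}]

def run_turn(persona: str, history: list[dict], user_text: str,
             llm, tts) -> bytes:
    """One turn: generate a reply, then synthesize it in the cloned voice.

    llm: callable taking the messages list (e.g. a Groq chat-completions call
         with a Llama model) and returning the reply text.
    tts: callable taking (text, voice=...) (e.g. ElevenLabs TTS) and
         returning audio bytes.
    """
    messages = build_messages(persona, history, user_text)
    reply = llm(messages)
    return tts(reply, voice=persona)
```

In the real app the `llm` and `tts` slots would be the Groq and ElevenLabs SDK calls; swapping them for lambdas makes the flow easy to test.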

Challenges we ran into

  • Latency of video generation and rendering, primarily due to limitations in ElevenLabs (up to 3 seconds of latency) and Streamlit
  • Lip sync (using the Wav2Lip library) is still not good enough to cross the uncanny valley
  • We wanted to get the hack running on Zoom and Google Meet, but popular meeting API platforms like Recall.ai do not offer streaming responses
  • Ideally, there would be voice-to-voice models that understand emotion, tone, and enunciation

Accomplishments that we're proud of

  • We've built an end-to-end pipeline that lets you talk to any real or fictional character, with sub-1-second voice-to-voice latency
  • In a group-call setting, an orchestration engine decides the relevant speaker and responses via Llama
  • We harnessed Llama 3 on Groq for lightning-fast text generation
  • Lip-synced video: achieved sub-1-second latency for short videos
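The group-call orchestration step can be sketched as a routing prompt plus a parser: Llama sees the roster and the latest message, names who should answer, and a small parser maps that free-text output back onto a known persona. The prompt wording and helper names here are our own illustration, not the project's actual prompt:

```python
def routing_prompt(roster: list[str], user_text: str) -> str:
    """Ask the LLM to pick which persona should respond in a group call."""
    names = ", ".join(roster)
    return (
        f"Participants: {names}.\n"
        f"The user just said: {user_text!r}.\n"
        "Reply with only the name of the participant who should answer."
    )

def parse_speaker(llm_output: str, roster: list[str]) -> str:
    """Map the model's free-text pick back onto a known persona; fall back
    to the first participant if the output matches no one."""
    lowered = llm_output.lower()
    for name in roster:
        if name.lower() in lowered:
            return name
    return roster[0]
```

Keeping the parser forgiving (substring match plus a fallback) matters because small models don't always reply with just a name.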

What we learned

  • You can bring down effective latency in video and audio generation by breaking your content into small chunks, enabling near-real-time streaming of generated output
  • Smaller Llama models (7B/8B) are still not as robust as the 70B model, and we had to rely on 70B for our app

What's next for Banter Room

  • Enable video-to-video streaming
  • Build your favorite character on the fly, by feeding in thirty seconds of video of them speaking
  • Port this application to Meta Quest for VR and AR environments

Built With

  • elevenlabs
  • groq
  • llama
  • octoai
  • streamlit