Inspiration

The inspiration for Live Agent came from a simple question: Why do AI interactions still feel like a series of "turns" rather than a continuous flow? We wanted to move away from the "type-and-wait" paradigm and create an assistant that feels truly present—one that can hear your tone, see your environment, and remember your history just like a human collaborator would.

What it does

Talk Naturally: Engage in low-latency voice conversations with Gemini, featuring natural interruptions and high-fidelity audio.

Share Vision: Stream a live camera feed so the AI can analyze visual context in real-time.

Hybrid Interface: Switch seamlessly between voice, video, and text input depending on the context of the task.

How we built it

The backend is written in Python with FastAPI, and it communicates with the React frontend over a WebSocket connection.

Challenges we ran into

Latency Management: Synchronizing high-frequency audio chunks with video frames while maintaining a "live" feel required careful optimization of the WebSocket stream.
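One way to keep audio and video aligned on a single WebSocket is to tag every binary message with its kind and capture time, so the receiver can always pass audio through and skip video frames that arrive too late. The sketch below illustrates that idea only; the header layout, the `KIND_*` constants, and the 500 ms staleness threshold are assumptions made for illustration, not the format Live Agent actually uses.

```python
import struct

# Hypothetical wire format (an assumption, not the project's real protocol):
# 1-byte kind, 8-byte millisecond timestamp, then the raw payload.
HEADER = struct.Struct("!BQ")
KIND_AUDIO = 1
KIND_VIDEO = 2


def pack_frame(kind: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Prefix a payload with its kind and capture timestamp."""
    return HEADER.pack(kind, timestamp_ms) + payload


def unpack_frame(message: bytes) -> tuple[int, int, bytes]:
    """Split a binary message back into (kind, timestamp_ms, payload)."""
    kind, timestamp_ms = HEADER.unpack_from(message)
    return kind, timestamp_ms, message[HEADER.size:]


def should_forward(kind: int, timestamp_ms: int, now_ms: int,
                   max_video_age_ms: int = 500) -> bool:
    """Always forward audio; drop video frames that are already stale."""
    if kind == KIND_AUDIO:
        return True
    return now_ms - timestamp_ms <= max_video_age_ms
```

Dropping late video while never dropping audio reflects a common trade-off: a skipped camera frame is rarely noticed, while a gap in speech is.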

Transcription Flow: We initially struggled with "word-by-word" replacement in the UI, which we solved by implementing a cumulative buffering system that handles smart spacing and turn-completion logic.
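A cumulative buffer of this kind can be sketched in a few lines. The version below is a minimal illustration written in Python for consistency with the backend; the class name `TranscriptBuffer`, its methods, and the punctuation rule are assumptions, and the real logic may live in the React frontend and differ in detail.

```python
class TranscriptBuffer:
    """Accumulates streamed transcript fragments for one speaker turn."""

    # Fragments that start with one of these attach directly to the
    # previous text instead of being preceded by a space.
    NO_SPACE_BEFORE = tuple(".,!?;:")

    def __init__(self) -> None:
        self.current = ""    # text of the turn in progress
        self.completed = []  # finished turns, oldest first

    def append(self, fragment: str) -> str:
        """Add a fragment to the current turn and return the text so far."""
        fragment = fragment.strip()
        if not fragment:
            return self.current
        if self.current and not fragment.startswith(self.NO_SPACE_BEFORE):
            self.current += " "
        self.current += fragment
        return self.current

    def complete_turn(self) -> None:
        """Archive the current turn and start a fresh one."""
        if self.current:
            self.completed.append(self.current)
        self.current = ""
```

Because `append` returns the whole turn rather than the latest fragment, the UI can re-render one growing line instead of replacing each word with the next.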

Accomplishments that we're proud of

Polished UX: We built a UI that feels "technical yet approachable", balancing dense data (like audio levels and transcripts) with a clean, modern layout.

What we learned

We gained deep insights into the Web Audio API and the nuances of real-time PCM streaming. We also learned how to design "interruption-friendly" AI interfaces, where the system must gracefully handle being cut off by the user and immediately pivot its reasoning.
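To make the interruption idea concrete, here is a minimal sketch of one possible approach: queued model audio is tagged with a turn id, and an interruption both flushes the queue and invalidates that id so late-arriving chunks from the cut-off response are ignored. The `PlaybackQueue` class and its method names are hypothetical and are not taken from the Live Agent code.

```python
from collections import deque


class PlaybackQueue:
    """Queue of model audio chunks that can be flushed on interruption."""

    def __init__(self) -> None:
        self.turn_id = 0
        self._chunks = deque()

    def start_turn(self) -> int:
        """Begin a new model response and return its id."""
        self.turn_id += 1
        self._chunks.clear()
        return self.turn_id

    def enqueue(self, turn_id: int, chunk: bytes) -> bool:
        """Queue a chunk unless it belongs to a response that was cut off."""
        if turn_id != self.turn_id:
            return False
        self._chunks.append(chunk)
        return True

    def interrupt(self) -> int:
        """User spoke over the model: drop pending audio, invalidate the turn."""
        dropped = len(self._chunks)
        self._chunks.clear()
        self.turn_id += 1
        return dropped

    def next_chunk(self):
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None
```

Invalidating the turn id matters as much as clearing the queue, since audio for the old response can still be in flight when the user starts speaking.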

What's next for Live Agent with voice and video

File Upload Integration: Adding the ability to drop documents and images directly into the chat for deep analysis alongside voice/video context.

Tool Grounding: Integrating Google Search and Maps grounding to allow the agent to discuss real-world events and locations with live data.

Collaborative Sessions: Allowing multiple users to join a single Live Agent session for team-based brainstorming and troubleshooting.
