Inspiration
We wanted to build a voice agent that feels less like a chatbot and more like a real operating layer between people, Google AI, and the machine they are using.
Most assistants are strong at conversation or strong at automation, but rarely both at the same time. They can answer questions, but they often lose context when a task becomes long-running, local, or interruptible. We were especially interested in the gap between a natural live conversation and grounded execution on a real device.
That led to Relay: a live voice agent designed for the Google ecosystem and the local desktop. Our goal was to create an experience where a user can speak naturally, interrupt mid-response, redirect work, and still receive grounded updates from a real execution path instead of a guessed answer.
What it does
Relay is live voice-agent infrastructure that connects a cloud-hosted Gemini session with grounded execution on the user's local machine.
A user can speak naturally, ask Relay to perform a task, clarify details when needed, interrupt the assistant while it is talking, redirect the work, and continue the same conversation without losing state. The live session stays in the cloud, while local machine work runs through a connected Gemini CLI runtime on the desktop.
In practice, that means Relay is built for flows like these:
- Start a task in voice
- Ask follow-up questions in the same session
- Interrupt the assistant and pivot immediately
- Resume or inspect an existing task
- Receive a grounded completion briefing after real execution finishes
Instead of pretending it already knows what happened on the local machine, Relay routes local work through a single execution path and reports back using structured task results. That makes the experience feel more trustworthy and more agentic.
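As a sketch of what "structured task results" can look like (the type and field names here are our illustration, not Relay's actual schema), a discriminated union forces the voice layer to derive its briefing from a real result rather than a guess:

```typescript
// Hypothetical shape of a task result reported by the local executor.
// Field names are illustrative; Relay's real contract may differ.
type TaskResult =
  | { kind: "completed"; taskId: string; summary: string; output: string }
  | { kind: "failed"; taskId: string; error: string }
  | { kind: "cancelled"; taskId: string; reason: string };

// The spoken briefing is derived only from an actual execution result,
// never from the model's assumption about what happened on the machine.
function briefing(result: TaskResult): string {
  switch (result.kind) {
    case "completed":
      return `Task ${result.taskId} finished: ${result.summary}`;
    case "failed":
      return `Task ${result.taskId} failed: ${result.error}`;
    case "cancelled":
      return `Task ${result.taskId} was cancelled: ${result.reason}`;
  }
}
```

Because the union is exhaustive, there is no code path where the assistant reports success without a completed result to back it up.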
How we built it
We designed Relay as a split architecture with a thin desktop client and a cloud-hosted agent core.
The desktop app is built with Electron and React and acts as the live surface for microphone input, assistant audio, session UI, and local executor connectivity. The hosted agent core runs separately as a Cloud Run-ready service and owns the live session, task orchestration, and canonical state.
For the AI layer, we used the Google GenAI SDK. Gemini Live powers the real-time voice session, while Gemini models also handle intent resolution, task intake, and task routing. Persistent session and task state lives in Postgres, deployed on Cloud SQL in Google Cloud. We also built a deployment path covering Cloud Run, Artifact Registry, secret-based configuration, and schema migration checks.
A key design decision was to keep local execution grounded through a single delegation path. Relay does not invent local machine results. Instead, the hosted agent delegates local work to a connected Gemini CLI executor on the desktop and then turns the verified output into a live conversational briefing.
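A minimal sketch of that single delegation path, assuming a JSON message envelope between the hosted agent and the desktop executor (the message names and fields are hypothetical, not Relay's actual protocol):

```typescript
// Hypothetical wire messages between the hosted agent and the local
// Gemini CLI executor; names are illustrative only.
interface DelegateTask {
  type: "delegate";
  taskId: string;
  instruction: string; // natural-language task for the local executor
}

interface ExecutorReport {
  type: "report";
  taskId: string;
  ok: boolean;
  output: string; // verified output from real execution
}

// The hosted agent sends work down through exactly one envelope...
function encodeDelegation(taskId: string, instruction: string): string {
  const msg: DelegateTask = { type: "delegate", taskId, instruction };
  return JSON.stringify(msg);
}

// ...and only ever makes local-machine claims from a parsed report.
function parseReport(raw: string): ExecutorReport {
  const msg = JSON.parse(raw);
  if (msg.type !== "report") {
    throw new Error(`unexpected message type: ${msg.type}`);
  }
  return msg as ExecutorReport;
}
```

Funneling all local work through one encode/parse pair is what makes the "no invented results" rule enforceable: there is no second path by which a local-machine claim can enter the conversation.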
Challenges we ran into
The hardest part was not connecting to a model. It was designing a system that stays coherent while a real-time conversation, a cloud-hosted agent session, and local task execution are all happening at once.
One challenge was interruption handling. A live agent should not feel turn-based. It needs to handle partial transcripts, final transcripts, assistant speech, user barge-in, and task state changes without confusing the user.
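One way to keep those overlapping events coherent (our own sketch, not Relay's implementation) is a small reducer over session events, so a barge-in deterministically cancels assistant speech without discarding conversation state:

```typescript
// Illustrative session-event reducer; event names are hypothetical.
type SessionEvent =
  | { type: "partial_transcript"; text: string }
  | { type: "final_transcript"; text: string }
  | { type: "assistant_speaking" }
  | { type: "barge_in" };

interface SessionState {
  assistantSpeaking: boolean;
  pendingTranscript: string;
  turns: string[]; // finalized user turns
}

function reduce(state: SessionState, event: SessionEvent): SessionState {
  switch (event.type) {
    case "partial_transcript":
      return { ...state, pendingTranscript: event.text };
    case "final_transcript":
      return {
        ...state,
        pendingTranscript: "",
        turns: [...state.turns, event.text],
      };
    case "assistant_speaking":
      return { ...state, assistantSpeaking: true };
    case "barge_in":
      // User interrupted: stop assistant audio, keep everything else.
      return { ...state, assistantSpeaking: false };
  }
}
```

Treating interruption as just another event in one ordered stream, rather than a special case bolted onto turn-taking, is what keeps the session from feeling turn-based.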
Another challenge was keeping the boundary between cloud intelligence and local execution trustworthy. We did not want the assistant to hallucinate what happened on a machine it does not control directly. That forced us to build a clearer execution contract, structured result handling, and explicit error paths.
We also spent significant effort on canonical state and recovery. Sessions, tasks, events, intake state, and memory all needed to survive reconnects and remain understandable for judging, debugging, and future extension.
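Recovery is easier to reason about when canonical state can be rebuilt from the persisted event log. A sketch under assumed event names (the real tables and event types in Relay's Postgres schema may differ):

```typescript
// Hypothetical persisted task events; in Relay these would live in Postgres.
type TaskEvent =
  | { taskId: string; type: "created" }
  | { taskId: string; type: "started" }
  | { taskId: string; type: "completed" }
  | { taskId: string; type: "failed" };

type TaskStatus = "created" | "started" | "completed" | "failed";

// After a reconnect, fold the event log back into per-task status so the
// live session can resume without trusting any in-memory state.
function rebuild(events: TaskEvent[]): Map<string, TaskStatus> {
  const tasks = new Map<string, TaskStatus>();
  for (const e of events) {
    tasks.set(e.taskId, e.type); // last event wins
  }
  return tasks;
}
```

With state derived this way, a dropped WebSocket or restarted client changes nothing about what the system believes: the event log is the single source of truth.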
Accomplishments that we're proud of
We are proud that Relay is not just a voice UI layered on top of a generic assistant. It is a real live-agent runtime with explicit architecture decisions behind it.
Highlights we are especially proud of:
- A server-owned live Gemini session instead of a purely client-side demo
- Interruptible voice interaction with task continuity
- Grounded local execution through a single delegation path
- Canonical task, event, and memory state persisted in Postgres
- A cloud deployment path built for Cloud Run and Cloud SQL
- A thin desktop experience that can stay conversational while real work happens in the background
We are also proud that the system is designed to fail honestly. If the hosted reasoning or routing path is unavailable, Relay surfaces that clearly instead of guessing.
What we learned
We learned that building a live agent is fundamentally different from building a chat experience.
A good live agent needs strong boundaries: what the model can decide, what the runtime can verify, what belongs in persistent state, and what must come from a grounded executor. We also learned that trust comes from architecture, not just prompt quality.
Another big lesson was that voice makes poor system design obvious very quickly. In text, latency, ambiguity, and state drift can be tolerated. In live conversation, they become immediately visible. That pushed us to think much more carefully about routing, clarification, interruption, and recovery.
Finally, we learned that the most compelling AI experiences are often not the ones that say the most, but the ones that can responsibly do more.
What's next for Relay
Relay is currently a strong foundation for a much broader Google ecosystem agent.
Next, we want to expand the number of supported integrations and actions so Relay can work across more Google services, more local operating system capabilities, and more user environments. Because the runtime is built around Gemini CLI plus extensible tools, MCP connections, and future extensions, the ceiling is much higher than the current prototype.
Our next priorities are:
- Deeper Google ecosystem workflows
- More robust permissioned account handling
- Richer task memory and cross-session continuity
- Better judge and user onboarding flows
- Expanded support for additional surfaces and platforms
Our long-term vision is simple: Relay becomes a voice-first agent layer that lets users work across Google AI, Google Cloud, and their own machine as one continuous live system.
Built With
- artifact-registry
- cloud-build
- cloud-sql
- docker
- electron
- gemini-cli
- gemini-live-api
- gemini-models
- google-cloud-run
- google-genai-sdk
- javascript
- node.js
- postgresql
- react
- secret-manager
- tailwind-css
- typescript
- vertex-ai
- vite
- vitest
- websockets