Inspiration
We wanted to build a voice agent that feels less like a chatbot and more like a real operating layer between people, Google AI, and the machine they are using.
Most assistants are strong at conversation or strong at automation, but rarely both at the same time. They can answer questions, but they often lose context when a task becomes long-running, local, or interruptible. We were especially interested in the gap between a natural live conversation and grounded execution on a real device.
That led to Relay: a live voice agent designed for the Google ecosystem and the local desktop. Our goal was to create an experience where a user can speak naturally, interrupt mid-response, redirect work, and still receive grounded updates from a real execution path instead of a guessed answer.
What it does
Relay is live voice-agent infrastructure that connects a cloud-hosted Gemini session with grounded execution on the user's local machine.
A user can speak naturally, ask Relay to perform a task, clarify details when needed, interrupt the assistant while it is talking, redirect the work, and continue the same conversation without losing state. The live session stays in the cloud, while local machine work runs through a connected Gemini CLI runtime on the desktop.
In practice, that means Relay is built for flows like these:
- Start a task in voice
- Ask follow-up questions in the same session
- Interrupt the assistant and pivot immediately
- Resume or inspect an existing task
- Receive a grounded completion briefing after real execution finishes
Instead of pretending it already knows what happened on the local machine, Relay routes local work through a single execution path and reports back using structured task results. That makes the experience feel more trustworthy and more agentic.
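As a sketch of what "structured task results" can look like (the type and field names here are our illustration, not Relay's actual schema), a discriminated union forces the voice layer to derive its briefing from a real result rather than a guess:

```typescript
// Hypothetical shape of a task result reported by the local executor.
// Field names are illustrative; Relay's real contract may differ.
type TaskResult =
  | { kind: "completed"; taskId: string; summary: string; output: string }
  | { kind: "failed"; taskId: string; error: string }
  | { kind: "cancelled"; taskId: string; reason: string };

// The spoken briefing is derived only from an actual execution result,
// never from the model's assumption about what happened on the machine.
function briefing(result: TaskResult): string {
  switch (result.kind) {
    case "completed":
      return `Task ${result.taskId} finished: ${result.summary}`;
    case "failed":
      return `Task ${result.taskId} failed: ${result.error}`;
    case "cancelled":
      return `Task ${result.taskId} was cancelled: ${result.reason}`;
  }
}
```

Because the union is exhaustive, there is no code path where the assistant reports success without a completed result to back it up.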
How we built it
We designed Relay as a split architecture with a thin desktop client and a cloud-hosted agent core.
The desktop app is built with Electron and React and acts as the live surface for microphone input, assistant audio, session UI, and local executor connectivity. The hosted agent core runs separately as a Cloud Run-ready service and owns the live session, task orchestration, and canonical state.
For the AI layer, we used the Google GenAI SDK. Gemini Live powers the real-time voice session, while Gemini models also handle intent resolution, task intake, and task routing. Persistent session and task state lives in Postgres, deployed on Cloud SQL in Google Cloud. We also built a deployment path covering Cloud Run, Artifact Registry, secret-based configuration, and schema migration checks.
A key design decision was to keep local execution grounded through a single delegation path. Relay does not invent local machine results. Instead, the hosted agent delegates local work to a connected Gemini CLI executor on the desktop and then turns the verified output into a live conversational briefing.
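A minimal sketch of that single delegation path, assuming a JSON message envelope between the hosted agent and the desktop executor (the message names and fields are hypothetical, not Relay's actual protocol):

```typescript
// Hypothetical wire messages between the hosted agent and the local
// Gemini CLI executor; names are illustrative only.
interface DelegateTask {
  type: "delegate";
  taskId: string;
  instruction: string; // natural-language task for the local executor
}

interface ExecutorReport {
  type: "report";
  taskId: string;
  ok: boolean;
  output: string; // verified output from real execution
}

// The hosted agent sends work down through exactly one envelope...
function encodeDelegation(taskId: string, instruction: string): string {
  const msg: DelegateTask = { type: "delegate", taskId, instruction };
  return JSON.stringify(msg);
}

// ...and only ever makes local-machine claims from a parsed report.
function parseReport(raw: string): ExecutorReport {
  const msg = JSON.parse(raw);
  if (msg.type !== "report") {
    throw new Error(`unexpected message type: ${msg.type}`);
  }
  return msg as ExecutorReport;
}
```

Funneling all local work through one encode/parse pair is what makes the "no invented results" rule enforceable: there is no second path by which a local-machine claim can enter the conversation.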
Challenges we ran into
The hardest part was not connecting to a model. It was designing a system that stays coherent while a real-time conversation, a cloud-hosted agent session, and local task execution are all happening at once.
One challenge was interruption handling. A live agent should not feel turn-based. It needs to handle partial transcripts, final transcripts, assistant speech, user barge-in, and task state changes without confusing the user.
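One way to keep those overlapping events coherent (our own sketch, not Relay's implementation) is a small reducer over session events, so a barge-in deterministically cancels assistant speech without discarding conversation state:

```typescript
// Illustrative session-event reducer; event names are hypothetical.
type SessionEvent =
  | { type: "partial_transcript"; text: string }
  | { type: "final_transcript"; text: string }
  | { type: "assistant_speaking" }
  | { type: "barge_in" };

interface SessionState {
  assistantSpeaking: boolean;
  pendingTranscript: string;
  turns: string[]; // finalized user turns
}

function reduce(state: SessionState, event: SessionEvent): SessionState {
  switch (event.type) {
    case "partial_transcript":
      return { ...state, pendingTranscript: event.text };
    case "final_transcript":
      return {
        ...state,
        pendingTranscript: "",
        turns: [...state.turns, event.text],
      };
    case "assistant_speaking":
      return { ...state, assistantSpeaking: true };
    case "barge_in":
      // User interrupted: stop assistant audio, keep everything else.
      return { ...state, assistantSpeaking: false };
  }
}
```

Treating interruption as just another event in one ordered stream, rather than a special case bolted onto turn-taking, is what keeps the session from feeling turn-based.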
Another challenge was keeping the boundary between cloud intelligence and local execution trustworthy. We did not want the assistant to hallucinate what happened on a machine it does not control directly. That forced us to build a clearer execution contract, structured result handling, and explicit error paths.
We also spent significant effort on canonical state and recovery. Sessions, tasks, events, intake state, and memory all needed to survive reconnects and remain understandable for judging, debugging, and future extension.
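Recovery is easier to reason about when canonical state can be rebuilt from the persisted event log. A sketch under assumed event names (the real tables and event types in Relay's Postgres schema may differ):

```typescript
// Hypothetical persisted task events; in Relay these would live in Postgres.
type TaskEvent =
  | { taskId: string; type: "created" }
  | { taskId: string; type: "started" }
  | { taskId: string; type: "completed" }
  | { taskId: string; type: "failed" };

type TaskStatus = "created" | "started" | "completed" | "failed";

// After a reconnect, fold the event log back into per-task status so the
// live session can resume without trusting any in-memory state.
function rebuild(events: TaskEvent[]): Map<string, TaskStatus> {
  const tasks = new Map<string, TaskStatus>();
  for (const e of events) {
    tasks.set(e.taskId, e.type); // last event wins
  }
  return tasks;
}
```

With state derived this way, a dropped WebSocket or restarted client changes nothing about what the system believes: the event log is the single source of truth.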
Accomplishments that we're proud of
We are proud that Relay is not just a voice UI layered on top of a generic assistant. It is a real live-agent runtime with explicit architecture decisions behind it.
Highlights we are especially proud of:
- A server-owned live Gemini session instead of a purely client-side demo
- Interruptible voice interaction with task continuity
- Grounded local execution through a single delegation path
- Canonical task, event, and memory state persisted in Postgres
- A cloud deployment path built for Cloud Run and Cloud SQL
- A thin desktop experience that can stay conversational while real work happens in the background
We are also proud that the system is designed to fail honestly. If the hosted reasoning or routing path is unavailable, Relay surfaces that clearly instead of guessing.
What we learned
We learned that building a live agent is fundamentally different from building a chat experience.
A good live agent needs strong boundaries: what the model can decide, what the runtime can verify, what belongs in persistent state, and what must come from a grounded executor. We also learned that trust comes from architecture, not just prompt quality.
Another big lesson was that voice makes poor system design obvious very quickly. In text, latency, ambiguity, and state drift can be tolerated. In live conversation, they become immediately visible. That pushed us to think much more carefully about routing, clarification, interruption, and recovery.
Finally, we learned that the most compelling AI experiences are often not the ones that say the most, but the ones that can responsibly do more.
What's next for Relay
Relay is currently a strong foundation for a much broader Google ecosystem agent.
Next, we want to expand the number of supported integrations and actions so Relay can work across more Google services, more local operating system capabilities, and more user environments. Because the runtime is built around Gemini CLI plus extensible tools, MCP connections, and future extensions, the ceiling is much higher than the current prototype.
Our next priorities are:
- Deeper Google ecosystem workflows
- More robust permissioned account handling
- Richer task memory and cross-session continuity
- Better judge and user onboarding flows
- Expanded support for additional surfaces and platforms
Our long-term vision is simple: Relay becomes a voice-first agent layer that lets users work across Google AI, Google Cloud, and their own machine as one continuous live system.
Built With
- artifact-registry
- cloud-build
- cloud-sql
- docker
- electron
- gemini-cli
- gemini-live-api
- gemini-models
- google-cloud-run
- google-genai-sdk
- javascript
- node.js
- postgresql
- react
- secret-manager
- tailwind-css
- typescript
- vertex-ai
- vite
- vitest
- websockets