VoiceOps-AI

Inspiration

During real-world incident response, engineers often operate under extreme pressure while switching between dashboards, logs, terminals, and alerting tools. This constant context switching increases cognitive load and slows down decision-making, directly impacting Mean Time to Recovery (MTTR).

VoiceOps AI was inspired by a simple question: What if system operations felt more like a conversation than a dashboard?

We wanted to explore whether a voice-first interface, powered by modern AI reasoning and cloud telemetry, could allow engineers to keep their focus while still understanding and operating complex infrastructure. The goal was not to replace existing tools, but to create a calm, conversational “mission control” layer on top of them.

What it does

VoiceOps AI is a voice-first SRE assistant that allows engineers to monitor system health, inspect incidents, and trigger safe operational actions entirely through natural speech.

The system fetches live infrastructure state from Google Cloud, reasons over it using Gemini, and responds using natural, low-latency voice powered by ElevenLabs. All interactions are hands-free and optimized for spoken clarity, making the experience suitable for high-pressure operational scenarios.

Key capabilities include:

- Voice-based system health checks
- Incident and deployment status summaries
- Real-time latency and error insights
- Voice-triggered, simulated operational actions
- A synchronized “computer use” HUD that visually reflects system state
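
One way to picture the "voice-triggered, simulated operational actions" capability is an allow-list of actions that update a copy of the system state instead of touching real infrastructure. The sketch below is illustrative only: the action names, the `SimState` shape, and `runSimulatedAction` are hypothetical and are not taken from the project source.

```typescript
// Hypothetical state shape for illustration; not the project's actual schema.
interface SimState {
  health: "healthy" | "degraded" | "down";
  deploymentStatus: "stable" | "rolling_back" | "deploying";
  incidentActive: boolean;
}

// Allow-list of simulated actions. Each one returns a new state object and
// never mutates its input. Anything not listed here is rejected.
const SIMULATED_ACTIONS = {
  restart_service: (s: SimState): SimState => ({ ...s, health: "healthy" }),
  rollback_deployment: (s: SimState): SimState => ({
    ...s,
    deploymentStatus: "rolling_back",
  }),
  acknowledge_incident: (s: SimState): SimState => ({
    ...s,
    incidentActive: false,
  }),
} as const;

type ActionName = keyof typeof SIMULATED_ACTIONS;

interface ActionResult {
  ok: boolean;
  state: SimState;
  message: string;
}

// Type guard: true only for names present in the allow-list.
function isActionName(name: string): name is ActionName {
  return Object.prototype.hasOwnProperty.call(SIMULATED_ACTIONS, name);
}

// Applies a requested action if it is allow-listed; otherwise returns the
// original state untouched together with a refusal message.
function runSimulatedAction(state: SimState, requested: string): ActionResult {
  if (!isActionName(requested)) {
    return {
      ok: false,
      state,
      message: `Action "${requested}" is not permitted.`,
    };
  }
  const next = SIMULATED_ACTIONS[requested](state);
  return {
    ok: true,
    state: next,
    message: `Simulated ${requested.replace(/_/g, " ")}.`,
  };
}
```

Keeping the allow-list as a plain object means a transcribed command that does not map to a known action fails closed, which fits the "safe" framing above.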

How we built it

VoiceOps AI is built entirely on Google Cloud and ElevenLabs, following a clean, modular architecture.

- Google Cloud Firestore acts as a real-time Digital Twin of the system state, storing health, latency, error rate, deployment status, and incident flags.
- Cloud Run hosts a stateless HTTP backend that exposes this system state via a public API.
- Gemini (Vertex AI) serves as the reasoning layer, converting raw telemetry JSON into grounded, voice-optimized explanations while being strictly constrained to avoid hallucinations.
- ElevenLabs powers both speech-to-text and text-to-speech, enabling natural, low-latency conversational interaction.
- A lightweight frontend HUD synchronizes voice commands with visual feedback, representing safe “computer use” without directly controlling the operating system.
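
To make the Digital Twin idea concrete, here is a minimal sketch of what a state document and its validation might look like before the telemetry reaches the reasoning layer. The field names (`latencyMs`, `errorRate`, `incidentActive`, and so on) are assumptions derived from the list above; the real `SystemStatus` interface in the project may differ, and `parseSystemStatus` is a hypothetical helper.

```typescript
// Assumed field names, based on the state described above (health, latency,
// error rate, deployment status, incident flag). Not the project's real schema.
interface SystemStatus {
  health: "healthy" | "degraded" | "down";
  latencyMs: number;
  errorRate: number; // fraction between 0 and 1
  deploymentStatus: string;
  incidentActive: boolean;
}

const HEALTH_VALUES = ["healthy", "degraded", "down"] as const;

// Validates untrusted JSON (for example, an API response body) and returns a
// typed SystemStatus, throwing a descriptive error on the first bad field.
function parseSystemStatus(raw: unknown): SystemStatus {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("System status must be a JSON object");
  }
  const {
    health: rawHealth,
    latencyMs,
    errorRate,
    deploymentStatus,
    incidentActive,
  } = raw as Record<string, unknown>;

  const health = HEALTH_VALUES.find((h) => h === rawHealth);
  if (health === undefined) {
    throw new Error("Invalid or missing 'health'");
  }
  if (typeof latencyMs !== "number" || latencyMs < 0) {
    throw new Error("Invalid or missing 'latencyMs'");
  }
  if (typeof errorRate !== "number" || errorRate < 0 || errorRate > 1) {
    throw new Error("Invalid or missing 'errorRate'");
  }
  if (typeof deploymentStatus !== "string") {
    throw new Error("Invalid or missing 'deploymentStatus'");
  }
  if (typeof incidentActive !== "boolean") {
    throw new Error("Invalid or missing 'incidentActive'");
  }
  return { health, latencyMs, errorRate, deploymentStatus, incidentActive };
}
```

Validating at this boundary means the model only ever sees telemetry that matches the expected shape, which supports the grounding goal described for the reasoning layer.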

All components are designed to be reproducible, scalable, and hackathon-ready.

Tech Stack

Google Cloud Platform:
- Vertex AI / Gemini 3 Flash: The reasoning core for processing technical queries and generating grounded operational advice.
- Cloud Run: Hosts the operational backend and the synthetic action executor.
- Firestore: Serves as the real-time "Digital Twin" of the infrastructure state.

ElevenLabs:
- Scribe (v1): High-accuracy technical speech-to-text.
- TTS (Turbo v2.5): Low-latency, lifelike voice synthesis for mission-critical feedback.

Implementation Reference

Detailed mapping of hackathon-required technologies to the project source code:

Google Cloud Platform (GCP)

| Component | Implementation File | Role in Project |
| --- | --- | --- |
| Vertex AI / Gemini 3 Flash | `services/geminiService.ts` | Reasoning engine, tool definitions, and voice-optimized system instructions. Orchestrated in `App.tsx`. |
| Cloud Run | `App.tsx` | The application interfaces with the `SYSTEM_STATUS_API` endpoint (deployed on Cloud Run) to fetch live infrastructure telemetry. |
| Firestore | `types.ts` & `App.tsx` | The `SystemStatus` interface defines the "Digital Twin" schema, which is mirrored in Firestore for real-time dashboard updates. |

ElevenLabs

| Component | Implementation File | Role in Project |
| --- | --- | --- |
| Scribe (v1) | `services/elevenLabsService.ts` | Handled by the `speechToText` function. Provides high-precision technical transcription of SRE commands. |
| TTS (Turbo v2.5) | `services/elevenLabsService.ts` | Handled by the `textToSpeech` function using the `eleven_turbo_v2_5` model for ultra-low latency vocal feedback. |
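
As a rough illustration of the text-to-speech call, the sketch below builds (but does not send) a request for the `eleven_turbo_v2_5` model. It assumes the ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header, as we understand the public API; check the current ElevenLabs reference before relying on it. `buildTextToSpeechRequest` and `HttpRequestSpec` are hypothetical names and do not reflect the project's actual `textToSpeech` implementation.

```typescript
// Plain description of an HTTP request, so the builder stays pure and testable.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumed base URL of the ElevenLabs REST API.
const ELEVENLABS_BASE = "https://api.elevenlabs.io/v1";

// Builds a text-to-speech request for the given voice. The caller is
// responsible for sending it and for handling the returned audio bytes.
function buildTextToSpeechRequest(
  apiKey: string,
  voiceId: string,
  text: string
): HttpRequestSpec {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("Nothing to speak");
  }
  return {
    url: `${ELEVENLABS_BASE}/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: trimmed, model_id: "eleven_turbo_v2_5" }),
  };
}
```

The returned object can be handed to `fetch(spec.url, spec)` in a browser or recent Node.js runtime. Separating request construction from sending keeps the API key handling and model choice easy to test without network access.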

Challenges we ran into

One of the biggest challenges was preventing AI hallucination in an operational context. Early versions of the assistant tended to invent incidents or root causes when prompts were too open-ended. We solved this by restricting Gemini to the telemetry returned by the Cloud Run API and by enforcing hard rules in the system prompt.
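
A minimal sketch of this kind of grounding is shown below: the telemetry JSON is embedded directly in the prompt alongside explicit rules. The rule wording and the `buildGroundedPrompt` helper are illustrative assumptions, not the project's actual system prompt.

```typescript
// Builds a prompt that pins the model to the supplied telemetry.
// The rule text is an example; the project's real instructions may differ.
function buildGroundedPrompt(
  telemetry: Record<string, unknown>,
  question: string
): string {
  return [
    "You are a voice assistant for site reliability engineers.",
    "Rules:",
    "1. Answer only from the TELEMETRY JSON below.",
    "2. If the telemetry does not contain the answer, say the data is not available.",
    "3. Never invent incidents, root causes, or metrics.",
    "",
    "TELEMETRY:",
    JSON.stringify(telemetry, null, 2),
    "",
    `QUESTION: ${question.trim()}`,
  ].join("\n");
}
```

Because the telemetry is serialized fresh on every request, the model has no stale state to draw on, and rule 2 gives it an explicit way out when the data is missing.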

Another challenge was designing responses that felt voice-native, not like spoken dashboards. Raw metrics and lists had to be transformed into short, calm, spoken explanations that made sense when heard aloud.
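
The transformation from raw metrics to speech can be sketched as a small formatter that rounds numbers and emits short sentences. This is an illustration under assumed field names (`service`, `latencyMs`, `errorRate`, `incidentActive`); in the project itself this shaping is described as being done by Gemini, and `toSpokenSummary` is a hypothetical helper.

```typescript
// Assumed metric fields for illustration only.
interface Metrics {
  service: string;
  latencyMs: number;
  errorRate: number; // fraction between 0 and 1
  incidentActive: boolean;
}

// Converts a fraction to a percentage string suited to speech:
// one decimal place at most, with a trailing ".0" dropped.
function spokenPercent(fraction: number): string {
  return (fraction * 100).toFixed(1).replace(/\.0$/, "");
}

// Produces a few short sentences that read naturally when spoken aloud,
// leading with the most important fact (whether there is an incident).
function toSpokenSummary(m: Metrics): string {
  const parts: string[] = [];
  parts.push(
    m.incidentActive
      ? `${m.service} has an active incident.`
      : `${m.service} is running normally.`
  );
  parts.push(`Latency is around ${Math.round(m.latencyMs)} milliseconds.`);
  parts.push(
    m.errorRate === 0
      ? "There are no errors right now."
      : `About ${spokenPercent(m.errorRate)} percent of requests are failing.`
  );
  return parts.join(" ");
}
```

Rounding and fixed sentence order matter more in speech than on screen, since a listener cannot skim back over a long decimal or a nested list.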

Finally, aligning real-time cloud data with a smooth voice experience required careful coordination between backend latency, AI reasoning speed, and speech synthesis.
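
One simple way to reason about this coordination is an end-to-end latency budget split across the three stages. The sketch below is a hypothetical helper: the stage names mirror the sentence above, while `checkLatencyBudget` and any budget value passed to it are assumptions for illustration, not measurements from the project.

```typescript
// One measured stage of a single voice round trip.
interface StageTiming {
  stage: "telemetry" | "reasoning" | "speech";
  ms: number;
}

interface BudgetReport {
  totalMs: number;
  withinBudget: boolean;
  slowest: StageTiming | undefined; // undefined when no timings were given
}

// Sums the stage durations, compares the total with the budget, and
// identifies the slowest stage so it can be targeted first.
function checkLatencyBudget(
  timings: StageTiming[],
  budgetMs: number
): BudgetReport {
  let totalMs = 0;
  let slowest: StageTiming | undefined;
  for (const t of timings) {
    totalMs += t.ms;
    if (slowest === undefined || t.ms > slowest.ms) {
      slowest = t;
    }
  }
  return { totalMs, withinBudget: totalMs <= budgetMs, slowest };
}
```

Tracking the slowest stage per interaction makes it clearer whether to optimize the backend fetch, shorten the prompt, or switch to a faster voice model.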

Accomplishments that we're proud of

- Successfully built a fully working, end-to-end voice-first application
- Integrated Google Cloud + Gemini + ElevenLabs into a single conversational system
- Created a grounded AI assistant that avoids hallucination in a critical domain
- Designed a premium, minimal UI that supports voice instead of competing with it
- Delivered a reproducible, open-source project with clear architecture and documentation

What we learned

This project reinforced that voice interfaces require fundamentally different design principles than text or dashboards. Brevity, tone, grounding, and latency matter far more when users are listening instead of reading.

We also learned that AI agents become significantly more trustworthy when they are tightly coupled to real system state and constrained by deterministic backends, especially in operational and safety-critical domains.

What's next for VoiceOps-AI

Future enhancements include:

- Integration with real Google Cloud Monitoring and Logging APIs
- Multi-agent workflows for network, database, and security domains
- Multi-language voice support
- Role-based voice authentication
- Advanced remediation playbooks and approval workflows

VoiceOps AI is a step toward calmer, more human-centric system operations — where engineers talk to their infrastructure instead of fighting it.

Updates

Private user started this project — Dec 31, 2025 01:18 PM EST
