Inspiration
During real-world incident response, engineers often operate under extreme pressure while switching between dashboards, logs, terminals, and alerting tools. This constant context switching increases cognitive load and slows down decision-making, directly impacting Mean Time to Recovery (MTTR).
VoiceOps AI was inspired by a simple question: What if system operations felt more like a conversation than a dashboard?
We wanted to explore whether a voice-first interface, powered by modern AI reasoning and cloud telemetry, could allow engineers to keep their focus while still understanding and operating complex infrastructure. The goal was not to replace existing tools, but to create a calm, conversational “mission control” layer on top of them.
What it does
VoiceOps AI is a voice-first SRE assistant that allows engineers to monitor system health, inspect incidents, and trigger safe operational actions entirely through natural speech.
The system fetches live infrastructure state from Google Cloud, reasons over it using Gemini, and responds using natural, low-latency voice powered by ElevenLabs. All interactions are hands-free and optimized for spoken clarity, making the experience suitable for high-pressure operational scenarios.
Key capabilities include:
- Voice-based system health checks
- Incident and deployment status summaries
- Real-time latency and error insights
- Voice-triggered, simulated operational actions
- A synchronized “computer use” HUD that visually reflects system state
How we built it
VoiceOps AI is built entirely on Google Cloud and ElevenLabs, following a clean, modular architecture.
- Google Cloud Firestore acts as a real-time Digital Twin of the system state, storing health, latency, error rate, deployment status, and incident flags.
- Cloud Run hosts a stateless HTTP backend that exposes this system state via a public API.
- Gemini (Vertex AI) serves as the reasoning layer, converting raw telemetry JSON into grounded, voice-optimized explanations while being strictly constrained to avoid hallucinations.
- ElevenLabs powers both speech-to-text and text-to-speech, enabling natural, low-latency conversational interaction.
- A lightweight frontend HUD synchronizes voice commands with visual feedback, representing safe “computer use” without directly controlling the operating system.
All components are designed to be reproducible, scalable, and hackathon-ready.
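To make the Digital Twin idea concrete, here is a minimal sketch of how a backend response could be validated before it reaches the reasoning layer. The field names (`latencyMs`, `errorRate`, `incidentActive`, and so on) and the three health values are assumptions chosen for illustration; the project's actual schema lives in types.ts and may differ.

```typescript
// Hypothetical shape of the Digital Twin document. All field names are
// assumptions for illustration, not the project's actual schema.
interface SystemStatus {
  health: "healthy" | "degraded" | "down";
  latencyMs: number;
  errorRate: number; // fraction of failing requests, between 0 and 1
  deploymentStatus: string;
  incidentActive: boolean;
}

const HEALTH_VALUES: string[] = ["healthy", "degraded", "down"];

// Type guard: returns true only when every assumed field is present
// and has a plausible value.
function isSystemStatus(value: unknown): value is SystemStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const health = v["health"];
  const latencyMs = v["latencyMs"];
  const errorRate = v["errorRate"];
  return (
    typeof health === "string" &&
    HEALTH_VALUES.indexOf(health) !== -1 &&
    typeof latencyMs === "number" &&
    isFinite(latencyMs) &&
    typeof errorRate === "number" &&
    errorRate >= 0 &&
    errorRate <= 1 &&
    typeof v["deploymentStatus"] === "string" &&
    typeof v["incidentActive"] === "boolean"
  );
}

// Parse a raw response body and reject anything that does not match,
// so malformed telemetry never reaches the model.
function parseSystemStatus(body: string): SystemStatus {
  const parsed: unknown = JSON.parse(body);
  if (!isSystemStatus(parsed)) {
    throw new Error("Response does not match the expected SystemStatus shape");
  }
  return parsed;
}
```

Rejecting malformed telemetry at this boundary is one way to keep the model's input deterministic: the assistant either gets a complete snapshot or an explicit failure it can report.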
Tech Stack
- Google Cloud Platform:
  - Vertex AI / Gemini 3 Flash: The reasoning core for processing technical queries and generating grounded operational advice.
  - Cloud Run: Hosts the operational backend and the synthetic action executor.
  - Firestore: Serves as the real-time "Digital Twin" of the infrastructure state.
- ElevenLabs:
  - Scribe (v1): High-accuracy technical speech-to-text.
  - TTS (Turbo v2.5): Low-latency, lifelike voice synthesis for mission-critical feedback.
Implementation Reference
Detailed mapping of hackathon-required technologies to the project source code:
Google Cloud Platform (GCP)
| Component | Implementation File | Role in Project |
|---|---|---|
| Vertex AI / Gemini 3 Flash | services/geminiService.ts | Reasoning engine, tool definitions, and voice-optimized system instructions. Orchestrated in App.tsx. |
| Cloud Run | App.tsx | The application interfaces with the SYSTEM_STATUS_API endpoint (deployed on Cloud Run) to fetch live infrastructure telemetry. |
| Firestore | types.ts & App.tsx | The SystemStatus interface defines the "Digital Twin" schema, which is mirrored in Firestore for real-time dashboard updates. |
ElevenLabs
| Component | Implementation File | Role in Project |
|---|---|---|
| Scribe (v1) | services/elevenLabsService.ts | Handled by the speechToText function. Provides high-precision technical transcription of SRE commands. |
| TTS (Turbo v2.5) | services/elevenLabsService.ts | Handled by the textToSpeech function using the eleven_turbo_v2_5 model for ultra-low latency vocal feedback. |
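As a rough illustration of the TTS call, the sketch below builds a request for the ElevenLabs text-to-speech REST endpoint with the eleven_turbo_v2_5 model. It is not the project's textToSpeech implementation: the names `buildTtsRequest`, `textToSpeechSketch`, and `FetchLike` are hypothetical, and the fetch function is passed in as a parameter so the example does not assume a particular runtime.

```typescript
// Plain description of the HTTP request, independent of any fetch typings.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request for ElevenLabs' text-to-speech endpoint.
// voiceId and apiKey are supplied by the caller; no defaults are assumed.
function buildTtsRequest(text: string, voiceId: string, apiKey: string): TtsRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text, model_id: "eleven_turbo_v2_5" }),
  };
}

// Minimal structural type for the fetch function (hypothetical helper type).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Send the request and return the raw audio bytes, failing loudly on errors.
async function textToSpeechSketch(
  text: string,
  voiceId: string,
  apiKey: string,
  fetchImpl: FetchLike
): Promise<ArrayBuffer> {
  const { url, method, headers, body } = buildTtsRequest(text, voiceId, apiKey);
  const response = await fetchImpl(url, { method, headers, body });
  if (!response.ok) {
    throw new Error(`ElevenLabs TTS request failed with status ${response.status}`);
  }
  return response.arrayBuffer();
}
```

In a browser or a recent Node.js runtime, the global `fetch` can typically be passed as `fetchImpl`. Keeping request construction separate from the network call makes the model ID and headers easy to check in isolation.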
Challenges we ran into
One of the biggest challenges was preventing AI hallucination in an operational context. Early versions of the assistant tended to invent incidents or root causes when prompts were too open-ended. We solved this by grounding Gemini strictly in the telemetry returned by the Cloud Run API and enforcing hard rules in the system prompt.
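The grounding approach can be sketched as a prompt builder that places the telemetry JSON and a fixed rule list in front of every question. The rule wording and the function name `buildGroundedPrompt` below are illustrative assumptions, not the project's actual system instructions.

```typescript
// Hypothetical hard rules; the project's real system prompt may be worded differently.
const GROUNDING_RULES: string[] = [
  "Answer only from the TELEMETRY JSON below.",
  "If a field is missing, say that the data is not available.",
  "Never invent incidents, root causes, or metrics.",
  "Reply in at most two short sentences suitable for speech.",
];

// Assemble a prompt in which the telemetry is the only allowed source of facts.
function buildGroundedPrompt(
  telemetry: Record<string, unknown>,
  question: string
): string {
  const numberedRules = GROUNDING_RULES.map((rule, i) => `${i + 1}. ${rule}`);
  return [
    "You are a voice assistant for site reliability engineers.",
    "Rules:",
    ...numberedRules,
    "TELEMETRY:",
    JSON.stringify(telemetry),
    "QUESTION:",
    question.trim(),
  ].join("\n");
}
```

Because the telemetry is serialized fresh on every call, the model never has to rely on remembered state, which is one plausible way to reduce invented incidents.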
Another challenge was designing responses that felt voice-native, not like spoken dashboards. Raw metrics and lists had to be transformed into short, calm, spoken explanations that made sense when heard aloud.
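One way to make metrics easier to hear is to round them into speakable phrases before they are read out. The sketch below is a deterministic illustration of that idea under assumed thresholds and hypothetical function names; in the project itself this transformation is handled by Gemini's voice-optimized instructions.

```typescript
// Speak latency in whole milliseconds below one second,
// otherwise in seconds with one decimal place.
function speakLatency(latencyMs: number): string {
  const rounded = Math.round(latencyMs);
  if (rounded < 1000) {
    return `${rounded} milliseconds`;
  }
  return `${(latencyMs / 1000).toFixed(1)} seconds`;
}

// Speak an error rate (a fraction between 0 and 1) as a percentage.
// The 0.1 percent cutoff is an arbitrary assumption for this sketch.
function speakErrorRate(errorRate: number): string {
  const percent = errorRate * 100;
  if (percent < 0.1) {
    return "almost no errors";
  }
  const spoken = percent < 10 ? percent.toFixed(1) : String(Math.round(percent));
  return `${spoken} percent of requests failing`;
}

// Combine both phrases into one short sentence meant to be heard, not read.
function spokenSummary(service: string, latencyMs: number, errorRate: number): string {
  return `${service} is responding in ${speakLatency(latencyMs)}, with ${speakErrorRate(errorRate)}.`;
}
```

For example, `spokenSummary("Checkout", 842.3, 0.021)` yields "Checkout is responding in 842 milliseconds, with 2.1 percent of requests failing."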
Finally, aligning real-time cloud data with a smooth voice experience required careful coordination between backend latency, AI reasoning speed, and speech synthesis.
Accomplishments that we're proud of
- Successfully built a fully working, end-to-end voice-first application
- Integrated Google Cloud + Gemini + ElevenLabs into a single conversational system
- Created a grounded AI assistant that avoids hallucination in a critical domain
- Designed a premium, minimal UI that supports voice instead of competing with it
- Delivered a reproducible, open-source project with clear architecture and documentation
What we learned
This project reinforced that voice interfaces require fundamentally different design principles than text or dashboards. Brevity, tone, grounding, and latency matter far more when users are listening instead of reading.
We also learned that AI agents become significantly more trustworthy when they are tightly coupled to real system state and constrained by deterministic backends, especially in operational and safety-critical domains.
What's next for VoiceOps-AI
Future enhancements include:
- Integration with real Google Cloud Monitoring and Logging APIs
- Multi-agent workflows for network, database, and security domains
- Multi-language voice support
- Role-based voice authentication
- Advanced remediation playbooks and approval workflows
VoiceOps AI is a step toward calmer, more human-centric system operations — where engineers talk to their infrastructure instead of fighting it.
Built With
- css3
- elevenlabs
- gcp
- gemini
- html5
- javascript
- node.js
- react
- typescript