Screenshots:
- Landing page: ARIA IDLE, 3-column layout, empty state
- ARIA LISTENING · P1 · ⏱ DORA 3:59:33 · Executive Mode close-up
- Hero shot: ARIA LISTENING + P1 + DORA 3:57:37 + Executive Mode + TRIGGERED
- Incident created: ARIA ACTIVE, INC-ID, first transcript card
- ARIA LISTENING + AUDIO LIVE + all 6 sections + transcript
- Report building: Timeline + Blast Radius + Root Cause appearing
Inspiration
It's 3 AM. Your payment gateway is down. 73,000 customers can't transact. Engineers are in a war room, screaming service names and error codes at each other over a call. You fix it by 4 AM.
Now comes the other clock.
Under EU DORA Article 11.1(a), you have exactly 4 hours from incident classification to notify your competent authority. That means someone — usually the most senior compliance officer — has to reconstruct everything that happened from memory, Slack threads, and Grafana screenshots, and produce a structured regulatory report. By 5 AM. Still exhausted.
That report currently takes 4+ hours. We built ARIA to do it in under 8 minutes, live, while the incident is still happening.
What it does
ARIA (Automated Regulatory Incident Analyst) joins the incident call as a silent AI agent. It:
- 🎙️ Listens to the war room in real time via the Gemini Live API, hearing every service name (payment-gateway-v2), error code (503, POOL_EXHAUSTED), and impact figure (73,000 users, 7.3% failure rate)
- 🖥️ Watches engineers' screens every 5 seconds, reading Grafana dashboards, kubectl output, and alert panels via Gemini Vision
- ⏱️ Triggers a 4-hour DORA countdown clock the moment the threshold is crossed (>5% transaction failure rate)
- 📋 Builds the DORA Article 11 report live, section by section — Timeline, Blast Radius, Root Cause, Regulatory Obligations (with exact clause citations), Remediation, Executive Summary
- 🗣️ Switches persona based on who's in the room: Engineering mode (technical), Compliance mode (regulatory clauses), Executive mode (business impact, no jargon)
By the time the incident is resolved, the compliance report is already written.
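The countdown trigger described in the list above can be sketched roughly as follows. This is a minimal illustration, not ARIA's actual code: it assumes the failure rate arrives as a fraction between 0 and 1, and the names `maybeStartClock`, `formatRemaining`, and `DoraClock` are hypothetical. Only the >5% threshold and the 4-hour window come from the description.

```typescript
// Values taken from the write-up: >5% failure rate starts a 4-hour clock.
const FAILURE_RATE_THRESHOLD = 0.05;
const NOTIFICATION_WINDOW_MS = 4 * 60 * 60 * 1000;

// Hypothetical shape for the running countdown.
interface DoraClock {
  classifiedAt: Date;
  deadline: Date;
}

// Starts the clock only when the threshold is strictly exceeded.
function maybeStartClock(failureRate: number, now: Date): DoraClock | null {
  if (failureRate <= FAILURE_RATE_THRESHOLD) return null;
  return {
    classifiedAt: now,
    deadline: new Date(now.getTime() + NOTIFICATION_WINDOW_MS),
  };
}

// Formats the time left as H:MM:SS, clamped at zero once the deadline passes.
function formatRemaining(clock: DoraClock, now: Date): string {
  const remainingMs = Math.max(0, clock.deadline.getTime() - now.getTime());
  const totalSeconds = Math.floor(remainingMs / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `${hours}:${pad(minutes)}:${pad(seconds)}`;
}
```

Keeping the deadline as an absolute timestamp, rather than a decrementing counter, means the displayed value stays correct across page reloads and reconnects.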
How we built it
Hybrid two-model architecture on Gemini Live API:
The Gemini Live API on AI Studio only exposes bidiGenerateContent on native-audio models (gemini-2.5-flash-native-audio-latest). These support real-time bidirectional audio and produce inputTranscription of participant speech — but cannot emit structured JSON directly. We solved this with a two-model pipeline:
- Gemini Live (gemini-2.5-flash-native-audio-latest, responseModalities: ['AUDIO']): permanent bidirectional audio stream, fires inputTranscription per speech turn
- Gemini Flash (gemini-2.5-flash, generateContent): receives the transcript, returns a validated IncidentEvent JSON object
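To illustrate the validation step on the second model's output, here is a dependency-free sketch. The project itself validates with Zod; this example swaps in a hand-written type guard so it runs without any packages. The `IncidentEvent` fields shown (`service`, `errorCodes`, `affectedUsers`, `failureRate`) are assumptions based on the kinds of facts ARIA is said to extract, not the real schema.

```typescript
// Hypothetical event shape; the real IncidentEvent schema is not shown in the write-up.
interface IncidentEvent {
  service: string;
  errorCodes: string[];
  affectedUsers?: number; // non-negative integer when present
  failureRate?: number;   // fraction in [0, 1] when present
}

// Parses the model's raw text and returns a validated event, or null on any violation.
function parseIncidentEvent(raw: string): IncidentEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;

  const service = record["service"];
  const errorCodes = record["errorCodes"];
  const affectedUsers = record["affectedUsers"];
  const failureRate = record["failureRate"];

  if (typeof service !== "string" || service.length === 0) return null;
  if (!Array.isArray(errorCodes)) return null;
  if (!errorCodes.every((code) => typeof code === "string")) return null;

  const event: IncidentEvent = { service, errorCodes: errorCodes as string[] };

  if (affectedUsers !== undefined) {
    if (typeof affectedUsers !== "number") return null;
    if (!Number.isInteger(affectedUsers) || affectedUsers < 0) return null;
    event.affectedUsers = affectedUsers;
  }
  if (failureRate !== undefined) {
    if (typeof failureRate !== "number") return null;
    if (!(failureRate >= 0 && failureRate <= 1)) return null;
    event.failureRate = failureRate;
  }
  return event;
}
```

Returning `null` instead of throwing lets the caller route a rejected payload to a dead-letter path without wrapping every call in a try block.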
Once structured, each incident event flows through a 5-agent ADK pipeline via Cloud Pub/Sub:
Gemini Live → inputTranscription → generateContent → IncidentEvent (Zod validated)
↓
Pub/Sub: incident-events
↓
[Analyst Agent] → root cause, blast radius, severity
↓
[Compliance Agent] → DORA Art. 11.1(a/b/c), SOX Section 404 mapping
↓
[Reporter Agent] → 6 report sections → Firestore → SSE → browser live
The frontend uses AudioWorklet to capture PCM 16kHz audio, streams it via WebSocket, and receives report sections via Server-Sent Events as they're generated — creating the "report building before your eyes" effect.
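The capture step above needs a conversion from the Float32 samples that Web Audio produces into 16-bit PCM before they go over the WebSocket. A minimal sketch of that conversion follows, assuming the audio is already at 16 kHz (for example because the `AudioContext` was created with that sample rate). The function name `floatTo16BitPCM` is illustrative, not taken from the project.

```typescript
// Converts Web Audio samples (nominally in [-1, 1]) to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around in the Int16Array.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative and positive halves scale differently: int16 spans -32768..32767.
    out[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  });
  return out;
}
```

The resulting `Int16Array`'s underlying `buffer` can then be sent as a binary WebSocket frame.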
All infrastructure is Terraform-provisioned on GCP: Cloud Run (min-instances=1, CPU always-on for WebSocket longevity), Cloud Pub/Sub with 4 DLQ topics, Firestore, Cloud Build CI/CD, Artifact Registry, and Secret Manager for the API key.
Challenges we ran into
The Gemini Live API model problem was the hardest. We went through 8 failed architectures:
| Attempt | Model | Config | Result |
|---|---|---|---|
| v1–v2 | gemini-2.0-flash-live-001 | correct config | 1008: not found on AI Studio key |
| v3–v4 | various flash models | bidiGenerateContent | 1008: not available |
| v5 | native-audio | TEXT modality | 1007: "Cannot extract voices" |
| v6 | native-audio | AUDIO+TEXT + systemInstruction | 1007: "Invalid argument" |
| v7 | native-audio | AUDIO only | ✅ Session stable |
The breakthrough: gemini-2.5-flash-native-audio-latest is a voice-to-voice model. It rejects TEXT modality and rejects systemInstruction in the live config entirely. The correct pattern is to use it purely for transcription and call a second model for structured reasoning.
We also fought a reconnect storm bug — onclose was firing retryWithBackoff even after intentional stop() calls, hammering the API. Fixed with an isStopped flag checked in every reconnect path including the catch block.
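The fix can be sketched as a small policy object that every reconnect path consults before scheduling a retry. This is an assumption-laden illustration rather than the project's code: `ReconnectPolicy`, `handleClose`, and the delay values are hypothetical, and only the `isStopped` flag idea comes from the description above.

```typescript
// Hypothetical helper: decides whether, and after how long, to reconnect.
class ReconnectPolicy {
  private isStopped = false;
  private attempt = 0;
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;

  constructor(baseDelayMs = 500, maxDelayMs = 30_000) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  // Called from an intentional stop(); later close or error events must not retry.
  stop(): void {
    this.isStopped = true;
  }

  // Called after a successful connect so the next failure starts from the base delay.
  reset(): void {
    this.attempt = 0;
  }

  // Returns the exponential backoff delay, or null when no retry should happen.
  nextDelayMs(): number | null {
    if (this.isStopped) return null;
    const delay = Math.min(this.baseDelayMs * 2 ** this.attempt, this.maxDelayMs);
    this.attempt += 1;
    return delay;
  }
}

// Shared by the onclose handler and the catch block, so neither can bypass the flag.
function handleClose(
  policy: ReconnectPolicy,
  reconnect: () => void,
  schedule: (fn: () => void, ms: number) => void,
): boolean {
  const delay = policy.nextDelayMs();
  if (delay === null) return false; // intentional stop: do nothing
  schedule(reconnect, delay);
  return true;
}
```

In a browser or Node process, `schedule` would typically be `(fn, ms) => setTimeout(fn, ms)`. Injecting it keeps the policy testable without real timers.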
Accomplishments that we're proud of
- The Gemini Live session stays permanently open for the duration of an incident — no disconnects, no reconnect loops
- The full pipeline — speech → transcript → JSON → Pub/Sub → 3 ADK agents → SSE → browser — completes in under 10 seconds from when someone finishes speaking
- The DORA Article 11 report cites exact regulatory clause numbers (Art. 11.1(a), Art. 11.1(b), Art. 11.1(c)) and calculates the exact notification deadline timestamp
- Persona switching is automatic — ARIA detects vocabulary context and adapts its voice without any user intervention
- Everything is production-grade: Zod schema validation on every message, DLQ topics for failed events, circuit breaker middleware, Cloud Monitoring metrics, and full Cloud Logging structured output
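One simple way the vocabulary-based persona switching mentioned in the list above could work is keyword scoring. The sketch below is purely illustrative: the keyword lists, the `detectPersona` name, and the scoring rule are assumptions, and the real system may well rely on the model itself rather than a fixed vocabulary.

```typescript
type Persona = "engineering" | "compliance" | "executive";

// Hypothetical vocabularies; a real deployment would tune or replace these.
const PERSONA_KEYWORDS: Record<Persona, string[]> = {
  engineering: ["kubectl", "pod", "latency", "rollback", "connection pool", "503"],
  compliance: ["dora", "article", "regulator", "notification", "sox", "clause"],
  executive: ["revenue", "customers", "reputation", "board", "business impact"],
};

// Picks the persona whose vocabulary matches the utterance most often.
// Falls back to the given persona when nothing matches; ties keep the earlier winner.
function detectPersona(utterance: string, fallback: Persona = "engineering"): Persona {
  const text = utterance.toLowerCase();
  let best: Persona = fallback;
  let bestScore = 0;
  for (const persona of Object.keys(PERSONA_KEYWORDS) as Persona[]) {
    const score = PERSONA_KEYWORDS[persona].filter((kw) => text.includes(kw)).length;
    if (score > bestScore) {
      best = persona;
      bestScore = score;
    }
  }
  return best;
}
```

Substring matching is crude (for example, "pod" also matches "podium"), so this is best read as a starting point rather than a robust classifier.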
What we learned
The Gemini Live API is fundamentally different from a text model with audio input. It's a voice-to-voice system designed for conversational agents. Treating it like a text model — adding TEXT modality, systemInstruction, structured JSON prompts — breaks the session within milliseconds.
The correct pattern for agentic pipelines: use the Live model purely for what it excels at (real-time transcription of natural speech), then hand off to a reasoning model for structured output. The two-model hybrid is not a workaround — it's the right architecture.
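The hand-off between the two models can be sketched as a small buffer that collects transcription fragments and passes the full utterance on once a turn ends. The message shape below (`serverContent.inputTranscription.text` and `serverContent.turnComplete`) is an assumption about the subset of Live API server messages relevant here, and using `turnComplete` as the flush boundary is a design choice of this sketch, not something the write-up confirms. `TranscriptBuffer` and `onTurn` are hypothetical names.

```typescript
// Assumed subset of a Live API server message; real messages carry more fields.
interface LiveServerMessage {
  serverContent?: {
    inputTranscription?: { text?: string };
    turnComplete?: boolean;
  };
}

// Collects transcription fragments and emits one transcript per completed turn.
class TranscriptBuffer {
  private parts: string[] = [];
  private readonly onTurn: (transcript: string) => void;

  constructor(onTurn: (transcript: string) => void) {
    this.onTurn = onTurn;
  }

  handle(message: LiveServerMessage): void {
    const content = message.serverContent;
    if (!content) return;

    const text = content.inputTranscription?.text;
    if (text) this.parts.push(text);

    if (content.turnComplete && this.parts.length > 0) {
      // Assumes fragments carry their own spacing, so they are joined as-is.
      const transcript = this.parts.join("").trim();
      this.parts = [];
      if (transcript) this.onTurn(transcript);
    }
  }
}
```

In this arrangement `onTurn` is where the second model would be called, for example by sending the transcript to a `generateContent` request and validating the JSON it returns.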
We also learned that WebSocket longevity on Cloud Run requires cpu-throttling: false and min-instances: 1 — without these, the instance idles and drops the long-lived connections.
What's next for RegGuardian — ARIA: Live AI Incident Compliance Agent
- Auto-notification draft: ARIA generates the actual regulator notification email, ready to send with one click — within the 4-hour window
- Multi-jurisdiction support: extend beyond DORA/SOX to MAS TRM (Singapore), FFIEC (US banking), and PSD2
- Historical RAG: pull from past incident post-mortems to identify recurring patterns and suggest permanent fixes
- SIEM integration: ingest PagerDuty, OpsGenie, and Datadog webhooks directly — no manual incident creation
- Audit-ready export: one-click PDF export of the full DORA Article 11 report, cryptographically signed and timestamped for regulatory submission
