RegGuardian — ARIA: Live AI Incident Compliance Agent


Inspiration

It's 3 AM. Your payment gateway is down. 73,000 customers can't transact. Engineers are in a war room, screaming service names and error codes at each other over a call. You fix it by 4 AM.

Now comes the other clock.

Under EU DORA Article 11.1(a), you have exactly 4 hours from incident classification to notify your competent authority. That means someone — usually the most senior compliance officer — has to reconstruct everything that happened from memory, Slack threads, and Grafana screenshots, and produce a structured regulatory report. By 5 AM. Still exhausted.

That report currently takes 4+ hours. We built ARIA to do it in under 8 minutes, live, while the incident is still happening.

What it does

ARIA (Automated Regulatory Incident Analyst) joins the incident call as a silent AI agent. It:

🎙️ Listens to the war room in real time via the Gemini Live API — hearing every service name (payment-gateway-v2), error code (503, POOL_EXHAUSTED), and impact figure (73,000 users, 7.3% failure rate)
🖥️ Watches engineers' screens every 5 seconds — reading Grafana dashboards, kubectl output, and alert panels via Gemini Vision
⏱️ Triggers a 4-hour DORA countdown clock the moment the threshold is crossed (>5% transaction failure rate)
📋 Builds the DORA Article 11 report live, section by section — Timeline, Blast Radius, Root Cause, Regulatory Obligations (with exact clause citations), Remediation, Executive Summary
🗣️ Switches persona based on who's in the room: Engineering mode (technical), Compliance mode (regulatory clauses), Executive mode (business impact, no jargon)

By the time the incident is resolved, the compliance report is already written.
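
The countdown logic above can be sketched in a few lines. This is a minimal illustration, not ARIA's actual code: the names `maybeTriggerClock`, `formatRemaining`, and `DoraClock` are hypothetical, and only the 5% threshold and the 4-hour window come from the description above.

```typescript
// Hypothetical names; only the threshold and the window come from the write-up.
const FAILURE_RATE_THRESHOLD = 0.05; // trigger when the failure rate exceeds 5%
const NOTIFICATION_WINDOW_MS = 4 * 60 * 60 * 1000; // 4-hour notification window

interface DoraClock {
  triggeredAt: Date;
  deadline: Date;
}

// Returns a running clock once the threshold is crossed, otherwise null.
function maybeTriggerClock(failureRate: number, now: Date): DoraClock | null {
  if (failureRate <= FAILURE_RATE_THRESHOLD) return null;
  return {
    triggeredAt: now,
    deadline: new Date(now.getTime() + NOTIFICATION_WINDOW_MS),
  };
}

// Formats the remaining time as H:MM:SS, clamped at zero after the deadline.
function formatRemaining(clock: DoraClock, now: Date): string {
  const remainingMs = clock.deadline.getTime() - now.getTime();
  const totalSeconds = Math.max(0, Math.floor(remainingMs / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => (n < 10 ? "0" : "") + String(n);
  return String(hours) + ":" + pad(minutes) + ":" + pad(seconds);
}
```

With a 7.3% failure rate detected at 03:00:00 UTC, the deadline lands at 07:00:00 UTC, and 27 seconds later the clock reads 3:59:33.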

How we built it

Hybrid two-model architecture on Gemini Live API:

The Gemini Live API on AI Studio only exposes bidiGenerateContent on native-audio models (gemini-2.5-flash-native-audio-latest). These support real-time bidirectional audio and produce inputTranscription of participant speech — but cannot emit structured JSON directly. We solved this with a two-model pipeline:

Gemini Live (gemini-2.5-flash-native-audio-latest, responseModalities: ['AUDIO']) — permanent bidirectional audio stream, fires inputTranscription per speech turn
Gemini Flash (gemini-2.5-flash, generateContent) — receives the transcript, returns a validated IncidentEvent JSON object
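
The hand-off between the two models hinges on rejecting malformed output before it reaches the rest of the pipeline. The project uses Zod for this; the sketch below swaps in a dependency-free type guard so it runs on its own. The `IncidentEvent` fields shown are assumptions (the write-up names the type but does not list its fields), and `parseIncidentEvent` is a hypothetical helper.

```typescript
// Hypothetical shape: the real IncidentEvent schema is not shown in the write-up.
interface IncidentEvent {
  service: string;
  errorCode: string;
  failureRate: number; // fraction between 0 and 1
  affectedUsers: number; // non-negative integer
}

// Hand-written stand-in for the Zod schema check.
function isIncidentEvent(value: unknown): value is IncidentEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const service = v["service"];
  const errorCode = v["errorCode"];
  const failureRate = v["failureRate"];
  const affectedUsers = v["affectedUsers"];
  return (
    typeof service === "string" && service.length > 0 &&
    typeof errorCode === "string" &&
    typeof failureRate === "number" && failureRate >= 0 && failureRate <= 1 &&
    typeof affectedUsers === "number" &&
    Number.isInteger(affectedUsers) && affectedUsers >= 0
  );
}

// Parses the reasoning model's raw text reply.
// Returns null instead of throwing so the caller can route bad output elsewhere.
function parseIncidentEvent(modelText: string): IncidentEvent | null {
  try {
    const parsed: unknown = JSON.parse(modelText.trim());
    return isIncidentEvent(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

Returning `null` rather than throwing keeps the decision about what to do with an invalid event (retry, drop, or dead-letter) with the caller.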

Once structured, each incident event flows through a three-agent ADK pipeline via Cloud Pub/Sub:

Gemini Live → inputTranscription → generateContent → IncidentEvent (Zod validated)
    ↓
Pub/Sub: incident-events
    ↓
[Analyst Agent]    → root cause, blast radius, severity
    ↓
[Compliance Agent] → DORA Art. 11.1(a/b/c), SOX Section 404 mapping
    ↓
[Reporter Agent]   → 6 report sections → Firestore → SSE → browser live
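
The last hop of the diagram, report sections reaching the browser over SSE, comes down to serialising each section in the Server-Sent Events wire format. The sketch below assumes a hypothetical `ReportSection` shape and a hypothetical event name, `report-section`; neither is taken from the project.

```typescript
// Hypothetical section shape; the six section names come from the write-up.
interface ReportSection {
  id: string; // e.g. "timeline"; must not contain newlines
  title: string;
  body: string;
}

// Serialises one section as a single SSE message.
// A message is a run of "field: value" lines terminated by a blank line.
function toSseMessage(section: ReportSection, eventName: string = "report-section"): string {
  // JSON.stringify escapes newlines inside strings, so one data line is enough.
  const payload = JSON.stringify(section);
  return "id: " + section.id + "\n" +
    "event: " + eventName + "\n" +
    "data: " + payload + "\n\n";
}
```

On the browser side, an `EventSource` listener registered for the same event name would receive the JSON payload in `event.data`.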

The frontend uses AudioWorklet to capture 16 kHz PCM audio, streams it over a WebSocket, and receives report sections via Server-Sent Events as they are generated, which creates the "report building before your eyes" effect.
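
Web Audio hands an AudioWorklet its samples as 32-bit floats in the range [-1, 1], so sending PCM over the WebSocket needs a conversion step along these lines. This is a sketch of the general technique, not the project's code: `floatTo16BitPcm` is a hypothetical name, and the samples are assumed to already be at 16 kHz (resampling is not shown).

```typescript
// Converts float samples in [-1, 1] to 16-bit signed little-endian PCM.
// Assumption: the input is already sampled at 16 kHz.
function floatTo16BitPcm(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  samples.forEach((sample, i) => {
    // Clamp first: out-of-range input would otherwise wrap around.
    const clamped = Math.max(-1, Math.min(1, sample));
    // The negative range is one step wider than the positive in two's complement.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  });
  return buffer;
}
```

The resulting `ArrayBuffer` can be passed directly to `WebSocket.send`.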

All infrastructure is Terraform-provisioned on GCP: Cloud Run (min-instances=1, CPU always-on for WebSocket longevity), Cloud Pub/Sub with 4 DLQ topics, Firestore, Cloud Build CI/CD, Artifact Registry, and Secret Manager for the API key.

Challenges we ran into

The Gemini Live API model problem was the hardest. We went through six failed attempts before the seventh produced a stable session:

| Attempt | Model | Config | Result |
|---|---|---|---|
| v1–v2 | `gemini-2.0-flash-live-001` | correct config | 1008: not found on AI Studio key |
| v3–v4 | various flash models | bidiGenerateContent | 1008: not available |
| v5 | native-audio | TEXT modality | 1007: "Cannot extract voices" |
| v6 | native-audio | AUDIO+TEXT + systemInstruction | 1007: "Invalid argument" |
| v7 | native-audio | AUDIO only | ✅ Session stable |

The breakthrough: gemini-2.5-flash-native-audio-latest is a voice-to-voice model. It rejects TEXT modality and rejects systemInstruction in the live config entirely. The correct pattern is to use it purely for transcription and call a second model for structured reasoning.

We also fought a reconnect storm bug — onclose was firing retryWithBackoff even after intentional stop() calls, hammering the API. Fixed with an isStopped flag checked in every reconnect path including the catch block.
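
The shape of that fix looks roughly like the sketch below. It is an illustration under assumptions, not the project's code: `LiveSessionManager`, `Connect`, and `Schedule` are hypothetical names, the delays are made up, and the connection and timer are injected so the guard logic can be exercised without a real socket.

```typescript
// Hypothetical types: connect opens a session and reports its close via onClose.
type Connect = (onClose: () => void) => Promise<void>;
type Schedule = (fn: () => void, delayMs: number) => void;

class LiveSessionManager {
  private isStopped = false;
  private attempt = 0;
  private readonly connect: Connect;
  private readonly schedule: Schedule;
  private readonly baseDelayMs = 500; // assumed value
  private readonly maxDelayMs = 30000; // assumed value

  constructor(connect: Connect, schedule?: Schedule) {
    this.connect = connect;
    this.schedule = schedule ?? ((fn, ms) => { setTimeout(fn, ms); });
  }

  async start(): Promise<void> {
    this.isStopped = false;
    await this.open();
  }

  // Intentional shutdown: every reconnect path checks this flag.
  stop(): void {
    this.isStopped = true;
  }

  private async open(): Promise<void> {
    if (this.isStopped) return; // a retry scheduled before stop() ends here
    try {
      await this.connect(() => this.retryWithBackoff());
      this.attempt = 0; // connected, so reset the backoff
    } catch {
      this.retryWithBackoff(); // the catch block goes through the same guard
    }
  }

  private retryWithBackoff(): void {
    if (this.isStopped) return; // onclose after stop() must not reconnect
    const delay = Math.min(this.maxDelayMs, this.baseDelayMs * Math.pow(2, this.attempt));
    this.attempt += 1;
    this.schedule(() => { void this.open(); }, delay);
  }
}
```

Routing both `onclose` and the `catch` block through one guarded `retryWithBackoff` means there is a single place where an intentional stop can be missed, rather than one per call site.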

Accomplishments that we're proud of

The Gemini Live session stays permanently open for the duration of an incident — no disconnects, no reconnect loops
The full pipeline — speech → transcript → JSON → Pub/Sub → 3 ADK agents → SSE → browser — completes in under 10 seconds from when someone finishes speaking
The DORA Article 11 report cites exact regulatory clause numbers (Art. 11.1(a), Art. 11.1(b), Art. 11.1(c)) and calculates the exact notification deadline timestamp
Persona switching is automatic — ARIA detects vocabulary context and adapts its voice without any user intervention
Everything is production-grade: Zod schema validation on every message, DLQ topics for failed events, circuit breaker middleware, Cloud Monitoring metrics, and full Cloud Logging structured output

What we learned

The Gemini Live API is fundamentally different from a text model with audio input. It's a voice-to-voice system designed for conversational agents. Treating it like a text model — adding TEXT modality, systemInstruction, structured JSON prompts — breaks the session within milliseconds.

The correct pattern for agentic pipelines: use the Live model purely for what it excels at (real-time transcription of natural speech), then hand off to a reasoning model for structured output. The two-model hybrid is not a workaround — it's the right architecture.

We also learned that WebSocket longevity on Cloud Run requires cpu-throttling: false and min-instances: 1 — without these, the instance idles and drops the long-lived connections.

What's next for RegGuardian — ARIA: Live AI Incident Compliance Agent

Auto-notification draft: ARIA generates the actual regulator notification email, ready to send with one click — within the 4-hour window
Multi-jurisdiction support: extend beyond DORA/SOX to MAS TRM (Singapore), FFIEC (US banking), and PSD2
Historical RAG: pull from past incident post-mortems to identify recurring patterns and suggest permanent fixes
SIEM integration: ingest PagerDuty, OpsGenie, and Datadog webhooks directly — no manual incident creation
Audit-ready export: one-click PDF export of the full DORA Article 11 report, cryptographically signed and timestamped for regulatory submission

Built With

artifact-registry
audioworklet
cloud-build
cloud-pub/sub
cloud-run
docker
firestore
gemini-2.5-flash
gemini-live-api
google-adk
node.js
secret-manager
server-sent-events
terraform
websocket
zod

Updates

Manoj Mallick started this project — Mar 16, 2026 08:00 PM EDT
