Chorus: The Immune System for AI Agents

Predict agent conflicts before they cascade.


🎯 Problem Statement

As AI agents increasingly operate in decentralized environments—autonomous trading bots, smart city infrastructure, robotic swarms—they create unpredictable feedback loops and cascading failures.

Consider this scenario: Agent A detects low inventory and orders supplies. Agent B, seeing the same signal, does the same. Agent C observes the sudden demand spike and raises prices. The system spirals into a deadlock—or worse, a market crash.

Current solutions fail because:

  • Traditional monitoring is reactive, not predictive
  • Centralized orchestrators become single points of failure
  • No existing system applies Game Theory to multi-agent conflict detection

The cost of inaction: Cascading failures in autonomous systems can cause millions in damages, safety incidents, and complete system collapse.


💡 Solution

Chorus is a real-time AI safety layer that acts as an "immune system" for multi-agent networks. It:

  • Observes agent interactions via high-throughput event streaming
  • Predicts conflicts using Game Theory analysis powered by Google Gemini
  • Intervenes automatically by quarantining risky agents before failures cascade
  • Alerts operators with voice notifications for critical incidents

Unlike traditional monitoring, Chorus is proactive—it predicts and prevents failures rather than just observing them.


🛠️ Services Used

Google Gemini 3 Pro ⭐ (Core Intelligence)

  • Role: Primary conflict prediction engine
  • Implementation: Direct API integration via google-generativeai SDK
  • How it works: Batched agent intentions are sent to Gemini for Game Theory analysis. The model calculates Nash Equilibria and detects non-cooperative behaviors (resource hoarding, deadlocks) in <50ms.
  • Key Feature: Generates quantitative risk scores (0-100) that drive automated quarantine decisions

Confluent Kafka ⭐ (Event Streaming Backbone)

  • Role: High-throughput message bus for agent communication
  • Implementation:
    • agent-messages-raw: Agents publish intentions
    • agent-decisions-processed: Backend publishes intervention decisions
    • system-alerts: Critical notifications
  • Throughput: 1,000+ messages/second
  • Why Confluent: Decouples high-velocity agent streams from analysis. Enables Event Sourcing for post-mortem failure analysis.

Datadog ⭐ (Observability & Trust Verification)

  • Role: Real-time monitoring and alerting
  • Implementation:
    • Custom metrics: agent.trust_score, system.conflict_risk, intervention.count
    • APM tracing for Conflict Prediction Engine latency
    • Live dashboards for swarm health visualization
  • Why Datadog: Provides a "trust verification layer"—proving to operators that the system is functioning correctly and enabling root-cause analysis.

ElevenLabs ⭐ (Voice-First Incident Response)

  • Role: Voice alerts for critical failures
  • Implementation: Converts structured alert JSON into natural language narrations using eleven_multilingual_v2 model
  • Voice ID: 21m00Tcm4TlvDq8ikWAM (Rachel)
  • Why ElevenLabs: Critical failures in autonomous systems require immediate attention. Voice alerts reduce operator reaction time by explaining exactly why an agent was quarantined.

🏗️ Architecture

Agent Network → Kafka Streaming → Gemini Analysis → Trust Scoring → Intervention → Voice Alerts
     ↓              ↓                 ↓               ↓              ↓            ↓
 Simulation    Event Sourcing    Risk Scoring    Redis Store    Quarantine    ElevenLabs

Data Flow:

  1. Agents publish actions to Confluent Kafka (agent-messages-raw)
  2. Backend batches intentions and sends to Gemini for Game Theory analysis
  3. Trust scores updated in Redis with sub-millisecond latency
  4. Metrics pushed to Datadog on every prediction cycle
  5. Critical alerts trigger ElevenLabs voice synthesis
  6. Real-time state pushed to React dashboard via WebSockets

💭 Inspiration

We were inspired by the human immune system—a decentralized network that detects and neutralizes threats without a central controller. As AI agent systems grow in complexity (autonomous vehicles, DeFi bots, industrial automation), we realized they need the same kind of self-regulating safety mechanism.

The question that drove us: "What happens when AI agents start working together—and against each other?"


📚 What We Learned

  • Game Theory is powerful for AI safety: Nash Equilibrium calculations can predict agent conflicts before they manifest. Gemini's reasoning capabilities made this tractable in real-time.
  • Event Sourcing is essential: Confluent Kafka's immutable log allows us to "replay" failures for post-mortem analysis—crucial for understanding emergent behaviors.
  • Voice alerts reduce cognitive load: In high-stress situations, operators respond faster to spoken explanations than dashboards full of metrics.
  • Trust must be dynamic: Static access control fails in multi-agent systems. Continuous trust scoring based on behavior is the only scalable approach.

🔨 How We Built It

Backend (Python/FastAPI):

  • Conflict Prediction Engine with Gemini 3 Pro integration
  • Trust Management System with Redis persistence
  • Intervention Engine with automated quarantine logic
  • WebSocket server for real-time dashboard updates

Frontend (React/TypeScript):

  • Real-time trust visualization with color-coded agent cards
  • Conflict alerts as toast notifications and dashboard panels
  • System health monitoring (Redis, Gemini, Kafka status)
  • Cyberpunk "Glassmorphism" aesthetic with neon accents

Infrastructure:

  • Dockerized deployment with single-command launch
  • Kubernetes-ready with Helm charts
  • Comprehensive test suite (260+ tests, 92.7% pass rate)
  • Property-based testing with Hypothesis for correctness invariants

🚧 Challenges We Faced

  • Sub-50ms Prediction Latency: Getting Gemini to return Game Theory analysis fast enough for real-time intervention required careful prompt engineering and request batching.
  • Trust Score Consistency: In a distributed system, maintaining consistent trust scores across components was challenging. We solved this with Redis as a single source of truth.
  • False Positive Quarantines: Early versions quarantined too aggressively. We tuned confidence thresholds and added manual override capabilities.
  • Voice Alert Timing: Generating voice alerts added latency. We made ElevenLabs calls asynchronous so they don't block critical intervention actions.

📊 Validation & Results

  • 90.9% system validation success rate
  • 260+ automated tests
  • <50ms conflict prediction latency
  • 1,000+ agents tested concurrently
  • 10,000+ events/second throughput

🔗 Links & Resources

  • Live Demo: ./run_frontend_demo.sh
  • Full Documentation: /docs/
  • Tech Stack: Python, FastAPI, React, TypeScript, Redis, Confluent Kafka, Google Gemini, Datadog, ElevenLabs

Chorus Team — December 2025

Built With

Share this project:

Updates