⚠️ Note for judges: Agent responses take 30-60 seconds due to the dual LLM pipeline (Vertex AI worker agent + Gemini 3.5 Flash ARGUS safety check running sequentially). This is expected behavior. Please allow up to 90 seconds per request when testing.

Inspiration

As AI agents proliferate across enterprises — handling medical triage, financial fraud detection, legal compliance, and customer support — a critical blind spot emerges: who watches the agents? They can be hijacked via prompt injection, communicate secretly with each other, or produce dangerous outputs, all without any human noticing. I built ARGUS to solve this.

What it does

ARGUS is a real-time AI safety monitoring system for multi-agent environments. It monitors 15 simulated AI agents across 5 industries — detecting prompt injection attacks, jailbreak attempts, rogue inter-agent communications, and dangerous outputs — all in real time.

What makes ARGUS unique: ARGUS is itself an AI agent built with Google ADK and Gemini 3.5 Flash. It reads its own observability data via Arize Phoenix MCP to continuously improve its threat detection rules — a closed-loop self-improving safety system.

Key capabilities:

  • Prompt Injection Detection — scans all agent inputs for injection patterns
  • Output Interception — screens all agent outputs before delivery
  • Rogue Communication Detection — validates inter-agent message flows
  • LLM-as-a-Judge Evaluation — quality scoring on every agent response
  • Self-Improvement Loop — ARGUS queries Phoenix traces and updates its own rules
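
As a rough illustration of the input-scanning capability, the sketch below matches whole injection phrases rather than single keywords. The pattern list, `ScanResult`, and `scan_input` are hypothetical names chosen for this example; the actual ARGUS rules are not shown in this write-up.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase-level patterns (assumption: not the real ARGUS ruleset).
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.I),
    re.compile(r"you\s+are\s+now\s+in\s+developer\s+mode", re.I),
]


@dataclass
class ScanResult:
    flagged: bool
    matched: list[str]


def scan_input(text: str) -> ScanResult:
    """Return which injection patterns, if any, appear in an agent input."""
    matched = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return ScanResult(flagged=bool(matched), matched=matched)
```

Because the patterns describe phrases, a query that merely contains the word "injection" in a medical sense would not match any of them. A production system would presumably pair a scan like this with semantic evaluation by the monitor model.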

How we built it

Agent Runtime: Google ADK (Agent Development Kit) powers all 16 agents including ARGUS itself.

LLM: Gemini 3.5 Flash (via Google AI Studio) for the ARGUS monitor; Gemini 2.5 Flash Lite (via Vertex AI) for the 15 worker agents.

Observability: Arize Phoenix Cloud with OpenInference instrumentation. Every agent span, tool call, and LLM response is traced with OpenTelemetry.

Self-Improvement: ARGUS uses the Phoenix MCP server to query its own traces, detect threat patterns, synthesize new detection rules, and apply them autonomously. After each event rated HIGH or above, it bumps the ruleset version (for example, from v2.0.4 to v2.0.5).
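
The last step of that loop, applying a rule and bumping the version, could be modeled roughly as follows. This is a minimal sketch: `Severity`, `Ruleset`, and `apply_event` are hypothetical names, and the severity scale (including a CRITICAL level above HIGH) is an assumption, since the write-up only mentions "HIGH+".

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    # Assumed scale; only "HIGH+" is named in the write-up.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Ruleset:
    version: tuple[int, int, int] = (2, 0, 4)
    rules: list[str] = field(default_factory=list)

    def version_string(self) -> str:
        return "v" + ".".join(str(part) for part in self.version)


def apply_event(ruleset: Ruleset, severity: Severity, new_rule: str) -> bool:
    """Add new_rule and bump the patch version, but only for HIGH or above."""
    if severity < Severity.HIGH:
        return False
    if new_rule in ruleset.rules:
        return False  # rule already known, so no version bump
    ruleset.rules.append(new_rule)
    major, minor, patch = ruleset.version
    ruleset.version = (major, minor, patch + 1)
    return True
```

Skipping duplicates keeps the version number tied to actual changes in the ruleset rather than to the raw count of events.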

Backend: FastAPI serving the 15 worker agents plus the ARGUS endpoints, deployed on Google Cloud Run.

Frontend: Next.js dashboard with real-time threat monitoring, agent console, Phoenix trace viewer, and threat center.

Challenges we ran into

  • Dual API architecture: Worker agents use Vertex AI while ARGUS uses Google AI Studio for Gemini 3.5 Flash. Managing environment variable switching between two different auth systems required an asyncio lock to prevent race conditions in concurrent requests.

  • False positive prevention: Early versions of ARGUS flagged legitimate medical queries (drug interaction checks) as threats because they contained words like "injection." Tuning the threat taxonomy required careful context-aware pattern matching and semantic evaluation.

  • OpenInference instrumentation: Getting the GoogleADKInstrumentor to correctly populate input/output fields in Phoenix required upgrading the instrumentor from v0.1.3 to v0.1.15 to match the ADK 2.1.0 API.

  • Self-improvement loop latency: Each ARGUS analysis takes 20-45 seconds due to multi-tool chaining. I implemented asyncio timeouts and graceful fallbacks to keep the system responsive.
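
The lock and timeout ideas from the first and last challenges can be combined in one helper. The sketch below is an assumption-laden illustration, not the project's code: `call_with_backend` is a hypothetical name, and `GOOGLE_GENAI_USE_VERTEXAI` is assumed to be the variable being switched, since the write-up does not name it.

```python
import asyncio
import os
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Assumed switch between Vertex AI and AI Studio; not named in the write-up.
ENV_VAR = "GOOGLE_GENAI_USE_VERTEXAI"


async def call_with_backend(
    lock: asyncio.Lock,
    use_vertex: bool,
    call: Callable[[], Awaitable[T]],
    timeout_s: float,
    fallback: T,
) -> T:
    """Run call() with the env var set, under a lock, with a timeout fallback."""
    async with lock:  # only one request may change the process environment at a time
        previous = os.environ.get(ENV_VAR)
        os.environ[ENV_VAR] = "TRUE" if use_vertex else "FALSE"
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return fallback  # graceful degradation instead of a hung request
        finally:
            # Restore the previous value so later calls see a clean environment.
            if previous is None:
                os.environ.pop(ENV_VAR, None)
            else:
                os.environ[ENV_VAR] = previous
```

Holding the lock for the full call serializes requests, which is one plausible contributor to the latency noted above. Configuring each client explicitly instead of through process-wide environment variables would avoid the lock, if the SDK in use allows it.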

Accomplishments that we're proud of

  • Built a fully working self-improving AI safety system in under 3 weeks, solo
  • Real traces flowing into Arize Phoenix, with input/output fields populated and green status in the Phoenix dashboard
  • ARGUS successfully detects and blocks prompt injection, jailbreak attempts, and system prompt extraction in real time
  • The self-improvement loop actually works — ARGUS reads its Phoenix traces and generates new detection rules autonomously
  • All 15 worker agents across 5 industries properly instrumented and monitored

What we learned

  • Google ADK is incredibly powerful for building multi-agent systems — the Runner and InMemorySessionService make agent orchestration straightforward
  • Arize Phoenix MCP is a game-changer for AI observability — being able to query traces programmatically enables genuinely autonomous self-improvement
  • OpenInference instrumentation is the bridge between ADK and Phoenix — getting it right unlocks the full observability pipeline
  • AI safety is not just about filters — it requires contextual judgment, proportional response, and continuous learning from real-world data

What's next for ARGUS — AI Agent Monitoring & Threat Intelligence

  • Real-time WebSocket updates — live threat feed without polling
  • Multi-tenant support — monitor multiple organizations' agent fleets
  • Threat correlation engine — detect coordinated attacks across agents
  • Automated remediation — ARGUS autonomously quarantines compromised agents
  • Integration marketplace — connect to any agent framework (LangChain, CrewAI, AutoGen) not just Google ADK

Built With

  • arize-phoenix
  • fastapi
  • gemini-2.5-flash-lite
  • gemini-3.5-flash
  • google-adk
  • google-cloud-run
  • next.js
  • openinference
  • opentelemetry
  • phoenix-mcp
  • python
  • typescript
  • vertex-ai