AI agents are being deployed in production at an accelerating rate to trade stocks, manage infrastructure, and interact with customers. But there's a critical gap: nobody is governing them in real time. An agent stuck in a loop can drain thousands of dollars in API tokens. An agent that hallucinates can execute unauthorized trades. An agent generating manipulative language can expose companies to legal liability.

Existing tools like LangSmith and Arize are observability platforms: they watch and log. They'll show you that your agent made a bad trade after the fact. We wanted to build something that prevents the bad trade from happening in the first place. Argus is the difference between a dashcam and an airbag.

What it does

Argus is an Autonomous Governance PaaS — middleware that sits between AI agents and the real world. Every action an agent takes flows through a four-stage, fail-closed pipeline:

  1. Loop Detection — Neo4j graph pattern matching detects when an agent repeats the same action 3+ times
  2. Entity Extraction — Fastino GLiNER pulls structured data (price, ticker, quantity) from unstructured agent logs
  3. Policy Check — Senso semantic search validates actions against organizational policies written in plain English
  4. Safety Check — Detects toxic language, manipulation, and coercive patterns in agent outputs

If any check fails, the agent is immediately halted via a webhook circuit breaker — not just logged, but actively stopped. The system includes a real-time React dashboard with WebSocket updates, dynamic policy management, cross-agent causal chain tracking, a warning tier system, and full compliance audit reporting.
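
To make "fail-closed" concrete, here is a minimal sketch of how a staged pipeline like this could be wired. It is not the Argus source: the names `Decision`, `run_pipeline`, `no_loop`, and `within_limit` are hypothetical, and the two toy checks stand in for the real stages. The point it illustrates is that a check that fails or crashes produces HALT, never a silent pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Decision:
    verdict: str  # "ALLOW" or "HALT"
    stage: str    # stage that produced a HALT, or "" when every stage passed
    reason: str


# A check receives the agent action and returns (ok, reason).
Check = Callable[[dict], Tuple[bool, str]]


def run_pipeline(action: dict, stages: List[Tuple[str, Check]]) -> Decision:
    """Run the stages in order and stop at the first failure."""
    for name, check in stages:
        try:
            ok, reason = check(action)
        except Exception as exc:
            # Fail closed: a check that errors is treated as a failed check.
            return Decision("HALT", name, f"check errored: {exc!r}")
        if not ok:
            return Decision("HALT", name, reason)
    return Decision("ALLOW", "", "all checks passed")


# Hypothetical stand-ins for two of the four stages.
def no_loop(action: dict) -> Tuple[bool, str]:
    return action.get("repeat_count", 0) < 3, "action repeated 3+ times"


def within_limit(action: dict) -> Tuple[bool, str]:
    return action["quantity"] <= 100, "quantity exceeds policy limit"


STAGES = [("loop_detection", no_loop), ("policy_check", within_limit)]
```

Note that an action missing its `quantity` field makes `within_limit` raise, and the pipeline answers HALT rather than letting the action through.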

How we built it

  • Backend: FastAPI (async, low-latency) with a fail-closed governance pipeline
  • Graph Database: Neo4j for reasoning chains, loop detection via Cypher queries, and cross-agent INFLUENCES edges
  • Entity Extraction: Fastino GLiNER API with regex fallback for graceful degradation
  • Policy Engine: Senso semantic search API with a local mock policy store as fallback
  • Safety: Keyword-based text safety filter (with Modulate ToxMod stubbed for voice agents)
  • Dashboard: React + Vite + Tailwind CSS + Recharts with real-time WebSocket updates
  • Demo Agents: Python agents using Tavily for real web search, demonstrating 8 scenarios across all violation types
  • Active Circuit Breaker: Webhook registry that fires HTTP callbacks to agent control planes on HALT decisions
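
As an illustration of the active circuit breaker idea, the sketch below shows one way a webhook registry could push a HALT to an agent's control plane. The class name, payload fields, and timeout are assumptions made for this example, not the actual Argus interface; only the standard library is used.

```python
import json
import urllib.request
from typing import Callable, Dict


def post_json(url: str, payload: dict) -> None:
    """Send the payload as an HTTP POST with a JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2) as response:
        response.read()


class WebhookRegistry:
    """Maps agent IDs to callback URLs (hypothetical structure)."""

    def __init__(self, send: Callable[[str, dict], None] = post_json) -> None:
        self._urls: Dict[str, str] = {}
        self._send = send  # injectable so the registry can be tested offline

    def register(self, agent_id: str, url: str) -> None:
        self._urls[agent_id] = url

    def halt(self, agent_id: str, reason: str) -> bool:
        """Fire the HALT callback; return True only if it was delivered."""
        url = self._urls.get(agent_id)
        if url is None:
            return False
        payload = {"agent_id": agent_id, "decision": "HALT", "reason": reason}
        try:
            self._send(url, payload)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            return False
        return True
```

Returning a delivery flag lets the caller escalate (for example, retry or alert an operator) when the agent's control plane cannot be reached.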

Every layer was designed with graceful degradation: if Fastino is down, regex takes over; if Senso is down, local policies activate; if Neo4j is down, an in-memory fallback kicks in. No single dependency outage takes the governance layer offline.
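
The fallback pattern can be sketched roughly as follows for the entity extraction layer. The `primary` callable stands in for the remote extraction call (its real signature is not shown here), and the regular expressions are deliberately crude assumptions that only handle phrases like "100 shares of AAPL at $187.50".

```python
import re
from typing import Callable, Dict, Optional

# Crude illustrative patterns, not the project's actual fallback rules.
QUANTITY_RE = re.compile(r"\b(\d+)\s+shares\b")
TICKER_RE = re.compile(r"\bshares of ([A-Z]{1,5})\b")
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)")


def _first_group(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    match = pattern.search(text)
    return match.group(1) if match else None


def regex_extract(text: str) -> Dict[str, Optional[str]]:
    """Local fallback: pull ticker, price, and quantity with regexes."""
    return {
        "ticker": _first_group(TICKER_RE, text),
        "price": _first_group(PRICE_RE, text),
        "quantity": _first_group(QUANTITY_RE, text),
        "source": "regex",
    }


def extract_entities(
    text: str, primary: Callable[[str], Dict[str, Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Try the primary extractor; degrade to regexes if it raises."""
    try:
        result = dict(primary(text))
        result["source"] = "primary"
        return result
    except Exception:
        return regex_extract(text)
```

Tagging each result with its `source` keeps degraded answers visible downstream, so an audit log can show which decisions were made on fallback data.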

Challenges we ran into

Loop detection ordering was the trickiest problem. We needed to check for loops before logging the current step to Neo4j — otherwise the query would compare the step against itself and produce false positives. Getting the pipeline ordering right (check → decide → log) required careful thought about when state mutations happen.
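
The ordering can be shown with a small in-memory stand-in for the graph store. This is a sketch under assumptions: `StepStore` and `govern_step` are hypothetical names, and it assumes the halt fires on the third occurrence of the same action. What matters is that the count is taken before the current step is written.

```python
from typing import Dict, List

LOOP_THRESHOLD = 3  # assumed: halt on the third occurrence of an action


class StepStore:
    """In-memory stand-in for the step log (not the Neo4j schema)."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}

    def count_prior(self, agent_id: str, action: str) -> int:
        return self._history.get(agent_id, []).count(action)

    def log(self, agent_id: str, action: str) -> None:
        self._history.setdefault(agent_id, []).append(action)


def govern_step(store: StepStore, agent_id: str, action: str) -> str:
    # 1. Check: count only steps logged before this one, so the current
    #    step is never compared against itself.
    prior = store.count_prior(agent_id, action)
    # 2. Decide: the current step would be occurrence number prior + 1.
    verdict = "HALT" if prior + 1 >= LOOP_THRESHOLD else "ALLOW"
    # 3. Log: mutate state only after the decision has been made.
    store.log(agent_id, action)
    return verdict
```

If the log call were moved above the check, every count would be inflated by one and the halt would fire a step early.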

Cross-agent causal chains required designing a protocol where one agent includes the parent_step_id from a completely different agent's execution. Building the INFLUENCES edges in Neo4j so that causal chains like ResearchAgent → TradeAgent → RiskAgent are fully traceable took multiple iterations.
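
A rough sketch of the tracing idea, using a plain dictionary instead of Neo4j: each step records the `parent_step_id` it was influenced by, and walking those links backwards recovers the chain. The `Step` dataclass and `trace_chain` function are illustrative assumptions, not the project's data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    step_id: str
    agent: str
    parent_step_id: Optional[str] = None  # may point at another agent's step


def trace_chain(steps: Dict[str, Step], step_id: str) -> List[str]:
    """Follow parent links from step_id back to the root, oldest agent first."""
    chain: List[str] = []
    seen = set()  # guards against accidental cycles in the parent links
    current = steps.get(step_id)
    while current is not None and current.step_id not in seen:
        seen.add(current.step_id)
        chain.append(current.agent)
        parent_id = current.parent_step_id
        current = steps.get(parent_id) if parent_id else None
    chain.reverse()
    return chain


steps = {
    "r1": Step("r1", "ResearchAgent"),
    "t1": Step("t1", "TradeAgent", parent_step_id="r1"),
    "k1": Step("k1", "RiskAgent", parent_step_id="t1"),
}
```

Calling `trace_chain(steps, "k1")` walks from the RiskAgent step back through TradeAgent to ResearchAgent, which is the traversal an INFLUENCES edge makes possible in the graph.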

Senso API compliance parsing was another challenge — the API returns natural language answers about policy compliance, and we had to build robust parsing that correctly distinguishes "compliant" from "non-compliant" without false matches (e.g., the word "compliant" appearing inside "non-compliant").
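
One way to avoid the substring trap is to test for negated forms first and only then look for a standalone "compliant". The sketch below works on a plain answer string; the patterns and the function name `parse_compliance` are assumptions for illustration, and anything ambiguous is treated as non-compliant to stay fail-closed.

```python
import re

# Negated forms are checked first, because "compliant" also appears
# inside "non-compliant" and "not compliant".
NEGATIVE_RE = re.compile(
    r"\b(?:non[-\s]?compliant|not\s+(?:\w+\s+)?compliant)\b",
    re.IGNORECASE,
)
POSITIVE_RE = re.compile(r"\bcompliant\b", re.IGNORECASE)


def parse_compliance(answer: str) -> bool:
    """Return True only when the answer clearly states compliance."""
    if NEGATIVE_RE.search(answer):
        return False
    if POSITIVE_RE.search(answer):
        return True
    # Fail closed: an answer with no clear statement counts as non-compliant.
    return False
```

A naive `"compliant" in answer` check would return True for "non-compliant", which is exactly the false match described above.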

What we learned

  • Fail-closed beats fail-open — in governance systems, the safe default is always HALT. A false positive costs a delay; a false negative costs real money.
  • Graph databases are the right abstraction for agent behavior — reasoning chains, loop detection, and causal tracing are all native graph problems.
  • Graceful degradation is non-negotiable — in production middleware, you can't have a single external API failure take down your entire governance layer.
  • Active governance > passive observability — the webhook circuit breaker that pushes halt signals to agents is fundamentally more valuable than a dashboard you have to watch.

What's next

  • Claude Judge Agent — replace pattern matching with Claude reasoning about whether an agent's action is truly safe
  • Modulate ToxMod — real-time voice agent safety monitoring
  • Policy-as-Code SDK — define governance policies in Python/YAML with version control and rollback
