Inspiration
AI agents are getting more capable, but they're also becoming more expensive and unpredictable.
While building and experimenting with agentic systems, we kept running into the same problem: when an agent gets stuck, it doesn't always fail obviously. Instead, it enters a loop of slightly different tool calls, retries, and reasoning paths while continuing to consume tokens and API credits.
The frustrating part is that most observability tools only tell you what went wrong after the run is already over. By then, the cost has already been incurred and the failure has already happened.
We wanted something that could actively watch an agent while it was running and step in before things spiraled out of control.
That idea became Vigil.
What it does
Vigil is a transparent LLM proxy that sits between an AI agent and its model provider.
Integrating it requires only a single change: point your existing OpenAI or Anthropic client to Vigil's local endpoint.
Once connected, Vigil:
- Captures every agent step in real time
- Detects semantic loops using embedding similarity rather than exact string matching
- Tracks trajectory health using entropy and behavioral signals
- Automatically triggers a graduated circuit breaker when an agent appears stuck
- Downgrades expensive models when appropriate
- Restricts write operations during recovery attempts
- Halts runaway executions before costs explode
- Provides live dashboards, replay, and forensic debugging tools
Unlike traditional observability platforms, Vigil is designed to intervene, not just observe.
How we built it
Vigil consists of several major components:
Real-Time Proxy Layer
A FastAPI-based proxy supports OpenAI-compatible and Anthropic-compatible APIs. Existing agents connect through a single base URL change.
Semantic Watchdog
We compute embeddings for agent actions and compare them against recent history to detect semantic repetition. This allows Vigil to catch loops even when the wording changes between steps.
Circuit Breaker
Inspired by distributed systems reliability patterns, Vigil uses a graduated response model:
- Normal operation
- Recovery mode
- Restricted mode
- Full halt
Instead of immediately killing an agent, Vigil attempts recovery first.
Effort Governor
Not every step needs a frontier model. Vigil routes simpler work to smaller models and reserves expensive models for harder tasks.
Context Compression
We reduce redundant context accumulation and eliminate repeated information that unnecessarily inflates token usage.
Replay & Forensics
Every trajectory can be replayed and inspected, making debugging significantly easier than digging through logs.
Live Dashboard
A React dashboard streams agent activity in real time, including costs, token usage, breaker state, similarity scores, and intervention events.
Challenges we ran into
One of the hardest challenges was distinguishing genuine progress from repetition.
Many agents naturally revisit the same concepts while solving a problem, so naive duplicate detection produces too many false positives. We had to combine multiple signals—including semantic similarity, entropy-based behavioral analysis, and state-change awareness—to build a detector that is useful in practice.
Another challenge was ensuring that analysis never slowed down the agent itself. Vigil performs monitoring asynchronously so that observability does not become a bottleneck.
Finally, we wanted the system to be framework-agnostic. Supporting existing OpenAI and Anthropic workflows without requiring developers to rewrite their applications became a key design constraint.
What we learned
Building Vigil taught us that the biggest challenge in agent systems is no longer model quality alone—it's runtime reliability.
We learned how quickly costs can grow when context accumulates, how difficult semantic loop detection is compared to simple duplicate detection, and how valuable intervention mechanisms are compared to passive monitoring.
We also learned that developers strongly prefer solutions that integrate with existing workflows rather than requiring framework-specific rewrites.
What's next
We want to expand Vigil into a complete agent control plane with:
- Distributed deployment support
- Team-wide observability
- Advanced anomaly detection
- Security-focused policy enforcement
- Multi-provider routing
- Enterprise integrations
Our long-term vision is to make Vigil the reliability layer that sits between every AI agent and every model provider.
Built With
- anthropic
- arize
- css
- fastapi
- openai
- phoenix
- pydantic
- python
- react
- redis
- sentence
- sentry
- sqlite
- tailwind
- transformers
- typescript
- uvicorn
- vite
- websockets
Log in or sign up for Devpost to join the conversation.