Inspiration
The rise of AI agents has transformed how organizations build software, automate workflows, and interact with customers. However, once these agents are deployed, teams often have little visibility into when they begin to fail, hallucinate, drift from expected behavior, or degrade in performance.
We noticed that while companies are rapidly adopting AI agents, there is a growing trust gap. Existing monitoring solutions primarily focus on displaying metrics, leaving engineers responsible for manually investigating failures and identifying root causes.
This inspired us to build AgentGuard — the AI that watches your AI.
AgentGuard acts as an autonomous AI Reliability Engineer, continuously monitoring deployed AI agents, detecting behavioral drift, investigating failures, and providing actionable recommendations before users are impacted.
What it does
AgentGuard is an autonomous AI observability and reliability platform designed for organizations running AI agents in production.
The platform continuously:
- Monitors AI agent behavior
- Detects prompt drift, model drift, retrieval drift, and user intent drift
- Identifies hallucinations and safety violations
- Performs root cause analysis
- Predicts degradation and failures before they occur
- Generates intelligent recommendations
- Provides enterprise-grade reliability dashboards
Key features include:
Agent Monitoring Center
Monitor multiple AI agents from a single dashboard with real-time health metrics.
Drift Detection Engine
Detect prompt drift, retrieval drift, model drift, and user intent drift before performance significantly declines.
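As a rough illustration of what a drift check can look like, the sketch below compares a baseline window of one numeric signal (for example, response length or an evaluation score) against a recent window using the Population Stability Index (PSI). This is an assumption made for illustration, not a description of AgentGuard's actual detector; the function name, bin count, and choice of signal are all hypothetical.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two samples of one numeric signal (hypothetical helper)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    if hi == lo:
        return 0.0  # every value is identical, so there is nothing to compare
    width = (hi - lo) / bins

    def fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value lands in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    base, cur = fractions(baseline), fractions(current)
    # Each term is non-negative, so PSI is 0 only when the binned shares match.
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift, but any alerting threshold would need tuning per signal and per agent.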
Hallucination Detection
Identify unsupported claims, fabricated information, and reasoning inconsistencies.
Root Cause Analysis
Automatically investigate failures and explain why they occurred.
Agent Trust Score™
A proprietary reliability score that measures the trustworthiness of every deployed AI agent.
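The scoring method itself is not described here, so the following is only a hypothetical sketch of how a composite reliability score could be assembled: a weighted average of normalized signals scaled to 0-100. The signal names, weights, and the `trust_score` function are assumptions for illustration and do not reflect the actual Agent Trust Score formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilitySignals:
    # All fields are hypothetical inputs, each normalized to [0, 1],
    # where 1.0 is the best possible value.
    groundedness: float   # share of responses supported by retrieved context
    safety: float         # share of responses passing safety checks
    stability: float      # 1 minus a normalized drift magnitude
    task_success: float   # share of tasks completed successfully


# Illustrative weights only; they sum to 1.0.
WEIGHTS = {
    "groundedness": 0.35,
    "safety": 0.30,
    "stability": 0.15,
    "task_success": 0.20,
}


def trust_score(signals: ReliabilitySignals) -> float:
    """Combine the signals into a 0-100 score using a weighted average."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(signals, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return round(100.0 * total, 1)
```

For example, `trust_score(ReliabilitySignals(0.9, 1.0, 0.5, 0.8))` weights groundedness and safety most heavily, so a drop in either moves the score more than an equal drop in stability.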
Failure Prediction Engine
Forecast future degradation and identify agents at risk before failures happen.
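One simple way to picture this kind of forecast, assuming evenly spaced observations of a single health score, is to fit a least-squares line and estimate when it crosses a failure threshold. The sketch below is a minimal illustration under that assumption; the function name and the linear model are hypothetical and are not a description of the engine's actual method.

```python
from typing import Optional, Sequence


def steps_until_threshold(
    history: Sequence[float], threshold: float
) -> Optional[float]:
    """Fit a line to evenly spaced scores and estimate when it drops to the threshold.

    Returns the number of steps after the last observation, 0.0 if the fitted
    line is already at or below the threshold, or None if the trend is flat
    or improving.
    """
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted_last = intercept + slope * (n - 1)
    if fitted_last <= threshold:
        return 0.0
    if slope >= 0:
        return None
    return (threshold - fitted_last) / slope
```

A straight line is a deliberately crude model; it is useful mainly for ranking agents by how quickly they appear to be degrading.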
AI Investigation Copilot
An AI-powered assistant that answers questions about incidents, failures, and agent behavior.
Drift Injection Simulator
A live demonstration environment where users can intentionally break AI agents and watch AgentGuard detect, investigate, and explain failures in real time.
How we built it
AgentGuard was built as a modern full-stack AI observability platform.
Frontend
- Next.js
- TypeScript
- Tailwind CSS
- shadcn/ui
- Framer Motion
- Recharts
Backend
- FastAPI
- Python
AI Layer
- Gemini 2.5
- LangGraph
Observability
- Arize Phoenix
Data Layer
- PostgreSQL
- Redis
Deployment
- Google Cloud Run
- Vertex AI
The platform ingests telemetry, traces, evaluations, and feedback from deployed AI systems. Gemini-powered investigation workflows analyze anomalies, identify root causes, generate incident reports, and recommend corrective actions.
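To make the hand-off from ingestion to investigation more concrete, here is a minimal sketch of how incoming evaluation events might be screened before being passed to an investigation workflow. The `EvalEvent` record, its field names, and the z-score cutoff are assumptions for illustration; the actual ingestion schema and anomaly logic are not specified above.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalEvent:
    # Hypothetical record shape; field names are illustrative.
    agent_id: str
    trace_id: str
    score: float  # evaluation score in [0, 1], higher is better


def flag_anomalies(
    baseline: list[EvalEvent], incoming: list[EvalEvent], z_cutoff: float = 3.0
) -> list[EvalEvent]:
    """Return incoming events whose score falls far below the baseline mean."""
    scores = [e.score for e in baseline]
    if len(scores) < 2:
        raise ValueError("need at least two baseline events")
    mean = statistics.fmean(scores)
    stdev = statistics.stdev(scores)
    if stdev == 0:
        # No spread in the baseline: treat any lower score as anomalous.
        return [e for e in incoming if e.score < mean]
    return [e for e in incoming if (mean - e.score) / stdev > z_cutoff]
```

Only the flagged events would then need the more expensive model-driven analysis, which keeps investigation cost proportional to the number of anomalies rather than to total traffic.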
Challenges we ran into
One of the biggest challenges was moving beyond traditional observability.
Most monitoring platforms stop at displaying dashboards and alerts. We wanted AgentGuard to autonomously investigate failures and explain why they happened.
Other challenges included:
- Designing meaningful drift detection workflows
- Building explainable root cause analysis
- Correlating multiple reliability signals
- Creating realistic failure simulations
- Designing enterprise-grade dashboards that remain intuitive
- Balancing technical depth with user experience
We also needed to ensure that the platform could monitor many different types of AI agents while maintaining a unified experience.
Accomplishments that we're proud of
We are especially proud of:
- Building an AI system that monitors other AI systems
- Creating the Agent Trust Score™ framework
- Developing autonomous root cause analysis workflows
- Designing the Drift Injection Simulator for live demonstrations
- Creating a platform that feels like a production-ready SaaS product
- Integrating AI observability with autonomous investigation capabilities
Most importantly, we transformed observability from passive monitoring into active AI governance.
What we learned
Through building AgentGuard, we learned that deploying AI agents is only the first step.
The real challenge begins after deployment.
We discovered that:
- AI systems require continuous monitoring
- Drift can occur silently over time
- Hallucinations often emerge from multiple interacting factors
- Root cause analysis is critical for enterprise trust
- Observability becomes significantly more valuable when combined with autonomous reasoning
We also gained deeper experience with AI evaluation, observability, reliability engineering, and agentic workflows.
What's next for AgentGuard
Our vision is to become the reliability layer for the agent economy.
Future plans include:
Autonomous Remediation
Allow AgentGuard to automatically execute approved corrective actions instead of only recommending them.
Multi-Platform Integrations
Support integrations with:
- GitHub
- GitLab
- Slack
- Jira
- ServiceNow
- Datadog
- OpenTelemetry
Advanced Governance
Provide enterprise governance features including:
- Compliance monitoring
- Risk assessment
- Audit trails
- AI policy enforcement
Agent Benchmarking
Compare reliability metrics across different models and agent architectures.
Predictive Reliability Intelligence
Use historical behavior patterns to forecast reliability risks before they emerge.
Enterprise Deployment
Support large-scale monitoring for organizations managing hundreds or thousands of AI agents.
Our long-term goal is simple:
Ensure every AI agent remains trustworthy, reliable, and safe throughout its lifecycle.
Built With
- arize-phoenix
- clerk-authentication
- fastapi
- framer-motion
- gemini-2.5
- google-cloud-run
- langgraph
- next.js-15
- postgresql
- python
- recharts
- redis
- shadcn/ui
- tailwind-css
- tanstack-table
- typescript
- vertex-ai
- zustand