Inspiration

The rise of AI agents has transformed how organizations build software, automate workflows, and interact with customers. However, once these agents are deployed, teams often have little visibility into when they begin to fail, hallucinate, drift from expected behavior, or degrade in performance.

We noticed that while companies are rapidly adopting AI agents, there is a growing trust gap. Existing monitoring solutions primarily focus on displaying metrics, leaving engineers responsible for manually investigating failures and identifying root causes.

This inspired us to build AgentGuard: the AI that watches your AI.

AgentGuard acts as an autonomous AI Reliability Engineer, continuously monitoring deployed AI agents, detecting behavioral drift, investigating failures, and providing actionable recommendations before users are impacted.


What it does

AgentGuard is an autonomous AI observability and reliability platform designed for organizations running AI agents in production.

The platform continuously:

  • Monitors AI agent behavior
  • Detects prompt drift, retrieval drift, model drift, and user intent drift
  • Identifies hallucinations and safety violations
  • Performs root cause analysis
  • Predicts future degradation and failures
  • Generates intelligent recommendations
  • Provides enterprise-grade reliability dashboards

Key features include:

Agent Monitoring Center

Monitor multiple AI agents from a single dashboard with real-time health metrics.

Drift Detection Engine

Detect prompt drift, retrieval drift, model drift, and user intent drift before performance significantly declines.
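As a rough illustration of what a drift check can look like, the sketch below compares a baseline window of per-response scores against a recent window using the Population Stability Index (PSI). This is a standard technique chosen for illustration, not necessarily the method AgentGuard uses; the function name `population_stability_index`, the bin count, and the 0.2 alert level are all assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two samples; larger values mean a bigger shift."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: evaluation scores from last week vs. today.
baseline_scores = [i / 100 for i in range(100)]
current_scores = [0.5 + i / 200 for i in range(100)]
drifted = population_stability_index(baseline_scores, current_scores) > 0.2
```

A common rule of thumb treats a PSI above roughly 0.2 as a significant shift, but the right threshold depends on the signal being tracked.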

Hallucination Detection

Identify unsupported claims, fabricated information, and reasoning inconsistencies.
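To make "unsupported claims" concrete, here is a deliberately simple sketch that flags answer sentences whose content words barely appear in the retrieved context. It is a lexical heuristic for illustration only; a production system would more likely rely on model-based evaluation. The function name `unsupported_sentences` and the 0.5 support threshold are assumptions.

```python
import re


def unsupported_sentences(
    answer: str, context: str, min_support: float = 0.5
) -> list[str]:
    """Return answer sentences with too little word overlap with the context."""
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    flagged = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        # Ignore short words, which are mostly articles and prepositions.
        words = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if len(w) > 3]
        if not words:
            continue
        support = sum(w in context_words for w in words) / len(words)
        if support < min_support:
            flagged.append(sentence)
    return flagged


# Hypothetical example: the second sentence is not backed by the context.
context = "The refund window is 30 days from the purchase date."
answer = "The refund window is 30 days. Refunds are doubled for premium members."
flagged = unsupported_sentences(answer, context)
```

Exact word matching misses paraphrases and inflections, so this kind of check is best treated as a cheap first-pass filter.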

Root Cause Analysis

Automatically investigate failures and explain why they occurred.

Agent Trust Score™

A proprietary reliability score that measures the trustworthiness of every deployed AI agent.
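The actual scoring formula is not described here, so the following is only a sketch of how a composite reliability score could be assembled: a weighted average of normalized signals scaled to 0-100. The signal names, the weights, and the `trust_score` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilitySignals:
    """Hypothetical per-agent signals, each normalized to the range 0.0-1.0."""
    success_rate: float
    groundedness: float
    drift_stability: float
    safety_pass_rate: float


# Assumed weights for illustration; a real score would be tuned and validated.
WEIGHTS = {
    "success_rate": 0.3,
    "groundedness": 0.3,
    "drift_stability": 0.2,
    "safety_pass_rate": 0.2,
}


def trust_score(signals: ReliabilitySignals) -> float:
    """Combine the signals into a single 0-100 score via a weighted average."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(signals, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        total += weight * value
    return round(100 * total / sum(WEIGHTS.values()), 1)


perfect = trust_score(ReliabilitySignals(1.0, 1.0, 1.0, 1.0))
failing = trust_score(ReliabilitySignals(0.0, 0.0, 0.0, 0.0))
```

Dividing by the sum of the weights keeps the score on a 0-100 scale even if the weights are later changed and no longer add up to 1.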

Failure Prediction Engine

Forecast degradation and flag at-risk agents before failures occur.
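One simple way to forecast degradation is to fit a straight line to an agent's recent reliability scores and estimate when that line crosses an alert threshold. The sketch below does this with ordinary least squares; the function name `steps_until_threshold` and the sample data are assumptions, and a linear trend is only one of many possible forecasting models.

```python
from typing import Optional, Sequence


def steps_until_threshold(
    scores: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimate how many future steps remain before the trend hits threshold.

    Returns None when the fitted trend is flat or improving.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to fit a trend")
    # Ordinary least squares with x = 0, 1, ..., n - 1.
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        return None
    # Solve intercept + slope * x = threshold, then measure from the last point.
    x_cross = (threshold - intercept) / slope
    return max(0.0, x_cross - (n - 1))


# Hypothetical daily scores dropping two points per day.
remaining = steps_until_threshold([100.0, 98.0, 96.0, 94.0], threshold=90.0)
```

With these sample scores the fitted line loses two points per step, so the estimate is two more steps before the score reaches 90.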

AI Investigation Copilot

An AI-powered assistant that answers questions about incidents, failures, and agent behavior.

Drift Injection Simulator

A live demonstration environment where users can intentionally break AI agents and watch AgentGuard detect, investigate, and explain failures in real time.


How we built it

AgentGuard was built as a modern full-stack AI observability platform.

Frontend

  • Next.js 15
  • TypeScript
  • Tailwind CSS
  • shadcn/ui
  • Framer Motion
  • Recharts
  • TanStack Table
  • Zustand
  • Clerk (authentication)

Backend

  • FastAPI
  • Python

AI Layer

  • Gemini 2.5
  • LangGraph

Observability

  • Arize Phoenix

Data Layer

  • PostgreSQL
  • Redis

Deployment

  • Google Cloud Run
  • Vertex AI

The platform ingests telemetry, traces, evaluations, and feedback from deployed AI systems. Gemini-powered investigation workflows analyze anomalies, identify root causes, generate incident reports, and recommend corrective actions.
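The ingestion-to-investigation flow can be pictured as a stream of trace events in which outliers are set aside for deeper analysis. The sketch below flags events whose evaluation score deviates sharply from the running baseline using a z-score. The `TraceEvent` fields, the `find_anomalies` function, and the thresholds are hypothetical; the Gemini-powered investigation step is not shown.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Iterable


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical telemetry record for one agent response."""
    agent_id: str
    trace_id: str
    eval_score: float


def find_anomalies(
    events: Iterable[TraceEvent],
    z_threshold: float = 3.0,
    min_history: int = 5,
) -> list[TraceEvent]:
    """Return events whose score is a z-score outlier against prior events."""
    history: list[float] = []
    flagged: list[TraceEvent] = []
    for event in events:
        if len(history) >= min_history:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(event.eval_score - mu) / sigma > z_threshold:
                flagged.append(event)
                # Keep outliers out of the baseline so they do not mask later ones.
                continue
        history.append(event.eval_score)
    return flagged


# Hypothetical stream: one response scores far below the others.
scores = [0.90, 0.91, 0.89, 0.90, 0.92, 0.90, 0.20, 0.91]
events = [TraceEvent("support-bot", f"t{i}", s) for i, s in enumerate(scores)]
to_investigate = find_anomalies(events)
```

Each flagged event would then be handed to an investigation workflow along with its surrounding traces.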


Challenges we ran into

One of the biggest challenges was moving beyond traditional observability.

Most monitoring platforms stop at displaying dashboards and alerts. We wanted AgentGuard to autonomously investigate failures and explain why they happened.

Other challenges included:

  • Designing meaningful drift detection workflows
  • Building explainable root cause analysis
  • Correlating multiple reliability signals
  • Creating realistic failure simulations
  • Designing enterprise-grade dashboards that remain intuitive
  • Balancing technical depth with user experience

We also needed to ensure that the platform could monitor many different types of AI agents while maintaining a unified experience.


Accomplishments that we're proud of

We are especially proud of:

  • Building an AI system that monitors other AI systems
  • Creating the Agent Trust Score™ framework
  • Developing autonomous root cause analysis workflows
  • Designing the Drift Injection Simulator for live demonstrations
  • Creating a platform that feels like a production-ready SaaS product
  • Integrating AI observability with autonomous investigation capabilities

Most importantly, we transformed observability from passive monitoring into active AI governance.


What we learned

Through building AgentGuard, we learned that deploying AI agents is only the first step.

The real challenge begins after deployment.

We discovered that:

  • AI systems require continuous monitoring
  • Drift can occur silently over time
  • Hallucinations often emerge from multiple interacting factors
  • Root cause analysis is critical for enterprise trust
  • Observability becomes significantly more valuable when combined with autonomous reasoning

We also gained deeper experience with AI evaluation, observability, reliability engineering, and agentic workflows.


What's next for AgentGuard

Our vision is to become the reliability layer for the agent economy.

Future plans include:

Autonomous Remediation

Allow AgentGuard to automatically execute approved corrective actions instead of only recommending them.

Multi-Platform Integrations

Add integrations with:

  • GitHub
  • GitLab
  • Slack
  • Jira
  • ServiceNow
  • Datadog
  • OpenTelemetry

Advanced Governance

Provide enterprise governance features including:

  • Compliance monitoring
  • Risk assessment
  • Audit trails
  • AI policy enforcement

Agent Benchmarking

Compare reliability metrics across different models and agent architectures.

Predictive Reliability Intelligence

Use historical behavior patterns to forecast reliability risks before they affect users.

Enterprise Deployment

Support large-scale monitoring for organizations managing hundreds or thousands of AI agents.

Our long-term goal is simple:

Ensure every AI agent remains trustworthy, reliable, and safe throughout its lifecycle.

Built With

  • arize-phoenix
  • clerk-authentication
  • fastapi
  • framer-motion
  • gemini-2.5
  • google-cloud-run
  • langgraph
  • next.js-15
  • postgresql
  • python
  • recharts
  • redis
  • shadcn/ui
  • tailwind-css
  • tanstack-table
  • typescript
  • vertex-ai
  • zustand