Agent Guardian - A Trust Layer for AI Agents

Inspiration

As AI agents become more autonomous and capable of acting on our behalf—creating GitHub issues, sending emails, managing Slack channels—a critical question emerges: How do we trust them with our digital lives?

The inspiration came from observing the gap between AI capability and user control. Current AI agents operate in binary modes: either fully manual (requiring approval for everything) or fully autonomous (hoping nothing goes wrong). We needed a nuanced trust model that adapts to action sensitivity.

Think of it like banking: small purchases auto-approve, medium ones send notifications, and large transfers require additional verification. Why shouldn't AI agents work the same way?

What It Does

Agent Guardian is a middleware trust layer that sits between AI agents and third-party APIs. It:

  1. Classifies every action into three tiers:

    • AUTO - Safe actions (reading data) execute immediately
    • NUDGE - Sensitive actions (creating issues) wait for a 60-second approval window
    • STEP_UP - High-risk actions (merging to main, deleting) require MFA-backed confirmation
  2. Enforces approval flows before execution, with real-time notifications via Socket.IO

  3. Provides a dashboard for users to:

    • Connect OAuth services (GitHub, Gmail, Slack, Notion)
    • Tune per-action permissions
    • Approve or deny pending actions
    • Audit complete action history
  4. Securely manages tokens using Auth0's Token Vault pattern—provider tokens are never stored in our database

  5. Includes a CLI agent powered by OpenRouter that demonstrates the full flow

How We Built It

Architecture

┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│   CLI Agent │────────▶│   Guardian   │────────▶│  Providers  │
│  (OpenAI)   │  M2M    │     API      │  OAuth  │ (GH, Gmail) │
└─────────────┘  Token  └──────────────┘  Tokens └─────────────┘
                              │
                              ▼
                        ┌──────────────┐
                        │  Dashboard   │
                        │ (React + WS) │
                        └──────────────┘

Tech Stack

Frontend:

  • React 18 + Vite + Tailwind CSS
  • TanStack Query for server state
  • Zustand for client state
  • Auth0 React SDK for authentication
  • Socket.IO client for real-time updates

Backend:

  • Node.js + Express + TypeScript
  • Prisma ORM with PostgreSQL
  • Redis + BullMQ for job queues
  • Socket.IO for WebSocket connections
  • Zod for runtime validation
  • Winston for structured logging

Agent:

  • OpenAI SDK → OpenRouter (GPT-4o-mini)
  • TypeScript with tool-calling support
  • Git integration for repository context

Auth & Security:

  • Auth0 Universal Login
  • Auth0 Management API for Token Vault retrieval
  • Machine-to-Machine (M2M) credentials for agent
  • Custom Auth0 Action for user binding

Key Implementation Details

1. Tier Classification Engine

The heart of Guardian is classifyTier() in apps/api/src/services/tierClassifier.ts:

function classifyTier(
  service: string,
  actionType: string,
  userOverrides?: Record<string, ActionTier>
): ActionTier {
  // User overrides take precedence
  if (userOverrides?.[actionType]) {
    return userOverrides[actionType];
  }

  // Fall back to system defaults
  return DEFAULT_ACTION_TIERS[actionType] || 'STEP_UP';
}

This simple but powerful function enables user-customizable trust boundaries.
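For illustration, here is how the defaults table and a user override interact. The entries in this `DEFAULT_ACTION_TIERS` map are hypothetical examples, not the shipped table:

```typescript
// A sketch of the defaults map classifyTier() falls back to.
// The specific action entries here are illustrative.
type ActionTier = 'AUTO' | 'NUDGE' | 'STEP_UP';

const DEFAULT_ACTION_TIERS: Record<string, ActionTier> = {
  'github.read_issues': 'AUTO',    // reads are safe
  'github.create_issue': 'NUDGE',  // 60-second approval window
  'github.merge_pr': 'STEP_UP',    // MFA-backed confirmation
};

function classifyTier(
  service: string,
  actionType: string,
  userOverrides?: Record<string, ActionTier>
): ActionTier {
  // User overrides take precedence
  if (userOverrides?.[actionType]) {
    return userOverrides[actionType];
  }
  // Unknown actions fail closed to the strictest tier
  return DEFAULT_ACTION_TIERS[actionType] ?? 'STEP_UP';
}

// A user who trusts issue creation can downgrade it to AUTO:
const tier = classifyTier('github', 'github.create_issue', {
  'github.create_issue': 'AUTO',
});
```

Failing closed to STEP_UP for unrecognized actions means a newly added provider action can never silently auto-execute.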

2. Token Vault Pattern

We never store raw OAuth tokens. Instead, we fetch them on-demand from Auth0:

async function getServiceToken(userId: string, service: string) {
  const connection = await prisma.connection.findUnique({
    where: { userId_service: { userId, service } }
  });

  if (!connection) {
    throw new Error(`No ${service} connection for user ${userId}`);
  }

  // Fetch fresh token from Auth0 Management API
  const token = await auth0Management.users.getAccessToken(
    userId,
    connection.auth0ConnectionId
  );

  return token;
}

This approach minimizes attack surface—if our database is compromised, no provider tokens are exposed.

3. Real-Time Approval Flow

When a NUDGE or STEP_UP action is requested:

  1. Job is queued in BullMQ with status pending
  2. Socket.IO emits event to user's dashboard
  3. User approves/denies in UI
  4. Job status updates, worker executes or cancels
  5. Result streams back to agent via polling or WebSocket

The 60-second timeout for NUDGE actions is enforced at the job level:

await actionQueue.add('execute', payload, {
  timeout: 60000, // 60 seconds
  removeOnComplete: false // Keep for audit
});
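The dashboard notification in step 2 can be sketched as a room-scoped emit. The `user:<id>` room convention and the `action:pending` event name are assumptions for illustration, not the literal Guardian implementation:

```typescript
// Minimal structural type so the sketch stands alone without socket.io installed;
// a real socket.io Server satisfies this shape.
interface Emitter {
  to(room: string): { emit(event: string, payload: unknown): void };
}

// Sketch: notify exactly one user's dashboard that an action awaits approval.
function notifyPendingAction(
  io: Emitter,
  userId: string,
  action: { jobId: string; actionType: string; tier: 'NUDGE' | 'STEP_UP' }
) {
  // Each authenticated dashboard socket joins `user:<id>` on connect,
  // so this emit reaches only that user's open tabs.
  io.to(`user:${userId}`).emit('action:pending', {
    ...action,
    // NUDGE actions carry a deadline; STEP_UP waits indefinitely for MFA.
    expiresAt: action.tier === 'NUDGE' ? Date.now() + 60_000 : null,
  });
}
```

Scoping emits to per-user rooms keeps pending-approval payloads from leaking to other connected clients.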

4. Agent User Resolution

In development, the agent acts as the most recently active user (determined by updatedAt timestamp). In production, an Auth0 M2M Action injects a custom claim. The API middleware extracts this claim to bind the agent to a specific user.
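The binding middleware can be sketched as follows. The namespaced claim name `https://guardian/agent_user_id` is illustrative (Auth0 requires custom claims to be namespaced), and the structural types stand in for Express's:

```typescript
// Minimal structural types so the sketch stands alone without Express installed.
type Req = { auth?: Record<string, unknown>; agentUserId?: string };
type Res = { status(code: number): { json(body: unknown): void } };

// Hypothetical namespaced claim injected by the Auth0 M2M Action.
const AGENT_USER_CLAIM = 'https://guardian/agent_user_id';

// Sketch of the Express-style middleware: after JWT verification has
// populated req.auth, bind the M2M agent to a concrete user or reject.
function bindAgentUser(req: Req, res: Res, next: () => void) {
  const userId = req.auth?.[AGENT_USER_CLAIM];
  if (typeof userId !== 'string' || userId.length === 0) {
    res.status(403).json({ error: 'agent is not bound to a user' });
    return;
  }
  req.agentUserId = userId; // downstream handlers act on behalf of this user
  next();
}
```

Rejecting unbound agents with 403 keeps a misconfigured M2M client from ever reaching the action queue.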

5. Repository-Aware GitHub Context

The CLI agent reads .git/config to infer the current repository:

const gitConfig = execSync('git config --get remote.origin.url')
  .toString()
  .trim();
const match = gitConfig.match(/github\.com[:/](.+?)\/(.+?)(\.git)?$/);
if (match) {
  context.repo = { owner: match[1], name: match[2] };
}

This enables natural language like "create an issue in this repo" without explicit naming.

Challenges We Faced

1. Auth0 Token Vault Complexity

Challenge: Auth0's Token Vault (now "Token Storage") requires precise Management API permissions and connection configuration. Initial attempts returned empty tokens.

Solution: We discovered that:

  • Token Vault must be explicitly enabled per social connection
  • Management API needs read:user_idp_tokens scope
  • Connection must be linked to the user's identity

We built a robust error-handling flow that guides users to reconnect when tokens are missing.

2. Async Approval UX

Challenge: How do you make an agent "wait" for user approval without blocking the entire process?

Solution: We implemented a hybrid polling + WebSocket approach:

  • Agent polls GET /api/v1/agent/action/:jobId/status every 2 seconds
  • Dashboard receives instant Socket.IO notifications
  • BullMQ handles job lifecycle and timeouts

This gives the illusion of synchronous execution while maintaining scalability.
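On the agent side, the "wait" is just a bounded poll loop. A minimal sketch, assuming the status endpoint above returns a JSON body with a `status` field (the fetcher is injected so the loop itself stays transport-agnostic):

```typescript
// Sketch of the agent-side wait: poll the job status every intervalMs until
// it leaves 'pending' or the deadline passes. The response shape is an
// assumption based on GET /api/v1/agent/action/:jobId/status.
async function waitForApproval(
  jobId: string,
  fetchStatus: (jobId: string) => Promise<{ status: string }>,
  timeoutMs = 60_000,
  intervalMs = 2_000
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const { status } = await fetchStatus(jobId);
    if (status !== 'pending') return status; // approved, denied, failed, ...
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return 'timeout'; // an expired NUDGE window is treated as a denial
}
```

Injecting `fetchStatus` also makes the loop trivial to unit-test without a running API.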

3. Tier Classification Granularity

Challenge: Defining the "right" default tier for each action. Is github.comment_issue safe (AUTO) or sensitive (NUDGE)?

Solution: We started with conservative defaults (most actions are NUDGE or STEP_UP) and built a user override system. Users can tune each action's tier in the dashboard, and their preferences persist in the database.

The formula for trust:

$$ \text{Trust}(a) = \begin{cases} \text{UserOverride}(a) & \text{if an override exists} \\ \text{DefaultTier}(a) & \text{if a default is defined} \\ \text{STEP\_UP} & \text{otherwise} \end{cases} $$

4. Agent Framework Integration

Challenge: Making Guardian work with existing AI frameworks (LangChain, CrewAI, AutoGPT) without requiring major rewrites.

Solution: We designed Guardian as a drop-in REST API. Any agent that can make HTTP requests can integrate:

// Before: Direct API call
await octokit.issues.create({ owner, repo, title, body });

// After: Via Guardian
await fetch(`${GUARDIAN_API_URL}/api/v1/agent/action`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    service: 'github',
    actionType: 'github.create_issue',
    payload: { owner, repo, title, body }
  })
});

This middleware approach means Guardian can protect any agent, regardless of implementation. This balances developer experience with production security.

Bonus Blog Post

What We Learned

Technical Insights

  1. Auth0 is incredibly powerful when you understand its primitives. The combination of Universal Login, Management API, and Actions creates a flexible identity platform.

  2. Token security is hard. The Token Vault pattern adds complexity but dramatically reduces risk. We learned to treat tokens as ephemeral—fetch on demand, never persist.

  3. Real-time UX requires careful orchestration. Socket.IO, BullMQ, and polling must work in harmony. We learned to embrace eventual consistency.

  4. AI agents need guardrails. Even with GPT-4, agents make mistakes. A trust layer isn't just about security—it's about user confidence.

Design Insights

  1. Three tiers is the sweet spot. We experimented with more granular levels (5+ tiers) but found it overwhelming. AUTO, NUDGE, STEP_UP maps cleanly to user mental models.

  2. Defaults matter. Most users won't customize tiers, so conservative defaults (bias toward NUDGE/STEP_UP) build trust.

  3. Audit logs are non-negotiable. Users need to see what their agent did, when, and why. Transparency builds trust.

Auth0-Specific Learnings

  1. Management API is the secret weapon. Token retrieval, user metadata, connection management—it's all there.

  2. Actions are underutilized. The credentials-exchange hook is perfect for agent-user binding.

  3. Social connections + Token Vault = OAuth without the pain. We didn't have to implement GitHub/Google/Slack OAuth flows ourselves.

What's Next

Short-Term

  • More providers: Linear, Jira, Google Calendar, Stripe
  • Richer approval UI: Show diffs for code changes, preview email content
  • Team support: Shared agents with role-based approval workflows
  • Audit analytics: Visualize agent behavior over time

Long-Term

  • Policy engine: Define approval rules in code (e.g., "auto-approve issues with < 100 chars")
  • Agent marketplace: Pre-configured agents for common workflows (e.g., "PR reviewer", "standup bot")
  • Federated trust: Allow agents to act across organizations with delegated permissions
  • ML-based tier tuning: Learn from user approvals to suggest tier adjustments

Research Questions

  • Can we use anomaly detection to flag suspicious agent behavior?
  • How do we handle multi-step workflows where later actions depend on earlier approvals?
  • What's the right UX for bulk approvals (e.g., "approve all read actions")?

Conclusion

Agent Guardian proves that AI autonomy and user control aren't mutually exclusive. By introducing a nuanced trust model, we enable agents to be both powerful and safe.

Auth0 was instrumental in making this possible. Without Token Vault, we'd be storing sensitive credentials. Without the Management API, we'd be building OAuth flows from scratch. Without Actions, agent-user binding would require custom infrastructure.

We believe Guardian represents a new primitive for the AI agent ecosystem—a trust layer that any agent can adopt, regardless of framework or implementation.

The future of AI isn't fully autonomous agents or fully manual tools. It's collaborative intelligence where agents act, users approve, and trust is earned through transparency.

