Agent Guardian - A Trust Layer for AI Agents
Inspiration
As AI agents become more autonomous and capable of acting on our behalf—creating GitHub issues, sending emails, managing Slack channels—a critical question emerges: How do we trust them with our digital lives?
The inspiration came from observing the gap between AI capability and user control. Current AI agents operate in binary modes: either fully manual (requiring approval for everything) or fully autonomous (hoping nothing goes wrong). We needed a nuanced trust model that adapts to action sensitivity.
Think of it like banking: small purchases auto-approve, medium ones send notifications, and large transfers require additional verification. Why shouldn't AI agents work the same way?
What It Does
Agent Guardian is a middleware trust layer that sits between AI agents and third-party APIs. It:
Classifies every action into three tiers:
- AUTO: Safe actions (reading data) execute immediately
- NUDGE: Sensitive actions (creating issues) wait for a 60-second approval window
- STEP_UP: High-risk actions (merging to main, deleting) require MFA-backed confirmation
Enforces approval flows before execution, with real-time notifications via Socket.IO
Provides a dashboard for users to:
- Connect OAuth services (GitHub, Gmail, Slack, Notion)
- Tune per-action permissions
- Approve or deny pending actions
- Audit complete action history
Securely manages tokens using Auth0's Token Vault pattern—provider tokens are never stored in our database
Includes a CLI agent powered by OpenRouter that demonstrates the full flow
How We Built It
Architecture
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│  CLI Agent  │────────▶│   Guardian   │────────▶│  Providers  │
│  (OpenAI)   │   M2M   │     API      │  OAuth  │ (GH, Gmail) │
└─────────────┘  Token  └──────────────┘ Tokens  └─────────────┘
                               │
                               ▼
                        ┌──────────────┐
                        │  Dashboard   │
                        │ (React + WS) │
                        └──────────────┘
Tech Stack
Frontend:
- React 18 + Vite + Tailwind CSS
- TanStack Query for server state
- Zustand for client state
- Auth0 React SDK for authentication
- Socket.IO client for real-time updates
Backend:
- Node.js + Express + TypeScript
- Prisma ORM with PostgreSQL
- Redis + BullMQ for job queues
- Socket.IO for WebSocket connections
- Zod for runtime validation
- Winston for structured logging
Agent:
- OpenAI SDK → OpenRouter (GPT-4o-mini)
- TypeScript with tool-calling support
- Git integration for repository context
Auth & Security:
- Auth0 Universal Login
- Auth0 Management API for Token Vault retrieval
- Machine-to-Machine (M2M) credentials for agent
- Custom Auth0 Action for user binding
Key Implementation Details
1. Tier Classification Engine
The heart of Guardian is classifyTier() in apps/api/src/services/tierClassifier.ts:
function classifyTier(
  service: string,
  actionType: string,
  userOverrides?: Record<string, ActionTier>
): ActionTier {
  // User overrides take precedence
  if (userOverrides?.[actionType]) {
    return userOverrides[actionType];
  }

  // Fall back to system defaults
  return DEFAULT_ACTION_TIERS[actionType] || 'STEP_UP';
}
This simple but powerful function enables user-customizable trust boundaries.
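To make the behavior concrete, here is the classifier paired with a small, hypothetical defaults map (the real DEFAULT_ACTION_TIERS lives alongside classifyTier; the function is repeated so the sketch runs on its own):

```typescript
type ActionTier = 'AUTO' | 'NUDGE' | 'STEP_UP';

// Illustrative defaults; the actual map is defined in tierClassifier.ts
const DEFAULT_ACTION_TIERS: Record<string, ActionTier> = {
  'github.read_issue': 'AUTO',
  'github.create_issue': 'NUDGE',
  'github.merge_pr': 'STEP_UP',
};

function classifyTier(
  service: string,
  actionType: string,
  userOverrides?: Record<string, ActionTier>
): ActionTier {
  if (userOverrides?.[actionType]) return userOverrides[actionType];
  return DEFAULT_ACTION_TIERS[actionType] || 'STEP_UP';
}

// Unknown actions fail safe to STEP_UP; user overrides win over defaults
classifyTier('github', 'github.read_issue');  // 'AUTO'
classifyTier('github', 'github.force_push');  // 'STEP_UP' (unknown action)
classifyTier('github', 'github.merge_pr', { 'github.merge_pr': 'NUDGE' }); // 'NUDGE'
```

Note the fail-safe default: any action the system has never seen is treated as high-risk until someone says otherwise.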
2. Token Vault Pattern
We never store raw OAuth tokens. Instead, we fetch them on-demand from Auth0:
async function getServiceToken(userId: string, service: string) {
  const connection = await prisma.connection.findUnique({
    where: { userId_service: { userId, service } }
  });
  if (!connection) {
    throw new Error(`No ${service} connection for user ${userId}`);
  }

  // Fetch fresh token from Auth0 Management API
  const token = await auth0Management.users.getAccessToken(
    userId,
    connection.auth0ConnectionId
  );
  return token;
}
This approach minimizes attack surface—if our database is compromised, no provider tokens are exposed.
3. Real-Time Approval Flow
When a NUDGE or STEP_UP action is requested:
- Job is queued in BullMQ with status pending
- Socket.IO emits an event to the user's dashboard
- User approves or denies in the UI
- Job status updates; the worker executes or cancels
- Result streams back to the agent via polling or WebSocket
The 60-second timeout for NUDGE actions is enforced at the job level:
await actionQueue.add('execute', payload, {
  timeout: 60000,          // 60 seconds
  removeOnComplete: false  // keep for audit
});
4. Agent User Resolution
In development, the agent acts as the most recently active user (determined by updatedAt timestamp). In production, an Auth0 M2M Action injects a custom claim.
The API middleware extracts this claim to bind the agent to a specific user.
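The claim extraction can be sketched as a pure function over a minimal request shape (the claim namespace here is hypothetical; the real one is configured in the Auth0 Action, and in the API the shape is an Express request whose auth payload is populated by the JWT middleware):

```typescript
// Hypothetical claim namespace; the real name is set in the Auth0 Action
const AGENT_USER_CLAIM = 'https://guardian.example/agent_user_id';

// Minimal request shape for illustration
interface AgentRequest {
  auth?: { payload: Record<string, unknown> };
}

// Returns the bound user's ID, or null if the agent token carries no binding
function resolveAgentUser(req: AgentRequest): string | null {
  const userId = req.auth?.payload?.[AGENT_USER_CLAIM];
  return typeof userId === 'string' ? userId : null;
}

resolveAgentUser({ auth: { payload: { [AGENT_USER_CLAIM]: 'auth0|abc123' } } }); // 'auth0|abc123'
resolveAgentUser({}); // null, so the middleware can reject with 403
```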
5. Repository-Aware GitHub Context
The CLI agent reads .git/config to infer the current repository:
const gitConfig = execSync('git config --get remote.origin.url')
  .toString()
  .trim();

const match = gitConfig.match(/github\.com[:/](.+?)\/(.+?)(\.git)?$/);
if (match) {
  context.repo = { owner: match[1], name: match[2] };
}
This enables natural language like "create an issue in this repo" without explicit naming.
Challenges We Faced
1. Auth0 Token Vault Complexity
Challenge: Auth0's Token Vault (now "Token Storage") requires precise Management API permissions and connection configuration. Initial attempts returned empty tokens.
Solution: We discovered that:
- Token Vault must be explicitly enabled per social connection
- The Management API needs the read:user_idp_tokens scope
- The connection must be linked to the user's identity
We built a robust error-handling flow that guides users to reconnect when tokens are missing.
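One way to surface that condition cleanly, sketched with a hypothetical error type the API layer can map to a "please reconnect this service" response:

```typescript
// Hypothetical error type, thrown when Token Vault returns no token
// so the API layer can answer with a reconnect prompt instead of a 500
class ReconnectRequiredError extends Error {
  constructor(public readonly service: string) {
    super(`No provider token for ${service}; ask the user to reconnect`);
    this.name = 'ReconnectRequiredError';
  }
}

// Guard that turns an empty Token Vault result into a typed failure
function ensureToken(token: string | null | undefined, service: string): string {
  if (!token) throw new ReconnectRequiredError(service);
  return token;
}

ensureToken('gho_abc123', 'github'); // returns the token unchanged
// ensureToken(null, 'github');      // throws ReconnectRequiredError
```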
2. Async Approval UX
Challenge: How do you make an agent "wait" for user approval without blocking the entire process?
Solution: We implemented a hybrid polling + WebSocket approach:
- Agent polls GET /api/v1/agent/action/:jobId/status every 2 seconds
- Dashboard receives instant Socket.IO notifications
- BullMQ handles job lifecycle and timeouts
This gives the illusion of synchronous execution while maintaining scalability.
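The agent-side wait reduces to a small loop. In this sketch the status fetch is injected so the timing logic stands alone; in the real agent it would wrap the HTTP status call with the M2M bearer token:

```typescript
type ActionStatus = 'pending' | 'approved' | 'denied' | 'expired';

// fetchStatus is injected for illustration; in practice it wraps
// GET /api/v1/agent/action/:jobId/status
async function waitForApproval(
  fetchStatus: () => Promise<ActionStatus>,
  timeoutMs = 60_000,
  intervalMs = 2_000
): Promise<ActionStatus> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await fetchStatus();
    if (status !== 'pending') return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return 'expired'; // mirrors the server-side 60-second NUDGE timeout
}
```

Injecting the fetcher also makes the loop trivially testable without a running API.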
3. Tier Classification Granularity
Challenge: Defining the "right" default tier for each action. Is github.comment_issue safe (AUTO) or sensitive (NUDGE)?
Solution: We started with conservative defaults (most actions are NUDGE or STEP_UP) and built a user override system. Users can tune each action's tier in the dashboard, and their preferences persist in the database.
The formula for trust:
$$ \text{Trust}(a) = \begin{cases} \text{UserOverride}(a) & \text{if a user override exists} \\ \text{DefaultTier}(a) & \text{if a default is defined} \\ \text{STEP\_UP} & \text{otherwise} \end{cases} $$
4. Agent Framework Integration
Challenge: Making Guardian work with existing AI frameworks (LangChain, CrewAI, AutoGPT) without requiring major rewrites.
Solution: We designed Guardian as a drop-in REST API. Any agent that can make HTTP requests can integrate:
// Before: Direct API call
await octokit.issues.create({ owner, repo, title, body });
// After: via Guardian
await fetch(`${GUARDIAN_API_URL}/api/v1/agent/action`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    service: 'github',
    actionType: 'github.create_issue',
    payload: { owner, repo, title, body }
  })
});
This middleware approach means Guardian can protect any agent, regardless of implementation. This balances developer experience with production security.
What We Learned
Technical Insights
Auth0 is incredibly powerful when you understand its primitives. The combination of Universal Login, Management API, and Actions creates a flexible identity platform.
Token security is hard. The Token Vault pattern adds complexity but dramatically reduces risk. We learned to treat tokens as ephemeral—fetch on demand, never persist.
Real-time UX requires careful orchestration. Socket.IO, BullMQ, and polling must work in harmony. We learned to embrace eventual consistency.
AI agents need guardrails. Even with GPT-4, agents make mistakes. A trust layer isn't just about security—it's about user confidence.
Design Insights
Three tiers is the sweet spot. We experimented with more granular levels (5+ tiers) but found it overwhelming. AUTO, NUDGE, and STEP_UP map cleanly to user mental models.

Defaults matter. Most users won't customize tiers, so conservative defaults (biased toward NUDGE/STEP_UP) build trust.

Audit logs are non-negotiable. Users need to see what their agent did, when, and why. Transparency builds trust.
Auth0-Specific Learnings
Management API is the secret weapon. Token retrieval, user metadata, connection management—it's all there.
Actions are underutilized. The credentials-exchange hook is perfect for agent-user binding.
Social connections + Token Vault = OAuth without the pain. We didn't have to implement GitHub/Google/Slack OAuth flows ourselves.
What's Next
Short-Term
- More providers: Linear, Jira, Google Calendar, Stripe
- Richer approval UI: Show diffs for code changes, preview email content
- Team support: Shared agents with role-based approval workflows
- Audit analytics: Visualize agent behavior over time
Long-Term
- Policy engine: Define approval rules in code (e.g., "auto-approve issues with < 100 chars")
- Agent marketplace: Pre-configured agents for common workflows (e.g., "PR reviewer", "standup bot")
- Federated trust: Allow agents to act across organizations with delegated permissions
- ML-based tier tuning: Learn from user approvals to suggest tier adjustments
Research Questions
- Can we use anomaly detection to flag suspicious agent behavior?
- How do we handle multi-step workflows where later actions depend on earlier approvals?
- What's the right UX for bulk approvals (e.g., "approve all read actions")?
Conclusion
Agent Guardian proves that AI autonomy and user control aren't mutually exclusive. By introducing a nuanced trust model, we enable agents to be both powerful and safe.
Auth0 was instrumental in making this possible. Without Token Vault, we'd be storing sensitive credentials. Without the Management API, we'd be building OAuth flows from scratch. Without Actions, agent-user binding would require custom infrastructure.
We believe Guardian represents a new primitive for the AI agent ecosystem—a trust layer that any agent can adopt, regardless of framework or implementation.
The future of AI isn't fully autonomous agents or fully manual tools. It's collaborative intelligence where agents act, users approve, and trust is earned through transparency.