Inspiration

Every company we talked to has the same problem: their team spends hours every day on work that follows clear, repeatable rules. A payment fails and someone manually emails the customer. A GitHub issue gets opened and someone manually triages it. A Slack message comes in and someone manually responds. It is not strategic work. It is operational noise that scales with headcount.

The existing solutions fall into two camps: rigid automations like Zapier that break the moment anything is slightly different, or chat-based AI assistants that require someone to prompt them. Neither solves the actual problem. We wanted to build something closer to how real delegation works: you hire someone, explain what they need to do, connect them to the right tools, and they figure it out, coming to you only when a decision genuinely requires you.

The Auth0 AI SDK's Token Vault and CIBA protocol gave us exactly the primitives we needed to make this real: a way for agents to securely access user credentials without holding them directly, and a standardized backchannel approval flow for the moments that do require human judgment.

What it does

Armada is an autonomous AI agent workforce platform. You hire AI employees through a wizard, describe the role in plain text, select which services they can access, and Armada's AI analyzes your instructions to auto-configure the exact event triggers the agent needs. No YAML. No workflow builder. No code.

Once hired, agents operate entirely on their own:

  • A Stripe payment fails and the agent reads the event, drafts and sends a recovery email
  • A GitHub issue is opened with a critical label and the agent triages it, comments, and assigns it
  • A Slack message mentions a customer complaint and the agent reads context and responds
  • Every morning at 9am the agent summarizes overnight activity and posts a briefing

Every agent starts at Level 0 Probationary with read-only access. They earn autonomy through verified successful work. When an agent wants to take a consequential action like sending an external email, issuing a refund, or merging a PR, it pauses and sends a push notification to your phone. One tap to approve or deny. The agent resumes instantly on approval.

Over time, as an agent builds a track record, it earns higher trust levels and needs less approval. Trust also decays with inactivity. Agents have to keep working to maintain their level.

System Architecture

How we built it

Event layer: Every integration uses the service's native push mechanism: Slack Events API, Stripe webhook signatures, Gmail Pub/Sub, GitHub webhook HMAC. No polling. Agents respond to events in seconds.

Auth0 AI SDK: The Token Vault handles all OAuth credential management. When an agent executes, it retrieves scoped tokens via backgroundTokenStorage so agents never hold user credentials in their execution context. CIBA (Client-Initiated Backchannel Authentication) handles the approval flow: when a gated action is attempted, execution pauses, a request is created via Auth0's backchannel auth endpoint, and Firebase FCM delivers a push notification to the user's Flutter mobile app. On approval, the agent resumes with a time-scoped token.

Agent runtime: Each agent run resolves the agent's config, retrieves tokens, builds a context-aware system prompt that includes the agent's current trust level and available tool permissions, then streams execution through Gemini 2.5 Pro via the Vercel AI SDK. Tool calls are intercepted at the registry layer and either executed if trust level permits, or escalated to CIBA.
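The interception step above can be sketched as a small gate at the registry layer. This is a minimal illustration, not the real Armada code: the tool names, levels, and types (`ToolCall`, `GateResult`, `REQUIRED_LEVEL`) are hypothetical.

```typescript
// Hypothetical sketch: trust-gated tool interception at the registry layer.
type ToolCall = { name: string; args: Record<string, unknown> };
type GateResult =
  | { kind: "execute" }
  | { kind: "escalate"; reason: string }; // escalate => start a CIBA approval

// Illustrative minimum trust level per tool; real mappings live in config.
const REQUIRED_LEVEL: Record<string, number> = {
  "email.send": 2,    // consequential: external email
  "github.comment": 1,
  "stripe.read": 0,   // read-only, allowed even for probationary agents
};

function gate(call: ToolCall, trustLevel: number): GateResult {
  const required = REQUIRED_LEVEL[call.name] ?? Infinity; // unknown tools never auto-run
  if (trustLevel >= required) return { kind: "execute" };
  return { kind: "escalate", reason: `trust ${trustLevel} < required ${required}` };
}
```

Because the check runs on the intercepted call itself, a prompt injection that convinces the model to attempt `email.send` still hits the same gate.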

Trust engine: Trust is computed with a 7-day exponential decay half-life. Successful autonomous actions award points. CIBA-approved actions award reduced points. Failed actions deduct points. Trust levels enforce permissions at the tool registry layer, not in the prompt, where they could be trivially bypassed.

Stack: Next.js 16 + React 19 on Vercel Edge, Neon Postgres + Drizzle ORM, Flutter mobile app, Firebase Admin for push notifications, Tailwind CSS v4 + Framer Motion for the UI.

CIBA Approval Flow

Challenges we ran into

Token Vault async context propagation was the hardest technical problem. Auth0's backgroundTokenStorage uses Node.js AsyncLocalStorage to pass token context through the call stack, but when execution crosses async boundaries inside the AI streaming loop, the context silently drops. We had to carefully structure the runner to ensure the storage context wraps the entire tool execution chain, not just the outer function call.
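A minimal reproduction of the bug class, using Node's own `AsyncLocalStorage` rather than the SDK internals: the store is only visible downstream when `run()` wraps the entire chain, which is exactly the restructuring described above.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Stand-in for the SDK's token context; the shape is illustrative.
const tokenContext = new AsyncLocalStorage<{ token: string }>();

async function toolCall(): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 10)); // crosses an async boundary
  // Returns "MISSING" if the ALS context was dropped before we got here.
  return tokenContext.getStore()?.token ?? "MISSING";
}

async function runAgent(): Promise<string> {
  // The fix: run() wraps the whole execution chain, so the context
  // survives awaits inside the streaming loop rather than silently dropping.
  return tokenContext.run({ token: "scoped-token" }, () => toolCall());
}
```

The failure mode in the wild is the inverse: if the streaming loop re-enters execution outside the `run()` callback, `getStore()` returns `undefined` with no error, which is why it only surfaced through layer-by-layer logging.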

CIBA integration with streaming AI execution required us to design a clean pause and resume mechanism. The AI SDK streams tool calls as they happen. We had to intercept mid-stream, persist run state to the database, park the execution, wait for the mobile approval, and resume without losing context. Getting the timeout handling right with a 5-minute window and proper cleanup on expiry took several iterations.
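The park-and-resume state machine can be sketched as follows. This is a simplified in-memory model (a `Map` standing in for the database table); the state names and helpers are illustrative, not the real schema.

```typescript
// Hypothetical run states for a gated tool call intercepted mid-stream.
type RunState =
  | { status: "awaiting_approval"; toolCall: string; expiresAt: number }
  | { status: "completed" }
  | { status: "denied" }
  | { status: "expired" };

const runs = new Map<string, RunState>(); // stand-in for the persisted runs table

// Park the run when a gated tool call is intercepted: persist state, stop streaming.
function parkRun(runId: string, toolCall: string, nowMs: number): void {
  runs.set(runId, {
    status: "awaiting_approval",
    toolCall,
    expiresAt: nowMs + 5 * 60_000, // the 5-minute approval window
  });
}

// Resolve on mobile approval or denial, honoring the expiry window.
function resolveRun(runId: string, approved: boolean, nowMs: number): RunState {
  const run = runs.get(runId);
  if (!run || run.status !== "awaiting_approval") {
    throw new Error("run is not awaiting approval");
  }
  const next: RunState =
    nowMs > run.expiresAt ? { status: "expired" }
    : approved ? { status: "completed" }
    : { status: "denied" };
  runs.set(runId, next);
  return next;
}
```

The expiry check lives in `resolveRun` rather than a timer, mirroring the compute-on-read approach used elsewhere: a late approval simply resolves to `expired`.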

Webhook signature verification across four services each with its own signature scheme required careful middleware design. Stripe uses HMAC-SHA256 with a timestamp to prevent replay attacks. Slack uses a similar scheme. GitHub uses a different HMAC format. Gmail Pub/Sub uses a Google-signed JWT. Getting all four right with proper error handling that does not leak information was more involved than expected.
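As one example, the Stripe-style check signs `"{timestamp}.{payload}"` with the webhook secret, rejects stale timestamps, and compares in constant time. This sketch uses Node's `crypto` directly (in production you would normally call the provider SDK's own verifier); the function name and tolerance default are ours.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch of a Stripe-style HMAC-SHA256 webhook check with replay protection.
function verifyStripeStyle(
  payload: string,
  timestamp: number,       // seconds, taken from the signature header
  signature: string,       // hex HMAC from the signature header
  secret: string,
  nowSeconds: number,
  toleranceSeconds = 300,  // reject events older than 5 minutes (replay guard)
): boolean {
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${payload}`)
    .digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // Constant-time compare; length check first since timingSafeEqual requires it.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

GitHub's scheme differs mainly in the signed input (raw body only, no timestamp) and header format (`sha256=<hex>`), which is why each service needed its own middleware branch.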

Trust decay math needed to be computed on read rather than stored on write, otherwise we would need a background job running constantly to degrade scores. We compute decayed score at query time using the formula below, making the decay seamless without additional infrastructure:

$$score_{decayed} = score \times 0.5^{\frac{t}{604800}}$$

where $t$ is seconds since last action and $604800$ is the number of seconds in 7 days.
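In code, the compute-on-read version of that formula is a one-liner evaluated at query time (the function name is ours; in Armada this runs inside the query layer):

```typescript
// 7-day half-life in seconds, matching the formula above.
const HALF_LIFE_SECONDS = 604_800;

// Decayed trust score, computed on read rather than stored on write.
function decayedScore(score: number, secondsSinceLastAction: number): number {
  return score * Math.pow(0.5, secondsSinceLastAction / HALF_LIFE_SECONDS);
}
```

For example, a score of 100 reads back as 50 exactly one week after the last action, with no background job ever touching the row.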

Accomplishments that we're proud of

The moment a real Stripe payment_intent.payment_failed webhook fired an agent that autonomously drafted and sent a recovery email, without anyone touching anything, was the moment Armada felt real. Watching the Activity Feed show the full execution trace in under 10 seconds was genuinely exciting.

We are also proud of how the trust system works in practice. The permission enforcement is real, the decay is real, and the CIBA escalation path is a real OAuth 2.0 protocol. Agents actually cannot take actions above their trust level. That design decision, enforcing permissions at the tool registry layer rather than the prompt, is what makes it trustworthy rather than just narratively interesting.

Trust System

What we learned

The Auth0 AI SDK's CIBA implementation handles most of the protocol complexity we would otherwise have had to build from scratch: the backchannel request creation, the polling and notification split, the token scoping on approval. Understanding how to compose it with our own approval UI rather than replacing it was the key insight. Auth0 handles the cryptographic approval chain and we handle the UX on top.

We also learned that building an autonomous agent platform is an infrastructure problem as much as an AI problem. The AI part, prompting Gemini and getting reliable tool call behavior, was relatively straightforward. The hard parts were secure credential management, event routing, pause and resume across async boundaries, and audit logging that is actually useful. Those are engineering problems, not AI problems.

What's next for Armada

  • Agent-to-agent delegation where senior agents assign sub-tasks to junior agents with scoped permissions
  • Multi-user workspaces with team-level agent management and role-based access to approve or deny CIBA requests
  • Expanded integrations including Linear, Notion, Salesforce, and HubSpot
  • Trust analytics for visualizing trust trajectories over time and identifying which action types drive growth
  • Agent template marketplace where teams can hire pre-built agents for common roles in one click

Bonus Blog Post

What Building Armada Taught Us About Token Vault (The Hard Way)

When we started Armada, the pitch was simple: AI employees that act on real events and earn trust over time. The architecture was clear in our heads. The Auth0 AI SDK would handle credentials. Agents would call tools. Everything would work.

It did not work. Not even close, at first.

The first wall we hit was Token Vault async context propagation. Auth0's backgroundTokenStorage uses Node.js AsyncLocalStorage to thread token context through the call stack. In theory, you wrap your agent execution in the storage context and tokens are available anywhere downstream. In practice, when execution crosses async boundaries inside the Vercel AI SDK's streaming loop, the context silently drops. No error. No warning. The agent just could not retrieve tokens and failed quietly. We spent the better part of a day adding log statements at every layer before we traced exactly where the context was escaping. The fix was straightforward once we understood it: the storage context has to wrap the entire tool execution chain, not just the top-level runner function. But finding that took real time.

The second wall was GitHub. We assumed connecting GitHub through Auth0 Token Vault would work the same as Google. It does not. GitHub OAuth Apps do not support refresh tokens, which means Token Vault cannot perform the RFC 8693 token exchange it needs. We had to switch to a GitHub App, configure token expiration, and update the Auth0 connection settings. That was not in any tutorial. We found it by reading the Token Vault source and working backwards from the error.

The third thing we learned was more philosophical. Token Vault is not just a convenience wrapper. It changes the security model of your entire application. Before Token Vault, the natural pattern is to retrieve a user's stored OAuth token and pass it into your agent execution context. That token then lives in memory, in logs, potentially in your AI provider's request trace. With Token Vault, the agent never holds the credential. It requests a scoped token at execution time, uses it, and that is it. The credential never touches your application layer directly. Once we understood that was the point, not just a feature, the architecture decisions became much clearer.

Building Armada with Token Vault was harder than we expected and more correct than we could have achieved without it.

Built With
