Inspiration

The hardest problem in AI agents is not intelligence, it is authorization. Who decides what the AI is allowed to do? Every AI assistant that connects to your Gmail is holding a token that can read and send your email. Most demos store that token in an environment variable and move on. That felt like a design failure.

What it does

Briefcase is an AI assistant that manages your Google workspace through natural conversation.

Gmail: Read your unread emails with summaries, send new emails, and reply in-thread. Briefcase looks up the original thread server-side so replies land in the right conversation.

Google Calendar: Check your availability for any time or date, get a BUSY or FREE verdict, and read upcoming events.

Google Drive: Search files by keyword or list recent documents across your entire Drive.

Google Contacts: Look up anyone by name to find their email address instantly.

Human-in-the-Loop approvals: Every write action — sending email, creating calendar events — is blocked behind an approval card. The AI drafts the full action (recipient, subject, body), presents it for review, and waits. You can edit any field before approving or deny it entirely. Nothing executes without explicit confirmation.
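The approval flow above can be sketched as a small gate in front of every tool call. This is a minimal sketch with hypothetical names (`ToolCall`, `gateToolCall`, the tool names in `WRITE_TOOLS`), not the actual implementation: read actions pass through, write actions become pending approvals.

```typescript
// Hypothetical sketch of the approval gate: read actions execute
// immediately, write actions become pending approvals instead.
type ToolCall = { tool: string; args: Record<string, unknown> };

type GateResult =
  | { kind: "execute"; call: ToolCall }
  | { kind: "pending_approval"; call: ToolCall };

// Write tools that must be approved before they run (illustrative names).
const WRITE_TOOLS = new Set(["send_email", "reply_email", "create_event"]);

function gateToolCall(call: ToolCall, hitlEnabled: boolean): GateResult {
  if (hitlEnabled && WRITE_TOOLS.has(call.tool)) {
    // The drafted action is rendered as an approval card; nothing
    // executes until the user approves (possibly after editing fields).
    return { kind: "pending_approval", call };
  }
  return { kind: "execute", call };
}
```

For example, `gateToolCall({ tool: "send_email", args: {} }, true)` yields a pending approval, while the same call with HITL disabled executes directly.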

Activity log: A full audit trail of every tool call the AI made. Filterable by service (Gmail, Calendar, Drive, Contacts). Shows whether each action was auto-read, approved, denied, or errored.

Permissions dashboard: See every connected service and every OAuth scope granted. Reconnect or disconnect any service in one click.

HITL toggle: Turn Human-in-the-Loop on or off from Settings. When off, write actions execute automatically (useful for power users who trust the AI).

Chat persistence: Conversations survive page refresh, stored in Postgres per user.

How we built it

  • Auth0 Token Vault stores all Google OAuth tokens server-side. The app never holds credentials. Tokens are fetched via getAccessTokenForConnection per request.
  • MRRT (multi-resource refresh tokens) keeps the session alive across Gmail, Calendar, Drive, and Contacts without re-prompting.
  • GPT-4o handles tool calling rounds. GPT-4o-mini handles streaming responses.
  • Neon Postgres stores chat history, activity log, and user settings.
  • Next.js 16 with Vercel AI SDK for streaming and the raw OpenAI SDK for structured tool calls.
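The per-request token pattern from the first bullet looks roughly like this. The real `getAccessTokenForConnection` comes from Auth0's SDK; here it is stubbed so the calling shape is clear. All names and the connection string are illustrative assumptions.

```typescript
// Sketch of the per-request token pattern. The real
// getAccessTokenForConnection lives in Auth0's SDK; this stub only
// models its shape to show that no token is ever stored app-side.
type Connection = "google-oauth2";

async function getAccessTokenForConnection(opts: {
  connection: Connection;
}): Promise<{ accessToken: string }> {
  // In production this exchanges the user's session for a short-lived
  // Google access token held in Token Vault, never persisted here.
  return { accessToken: `vault-token-for-${opts.connection}` };
}

// Each Gmail call fetches a fresh token; nothing is cached or stored.
async function listUnreadMessages(): Promise<string> {
  const { accessToken } = await getAccessTokenForConnection({
    connection: "google-oauth2",
  });
  // fetch("https://gmail.googleapis.com/gmail/v1/users/me/messages?q=is:unread",
  //   { headers: { Authorization: `Bearer ${accessToken}` } });
  return accessToken; // returned only so the sketch is self-contained
}
```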

Challenges we ran into

Zod v4 is incompatible with the Vercel AI SDK tool schema format, which forced a switch to raw jsonSchema() definitions for all tools. Auth0 Token Vault required careful setup: the My Account API, MRRT, and Token Vault grant type all need to be configured together or token exchange silently fails.
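For context, a raw JSON Schema tool definition of the kind the Zod incompatibility forced looks like the sketch below. The field names are illustrative; in the app, a plain object like this is wrapped with the Vercel AI SDK's `jsonSchema()` helper rather than defined via Zod.

```typescript
// Illustrative raw JSON Schema parameters for a send_email tool, of the
// kind that replaced Zod schemas. In the app this object is passed to
// the Vercel AI SDK's jsonSchema() helper; here it is a plain literal.
const sendEmailParameters = {
  type: "object",
  properties: {
    to: { type: "string", description: "Recipient email address" },
    subject: { type: "string", description: "Email subject line" },
    body: { type: "string", description: "Plain-text email body" },
  },
  required: ["to", "subject", "body"],
  additionalProperties: false,
} as const;

// e.g. with the AI SDK (not imported in this sketch):
// const sendEmail = tool({ parameters: jsonSchema(sendEmailParameters), ... });
```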

Accomplishments we are proud of

The HITL approval card pattern. It turned out to be the most important design decision in the whole project. Realizing that authorization is a first-class UX problem, not a settings toggle, changed how the entire app was structured.

What we learned

The bottleneck for AI agents in real workflows is not capability, it is trust infrastructure. Token Vault handles credential security. HITL handles action authorization. The activity log handles accountability. Together they answer the question every user implicitly asks: can I trust this thing with my inbox?

What's next

The same architecture extends to any OAuth service. Slack, Notion, GitHub. Token Vault and HITL are provider-agnostic patterns that scale without rebuilding the trust layer each time.


Bonus Blog Post: Building Briefcase — AI That Asks Before It Acts

When I started this project, I thought the hard problem was connecting GPT-4 to Gmail. That took two days. The problem that took the rest of the time was figuring out how to make an AI agent that users could actually trust.

The credential problem nobody talks about

Every AI assistant that connects to your Gmail is holding an OAuth token that can read and send your email. Most demos store that token in an environment variable and move on. That felt like a design failure to me.

Auth0 Token Vault changed the architecture entirely. Tokens are stored server-side, scoped per user, and never exposed to the application layer. The app requests a token when it needs one through getAccessTokenForConnection. Multi-resource refresh tokens keep the session alive across Gmail, Calendar, Drive, and Contacts without re-prompting the user. The result: Briefcase never holds your credentials. Token Vault does.

This is not just a security detail. It changes what you can promise users about what the app can and cannot do with their data.

The moment I realized the first version was wrong

The first working version of Briefcase just executed everything. Ask it to reply to an email and it replied. Fast, impressive, deeply uncomfortable.

I sent a test email to myself and immediately felt uneasy. The AI had taken an irreversible action on my behalf and I had not explicitly said yes. That feeling pointed at the real design problem: the missing layer was not capability, it was authorization.

Write actions now go through a Human-in-the-Loop approval card before execution. The agent drafts the email, shows the full content including recipient, subject, and body, and waits. The user can edit any field, then approve or deny. Nothing sends without an explicit decision.

This is not a friction point. It is the feature. The approval card is where user control becomes real rather than theoretical.

Building for transparency, not just security

Security without visibility is still a trust problem. So every tool call Briefcase makes gets logged: which service it accessed, what action it took, whether it was approved or blocked, and when. The activity log gives users a complete audit trail of what the AI did in their name.
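A sketch of what one log entry carries, mirroring the fields described above. The type and function names here are hypothetical, not the app's actual schema:

```typescript
// Hypothetical shape for one activity-log entry: which service, what
// action, how it was resolved, and when.
type Outcome = "auto_read" | "approved" | "denied" | "error";
type Service = "gmail" | "calendar" | "drive" | "contacts";

interface ActivityEntry {
  service: Service;
  action: string; // e.g. "send_email"
  outcome: Outcome;
  at: string; // ISO-8601 timestamp
}

// The dashboard's per-service filter reduces to a simple predicate.
function byService(log: ActivityEntry[], service: Service): ActivityEntry[] {
  return log.filter((e) => e.service === service);
}
```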

Combined with a permissions dashboard showing every connected service and every scope granted, users can see exactly what Briefcase can and cannot access. One click disconnects any service. This level of control is not standard in AI assistants and it should be.

What four integrations taught me about architecture

Briefcase connects to Gmail, Google Calendar, Drive, and Contacts through a single OAuth connection managed by Auth0. Adding each service reinforced the same pattern: Token Vault handles the credential layer uniformly, HITL handles the authorization layer uniformly, and the activity log handles the accountability layer uniformly.

The pattern scales in a way that per-integration hacks do not. Slack, Notion, GitHub, any OAuth service fits the same architecture. The work is in the integrations, not in rebuilding the trust infrastructure each time.

The insight that surprised me

I expected to learn about AI capabilities building this. What I actually learned is that the bottleneck for AI agents in real workflows is not intelligence, it is trust infrastructure. Users are not hesitant about AI because it is not smart enough. They are hesitant because they do not know what it will do next.

Briefcase is one answer to that: transparent about what it can access, explicit about what it is about to do, and auditable after the fact. That combination, Token Vault plus HITL plus activity log, is what makes an AI agent something a person can actually hand their inbox to.

Built With

  • auth0
  • mrrt
  • neon-postgres
  • next.js
  • openai-gpt-4o
  • shadcn/ui
  • tailwind-css
  • token-vault
  • typescript
  • vercel-ai-sdk