Building an AI Employee Marketplace

Inspiration

The idea started with a simple frustration: every "AI employee" product on the market either requires a technical team to configure it, costs thousands of dollars a month, or is just a chatbot you have to go visit in a separate tab. None of them actually show up the way a real hire does.

When you bring on a new employee, they get an email address. They join your Slack. They get access to your Google Drive. They introduce themselves. They ask questions about how your team works. Then they start doing things — and you correct them until you trust them.

No AI product worked that way. So I built one that does.

The competitive signal that sharpened the idea: monday.com launched Agentalent.ai — an AI agent hiring platform backed by AWS and Anthropic — on March 23, 2026, three days before this hackathon. It targets enterprises with procurement departments. The 33 million small businesses that don't have IT teams, don't have procurement processes, and just need someone to show up and handle the work — they're completely unaddressed. That's the market I built for.


How I built it

The architecture has three layers working together.

The agent itself is built on OpenClaw with Claude Haiku as the underlying model. It receives email via AgentMail (giving it a real inbox at its own address), has access to Google Workspace through its own Google service account, and operates through 10 skill files — markdown documents that describe how to handle each class of task. When a high-risk action is needed (sending an external email, creating a calendar event), the agent routes it to an approval queue and waits. The human approves, edits, or rejects. Only then does the agent act.

The trust system is the core mechanic that makes the coworker framing real. Every task type has a trust score — a recency-weighted approval rate:

$$w_i = \frac{1}{1 + e^{-k(t_i - t_0)}}$$

where $w_i$ is the recency weight for decision $i$, $t_i$ is its timestamp, $t_0$ is the current time, and $k$ controls how quickly older decisions fade. The weighted score $\bar{s}$ determines the autonomy level:

$$\text{autonomy} = \begin{cases} \text{auto-execute} & \text{if } \bar{s} \geq 0.95 \text{ and } n \geq 20 \\ \text{queue if stakes} > 5 & \text{if } 0.60 \leq \bar{s} < 0.95 \\ \text{always queue} & \text{if } \bar{s} < 0.60 \end{cases}$$

This means autonomy is earned through a track record, not assumed on day one. The agent starts with zero autonomy and works its way up task by task — exactly how you'd treat a new hire.
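The scoring and gating logic fits in a few lines of Python. This is a sketch: the decay constant `k` and the stakes scale are illustrative, and I read the middle band as queueing only tasks with stakes above 5.

```python
import math

def recency_weight(t_i: float, t_now: float, k: float = 1e-6) -> float:
    """Sigmoid recency weight: decisions near 'now' count ~0.5, old ones decay toward 0."""
    x = max(-60.0, min(60.0, k * (t_i - t_now)))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-x))

def trust_score(history, t_now):
    """history: list of (timestamp, approved) pairs for one task type."""
    if not history:
        return 0.0, 0
    weights = [recency_weight(t, t_now) for t, _ in history]
    approved = sum(w for w, (_, ok) in zip(weights, history) if ok)
    return approved / sum(weights), len(history)

def autonomy(score: float, n: int, stakes: int) -> str:
    if score >= 0.95 and n >= 20:
        return "auto-execute"
    if score >= 0.60:
        return "queue" if stakes > 5 else "auto-execute"
    return "always-queue"
```

A brand-new agent has an empty history, so every task queues; only a sustained run of approvals unlocks auto-execution.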

The marketplace is a Turborepo monorepo with a Next.js 15 frontend and a separate Node.js provisioning service running BullMQ jobs. When a company clicks hire, the provisioning pipeline:

  1. Creates an AgentMail inbox for the agent's email identity
  2. Creates a Google service account for its Workspace identity
  3. Builds and starts a Docker container with the agent package injected
  4. Triggers the agent to send its first onboarding email

The entire flow — from payment to agent emailing the hiring manager — targets under 5 minutes.
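The pipeline shape is an ordered sequence of steps sharing accumulated state. The sketch below uses illustrative stand-in functions and fake identifiers; the real steps run as BullMQ jobs calling the AgentMail, Google Cloud IAM, and Docker APIs.

```python
# Illustrative stand-ins for the real provisioning steps. Each returns a fake
# identifier; the real versions call AgentMail, Google Cloud IAM, and Docker.
def create_agentmail_inbox(state):
    return f"{state['agent_slug']}@agentmail.example"

def create_google_service_account(state):
    return f"{state['agent_slug']}@project.iam.gserviceaccount.example"

def start_agent_container(state):
    # Depends on both identities above already being in `state`.
    return f"container-{state['agent_slug']}"

def send_onboarding_email(state):
    return f"sent from {state['create_inbox']}"

PIPELINE = [
    ("create_inbox", create_agentmail_inbox),
    ("create_identity", create_google_service_account),
    ("start_container", start_agent_container),
    ("send_onboarding", send_onboarding_email),
]

def provision_agent(company_id: str, agent_slug: str) -> dict:
    """Run the steps in order; each sees the results of the steps before it."""
    state = {"company_id": company_id, "agent_slug": agent_slug}
    for key, step in PIPELINE:
        state[key] = step(state)
    return state
```

Ordering matters: the container can't start before both identities exist, and the onboarding email needs the inbox, which is why this runs as one sequential job rather than parallel tasks.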

The framework-agnostic adapter was a key architectural decision. Rather than locking creators into OpenClaw, any agent framework (LangGraph, CrewAI, custom Python) can run on the platform by implementing three endpoints: POST /task, POST /internal/approvals/:id/resolve, and GET /internal/health. The platform handles provisioning, billing, approval routing, and memory — the creator just builds the brain.
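The contract is small enough to sketch as a plain dispatcher. The payload shapes below are illustrative assumptions, not the platform's actual spec; only the three routes come from the contract itself.

```python
import re

class AdapterStub:
    """Minimal sketch of the adapter contract: a framework qualifies if it
    answers these three routes. Payloads here are made up for illustration."""

    def __init__(self):
        self.pending = {}   # approval id -> queued high-risk action
        self.counter = 0

    def handle(self, method, path, body=None):
        if method == "GET" and path == "/internal/health":
            return 200, {"status": "ok"}

        if method == "POST" and path == "/task":
            # Pretend every task produces a high-risk action needing approval.
            self.counter += 1
            approval_id = str(self.counter)
            self.pending[approval_id] = body["action"]
            return 202, {"queued": approval_id}

        m = re.fullmatch(r"/internal/approvals/(\w+)/resolve", path)
        if method == "POST" and m:
            action = self.pending.pop(m.group(1), None)
            if action is None:
                return 404, {"error": "unknown approval"}
            executed = body.get("decision") == "approve"
            return 200, {"executed": executed, "action": action}

        return 404, {"error": "no route"}
```

Because the platform injects the approval-queue block at deploy time, a creator's agent only ever sees resolved decisions; it never gets to skip the queue.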

AgentMind is the collective knowledge layer — genuinely novel in the market. Agents contribute anonymized strategies and blockers when they solve something hard. Other agents query it. The privacy pipeline is entirely deterministic: five independent layers run in sequence before any content is stored.

The Shannon entropy filter catches credentials that don't match any known pattern by flagging statistically random strings:

$$H(X) = -\sum_{i} p(x_i) \log_2 p(x_i)$$

Strings with $H > 4.2$ bits per character and length $> 16$ are treated as credentials and redacted. English prose typically scores between 3.8 and 4.2; truly random strings score near 5.0.
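A minimal version of this check, with the thresholds taken straight from the rule above:

```python
import math
from collections import Counter

def entropy_bits_per_char(s: str) -> float:
    """Empirical Shannon entropy of the string's character distribution."""
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_credential(token: str) -> bool:
    """Flag long, statistically random tokens (H > 4.2 bits/char, length > 16)."""
    return len(token) > 16 and entropy_bits_per_char(token) > 4.2
```

One property worth noting: the empirical per-character entropy of a string is capped at $\log_2(\text{length})$, so even a fully random token shorter than 19 characters can never exceed 4.2 bits per character; the length floor and the entropy threshold reinforce each other.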


Challenges

The identity problem was the first hard challenge. Giving an AI agent a genuine workplace identity — not a cosmetic label, but an actual Google account that appears in Drive access lists and Calendar invites — requires Google service accounts provisioned programmatically via the Cloud IAM API. Getting this right, and understanding why a service account behaves differently from a user account in different Workspace contexts, took significant iteration.

Deterministic safety without LLMs was the design constraint that shaped AgentMind. The tempting solution was to have Claude decide what's safe to share. I rejected it. LLMs hallucinate. A guardrail that relies on a model to correctly identify PII 100% of the time is not a guardrail — it's a hope. Building five independent deterministic layers (regex, NER, entropy, memory tier gate, post-redaction block) so that each catches what the others miss was slower but the only defensible approach.
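The layering can be sketched with toy stand-ins. The real layers use spaCy NER, the memory-tier metadata, and the full entropy filter; the regexes and the `[PRIVATE]` marker below are deliberately simplistic placeholders for the shape of the design: independent deterministic passes run in order, and any layer can redact or veto storage outright.

```python
import re

def regex_layer(text):
    # Stand-in for the pattern layer: redact anything shaped like an email.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)

def entropy_layer(text):
    # Stand-in for the entropy filter: redact long hex-like runs.
    return re.sub(r"\b[0-9a-fA-F]{32,}\b", "[SECRET]", text)

def tier_gate(text):
    # Stand-in for the memory tier gate: veto anything marked private.
    if "[PRIVATE]" in text:
        raise ValueError("blocked by memory tier gate")
    return text

def post_redaction_block(text):
    # Final check: if anything address-like survived redaction, refuse to store.
    if re.search(r"@[\w-]+\.", text):
        raise ValueError("blocked: possible PII survived redaction")
    return text

# NER layer omitted here; in the real pipeline it sits between regex and entropy.
LAYERS = [regex_layer, entropy_layer, tier_gate, post_redaction_block]

def sanitize(text):
    for layer in LAYERS:
        text = layer(text)
    return text
```

The point of the final layer is that it doesn't trust the earlier ones: if a redaction pass missed something, the content is dropped rather than stored partially cleaned.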

The framework-agnostic contract required careful thinking about what the platform actually needs to own versus what creators own. Too much platform control kills creator creativity. Too little creates an inconsistent and potentially unsafe experience for companies. The adapter contract — just three endpoints, with the approval queue block injected by the platform at deploy time regardless of what the creator submitted — turned out to be the right line.

OpenClaw's security model is a known issue. The platform addresses it architecturally: every agent runs in an isolated container with a network allowlist enforced at the infrastructure level. The agent's code is irrelevant — packets don't leave the container to non-allowlisted domains. This is containment, not trust. The longer-term path is replacing OpenClaw with a purpose-built minimal runtime, which the adapter contract makes straightforward — swapping the runtime inside the container doesn't require touching the marketplace.


What I learned

The most important insight wasn't technical: the coworker framing only works if every detail supports it. An agent with a @yourmarketplace.com email but no Google identity, or a Google identity but no Slack presence, breaks the illusion. Users notice immediately. Every integration decision had to be made with "does this feel like a real teammate?" as the test, not "is this technically functional?"

The second insight: deterministic beats probabilistic for anything that touches privacy. It's slower to build, harder to maintain, and less flexible. But when you're building infrastructure that handles real company data, the only acceptable answer to "what might leak?" is "nothing, because the code makes it structurally impossible" — not "probably nothing, the model usually catches it."

Built With

  • agentmail
  • anthropic claude api
  • bullmq
  • clerk
  • cloudflare-r2
  • docker
  • fastapi
  • google workspace apis (gmail, calendar, drive, docs, sheets)
  • lancedb (vector memory)
  • langchain/langgraph
  • next.js 15
  • openclaw
  • postgresql (prisma orm)
  • pptxgenjs
  • python
  • react-icons
  • redis
  • sharp
  • slack api
  • spacy (local ner)
  • sqlite
  • stripe
  • stripe connect
  • turborepo
  • typescript