Inspiration

Setting up an open-source AI agent is a DevOps project. Docker configs, OAuth registration, prompt engineering, hosting — by the time you're done, you've spent a weekend on infrastructure before the thing has done a single useful task. The ops lead drowning in support tickets doesn't have that weekend. The small eng team behind on PR reviews definitely doesn't.

We looked at the existing "AI employee" products — Lindy, Sintra, Relevance AI — and kept seeing the same thing: glorified Zapier with better marketing copy. They call their automations "employees" but they're really just workflows. A workflow runs when triggered. An employee owns an outcome. "When a PR opens, run this check" is not the same as "I will review every PR within 10 minutes, remember your team's style preferences, and get better at it over time."

The Fiverr model clicked for us. A bakery owner can't design their own logo, so they go to a talent directory, browse profiles, pick someone, and hire them. We wanted that exact experience for AI teammates — browse, read what they do, connect your tools, hire. No code. No config files. Done in two clicks.

What it does

The platform packages AI agent instances as specialized, containerized employees that teams can browse and hire through a talent directory. Each employee runs in its own isolated Docker container with a curated skill set, scoped permissions, and its own memory.

We shipped two starter employees for the hackathon:

Code Review Engineer — connects to GitHub via OAuth, reviews PRs within minutes of opening, posts inline comments and summary reviews. Scoped to read and comment only — it can never merge.

Customer Support — connects to Slack and Gmail, triages incoming support messages, categorizes them (bug, billing, feature request), and drafts responses. Never auto-sends without approval.
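The triage step can be sketched as a tiny classifier. In the shipped product the LLM does the categorizing; the keyword hints and function name below are our illustration, but the three output buckets are the ones described above.

```python
# Illustrative triage sketch. The real Customer Support employee asks the
# LLM to categorize; only the three buckets match the product.
CATEGORIES = ("bug", "billing", "feature request")

# Hypothetical keyword hints per category (assumption, not shipped logic).
_HINTS = {
    "bug": ("error", "crash", "broken", "doesn't work"),
    "billing": ("invoice", "charge", "refund", "subscription"),
    "feature request": ("would be nice", "could you add", "feature"),
}

def triage(message: str) -> str:
    """Return one of CATEGORIES for an incoming support message."""
    text = message.lower()
    for category, hints in _HINTS.items():
        if any(hint in text for hint in hints):
            return category
    return "feature request"  # default bucket when nothing matches
```

The drafted response then goes to a human for approval, never straight out the door.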

The hire flow is four steps: pick an autonomy tier (Observer / Assistant / Operator), connect your tools via OAuth, set a brief (which repos to watch, what cadence), and confirm. On confirm, the platform spins up a dedicated container, bootstraps the agent with the role template and Claude as the LLM, and the employee is live.
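What "confirm" kicks off can be sketched as translating a hire request into a docker-py container spec. All names here (`Hire`, `build_container_spec`, the env keys and network name) are illustrative, not the platform's actual identifiers:

```python
# Sketch: a confirmed hire becomes kwargs for docker.from_env().containers.run(**spec).
from dataclasses import dataclass, field

@dataclass
class Hire:
    role: str             # e.g. "code-review-engineer"
    tier: str             # "observer" | "assistant" | "operator"
    brief: dict = field(default_factory=dict)  # repos to watch, cadence, etc.

def build_container_spec(hire: Hire, image: str = "employee-runtime") -> dict:
    return {
        "image": image,
        "detach": True,
        "network": "platform-bridge",  # shared bridge network, no published ports
        "environment": {
            "ROLE_TEMPLATE": hire.role,
            "AUTONOMY_TIER": hire.tier,
            # Note: no OAuth tokens here -- those stay on the platform side.
        },
        "labels": {"managed-by": "platform", "role": hire.role},
    }

spec = build_container_spec(Hire("code-review-engineer", "assistant", {"repos": ["org/app"]}))
```

The entrypoint inside the container then takes over bootstrapping the agent from `ROLE_TEMPLATE`.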

How we built it

Backend — FastAPI handles the full lifecycle: hiring (container orchestration via Docker SDK), credential storage (encrypted OAuth tokens the containers never see), task dispatch (HTTP to each employee's container over a Docker bridge network), and an auth gateway that enforces role-scoped permissions on every external API call.
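The auth gateway's role-scoping can be sketched as a lookup table consulted before the platform attaches a token to any outbound call. The scope names below are assumptions; the invariant is the one stated above, e.g. a code reviewer can read and comment but never merge.

```python
# Sketch of the auth gateway check: every external API call is matched
# against the employee's role before the platform attaches the encrypted
# OAuth token (which the container itself never sees).
ROLE_SCOPES = {
    "code-review-engineer": {"github:pulls:read", "github:pulls:comment"},
    "customer-support": {"slack:read", "slack:draft", "gmail:draft"},
}

def allowed(role: str, scope: str) -> bool:
    """True if this role may perform the requested external action."""
    return scope in ROLE_SCOPES.get(role, set())
```

A denied scope simply never leaves the gateway, so a compromised or confused agent can't exceed its job description.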

Agent runtime — Each container runs an agent gateway with a FastAPI task server alongside it. The entrypoint script generates a persona file and operating instructions from the role template, configures the LLM provider, and starts both services. The platform dispatches work; the agent does the thinking.
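The persona-generation step of that entrypoint might look like the sketch below. The template shape and field names are ours for illustration; the real templates may differ.

```python
# Sketch of entrypoint bootstrapping: render a persona / operating-
# instructions file from the role template the platform passed in.
def render_persona(role_template: dict) -> str:
    lines = [
        f"# {role_template['title']}",
        "",
        role_template["mission"],
        "",
        "Operating instructions:",
    ]
    lines += [f"- {rule}" for rule in role_template["rules"]]
    return "\n".join(lines)

template = {
    "title": "Code Review Engineer",
    "mission": "Review every PR within 10 minutes of opening.",
    "rules": ["Post inline comments plus a summary review.", "Never merge."],
}
persona = render_persona(template)
```

After writing this file, the entrypoint configures the LLM provider and starts the gateway and the task server side by side.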

Frontend — Next.js 16, Tailwind v4, shadcn/ui. Dark mode only. A retro-futurist design language with dot-matrix display fonts, mission-control layouts, and Framer Motion animations. The entire hire flow — landing page, directory, employee profiles, four-step wizard, confirmation screen — is wired to the backend through Next.js API routes.

Infrastructure — Everything runs locally on Docker Desktop. The platform and all employee containers share a Docker bridge network, so the platform can dispatch tasks to containers by internal IP without publishing ports. Supabase handles persistence.
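Dispatch-by-internal-IP can be sketched like this: read the employee container's address on the shared network out of docker-py's `container.attrs` and POST the task there. The network name, port, and endpoint path are assumptions.

```python
# Sketch of task dispatch over the shared bridge network. The attrs dict
# mirrors docker-py's container.attrs structure.
def container_ip(attrs: dict, network: str = "platform-bridge") -> str:
    """Read the bridge-network IP for a container from its attrs."""
    return attrs["NetworkSettings"]["Networks"][network]["IPAddress"]

def task_url(attrs: dict) -> str:
    # No published ports: the platform reaches the container directly.
    return f"http://{container_ip(attrs)}:8000/tasks"

# Fake attrs for illustration; in practice this comes from the Docker SDK.
fake_attrs = {
    "NetworkSettings": {"Networks": {"platform-bridge": {"IPAddress": "172.20.0.5"}}}
}
```

Because nothing is published to the host, employee containers are unreachable from outside the platform by construction.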

Challenges

Docker networking on Mac. Docker Desktop for Mac doesn't let the host reach container bridge IPs, which broke our original plan of running the platform natively while dispatching tasks to agent containers. We had to move the platform itself into Docker Compose to share the bridge network — slower iteration cycles, but the only way to get the full hire-to-dispatch loop working locally.
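In Compose terms, the fix looks roughly like this (service and network names are ours; employee containers are attached to the same network at hire time via the Docker SDK rather than declared here):

```yaml
# Minimal sketch: the platform joins the same bridge network as the
# employees, so dispatch works by internal IP without publishing
# employee ports. Only the platform is exposed to the host.
services:
  platform:
    build: ./platform
    ports: ["8080:8080"]
    networks: [platform-bridge]
networks:
  platform-bridge:
    driver: bridge
```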

Agent reliability. The agent framework we built on has 7,900+ open GitHub issues for a reason. Silent gateway crashes, dropped channels, config quirks that only surface at runtime. Getting the entrypoint bootstrapping right took more debugging than the rest of the backend combined.

The mid-hackathon frontend reset. The initial scaffold was built around the wrong concept — a personal reader app, not a hiring platform. We scrapped it entirely and wrote a handoff brief for a clean restart. Losing that time hurt. Shipping the right product on a blank canvas was better than polishing the wrong one.

The terminology discipline. We committed to the hiring metaphor throughout — "AI employees" not "agents," "talent directory" not "marketplace," "onboarding" not "configuration." The backend still uses agent internally (renaming Python and SQL mid-hackathon was too risky), so the Next.js API layer acts as a translation boundary between the two vocabularies.
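The actual translation boundary lives in the Next.js API routes; the sketch below shows the same idea in Python for consistency with the backend examples. The field names in the mapping are illustrative.

```python
# Sketch of the vocabulary translation boundary: backend "agent" terms
# are renamed to frontend "employee" terms at the API layer, so neither
# side has to change its internal naming mid-hackathon.
VOCAB = {"agent": "employee", "agent_id": "employee_id", "config": "onboarding"}

def to_frontend(payload: dict) -> dict:
    """Rename backend vocabulary keys; pass everything else through."""
    return {VOCAB.get(key, key): value for key, value in payload.items()}
```

The same table, inverted, translates frontend requests back before they hit the Python and SQL layers.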

What we learned

The hiring metaphor isn't just a UX decision — it shapes the whole permission model. Once you commit to "employees" over "agents," a lot of hard problems become obvious. Onboarding means OAuth scoping. Performance review means audit logs. Offboarding means token revocation. The metaphor does real structural work.

What's next

The hackathon demo ends at "hired and running." After this, we want to make the employees actually work — real GitHub PR reviews, real Slack triage, real Gmail drafts. Then a creator marketplace where anyone can publish employee templates, and eventually a coaching layer where employees improve from feedback over time.
