AladdinAI — Self-hosted AI Agent Platform

Inspiration

Every AI tool I tried had the same problem: your data leaves your servers, your agents live in someone else's cloud, and you have no real control over what happens under the hood.

I kept thinking about small businesses — a sales team managing leads over WhatsApp, a support team buried in emails, a developer trying to automate deployments. None of them can afford an enterprise software stack. None of them want to hand their customer data to a third-party cloud. But all of them could genuinely benefit from AI agents that actually do things — not just chat.

That was the spark. What if you could run a full AI workforce on your own infrastructure, controlled from the apps you already use every day?


What it does

AladdinAI is an open-source BYOI (Bring Your Own Infrastructure) AI agent platform. You deploy it on your own servers, connect your own LLM providers, and build agents that work across every channel your business already uses.

Key capabilities:

  • Multi-agent orchestration — agents delegate tasks to each other autonomously
  • IMAP email integration — agents read incoming emails and reply on their own, like a real team member
  • WhatsApp & Telegram control — send a message to your bot, it executes the task: updates the CRM, runs a script, replies to a client
  • Browser-based terminal — full shell access to your servers, directly in the UI, secured through Traefik and SSH
  • Built-in CRM — agents track customers, deals, and conversations automatically
  • Vector memory (MongoDB Atlas) — agents remember facts about users across sessions, not just within a single chat
  • Image generation — agents can generate and send images as part of workflows
  • GitHub App integration — automated code review, PR comments, and reviewer assignment powered by NVIDIA NIM
  • SQL playground — query and explore your data directly inside the platform
  • NVIDIA NIM support — native inference layer with proper tool_choice=auto and parallel_tool_calls handling for fully autonomous agents

Everything runs on infrastructure you own. No data leaves your servers unless you decide it should.


How we built it

The stack is split into three layers:

Backend (Python / FastAPI)

  • FastAPI for the agent API and webhook handling
  • SQLAlchemy + SQLite (WAL mode) for relational data
  • MongoDB Atlas for vector memory and GridFS media storage
  • APScheduler for cron-based agent tasks
  • IMAP polling loop for email monitoring
  • Traefik + ttyd/wetty for the browser terminal

Frontend (Next.js 15 / TypeScript)

  • Next.js 15 with App Router and shadcn/ui components
  • Real-time agent execution tracing in the UI
  • Visual agent builder and multi-agent workflow editor

LLM layer

  • Provider-agnostic: OpenAI, Anthropic, NVIDIA NIM, Ollama, BentoML
  • NVIDIA NIM required specific handling: disabling parallel_tool_calls, explicit tool_choice=auto, and NIM-compatible system prompt formatting
  • GitHub Actions pipeline with CodeQL security scanning on every push

Challenges we ran into

NVIDIA NIM tool compatibility was the biggest technical challenge. NIM-served models handle tool calls differently from OpenAI — getting autonomous agent execution to work reliably required careful tuning of parallel_tool_calls, tool_choice, and system prompt structure.

Security at every layer. Running a platform with shell access, file operations, and SQL queries means the attack surface is real. We fixed path traversal vulnerabilities, SQL injection vectors, ReDoS patterns, HTML sanitization issues, and SSRF risks — many surfaced through CodeQL scanning. Security is never done, but we've treated it seriously from the start.

Real-time email as an agent channel. IMAP monitoring sounds simple until you're handling connection drops, IDLE reconnects, threading, HTML stripping, and making sure the agent's reply lands in the right thread. Getting that to feel seamless took several iterations.

Keeping the stack simple enough to self-host. A platform this capable can easily become impossible to deploy. We invested heavily in the setup experience — npx aladdin-ai gets you running in minutes, with Render and Railway configs for those who want managed hosting.


Accomplishments that we're proud of

  • A genuinely working end-to-end loop: a message arrives in WhatsApp → the agent reads it → updates the CRM → replies via email → logs the interaction in memory. No manual steps.
  • Browser terminal with real SSH access, secured and routed through Traefik, running inside the same platform as the agents.
  • NVIDIA NIM integrated as a first-class provider — not bolted on, but built in from the architecture level.
  • GitHub App that does automated code review powered by NIM — a real productivity tool, not a demo.
  • Released v2.1.8 in under a month of development, with 240+ commits, automated changelog, and CodeQL passing clean.

What we learned

Building a self-hosted platform forces you to think differently than building a SaaS. Every dependency matters more. Every security gap is your user's problem. Every setup step that's too hard means someone doesn't deploy it.

We also learned that the hardest part of multi-agent systems isn't the orchestration logic — it's the edges. What happens when an agent fails mid-task? When an email arrives while another is processing? When a tool call returns unexpected output? Handling those edges gracefully is what separates a demo from a platform.

And practically: NVIDIA NIM has real quirks that aren't in the docs. You find them by building.


What's next for AladdinAI

Excel / spreadsheet integration — agents that can read, write, and analyze Excel and CSV files. This is the missing piece for most small business workflows: inventory tracking, sales reports, price lists. An agent that can receive a spreadsheet by email, process it, and reply with an updated version is genuinely useful for thousands of businesses.

Pre-built agent templates — ready-to-deploy agents for common roles: sales agent, support agent, inventory manager, code reviewer. Lower the barrier for non-technical users who know what they want but don't want to build from scratch.

Simpler onboarding — the current setup is developer-friendly. We want it to be business-owner-friendly too. A guided setup wizard, example workflows, and a one-click demo environment.

NVIDIA Inception Program — we're applying to deepen the NIM integration and explore collaboration with the NVIDIA ecosystem. The combination of self-hosted infrastructure and NIM-powered inference is exactly what enterprises with data sovereignty requirements need.

The goal is simple: any business should be able to run AI agents on their own terms, without giving up control of their data or their stack.

Built With

Share this project:

Updates