LLMai: An AI coding agent that never leaves your laptop

Inspiration

Cloud-based AI coding tools send your proprietary source code, prompts, and terminal history to external servers you don't control. For privacy-conscious developers and enterprise environments, this presents an unacceptable security risk. We built LLMai to provide the full agentic power of modern AI coding assistants while guaranteeing that every byte of your data stays safely on your local machine.

What it does

LLMai is a 100% local, open-source AI coding agent that doesn't just chat—it actually does the work. It plans, reads files, writes code, and runs shell commands driven by a model on your own hardware.

-Real Agentic Loop: It plans the next step, calls a tool, observes the result, and iterates up to 20 times until the task is complete.

-Explicit-Permission Writes: Read-only tools execute instantly, but anything that mutates state (writing files, running commands) pauses for your explicit approval.

-GitLab Integration: Connect it to your repository to automatically triage issues, fetch merge requests, read failing pipeline logs, and open fix MRs.
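The loop described above can be sketched in a few dozen lines of plain Python. This is an illustrative sketch, not LLMai's actual source. `Tool`, `Step`, `run_agent`, and `plan_next` are hypothetical names, and a scripted planner stands in for the model call so the example runs without Ollama. It shows the two ideas from the list: a cap of 20 steps, and read-only tools that run immediately while mutating tools wait on an `approve` callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_STEPS = 20  # iteration cap taken from the description above


@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    mutates: bool  # True for file writes and shell commands


@dataclass
class Step:
    tool: Optional[str]  # None means the model thinks the task is done
    args: Dict[str, str] = field(default_factory=dict)
    answer: str = ""


def run_agent(task: str,
              plan_next: Callable[[List[Tuple[str, str]]], Step],
              tools: Dict[str, Tool],
              approve: Callable[[str, Dict[str, str]], bool],
              max_steps: int = MAX_STEPS) -> str:
    """Plan -> call a tool -> observe the result -> repeat, up to max_steps."""
    history: List[Tuple[str, str]] = [("user", task)]
    for _ in range(max_steps):
        step = plan_next(history)  # hypothetical stand-in for a model call
        if step.tool is None:
            return step.answer
        tool = tools.get(step.tool)
        if tool is None:
            observation = f"error: unknown tool {step.tool!r}"
        elif tool.mutates and not approve(step.tool, step.args):
            # State-mutating tools never run without explicit approval.
            observation = "denied: the user rejected this action"
        else:
            observation = tool.fn(**step.args)
        history.append(("tool", f"{step.tool} -> {observation}"))
    return "stopped: step limit reached"


# Usage with a scripted stand-in for the model and an in-memory "filesystem".
files = {"app.py": "buggy"}

def read_file(path: str) -> str:
    return files[path]

def write_file(path: str, text: str) -> str:
    files[path] = text
    return "ok"

tools = {
    "read_file": Tool("read_file", read_file, mutates=False),
    "write_file": Tool("write_file", write_file, mutates=True),
}
script = iter([
    Step("read_file", {"path": "app.py"}),
    Step("write_file", {"path": "app.py", "text": "fixed"}),
    Step(None, answer="done"),
])
result = run_agent("fix app.py", lambda history: next(script), tools,
                   approve=lambda name, args: True)
print(result, files["app.py"])  # done fixed
```

Because a denial is fed back as an observation rather than raised as an error, the model can see that an action was rejected and plan around it.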

How we built it

-Backend: A lightweight, highly readable Python loop—no heavy abstraction frameworks.

-AI Orchestration: Native function calling for models such as Qwen 2.5 Coder and Llama 3.1/3.2, with an XML-based fallback for Gemma, Phi, and Mistral.

-Frontend: A dark-mode, full-screen browser UI (HTML, vanilla JS, and CSS) with a glassmorphism look that connects to the backend over WebSockets. We also maintain a rich terminal REPL for anyone who prefers the CLI.

-LLM Engine: Runs on local Ollama instances by default. The architecture is provider-agnostic, so you can optionally fall back to Gemini with an API key.
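Here is one way the capability switch and the XML fallback could look, assuming the fallback prompt asks the model to wrap each call in a `<tool_call>` element with `<name>` and `<args>` children. The tag layout, the `NATIVE_TOOL_PREFIXES` list, and both function names are assumptions for illustration; the real detection may rely on a different signal than the model name. Parsing uses only the standard library (`re` and `xml.etree.ElementTree`).

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical: Ollama model-name prefixes treated as supporting native tool calls.
NATIVE_TOOL_PREFIXES = ("qwen2.5-coder", "llama3.1", "llama3.2")

# Hypothetical fallback format the model is prompted to emit, e.g.
# <tool_call><name>read_file</name><args><path>src/main.py</path></args></tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def supports_native_tools(model: str) -> bool:
    """Choose native JSON function calling or the XML fallback by model name."""
    return model.lower().startswith(NATIVE_TOOL_PREFIXES)


def parse_xml_tool_calls(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Extract (tool_name, args) pairs from free-form model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            root = ET.fromstring(match.group(0))
        except ET.ParseError:
            continue  # skip a malformed block instead of crashing the loop
        name = (root.findtext("name") or "").strip()
        args_el = root.find("args")
        args = {} if args_el is None else {
            child.tag: child.text or "" for child in args_el
        }
        if name:
            calls.append((name, args))
    return calls


reply = """I'll read the file first.
<tool_call>
  <name>read_file</name>
  <args><path>src/main.py</path></args>
</tool_call>"""
print(supports_native_tools("gemma2:9b"))  # False
print(parse_xml_tool_calls(reply))  # [('read_file', {'path': 'src/main.py'})]
```

A malformed block is skipped rather than raised, so one bad generation does not abort the session. Arguments that contain `<` or `&` (source code, for instance) would need escaping or a CDATA section in the prompt format.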

Challenges we ran into

-Model Compatibility: Different local models handle tool calling differently. We built a system that detects each model's tool-calling capability and automatically switches between native JSON function calling and the XML-based fallback.

-Context Window Management: Long agentic sessions quickly fill up local model context windows. We implemented a context compression engine that auto-summarizes older turns when the conversation exceeds ~50k tokens.

-Security & Sandboxing: The agent had to be powerful enough to run shell commands without being dangerous. We implemented strict path-traversal blocks, a destructive-command blocklist, and the visual human-in-the-loop approval system.
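The safeguards in the list above can be sketched with the standard library alone. Everything here is hypothetical: the function names, the regex patterns, `keep_recent=6`, and the `summarize` callback, which stands in for a call to the local model. Token counts use a rough four-characters-per-token estimate instead of a real tokenizer, since tokenizers differ between local models.

```python
import re
from pathlib import Path
from typing import Callable, List


# --- Path traversal guard ---
def resolve_in_workspace(workspace: str, user_path: str) -> Path:
    """Resolve user_path inside workspace, refusing anything that escapes it."""
    root = Path(workspace).resolve()
    target = (root / user_path).resolve()  # collapses ".." and follows symlinks
    if not target.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"path escapes workspace: {user_path}")
    return target


# --- Destructive-command blocklist (illustrative patterns, not exhaustive) ---
BLOCKED_PATTERNS = [
    r"\brm\s+-\w*(rf|fr)",       # rm -rf / rm -fr
    r"\bmkfs\b",                 # formatting a filesystem
    r"\bdd\s+if=",               # raw disk writes
    r"\bgit\s+push\b.*--force",  # rewriting remote history
    r"\b(shutdown|reboot)\b",
]


def is_blocked(command: str) -> bool:
    return any(re.search(p, command, re.IGNORECASE) for p in BLOCKED_PATTERNS)


# --- Context compression ---
COMPRESS_AT_TOKENS = 50_000  # threshold from the description above


def estimate_tokens(text: str) -> int:
    # Crude ~4 characters per token heuristic, not a real tokenizer.
    return len(text) // 4 + 1


def compress_history(messages: List[str],
                     summarize: Callable[[List[str]], str],
                     keep_recent: int = 6,
                     limit: int = COMPRESS_AT_TOKENS) -> List[str]:
    """Replace older turns with one summary once the history exceeds limit."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent  # summarize would be a model call
```

A blocklist like this can never be complete (a renamed binary or a script file slips past it), which is why the approval prompt for every state-mutating action remains the primary safeguard.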

Accomplishments that we're proud of

-Delivering a fully functional local AI agent that can chain 8+ tools from a single-sentence prompt (e.g., finding a failing GitLab pipeline, editing the source, running the tests, and opening a fix MR).

-Designing a modern web dashboard that makes it easy to follow the agent's reasoning and approve or reject state-mutating actions.

-Achieving all of this with a simple, transparent architecture: no API keys required (unless you opt into the Gemini fallback), no telemetry, and no hidden usage caps.

What we learned

We learned that you don't need massive, opaque frameworks to build powerful AI agents. A well-designed, permission-gated Python loop paired with the right local model (like Qwen 2.5 Coder) can achieve production-level coding assistance with zero privacy trade-offs.

What's next for LLMai

-Implementing persistent localStorage chat history for the web UI.

-Expanding native Git platform integrations beyond GitLab (GitHub, Bitbucket).

-Adding cross-session memory so the agent remembers the quirks of each of your codebases from one session to the next, across different projects.

Built With

  • anthropic (claude)
  • backend: python + fastapi
  • ai framework: langchain
  • llm providers: openai
  • basic dlp (pii detection)
  • others: pydantic
  • gitlabapi
  • google gemini
  • frontend/dashboard: html + javascript
  • core technologies: intelligent router
  • python-dotenv
  • semantic-cache
  • tiktoken