Inspiration

The inspiration for Arbok came from a very practical pain point: my API bill. I love using Cline (the AI coding agent), but as my projects grew, Cline struggled to maintain context without reading every single file. This "context stuffing" resulted in massive token consumption and slower response times. I realized that while LLMs have large context windows, stuffing raw code into them is inefficient.

From community discussions on r/LocalLLaMA and the "Memory Bank" concept, I learned that there is a "third way" between RAG (vector search) and brute force (reading everything): AST-based indexing. I wanted to build a tool that gives Cline a structural map of the code rather than just raw text.

What it does

Arbok is an MCP (Model Context Protocol) server that acts as a high-efficiency memory layer for Cline. It reduces token usage by letting the AI understand the codebase without reading full files.

  1. Local AST Indexing: Arbok parses your code (TypeScript, Python, etc.) into an Abstract Syntax Tree (AST) and stores the structure (Nodes, Edges, Dependencies) in a local SQLite database. Cline can query this graph to understand "What does this function do?" or "Where is this class used?" without opening the file.
  2. Automated Memory Bank: It automatically generates and updates the "Memory Bank" markdown files (like productContext.md and activeContext.md). When you finish a task, Arbok updates the documentation for you.
  3. Context Efficiency: It generates custom .clinerules that instruct Cline to use Arbok's tools for context retrieval, keeping the prompt window clean and focused.
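
To make the first item more concrete, here is a minimal in-memory sketch of the kind of node-and-edge graph such an index might hold, and how a "Where is this class used?" query could be answered from it. The type names, field names, and the `findUsages` helper are hypothetical illustrations, not Arbok's actual schema; the real index lives in SQLite.

```typescript
// Hypothetical shapes for illustration; Arbok's real schema may differ.
type NodeKind = "function" | "class" | "interface" | "module";

interface SymbolNode {
  id: number;
  name: string;
  kind: NodeKind;
  file: string;
  line: number; // 1-based line of the declaration
}

type EdgeKind = "calls" | "imports" | "extends" | "references";

interface SymbolEdge {
  from: number; // id of the node that depends on `to`
  to: number;
  kind: EdgeKind;
}

interface CodeGraph {
  nodes: SymbolNode[];
  edges: SymbolEdge[];
}

// Answer "where is this symbol used?" by following incoming edges.
function findUsages(graph: CodeGraph, symbolName: string): SymbolNode[] {
  const targets = new Set(
    graph.nodes.filter((n) => n.name === symbolName).map((n) => n.id)
  );
  const userIds = new Set(
    graph.edges.filter((e) => targets.has(e.to)).map((e) => e.from)
  );
  return graph.nodes.filter((n) => userIds.has(n.id));
}

const exampleGraph: CodeGraph = {
  nodes: [
    { id: 1, name: "UserService", kind: "class", file: "src/user.ts", line: 3 },
    { id: 2, name: "createUser", kind: "function", file: "src/api.ts", line: 10 },
    { id: 3, name: "formatDate", kind: "function", file: "src/util.ts", line: 1 },
  ],
  edges: [{ from: 2, to: 1, kind: "references" }],
};

// Only `createUser` references `UserService` in this toy graph.
const usages = findUsages(exampleGraph, "UserService");
```

The point of the design is that the answer is a handful of small records (name, file, line) rather than the contents of every file that mentions the symbol.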

How we built it

We built Arbok using Node.js and TypeScript, leveraging the new Model Context Protocol (MCP) SDK to interface with Cline.

  • Core Engine: We used web-tree-sitter for high-speed, error-tolerant parsing of source code into ASTs.
  • Database: We utilized better-sqlite3 to store the symbol table and dependency graph locally within the project (.arbok/index.db).
  • Watcher: We used chokidar to watch for file changes in real time. As soon as you save a file, Arbok re-indexes the AST and updates the Memory Bank.
  • Integration: We designed specific MCP tools (arbok:get_symbols, arbok:update_memory_bank) that let the AI query and update the database through tool calls instead of reading files directly.
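
The watcher step implies a replace-on-save update: when a file changes, its old entries are dropped and the freshly parsed ones take their place. The sketch below shows only that update logic, with an in-memory map standing in for the SQLite database and with parsing left out (the symbols are passed in already extracted). `SymbolIndex`, `reindexFile`, `removeFile`, and `getSymbols` are hypothetical names chosen for illustration, not Arbok's actual API.

```typescript
// In-memory stand-in for the SQLite index; all names here are hypothetical.
interface IndexedSymbol {
  name: string;
  kind: string;
  line: number;
}

type LocatedSymbol = IndexedSymbol & { file: string };

class SymbolIndex {
  private byFile = new Map<string, IndexedSymbol[]>();

  // Replace every symbol recorded for `file` with the freshly parsed set,
  // so renamed or deleted symbols do not linger in the index.
  reindexFile(file: string, symbols: IndexedSymbol[]): void {
    this.byFile.set(file, symbols.slice());
  }

  // Drop a file entirely, e.g. when the watcher reports a deletion.
  removeFile(file: string): void {
    this.byFile.delete(file);
  }

  // Look up a symbol by name across all indexed files.
  getSymbols(name: string): LocatedSymbol[] {
    const hits: LocatedSymbol[] = [];
    this.byFile.forEach((symbols, file) => {
      for (const s of symbols) {
        if (s.name === name) hits.push({ ...s, file });
      }
    });
    return hits;
  }
}

const symbolIndex = new SymbolIndex();
symbolIndex.reindexFile("src/auth.ts", [
  { name: "login", kind: "function", line: 4 },
]);
// The file is saved again with `login` renamed to `signIn`.
symbolIndex.reindexFile("src/auth.ts", [
  { name: "signIn", kind: "function", line: 4 },
]);
```

Replacing a file's entries wholesale is simpler than diffing old and new symbols, and it keeps the index consistent as long as each save triggers exactly one re-index of that file.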

Challenges we ran into

  • Unified Graph Schema: Creating a database schema that accurately represents code structures across different languages (e.g., mapping TypeScript interfaces vs. Python classes) was difficult. We had to define abstract "Nodes" and "Edges" that were flexible enough for polyglot support.
  • The "Context vs. Cost" Balance: Figuring out exactly how much information to return in the AST summary was tricky. If we returned too little, Cline got confused; too much, and we lost the token savings. We had to tune the summarization logic carefully.
  • MCP Integration: As MCP is a relatively new protocol, debugging the communication between the Cline client and our local server required a lot of trial and error with STDIO transport.
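
The "Context vs. Cost" trade-off can be pictured as a budgeted summary: emit one-line signatures until an approximate token budget is spent, then say how much was left out. This is a sketch under stated assumptions, not our tuned logic. The `SummaryEntry` shape, the `summarize` helper, the exported-first ordering, and the four-characters-per-token estimate are all illustrative assumptions.

```typescript
// Hypothetical summary entry; `signature` is a one-line declaration without the body.
interface SummaryEntry {
  signature: string;
  exported: boolean;
}

// Rough heuristic (an assumption): about four characters per token.
// Real tokenizers differ, so treat this as an estimate only.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Emit signatures until the budget is spent, exported symbols first,
// and report how many entries were left out so the agent can ask for more.
function summarize(entries: SummaryEntry[], maxTokens: number): string {
  const ordered = entries
    .slice()
    .sort((a, b) => Number(b.exported) - Number(a.exported));
  const lines: string[] = [];
  let used = 0;
  let omitted = 0;
  for (const entry of ordered) {
    const cost = estimateTokens(entry.signature);
    if (used + cost > maxTokens) {
      omitted += 1;
      continue;
    }
    lines.push(entry.signature);
    used += cost;
  }
  if (omitted > 0) lines.push(`(${omitted} more omitted)`);
  return lines.join("\n");
}

const summaryEntries: SummaryEntry[] = [
  { signature: "function helper(): void", exported: false },
  { signature: "export function login(user: string): Promise<Session>", exported: true },
  { signature: "export class SessionStore", exported: true },
];

// With a tight budget the private helper is dropped and counted as omitted.
const tightSummary = summarize(summaryEntries, 24);
```

Ending the summary with an explicit omission count is one way to address the "too little" failure mode: the agent can tell the view is partial and request more detail instead of guessing.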

Accomplishments that we're proud of

  • Massive Token Reduction: We achieved our goal of significantly lowering input tokens for large codebase exploration.
  • "Set and Forget" Documentation: The Memory Bank feature successfully updates itself. Watching the documentation files rewrite themselves after a code change feels like magic.
  • Speed: The AST parsing is incredibly fast, allowing for near real-time updates without slowing down the developer's workflow.

What we learned

  • RAG isn't always the answer: For coding tasks, structure matters more than semantic similarity. An AST graph is often more useful to an LLM than vector embeddings.
  • Agents need Tools, not just Text: Giving an agent the ability to query a codebase is far more powerful than just feeding it the codebase.
  • The power of MCP: Building on the Model Context Protocol allowed us to create a tool that isn't just for Cline, but potentially for any MCP-compliant AI client in the future.

What's next for Arbok

  • More Languages: Expanding AST support to Go, Rust, and C++.
  • Impact Analysis: Adding a feature where Arbok can tell Cline: "If you change this function, these 5 other files will break."
  • Visual Graph: Generating a visual representation of the codebase structure to help human developers understand the architecture alongside the AI.
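
Impact analysis is not built yet, but one plausible approach is a reverse walk over the dependency graph: starting from the changed file, collect everything that imports it, directly or transitively. The sketch below works at file level rather than function level to stay short, and the `DependencyMap` shape and `impactedFiles` helper are hypothetical.

```typescript
// Map from a file to the files it imports. Hypothetical input shape.
type DependencyMap = Record<string, string[]>;

// Collect every file that directly or transitively depends on `changed`.
function impactedFiles(deps: DependencyMap, changed: string): string[] {
  // Invert the graph: for each file, record who imports it.
  const dependents = new Map<string, string[]>();
  Object.keys(deps).forEach((file) => {
    for (const imported of deps[file] ?? []) {
      const list = dependents.get(imported) ?? [];
      list.push(file);
      dependents.set(imported, list);
    }
  });

  // Breadth-first walk over dependents; `seen` guards against import cycles.
  const seen = new Set<string>([changed]);
  const queue: string[] = [changed];
  const impacted: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of dependents.get(current) ?? []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        impacted.push(dependent);
        queue.push(dependent);
      }
    }
  }
  return impacted;
}

const exampleDeps: DependencyMap = {
  "api.ts": ["service.ts"],
  "service.ts": ["db.ts"],
  "cli.ts": ["api.ts"],
  "util.ts": [],
};

// Changing db.ts reaches service.ts, then api.ts, then cli.ts.
const impacted = impactedFiles(exampleDeps, "db.ts");
```

A walk like this reports files that might be affected, not files that will definitely break, so a real feature would probably need symbol-level edges to narrow the list.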
