Inspiration

"Yet you, my creator, detest and spurn me, thy creature, to whom thou art bound by ties only dissoluble by the annihilation of one of us." -Mary Shelley, Frankenstein

What if you could have a Claude Code-style CLI without internet access or API tokens?

This is the question that leafcutter aims to answer: like the eponymous ants, which haul loads many times their own weight, our software multiplies the power of tiny, local models so they can pick up this heavy mantle, now free and open source! Claude has officially helped build its own successor.

What it does

leafcutter is the happy jumper that hops around your GitHub repositories and mends your open wounds. No more maggots in these parts, for leafcutter is a scab-mending CLI that solves errors, a bit like a not-so-distant cousin of Claude Code, except this cousin is radically anarchist and fixes everything for free! Thank you, Leafcutter! Thank you! 🍃🐜

🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟🦗🦟 jumping cricket noises

How we built it

  • llama-cpp-python runs GGUF models entirely in-process, no server needed (sketch 1 below)
  • GBNF grammars constrain the model's token sampling to valid JSON tool-call schemas, solving unreliable output at the source (sketch 1)
  • Two-pass inference: the first pass decides which tool to call, the second generates the structured arguments (sketch 2)
  • Sliding context window with extractive compression keeps the most relevant lines when the token budget fills up (sketch 3)
  • Every tool call surfaces a preview and waits for explicit user approval before touching anything (sketch 4)
  • CLI built on prompt_toolkit and rich for a clean REPL experience, no web UI required
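
Sketch 1: a minimal in-process, grammar-constrained completion. The model path, prompt, tool names, and the "path" argument are illustrative, not leafcutter's actual schema; the llama-cpp-python calls themselves (Llama, LlamaGrammar.from_string, the grammar= parameter) are the real API.

```python
from llama_cpp import Llama, LlamaGrammar

# GBNF grammar restricting output to one tool-call JSON shape.
# Tool names and the "path" argument are illustrative only.
TOOL_CALL_GBNF = r'''
root ::= "{\"tool\":\"" name "\",\"path\":\"" path "\"}"
name ::= "read_file" | "write_file" | "run_tests"
path ::= [a-zA-Z0-9_./-]+
'''

# The model runs fully in-process: no server, no network, no API token.
llm = Llama(model_path="./models/smollm-135m.gguf", n_ctx=2048, verbose=False)
grammar = LlamaGrammar.from_string(TOOL_CALL_GBNF)

out = llm(
    "You are a repair agent. Emit one tool call for: fix the bug in app.py\n",
    grammar=grammar,  # sampling can only ever produce strings the grammar accepts
    max_tokens=64,
)
print(out["choices"][0]["text"])  # e.g. {"tool":"read_file","path":"app.py"}
```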
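
Sketch 2: the two-pass split as two constrained calls, where pass one picks a tool name from a closed set and pass two fills in that tool's argument grammar. The prompts, tool set, and grammars are hypothetical stand-ins.

```python
from llama_cpp import LlamaGrammar

# Pass 1: which tool? This grammar admits only bare tool names.
NAME_GBNF = 'root ::= "read_file" | "write_file" | "run_tests"'

# Pass 2: one argument grammar per tool (only read_file shown here).
ARGS_GBNF = {
    "read_file": r'root ::= "{\"path\":\"" [a-zA-Z0-9_./-]+ "\"}"',
}

def two_pass_tool_call(llm, task: str) -> tuple[str, str]:
    """Return (tool_name, args_json) for a task; prompts are hypothetical."""
    name = llm(
        f"Task: {task}\nBest tool:",
        grammar=LlamaGrammar.from_string(NAME_GBNF),
        max_tokens=8,
    )["choices"][0]["text"]
    args = llm(
        f"Task: {task}\nArguments for {name} as JSON:",
        grammar=LlamaGrammar.from_string(ARGS_GBNF[name]),
        max_tokens=64,
    )["choices"][0]["text"]
    return name, args
```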
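
Sketch 3: extractive compression for a sliding window. The word-overlap relevance score and the ~4-characters-per-token estimate are crude stand-ins for whatever leafcutter actually uses.

```python
def compress_context(lines: list[str], query: str, budget_tokens: int) -> str:
    """Keep the highest-scoring lines (naive word overlap with the query)
    until a rough token budget is spent; preserve original line order."""
    query_words = set(query.lower().split())

    def score(line: str) -> int:
        return len(query_words & set(line.lower().split()))

    # Rank lines by relevance, then greedily admit them under the budget.
    ranked = sorted(range(len(lines)), key=lambda i: score(lines[i]), reverse=True)
    kept, spent = set(), 0
    for i in ranked:
        cost = max(1, len(lines[i]) // 4)  # ~4 chars per token heuristic
        if spent + cost > budget_tokens:
            continue
        kept.add(i)
        spent += cost

    return "\n".join(lines[i] for i in sorted(kept))
```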
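
Sketch 4: the approval gate, previewing a proposed write with rich and blocking on a prompt_toolkit confirmation. The file handling is simplified to a whole-file rewrite.

```python
from pathlib import Path

from prompt_toolkit.shortcuts import confirm
from rich.console import Console
from rich.syntax import Syntax

console = Console()

def apply_with_approval(path: str, new_text: str) -> bool:
    """Show the proposed file contents and only write on explicit approval."""
    console.rule(f"proposed change to {path}")
    console.print(Syntax(new_text, "python", line_numbers=True))
    if not confirm("Apply this change?"):
        console.print("[yellow]skipped[/yellow]")
        return False
    Path(path).write_text(new_text)
    console.print("[green]applied[/green]")
    return True
```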

Challenges we ran into

  • Small models collapse into hallucinated or malformed JSON without grammar constraints — getting GBNF coverage right without adding too much latency took significant tuning
  • Raspberry Pi 3B+ has 1GB of RAM, and much less once the OS takes its share, which makes context compression genuinely painful: extractive compression alone isn't enough, and we had to be aggressive about what stays in the window
  • A 135M model's context length is tiny, so the scaffolding has to do a lot more heavy lifting than it would with a larger model

Accomplishments that we're proud of

  • A 135M-parameter model on a $35 computer, offline, can read a file, spot a bug, write a fix, and ask before applying it. End to end, that works
  • Grammar-constrained function calling turns a near-toy-sized model into something that can participate in a real agentic loop
  • The user is always in control: nothing executes without a confirmation prompt

What we learned

  • Grammar constraints are underused — constraining sampling at the token level is cleaner and more reliable than building regex parsers around broken output
  • Context management is where local agents live or die; the interesting engineering is in that layer, not in the model itself
  • A 2048-token window requires a completely different scaffolding strategy than a 128k one

What's next for leafcutter

  • OpenAI-compatible REST endpoint so leafcutter can act as a drop-in local backend for tools that already speak that protocol
  • Multi-file context so the agent can reason across a whole repository, not just file by file
  • Smarter compression to replace extractive summarization
  • Single binary distribution that works out of the box on a fresh Raspberry Pi with no Python setup required

Built With

Python, llama-cpp-python, prompt_toolkit, rich, GGUF, Raspberry Pi
