Inspiration

Over the past year working as an AI coding specialist and consultant, I've talked with many companies and enterprises, both large and small. A clear pattern emerged: while AI coding tools are powerful and generate code incredibly fast, code review has become a critical bottleneck. Companies simply don't have enough experienced developers to review the volume of AI-generated code.

However, these companies do have something valuable—established best practices and coding rules. The insight was clear: if we could build an AI agent that reviews code based on user-provided rules, scales to any codebase size, and maintains accuracy, it would solve a real pain point in the AI-assisted development workflow.

I also researched existing AI review tools and found critical gaps:

  1. Not truly agentic: Most just send code changes to an LLM in one shot without an agentic loop, making them less intelligent than coding agents
  2. Limited customizability: Primarily rely on built-in general (often nit-picking) capabilities rather than user-defined rules
  3. Hard to use: Require SaaS accounts, run in a black box, difficult to trace what happened, and hard to unify across dev environments and CI/CD
  4. Privacy risks: They don't let you use your own LLM models or endpoints, forcing companies to send code to third-party services

What it does

Firekeeper is an agentic AI code reviewer CLI that automates code review based on user-defined rules. It addresses the bottleneck of reviewing AI-generated code at scale by:

Core capabilities:

  • Privacy-first: Bring your own LLM API key and model—works with any OpenAI-compatible endpoint
  • Agentic review: Uses an agentic loop with tools to intelligently investigate code changes, not just one-shot LLM calls
  • Custom rules: Define project-specific review rules in firekeeper.toml with detailed instructions for the AI agent
  • Flexible scope: Review uncommitted changes, specific commits, date ranges, or entire repositories
  • Parallel execution: Splits review tasks across multiple workers for speed and focus, with configurable file batching
  • Structured output: JSON output and markdown trace files for integration with CI/CD and debugging
  • Context engineering: Include files, shell command outputs, and Agent Skills as context for reviews

Workflow:

  1. Configure rules in firekeeper.toml (e.g., "documentation sync", "no code duplication", "no hardcoded credentials")
  2. Run firekeeper review to analyze code changes
  3. The orchestrator splits work by rule and file scope, spawning parallel worker agents
  4. Each worker uses tools to investigate violations (reading files, searching patterns, checking diffs)
  5. Results are aggregated and reported with file paths, line numbers, and violation details
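
To make step 1 concrete, a rule definition might look like the following sketch (the key names here are illustrative, not firekeeper.toml's exact schema):

```toml
# firekeeper.toml — illustrative sketch; actual field names may differ

[[rules]]
name = "no-hardcoded-credentials"
instruction = """
Flag any API keys, passwords, or tokens committed as string literals.
Suggest moving them to environment variables or a secrets manager.
"""

[[rules]]
name = "documentation-sync"
instruction = "When a public function's signature changes, verify its doc comment and the docs were updated too."
# Hypothetical key: extra context files for the worker agent
context_files = ["CONTRIBUTING.md"]
```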

Integration points:

  • Local development: Review before committing
  • Git hooks: Auto-review on pre-commit
  • CI/CD pipelines: Structured output for automated quality gates
  • Coding agents: Provide review feedback for auto-optimization loops
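The split-then-spawn step in the workflow above can be sketched in a few lines. This is a simplified synchronous illustration with hypothetical types, not firekeeper's actual internals: every rule is crossed with every batch of changed files, and each (rule, batch) pair becomes one independent worker task.

```rust
// Sketch of orchestrator work-splitting: rules x file batches -> tasks.

#[derive(Debug, Clone, PartialEq)]
struct Task {
    rule: String,
    files: Vec<String>,
}

/// Split `files` into batches of at most `batch_size`, then pair
/// every batch with every rule to produce independent worker tasks.
fn plan_tasks(rules: &[&str], files: &[&str], batch_size: usize) -> Vec<Task> {
    let mut tasks = Vec::new();
    for batch in files.chunks(batch_size.max(1)) {
        for rule in rules {
            tasks.push(Task {
                rule: rule.to_string(),
                files: batch.iter().map(|f| f.to_string()).collect(),
            });
        }
    }
    tasks
}

fn main() {
    let rules = ["no-hardcoded-credentials", "documentation-sync"];
    let files = ["src/main.rs", "src/lib.rs", "README.md"];
    // 2 batches (2 + 1 files) x 2 rules = 4 tasks
    let tasks = plan_tasks(&rules, &files, 2);
    assert_eq!(tasks.len(), 4);
    println!("{} tasks planned", tasks.len());
}
```

Because each task carries its own rule and file slice, workers stay focused on one concern at a time, which is what makes the parallel execution both a speed and an accuracy win.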

How we built it

We started building firekeeper less than a month ago. The project is built entirely in Rust to ensure it's native, cross-platform, small, fast, and secure—critical requirements for a CLI tool that developers will run locally and in CI/CD pipelines.

For maximum customizability, we developed and open-sourced two new Rust libraries:

  • tiny-loop: A lightweight agent loop library (currently in early phase) that powers firekeeper's agentic review capabilities
  • toml-scaffold: A configuration formatter library that handles firekeeper's flexible rule definitions

Since I'm participating solo, I built everything myself, pairing with a coding agent throughout. I chose OpenRouter + Gemini 3 Flash as the default LLM combination because it's affordable, fast, and smart enough for code review tasks.

Challenges we ran into

Async Rust complexity: Writing async Rust is always a pain—there are so many concepts to understand and get right. The learning curve is steep, especially when building a concurrent system with parallel workers.

Agent framework design: I found existing Rust agent frameworks not ergonomic enough, so I designed tiny-loop from scratch. Getting it to its current state was genuinely brain-draining: building an agent loop library that's both flexible and easy to use required careful API design and repeated iteration.
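The core pattern behind an agent loop is small, even if an ergonomic API around it isn't. Here is a minimal synchronous illustration of that pattern — a generic sketch with a scripted model stub so it runs offline, not tiny-loop's actual API:

```rust
// A minimal agent loop: the model either requests a tool call or
// finishes with an answer; tool results are appended to the history
// and fed back until the model finishes (or a step budget is hit).

#[derive(Debug)]
enum ModelAction {
    CallTool { name: String, input: String },
    Finish(String),
}

/// Stand-in for an LLM call: decides the next action from the history.
/// Scripted here so the sketch is self-contained.
fn mock_model(history: &[String]) -> ModelAction {
    if history.iter().any(|m| m.contains("TODO found")) {
        ModelAction::Finish("violation: TODO left in diff".to_string())
    } else {
        ModelAction::CallTool {
            name: "grep".to_string(),
            input: "TODO".to_string(),
        }
    }
}

/// Stand-in for tool dispatch (a real reviewer's tools read files,
/// search patterns, and inspect diffs).
fn run_tool(name: &str, input: &str) -> String {
    match name {
        "grep" => format!("TODO found while searching for '{input}'"),
        _ => format!("unknown tool: {name}"),
    }
}

fn agent_loop(max_steps: usize) -> Option<String> {
    let mut history: Vec<String> = vec!["review this diff".to_string()];
    for _ in 0..max_steps {
        match mock_model(&history) {
            ModelAction::Finish(answer) => return Some(answer),
            ModelAction::CallTool { name, input } => {
                let observation = run_tool(&name, &input);
                history.push(observation);
            }
        }
    }
    None // step budget exhausted
}

fn main() {
    let verdict = agent_loop(8);
    assert!(verdict.is_some());
    println!("{}", verdict.unwrap());
}
```

The hard part in a real library is everything around this loop: typed tool registration, streaming, retries, and cancellation, which is where most of the API-design iteration went.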

Context engineering: It's still really hard to give AI just the right context to do things correctly. Too little context and the agent misses violations; too much and it gets confused or slow. Even now, I think review time is a bit too long and there's still room for optimization.
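One simple lever for the "too much context" side can be sketched: capping per-file content at a byte budget before it reaches the agent, and marking the cut so the agent knows the content is partial. This is an illustrative approach, not firekeeper's exact strategy:

```rust
// Truncate a file's content to roughly `budget` bytes before adding
// it to the agent's context, cutting on a valid UTF-8 char boundary.

fn cap_context(content: &str, budget: usize) -> String {
    if content.len() <= budget {
        return content.to_string();
    }
    // Walk back until the cut lands on a char boundary.
    let mut cut = budget;
    while !content.is_char_boundary(cut) {
        cut -= 1;
    }
    format!(
        "{}\n[... truncated {} bytes ...]",
        &content[..cut],
        content.len() - cut
    )
}

fn main() {
    let long = "x".repeat(10_000);
    let capped = cap_context(&long, 4_000);
    assert!(capped.len() < long.len());
    println!("capped to {} bytes", capped.len());
}
```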

Accomplishments that we're proud of

It actually works: Firekeeper CLI is fully functional and distributed on GitHub as open source. Users can install it with a single shell command regardless of platform—no complex setup required.

Easy onboarding: We provide built-in rules so users can get started immediately. The only requirement to get started is an API key.

Parallel execution: The parallel agent execution system is working well, splitting review tasks across multiple workers for both speed and focus.

Readable reports: The generated markdown reports are clean and readable, making it easy to understand what the AI found and why.

Mission accomplished: The project delivers on the initial plan and addresses the pain point we set out to solve: scalable, customizable AI code review that developers can run anywhere.

What we learned

Gemini's capabilities: We realized Gemini models have powerful code understanding abilities. The parallel tool call feature is a real boost in agentic scenarios, allowing the agent to investigate multiple aspects simultaneously. The large context window also makes it possible to handle large review tasks without losing important details.

Context engineering is core: When implementing an agentic AI tool, context engineering is really the core challenge—not just prompt engineering. It's about deciding what information to provide, when to provide it, and how to structure it so the agent can make intelligent decisions across multiple reasoning steps.

What's next for firekeeper.ai

Since this is a tool I genuinely need, I will definitely continue polishing the project and working to get more visibility.

If the project gains popularity, there's also a plan to build a business around it while keeping the core principle unchanged: privacy first. Users will always be able to bring their own LLM and run firekeeper locally without sending code to third-party services.
