Repo Explainer

Inspiration

Every developer knows the pain of onboarding to a new codebase. You clone the repo, stare at hundreds of files, and spend days, sometimes weeks, just trying to understand how things connect. Where's the entry point? What calls what? What patterns are being used?

We asked ourselves: What if AI could give you the architectural map instantly? Not just a file listing, but real understanding: component relationships, data flows, tech stack decisions, and visual diagrams that make it all click.

What it does

Repo Explainer is a CLI tool that analyzes any repository (local or remote) and generates comprehensive, beautiful architecture documentation in minutes.

Point it at any codebase:

repo-explain analyze https://github.com/kubernetes/kubernetes --depth deep

And get:

  • 📚 Coherent documentation with index.md as your starting point
  • 📊 Mermaid diagrams (components + data flow) auto-rendered to SVG
  • 🏗️ Architecture breakdowns with file-to-component mappings
  • 🔍 Pattern detection (MVC, microservices, repository pattern, etc.)
  • 📦 Tech stack analysis with dependency graphs
  • 🔗 Function-level traceability with line numbers

It works on public repos, private repos (via SSH), or any local directory. No manual setup, just one command.

How we built it

Core Stack:

  • Python with Rich CLI for beautiful terminal output
  • OpenCode AI as the analysis engine (configurable LLM backend)
  • Mermaid CLI for diagram rendering
  • Git integration for automatic cloning and caching

Architecture:

  1. Repository Loader – Handles local paths, HTTPS URLs, and SSH URLs. Smart caching so subsequent runs are instant.
  2. OpenCode Service – Orchestrates AI analysis with specialized prompts for different depths (quick/standard/deep/extra-deep).
  3. Prompt System – Modular templates (quick_scan_v2, architecture_deep_dive, pattern_detection, dependency_mapping) designed for token efficiency and accurate file-to-component mapping.
  4. Doc Composer – Transforms raw analysis into navigable Markdown with cross-linked pages.
  5. Diagram Renderer – Renders Mermaid to SVG with auto-fix for syntax errors.

Key Design Decision: We enforce that every component cites its source file and line ranges. This makes the docs actually useful: you can jump straight from the architecture diagram to the exact code.

Challenges we ran into

1. Mermaid Syntax Errors from AI

LLMs sometimes generate invalid Mermaid syntax. We built an auto-fix loop that detects rendering failures, sends the error back to the AI for correction, and retries up to 2 times. Works surprisingly well.
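
The retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `render_with_autofix`, `render`, and `fix` are hypothetical names, and the renderer and the AI call are passed in as plain callables so the control flow stands on its own.

```python
from typing import Callable, Optional, Tuple


def render_with_autofix(
    source: str,
    render: Callable[[str], Optional[str]],
    fix: Callable[[str, str], str],
    max_retries: int = 2,
) -> Tuple[str, bool]:
    """Render a Mermaid diagram, asking for a correction after each failure.

    `render` (hypothetical) returns None on success or an error message.
    `fix` (hypothetical) receives the failing source plus the error message
    and returns a corrected diagram, e.g. by calling the LLM.
    Returns the last source tried and whether it rendered.
    """
    current = source
    # One initial attempt plus up to `max_retries` corrected attempts.
    for attempt in range(max_retries + 1):
        error = render(current)
        if error is None:
            return current, True
        if attempt < max_retries:
            current = fix(current, error)
    # Every attempt failed: hand back the last source so the caller can
    # degrade gracefully (for example, keep the raw Mermaid text in the docs).
    return current, False
```

Returning the failure flag instead of raising keeps the rest of the documentation run going when a single diagram cannot be repaired.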

2. Token Efficiency for Large Repos

When analyzing Kubernetes (2M+ lines), we can't send everything to the LLM. We developed a progressive analysis strategy: quick scan for inventory → targeted deep dives on key components → synthesis. The prompt system explicitly guides the AI on which files to read.
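
One way to picture the "targeted deep dive" step is a budgeted file selection over the quick-scan inventory. The sketch below is an assumption about how that could work, not the tool's implementation: the function name, the `(path, size_in_chars, priority)` tuple shape, and the rough 4-characters-per-token estimate are all illustrative.

```python
from typing import List, Tuple


def pick_files_for_deep_dive(
    inventory: List[Tuple[str, int, int]],
    token_budget: int,
    chars_per_token: int = 4,  # crude estimate, assumed for illustration
) -> List[str]:
    """Choose the highest-priority files that fit within a token budget.

    `inventory` holds (path, size_in_chars, priority) tuples, as a quick
    scan might produce; a higher priority means a more important file.
    """
    chosen: List[str] = []
    used = 0
    # Visit files from most to least important.
    for path, size, _priority in sorted(inventory, key=lambda f: -f[2]):
        cost = size // chars_per_token + 1
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller files
        chosen.append(path)
        used += cost
    return chosen


inventory = [
    ("main.go", 4000, 10),
    ("vendor/big.go", 400000, 1),
    ("api/server.go", 8000, 8),
]
print(pick_files_for_deep_dive(inventory, token_budget=4000))
# → ['main.go', 'api/server.go']
```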

3. File-to-Component Mapping Accuracy

Early versions would say "there's an auth module" without saying where. We redesigned prompts with strict output schemas requiring file_path and line_range for every component and function. Now everything is traceable.
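
A strict schema like this can also be checked on the way out of the model. The sketch below assumes each component arrives as a dict; `file_path` and `line_range` come from the schema described above, while the `name` field, the `[start, end]` list shape, and the `ComponentRef` and `parse_component` names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class ComponentRef:
    name: str
    file_path: str
    line_range: Tuple[int, int]  # (start, end), 1-based and inclusive


def parse_component(raw: Dict[str, Any]) -> ComponentRef:
    """Reject any component that does not say where it lives in the code."""
    missing = [k for k in ("name", "file_path", "line_range") if k not in raw]
    if missing:
        raise ValueError(f"component missing required fields: {missing}")
    try:
        start, end = raw["line_range"]
    except (TypeError, ValueError):
        raise ValueError("line_range must be a [start, end] pair") from None
    if not (isinstance(start, int) and isinstance(end, int) and 1 <= start <= end):
        raise ValueError(f"invalid line_range: {raw['line_range']!r}")
    return ComponentRef(raw["name"], raw["file_path"], (start, end))


ref = parse_component(
    {"name": "auth", "file_path": "src/auth.py", "line_range": [10, 42]}
)
print(ref.file_path, ref.line_range)  # → src/auth.py (10, 42)
```

Failing loudly on a missing location is what turns "there's an auth module" into a claim that can be checked against the repository.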

4. Remote Repository Handling

Supporting HTTPS, SSH, and local paths with proper caching, force-refresh options, and owner/repo directory structure took iteration to get right.
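
To make the owner/repo layout concrete, here is a hedged sketch of classifying a source string and choosing a cache directory. The regular expression, the `resolve_source` name, and the `cache_root` parameter are assumptions for illustration; the real loader may handle more URL shapes and edge cases.

```python
import re
from pathlib import Path
from typing import Tuple

# Matches https://host/owner/repo(.git), git@host:owner/repo(.git),
# and ssh://git@host/owner/repo(.git). Assumed pattern, for illustration.
_REMOTE = re.compile(
    r"^(?:https?://[^/]+/|git@[^:]+:|ssh://git@[^/]+/)"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def resolve_source(source: str, cache_root: Path) -> Tuple[str, Path]:
    """Return ("remote", cache_dir) for URLs, or ("local", path) otherwise."""
    match = _REMOTE.match(source)
    if match is None:
        return "local", Path(source)
    # Cache clones under <cache_root>/<owner>/<repo> so that repositories
    # with the same name under different owners never collide.
    return "remote", cache_root / match["owner"] / match["repo"]


kind, path = resolve_source(
    "https://github.com/kubernetes/kubernetes", Path("cache")
)
print(kind, path.as_posix())  # → remote cache/kubernetes/kubernetes
```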

Accomplishments that we're proud of

  • ✅ One-command analysis of any repository, no config files needed
  • ✅ Actually useful output: not walls of text, but navigable docs with diagrams
  • ✅ Self-documenting accuracy: we ran repo-explain on itself and validated the output
  • ✅ Auto-fix for diagram errors: graceful degradation when AI makes mistakes
  • ✅ Evidence-based assertions: every claim links back to specific files and lines
  • ✅ Works on massive codebases: tested on Kubernetes, React, Linux kernel (quick scan)
  • ✅ Beautiful CLI UX: Rich progress indicators, verbose mode for transparency, colored output

What we learned

1. Prompt Engineering is Architecture

The quality of output is 80% prompt design. Investing in explicit output schemas, validation checklists, and few-shot examples in prompts made a massive difference.

2. AI Needs Guardrails

LLMs hallucinate. Requiring file paths and line numbers forces the AI to ground its analysis in actual code. "Show your work" improves accuracy.

3. Progressive Disclosure Works

Users don't want 50 pages upfront. Starting with index.md that links to subsections, with diagrams embedded, matches how developers actually explore.

4. The 80/20 of Codebase Understanding

For most repos, understanding the entry points, main components, and data flow covers 80% of what you need to get started. We optimized for that.

What's next for repo-explain

Near-term:

  • 🔄 Incremental analysis: only re-analyze changed files
  • 🌐 HTML output: interactive, searchable documentation site
  • 🔗 Multi-repo analysis: understand how microservices connect
  • 📝 Custom prompts: bring your own analysis templates

Medium-term:

  • 🧩 IDE integrations: VS Code extension for inline architecture hints
  • 🤖 CI/CD integration: auto-generate docs on every merge
  • 📈 Historical analysis: track architectural evolution over time

Long-term:

  • 🎯 Onboarding copilot: "Explain how authentication works in this repo"
  • 🔍 Code review assistant: "Does this PR violate existing patterns?"
  • 🌐 Public repo index: searchable architecture docs for popular open source

Our vision: Make "understanding code" as easy as "running code." Every developer should be able to grok any codebase in minutes, not days.
