Repo Explainer
Inspiration
Every developer knows the pain of onboarding to a new codebase. You clone the repo, stare at hundreds of files, and spend days, sometimes weeks, just trying to understand how things connect. Where's the entry point? What calls what? What patterns are being used?
We asked ourselves: What if AI could give you the architectural map instantly? Not just a file listing, but real understanding: component relationships, data flows, tech stack decisions, and visual diagrams that make it all click.
What it does
Repo Explainer is a CLI tool that analyzes any repository (local or remote) and generates comprehensive, beautiful architecture documentation in minutes.
Point it at any codebase:
```
repo-explain analyze https://github.com/kubernetes/kubernetes --depth deep
```
And get:
- Coherent documentation with `index.md` as your starting point
- Mermaid diagrams (components + data flow) auto-rendered to SVG
- Architecture breakdowns with file-to-component mappings
- Pattern detection (MVC, microservices, repository pattern, etc.)
- Tech stack analysis with dependency graphs
- Function-level traceability with line numbers
It works on public repos, private repos (via SSH), or any local directory. No manual setup: just one command.
How we built it
Core Stack:
- Python with Rich CLI for beautiful terminal output
- OpenCode AI as the analysis engine (configurable LLM backend)
- Mermaid CLI for diagram rendering
- Git integration for automatic cloning and caching
Architecture:
- Repository Loader: handles local paths, HTTPS URLs, and SSH URLs, with smart caching so subsequent runs are instant.
- OpenCode Service: orchestrates AI analysis with specialized prompts for different depths (quick/standard/deep/extra-deep).
- Prompt System: modular templates (`quick_scan_v2`, `architecture_deep_dive`, `pattern_detection`, `dependency_mapping`) designed for token efficiency and accurate file-to-component mapping.
- Doc Composer: transforms raw analysis into navigable Markdown with cross-linked pages.
- Diagram Renderer: renders Mermaid to SVG with auto-fix for syntax errors.
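As an illustration of the Repository Loader's behavior, here is a minimal sketch of URL-to-cache mapping and clone-or-reuse logic. The cache location, regex, and function names are our assumptions for this example, not the actual repo-explain code:

```python
import hashlib
import re
import subprocess
from pathlib import Path

CACHE_ROOT = Path.home() / ".repo-explain" / "cache"  # hypothetical cache location

def cache_dir_for(url: str) -> Path:
    """Map a repo URL to a stable <owner>/<repo> cache directory."""
    m = re.search(r"[:/]([^/:]+)/([^/]+?)(?:\.git)?$", url)
    if m:
        owner, repo = m.group(1), m.group(2)
        return CACHE_ROOT / owner / repo
    # fall back to a hash of the URL for unusual remotes
    return CACHE_ROOT / hashlib.sha256(url.encode()).hexdigest()[:16]

def load_repo(source: str, force_refresh: bool = False) -> Path:
    """Return a local path for a repo: local dirs pass through, remotes are cloned once."""
    local = Path(source)
    if local.exists():
        return local
    target = cache_dir_for(source)
    if target.exists() and not force_refresh:
        return target  # cache hit: subsequent runs are instant
    target.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(["git", "clone", "--depth", "1", source, str(target)], check=True)
    return target
```

The owner/repo directory layout keeps the cache human-browsable and avoids collisions between repos with the same name under different owners.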
Key Design Decision: We enforce that every component cites its source file and line ranges. This makes the docs actually useful: you can jump straight from the architecture diagram to the exact code.
Challenges we ran into
1. Mermaid Syntax Errors from AI
LLMs sometimes generate invalid Mermaid syntax. We built an auto-fix loop that detects rendering failures, sends the error back to the AI for correction, and retries up to 2 times. It works surprisingly well.
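The loop above can be sketched as a small retry function. The `render` and `fix` callables stand in for the Mermaid CLI wrapper and the LLM correction call; their names and signatures are our illustration, not repo-explain's actual API:

```python
from typing import Callable

class RenderError(Exception):
    """Raised when the Mermaid renderer rejects a diagram."""

def render_with_autofix(
    source: str,
    render: Callable[[str], str],    # wraps the Mermaid CLI; raises RenderError on failure
    fix: Callable[[str, str], str],  # sends (source, error) to the LLM for a corrected diagram
    max_retries: int = 2,
) -> str:
    """Render Mermaid source, asking the AI to repair it on failure, up to max_retries times."""
    attempt = source
    for attempt_no in range(max_retries + 1):
        try:
            return render(attempt)
        except RenderError as err:
            if attempt_no == max_retries:
                raise RenderError(f"diagram still invalid after {max_retries} retries") from err
            attempt = fix(attempt, str(err))  # corrected diagram for the next attempt
```

Passing the renderer's error message back to the model gives it concrete context (line number, token) instead of asking it to guess what went wrong.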
2. Token Efficiency for Large Repos
Analyzing Kubernetes (2M+ lines) means we can't send everything to the LLM. We developed a progressive analysis strategy: quick scan for inventory → targeted deep dives on key components → synthesis. The prompt system explicitly guides the AI on which files to read.
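A toy version of budget-aware file selection for the quick-scan step might look like this. The priority hints and the 4-characters-per-token heuristic are our assumptions for illustration, not the actual repo-explain heuristics:

```python
PRIORITY_HINTS = ("main", "cmd/", "app", "server", "router", "index")

def estimate_tokens(text: str) -> int:
    # rough heuristic: ~4 characters per token
    return len(text) // 4

def select_files(files: dict[str, str], budget: int) -> list[str]:
    """Pick high-signal files first (entry points, routers), stopping at the token budget."""
    def priority(path: str) -> int:
        # lower index in PRIORITY_HINTS = higher priority; unmatched files go last
        return min((i for i, hint in enumerate(PRIORITY_HINTS) if hint in path),
                   default=len(PRIORITY_HINTS))
    chosen, used = [], 0
    for path in sorted(files, key=priority):
        cost = estimate_tokens(files[path])
        if used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen
```

The deep-dive passes can then spend their budget only on the components the quick scan flagged as central.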
3. File-to-Component Mapping Accuracy
Early versions would say "there's an auth module" without saying where. We redesigned prompts with strict output schemas requiring file_path and line_range for every component and function. Now everything is traceable.
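A strict schema like that can be enforced with a small validator. This is a minimal sketch assuming a JSON-like analysis record; the field names mirror the `file_path` and `line_range` requirement described above, but the surrounding types are our invention:

```python
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    file_path: str
    line_range: tuple[int, int]

def parse_component(record: dict) -> Component:
    """Reject any AI-produced component that is missing its code citation."""
    for field in ("name", "file_path", "line_range"):
        if field not in record:
            raise ValueError(f"component missing required field: {field}")
    start, end = record["line_range"]
    if not (1 <= start <= end):
        raise ValueError("line_range must be 1-based and ordered")
    return Component(record["name"], record["file_path"], (start, end))
```

Failing loudly on a missing citation is what lets the doc composer trust every component enough to cross-link it to source.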
4. Remote Repository Handling
Supporting HTTPS, SSH, and local paths with proper caching, force-refresh options, and owner/repo directory structure took iteration to get right.
Accomplishments that we're proud of
- One-command analysis of any repository: no config files needed
- Actually useful output: not walls of text, but navigable docs with diagrams
- Self-documenting accuracy: we ran repo-explain on itself and validated the output
- Auto-fix for diagram errors: graceful degradation when the AI makes mistakes
- Evidence-based assertions: every claim links back to specific files and lines
- Works on massive codebases: tested on Kubernetes, React, and the Linux kernel (quick scan)
- Beautiful CLI UX: Rich progress indicators, verbose mode for transparency, colored output
What we learned
1. Prompt Engineering is Architecture
The quality of output is 80% prompt design. Investing in explicit output schemas, validation checklists, and few-shot examples in prompts made a massive difference.
2. AI Needs Guardrails
LLMs hallucinate. Requiring file paths and line numbers forces the AI to ground its analysis in actual code. "Show your work" improves accuracy.
3. Progressive Disclosure Works
Users don't want 50 pages upfront. Starting with an `index.md` that links to subsections, with diagrams embedded, matches how developers actually explore.
4. The 80/20 of Codebase Understanding
For most repos, understanding the entry points, main components, and data flow covers 80% of what you need to get started. We optimized for that.
What's next for repo-explain
Near-term:
- Incremental analysis: only re-analyze changed files
- HTML output: an interactive, searchable documentation site
- Multi-repo analysis: understand how microservices connect
- Custom prompts: bring your own analysis templates
Medium-term:
- IDE integrations: a VS Code extension for inline architecture hints
- CI/CD integration: auto-generate docs on every merge
- Historical analysis: track architectural evolution over time
Long-term:
- Onboarding copilot: "Explain how authentication works in this repo"
- Code review assistant: "Does this PR violate existing patterns?"
- Public repo index: searchable architecture docs for popular open source projects
Our vision: Make "understanding code" as easy as "running code." Every developer should be able to grok any codebase in minutes, not days.