Inspiration

Every time we opened a new repository to contribute to, we hit the same question: where do we even start? Large codebases feel like mazes. You waste hours jumping between files, trying to guess the entry point and the core logic. We built Gitmap to remove that friction.

What it does

Gitmap analyzes any GitHub repository and:

- Detects likely entry points
- Ranks files by architectural importance
- Identifies core vs non-core files
- Highlights which files can safely be ignored

Instead of reading randomly, developers get a structured starting point in seconds.

How we built it

When a repository is submitted:

- Our backend fetches the full file tree using the GitHub API.
- We parse source files using Tree-Sitter to extract real import relationships from the Abstract Syntax Tree (AST).
- We construct a dependency graph of the codebase.

We then compute architectural importance using:

- PageRank-style centrality
- Reverse dependency weighting
- Flow proximity from detected entry points
- Structural penalties for non-core files

The result is a mathematically ranked map of the most influential files in the repository. This is graph-based analysis, not regex guessing or simple heuristics.

Gemini Integration

Once ranking is complete, we generate a structured architectural snapshot of the repository. That filtered, high-density context is passed to Gemini 3 Flash. Because irrelevant files are already removed, Gemini focuses only on meaningful architecture, leading to more accurate explanations and faster onboarding.
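To make the ranking step concrete, here is a minimal sketch of PageRank-style centrality over an import graph. It is an illustration under stated assumptions, not Gitmap's actual implementation: the function name `rank_files`, the damping factor, the iteration count, and the example file names are all hypothetical, and flow proximity and structural penalties are left out. An edge runs from the importing file to the imported file, so files that many others depend on accumulate score.

```python
def rank_files(imports, damping=0.85, iterations=50):
    """Rank files by a PageRank-style score.

    `imports` maps each file to the list of files it imports
    (hypothetical input shape, chosen for this sketch).
    """
    nodes = set(imports)
    for targets in imports.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    score = {f: 1.0 / n for f in nodes}

    for _ in range(iterations):
        # Files that import nothing spread their score evenly over all files.
        dangling = sum(score[f] for f in nodes if not imports.get(f))
        nxt = {f: (1 - damping) / n + damping * dangling / n for f in nodes}
        for src in nodes:
            targets = imports.get(src, [])
            if targets:
                # Each importer passes an equal share of its score
                # to every file it imports.
                share = damping * score[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        score = nxt

    # Highest score first.
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Made-up repository layout, for illustration only.
example = {
    "main.py": ["app.py", "config.py"],
    "app.py": ["db.py", "config.py"],
    "db.py": ["config.py"],
    "config.py": [],
    "scripts/seed.py": ["db.py"],
}

for path, value in rank_files(example):
    print(f"{value:.3f}  {path}")
```

In this toy graph `config.py` comes out on top because every other core file depends on it, directly or indirectly, while `scripts/seed.py` sits at the bottom because nothing imports it.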

Challenges we ran into

Ranking files correctly was the hardest problem. We initially tried:

- Hardcoded heuristics
- Regex-based detection
- Simple pattern matching

None of them were reliable across different repositories. The breakthrough came when we shifted to Tree-Sitter and built a real dependency graph. Once we modeled the repository as a graph problem, ranking became mathematically consistent instead of guesswork.
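A small sketch of why text matching breaks down where a syntax tree does not. As an assumption made to keep the example self-contained, Python's standard-library `ast` module stands in for Tree-Sitter here; it is not the parser Gitmap uses, and the helper names and the sample source are hypothetical.

```python
import ast
import re

# Sample file contents, invented for this comparison.
SOURCE = '''
import os
from pkg.core import engine
doc = "import fake_module"
# import commented_out
def load():
    import json
    return json
'''


def imports_by_regex(source):
    # Naive text match: cannot tell code from strings or comments.
    return re.findall(r"import (\w+)", source)


def imports_by_ast(source):
    # Walk the syntax tree and collect only real import statements.
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.append(node.module)
    return sorted(set(found))


print("regex:", imports_by_regex(SOURCE))
print("ast:  ", imports_by_ast(SOURCE))
```

The regex version reports `fake_module` and `commented_out`, which are never imported, and records `engine` instead of the module `pkg.core`. The tree-based version returns only the modules the code really depends on, which is the kind of input a dependency graph needs.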

Accomplishments that we're proud of

We successfully built a working architectural ranking engine. Now we can drop any repository into Gitmap and immediately understand its structure. It has already saved us time, and we’re confident it will help developers who struggle with onboarding large codebases.

What we learned

We learned how to build a system from first principles instead of relying on shortcuts. We also learned that real architectural understanding requires modeling structure, not just scanning text.

What's next for Gitmap

- Expand language coverage to support nearly all major GitHub languages
- Improve entry point detection accuracy
- Add visual graph exploration
- Integrate deeper AI-assisted architectural summaries
- Optimize performance for very large repositories
