Inspiration
Every developer has inherited a codebase and asked the same haunting question: "Why is this built this way?"
The answer is never in the code itself. It lives scattered across a Slack thread from 2021 that nobody linked to anything, a closed GitHub issue the PR never referenced, and a commit message that just says "fix auth". The senior developer who made the call left two years ago, and the reasoning left with them.
We felt this pain firsthand. During a sprint, we spent three days reverse-engineering an architectural decision that turned out to have a very specific reason — one that was buried in a 47-comment issue thread from 18 months ago. Three days. For context that already existed. It just had no way to surface.
That frustration pointed us to a gap nobody had filled:
| What exists today | What's missing |
|---|---|
| GitHub Copilot — explains WHAT code does | Nothing that explains WHY decisions were made |
| SonarQube — detects code complexity | Nothing that connects complexity to the decision that caused it |
| Dependabot — flags outdated packages | Nothing that checks if the original reasoning for choosing a package still holds |
| git blame — shows WHO touched a line | Nothing that explains WHY that line is the way it is |
When we heard about Gemini 1.5 Pro's 1 million token context window, we immediately saw the unlock. Most other AI coding tools chunk the repo, embed it, and retrieve fragments. But archaeology doesn't work on fragments; it works on the complete artifact trail. For the first time, a model could hold an entire repository's history in a single call and reason across it holistically.
GitSpire — The Repository Archaeologist was born.
What It Does
GitSpire reads an entire GitHub repository — every commit, issue, pull request, and key file — and sends it to Gemini 1.5 Pro in a single long-context call. Gemini reconstructs the invisible WHY-layer: the reasoning behind every non-obvious architectural decision, the failed approaches that were tried and abandoned, the implicit assumptions the codebase silently depends on, and the ghost decisions that look intentional but have zero documentation trail.
The output is a structured Knowledge Core — a permanent, queryable record of why the codebase is the way it is.
The User Experience
1. Paste any public GitHub URL
2. Click "Analyze Repository"
3. Wait 30–90 seconds (the entire repo is processed in one Gemini call)
4. Receive a Knowledge Core with 7 types of archaeological insight
5. Ask questions, check assumptions, generate onboarding paths
The Seven Insight Types
| Type | What It Extracts |
|---|---|
| 🔵 Decision Atoms | Major architectural choices with WHY they were made, evidence citations, and confidence scores |
| 🟢 Assumptions | Hidden truths the system silently depends on, risk-rated critical / moderate / low |
| 🔴 Failure Memory | Approaches that were tried and abandoned — so nobody repeats them |
| 🟣 Ghost Decisions | Intentional-looking patterns with zero documentation trail |
| 🟡 Regretted Decisions | Choices that show regret signals (reverts, TODOs, issue language) in the artifact trail |
| 🔵 Orphaned Architecture | Code areas with no active owner or institutional understanding |
| ⚪ Decision Pulse | A freshness score and staleness map for the whole architecture |
The Four Live Tools (Powered by Follow-Up Gemini Calls)
Ask Why — Type any natural language question about the codebase. Gemini answers from the Knowledge Core with specific citations back to decision atoms, commit SHAs, and issue numbers. Never from general training knowledge.
Assumption Alarm — Paste any code snippet. GitSpire checks it against the extracted assumption list and alerts you if your change would violate an architectural constraint the original authors depended on.
Onboarding Path — Describe a feature you want to build. GitSpire generates a ranked checklist of what you must understand before safely touching that area — sorted by likelihood to break something.
GitSpire Guard (preview) — A GitHub App that monitors pull requests and automatically posts an architectural review comment, flagging violations before they merge.
How We Built It
The Core Innovation: One Call, Whole Repo
Most AI-powered code tools use RAG — they chunk the codebase, embed it, retrieve relevant fragments, and pass those fragments to the model. This fundamentally loses cross-chunk context. Decisions that span multiple commits, issues, and PRs become invisible.
GitSpire does the exact opposite:
Traditional RAG:
Codebase (500k tokens) → chunk → embed → retrieve → model (small window)
⚠ Context lost between chunks. Architectural decisions are invisible.
GitSpire:
Codebase (500k tokens) → build_archaeology_context() → Gemini 1.5 Pro (1M tokens)
✓ Entire repository. One call. Zero context loss.
Gemini 1.5 Pro's 1 million token context window is what makes this possible — and it is not a feature, it is the entire premise. No other architecture would allow the full artifact trace needed for true archaeological analysis.
The Pipeline (POST /api/analyze)
Step 1 — GitHub Ingestion. We built a GitHub client that fetches the full repository bundle using asyncio.gather() for maximum parallelism: repository metadata, up to 150 commits (paginated), 100 issues (all states), 50 pull requests, the full file tree, and the 10 most architecturally significant files (README, config, ARCHITECTURE docs, changelogs).
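The concurrent fetch pattern in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the project's GitHub client: `fetch_repo_bundle`, `fake_commits`, and `fake_issues` are hypothetical names, and the stand-in fetchers return canned data so the sketch runs without network access.

```python
import asyncio
from typing import Any, Awaitable, Callable

# A fetcher takes a repo slug and returns some artifact list (hypothetical shape).
Fetcher = Callable[[str], Awaitable[Any]]


async def fetch_repo_bundle(repo: str, fetchers: dict[str, Fetcher]) -> dict[str, Any]:
    """Run every fetcher concurrently and return results keyed by artifact name."""
    names = list(fetchers)
    # asyncio.gather preserves argument order, so results line up with names.
    results = await asyncio.gather(*(fetchers[name](repo) for name in names))
    return dict(zip(names, results))


# Stand-in fetchers; a real client would call the GitHub API here.
async def fake_commits(repo: str) -> list[str]:
    await asyncio.sleep(0.01)
    return [f"{repo}@c{i}" for i in range(3)]


async def fake_issues(repo: str) -> list[str]:
    await asyncio.sleep(0.01)
    return [f"{repo}#1"]


bundle = asyncio.run(
    fetch_repo_bundle("owner/repo", {"commits": fake_commits, "issues": fake_issues})
)
```

Because the fetchers run concurrently, total wall time is close to that of the slowest request rather than the sum of all of them.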
Step 2 — BABEL Translation Layer (optional). Many open-source repositories have contributors writing commit messages and issue bodies in their native language. Without translation, those decision signals are invisible to Gemini. Our BABEL layer runs Google Translate on every non-English artifact and passes both the original and translated text through the pipeline. In testing, BABEL increased the number of extracted decision atoms by ~83% on multilingual repositories.
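One way to carry both the original and the translated text through the pipeline is sketched below. The `Artifact` type and `babel_pass` function are hypothetical, and language detection and translation are injected as plain callables (stubbed here) rather than tied to a specific Google Cloud Translate client call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Artifact:
    original: str
    translated: Optional[str] = None  # None when the text is already English


def babel_pass(
    texts: list[str],
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str], str],
) -> list[Artifact]:
    """Attach an English translation to every non-English artifact."""
    artifacts = []
    for text in texts:
        if detect_language(text) == "en":
            artifacts.append(Artifact(original=text))
        else:
            artifacts.append(
                Artifact(original=text, translated=translate_to_english(text))
            )
    return artifacts


# Stubs standing in for a real detection/translation service.
_FAKE_TRANSLATIONS = {"corrige autenticação": "fix authentication"}
detect = lambda text: "pt" if text in _FAKE_TRANSLATIONS else "en"
translate = lambda text: _FAKE_TRANSLATIONS[text]

result = babel_pass(["add cache layer", "corrige autenticação"], detect, translate)
```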
Step 3 — Context Assembly. We assemble a single structured string up to 800,000 characters, organized chronologically (oldest commits first) so Gemini can follow the evolution of decisions over time. Commit messages are never truncated mid-sentence — if something must be cut for the size limit, we cut oldest commits before we ever touch a message.
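A rough sketch of the truncation rule described in this step: whole commits are dropped from the oldest end until the context fits, and no message is ever cut partway. The names `fit_commits` and `sketch_context` are hypothetical, and the newline-joined layout is an assumption made only for illustration.

```python
MAX_CONTEXT_CHARS = 800_000  # size limit stated in the text


def fit_commits(messages_oldest_first: list[str], budget: int) -> list[str]:
    """Drop whole commits, oldest first, until the rest fits in `budget` characters."""
    # Each message costs its length plus one separator character.
    total = sum(len(message) + 1 for message in messages_oldest_first)
    start = 0
    while start < len(messages_oldest_first) and total > budget:
        total -= len(messages_oldest_first[start]) + 1
        start += 1
    return messages_oldest_first[start:]


def sketch_context(
    fixed_sections: list[str],
    commits_oldest_first: list[str],
    limit: int = MAX_CONTEXT_CHARS,
) -> str:
    """Join fixed sections, then append as many whole commits as the limit allows."""
    head = "\n".join(fixed_sections)
    remaining = limit - len(head) - 1  # one newline between head and commits
    commits = fit_commits(commits_oldest_first, max(remaining, 0))
    return head + "\n" + "\n".join(commits)


context = sketch_context(
    ["README"],
    ["old: init", "mid: add auth", "new: fix auth"],
    limit=40,
)
```

In this toy run the 40-character limit leaves room for only two commits, so the oldest one is dropped whole and the remaining messages stay intact and in chronological order. The sketch does not handle the case where the fixed sections alone exceed the limit.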
Step 4 — Single Gemini Call. We send the full context to gemini-1.5-pro with temperature=0.2 (precision over creativity, so the same repo should yield consistent insights across analyses), max_output_tokens=8192, and response_mime_type="application/json" (requests structured JSON output). The prompt instructs Gemini to act as a software archaeologist, cite every claim against specific commit SHAs, issue numbers, and PR references, and never invent what isn't evidenced.
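The call settings above map onto the `google-generativeai` Python SDK roughly as follows. This is a sketch under assumptions: the prompt wording, `build_prompt`, and `analyze` are hypothetical stand-ins, not the project's actual `prompts.py` or `gemini_client.py`.

```python
# Generation settings taken from the text; passed to the SDK as a plain dict.
GENERATION_CONFIG = {
    "temperature": 0.2,
    "max_output_tokens": 8192,
    "response_mime_type": "application/json",
}

# Hypothetical prompt wording, written only for this illustration.
ARCHAEOLOGIST_PROMPT = (
    "You are a software archaeologist. Cite every claim against specific "
    "commit SHAs, issue numbers, and PR references. Never invent what is "
    "not evidenced in the artifacts below. Respond with JSON only."
)


def build_prompt(context: str) -> str:
    """Combine the instructions with the assembled repository context."""
    return f"{ARCHAEOLOGIST_PROMPT}\n\n=== REPOSITORY ARTIFACTS ===\n{context}"


def analyze(context: str, api_key: str) -> str:
    """Send one long-context request and return the raw JSON text."""
    # Imported lazily so the rest of the sketch runs without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(
        build_prompt(context), generation_config=GENERATION_CONFIG
    )
    return response.text
```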
Step 5 — Parse → Pydantic. We parse the JSON output into a strict Pydantic v2 KnowledgeCore model with graceful degradation — _safe_parse_json() uses a three-attempt recovery strategy so a partial response never crashes the pipeline.
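A three-stage recovery of this kind (clean parse, then boundary extraction, then graceful degradation) might look like the sketch below. It is not the project's `_safe_parse_json()`: the function name `safe_parse_json_sketch` and the fallback field names are assumptions, and the real pipeline would go on to validate the result against the Pydantic model.

```python
import json
from typing import Any


def safe_parse_json_sketch(raw: str) -> dict[str, Any]:
    """Parse model output into a dict without ever raising."""
    # Attempt 1: the output is already clean JSON.
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass

    # Attempt 2: boundary extraction, from the first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            parsed = json.loads(raw[start : end + 1])
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass

    # Attempt 3: graceful degradation to an empty, clearly flagged result.
    # These field names are hypothetical placeholders.
    return {"decision_atoms": [], "parse_error": True}
```

Returning a flagged empty structure instead of raising lets the caller show a partial or empty Knowledge Core rather than an error page.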
Step 6 — Firebase Cache. The Knowledge Core is stored in Firebase Realtime DB under a 16-character key derived from the SHA-256 hash of the repo URL, with a 24-hour TTL. Subsequent queries (Ask Why, Assumption Alarm, Onboarding) load the cached core instantly, with no re-analysis required.
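The cache key and TTL check can be sketched with the standard library alone. Two assumptions are made here and are not confirmed by the text: the key is the first 16 hex characters of the digest, and the URL is normalized (trimmed, lowercased, trailing slash removed) before hashing. `cache_key` and `is_fresh` are hypothetical names.

```python
import hashlib
import time
from typing import Optional

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the text


def cache_key(repo_url: str) -> str:
    """Derive a 16-character key from the SHA-256 hash of the repo URL."""
    # Normalization is an assumption so equivalent URLs share one entry.
    normalized = repo_url.strip().rstrip("/").lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def is_fresh(stored_at: float, now: Optional[float] = None) -> bool:
    """Return True while a cached entry is younger than the TTL."""
    current = time.time() if now is None else now
    return current - stored_at < CACHE_TTL_SECONDS


key = cache_key("https://github.com/owner/repo/")
```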
Stack
| Layer | Technology |
|---|---|
| AI Engine | Gemini 1.5 Pro (1M context, JSON mode) |
| Backend | Python 3.11, FastAPI, Pydantic v2, uvicorn |
| Data Layer | Firebase Realtime Database |
| Translation | Google Cloud Translate (BABEL layer) |
| Frontend | Vanilla JS SPA — app.js, api.js, ui.js, panels.js, Marked.js, Lottie |
| Deployment | Railway (Nixpacks, auto-deploy) |
Challenges We Ran Into
1. Prompting Gemini for evidence-anchored output, not hallucination. The hardest engineering problem wasn't sending data to Gemini — it was getting Gemini to commit to specific evidence rather than plausible-sounding generalities. "The team chose PostgreSQL for reliability" is useless. "The team chose PostgreSQL after a Redis data-loss incident (issue:#47, commit:abc1234) demonstrated that in-memory storage was unsuitable for this data criticality" is actionable. We went through 11 major prompt iterations before the output was consistently evidence-anchored.
2. Context size engineering. Gemini's 1M token window is large, but GitHub repositories are larger. We had to build a principled prioritization system: key architectural files come first (they contain more decision-signal per token than code), followed by issues (where debates happen), followed by PRs, followed by commits, with the oldest commits prioritized because they contain the foundational decisions. Getting the truncation logic right — especially the rule that commit messages are never cut mid-sentence — took significant iteration.
3. Parsing reliability across edge cases. Gemini in JSON mode is highly reliable, but not 100%. Large outputs occasionally have malformed JSON near the token limit boundary. Our _safe_parse_json() three-attempt recovery strategy (clean parse → boundary extraction → graceful degradation) handles this without surfacing errors to the user.
4. Frontend UX for a novel output shape. The Knowledge Core is a data structure that doesn't map neatly to any existing UI pattern. We had to design and build a tabbed panel system from scratch that makes seven distinct insight types feel natural and explorable — not overwhelming. Displaying confidence scores, risk levels, and evidence citations in a way that engineers trust required significant design iteration.
5. Rate limiting without authentication. Making the tool accessible (no signup required) while preventing abuse meant building a careful in-memory rate limiter (10 analysis calls/hour per IP) and a 24-hour Firebase cache so the same repository URL never triggers a duplicate Gemini call within a day.
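The in-memory limiter from challenge 5 could be built as a per-IP sliding window, sketched below. The class name and the sliding-window approach are assumptions (the text only states the 10 calls/hour per IP limit), and a single-process dictionary like this resets on restart and is not shared across workers.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_calls` per `window_seconds` for each client IP."""

    def __init__(self, max_calls: int = 10, window_seconds: float = 3600.0) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and current - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_calls:
            return False
        hits.append(current)
        return True


limiter = SlidingWindowLimiter()
first_ten = [limiter.allow("203.0.113.7", now=float(i)) for i in range(10)]
eleventh = limiter.allow("203.0.113.7", now=10.0)
```

The `now` parameter exists only to make the sketch deterministic in tests; production code would rely on the monotonic clock.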
Accomplishments That We're Proud Of
The single-call architecture actually works. We were genuinely uncertain before we ran the first real test whether Gemini could hold an entire repository's history and extract meaningful, evidence-cited insights. It does — and the output quality is consistently high.
Evidence citation quality. The decision atoms Gemini extracts consistently cite real commit SHAs, real issue numbers, and real PR references — not hallucinated ones. The prompt engineering required to achieve this is one of our most significant technical contributions.
BABEL uncovering hidden decisions. On multilingual repositories, the BABEL layer surfaces decision atoms that would be completely invisible to an English-only analysis. Finding architectural reasoning buried in Japanese or Portuguese commit messages and making it accessible feels like exactly the kind of problem AI should be solving.
GitSpire Guard concept. The Guard capability preview — a GitHub App that auto-reviews PRs against the stored Knowledge Core and posts architectural violation comments before merge — is the most compelling demonstration of what happens when you make the WHY-layer a persistent, live artifact rather than a one-time report.
Iron Rule architecture. We enforced strict separation of concerns throughout: all Gemini calls through gemini_client.py, all prompts in prompts.py, all Firebase operations through firebase_client.py, zero business logic in routes. This made the system debuggable and extensible under hackathon time pressure.
What We Learned
Long-context is not just "bigger RAG" — it is a fundamentally different capability. Gemini 1.5 Pro's 1M token window doesn't just let you fit more data in — it lets the model reason across the relationships between artifacts that are far apart in time and structure. A decision made in commit #3 that gets revisited in issue #89 and finally resolved in PR #141 is a pattern that RAG simply cannot see. Whole-context archaeology is a genuinely new category of AI application.
Prompt engineering for evidence-grounding is a real discipline. Getting an LLM to commit to specific evidence rather than plausible-sounding synthesis requires fundamentally different techniques than getting it to explain or summarize. Specificity constraints, citation format requirements, and explicit anti-hallucination instructions all compound.
The WHY-layer is almost always recoverable. We assumed going in that many repositories would have too thin an artifact trail to extract meaningful reasoning. We were wrong — even repositories with sparse commit messages typically have rich issue and PR discussions that contain the WHY. The signal is there. It just needs a model large enough to read it all at once.
What's Next for GitSpire — The Repository Archaeologist & Guard Detective
GitSpire Guard — Full Activation. The Guard system is fully scaffolded and one GitHub App registration away from being live. The /webhook/github endpoint receives PR webhooks, loads the cached Knowledge Core, runs a single Gemini violation-check call, and posts an automated architectural review comment. We want Guard to become the standard PR reviewer for teams that care about architectural integrity.
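Before a webhook handler loads the Knowledge Core, it would typically verify the delivery and filter for relevant events. The sketch below shows that front half only, using GitHub's documented `X-Hub-Signature-256` HMAC scheme and `pull_request` event actions. The function names and the choice of which actions trigger a review are assumptions, not the project's `/webhook/github` implementation.

```python
import hashlib
import hmac

# Assumed set of pull_request actions worth reviewing.
REVIEWABLE_ACTIONS = {"opened", "reopened", "synchronize"}


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check the X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information.
    return hmac.compare_digest(expected, signature_header)


def should_review(event_name: str, payload: dict) -> bool:
    """Return True for pull request events that warrant a Guard review."""
    return event_name == "pull_request" and payload.get("action") in REVIEWABLE_ACTIONS


# Self-check with a made-up secret and body.
secret = b"example-secret"
body = b'{"action": "opened"}'
header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
```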
Decision Atom Diff. Re-analyze a repository after 30, 60, or 90 days and generate a diff of the Knowledge Core — showing which architectural decisions have changed, which assumptions are no longer valid, and which new ghost decisions have appeared. Architecture evolution, tracked automatically.
GitSpire for Private Repos. OAuth integration with GitHub to allow analysis of private repositories. The pipeline is identical — only the API authentication layer changes.
IDE Integration. A VS Code extension that surfaces the relevant decision atoms and assumptions inline as you edit — without requiring a trip to the GitSpire UI. Hover over a function and see why that pattern exists.
Team Knowledge Sharing. Share a Knowledge Core URL with your team so everyone onboarding to a codebase starts with the same archaeological foundation. Decision atoms as living documentation that lives alongside the code.
Multi-Repository Archaeology. Cross-repository analysis for organizations with multiple interconnected services — surface the architectural decisions that span service boundaries and the assumptions each service makes about the others.
GitSpire — GitHub Copilot knows WHAT your code does. GitSpire knows WHY.