The Ghost - Project Story
Inspiration
A few years ago, I lost my father. As a way for my children to still connect with their grandfather, I used AI to build a memory twin of him - so they could ask him questions, hear his stories, feel his presence even after he was gone.
The insight that drove that work is simple: the most valuable thing technology can preserve is human knowledge before it disappears.
I kept thinking about that same problem inside software teams. Every senior engineer carries years of undocumented knowledge - why this function is structured the way it is, why that dependency can never be upgraded, what broke production the last time someone touched that file. When they leave, or when a junior engineer opens an MR without context, that knowledge evaporates. Teams repeat the same mistakes. Security regressions resurface. CI pipelines fail for reasons that were already figured out two years ago.
The AI Paradox makes this worse, not better. AI writes code faster than ever - but the bottleneck has shifted to understanding the code that already exists. Ghost is my answer to that.
What it does
Ghost is a two-agent GitLab Duo flow that excavates institutional knowledge from your codebase's git history and surfaces it automatically when someone opens a merge request touching code with hidden context, known dangers, or a history of past breakages.
One trigger. One mention. Everything your team ever knew about that file.
When you mention @ai-ghost-gitlab-ai-hackathon on a merge request, Ghost:
- Reads the commit history of every changed file - going back years
- Searches the entire codebase for CVE references, security patches, reverts, prototype pollution patterns, and "don't touch this" warnings
- Identifies the engineers who last deeply understood the code and extracts their key insights
- Checks whether a similar change was attempted before and what happened
- Posts a single structured comment on the MR with: why the code exists, what is dangerous, who knew it best, whether this has been tried before, and any past breakages
Real example: On a merge request touching lib/utils.js in Express.js, Ghost surfaced CVE-2024-51999 - a security patch that was applied and then reverted the same day because the CVE was rejected as invalid. It found the exact engineers, the exact commits, the exact reasoning from the release notes. Knowledge that would take a developer hours to reconstruct manually. Ghost found it in 2 minutes and 47 seconds.
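The kind of signal Ghost looks for in commit history can be sketched in a few lines of Python. This is only an illustration, not the Archaeologist's actual logic (which lives in its prompt and the GitLab tools): the `Commit` type, the `MARKERS` patterns, and the sample messages in the usage note are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical marker patterns, chosen for illustration only.
MARKERS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "revert": re.compile(r"^\s*revert\b", re.IGNORECASE),
    "warning": re.compile(r"\b(do not|don't) touch\b", re.IGNORECASE),
}


@dataclass
class Commit:
    sha: str
    author: str
    message: str


def flag_commit(commit):
    """Return the sorted names of all markers found in the commit message."""
    return sorted(name for name, pattern in MARKERS.items()
                  if pattern.search(commit.message))


def scan(commits):
    """Map each flagged commit SHA to its markers, skipping clean commits."""
    flagged = {}
    for commit in commits:
        hits = flag_commit(commit)
        if hits:
            flagged[commit.sha] = hits
    return flagged
```

For example, a made-up commit with the message `Revert "fix: patch CVE-2024-51999"` would be flagged with both `cve` and `revert`, while a plain documentation commit would be skipped.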
Ghost also integrates with Google Cloud Firestore as a persistent memory store. Every time Ghost analyzes a file, it saves findings. On the next analysis, it retrieves previous findings and surfaces what has changed since - making Ghost progressively smarter about your codebase over time.
How we built it
Ghost runs entirely on the GitLab Duo Agent Platform using a custom two-agent flow defined in YAML.
Agent 1 - The Archaeologist uses five tools:
- list_commits - scans the project's entire commit history
- get_commit - reads individual commit details, authors, and messages
- gitlab_blob_search - hunts across the codebase for CVE references, security patches, reverts, and danger markers
- get_repository_file - reads the actual file content
- list_repository_tree - maps project structure for context
Agent 2 - The Synthesizer takes the Archaeologist's raw findings and posts one clean, structured MR comment using create_merge_request_note.
The agents chain together: Archaeologist runs first, passes its complete findings to the Synthesizer via context:archaeologist.final_answer, then the Synthesizer writes the report.
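The hand-off can be pictured with a small Python sketch. It does not reproduce the GitLab flow YAML or its schema; the function names and the `context` dictionary are stand-ins invented here to show the ordering only.

```python
def archaeologist(changed_files):
    """Stand-in for Agent 1: gather findings and return them as one text answer."""
    # A real run would call tools such as list_commits and get_commit here.
    return "\n".join(f"FILE: {path}" for path in changed_files)


def synthesizer(findings):
    """Stand-in for Agent 2: wrap the findings in a single MR comment body."""
    return "Ghost report\n" + findings


def run_flow(changed_files):
    """Run the agents in order, handing the first agent's answer to the second."""
    # The key name mirrors context:archaeologist.final_answer from the flow.
    context = {"archaeologist.final_answer": archaeologist(changed_files)}
    return synthesizer(context["archaeologist.final_answer"])
```

The point of the sketch is that the Synthesizer never touches the repository itself: it only sees whatever text the Archaeologist produced as its final answer.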
The demo codebase is the real Express.js framework - mirrored from GitHub into our GitLab project. 14 years of real commit history, real security incidents, real reverts. Not a toy demo.
The memory layer is a Node.js service deployed on Google Cloud Run, backed by Firestore. It exposes a simple REST API that Ghost calls to save and retrieve findings per file. Ghost accumulates knowledge across every MR it analyzes, becoming a living institutional memory for the codebase.
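The save-and-retrieve behavior can be sketched as follows. The real service is Node.js on Cloud Run with Firestore; this Python version uses an in-memory dictionary purely to illustrate the idea, and the `MemoryStore` class and its method names are assumptions, not the service's actual API.

```python
class MemoryStore:
    """In-memory stand-in for the Firestore-backed service, keyed by file path."""

    def __init__(self):
        self._findings = {}

    def get(self, path):
        """Return a copy of everything remembered about this file so far."""
        return set(self._findings.get(path, set()))

    def save(self, path, findings):
        """Store the latest findings and return those not seen in earlier analyses."""
        previous = self.get(path)
        current = set(findings)
        self._findings[path] = previous | current
        return current - previous
```

Returning only the new findings from `save` is one way to surface what has changed since the previous analysis; a real store would also need timestamps and project scoping.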
The flow is validated through the hackathon CI pipeline, registered in the AI Catalog, and triggered via the standard Mention mechanism - fully native to GitLab.
Challenges we ran into
The schema validation wall. Getting the flow YAML to pass the hackathon's validate-items CI job took significant iteration. The schema requires a specific format for toolsets, inputs, and prompt variables that is not well-documented. We had to reverse-engineer the correct format by studying other participants' passing flows in the AI Catalog.
Context passing between agents. The Archaeologist needs to pass structured findings to the Synthesizer. GitLab's agent context system uses context:archaeologist.final_answer - but getting the Archaeologist to output clean, structured text that the Synthesizer could reliably parse required careful prompt engineering.
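One way to make that hand-off reliable is to have the first agent emit labelled sections that are trivial to split. The sketch below assumes a hypothetical `LABEL: value` format with a fixed label set; the real prompt's output format is not shown in this write-up.

```python
# Hypothetical section labels, loosely matching the five parts of the MR comment.
LABELS = {"WHY", "DANGER", "EXPERTS", "PRIOR_ATTEMPTS", "BREAKAGES"}


def parse_findings(text):
    """Split 'LABEL: value' lines into a dict; other lines extend the last label."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head, sep, rest = line.partition(":")
        if sep and head in LABELS:
            current = head
            sections[current] = rest.strip()
        elif current is not None:
            # Continuation line: append to the most recent section.
            sections[current] = (sections[current] + " " + line).strip()
    return sections
```

Restricting labels to a known set avoids misreading a continuation line that happens to contain a colon as the start of a new section.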
The project URL problem. When Ghost is triggered via mention, the flow context does not automatically include the full project URL - only the MR number. In our first run, the user had to add the URL to the mention comment by hand. We turned that into the documented convention: the trigger instructions now ask for the project URL in the mention, and Ghost uses it for all subsequent API calls.
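Pulling the URL out of the mention comment is a simple pattern match. The sketch below is a Python illustration under stated assumptions: the regular expression and the `extract_project_url` helper are hypothetical, and in the actual flow this step is handled by the agent's prompt rather than by code.

```python
import re

# Match http(s) URLs up to whitespace or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def extract_project_url(comment):
    """Return the first URL found in the mention comment, or None if absent."""
    match = URL_PATTERN.search(comment)
    if match is None:
        return None
    # Drop sentence punctuation that may trail the URL.
    return match.group(0).rstrip(".,")
```

Returning `None` when no URL is present lets the caller ask the user for it instead of guessing.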
Finding the right demo codebase. A synthetic demo repo would have looked fake. We needed a real codebase with genuinely haunted files - real CVEs, real reverts, real security debates. Express.js was the answer: 14 years of history, a rejected CVE, a same-day revert, and real engineers whose insights are preserved in commit messages.
Time. We built this in under 24 hours, solo.
Accomplishments that we are proud of
Ghost actually works on a real codebase. It is not a demo that shows a pre-scripted output. We triggered it on the real Express.js repository and it found CVE-2024-51999 - a real security story involving a patch, a rejection, and a same-day revert - entirely autonomously. We did not plant that story. Ghost found it.
The institutional memory framing. As far as we could tell, the other submissions in this hackathon accelerate the forward SDLC - faster code review, faster test generation, faster deployment. Ghost looks backward. It mines history to protect the future. That is a genuinely different category.
The personal connection. The same insight that led me to build an AI twin of my father - that the most valuable thing technology can preserve is human knowledge before it disappears - is exactly what Ghost does for engineering teams. Building something technically impressive that also comes from a real place feels like the right kind of hackathon project.
Shipping with Google Cloud. The Firestore and Cloud Run memory layer is not bolted on. It genuinely makes Ghost smarter over time. Every analysis builds on the last. That is a meaningful product decision, not a checkbox.
What we learned
Prompt engineering is the real differentiator. The tools are the same for everyone - list_commits, gitlab_blob_search, get_commit. What makes Ghost's output compelling is the Archaeologist prompt's instruction to be a detective, not a summarizer. That framing change produced dramatically richer, more specific findings.
Real data beats synthetic demos every time. The decision to mirror Express.js instead of building a fake app was the right call. When Ghost finds a real CVE story with real engineers and real commit SHAs, judges and users feel that authenticity immediately.
The GitLab Duo Agent Platform is genuinely powerful. The chained agent architecture, the tool catalog, the AI Catalog registration system - it is a real agentic platform, not a chatbot wrapper. Building on it taught us what ambient agentic AI actually means in practice: agents that live in your workflow and act on triggers, not assistants you have to ask.
Ship early, iterate fast. We had a working flow triggered and producing output within the first few hours. Every improvement after that was refinement, not foundation.
What's next for The Ghost
Automatic trigger on every MR. Right now Ghost requires a manual mention. The next version uses GitLab's pipeline event triggers to automatically run Ghost on every MR that touches files with a history of past breakages - no mention needed.
Cross-project memory. Firestore currently stores findings per project per file. The next version builds a cross-project knowledge graph - so if the same CVE pattern appears across multiple repositories, Ghost surfaces that connection.
Ghost Score. A per-file danger rating based on: number of past breakages, frequency of reverts, number of CVEs, and how long since someone with deep context last touched it. Files with high Ghost Scores get automatic warnings before anyone opens an MR.
The formula for Ghost Score:
$$G = \frac{B \cdot w_b + R \cdot w_r + C \cdot w_c}{D + 1}$$
Where $B$ = past breakages, $R$ = revert count, $C$ = CVE count, $D$ = days since last expert touch, and $w_b$, $w_r$, $w_c$ are weights to be tuned empirically.
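The formula translates directly into code. In this Python sketch the function name and the default weights of 1.0 are placeholders, since no tuned values exist yet.

```python
def ghost_score(breakages, reverts, cves, days_since_expert,
                w_b=1.0, w_r=1.0, w_c=1.0):
    """Compute G = (B*w_b + R*w_r + C*w_c) / (D + 1) for one file."""
    if min(breakages, reverts, cves, days_since_expert) < 0:
        raise ValueError("counts and days must be non-negative")
    weighted = breakages * w_b + reverts * w_r + cves * w_c
    # The +1 keeps the denominator non-zero when an expert touched the file today.
    return weighted / (days_since_expert + 1)
```

With unit weights, a file with 2 breakages, 1 revert, and 1 CVE scores 4.0 if an expert touched it today and 1.0 if the last expert touch was 3 days ago. Note that with $D$ in the denominator the score falls as $D$ grows; if long-neglected files are meant to score higher, the $D$ term would need to move, which is a design choice left open here.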
IDE integration. Surface Ghost findings directly in VS Code or JetBrains before a developer even opens an MR - right when they open a file, Ghost shows the institutional memory sidebar.
The deeper vision: Ghost is what happens when you apply memory preservation technology - the same technology I used to preserve my father's knowledge for my children - to software teams. Every codebase deserves to remember what its engineers knew. Ghost makes that possible.
Built With
- claude-sonnet-4-(anthropic)
- express.js
- gitlab-duo-agent-platform
- google-cloud-firestore
- google-cloud-run
- node.js
- rest
- yaml