CodeBaseArchaeologist

Inspiration

The inspiration for Codebase Archaeologist came from a universal developer experience: staring at a piece of code and asking "Why is this timeout set to 47000?" or "Why does this file look like this?" During a late-night debugging session, I found myself jumping between git blame, archived issues, and pipeline logs just to understand why a particular configuration was chosen. This manual investigation process was taking hours, and I realized the answer shouldn't require archaeological-level digging - it should be instantly available.

Every developer has faced this problem, yet I could not find a tool that automates this kind of codebase investigation. The "aha!" moment came when I recognized that understanding code decisions requires piecing together evidence from multiple sources: git history, issue discussions, merge request debates, and CI/CD constraints. This became the foundation for the three-agent methodology.

What it does

Codebase Archaeologist is a GitLab Duo Custom Agent that answers the fundamental developer question: "Why does this code look like this?" The agent uses a three-agent methodology, followed by a synthesis step, to conduct comprehensive investigations:

Git History Agent - Analyzes commit chronology, author patterns, and code evolution

Issues & MR Agent - Investigates discussions, code reviews, and decision context

CI Failure Agent - Examines pipeline constraints and technical limitations

Synthesis Agent - Combines all evidence into coherent, evidence-based narratives

The agent can recognize common development patterns like "scaffold-then-fix," incremental refactoring, and bug-driven development, providing developers with instant explanations that would normally take hours of manual investigation. What used to be roughly an $O(n \times m \times k)$ manual search (where $n$ = commits, $m$ = issues, $k$ = CI jobs) becomes, from the developer's point of view, an $O(1)$ step: a single query with a near-instant response.

How we built it

Phase 1: Python Prototype

We started with a Python data pipeline implementing the three-agent methodology:

```python
import asyncio

# Three-agent parallel investigation.
# The three agent objects and synthesize_evidence are not shown here.
async def investigate(file_path: str, question: str):
    # Run all three agents concurrently and wait for every result.
    results = await asyncio.gather(
        git_history_agent.analyze(file_path, question),
        issues_mr_agent.investigate(file_path, question),
        ci_failure_agent.scan(file_path, question),
    )
    return synthesize_evidence(results)
```

The prototype demonstrated that parallel investigation across multiple data sources could gather comprehensive evidence for code decisions.
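To make the shape of that pipeline easier to try out, here is a minimal, self-contained sketch with stub agents. Everything in it is an assumption for illustration: the `Evidence` type, the three stub functions, and the returned dictionary are hypothetical and are not the prototype's real code. It also passes `return_exceptions=True` to `asyncio.gather` so that one failing agent does not discard the others' findings, which may or may not match what the prototype did.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str   # which agent produced this item
    summary: str  # one-line finding


# Hypothetical stubs; real agents would query the repository, issues, and CI.
async def git_history(file_path: str, question: str) -> list[Evidence]:
    await asyncio.sleep(0)
    return [Evidence("git", f"2 commits touched {file_path}")]


async def issues_mr(file_path: str, question: str) -> list[Evidence]:
    await asyncio.sleep(0)
    return [Evidence("issues_mr", "1 merge request discussed the change")]


async def ci_failures(file_path: str, question: str) -> list[Evidence]:
    # Simulates an agent that fails, to show the others still report.
    raise RuntimeError("pipeline API unavailable")


async def investigate(file_path: str, question: str) -> dict:
    agents = (git_history, issues_mr, ci_failures)
    # return_exceptions=True returns raised exceptions as results
    # instead of aborting the whole gather call.
    results = await asyncio.gather(
        *(agent(file_path, question) for agent in agents),
        return_exceptions=True,
    )
    evidence, errors = [], []
    for agent, result in zip(agents, results):
        if isinstance(result, Exception):
            errors.append(f"{agent.__name__}: {result}")
        else:
            evidence.extend(result)
    return {"question": question, "evidence": evidence, "errors": errors}


report = asyncio.run(investigate("config.js", "Why is the timeout 47000?"))
print(report["errors"])
```

Collecting errors alongside evidence lets a synthesis step say which sources were unavailable rather than silently presenting a partial picture.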

Phase 2: GitLab Duo Native Solution

When faced with API credit limitations, we pivoted to building a native GitLab Duo Custom Agent:

Agent Configuration: Created .gitlab/agents/archaeologist/config.yaml with a detailed system prompt and a digital archaeologist persona

Flow Orchestration: Built .gitlab/flows/archaeologist-flow.yaml to coordinate the three-agent methodology

Tool Integration: Configured GitLab's built-in tools for repository access, issues/MRs, and CI/CD data

Pattern Recognition: Implemented identification of common development workflows

Phase 3: Demo Implementation

We used the redux-mock-store repository as a test case. It shows a classic "scaffold-then-fix" pattern: three commits from March 20, 2026, with initial scaffolding followed by rapid configuration fixes.
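The agent itself recognizes patterns through its prompt, but the idea behind "scaffold-then-fix" can be sketched as a plain heuristic. The version below is only an assumption about what such a rule might look like: the `Commit` type, the `FIX_WORDS` keywords, the 24-hour window, and the sample commits are all hypothetical and are not taken from the demo repository.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commit:
    sha: str
    timestamp: datetime
    message: str
    files: frozenset[str]


# Hypothetical keywords that mark a commit as a follow-up fix.
FIX_WORDS = ("fix", "correct", "adjust")


def is_scaffold_then_fix(commits: list[Commit],
                         window: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the earliest commit is followed, within `window`,
    by a fix-style commit that touches at least one of the same files."""
    if len(commits) < 2:
        return False
    ordered = sorted(commits, key=lambda c: c.timestamp)
    first, followups = ordered[0], ordered[1:]
    for c in followups:
        close_in_time = c.timestamp - first.timestamp <= window
        looks_like_fix = any(w in c.message.lower() for w in FIX_WORDS)
        overlaps = bool(c.files & first.files)
        if close_in_time and looks_like_fix and overlaps:
            return True
    return False


# Made-up history for illustration only.
history = [
    Commit("a1", datetime(2024, 1, 1, 10, 0), "Initial scaffold",
           frozenset({"package.json", "index.js"})),
    Commit("b2", datetime(2024, 1, 1, 10, 20), "Fix build config",
           frozenset({"package.json"})),
    Commit("c3", datetime(2024, 1, 1, 10, 45), "Fix lint settings",
           frozenset({".eslintrc"})),
]
print(is_scaffold_then_fix(history))
```

A keyword rule like this is brittle compared with a language model reading the diffs, but it makes explicit the three signals involved: ordering, time proximity, and file overlap.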

Challenges we ran into

API Credit Limitations: The initial Python prototype hit a critical error: "Your credit balance is too low to access the Anthropic API." This constraint forced a complete architectural rethink and ultimately led to a superior native solution.

GitLab Duo Agent Platform Learning Curve: The GitLab Duo Agent Platform was new and required mastering YAML-based agent configuration, flow orchestration patterns, and tool integration specifics.

Evidence Synthesis Complexity: Combining information from three different data sources into coherent narratives required careful prompt engineering to ensure the agent could correlate commits with related issues and connect CI failures to code changes.

Pattern Recognition Implementation: Identifying development patterns from git data required understanding software engineering workflows and translating them into AI-recognizable patterns.
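The evidence synthesis challenge above comes down to joining three sources on shared keys. As a rough sketch, assuming commits reference issues with GitLab's `#123` notation and CI jobs record the commit SHA they ran against, the join might look like the following. The dictionary shapes, the `correlate` function, and the sample data are hypothetical; in the actual agent this correlation is done through prompt engineering, not code.

```python
import re
from collections import defaultdict

# Matches issue references such as "#12" in commit messages.
ISSUE_REF = re.compile(r"#(\d+)")


def correlate(commits, issues, failed_jobs):
    """Group evidence by commit SHA.

    commits:     list of {"sha": str, "message": str}
    issues:      dict mapping issue number -> title
    failed_jobs: list of {"sha": str, "job": str}
    """
    failures_by_sha = defaultdict(list)
    for job in failed_jobs:
        failures_by_sha[job["sha"]].append(job["job"])

    timeline = []
    for commit in commits:
        refs = [int(n) for n in ISSUE_REF.findall(commit["message"])]
        timeline.append({
            "sha": commit["sha"],
            # Keep only references that resolve to a known issue.
            "issues": [issues[n] for n in refs if n in issues],
            "failed_jobs": failures_by_sha.get(commit["sha"], []),
        })
    return timeline


# Made-up data for illustration only.
commits = [
    {"sha": "a1", "message": "Raise timeout, see #12"},
    {"sha": "b2", "message": "Tidy imports"},
]
issues = {12: "Integration tests time out on slow runners"}
failed_jobs = [{"sha": "a1", "job": "integration-test"}]

timeline = correlate(commits, issues, failed_jobs)
for entry in timeline:
    print(entry)
```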

Accomplishments that we're proud of

Technical Innovation: Built entirely within the GitLab ecosystem with no external API dependencies, demonstrating creative use of the GitLab Duo Agent Platform.

Three-Agent Methodology: Developed a parallel investigation approach that provides comprehensive coverage across git history, issues/MRs, and CI failures.

Pattern Recognition: Implemented identification of common development workflows like scaffold-then-fix, incremental refactoring, and bug-driven development.

Evidence-Based Responses: Every conclusion is backed by specific commits, issues, or pipeline data, ensuring credibility and accuracy.

Problem-Solving Achievement: Turned the API credit limitation into a strength by building natively, making the solution more accessible and cost-effective.

Real Developer Impact: Solved a genuine problem that every developer experiences, transforming hours of manual investigation into instant AI-powered explanations.
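The "Evidence-Based Responses" point above can be expressed as a simple invariant: a conclusion without a citation is not allowed to exist. The sketch below shows one way to enforce that in a data model. The `Citation` and `Finding` types and the `render` helper are hypothetical and only illustrate the principle; the agent enforces it through its system prompt.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    kind: str  # "commit", "issue", or "pipeline"
    ref: str   # e.g. a commit SHA or an issue number


@dataclass
class Finding:
    claim: str
    citations: list[Citation] = field(default_factory=list)

    def __post_init__(self):
        # Refuse to construct a conclusion that cites nothing.
        if not self.citations:
            raise ValueError(f"uncited claim rejected: {self.claim!r}")


def render(finding: Finding) -> str:
    """Format a finding with its sources appended in brackets."""
    refs = ", ".join(f"{c.kind} {c.ref}" for c in finding.citations)
    return f"{finding.claim} [{refs}]"


# Made-up example for illustration only.
finding = Finding(
    "Timeout was raised after CI flakiness",
    [Citation("commit", "a1"), Citation("pipeline", "9001")],
)
print(render(finding))
```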

What we learned

Native Integration Trumps External APIs: Building within GitLab eliminated dependencies and provided superior data access compared to external AI APIs.

Three-Agent Methodology Works: Investigating multiple data sources in parallel provides comprehensive coverage that analysis of any single source misses.

Persona-Driven AI Matters: Giving the agent a clear "digital archaeologist" persona significantly improves response quality and consistency.

Constraints Drive Innovation: Limitations often lead to more elegant solutions - the API credit constraint forced a better architectural approach.

Evidence-Based AI is Crucial: Every conclusion must be backed by specific data for credibility in technical explanations.

Hackathon Requirements Are Features: Mandatory constraints can become competitive advantages when approached creatively.

What's next for CodeBase Archaeologist

Version 2.0 Features: Multi-repository analysis for cross-project dependency tracing, team pattern recognition to identify team-specific workflows, automated documentation generation, IDE integrations for VS Code and JetBrains, and performance metrics with codebase health scoring.

Enterprise Capabilities: Compliance analysis for SOC2 and GDPR impact tracing, security audit trail documentation, cost analysis with technical debt quantification, and an onboarding assistant for new developer orientation.

Platform Expansion: Beyond GitLab to GitHub integration, multi-language code analysis support, custom pattern definition for teams, and a RESTful API for integration with other development tools.

Community & Open Source: Pattern library for community-contributed development patterns, agent marketplace for sharing custom agents, academic partnerships for software engineering research, and training programs for codebase investigation best practices.

The vision is to make Codebase Archaeologist an indispensable tool for software development teams, preserving institutional knowledge and making codebase understanding accessible to everyone - from junior developers to senior architects.

Built With

gitlab
gitlabduo
python
yaml

Updates

Hack codes started this project — Mar 25, 2026 07:57 AM EDT
