💡 Inspiration

Every developer knows the exact moment of friction: you push code, grab a coffee, come back to your desk, and your continuous integration pipeline is flashing bright red. You leave your code editor, jump into a massive wall of raw terminal logs, spend minutes hunting down a missing dependency or a syntax lint error, manually write a hotfix patch, create a branch, push it, and wait. Again.

We got tired of that repetitive, focus-breaking loop. That frustration is what sparked Axolotl—an autonomous, self-healing software engineering agent designed to eliminate the manual cycle of diagnosing and repairing common pipeline failures. Our goal was to build a system that acts in under a minute, doing the heavy lifting of failure log parsing and patch generation automatically, while keeping the human engineer firmly in control of the final deployment.


🎯 What it Does

Axolotl operates as an event-driven reactive agent that monitors GitLab CI/CD pipelines in real time. The moment a pipeline status updates to failed, Axolotl triggers its execution lifecycle:

  • Webhook Ingestion & Validation: Receives pipeline alerts via secure webhooks, verifies each request cryptographically, and applies circuit breakers that stop the agent from reacting to failures on its own fix branches.
  • Log Ingestion & Slicing: Dynamically connects via a Model Context Protocol (MCP) tool layer to pull raw failed job traces into memory buffers.
  • AI Root-Cause Diagnostics: Dispatches the aggregated log slices, project configurations, and file schemas to Google Gemini 2.5 Flash to generate a deep technical diagnostic analysis.
  • Deterministic Remediation: Produces a structured patch, provisions an isolated branch namespace (axolotl/fix/<pipeline_id>), commits the code modifications, and opens a fully documented Merge Request.
  • Human-in-the-Loop Enforced Gate: Locks the deployment behind a strict, non-bypassable validation gate on a real-time WebSocket dashboard, leaving the final Approve & Merge or Reject decision to a human developer.
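
The first step of that lifecycle, deciding whether an incoming webhook should be processed at all, can be sketched roughly as below. This is a minimal illustration, not our production code: it assumes the check is GitLab's secret-token header (X-Gitlab-Token) compared in constant time, and the function names are hypothetical.

```python
import hmac
from typing import Optional


def is_authentic(received_token: Optional[str], expected_token: str) -> bool:
    """Compare the webhook secret token in constant time."""
    if received_token is None:
        return False
    return hmac.compare_digest(received_token.encode(), expected_token.encode())


def should_process(headers: dict, payload: dict, expected_token: str) -> bool:
    """Accept only authenticated pipeline events whose status is 'failed'."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    if not is_authentic(normalized.get("x-gitlab-token"), expected_token):
        return False
    if payload.get("object_kind") != "pipeline":
        return False
    attributes = payload.get("object_attributes", {})
    return attributes.get("status") == "failed"
```

Anything that fails these checks is dropped before logs are fetched or the model is called.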

🛠️ How We Built It

Axolotl is built using a decoupled, highly responsive event-driven micro-architecture designed for low latency and continuous observability:

  • The Asynchronous Broker Backend: Developed using FastAPI and Uvicorn in Python 3.12. The webhook endpoint acknowledges each request immediately and hands the slow work (log retrieval, diagnosis, patching) to asynchronous background tasks, preventing timeouts during heavy concurrent build phases.
  • Agent Reasoning & Tool Use: Orchestrated via the Google ADK (Agent Development Kit). Instead of using raw, error-prone REST API integrations, we implemented the modern Model Context Protocol (MCP) via stdio transport. The agent leverages specific tools to query logs and manipulate the target git structures seamlessly.
  • Real-Time Observability Frontend: Built using Next.js 16 (App Router), TypeScript 5.7, and Tailwind CSS 4. The interface opens a native persistent WebSocket connection back to the backend server, transforming incoming events into a live, 8-stage visual progress timeline accompanied by a raw terminal-style log feed.
  • Persistence Layer: Powered by MongoDB Atlas. We created clean, schema-enforced collections utilizing async Motor drivers to maintain historical metrics, system events, user settings, and project states.
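
The acknowledge-first, work-later pattern behind the backend can be illustrated with plain asyncio. This sketch deliberately uses only the standard library rather than FastAPI's own background-task API, and every name in it (handle_webhook, diagnose_and_patch) is hypothetical.

```python
import asyncio

# Keep references to running tasks so they are not garbage-collected early.
background_tasks = set()


async def diagnose_and_patch(pipeline_id: int, results: list) -> None:
    """Stand-in for the slow work: fetch logs, call the model, open the MR."""
    await asyncio.sleep(0.01)
    results.append(pipeline_id)


async def handle_webhook(pipeline_id: int, results: list) -> dict:
    """Schedule the slow work and return an acknowledgement right away."""
    task = asyncio.create_task(diagnose_and_patch(pipeline_id, results))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"status": "accepted", "pipeline_id": pipeline_id}


async def demo():
    results = []
    response = await handle_webhook(42, results)
    seen_at_ack = list(results)  # snapshot taken before the task has run
    await asyncio.gather(*background_tasks)
    return response, seen_at_ack, results
```

The handler returns before any diagnosis has happened, which is what keeps the webhook sender from timing out.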

🚧 Challenges We Ran Into

  • Bridging AI Intentions with Git Mechanics: The hardest architectural hurdle was integrating the GitLab MCP layer with our agent workflow loop. The LLM works in abstract terms: it outputs a FixProposal containing a text narrative of the root cause and a string of updated content. GitLab works in rigid, atomic API states: remote tree refs, blobs, and commit validation rules. Translating the model's patches into deterministic Git trees, while slicing multi-line logs into chunks without losing context, required intensive engineering iteration.
  • Schema Drift in Parallel Workflows: Building the backend event publisher, the agent decision tree, and the frontend terminal layout concurrently meant that any accidental alteration to shared data models could cause a cascading break across our development team. To combat this, we treated data structures like PipelineFailure, FixProposal, and Event as immutable contracts. We established a strict rule: zero schema modifications were permitted without explicit team review.
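
The review rule above is a process constraint, but the same idea can also be mirrored in code. Below is a minimal sketch using frozen dataclasses; the field names are assumptions for illustration (the text only states that a FixProposal carries a root-cause narrative and updated content), not our actual schemas.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class PipelineFailure:
    # Hypothetical fields describing a failed pipeline.
    project_id: int
    pipeline_id: int
    ref: str
    log_excerpt: str


@dataclass(frozen=True)
class FixProposal:
    # Root-cause narrative plus replacement content; file_path is assumed.
    pipeline_id: int
    root_cause: str
    file_path: str
    updated_content: str


@dataclass(frozen=True)
class Event:
    # Hypothetical shape of a progress event sent to the dashboard.
    stage: str
    message: str


def try_to_mutate(proposal: FixProposal) -> bool:
    """Return True if mutation was blocked, as expected for a frozen model."""
    try:
        proposal.root_cause = "changed"  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False
```

Freezing instances only prevents accidental mutation at runtime. Changing the field list itself still has to go through team review.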

🏆 Accomplishments We're Proud Of

  • Sub-Minute Mean Time to Repair (MTTR): We reduced the time from receiving GitLab's failed-pipeline webhook to an open Merge Request containing the patch to less than 60 seconds. The merge itself still waits on human approval.
  • Bulletproof Safety Circuit Breakers: We built a guard that keeps the agent from triggering infinite self-referential failure loops. The engine automatically rejects any failure that originates on a branch matching its own fix-branch pattern (^axolotl/fix/.*$).
  • Elimination of the "Black Box" Problem: By building a native WebSocket pipeline, we mapped the agent's interior lifecycle directly onto an elegant dark-themed UI timeline, keeping every granular operational trace fully visible to the developer.
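
The self-loop circuit breaker comes down to one regular-expression check on the branch that failed. A minimal sketch, assuming the pattern and branch naming quoted above; the helper names are hypothetical.

```python
import re

# Pattern for branches the agent creates itself.
SELF_BRANCH = re.compile(r"^axolotl/fix/.*$")


def fix_branch_name(pipeline_id: int) -> str:
    """Build the isolated branch name for a given failed pipeline."""
    return f"axolotl/fix/{pipeline_id}"


def is_self_inflicted(ref: str) -> bool:
    """True when the failing pipeline ran on one of the agent's own branches."""
    return SELF_BRANCH.match(ref) is not None
```

Because the pattern is anchored at the start, a branch such as feature/axolotl/fix/1 is still treated as a normal developer branch.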

🧠 What We Learned

  • Prompt Engineering is a Discipline of Precision: Prompt design for deterministic code repair cannot rely on generic reasoning instructions. Specificity, explicit JSON output constraints, and a strictly specified output structure that the backend can parse matter enormously.
  • MCP Extensibility Replaces Legacy Wrappers: Leveraging Model Context Protocol client-server structures makes external platform automation dramatically cleaner, safer, and more modular than manually coding legacy REST API wrapper methods.
  • Observability is Mandatory for Agents: In autonomous workflows, traditional line-by-line debugging falls short. Integrating pluggable tracing infrastructure via Arize Phoenix was critical; without deep trace observability, debugging the agent's decision paths and shifts in confidence would have been nearly impossible.
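
The point about explicit JSON output constraints is easiest to see on the parsing side: strict prompts only pay off if the backend rejects anything that deviates. The validator below is a rough sketch under assumed field names (root_cause, file_path, updated_content), not our real parser.

```python
import json

# Hypothetical required fields and their expected types.
REQUIRED_FIELDS = {"root_cause": str, "file_path": str, "updated_content": str}


def parse_fix_proposal(raw: str) -> dict:
    """Parse model output, rejecting anything outside the agreed structure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            raise ValueError(f"missing or mistyped field: {name}")
    unexpected = set(data) - set(REQUIRED_FIELDS)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    return data
```

Failing loudly here means a malformed answer never reaches the commit step.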

🔮 What's Next for Axolotl

While our MVP successfully automates standard developer pain points like missing dependencies (ModuleNotFoundError), code styling violations, and simple lint adjustments, our architecture is intentionally configured to expand:

  • Dynamic Confidence Scoring: We aim to implement a runtime evaluation algorithm that automatically rejects patches falling below a calculated metric threshold, ensuring low-confidence proposals are completely filtered out before reaching the UI:

$$C_{\text{calculated}} < C_{\text{threshold}} \implies \text{Drop FixProposal}$$

  • Reinforcement Learning from HITL Actions: We plan to store human validation histories (Approve vs. Reject telemetry tags) to construct localized training sets, allowing the underlying agent models to learn specific code preference nuances per repository over time.
  • Multi-Platform Proliferation: We are extending our base orchestrator core beyond GitLab webhooks to support native GitHub Actions, Bitbucket Pipelines, and customized enterprise CI systems.
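
The confidence gate in the first item above is not built yet, but the drop rule itself is simple to sketch. The threshold value of 0.8 and the (name, confidence) pair layout below are purely illustrative assumptions; how the confidence score is computed is still open.

```python
def passes_confidence_gate(confidence: float, threshold: float = 0.8) -> bool:
    """Keep a proposal only if its confidence is not below the threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return confidence >= threshold


def filter_proposals(scored: list, threshold: float = 0.8) -> list:
    """Drop every (name, confidence) pair whose score is below the threshold."""
    return [name for name, confidence in scored
            if passes_confidence_gate(confidence, threshold)]
```

A proposal scoring exactly at the threshold is kept, matching the strict inequality in the drop rule.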
