The Pit Crew Chief: A Hybrid AI Defect Manager
Codegeist 2025 Submission - Atlassian Williams Racing Edition
1. Executive Summary / Inspiration
In high-performance racing, a mechanical failure isn't just bad luck; it's a breakdown in the process. The same applies to software. "The Pit Crew Chief" is an AI Agent that treats Defects not as isolated incidents, but as "Translation Errors" between High-Level Requirements and Low-Level Implementation. On top of that, it adds a "Senior Engineer Brain" built on new Google research and state-of-the-art (SOTA) ARC Prize techniques.
2. What it does
Unlike standard RAG chatbots that simply summarize Jira tickets, this agent acts as a Translator, Analyst, and Manager to autonomously:
Trace defects back to their originating requirement using Statistical Machine Translation (SMT).
Diagnose the root cause using Causal Logic (Python Code Gen).
Assign the fix to the most qualified engineer using Neural Memory.
3. How we built it (The Core Philosophy: "Bugs are a Foreign Language")
We frame Defect Management as a Machine Translation problem, inspired by brand-new Google translation innovations that address the linguistic gap between high-resource languages and low-resource (rural) languages.
High-Resource Language: Requirements (Jira Stories). Abundant, structured, "Hindi"
Low-Resource Language: Defects (Logs, Error Codes). Scarce, cryptic, "Gondi"
The Mission: To build a Unified Translation Model that aligns these two worlds using Statistical Machine Translation (SMT), treating "Latency Requirement" and "TimeoutException" as synonymous concepts in different languages.
3. How we built it (Architecture Overview)
Our system is a Hybrid Agent composed of three specialized independent modules.
Module A: The Translator (Traceability)
Goal: Rebuild missing links between Defects and Requirements.
Technology: Statistical Machine Translation (SMT).
We use an IBM Model 1 approach (Dice Coefficient + EM Algorithm) to learn a probabilistic lexicon.
It learns that the word "Latency" in a Requirement statistically correlates with "504 Gateway Timeout" in a Defect Log.
Benefit: Works on small datasets ("Low Resource") where Deep Learning fails.
Data Strategy: Pre-trained on SEOSS 33 (Hibernate Project) to learn general open-source code/issue translation patterns.
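The lexicon-learning idea behind Module A can be illustrated with a minimal IBM Model 1 sketch. This is not the project's actual SMT class: the function names, the toy corpus, and the uniform initialisation (used here instead of a Dice-coefficient seed, for brevity) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus: (requirement tokens, defect-log tokens) pairs.
PAIRS = [
    (["latency", "checkout"], ["504", "timeout", "checkout"]),
    (["latency", "search"], ["504", "timeout", "search"]),
    (["login", "checkout"], ["401", "checkout"]),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(defect_token | requirement_token) with EM (IBM Model 1)."""
    defect_vocab = {d for _, defect in pairs for d in defect}
    uniform = 1.0 / len(defect_vocab)
    # Assumption: uniform start; a Dice-coefficient seed could replace this.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-alignment counts
        total = defaultdict(float)  # expected counts per requirement token

        # E-step: distribute each defect token over the requirement tokens
        # of the same pair, proportionally to the current lexicon.
        for req, defect in pairs:
            for d in defect:
                norm = sum(t[(d, r)] for r in req)
                for r in req:
                    frac = t[(d, r)] / norm
                    count[(d, r)] += frac
                    total[r] += frac

        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(
            float, {(d, r): c / total[r] for (d, r), c in count.items()}
        )
    return t


def best_requirement_token(t, defect_token, req_tokens):
    """Return the requirement token most likely to explain a defect token."""
    return max(req_tokens, key=lambda r: t[(defect_token, r)])


lexicon = train_ibm_model1(PAIRS)
print(best_requirement_token(
    lexicon, "timeout", ["latency", "checkout", "search", "login"]
))  # prints "latency"
```

Even on three sentence pairs, EM shifts probability mass so that "timeout" aligns with "latency" rather than with the tokens that merely co-occur with it once, which is the small-data behaviour the module relies on.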
Module B: The Analyst (Root Cause)
Goal: Diagnose why the defect occurred using hard data.
Technology: Causal RAG (Python/Pandas approach). This code-generation approach is SOTA in the ARC Prize (https://arcprize.org/) and the AI Mathematical Olympiad Progress Prize 3 (https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-3), and it massively outperforms other RAG approaches.
Step 1: Code Gen: The Agent generates Python (Pandas) code to extract facts from the defect data (e.g., "Calculate p99 latency from logs").
Step 2: Execution: The code runs in a sandbox to get the exact number (e.g., "1200ms").
Step 3: Consultant: A secondary LLM compares the Fact (1200ms) vs. the Requirement (<500ms) to output a verdict ("Compliance Violation").
Data Strategy: Fine-tuned on iTrust/LibEST (Safety-Critical datasets) to learn strict compliance logic.
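The fact-versus-requirement comparison in Steps 1 to 3 can be sketched as follows. This is a simplified stand-in, not the project's pipeline: the extraction code is hand-written rather than LLM-generated, it uses the standard library instead of Pandas to stay dependency-free, and the function names, the nearest-rank percentile definition, and the sample latencies are assumptions. In Pandas, `Series.quantile(0.99)` would play the role of `p99` (it interpolates by default, so values can differ slightly).

```python
import math


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a list of latencies (assumed definition)."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(99 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def verdict(observed_ms, limit_ms):
    """Stand-in for the 'Consultant' step: requirement is observed < limit."""
    if observed_ms < limit_ms:
        return "Compliant"
    return "Compliance Violation"


# Hypothetical defect data: 98 fast requests and 2 slow outliers.
latencies = [120] * 98 + [1200, 1300]

fact = p99(latencies)          # Step 2: the exact number -> 1200
result = verdict(fact, 500)    # Step 3: compare with the "<500ms" requirement
print(fact, result)            # prints "1200 Compliance Violation"
```

Keeping the numeric fact separate from the verdict mirrors the module's design: the number comes from executed code, and only the comparison is left to the consulting model.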
Module C: The Manager (Assignment)
Goal: Assign the fix to the "Context Owner."
Technology: Titans Neural Memory (mcp-titan).
Mechanism: It uses a Surprise Metric to learn team patterns over time.
Process: It watches the project stream. It "memorizes" that "Jane fixed the last 3 billing bugs."
Recall: When a new billing bug appears, it queries its Long-Term Memory (not just Jira history) to find the implicit owner.
Data Strategy: Validated against AIDev to distinguish between human errors and AI-generated hallucinations.
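To make the "memorize, then recall" behaviour concrete, here is a deliberately simple toy. It is not Titans and not the mcp-titan API: it replaces the neural memory with a count-based memory and defines surprise as the negative log-probability of the observed assignee. The class name, method names, and engineer list are hypothetical.

```python
import math
from collections import defaultdict


class OwnerMemory:
    """Toy count-based stand-in for a long-term 'who fixes what' memory."""

    def __init__(self, engineers):
        self.engineers = list(engineers)
        self.counts = defaultdict(lambda: defaultdict(float))

    def probability(self, component, engineer):
        # Laplace-smoothed estimate of P(engineer | component).
        row = self.counts[component]
        total = sum(row.values())
        return (row[engineer] + 1.0) / (total + len(self.engineers))

    def surprise(self, component, engineer):
        # Assumed surprise metric: high when the fixer is unexpected.
        return -math.log(self.probability(component, engineer))

    def observe(self, component, engineer):
        """Record a fix from the project stream; return how surprising it was."""
        s = self.surprise(component, engineer)
        self.counts[component][engineer] += 1.0
        return s

    def recall(self, component):
        """Return the most likely implicit owner for a component."""
        return max(self.engineers, key=lambda e: self.probability(component, e))


memory = OwnerMemory(["Jane", "Raj", "Mia"])
surprises = [memory.observe("billing", "Jane") for _ in range(3)]
print(memory.recall("billing"))  # prints "Jane"
```

In this toy, the surprise value shrinks with each repeated observation, which is the intuition behind learning team patterns over time: familiar events carry little new information, unexpected ones carry a lot.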
4. Challenges & Accomplishments that we're proud of
Titans & MIRAS (https://research.google/blog/titans-miras-helping-ai-have-long-term-memory/) is a game-changing development. The problem is that it is still theoretical, as Google has not released it yet. We "pre-released" the first application using this brand-new approach.
5. What's next for The Pit Crew Chief
According to studies we found through a Google search, developers spend between 30% and 50% of their time debugging and fixing bugs, or $2.41 trillion annually in the US. For older, complex industrial systems, this share can spike as high as 75%. In bigger companies, managing and root-causing these bugs is a major challenge. We hope "The Pit Crew Chief" delivers a significant productivity improvement, and we want to bring it to the Atlassian platform before the approach even hits the mainstream.
6. Technology Stack
Platform: Atlassian Forge (Custom UI + Rovo Agent; Rovo is not shown in the demo because it is not included in our license, but it is integrated in the code).
Translation Engine: Custom Python SMT Class.
Logic Engine: Ollama/OpenAI GPT-4o (generating Python & Consulting).
Memory Engine: Model Context Protocol Server
Execution: Local Python Runtime (Tunnelled to Forge).
The "How to Install" & Activation Guide
Install the Forge App
Installation Link: Click here to install onto your Jira instance.
Permissions: The app requires access to storage:app and external fetch permissions for *.ngrok-free.app and api.openai.com to communicate with the Hybrid Brain API.
Activate the Senior Engineer Brain (The "Offline" Part)
To witness the SOTA Statistical Machine Translation (SMT) and Titans Neural Memory logic, judges must run the local backend:
Clone the Repo: Download the code from the GitHub Repository.
Run the API: Execute the Python "Brain API" script locally as described in the README.
Bridge the Connection: Use an ngrok tunnel to link the local API to your Forge environment.
Rovo-Ready Interface
Interface: The rovoActionHandler is fully mapped to the Translator, Analyst, and Manager modules.
Actions: Even if Rovo is not enabled on your site, the manifest.yml defines active actions for find-link, diagnose-cause, and recommend-assignee.
🧠 Reflecting Console Demo: Hybrid Architecture
How We Built It
The Hybrid Engine: We built a Hybrid Agent using Atlassian Forge for the Jira integration and a local Python "Brain API" for heavy lifting (SMT and Titans Memory).
Future-Proof Design: Because Rovo was unavailable in our license, we implemented the rovoActionHandler as a future-proof interface while using the test_translator.js suite to prove the AI's efficacy.
Challenges & Intelligent Fallbacks
Licensing Constraints: We navigated the lack of Rovo access by building a robust console-based validation system that demonstrates our logic in real-time.
Resilient Logic: If the local Brain API is offline, the app autonomously switches to Intelligent Fallback Mode:
Translator: Switches from SMT to Bag-of-Words Vector Similarity.
Analyst: Falls back to Rule-based Causal Analysis (e.g., Latency vs. Requirement limits).
Manager: Reverts to a weighted historical context formula:
$$\text{FinalScore} = (\text{HistoricalScore} \times 0.7) + (\text{AvailabilityScore} \times 0.3)$$
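The Translator and Manager fallbacks can be sketched in a few lines. This is an illustrative sketch, not the app's fallback code: the function names, the whitespace tokenisation, and the assumption that both input scores lie in the range 0 to 1 are ours; only the 0.7 and 0.3 weights come from the formula above.

```python
import math
from collections import Counter


def bow_cosine(text_a, text_b):
    """Bag-of-words cosine similarity between two texts (Translator fallback)."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def final_score(historical_score, availability_score):
    """Manager fallback: weighted sum, assuming both scores are in [0, 1]."""
    return historical_score * 0.7 + availability_score * 0.3


# Hypothetical requirement and defect texts.
requirement = "checkout page must respond within 500 ms"
defect = "checkout page timeout after 1200 ms"
print(round(bow_cosine(requirement, defect), 3))

# Hypothetical candidate: strong history on the component, half available.
print(round(final_score(0.9, 0.5), 2))  # prints 0.78
```

For example, an engineer with a historical score of 0.9 and an availability score of 0.5 gets 0.9 × 0.7 + 0.5 × 0.3 = 0.78, so history dominates but a fully booked engineer can still be outranked.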
Built With
- atlassian-forge
- custom-python-smt-class
- local-python-runtime
- memoryengine
- ollama
- openai
- smt