🚀 RepoPilot — Vectorized CI/CD Intelligence System

Transforming CI/CD from reactive debugging into a continuously learning system using AI agents + semantic memory.


📓 Abstract

CI/CD pipelines repeatedly fail for familiar reasons, yet most systems treat each failure as new.

RepoPilot introduces a semantic memory layer that allows pipelines to learn from past fixes.

By combining AI agents with vector-based retrieval, RepoPilot diagnoses each new failure against previously solved ones, so debugging gets faster and needs less manual effort as the failure history grows.


📍 The Problem — CI Has No Memory

In most development teams, CI failures follow a predictable pattern:

  1. A build fails
  2. A developer investigates
  3. The issue is fixed
  4. The solution disappears into commit history

Weeks later, a similar failure appears again.

  • The logs exist
  • The fix exists
  • But the connection between them is lost

CI systems detect failures efficiently — but they do not understand them.

That gap is where RepoPilot begins.


🎯 Core Idea — Create a Semantic Memory Layer

Instead of treating CI logs as disposable text, RepoPilot stores:

  • Raw CI failure logs
  • AI-generated root cause analysis
  • Code changes that fixed the issue
  • Repository metadata
  • Pull request links
  • Timestamp

When a new build fails, RepoPilot searches past failures by meaning — not just keywords.


🤖 What is RepoPilot?

RepoPilot is an AI-powered CI/CD fix agent that:

  • 🎧 Listens to GitHub webhook events
  • 🔍 Monitors CI workflow failures
  • 🤖 Analyzes logs using AI agents
  • 🛠 Generates code fixes
  • 🔁 Opens pull requests automatically
  • 🧠 Stores failure + fix history in Elasticsearch

This transforms CI from reactive debugging into a continuously learning system.


🧠 Core Innovation — Semantic Failure Memory

Each CI failure is stored as a structured document containing:

  • error_text (semantic_text field)
  • AI-generated root cause analysis
  • Code changes that fixed the issue
  • Repository metadata
  • Pull request URL
  • Timestamp

Instead of keyword matching, RepoPilot uses semantic similarity search.

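This means it can retrieve failures that mean the same thing even when the log text differs, for example the same dependency error reported with different file paths, line numbers, or package versions.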
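The structured document described above could be modeled roughly as follows. This is a minimal sketch, not RepoPilot's actual schema: only `error_text` is named in this write-up, and every other field name, along with the `FailureRecord` class and `new_record` helper, is a hypothetical choice for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class FailureRecord:
    """One stored CI failure. Field names other than error_text are hypothetical."""
    error_text: str        # trimmed CI failure log, indexed as a semantic_text field
    root_cause: str        # AI-generated root cause analysis
    fix_diff: str          # code changes that fixed the issue
    repository: str        # repository metadata, e.g. "owner/name"
    pull_request_url: str  # link to the pull request containing the fix
    timestamp: str         # ISO 8601 timestamp in UTC

    def to_document(self) -> dict:
        """Convert the record into a plain dict ready to be indexed."""
        return asdict(self)


def new_record(error_text, root_cause, fix_diff, repository,
               pull_request_url, now=None) -> FailureRecord:
    """Build a record, stamping it with the current UTC time unless one is given."""
    now = now or datetime.now(timezone.utc)
    return FailureRecord(
        error_text=error_text,
        root_cause=root_cause,
        fix_diff=fix_diff,
        repository=repository,
        pull_request_url=pull_request_url,
        timestamp=now.isoformat(),
    )
```

Keeping the record as a flat dataclass makes it easy to serialize with `to_document()` and hand to whatever indexing call is used.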


🏗 Architecture Overview

🔹 Step 1 — GitHub Webhook Trigger

When a workflow fails:

  • GitHub sends a webhook event
  • RepoPilot receives it via FastAPI
  • Logs and metadata are collected
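The webhook step can be sketched with the standard library alone. The code below assumes the GitHub App subscribes to `workflow_run` events and is configured with a webhook secret; the function names and the shape of the returned dict are hypothetical. In a FastAPI handler, `body` would be the raw request bytes, and the two header values would come from `X-Hub-Signature-256` and `X-GitHub-Event`.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the expected value
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def extract_failure(event_name: str, body: bytes):
    """Return metadata for a failed workflow run, or None for any other event."""
    if event_name != "workflow_run":
        return None
    payload = json.loads(body)
    run = payload.get("workflow_run", {})
    # Only completed runs that concluded with a failure are of interest
    if payload.get("action") != "completed" or run.get("conclusion") != "failure":
        return None
    return {
        "repository": payload["repository"]["full_name"],
        "run_id": run["id"],
        "head_sha": run["head_sha"],
        "run_url": run["html_url"],
    }
```

Verifying the signature before parsing the payload means unauthenticated requests are rejected without any further processing.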

🔹 Step 2 — AI Analysis with CrewAI

RepoPilot runs CrewAI agents to:

  • Parse CI logs
  • Identify root cause
  • Generate patch suggestions
  • Provide fix explanation

The agents return a structured result (root cause, patch, explanation) that can be turned directly into a pull request.
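One way to keep agent output structured is to validate it before acting on it. The sketch below does not use CrewAI itself. It assumes, purely for illustration, that the agents are asked to reply with a JSON object holding `root_cause`, `patch`, and `explanation` keys; the `FixAnalysis` type and `parse_analysis` helper are hypothetical names.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("root_cause", "patch", "explanation")


@dataclass(frozen=True)
class FixAnalysis:
    """Validated result of one analysis run (hypothetical shape)."""
    root_cause: str
    patch: str
    explanation: str


def parse_analysis(raw: str) -> FixAnalysis:
    """Parse the agent's JSON reply, rejecting incomplete or malformed output."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("agent output must be a JSON object")
    missing = [
        key for key in REQUIRED_FIELDS
        if not isinstance(data.get(key), str) or not data[key].strip()
    ]
    if missing:
        raise ValueError(f"agent output missing fields: {', '.join(missing)}")
    return FixAnalysis(**{key: data[key] for key in REQUIRED_FIELDS})
```

Rejecting incomplete output early prevents a pull request from being opened with an empty patch or no explanation.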


🔹 Step 3 — Semantic Indexing in Elasticsearch

The failure is stored using a semantic_text field.

This enables:

  • Automatic vector embeddings
  • Meaning-based similarity matching
  • Fast top-K retrieval of related past failures
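A rough sketch of the indexing step follows. It assumes an Elasticsearch 8.x Python client is passed in as `client` and that the cluster has an inference endpoint available for `semantic_text`; depending on the Elasticsearch version, the field may also need an explicit `inference_id`. The index name and every field other than `error_text` are hypothetical.

```python
INDEX_NAME = "ci-failures"  # hypothetical index name

# Only error_text is embedded; the other fields are stored for display and filtering.
INDEX_MAPPINGS = {
    "properties": {
        "error_text": {"type": "semantic_text"},
        "root_cause": {"type": "text"},
        "fix_diff": {"type": "text"},
        "repository": {"type": "keyword"},
        "pull_request_url": {"type": "keyword"},
        "timestamp": {"type": "date"},
    }
}


def ensure_index(client) -> None:
    """Create the failure index with the semantic mapping if it does not exist yet."""
    if not client.indices.exists(index=INDEX_NAME):
        client.indices.create(index=INDEX_NAME, mappings=INDEX_MAPPINGS)


def store_failure(client, document: dict):
    """Index one failure document; embeddings for error_text are generated at ingest."""
    return client.index(index=INDEX_NAME, document=document)
```

Because the embedding is produced by Elasticsearch when the document is ingested, the application code never has to call an embedding model directly.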

🔹 Step 4 — Retrieval on New Failure

When a new build fails:

  • Logs are trimmed and submitted as a semantic query
  • Elasticsearch returns the top-K most similar past failures along with their fixes
  • RepoPilot uses those results to improve fix generation

The more failures are stored, the more past fixes each new failure can draw on.

This creates a continuous learning feedback loop.
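The retrieval step might look like the sketch below. It assumes the failure log was indexed in a `semantic_text` field called `error_text` and builds a search body using Elasticsearch's `semantic` query. The trimming strategy (keep the tail of the log, where errors usually appear), the 2000-character limit, and the returned source fields are illustrative assumptions, not RepoPilot's confirmed behavior.

```python
def trim_log(log: str, max_chars: int = 2000) -> str:
    """Keep the tail of the log, dropping a partial first line if one was cut."""
    log = log.strip()
    if len(log) <= max_chars:
        return log
    tail = log[-max_chars:]
    newline = tail.find("\n")
    # Skip the leading fragment so the query starts on a whole line
    return tail[newline + 1:] if newline != -1 else tail


def build_similar_failures_query(log: str, top_k: int = 3) -> dict:
    """Build a search body that returns the top_k most similar past failures."""
    return {
        "size": top_k,
        "query": {
            "semantic": {
                "field": "error_text",
                "query": trim_log(log),
            }
        },
        # Hypothetical field names: return only what fix generation needs
        "_source": ["root_cause", "fix_diff", "pull_request_url"],
    }
```

The returned dict is meant to be sent as the body of a search request against the failure index; the hits can then be passed to the agents as context for the new fix.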


⚙️ Tech Stack

  • Backend: FastAPI
  • AI Agents: CrewAI
  • Vector Search: Elasticsearch (semantic_text)
  • Integration: GitHub App + Webhooks
  • Automation: Pull Request generation

🏆 Key Achievements

  • Built a fully functional GitHub App
  • Implemented AI agent-based CI log analysis
  • Designed a semantic memory architecture
  • Enabled automated PR generation from CI failures
  • Created a self-improving debugging loop

🚀 Future Improvements

  • CI-specific fine-tuned embeddings
  • Cross-repository failure intelligence
  • Slack / Teams integration
  • Patch confidence scoring
  • Enterprise dashboard analytics

🧠 Vision

RepoPilot turns CI from a reactive system into a learning organism.

Instead of a developer asking:

"Why did this fail again?"

the system responds:

"I've seen this before. Here's the fix."


⭐ If you like this project, consider giving it a star!
