Elastic Contributor Co-Pilot

Inspiration

Large open-source and enterprise GitHub repositories receive hundreds of pull requests and issues every month. While this scale drives innovation, it also introduces friction. New contributors often struggle to determine whether their work duplicates existing efforts, which coding conventions apply, or who should review their changes. Maintainers, meanwhile, spend significant time repeating manual triage steps — searching for similar issues, reviewing changes against standards, assessing impact, and resolving reviewer disagreements.

We were inspired to design a system that automates this repetitive triage workflow while preserving quality, explainability, and repository flexibility. The goal was not just automation, but intelligent, context-aware assistance that works across any repository.

What It Does

Elastic Contributor Co-Pilot is a configurable, multi-agent AI system that automates pull request and issue triage for GitHub repositories.

When a contributor opens a PR or issue, a GitHub webhook triggers a pipeline of specialized AI agents built with Elastic Agent Builder. Each agent performs one focused reasoning task: retrieving context, validating standards, estimating impact, or resolving conflicts. The system then generates a structured, explainable report and posts it as a GitHub comment, so contributors and reviewers can make faster, better-informed decisions.
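The webhook entry point described above can be sketched as follows. This is a minimal illustration, not the project's actual handler: the function names and the `TRIAGE_TRIGGERS` set are assumptions, while the `X-Hub-Signature-256` header and its `sha256=<hex HMAC>` format are how GitHub signs webhook deliveries.

```python
import hashlib
import hmac

# Event/action pairs that should start a triage run (an illustrative choice).
TRIAGE_TRIGGERS = {("pull_request", "opened"), ("issues", "opened")}


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Validate the X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature_header or "")


def should_triage(event: str, payload: dict) -> bool:
    """event is the X-GitHub-Event header value; payload is the parsed JSON body."""
    return (event, payload.get("action")) in TRIAGE_TRIGGERS
```

Rejecting unsigned or mis-signed deliveries before any agent runs keeps arbitrary HTTP callers from triggering the pipeline.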

Because coding standards, benchmarks, and workflows are configurable rather than hard-coded, the same architecture can be applied across repositories.

How We Built It

We implemented a chained multi-agent architecture using Elastic Agent Builder, with each agent responsible for a well-defined task:

Agent 1 – Context Retriever: Uses ELSER semantic search to retrieve related past issues, pull requests, and discussions based on meaning rather than keyword overlap. It also identifies potential reviewers using repository metadata such as CODEOWNERS.
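A rough sketch of the two lookups this agent performs, under stated assumptions. The field name `content_semantic` is hypothetical and assumes a `semantic_text` field backed by ELSER, queried with the Elasticsearch `semantic` query. The CODEOWNERS matcher is deliberately simplified: it keeps the "last matching rule wins" behavior but does not reproduce GitHub's full pattern semantics.

```python
from fnmatch import fnmatch


def build_semantic_query(text: str, field: str = "content_semantic", size: int = 5) -> dict:
    """Search body for a semantic_text field (field name is an assumption)."""
    return {"size": size, "query": {"semantic": {"field": field, "query": text}}}


def _matches(pattern: str, path: str) -> bool:
    # Simplified matching: treats every pattern as rooted at the repository
    # top level, and fnmatch's '*' also matches '/', unlike real CODEOWNERS.
    anchored = pattern.lstrip("/")
    if anchored.endswith("/"):
        return path.startswith(anchored)                 # directory rule
    if "/" not in anchored:
        return fnmatch(path.rsplit("/", 1)[-1], anchored)  # file-name rule
    return fnmatch(path, anchored)


def suggest_reviewers(codeowners_text: str, changed_paths: list) -> list:
    """Return owners for the changed paths; the last matching rule wins."""
    rules = []
    for line in codeowners_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            pattern, *owners = line.split()
            rules.append((pattern, owners))
    reviewers = []
    for path in changed_paths:
        owners = []
        for pattern, rule_owners in rules:
            if _matches(pattern, path):
                owners = rule_owners
        for owner in owners:
            if owner not in reviewers:
                reviewers.append(owner)
    return reviewers
```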

Agent 2 – Architecture Critic: Analyzes PR diffs against configurable coding and architectural standards, flagging potential violations with severity indicators and quality scores.
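To make "configurable standards with severity indicators and quality scores" concrete, here is a small rule-based stand-in. The real agent reasons with an LLM; the rules, penalty weights, and 0-100 scoring formula below are all hypothetical and only illustrate the shape of the output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str   # regex applied to each added line
    severity: str  # "low", "medium", or "high"


# Hypothetical penalty weights and example rules; a repository would supply its own.
PENALTY = {"low": 2, "medium": 5, "high": 15}
RULES = [
    Rule("no-print-debugging", r"\bprint\(", "low"),
    Rule("no-bare-except", r"\bexcept\s*:", "high"),
]


def review_diff(diff: str, rules=RULES):
    """Return (findings, score) for the added lines of a unified diff."""
    findings = []
    for line in diff.splitlines():
        # Only added lines count; '+++' is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for rule in rules:
            if re.search(rule.pattern, added):
                findings.append(
                    {"rule": rule.name, "severity": rule.severity, "line": added.strip()}
                )
    score = max(0, 100 - sum(PENALTY[f["severity"]] for f in findings))
    return findings, score
```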

Agent 3 – Impact Quantifier: Executes ES|QL queries over historical metrics — such as test results or benchmark data — to estimate regression risk and behavioral changes introduced by the PR.
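The kind of query and arithmetic involved might look like the sketch below. The index name `benchmark_results`, the fields `benchmark`, `duration_ms`, and `branch`, and the risk thresholds are assumptions made for illustration; the request body uses ES|QL's positional `?` parameters so the benchmark name is not spliced into the query text.

```python
def build_benchmark_query(benchmark: str, index: str = "benchmark_results") -> dict:
    """Request body for the ES|QL query API (index and fields are hypothetical)."""
    query = (
        f"FROM {index} "
        "| WHERE benchmark == ? "
        "| STATS avg_ms = AVG(duration_ms), p95_ms = PERCENTILE(duration_ms, 95) BY branch"
    )
    return {"query": query, "params": [benchmark]}


def regression_ratio(baseline_ms: float, candidate_ms: float) -> float:
    """Relative slowdown of the PR branch versus the baseline (0.25 means 25% slower)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms


def risk_level(ratio: float, warn: float = 0.05, fail: float = 0.20) -> str:
    """Map a slowdown ratio to a coarse label; thresholds are illustrative."""
    if ratio >= fail:
        return "high"
    if ratio >= warn:
        return "medium"
    return "low"
```

For example, a baseline of 100 ms and a PR-branch average of 125 ms gives a ratio of 0.25, which the illustrative thresholds label "high".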

Agent 4 – Conflict Resolver: Monitors discussion threads for reviewer disagreements, searches historical resolution patterns, and suggests data-backed consensus recommendations.
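One simple signal for reviewer disagreement is a PR where one reviewer's latest decisive review approves while another's requests changes. The sketch below assumes reviews arrive in chronological order as flat dictionaries with hypothetical `user` and `state` keys; the state names `APPROVED`, `CHANGES_REQUESTED`, and `COMMENTED` are GitHub's review states. How the actual agent detects disagreement in free-form discussion is not specified here.

```python
def find_disagreement(reviews: list) -> dict:
    """Report whether reviewers' latest decisive reviews conflict."""
    latest = {}
    for review in reviews:  # assumed oldest first
        # Comment-only and other non-decisive states do not change a reviewer's stance.
        if review["state"] in ("APPROVED", "CHANGES_REQUESTED"):
            latest[review["user"]] = review["state"]
    approvers = sorted(u for u, s in latest.items() if s == "APPROVED")
    blockers = sorted(u for u, s in latest.items() if s == "CHANGES_REQUESTED")
    return {
        "conflict": bool(approvers and blockers),
        "approvers": approvers,
        "blockers": blockers,
    }
```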

All agents are orchestrated within Agent Builder using tools spanning Search, ES|QL, and workflows. A real-time dashboard provides visibility into agent execution and outputs.
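Conceptually, the chain passes a growing context from one agent to the next and collects each agent's output into the final report. The sketch below shows that pattern in plain Python with stand-in callables; it does not use or represent the Agent Builder API, and `run_pipeline` and `render_report` are hypothetical names.

```python
from typing import Callable, List, Tuple

Agent = Callable[[dict], dict]


def run_pipeline(event: dict, agents: List[Tuple[str, Agent]]) -> dict:
    """Run agents in order; each sees the event plus earlier agents' sections."""
    context = {"event": event, "sections": {}}
    for name, agent in agents:
        try:
            context["sections"][name] = agent(context)
        except Exception as exc:
            # One failing agent should not prevent the rest of the report.
            context["sections"][name] = {"error": str(exc)}
    return context


def render_report(context: dict) -> str:
    """Flatten the collected sections into text suitable for a comment body."""
    lines = ["Triage report"]
    for name, section in context["sections"].items():
        lines.append(f"{name}: {section}")
    return "\n".join(lines)
```

Recording a failure as its own section, rather than aborting, keeps the report explainable even when one stage is unavailable.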

Challenges We Faced

Designing prompts that consistently directed each agent to the correct tools, without overlapping responsibilities, took repeated iteration. Normalizing tool responses across repositories with varying structures and conventions also demanded robust schema handling and defensive system design.
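Defensive normalization of this kind can be sketched as a single function that maps whatever a tool returns onto one predictable shape. The `ToolResult` type and the candidate key names (`results`, `hits`, `rows`, `values`) are assumptions for illustration, not the schema the project uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolResult:
    tool: str
    ok: bool
    items: list = field(default_factory=list)
    error: Optional[str] = None


def normalize(tool: str, raw) -> ToolResult:
    """Coerce a raw tool response into a ToolResult without raising."""
    if not isinstance(raw, dict):
        return ToolResult(tool, False, error=f"unexpected response type: {type(raw).__name__}")
    if raw.get("error"):
        return ToolResult(tool, False, error=str(raw["error"]))
    # Different tools may name their payload differently; take the first list found.
    for key in ("results", "hits", "rows", "values"):
        value = raw.get(key)
        if isinstance(value, list):
            return ToolResult(tool, True, items=value)
    return ToolResult(tool, True)  # valid response with no items
```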

What We Learned

Decomposing a complex workflow into specialized agents significantly improves reliability, clarity, and maintainability compared with relying on a single monolithic LLM prompt. In our experience, semantic search outperformed keyword search for historical context discovery, and Agent Builder enabled rapid experimentation with structured, production-ready AI pipelines.

Accomplishments

We built a reusable, repository-agnostic multi-agent system that reduces manual triage effort from tens of minutes to under a minute per PR or issue. More importantly, we demonstrated how explainable, tool-driven AI systems can scale developer workflows without sacrificing transparency.

What’s Next

Next steps include repository-specific agent templates for faster onboarding, deeper CI/CD and issue-labeling integrations, and feedback-driven learning loops informed by maintainer actions.