Inspiration
SRE teams spend 70%+ of incident time on diagnosis rather than resolution. We asked: what if AI could predict failures before they happen, investigate them autonomously, and deliver a fix — all before users are impacted?
What it does
PRISM is a full-stack platform that connects Splunk observability data (via the Splunk MCP Server), Cisco's Deep Time Series Model (CDTSM) for predictive anomaly detection, and Google Gemini AI for multi-agent reasoning. It predicts incidents, investigates root causes through 5 specialized AI agents, and automatically generates remediation pull requests on GitHub.
How we built it
The backend is Fastify + TypeScript orchestrating a multi-agent pipeline. Metrics flow from Splunk through MCP, get split into coarse/fine temporal contexts, and are scored by CDTSM using:
$$\text{score} = (\text{trend\_acceleration} \times 50) + (\text{p90\_divergence} \times 30) + 20$$
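As a minimal sketch, the formula above translates directly into a small TypeScript function. The `ScoreInputs` interface and the `riskScore` name are hypothetical, chosen for illustration; the ranges and units of the two inputs are not stated here, so the sketch applies no clamping or normalization.

```typescript
// Hypothetical input shape: field names mirror the two terms in the formula.
interface ScoreInputs {
  trendAcceleration: number; // trend_acceleration term (range is an assumption left open)
  p90Divergence: number; // p90_divergence term (range is an assumption left open)
}

// score = (trend_acceleration * 50) + (p90_divergence * 30) + 20
function riskScore({ trendAcceleration, p90Divergence }: ScoreInputs): number {
  const BASELINE = 20; // constant offset from the formula
  return trendAcceleration * 50 + p90Divergence * 30 + BASELINE;
}

// Worked example: 0.5 * 50 + 0.5 * 30 + 20 = 25 + 15 + 20 = 60
const example = riskScore({ trendAcceleration: 0.5, p90Divergence: 0.5 });
```

With both inputs at zero the score equals the baseline of 20, so the constant term acts as a floor whenever the inputs are non-negative.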
Agents stream results in real time via Server-Sent Events (SSE). The frontend is React 19 with TanStack Router.
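To illustrate the streaming side, here is a sketch of how an agent result could be serialized into an SSE frame. The `toSseFrame` helper, the event name, and the payload shape are assumptions made for this example, not PRISM's actual code; only the wire format (`id:`, `event:`, and `data:` fields terminated by a blank line) comes from the SSE standard.

```typescript
// Hypothetical helper: builds one Server-Sent Events frame as a string.
function toSseFrame(eventName: string, data: unknown, id?: string): string {
  const lines: string[] = [];
  if (id !== undefined) {
    lines.push(`id: ${id}`); // lets a reconnecting client resume via Last-Event-ID
  }
  lines.push(`event: ${eventName}`);
  // JSON.stringify never emits raw newlines, so a single data line is enough.
  lines.push(`data: ${JSON.stringify(data)}`);
  // A blank line terminates the frame.
  return lines.join("\n") + "\n\n";
}

// Example frame for a made-up agent update.
const frame = toSseFrame("finding", { agent: "root-cause" }, "1");
```

In a Fastify handler, frames like this would typically be written to the underlying Node response (`reply.raw`) after setting the `Content-Type` header to `text/event-stream`.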
Challenges we ran into
Three problems took the most effort: designing a CDTSM context-splitting strategy that yields meaningful predictions, orchestrating 5 agents with interdependent outputs without blocking, and creating a GitHub remediation pipeline that produces reviewable code rather than just suggestions.
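One way to handle the orchestration problem is to declare each agent's dependencies and start it as soon as those dependencies resolve, so independent agents run concurrently. The sketch below shows that pattern under stated assumptions: `AgentSpec`, `runAgents`, and the agent names in the usage note are hypothetical and do not describe PRISM's actual agents or scheduler.

```typescript
type AgentFn = (inputs: Record<string, unknown>) => Promise<unknown>;

// Hypothetical agent description: which outputs it needs, and how to run it.
interface AgentSpec {
  dependsOn: string[];
  run: AgentFn;
}

function runAgents(
  specs: Record<string, AgentSpec>,
): Promise<Record<string, unknown>> {
  // One promise per agent, so shared dependencies run only once.
  const started = new Map<string, Promise<unknown>>();

  const start = (name: string, path: string[] = []): Promise<unknown> => {
    if (path.includes(name)) {
      const cycle = [...path, name].join(" -> ");
      return Promise.reject(new Error(`Dependency cycle: ${cycle}`));
    }
    const existing = started.get(name);
    if (existing) return existing;

    const spec = specs[name];
    if (!spec) return Promise.reject(new Error(`Unknown agent: ${name}`));

    // Kick off all dependencies at once; run this agent when they finish.
    const deps = spec.dependsOn.map((dep) => start(dep, [...path, name]));
    const promise = Promise.all(deps).then((results) => {
      const inputs: Record<string, unknown> = {};
      spec.dependsOn.forEach((dep, i) => {
        inputs[dep] = results[i];
      });
      return spec.run(inputs);
    });
    started.set(name, promise);
    return promise;
  };

  const names = Object.keys(specs);
  return Promise.all(names.map((n) => start(n))).then((results) => {
    const outputs: Record<string, unknown> = {};
    names.forEach((n, i) => {
      outputs[n] = results[i];
    });
    return outputs;
  });
}
```

For example, with made-up agents `metrics` and `logs` that have no dependencies and a `correlate` agent that depends on both, `metrics` and `logs` start immediately and in parallel, while `correlate` runs only once both have produced output. A failure in any agent rejects the overall promise; retry and partial-result handling are left out of this sketch.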
Accomplishments that we're proud of
- Built a multi-agent incident investigation platform that combines Splunk operational data, AI reasoning, and GitHub workflows.
- Successfully traced incidents back to the most likely pull request using telemetry, deployment, and code change correlation.
- Implemented human-in-the-loop remediation, allowing engineers to review AI recommendations before creating fix PRs.
- Integrated Splunk MCP to enable agents to investigate incidents directly from operational data rather than static datasets.
- Added predictive reliability capabilities using time-series forecasting to identify potential issues before they become critical incidents.
What we learned
MCP as an integration protocol dramatically simplifies AI ↔ data-source communication. Multi-agent architectures shine when each agent has a narrow, well-defined scope with clear inputs/outputs.
What's next for PRISM
Enhance predictive incident prevention with advanced forecasting and anomaly detection models.