Inspiration
Ops tools often feel like black boxes during incidents. You get alerts, but no clear story of why things broke. I wanted to build something transparent, trustworthy, and demo-ready, where every AI-generated insight is backed by real evidence and clear visuals.
What it does
RootCause AI is an intelligent AIOps assistant that:
- Diagnoses incidents with an LLM-guided, schema-validated RCA.
- Links every causal step to concrete logs, metrics, commits, or bug reports.
- Renders interactive causal chains in a no-code Streamlit UI.
- Detects real-time anomalies and predicts CPU, memory, or response-time issues before they escalate.
How we built it
- Analyzer Core → Normalizes events, builds a RAG prompt, validates JSON, ranks hypotheses, and maps them to evidence.
- Connectors → Logs, GitHub commits, metrics (CSV/JSON), bug reports, and Datadog live metrics.
- No-Code UI → Streamlit app for provider selection, demo mode, anomaly & prediction panels, and causal chain visualization.
- Simulation Mode → Prebuilt incidents (e.g., DB deadlock) for demoing without external data.
- Visualizations → NetworkX + Plotly interactive graphs with PNG export.
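The validate, rank, and map-to-evidence steps of the Analyzer Core could look roughly like the sketch below. It is an illustration only, not the project's actual code: the `steps`, `cause`, `effect`, `confidence`, and `evidence_ids` field names and the `validate_rca` helper are assumptions, and the check is hand-rolled with the standard library rather than a schema library.

```python
import json

# Hypothetical schema: these field names are assumptions for illustration,
# not the project's real RCA format.
REQUIRED_STEP_FIELDS = {
    "cause": str,
    "effect": str,
    "confidence": float,
    "evidence_ids": list,
}


def validate_rca(raw: str, known_evidence_ids: set) -> list:
    """Parse an LLM response and return its causal steps, highest confidence first.

    Raises ValueError if the JSON is malformed, a field is missing or has the
    wrong type, or a step cites evidence that was never supplied to the model.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM output is not valid JSON: {exc}") from exc

    steps = payload.get("steps") if isinstance(payload, dict) else None
    if not isinstance(steps, list) or not steps:
        raise ValueError("expected a non-empty 'steps' list")

    for i, step in enumerate(steps):
        if not isinstance(step, dict):
            raise ValueError(f"step {i} is not an object")
        # JSON has a single number type, so accept an integer confidence.
        conf = step.get("confidence")
        if isinstance(conf, int) and not isinstance(conf, bool):
            step["confidence"] = float(conf)
        for field, expected in REQUIRED_STEP_FIELDS.items():
            if not isinstance(step.get(field), expected):
                raise ValueError(f"step {i}: '{field}' missing or not {expected.__name__}")
        if not 0.0 <= step["confidence"] <= 1.0:
            raise ValueError(f"step {i}: confidence must be between 0 and 1")
        ids = step["evidence_ids"]
        if not ids or not all(isinstance(e, str) for e in ids):
            raise ValueError(f"step {i}: evidence_ids must be a non-empty list of strings")
        unknown = set(ids) - known_evidence_ids
        if unknown:
            raise ValueError(f"step {i}: cites unknown evidence {sorted(unknown)}")

    return sorted(steps, key=lambda s: s["confidence"], reverse=True)
```

Rejecting any step that cites an unknown evidence ID is what keeps the output auditable: a hypothesis either points at a log line, metric, commit, or bug report that was actually in the prompt, or it is discarded.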
Challenges we ran into
- Getting consistent JSON outputs from different LLMs.
- Normalizing timestamps across logs, metrics, and commits.
- Balancing anomaly sensitivity: early warnings vs. false positives.
- Designing graphs that are both clear and information-dense.
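For the timestamp challenge, one common approach is to convert everything to timezone-aware UTC before merging sources. The sketch below assumes inputs arrive either as epoch numbers or ISO 8601 strings, and that naive timestamps are already in UTC; both are assumptions, as is the `to_utc` helper name.

```python
from datetime import datetime, timezone


def to_utc(value) -> datetime:
    """Convert an epoch number or ISO 8601 string to a timezone-aware UTC datetime."""
    if isinstance(value, (int, float)):
        # Heuristic (assumption): values this large are epoch milliseconds.
        seconds = value / 1000 if value > 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    text = value.strip()
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit offset for older versions.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# Events from different sources can then be merged into one timeline.
events = [
    {"source": "commit", "ts": "2024-05-01T14:00:00+02:00"},
    {"source": "metric", "ts": 1714564740},
    {"source": "log", "ts": "2024-05-01T11:58:30Z"},
]
timeline = sorted(events, key=lambda e: to_utc(e["ts"]))
```

Once every event carries a comparable UTC timestamp, ordering a commit against the log lines and metric spikes around it becomes a plain sort.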
Accomplishments we’re proud of
- Built an end-to-end RCA demo that anyone can run in minutes.
- Delivered auditable, evidence-linked AI outputs instead of black-box guesses.
- Made anomaly detection and predictions simple, fast, and explainable without heavy ML.
- Created a forkable, extensible reference project for any DevOps team.
What we learned
- Validating LLM output against a JSON schema catches malformed responses and makes the results dependable.
- RCA prompts must include timestamps, severities, and diffs for accuracy.
- Simple stats (z-scores, trends) can outperform ML for clarity and speed.
- Transparency builds trust — teams adopt AI faster when they see why it reached a conclusion.
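As an illustration of the simple statistics mentioned above, the sketch below flags points with a rolling z-score and extrapolates a least-squares trend line. The window size, threshold, and function names are assumptions chosen for the example, not the project's actual settings.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` points."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat baseline: skip to avoid dividing by zero.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to the series and extrapolate it."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    xs = range(n)
    x_mean, y_mean = mean(xs), mean(values)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    # Evaluate the fitted line `steps_ahead` samples past the last point.
    return y_mean + slope * (n - 1 + steps_ahead - x_mean)


cpu = [40, 41, 39, 40, 42, 41, 95, 40]
spikes = zscore_anomalies(cpu)
```

Both functions are easy to explain in an incident review: a flagged point comes with the baseline mean and deviation that triggered it, and a forecast is just the fitted line extended forward.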
What’s next for RootCause AI
- Add connectors for Kubernetes, cloud logs, and CI/CD pipelines.
- Improve multi-incident correlation to uncover cross-outage patterns.
- Enhance prediction models with lightweight ML for long-term forecasting.
- Package as a plug-and-play open-source tool for easy adoption.