Inspiration

Ops tools often feel like a black box during incidents. You get alerts, but no clear story of why things broke. I wanted to build something transparent, trustworthy, and demo-ready — where every AI-generated insight is backed by real evidence and clear visuals.


What it does

RootCause AI is an intelligent AIOps assistant that:

  • Diagnoses incidents with an LLM-guided, schema-validated RCA.
  • Links every causal step to concrete logs, metrics, commits, or bug reports.
  • Renders interactive causal chains in a no-code Streamlit UI.
  • Detects anomalies in real time and predicts CPU, memory, or response-time issues before they escalate.
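
Schema validation is what keeps the RCA auditable: any hypothesis without linked evidence is rejected before it reaches the UI. A minimal sketch of that check (the field names `cause`, `confidence`, and `evidence` are illustrative, not the project's actual schema):

```python
# Minimal validator for an RCA payload (hypothetical schema): every
# hypothesis must name a cause, carry a confidence score, and link at
# least one piece of evidence, so no claim ships without backing data.
REQUIRED_KEYS = {"cause", "confidence", "evidence"}

def validate_rca(payload: dict) -> list[str]:
    """Return a list of validation errors; an empty list means usable."""
    errors = []
    for i, hyp in enumerate(payload.get("hypotheses", [])):
        missing = REQUIRED_KEYS - hyp.keys()
        if missing:
            errors.append(f"hypothesis {i}: missing {sorted(missing)}")
        elif not hyp["evidence"]:
            errors.append(f"hypothesis {i}: no evidence linked")
    return errors

llm_output = {
    "hypotheses": [
        {"cause": "DB deadlock", "confidence": 0.82,
         "evidence": ["log:2024-05-01T12:03Z", "commit:abc123"]},
        {"cause": "Cache miss storm", "confidence": 0.4, "evidence": []},
    ]
}
print(validate_rca(llm_output))  # the evidence-free hypothesis is flagged
```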

How we built it

  • Analyzer Core → Normalizes events, builds a RAG prompt, validates JSON, ranks hypotheses, and maps them to evidence.
  • Connectors → Logs, GitHub commits, metrics (CSV/JSON), bug reports, and Datadog live metrics.
  • No-Code UI → Streamlit app for provider selection, demo mode, anomaly & prediction panels, and causal chain visualization.
  • Simulation Mode → Prebuilt incidents (e.g., DB deadlock) for demoing without external data.
  • Visualizations → NetworkX + Plotly interactive graphs with PNG export.
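
The causal chain itself is just a directed graph where every edge carries its supporting evidence; a topological sort then recovers the incident narrative root-cause first. A sketch with NetworkX (node names and evidence strings are invented for illustration; the Plotly rendering layer is omitted):

```python
import networkx as nx

# Causal-chain graph (hypothetical incident): each edge is one causal
# step, annotated with the evidence that supports it.
G = nx.DiGraph()
G.add_edge("deploy abc123", "connection pool exhausted",
           evidence="commit diff: pool_size 50 -> 5")
G.add_edge("connection pool exhausted", "DB deadlock",
           evidence="log: Deadlock found at 12:03Z")
G.add_edge("DB deadlock", "checkout latency spike",
           evidence="metric: p95 > 2s")

# Topological order reads as the incident story, root cause first.
print(list(nx.topological_sort(G)))
```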

Challenges we ran into

  • Getting consistent JSON outputs from different LLMs.
  • Normalizing timestamps across logs, metrics, and commits.
  • Balancing anomaly sensitivity: early warnings vs. false positives.
  • Designing graphs that are both clear and information-dense.
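
The timestamp problem comes from each connector speaking a different dialect: metrics arrive as epoch seconds, logs as ISO 8601 strings, commits in git's offset-bearing format. One way to collapse them onto a single UTC timeline with pandas (sample values are made up):

```python
import pandas as pd

# Hypothetical raw timestamps as they arrive from different connectors.
raw = {
    "metric": 1714564980,                  # epoch seconds
    "log":    "2024-05-01T12:03:00.451Z",  # ISO 8601 with milliseconds
    "commit": "2024-05-01 11:58:00 +0200", # git-style date with offset
}

# Normalize everything to timezone-aware UTC so events can be ordered
# across sources.
normalized = {
    "metric": pd.to_datetime(raw["metric"], unit="s", utc=True),
    "log":    pd.to_datetime(raw["log"], utc=True),
    "commit": pd.to_datetime(raw["commit"], utc=True),
}
for source, ts in normalized.items():
    print(source, ts.isoformat())
```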

Accomplishments we’re proud of

  • Built an end-to-end RCA demo that anyone can run in minutes.
  • Delivered auditable, evidence-linked AI outputs instead of black-box guesses.
  • Made anomaly detection and predictions simple, fast, and explainable without heavy ML.
  • Created a forkable, extensible reference project for any DevOps team.

What we learned

  • Schema-validated JSON makes LLM outputs reliable.
  • RCA prompts must include timestamps, severities, and diffs for accuracy.
  • Simple stats (z-scores, trends) can outperform ML for clarity and speed.
  • Transparency builds trust — teams adopt AI faster when they see why it reached a conclusion.
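
The z-score point is easy to make concrete. A minimal sketch of rolling-window anomaly detection over a CPU series (synthetic data; the threshold of 3 and window of 5 are illustrative choices, not the project's tuned values):

```python
import pandas as pd

# Synthetic CPU-utilization series with one obvious spike.
cpu = pd.Series([42, 41, 43, 40, 42, 44, 41, 43, 95, 42])

window = 5
mean = cpu.rolling(window).mean()
std = cpu.rolling(window).std()

# Score each point against the *previous* window (shift by 1) so the
# spike itself does not inflate the baseline it is compared to.
z = (cpu - mean.shift(1)) / std.shift(1)

anomalies = cpu[z.abs() > 3]
print(anomalies)  # flags index 8 (the spike to 95)
```

No model training, no feature engineering, and the "why" is a single arithmetic sentence: this point is more than three standard deviations from its trailing mean.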

What’s next for RootCause AI

  • Add connectors for Kubernetes, cloud logs, and CI/CD pipelines.
  • Improve multi-incident correlation to uncover cross-outage patterns.
  • Enhance prediction models with lightweight ML for long-term forecasting.
  • Package as a plug-and-play open-source tool for easy adoption.

Built With

  • requests
  • datadog
  • github
  • matplotlib
  • networkx
  • numpy
  • openai
  • pandas
  • python
  • streamlit-ui
  • z-scores