Inspiration
Every DevOps engineer has a war story — the 2 AM PagerDuty call, the production cluster that silently died under a memory spike, the release that looked clean in staging and destroyed latency in prod. We kept asking: why do we only find out after the damage is done? Tools like Grafana and Datadog are brilliant — but they're ambulances, not seatbelts. Sentinel AI was born from that frustration. We wanted to build the "Grammarly for deployments" — something that flags the risk before you hit merge, not after your users start filing tickets.
What it does
Sentinel AI is a pre-deployment guardrail that integrates directly into your git branches and Slack workspace. Before any release goes live, it cross-references your PR commits against live Prometheus metrics (CPU, memory, latency), open Jira bug backlogs, and a historical incident database. From these it computes a Risk Score out of 100 across four equally weighted dimensions: Memory Load, Bug Backlog, Cluster Latency, and Historical Similarity (25 points each). Based on that score, it either approves the release or blocks it. For a risky release, it also auto-generates a canary traffic-split schedule (10% → 25% → 50% → 100%) with health checkpoints. It also ships an Incident Time Machine that reconstructs past outage timelines and a Root Cause Analysis (RCA) Scanner that pinpoints the code change that triggered a failure.
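Here is a minimal sketch of how four 25-point dimensions could combine into a 0 to 100 score, and how a risky score could map to the 10% → 25% → 50% → 100% canary schedule. The normalized [0, 1] inputs, the `BLOCK_THRESHOLD` of 60, the checkpoint interval, and all names are hypothetical. The write-up does not publish Sentinel's actual normalization or cutoff.

```typescript
// Hypothetical input shape: each signal is assumed to be pre-normalized to [0, 1].
interface RiskSignals {
  memoryLoad: number;           // e.g. fraction of cluster memory in use
  bugBacklog: number;           // e.g. open bugs relative to a team-defined ceiling
  clusterLatency: number;       // e.g. p95 latency relative to an SLO budget
  historicalSimilarity: number; // e.g. similarity to the closest past incident
}

const POINTS_PER_DIMENSION = 25; // four dimensions x 25 points = 100

// Clamp so a single out-of-range metric cannot exceed its 25-point share.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

function riskScore(s: RiskSignals): number {
  const parts = [s.memoryLoad, s.bugBacklog, s.clusterLatency, s.historicalSimilarity];
  const total = parts.reduce((sum, p) => sum + clamp01(p) * POINTS_PER_DIMENSION, 0);
  return Math.round(total);
}

interface CanaryStep {
  trafficPercent: number;
  healthCheckAfterMinutes: number;
}

// Hypothetical cutoff: scores at or above this are treated as risky.
const BLOCK_THRESHOLD = 60;

type ReleaseDecision =
  | { decision: "approve" }
  | { decision: "block"; steps: CanaryStep[] };

// Approve low-risk releases; otherwise block and attach a staged canary plan.
function planRelease(score: number, checkpointMinutes = 15): ReleaseDecision {
  if (score < BLOCK_THRESHOLD) return { decision: "approve" };
  const steps = [10, 25, 50, 100].map((pct) => ({
    trafficPercent: pct,
    healthCheckAfterMinutes: checkpointMinutes,
  }));
  return { decision: "block", steps };
}
```

Equal weights keep the score easy to explain in a Slack card. A team could later swap `POINTS_PER_DIMENSION` for per-dimension weights that still sum to 100.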
How we built it
We built Sentinel AI for the Slack Agent Builder Challenge on the MERN stack (MongoDB Atlas, Express, React, Node.js), written in TypeScript. The backend exposes four core API routes (/analyze-release, /explain-outage, /deployment-advice, /investigate). They are powered by a Gemini AI engine with a sequential model rotation queue (gemini-2.5-flash → gemini-1.5-flash → gemini-1.5-pro) that lets the system survive free-tier rate limits. Historical incidents and runbooks are stored in MongoDB Atlas and retrieved through retrieval-augmented generation (RAG) for contextual risk analysis. A regex-based Security Masker Proxy middleware strips PII, credentials, IP addresses, and webhook tokens before any data reaches the AI layer. The frontend is a Vite + React single-page app (SPA) that includes a Slack Block Kit Simulator. This is a live chat interface that shows exactly how Sentinel's interactive cards and action buttons appear inside a real Slack workspace.
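A sequential rotation queue can be sketched as "try each model in order, and move on only when the current one is rate-limited." The sketch below is provider-agnostic. `ModelCaller` is a hypothetical injected function where a real Gemini client call would go. It also assumes rate-limit errors carry an HTTP-style `status` of 429. Neither detail comes from the project's source.

```typescript
// Model order as described in the write-up.
const MODEL_QUEUE = ["gemini-2.5-flash", "gemini-1.5-flash", "gemini-1.5-pro"] as const;

// Hypothetical caller signature; a real implementation would wrap the Gemini client.
type ModelCaller = (model: string, prompt: string) => Promise<string>;

// Assumption: quota errors surface as objects with status 429.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

// Try each model in turn; only rate-limit errors trigger rotation to the next model.
async function generateWithRotation(
  prompt: string,
  callModel: ModelCaller,
  models: readonly string[] = MODEL_QUEUE,
): Promise<{ model: string; text: string }> {
  for (const model of models) {
    try {
      const text = await callModel(model, prompt);
      return { model, text };
    } catch (err) {
      // Non-quota failures (bad request, auth) are surfaced immediately rather than retried.
      if (!isRateLimited(err)) throw err;
    }
  }
  throw new Error("All models in the rotation queue are rate-limited");
}
```

Rotating only on 429-style errors keeps genuine bugs, such as a malformed prompt, from being masked by silently retrying on a different model.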
Challenges we ran into
API rate limits were brutal. Running Gemini on the free tier meant a single burst of analysis requests could exhaust the quota mid-demo. We solved this with the three-model rotation queue plus a fallback API key read from the environment, so the system degrades gracefully instead of crashing. Prompt compression was the other beast. Feeding raw commit diffs, Jira backlogs, and Prometheus metrics into a single LLM context blew through token limits fast. We engineered a token-compression layer that shortens verbose JSON keys to a short-key schema while preserving enough semantic signal for the risk score to stay accurate. Tuning the weighted risk formula so it didn't flag false positives on every deploy took several rounds of testing against seeded incident data.
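The short-key idea can be sketched as a recursive key rename plus a one-line legend that tells the model how to decode the keys. The key dictionary below is purely hypothetical, since the project's actual schema is not published. Unknown keys are passed through unchanged so that no signal is silently dropped.

```typescript
// Hypothetical long-to-short key dictionary; a Map avoids prototype-key collisions.
const SHORT_KEYS = new Map<string, string>([
  ["commitMessage", "cm"],
  ["filesChanged", "fc"],
  ["memoryUsagePercent", "mem"],
  ["p95LatencyMs", "p95"],
  ["openBugCount", "bugs"],
  ["severity", "sev"],
]);

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively rename known keys; unknown keys and all values are preserved.
function compressKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map(compressKeys);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      out[SHORT_KEYS.get(k) ?? k] = compressKeys(v);
    }
    return out;
  }
  return value;
}

// Legend sent once (e.g. in the system prompt) so the model can map short keys back.
function keyLegend(): string {
  return Array.from(SHORT_KEYS, ([long, short]) => `${short}=${long}`).join(", ");
}
```

Serializing the result with `JSON.stringify` and no indentation removes whitespace on top of the shorter keys. The legend costs a fixed number of tokens per prompt, while the savings grow with the number of records.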
Accomplishments that we're proud of
The Slack Block Kit Simulator is something we're genuinely proud of. It gives judges and users an authentic feel for the in-Slack experience without requiring them to configure an actual Slack app installation. Keeping the system alive through aggressive demo sessions with the three-model rotation queue was a real engineering win. Most importantly, the risk score formula (four equal 25-point weights that map directly to the four most common causes of production failures) is deceptively simple yet meaningfully accurate against the historical dataset we seeded.
What we learned
RAG over incident runbooks is dramatically more useful than generic LLM prompting for DevOps use cases — grounding the model in your actual past failures is what makes the recommendations actionable rather than generic. We also learned that security masking can't be an afterthought; it has to be the first middleware in the chain, not the last. And building a convincing simulator (the Slack Block Kit UI) taught us that demo experience is a product feature in itself.
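Here is a dependency-free sketch of the kind of regex masking described above. It covers Slack webhook URLs, email addresses, IPv4 addresses, bearer tokens, and `key=value` secrets. The patterns and placeholder labels are illustrative assumptions, not the project's actual rules. Regexes reduce exposure but cannot guarantee that every secret is caught. In Express, middleware runs in the order it is registered with `app.use()`. Registering a wrapper around this function before the route handlers is therefore what makes it "first in the chain."

```typescript
// Illustrative masking rules; order matters, so the most specific patterns run first.
const MASK_RULES: Array<{ pattern: RegExp; replacement: string }> = [
  { pattern: /https:\/\/hooks\.slack\.com\/services\/[A-Za-z0-9\/]+/g, replacement: "[REDACTED_WEBHOOK]" },
  { pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, replacement: "[REDACTED_EMAIL]" },
  { pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g, replacement: "[REDACTED_IP]" },
  { pattern: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, replacement: "[REDACTED_TOKEN]" },
  // Keep the key name (captured as $1) so the model still knows a credential was present.
  { pattern: /\b(password|api[_-]?key|token|secret)\s*[:=]\s*\S+/gi, replacement: "$1=[REDACTED]" },
];

// Apply every rule in sequence; String.prototype.replace resets lastIndex for global regexes.
function maskSensitive(text: string): string {
  return MASK_RULES.reduce((acc, rule) => acc.replace(rule.pattern, rule.replacement), text);
}
```

Keeping the key name in the `key=value` rule is a deliberate trade-off. The model still learns that a credential appeared in a config diff, which can itself be a risk signal, without seeing its value.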
What's next for Sentinel AI
Real GitHub webhook integration, so the risk analyzer fires automatically whenever a PR is opened, not just on manual Slack commands. Native Jira OAuth, so teams can connect live bug boards instead of seeded mock data. A Release Calendar feature that predicts the safest deployment windows from historical incident frequency by day and time. And an expanded RAG corpus that ingests runbooks directly from Confluence and Notion, so Sentinel's recommendations are always grounded in a team's own documented playbooks.
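The planned Release Calendar could start as a simple frequency count. Bucket past incidents by UTC weekday and hour, then rank candidate deploy slots by how few incidents they have seen. This is a hypothetical sketch of that idea, not a description of any existing Sentinel feature. The weekday-only and 09:00 to 16:00 UTC candidate set, the ISO-timestamp input, and all names are assumptions.

```typescript
// weekday follows Date.getUTCDay(): 0 = Sunday, 1 = Monday, ... 6 = Saturday.
interface DeployWindow {
  weekday: number;
  hour: number;
  incidents: number;
}

// Hypothetical default: business hours 09:00-16:00 UTC.
const DEFAULT_HOURS = Array.from({ length: 8 }, (_, i) => 9 + i);

function safestWindows(incidentTimes: string[], topN = 3, hours: number[] = DEFAULT_HOURS): DeployWindow[] {
  // Count historical incidents per (weekday, hour) bucket.
  const counts = new Map<string, number>();
  for (const t of incidentTimes) {
    const d = new Date(t);
    const key = `${d.getUTCDay()}-${d.getUTCHours()}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Score every candidate slot, Monday through Friday only (assumption).
  const candidates: DeployWindow[] = [];
  for (let weekday = 1; weekday <= 5; weekday++) {
    for (const hour of hours) {
      candidates.push({ weekday, hour, incidents: counts.get(`${weekday}-${hour}`) ?? 0 });
    }
  }
  // Fewest incidents first; ties broken by earliest weekday, then earliest hour.
  candidates.sort((a, b) => a.incidents - b.incidents || a.weekday - b.weekday || a.hour - b.hour);
  return candidates.slice(0, topN);
}
```

Raw counts ignore how many deploys happened in each slot, so a quiet-looking slot may simply be one where nobody ships. Dividing by deployment volume per bucket would give a fairer incident rate.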
Built With
- bcrypt
- csv-parser
- dotenv
- express-middleware
- express.js
- gemini-1.5-flash
- gemini-1.5-pro
- gemini-2.5-flash
- github-api
- google-gemini-ai
- javascript
- jira-api
- json
- jwt
- langchain
- mcp
- mern
- mern-stack
- mongodb
- mongodb-atlas
- mongoose
- multer
- node.js
- prometheus
- python
- rag
- react
- regex
- render
- rest-api
- slack-api
- slack-block-kit
- spa
- typescript
- typescript-esm
- vite
- vite-proxy