Inspiration

Every engineer who has been on call knows the feeling: it's 3 AM, production is down, and you're frantically grepping through wikis, Slack threads, and half-outdated runbooks trying to remember how you fixed this last time. The knowledge exists somewhere — it's just scattered and slow to reach when every minute of downtime costs money.

At the same time, AI chatbots are everywhere now, but you can't trust a tool that confidently makes things up when production is on fire. We wanted an assistant that's both fast and verifiable: it answers instantly, and it shows you exactly which document the answer came from. That combination of speed and grounded transparency became SentinelOps.

What it does

SentinelOps is an autonomous SRE (Site Reliability Engineering) agent that diagnoses production incidents using your own runbook library.

  • Ask in plain English — "How do I fix Redis cache key eviction during high traffic?" — and the agent retrieves the most relevant runbooks and answers from them.
  • Grounded, not hallucinated — every response shows a MongoDB Atlas Vector Search Results panel with the matched runbooks and their cosine-similarity scores, so you can verify the source.
  • Persistent memory — Gemini synthesizes a per-user memory profile from conversation history, stored in MongoDB Atlas.
  • Ingestion pipeline — paste any runbook and it's embedded into a 768-dimensional vector and written straight into Atlas, instantly searchable.
  • Dual models — switch between Gemini 2.5 Flash (fast) and Pro (deeper reasoning), both backed by the same Atlas grounding.
  • Observability hooks — an authenticated webhook endpoint lets monitoring tools trigger autonomous diagnosis.

How we built it

The foundation is MongoDB Atlas, which powers three live collections — users, sessions, and knowledge_vectors.

The retrieval flow:

  1. A user's question is embedded with Google's text-embedding-004 model into a 768-dimensional vector.
  2. MongoDB Atlas Vector Search runs a $vectorSearch aggregation, ranking runbooks by cosine similarity:

$$\text{similarity}(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}$$

  3. The top matches are injected into Gemini 2.5's context as grounding, and the matched titles + scores are surfaced in the UI.
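
As a rough illustration of step 2, the sketch below builds a `$vectorSearch` aggregation pipeline and computes the cosine similarity from the formula above in plain Python. The index name `runbook_vector_index`, the field names `embedding` and `title`, and the default `limit` and `numCandidates` values are assumptions made for this example, not values taken from the SentinelOps code.

```python
import math


def cosine_similarity(q, d):
    """Cosine similarity of two equal-length vectors, as in the formula above."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_q == 0 or norm_d == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_q * norm_d)


def build_vector_search_pipeline(
    query_vector,
    index_name="runbook_vector_index",  # assumed; must match the index created in Atlas
    path="embedding",                   # assumed name of the field holding the vector
    limit=3,
    num_candidates=100,
):
    """Return an aggregation pipeline that ranks documents by vector similarity."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep only what the UI panel needs: the runbook title and its score.
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With PyMongo, the pipeline would be passed to `collection.aggregate(...)` on the `knowledge_vectors` collection. One detail worth checking against the Atlas documentation: for a cosine index, the reported `vectorSearchScore` is a normalized value in the range 0 to 1 rather than the raw cosine, so the panel scores may not equal the output of `cosine_similarity` directly.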

Stack:

  • Backend: Python + Flask REST API, with four Vertex AI tools (search_knowledge_base, load_user_memory, save_chat_history, execute_mongodb_mcp_tool).
  • AI: Google Gemini 2.5 Flash & Pro via Vertex AI; text-embedding-004 for embeddings.
  • Database: MongoDB Atlas (M0 free tier) with a cosine-similarity vector index on 768-dim embeddings.
  • MCP: The official MongoDB MCP Server (25 tools) wired to Gemini over JSON-RPC 2.0, so the agent can run live database operations.
  • Infra: Containerized with Docker (Python + Node.js), deployed serverless on Google Cloud Run, with Cloud Logging and GCS runbook backups.
  • Frontend: A vanilla-JS glassmorphic dashboard on GitHub Pages — incident command, diagnostic chat, a live MongoDB Memory Core explorer, and a runbook ingester.
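
To make the MCP wiring in the stack above more concrete, here is a minimal sketch of what a tool invocation looks like on the wire. MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`, and over the stdio transport each message is sent as one newline-terminated JSON line. The helper names and the example tool name and arguments are hypothetical and are not taken from the SentinelOps backend.

```python
import itertools
import json

# Monotonic request ids so each response can be matched to its request.
_request_ids = itertools.count(1)


def build_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode_message(message):
    """Serialize a message as a single newline-terminated JSON line."""
    return json.dumps(message, separators=(",", ":")) + "\n"


# Hypothetical example: the tool name and arguments are assumptions.
request = build_tool_call(
    "find",
    {"database": "sentinelops", "collection": "knowledge_vectors", "limit": 5},
)
wire_line = encode_message(request)
```

In a setup like the one described, `wire_line` would be written to the stdin of the Node.js MCP subprocess, and the reply with the same `id` read back from its stdout.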

Challenges we ran into

  • Running the MCP server inside Cloud Run. The MongoDB MCP server is a Node.js subprocess, while our app is Python. Getting both runtimes into one container — and handling the cold-start timeout before the MCP handshake completed — took a multi-runtime Dockerfile and a threaded, timeout-guarded initialization with graceful fallback.
  • A silent AI-synthesis bug. Our memory-summary feature referenced a GenerativeModel that was never defined, so every call raised an exception that a broad try/except swallowed, and the feature silently fell back to a templated string instead of real AI output. We caught it by inspecting the live database and switched to the unified genai client.
  • Securing a public webhook. Our alert endpoint triggered a Gemini call on every request — an open door for quota abuse. We added X-Webhook-Secret authentication to lock it down.
  • "Committed" ≠ "deployed." More than once we fixed code, pushed to Git, and the live behavior didn't change — because Cloud Run was still serving the old revision. We learned to verify against the live endpoints, not the repo.
  • Vector index naming. Atlas Vector Search silently returns nothing if the index name in code doesn't match the one created in the Atlas UI — a subtle gotcha we now document explicitly.
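
The webhook lock-down described above can be sketched as a small header check. This is a minimal example under stated assumptions: the function name is hypothetical, and how the expected secret is loaded (for instance from an environment variable) is left to the caller. It uses a constant-time comparison so response timing does not leak how much of the secret matched.

```python
import hmac


def is_authorized(headers, expected_secret):
    """Return True only if the X-Webhook-Secret header matches the configured secret."""
    # Refuse everything when no secret is configured, so an unset value
    # can never be matched by an empty header.
    if not expected_secret:
        return False
    provided = headers.get("X-Webhook-Secret", "")
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(
        provided.encode("utf-8"), expected_secret.encode("utf-8")
    )
```

In a Flask route this could run first, for example `if not is_authorized(request.headers, secret): abort(401)`, so that rejected requests never reach the Gemini call.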

Accomplishments that we're proud of

  • Grounding you can see. The similarity-score panel turns "trust me" into "here's the proof" — exactly what makes an AI agent usable for real incident response.
  • A genuinely live, end-to-end system — deployed on Cloud Run, with real Atlas vector search, real Gemini answers, and a real ingestion write-path, all working together.
  • Honest by design. Simulated/demo panels are clearly badged, and our stats are all verifiable facts — no inflated metrics.
  • Real partner integration — not just storing data in MongoDB, but using Atlas Vector Search and the official MongoDB MCP Server as core, load-bearing parts of the agent.

What we learned

  • RAG is only as trustworthy as its transparency. Surfacing the retrieved sources and scores changed the product from "a chatbot" into "a tool an engineer would actually rely on."
  • Vector search quality lives in the details — embedding model choice, index configuration, numCandidates, and similarity metric all materially affect results.
  • Tool-use orchestration is powerful but fragile — subprocess lifecycles, timeouts, and silent fallbacks need deliberate handling, especially in serverless environments.
  • Demo integrity matters. Closing the gap between what we claimed and what the code did made the whole project stronger.
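
The lesson about timeouts and silent fallbacks can be illustrated with a small helper that runs a slow initializer in a background thread, gives up after a deadline, and logs the fallback instead of hiding it. This is a sketch of the general pattern only. The function name, logger name, and return shape are assumptions and do not reproduce the project's actual startup code.

```python
import logging
import threading

logger = logging.getLogger("sentinelops.startup")  # hypothetical logger name


def init_with_timeout(init_fn, timeout_seconds, fallback=None):
    """Run init_fn in a daemon thread and return (value, ok).

    ok is False when init_fn timed out or raised; in both cases the
    reason is logged so the fallback is never silent.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = init_fn()
        except Exception as exc:  # recorded here, reported by the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)

    if thread.is_alive():
        logger.warning(
            "initialization timed out after %.1fs; using fallback", timeout_seconds
        )
        return fallback, False
    if "error" in outcome:
        logger.error("initialization failed; using fallback", exc_info=outcome["error"])
        return fallback, False
    return outcome["value"], True
```

A caller might wrap the MCP handshake this way and disable the MCP-backed tool when `ok` is False, while the log line keeps the degraded state visible in Cloud Logging.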

What's next for SentinelOps: Autonomous SRE & DevOps Incident Command Portal

  • Real integrations to replace the simulated panels — live Dynatrace/Datadog alert ingestion and genuine GitLab/GitHub merge-request creation for AI-generated hotfixes.
  • Multi-tenant runbook libraries so teams can ground the agent on their own private documentation.
  • Auto-ingestion of runbooks from existing wikis, Confluence, and Git repos.
  • Feedback loop — let engineers rate answers so retrieval and synthesis improve over time.
  • Proactive incident response — chaining Atlas-grounded diagnosis with automated remediation actions, with a human approval gate.
