🧠 Description

AutoDevOps Doctor is an autonomous DevSecOps SRE companion that actively monitors database/application telemetry, diagnoses root causes, generates unified commit diffs, and opens/tracks GitLab Merge Requests automatically—built with Vertex AI (Gemini 2.5 Flash) and Python.


💡 Inspiration

System outages and deployment failures are the bane of every engineering team. SREs face alert fatigue, spending hours diagnosing simple configuration mismatches (such as setting MySQL port 3306 on a PostgreSQL instance) or trivial application bugs (like uncaught ZeroDivisionError statements).

We built AutoDevOps Doctor to act as a tireless virtual SRE teammate. It instantly connects telemetry alerts with the exact files in your GitLab registry, writes the fix, computes a diff preview, opens the Merge Request, and polls its status live.


🛠️ What it does

AutoDevOps Doctor provides a complete closed-loop diagnostics and remediation experience:

  1. Dynamic Telemetry Monitor: Loads open incidents from an SQLite registry. SREs can view incident details (ID, service, logs) or select specific items to remediate from a dashboard selectbox.
  2. Multi-Turn AI Diagnostics (Gemini 2.5 Flash): Connects to Google Vertex AI. Gemini reasons about the error log, determines what parameters are broken (e.g. database ports or zero division bugs), reads the repository file, and commits the fix.
  3. Commit Diff Preview: Computes a unified diff utilizing python's difflib and displays it instantly in Streamlit prior to/during MR creation.
  4. Duplicate MR Prevention: Checks the live GitLab project before pushing new commits to prevent creating redundant branches or duplicate MRs if the remediation is run multiple times.
  5. Live GitLab MR Tracker: Querying the GitLab API, it renders a custom CSS/HTML status card displaying State, Source/Target branches, Merge Conflicts, and discussion resolutions.
  6. Robust Fallback Engine: If Vertex AI API rate limits are exceeded, a rule-based fallback protocol immediately takes over to fetch, patch, and submit the MR using the SRE's local environment configurations.
  7. Python Code Bug Handling: The engine can correct logical bugs in application code (like patching ZeroDivisionError in calculator.py) in addition to configuration YAML manifests.

⚙️ How we built it

  • Core Orchestrator: Python 3.9+ using the official google-genai SDK to configure multi-turn agent logic with registered tool declarations.
  • Frontend Command Center: Streamlit, designed with a custom HSL dark-mode theme, custom HTML cards, and responsive micro-animations.
  • Database: SQLite3 managing active incident tables and repository simulators.
  • GitLab Integration: Live GitLab REST API via Python requests utilizing private authentication tokens.
  • Credentials/State: Secured in .env (ignored via .gitignore) using python-dotenv.

🦊 How Partner GitLab was Utilized

As a core partner integration, GitLab serves as the central hub for code management, pipeline monitoring, and remediation tracking. AutoDevOps Doctor interfaces directly with the live GitLab REST API using personal access tokens to orchestrate the entire DevSecOps lifecycle:

  1. Live Registry Reading (Files API):
    • Endpoint: GET /projects/:id/repository/files/:file_path/raw?ref=:branch
    • Utilized: To read the raw contents of broken deployment manifests (k8s/deployment.yaml) and application source code files (calculator.py) directly from the active branch for real-time analysis.
  2. Atomic Branch & Patch Committing (Commits API):
    • Endpoint: POST /projects/:id/repository/commits
    • Utilized: Rather than running multiple shell commands, the agent makes a single atomic REST call to create a new unique remediation branch and commit the patched manifest or source code changes in one clean action.
  3. Automated Code Review Initiation (Merge Request API):
    • Endpoint: POST /projects/:id/merge_requests
    • Utilized: Once code changes are pushed, the tool automatically creates a new Merge Request targeting the default branch (e.g. main), fully pre-populated with SRE triage notes, severity details, and Gemini-generated reasoning.
  4. Duplicate MR Prevention (Merge Requests List API):
    • Endpoint: GET /projects/:id/merge_requests?state=opened
    • Utilized: Prior to initiating commits, the engine scans the project's open MR registry. If an MR with the same remediation title is already open, it short-circuits the pipeline to avoid polluting the repo with duplicate branches and MRs.
  5. Real-time Status Polling (Merge Request Details API):
    • Endpoint: GET /projects/:id/merge_requests/:iid
    • Utilized: Displays a live status card on the dashboard showing the MR title, source/target branches, conflict states, and blocking discussion statuses retrieved straight from GitLab.

🚧 Challenges we ran into

  • Streamlit Session State Binding: Intercepting responses inside multi-turn AI tool calling execution paths and updating UI state variables in real-time required careful state caching.
  • Console Charmap Encoding Errors: Windows consoles failed to render certain emojis inside git run outputs. We resolved this by sanitizing console print statements while maintaining rich aesthetics in the Streamlit web browser.
  • Preventing Duplicate MR Pollution: In initial runs, repeated clicks created numerous branches and MRs on GitLab. We solved this by adding an API-driven pre-check that queries opened merge request list titles.

🏆 Accomplishments that we're proud of

  • Generalist Code Fixes: Building an agent that doesn't just replace config ports but successfully parses, understands, and patches a Python division-by-zero bug (calculator.py) to make unit tests pass.
  • Failsafe Fallback: Providing SRE teams with absolute uptime; the dashboard automatically falls back to rule-based parsing if Gemini API limits are hit, keeping the GitLab git pipeline functioning.
  • Premium User Experience: Constructing HSL-harmonized metrics widgets, git diff blocks, and a clean MR status tracker side-by-side.

📖 What we learned

  • How to design robust agent schemas where function call payloads are injected directly back into the LLM history.
  • The power of using low-level API queries (like GitLab's MR details) to build a unified SRE control ledger.

🔮 What's next for AutoDevOps Doctor

  • ChatOps Integrations: Pushing automated diffs to Slack/Discord with direct click-to-merge buttons.
  • Autonomous Rollbacks: Hooking up the engine to post-merge canary alerts. If a patched service causes new failures, the agent will autonomously open a revert MR.
  • Multi-Repo Management: Supporting larger microservice architectures across separate code registries.

Built With

Share this project:

Updates