Gemini SRE Commander

Inspiration

Every engineer has been there: it's 3 AM, alerts are firing, and you're drowning in logs, dashboards, and Slack threads trying to figure out what's broken. We built Gemini SRE Commander because incident response shouldn't feel like detective work with missing clues. We wanted to give SREs an AI-powered teammate that can read logs, analyze screenshots, and tell you exactly what's wrong—and how to fix it.

What it does

Gemini SRE Commander transforms chaotic system data into actionable incident intelligence:

Upload logs & screenshots – Drop in text logs, JSON exports, Grafana dashboards, even architecture diagrams
AI-powered analysis – Gemini 3 Flash processes everything together using multimodal reasoning
Get instant answers – Root cause, severity rating, evidence timeline, and step-by-step mitigation plan
Take action – Copy-paste runbook commands tailored to your specific incident
Track progress – Interactive checklist to mark mitigation steps as complete
Export & share – Download a markdown post-mortem for your incident review
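
As a rough illustration of the export step, the sketch below renders an analysis into a markdown post-mortem. The `IncidentReport` shape and the `toMarkdown` name are assumptions made for this example (the fields mirror the outputs listed above), not the project's actual types.

```typescript
// Hypothetical report shape; field names are assumptions for illustration.
interface IncidentReport {
  title: string;
  severity: string;
  rootCause: string;
  timeline: { time: string; event: string }[];
  steps: { text: string; done: boolean }[];
}

// Renders the report as markdown, with the checklist state preserved.
function toMarkdown(report: IncidentReport): string {
  const lines = [
    `# Post-mortem: ${report.title}`,
    "",
    `**Severity:** ${report.severity}`,
    "",
    "## Root cause",
    report.rootCause,
    "",
    "## Evidence timeline",
    ...report.timeline.map((e) => `- ${e.time}: ${e.event}`),
    "",
    "## Mitigation checklist",
    ...report.steps.map((s) => `- [${s.done ? "x" : " "}] ${s.text}`),
  ];
  return lines.join("\n");
}
```

Completed checklist items come out as `- [x]` entries, so the exported file keeps the progress recorded in the UI.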

Bonus features: Real-time log streaming via WebSocket for live incident analysis, and four pre-built demo scenarios that show the system in action right away.

How we built it

| Component | Tech | Why We Chose It |

Key architectural decisions:

Structured output schema – A JSON schema enforced at the API level keeps responses consistent and parseable, with a fallback parser for the occasional malformed reply
Context-aware runbooks – Keyword matching serves relevant commands based on incident type
Smart truncation – Large logs auto-truncate with markers to stay within token limits
Connection-scoped state – Each WebSocket session has isolated buffers to prevent data leaks

Challenges we ran into

1. Context window limits – Large log files exceeded AI token limits. We implemented smart truncation that preserves the most recent logs (usually the most relevant) with clear [TRUNCATED] markers.
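
A minimal sketch of tail-preserving truncation, assuming a character budget as a stand-in for the model's token limit. `truncateLogs` and `TRUNCATION_MARKER` are hypothetical names chosen for this example, not the project's actual code.

```typescript
// Hypothetical helper: keeps the newest lines of a log within a character budget.
// Using characters instead of tokens is a simplifying assumption.
const TRUNCATION_MARKER = "[TRUNCATED]";

function truncateLogs(log: string, maxChars: number): string {
  if (log.length <= maxChars) return log;

  const kept: string[] = [];
  let used = TRUNCATION_MARKER.length + 1; // marker plus its newline

  // Walk from the newest line to the oldest so recent lines are kept first.
  for (const line of [...log.split("\n")].reverse()) {
    const cost = line.length + 1; // line plus its newline
    if (used + cost > maxChars) break;
    kept.unshift(line);
    used += cost;
  }
  // The marker is always emitted so the model can tell earlier lines were dropped.
  return [TRUNCATION_MARKER, ...kept].join("\n");
}
```

Cutting on line boundaries avoids handing the model a half-written log line at the start of the retained window.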

2. Structured output reliability – LLMs occasionally return malformed JSON or extra prose. We combined Gemini's native schema enforcement with a graceful fallback parser that returns valid error objects instead of crashing the UI.
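
The fallback idea can be sketched as follows. This is an illustration under stated assumptions: the `AnalysisResult` fields, the `ParseOutcome` type, and the function names are hypothetical, and the sketch covers only the client-side fallback, not Gemini's schema enforcement.

```typescript
// Hypothetical result shape; field names are assumptions for illustration.
interface AnalysisResult {
  rootCause: string;
  severity: string;
  mitigationSteps: string[];
}

type ParseOutcome =
  | { ok: true; result: AnalysisResult }
  | { ok: false; error: string };

// Runtime check that parsed JSON has the fields the UI relies on.
function isAnalysisResult(value: unknown): value is AnalysisResult {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const steps = v["mitigationSteps"];
  return (
    typeof v["rootCause"] === "string" &&
    typeof v["severity"] === "string" &&
    Array.isArray(steps) &&
    steps.every((s: unknown) => typeof s === "string")
  );
}

// Never throws: malformed output becomes an error object the UI can render.
function parseModelOutput(raw: string): ParseOutcome {
  // Keep the span from the first "{" to the last "}" to drop prose or code fences.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "No JSON object found in model output" };
  }
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    return isAnalysisResult(parsed)
      ? { ok: true, result: parsed }
      : { ok: false, error: "JSON did not match the expected shape" };
  } catch {
    return { ok: false, error: "Model output was not valid JSON" };
  }
}
```

Returning a tagged union instead of throwing means the caller has to handle the failure branch explicitly, which is what keeps a bad response from crashing the UI.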

3. WebSocket state management – Managing buffered logs and analysis state across concurrent connections was tricky. We solved it with unique connection IDs and isolated state per session.
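
A sketch of what connection-scoped state might look like, independent of any particular WebSocket server. `SessionRegistry`, `SessionState`, and the counter-based IDs are assumptions for this example; a real server would more likely use a random identifier per connection.

```typescript
// Hypothetical per-connection state; names and fields are assumptions.
interface SessionState {
  buffer: string[];
  analyzing: boolean;
}

class SessionRegistry {
  private sessions = new Map<string, SessionState>();
  private nextId = 0;

  // Call when a socket opens; the returned ID is attached to that connection.
  open(): string {
    this.nextId += 1;
    const id = `conn-${this.nextId}`;
    this.sessions.set(id, { buffer: [], analyzing: false });
    return id;
  }

  // Buffers a log line for one connection only.
  append(id: string, line: string): void {
    const state = this.sessions.get(id);
    if (!state) throw new Error(`Unknown connection: ${id}`);
    state.buffer.push(line);
  }

  // Returns a copy so callers cannot mutate another session's buffer.
  snapshot(id: string): string[] {
    return [...(this.sessions.get(id)?.buffer ?? [])];
  }

  // Call when the socket closes so buffered logs do not outlive the connection.
  close(id: string): void {
    this.sessions.delete(id);
  }
}
```

Because every read and write goes through a connection ID, there is no shared buffer for two sessions to collide on.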

4. Runbook relevance – Generic commands aren't helpful during specific incidents. We built a keyword-matching system that serves context-aware commands (e.g., database commands for DB incidents, cache commands for Redis issues).
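
Keyword matching of this kind can be sketched in a few lines. The categories, keywords, and commands below are illustrative assumptions rather than the project's runbook data, and `<db-host>` is a placeholder.

```typescript
// Hypothetical runbook table; contents are assumptions for illustration.
interface Runbook {
  category: string;
  keywords: string[];
  commands: string[];
}

const RUNBOOKS: Runbook[] = [
  {
    category: "database",
    keywords: ["postgres", "deadlock", "connection pool"],
    commands: ["pg_isready -h <db-host>"],
  },
  {
    category: "cache",
    keywords: ["redis", "eviction", "cache miss"],
    commands: ["redis-cli INFO memory"],
  },
];

// Returns runbooks whose keywords appear in the incident text, best match first.
function matchRunbooks(incidentText: string, runbooks: Runbook[] = RUNBOOKS): Runbook[] {
  const text = incidentText.toLowerCase();
  return runbooks
    .map((rb) => ({ rb, hits: rb.keywords.filter((k) => text.includes(k)).length }))
    .filter((entry) => entry.hits > 0)
    .sort((a, b) => b.hits - a.hits)
    .map((entry) => entry.rb);
}
```

Ranking by hit count lets an incident that mentions both a database and a cache surface both runbooks, with the stronger match on top.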

Accomplishments that we're proud of

End-to-end workflow – From log upload to exportable post-mortem in one seamless flow

10-second demo – Judges click "Load Demo" and see a complete incident analysis within seconds

True multimodal reasoning – The AI actually "looks" at architecture diagrams and metrics screenshots, not just text logs

Smart runbooks – Copy-paste commands with confirmation warnings for destructive operations

Real-time streaming – Live log analysis that triggers automatically when patterns emerge
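
One way an automatic trigger for streamed logs could work is a sliding window over recent lines. The sketch below is an assumption about the approach, not the project's implementation; the `PatternTrigger` name, the default pattern, the window size, and the threshold are all hypothetical.

```typescript
// Hypothetical trigger: fires once enough matching lines appear among the most recent lines.
class PatternTrigger {
  private recent: boolean[] = [];
  private readonly pattern: RegExp;
  private readonly windowSize: number;
  private readonly threshold: number;

  constructor(pattern: RegExp = /\b(ERROR|FATAL|timeout)\b/i, windowSize = 50, threshold = 5) {
    this.pattern = pattern;
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Returns true when this line brings the window up to the threshold.
  push(line: string): boolean {
    this.recent.push(this.pattern.test(line));
    if (this.recent.length > this.windowSize) this.recent.shift();

    const hits = this.recent.filter(Boolean).length;
    if (hits >= this.threshold) {
      this.recent = []; // reset so one burst starts one analysis
      return true;
    }
    return false;
  }
}
```

Each streaming connection would hold its own trigger and start an analysis of its buffered lines whenever `push` returns true.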

What we learned

Technical insights:

Gemini's structured output removes most JSON parsing bugs by enforcing the schema at the API level; a fallback parser covers the occasional malformed response
Bun's native WebSocket support made real-time log streaming straightforward to implement
Multimodal input covers cases a text-only model cannot, such as reading metrics screenshots and architecture diagrams alongside logs

Product insights:

Progress tracking transforms static reports into collaborative tools
Export functionality (markdown post-mortems) lets the analysis feed directly into an existing incident-review process

What's next for Gemini SRE Commander

Team collaboration features (comments, assignments, shared timelines)
Custom runbook editor for team-specific commands
Alert correlation to group related alerts into single incidents

Long-term vision:

"The Self-Healing Platform" – A system that not only diagnoses incidents but executes safe remediation steps automatically, with human approval for critical actions.

Built With

Updates

Devin Febrian started this project — Feb 09, 2026 03:14 PM EST
