About the Project

In large production systems, the issues that cause the most disruption often begin as small, recurring signals buried inside noisy logs. After 25 years leading IT operations, I’ve seen how these quiet problems build into full incidents when teams are overwhelmed.

IncidentOps is a lightweight, AI-assisted incident pipeline designed to surface those signals early. It detects anomalies, summarizes them, assigns deterministic severity, generates remediation suggestions, produces factual audit logs, evaluates governance and escalation, and identifies recurring patterns across runs. Everything is persisted in SQLite and visualized through a clear Streamlit UI.
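As a rough illustration of how recurring patterns across runs could be read back from SQLite, here is a minimal sketch. The `incidents` table, its columns, and the helpers `init_db`, `record_incident`, and `recurring_patterns` are hypothetical names chosen for this example; they are not taken from the IncidentOps schema.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per detected incident, tagged with its run.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS incidents (
               id INTEGER PRIMARY KEY,
               run_id TEXT NOT NULL,
               signature TEXT NOT NULL,
               severity TEXT NOT NULL
           )"""
    )


def record_incident(conn, run_id, signature, severity):
    # Parameterized insert; values are never interpolated into the SQL string.
    conn.execute(
        "INSERT INTO incidents (run_id, signature, severity) VALUES (?, ?, ?)",
        (run_id, signature, severity),
    )


def recurring_patterns(conn, min_runs=2):
    """Return (signature, run_count) for signatures seen in at least min_runs distinct runs."""
    rows = conn.execute(
        """SELECT signature, COUNT(DISTINCT run_id) AS runs
           FROM incidents
           GROUP BY signature
           HAVING COUNT(DISTINCT run_id) >= ?
           ORDER BY runs DESC, signature""",
        (min_runs,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
init_db(conn)
record_incident(conn, "run-1", "db_timeout", "high")
record_incident(conn, "run-2", "db_timeout", "high")
record_incident(conn, "run-2", "disk_warn", "low")
patterns = recurring_patterns(conn)
```

Counting distinct runs rather than raw rows keeps a single noisy run from looking like a recurring pattern.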


What Inspired Me

  • Years of dealing with alert fatigue, repeated issues, and noisy production environments.
  • The desire to turn scattered signals into one coherent incident story.
  • Recognizing that subtle issues often remain unnoticed in the corners of production systems.
  • Using AI in a practical, responsible way to reduce operational stress.

What I Learned

  • Kiro’s spec-driven workflow drastically accelerated development; tasks that normally take weeks were completed in hours.
  • AI still requires human review: correctness, safety, and guardrails matter.
  • Deterministic triage, strong auditability, and DB-backed insights are essential for trustworthy incident systems.
  • Clear specifications make complex multi-agent pipelines easier to evolve and maintain.
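To make "deterministic triage" concrete, here is a minimal sketch of rule-based severity assignment. The `Anomaly` fields, the threshold values, and the function name `assign_severity` are assumptions made for illustration; the actual TriageAgent rules are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    error_rate: float        # fraction of failed requests, 0.0 to 1.0 (assumed field)
    affected_services: int   # number of services showing the anomaly (assumed field)


def assign_severity(anomaly):
    """Fixed thresholds with no randomness or model calls: same input, same severity."""
    if anomaly.error_rate >= 0.20 or anomaly.affected_services >= 5:
        return "critical"
    if anomaly.error_rate >= 0.05 or anomaly.affected_services >= 2:
        return "high"
    if anomaly.error_rate >= 0.01:
        return "medium"
    return "low"
```

Because the rules are plain comparisons, a severity decision can be reproduced from the recorded inputs alone, which is what makes it auditable.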

How I Built It

I used Kiro to define executable specs for the pipeline, UI, DB layer, and MCP tools, then implemented each component step by step:

  • Sequential multi-agent pipeline:
    MonitorAgent → LLMAlertSummaryAgent → TriageAgent → LLMResolutionAgent → OpsLogAgent → LLMGovernanceAgent → LLMGovernanceInsightsAgent → NotificationAgent

  • SQLite persistence layer with structured write APIs and aggregation functions for trends, distributions, and recurring patterns.

  • Streamlit UI with Pipeline Runner, Dashboards, Governance, Deep Insights, Notifications, and Audit Logs pages.

  • Local MCP server for controllable Gmail/Pushover test notifications.

  • Spec-driven development using Kiro tasks to scaffold components and iterate quickly with human validation.
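The sequential hand-off between agents listed above can be sketched as a chain of functions that each take a shared context and return an extended copy. This is only an illustration under stated assumptions: the context keys, the placeholder logic inside each function, and `run_pipeline` are hypothetical, and only three of the eight stages are stubbed.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Agent = Callable[[Context], Context]


def monitor_agent(ctx):
    # Placeholder detection: treat log lines containing "ERROR" as anomalies.
    anomalies = [line for line in ctx["logs"] if "ERROR" in line]
    return {**ctx, "anomalies": anomalies}


def triage_agent(ctx):
    # Placeholder deterministic rule: two or more anomalies means "high".
    severity = "high" if len(ctx["anomalies"]) >= 2 else "low"
    return {**ctx, "severity": severity}


def ops_log_agent(ctx):
    # Placeholder audit entry built only from facts already in the context.
    entry = f"{len(ctx['anomalies'])} anomalies, severity={ctx['severity']}"
    return {**ctx, "audit_log": [entry]}


def run_pipeline(agents: List[Agent], ctx: Context) -> Context:
    # Each agent sees the output of the previous one, in a fixed order.
    for agent in agents:
        ctx = agent(ctx)
    return ctx


result = run_pipeline(
    [monitor_agent, triage_agent, ops_log_agent],
    {"logs": ["INFO ok", "ERROR db timeout", "ERROR db timeout"]},
)
```

Returning a new dictionary from each stage, instead of mutating a shared one, makes it easier to inspect what every agent added when a run is reviewed.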


Challenges

  • AI-generated code required supervision to avoid logical errors or loops.
  • Some tooling (especially diagrams) needed fallback approaches for clarity.
  • Maintaining consistency across agents, DB writes, and UI components required careful iteration.

Closing

IncidentOps demonstrates how AI and deterministic logic can work together to bring clarity to the subtle issues that often hide in production systems. It reflects real operational experience, leverages Kiro’s spec-driven development to move quickly, and delivers a functioning MVP that surfaces anomalies, highlights trends, and provides actionable insights for engineering teams.

Updates

  1. The README has been refreshed with complete MCP server setup details and deployment notes for clarity.
  2. The full verification report has also been added under /docs/verification_report.md to document test coverage and pipeline validation.
