Inspiration

Software incidents are still handled with too much context switching. Engineers jump between alerts, dashboards, commits, runbooks, architecture docs, and old postmortems while trying to reconstruct what happened under pressure. I wanted to build something that reflects where software engineering is going next: not just AI that writes code, but AI that helps supervise and improve real engineering workflows.

That led to DevProd, an incident-response control plane designed around bounded, reviewable agents. The goal was to create a system that can investigate incidents, surface evidence, retrieve the right operational knowledge, rank likely root causes, suggest remediation, and draft a postmortem, while still keeping a human in control.

What it does

DevProd is an AI-powered incident investigation workflow for software teams.

Given an incident, it can:

  • classify the issue and choose an investigation path
  • collect and structure evidence from alerts and incident context
  • correlate likely causal changes
  • retrieve relevant runbooks, architecture notes, and prior incidents
  • rank root-cause hypotheses
  • recommend remediation steps
  • draft a postmortem
  • expose a reviewable workflow trace

It also includes a benchmark arena of synthetic engineering incidents with expected outcomes and rubrics, so the workflow can be tested against realistic scenarios instead of being treated like an unmeasured chatbot.
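
To make that concrete, here is the rough shape of one arena entry. This is an illustrative sketch, not the exact schema used in arena/scenarios; every field name below is a stand-in.

```python
# Illustrative shape of a benchmark scenario; the real schema in
# arena/scenarios may differ, and all field names here are hypothetical.
scenario = {
    "id": "checkout-latency-001",
    "incident": {
        "title": "p99 latency spike on the checkout service",
        "alerts": ["HighLatency: checkout p99 > 2s for 10m"],
        "recent_changes": [
            "deploy: checkout v2.14.0",
            "config: db connection pool 50 -> 10",
        ],
    },
    "expected": {
        "root_cause": "connection pool misconfiguration",
        "relevant_docs": ["runbooks/checkout-latency.md"],
    },
    "rubric": {
        "true_cause_in_top_2_hypotheses": 3,  # points awarded by the grader
        "cites_correct_runbook": 2,
        "remediation_is_safe": 2,
    },
}
```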

How we built it

I built DevProd as a small full-stack application with a clear separation between the user-facing control plane, the workflow orchestration layer, and the benchmark corpus.

Frontend

  • Next.js dashboard
  • incident inbox
  • investigation view
  • evidence, retrieval, hypotheses, remediation, and postmortem panels

Backend

  • FastAPI service
  • structured API routes for incident intake, investigation runs, retrieval, hypotheses, remediation, and postmortem outputs (a minimal route sketch follows after this list)
  • local orchestration stub that reads benchmark scenarios, knowledge documents, and prompt bundles
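
As a sketch of what those structured routes look like, here is a minimal intake endpoint. Route paths, model names, and fields are assumptions for illustration, not the exact DevProd API.

```python
# Minimal sketch of the intake route; paths, models, and field names are
# illustrative assumptions rather than the exact DevProd API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class IncidentIntake(BaseModel):
    title: str
    description: str
    alerts: list[str] = []

class InvestigationRun(BaseModel):
    run_id: str
    incident_id: str
    status: str  # "queued", "running", or "complete"

@app.post("/incidents", response_model=InvestigationRun)
def create_incident(intake: IncidentIntake) -> InvestigationRun:
    # The real service would persist the incident and hand it to the
    # orchestration layer; this stub just returns a queued run.
    return InvestigationRun(run_id="run-001", incident_id="inc-001", status="queued")
```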

Workflow and evaluation

  • specialized prompt roles for:
    • triage
    • evidence
    • retrieval
    • hypothesis
    • remediation
    • postmortem
    • policy review
  • seeded benchmark scenarios in arena/scenarios
  • retrieval corpus in knowledge
  • shared response contracts in packages/contracts (example shape sketched below)
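
For a feel of what a shared contract enforces, here is a minimal Pydantic-style sketch of the hypothesis step's output; the actual definitions in packages/contracts may use different names and fields.

```python
# Hypothetical sketch of one shared response contract; the real
# definitions live in packages/contracts and may differ.
from pydantic import BaseModel

class Hypothesis(BaseModel):
    summary: str              # one-line statement of the suspected cause
    confidence: float         # 0.0-1.0, assigned by the hypothesis role
    evidence_ids: list[str]   # evidence items cited in support

class HypothesisResponse(BaseModel):
    incident_id: str
    hypotheses: list[Hypothesis]  # ordered from most to least likely
```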

DigitalOcean Gradient AI

The system is designed around DigitalOcean Gradient AI as the intended hosted AI layer for:

  • agent orchestration
  • inference
  • retrieval-backed workflows
  • evaluation and traces

To make the project reviewable under hackathon time constraints, I included:

  • a demo provider for local execution
  • a live provider integration path in the backend for DigitalOcean Gradient AI (see the provider sketch after this list)
  • a DigitalOcean App Platform deployment spec in .do/app.yaml
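
The demo/live split roughly follows a provider seam like the sketch below. Class and method names are assumptions for illustration, and the Gradient AI call itself is deliberately left unimplemented, since wiring the hosted endpoint was the in-progress part.

```python
# Sketch of the provider seam; class and method names are hypothetical.
from typing import Protocol

class InferenceProvider(Protocol):
    def complete(self, role: str, prompt: str) -> str:
        """Run one prompt-role step and return the model's output."""
        ...

class DemoProvider:
    """Local demo mode: returns canned outputs so the workflow can be
    reviewed end to end without any hosted dependency."""
    def complete(self, role: str, prompt: str) -> str:
        return f"[demo:{role}] canned response for local review"

class GradientProvider:
    """Live path intended for DigitalOcean Gradient AI. The actual HTTP
    call is elided; endpoint and auth wiring were still in progress."""
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def complete(self, role: str, prompt: str) -> str:
        raise NotImplementedError("wire this to Gradient AI inference")
```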

Challenges we ran into

The biggest challenge was time. I wanted the project to be more than a polished UI, so much of the effort went into the underlying system shape: contracts, scenarios, knowledge documents, evaluation artifacts, and a workflow that is inspectable rather than magical.

Another challenge was deployment. I prepared the project for DigitalOcean App Platform and set up a multi-service app spec, but I ran out of time before completing a final public deployment. Rather than fake that part, I kept the submission honest and focused on delivering a runnable local prototype with the deployment configuration included.

There was also a product-design challenge: keeping the workflow ambitious without making it look like an unsafe autonomous operator. That is why DevProd is structured around bounded agents, review steps, retrieval, and explicit traces.

Accomplishments that we're proud of

  • Built a full-stack incident-response application instead of only a prompt demo
  • Created a multi-agent workflow with distinct responsibilities
  • Added a benchmark arena with multiple realistic incident scenarios
  • Built a retrieval corpus of runbooks, architecture notes, incidents, and postmortems
  • Exposed investigation outputs through structured backend routes
  • Created a dashboard that surfaces evidence, hypotheses, remediation, and postmortem results
  • Included deployment config for DigitalOcean App Platform and a live-provider path for Gradient AI
  • Kept the system reviewable, bounded, and measurable

What we learned

I learned that the most useful AI systems for engineering are not the ones that try to act omniscient. They are the ones that make context legible, constrain behavior, and give humans better leverage during messy workflows.

I also learned how important benchmarkability is. Once you frame the product as a workflow rather than a chatbot, you naturally need scenarios, expected outcomes, rubrics, and traces. That changes how you design both the product and the codebase.

On the platform side, I learned a lot about shaping an app for DigitalOcean deployment, separating demo-mode behavior from live-provider behavior, and building toward a cloud-native AI architecture even when the final hosted deployment is still in progress.

What's next for DevProd

Next, I want to:

  • complete the public DigitalOcean App Platform deployment
  • connect the live workflow fully to DigitalOcean Gradient AI
  • expand the benchmark arena with more failure modes and distractor patterns
  • add richer trace and evaluation views in the dashboard
  • move run history from local SQLite to a persistent managed store
  • support reviewer feedback loops for workflow policy iteration
  • make DevProd usable as a real incident copilot for small engineering teams

The long-term vision is for DevProd to become a practical control plane for supervising, evaluating, and improving AI-assisted engineering operations.

Built With

Next.js, FastAPI, Python, SQLite, DigitalOcean App Platform, DigitalOcean Gradient AI