AutoTriage & AutoFix Agent

Inspiration

Modern software development moves at lightning speed, yet debugging and triaging issues still consume a significant portion of engineering time.

We aimed to create a system that automatically detects, analyzes, and fixes runtime issues before developers even notice them like a cloud-native AI DevOps assistant.

The inspiration came from:

AWS’s advancements in Bedrock and CloudWatch anomaly detection
Frustration with late-night debugging sessions
The realization that AI agents could autonomously analyze logs and suggest fixes

We asked ourselves:

“What if incidents could fix themselves?”

From that question, the AutoTriage & AutoFix Agent was born.

What it does

AutoTriage & AutoFix Agent is a cloud-based DevOps agent that continuously:

Monitors AWS logs, metrics, and API responses using CloudWatch and OpenTelemetry
Triages issues using an LLM via Amazon Bedrock
Generates root cause analysis (RCA) reports for developers
Proposes auto-fixes by suggesting code changes or pull requests via GitHub integration
Optionally verifies deployments in sandbox environments like AWS Lambda

Additionally, the project includes a frontend dashboard built with HTML, CSS, and JavaScript, providing a user-friendly interface to interact with the system.

How we built it

We built the agent using an AWS-native backend-focused stack for scalability and integration:

Backend:
- AWS Lambda functions triggered by GitHub webhooks or events
- Amazon Bedrock for LLM reasoning and suggestion generation
- AWS CloudWatch + OpenTelemetry for monitoring logs and metrics
- GitHub Actions for applying fixes or suggesting pull requests
Frontend:
- User interface built with HTML, CSS, and JavaScript, providing a dashboard to monitor and manage the agent
Data handling:
- Logs and context are temporarily processed in Lambda; no long-term storage is included yet
Automation:
- The agent detects issues → queries Bedrock → generates suggestions → optionally posts GitHub comments

Mathematical abstraction:

[ f_{\text{autoFix}}(x) = \text{LLM}_{\text{Bedrock}}(\text{logs}(x) + \text{repo context}) ]

Where ( f_{\text{autoFix}} ) is the function mapping observed logs and repository context to actionable code fixes.

Challenges we ran into

Prompt accuracy: LLMs could hallucinate fixes; mitigated by grounding prompts in actual logs and context.
Security: Giving an AI model access to repositories required careful IAM role isolation and GitHub token management.
Trigger reliability: Ensuring GitHub webhooks consistently reach Lambda functions required retries and logging.
Cost management: Running LLM queries through Bedrock needed careful usage to avoid unnecessary costs.

Built With

Updates

Rayyan Khan started this project — Oct 22, 2025 07:13 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.