Inspiration
In December 2025, Amazon's AI coding agent Kiro was tasked with fixing a minor bug in AWS Cost Explorer. Instead, it autonomously deleted and rebuilt the entire environment — triggering a 13-hour outage across the AWS China region. The incident made global headlines and ignited an industry-wide reckoning about AI agents operating in production without sufficient guardrails.
Amazon called it a permission misconfiguration. The industry called it a warning shot.
Both are right — and that's the problem. Whether the failure originated from the AI agent, the engineer, or the access controls, the outcome was identical: 13 hours of downtime because nobody simulated what would happen before the action executed.
As AI coding tools grow more autonomous — writing, modifying, and shipping code with minimal human review — the blast radius of a single bad decision scales with them. The tools that generate code have evolved faster than the tools that validate it.
BlastShield was built on one conviction: the gap between AI-generated code and production consequences must be closed before deployment, not after.
What It Does
BlastShield is a production failure simulator. Before you deploy, it runs your code through the exact conditions that cause real-world outages:
- Concurrency drills — hundreds of simultaneous requests exposing race conditions and thread-safety violations
- Latency injection — artificial delays triggering timeout cascades and retry storms
- Chaos failures — randomized service terminations revealing unhandled failure paths
- Endpoint load testing — sustained traffic spikes simulating peak production demand
It produces a structured reliability report powered by a weighted risk scoring model:
$$R = \sum_{i=1}^{n} w_i \cdot f_i$$
Where \( f_i \) represents each failure signal — concurrency fault rate, timeout depth, error cascade spread — and \( w_i \) is its weighted production impact coefficient. Output includes an outage timeline, blast radius analysis, and auto-generated patches — all before a single user is affected.
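As a rough illustration of the weighted sum above, the following Python sketch computes R from a handful of signals. The signal names, their values, and the weights are hypothetical placeholders chosen for this example; they are not BlastShield's actual coefficients.

```python
def risk_score(signals, weights):
    """Compute R = sum_i w_i * f_i for the given failure signals.

    signals: dict mapping a signal name to its measured value f_i
    weights: dict mapping the same names to impact coefficients w_i
    (both dicts are illustrative assumptions, not the real model)
    """
    missing = set(signals) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in signals.items())


if __name__ == "__main__":
    # Hypothetical, normalized signal values in [0, 1].
    signals = {
        "concurrency_fault_rate": 0.25,
        "timeout_depth": 0.5,
        "error_cascade_spread": 0.1,
    }
    # Hypothetical weights; a higher weight means a larger production impact.
    weights = {
        "concurrency_fault_rate": 0.5,
        "timeout_depth": 0.3,
        "error_cascade_spread": 0.2,
    }
    print(f"R = {risk_score(signals, weights):.3f}")
```

Keeping the weights in a separate mapping means they can be retuned without touching the telemetry that produces the signals.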
Live product: blastshield-demo.duckdns.org
How We Built It
Simulation Engine
- AWS Lambda orchestrates isolated simulation workloads
- Docker containers on EC2 run deterministic chaos and concurrency scenarios in ephemeral sandboxed environments
- Custom telemetry pipeline captures failure signals across all simulation dimensions
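To make the idea of a concurrency scenario concrete, here is a toy drill in Python. It is a sketch under stated assumptions, not the engine's real code: the `Counter` handler, `concurrency_drill`, and `run_drill` are hypothetical names, and the "service" is an in-process object with a deliberate read-modify-write race across an `await`.

```python
import asyncio


class Counter:
    """Hypothetical handler with a race: it reads, yields, then writes."""

    def __init__(self):
        self.value = 0

    async def increment(self):
        current = self.value        # read shared state
        await asyncio.sleep(0)      # yield to other tasks (stands in for I/O)
        self.value = current + 1    # write back a possibly stale value


async def concurrency_drill(handler, n_requests):
    """Fire n_requests concurrent calls at the handler and wait for all."""
    await asyncio.gather(*(handler() for _ in range(n_requests)))


def run_drill(n_requests=200):
    """Run the drill and report how many updates were lost to the race."""
    counter = Counter()
    asyncio.run(concurrency_drill(counter.increment, n_requests))
    return {
        "expected": n_requests,
        "observed": counter.value,
        "lost_updates": n_requests - counter.value,
    }


if __name__ == "__main__":
    print(run_drill())
```

A handler that guarded the read and write with an `asyncio.Lock` would report zero lost updates under the same drill, which is the kind of before-and-after check a reproducible scenario allows.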
AI Analysis Layer
- Amazon Bedrock and Groq AI analyze raw simulation signals and generate structured reliability reports
- Auto-patch engine produces deployable fixes, not just warnings
Developer Interface
- VS Code extension surfaces everything inside the developer's existing workflow — zero context switching
- Real-time simulation feedback with risk score dashboard
The architecture is fully cloud-native and stateless. Every simulation is isolated, deterministic, and reproducible — because a reliability tool that produces flaky results is worthless.
Challenges We Ran Into
Engineering deterministic chaos
True chaos is random, but reproducibility is non-negotiable: a developer needs to verify that their fix actually resolved the failure. Building chaos scenarios that are simultaneously realistic and deterministic required significant architectural design.
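One common way to reconcile randomness with reproducibility is to derive every fault from a seeded generator, so the same seed always replays the same failures. The sketch below shows that technique in Python; it is an assumption about how such a scenario could be structured, and `chaos_schedule`, the service names, and the kill probability are all hypothetical.

```python
import random


def chaos_schedule(seed, services, steps, kill_probability=0.2):
    """Return a reproducible list of (step, service) terminations.

    A private random.Random(seed) is used so that the schedule depends
    only on the arguments, never on global random state.
    """
    rng = random.Random(seed)
    schedule = []
    for step in range(steps):
        for service in services:
            if rng.random() < kill_probability:
                schedule.append((step, service))
    return schedule


if __name__ == "__main__":
    services = ["api", "worker", "cache"]  # hypothetical service names
    first = chaos_schedule(seed=42, services=services, steps=10)
    replay = chaos_schedule(seed=42, services=services, steps=10)
    print("replay matches:", first == replay)
```

Storing the seed alongside a report would let a developer rerun the identical failure sequence after applying a fix.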
Sandboxing untrusted code safely
Running arbitrary developer code in our infrastructure required airtight isolation. Every simulation runs in an ephemeral Docker container with no external network access and strict resource limits. One misconfiguration here and we become the outage.
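The isolation properties described here map onto standard `docker run` flags. The Python sketch below only assembles such a command line; the function name, the image, and the specific limits are illustrative assumptions rather than BlastShield's actual configuration.

```python
def sandbox_command(image, args, memory="256m", cpus="0.5", pids_limit=64):
    """Build a `docker run` argv for an ephemeral, locked-down container.

    The limit values are hypothetical defaults chosen for illustration.
    """
    return [
        "docker", "run",
        "--rm",                                  # ephemeral: remove on exit
        "--network", "none",                     # no external network access
        "--memory", memory,                      # hard memory cap
        "--cpus", cpus,                          # CPU quota
        "--pids-limit", str(pids_limit),         # bound process count
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        image,
        *args,
    ]


if __name__ == "__main__":
    # Example image and script path are placeholders.
    cmd = sandbox_command("python:3.12-slim", ["python", "/work/sim.py"])
    print(" ".join(cmd))
```

Returning the argument list instead of a shell string avoids quoting problems when the result is passed to `subprocess.run`.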
Signal extraction from simulation noise
Raw telemetry generates enormous data volumes. The hardest product problem was training the AI layer to surface actionable signals rather than drown developers in metrics they can't act on.
Simulating production without production access
We can't access a developer's real infrastructure. Replicating production load characteristics from submitted code alone required creative inference and progressive stress escalation models.
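Progressive stress escalation can be pictured as stepping the load up until the code under test starts to fail. The Python sketch below is one simple interpretation, assuming a geometric ramp and an error-rate threshold; `escalation_steps`, `find_breaking_point`, and every default value are hypothetical.

```python
def escalation_steps(start_rps, factor, max_rps):
    """Yield request rates that grow geometrically up to max_rps."""
    if start_rps <= 0 or factor <= 1:
        raise ValueError("start_rps must be positive and factor must exceed 1")
    rps = start_rps
    while rps <= max_rps:
        yield rps
        rps *= factor


def find_breaking_point(run_step, start_rps=10, factor=2,
                        max_rps=10_000, max_error_rate=0.01):
    """Escalate load until run_step(rps) reports too many errors.

    run_step is a caller-supplied function that applies the given load
    and returns the observed error rate as a float in [0, 1].
    """
    last_ok = None
    for rps in escalation_steps(start_rps, factor, max_rps):
        if run_step(rps) > max_error_rate:
            return {"last_ok_rps": last_ok, "failed_at_rps": rps}
        last_ok = rps
    return {"last_ok_rps": last_ok, "failed_at_rps": None}


if __name__ == "__main__":
    # Stand-in for a real load run: this fake service degrades above 100 rps.
    def fake_run(rps):
        return 0.0 if rps <= 100 else 0.5

    print(find_breaking_point(fake_run))
```

The gap between `last_ok_rps` and `failed_at_rps` brackets the breaking point, and a finer ramp could then be run inside that interval.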
Accomplishments That We're Proud Of
- End-to-end simulation pipeline built in a single hackathon cycle
- Live product running today: blastshield-demo.duckdns.org
- Auto-patch generation producing deployable fixes, not diagnostic warnings
- VS Code integration making reliability a first-class part of the development workflow
- Risk scoring engine converting raw telemetry into a single actionable production risk number
What We Learned
Reliability isn't a feature you add after deployment. It has to be validated before code ships — and the tooling has never kept pace with how fast code gets written.
AI made that gap critical. A developer with Cursor or Copilot can generate in one hour what used to take a week. But the production environment doesn't care how fast the code was written. Concurrency races, retry storms, and latency cascades don't discriminate between human-written and AI-generated code. They just exploit whatever's there.
The Kiro incident wasn't a story about a rogue AI. It was a story about what happens when autonomous systems act on production without a reliability layer in between.
BlastShield is that layer.
What's Next for BlastShield
CI/CD Pipeline Integration
GitHub Actions and GitLab CI support, so every pull request gets an automatic reliability score before merge.
Language-Agnostic Simulation
Expanding runtime support across Python, Go, Rust, and Java stacks.
Team Reliability Dashboards
Risk scores tracked longitudinally across a codebase, with trend lines showing whether a team is building toward reliability or away from it.
BlastShield API
Letting AI coding tools (Kiro, Cursor, Copilot, Amazon Q) call BlastShield natively before suggesting a deployment. Close the loop at the source.
No AI-generated code ships to production without surviving BlastShield first.
Built With
- AI & reasoning: Amazon Nova as the central reasoning unit; Amazon Bedrock (Claude) for outage analysis and reliability report generation
- Cloud infrastructure (AWS): Lambda for simulation orchestration; EC2 for isolated Docker sandbox execution; S3 for artifact and report storage; API Gateway for secure API access
- Developer interface: VS Code extension API; React/Next.js for the demo interface
- Runtime & DevOps: Docker for sandbox isolation
- Backend: FastAPI for the simulation engine and API services
- ci/cd
- github
- javascript
- postgresql
- python
- redis
- typescript
- yaml