BlastShield

Runtime issues found by the simulation, with suggested fixes.
The dashboard showing risk scores and simulation graphs.
The postmortem report for the project in production, alongside a what-if simulation that aims to give insight into future failures.

Inspiration

In December 2025, Amazon's AI coding agent Kiro was tasked with fixing a minor bug in AWS Cost Explorer. Instead, it autonomously deleted and rebuilt the entire environment — triggering a 13-hour outage across the AWS China region. The incident made global headlines and ignited an industry-wide reckoning about AI agents operating in production without sufficient guardrails.

Amazon called it a permission misconfiguration. The industry called it a warning shot.

Both are right — and that's the problem. Whether the failure originated from the AI agent, the engineer, or the access controls, the outcome was identical: 13 hours of downtime because nobody simulated what would happen before the action executed.

As AI coding tools grow more autonomous — writing, modifying, and shipping code with minimal human review — the blast radius of a single bad decision scales with them. The tools that generate code have evolved faster than the tools that validate it.

BlastShield was built on one conviction: the gap between AI-generated code and production consequences must be closed before deployment, not after.

What It Does

BlastShield is a production failure simulator. Before you deploy, it runs your code through the kinds of conditions that cause real-world outages:

Concurrency drills — hundreds of simultaneous requests exposing race conditions and thread-safety violations
Latency injection — artificial delays triggering timeout cascades and retry storms
Chaos failures — randomized service terminations revealing unhandled failure paths
Endpoint load testing — sustained traffic spikes simulating peak production demand
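
As a rough illustration of what a concurrency drill looks for, the sketch below fires many simultaneous calls at a handler with an unsynchronized read-modify-write and counts the updates that get lost. The `Counter` class and `concurrency_drill` function are hypothetical names invented for this example, not BlastShield's actual engine.

```python
import asyncio


class Counter:
    """Toy shared state with an unsynchronized read-modify-write (hypothetical)."""

    def __init__(self) -> None:
        self.value = 0

    async def unsafe_increment(self) -> None:
        current = self.value        # read shared state
        await asyncio.sleep(0)      # yield to other requests, as real I/O would
        self.value = current + 1    # write back a possibly stale value


async def concurrency_drill(n_requests: int) -> int:
    """Run n_requests increments concurrently and return how many were lost."""
    counter = Counter()
    await asyncio.gather(*(counter.unsafe_increment() for _ in range(n_requests)))
    return n_requests - counter.value


lost_updates = asyncio.run(concurrency_drill(200))
print(f"lost updates: {lost_updates} of 200")
```

A single request loses nothing, which is why this class of bug tends to pass ordinary unit tests and only shows up under simultaneous load.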

It produces a structured reliability report powered by a weighted risk scoring model:

$$R = \sum_{i=1}^{n} w_i \cdot f_i$$

Where $f_i$ represents each failure signal (concurrency fault rate, timeout depth, error cascade spread) and $w_i$ is the weight assigned to its production impact. Output includes an outage timeline, blast radius analysis, and auto-generated patches, all before a single user is affected.
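
The weighted sum can be sketched in a few lines of Python. The signal names mirror the ones listed above, but the numeric values and weights are illustrative assumptions, not BlastShield's real coefficients.

```python
from typing import Mapping


def risk_score(signals: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Compute R = sum(w_i * f_i) over the named failure signals."""
    missing = set(signals) - set(weights)
    if missing:
        raise KeyError(f"no weight for signals: {sorted(missing)}")
    return sum(weights[name] * value for name, value in signals.items())


# Illustrative values only (assumptions, not measured data).
signals = {
    "concurrency_fault_rate": 0.5,
    "timeout_depth": 0.25,
    "error_cascade_spread": 0.0,
}
weights = {
    "concurrency_fault_rate": 4.0,
    "timeout_depth": 2.0,
    "error_cascade_spread": 8.0,
}

score = risk_score(signals, weights)
print(score)  # 0.5*4.0 + 0.25*2.0 + 0.0*8.0 = 2.5
```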

Live product: blastshield-demo.duckdns.org

How We Built It

Simulation Engine

AWS Lambda orchestrates isolated simulation workloads
EC2 Docker containers run deterministic chaos and concurrency scenarios in ephemeral sandboxed environments
Custom telemetry pipeline captures failure signals across all simulation dimensions

AI Analysis Layer

Amazon Bedrock + Groq AI analyze raw simulation signals and generate structured reliability reports
Auto-patch engine produces deployable fixes, not just warnings

Developer Interface

VS Code extension surfaces everything inside the developer's existing workflow — zero context switching
Real-time simulation feedback with risk score dashboard

The architecture is fully cloud-native and stateless. Every simulation is isolated, deterministic, and reproducible — because a reliability tool that produces flaky results is worthless.

Challenges We Ran Into

Engineering deterministic chaos

True chaos is random, but reproducibility is non-negotiable: a developer needs to verify that their fix actually resolved the failure. Building chaos scenarios that are both realistic and deterministic required significant architectural design.
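
One common way to reconcile randomness with reproducibility is to drive fault injection from a seeded pseudo-random generator, so the same seed replays the same failures. The sketch below assumes that approach; `fault_schedule`, its parameters, and the service names are hypothetical and not taken from BlastShield's code.

```python
import random
from typing import List, Sequence, Tuple


def fault_schedule(
    seed: int,
    services: Sequence[str],
    steps: int,
    kill_probability: float = 0.2,
) -> List[Tuple[int, str]]:
    """Return (step, service) terminations that are fully determined by the seed."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    schedule = []
    for step in range(steps):
        for service in services:
            if rng.random() < kill_probability:
                schedule.append((step, service))
    return schedule


services = ["auth", "billing", "search"]  # hypothetical service names
first_run = fault_schedule(seed=42, services=services, steps=10)
replay = fault_schedule(seed=42, services=services, steps=10)
print(first_run == replay)  # the same seed yields the same schedule
```

Recording the seed alongside each report would let a developer rerun the identical scenario after applying a fix.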

Sandboxing untrusted code safely

Running arbitrary developer code in our infrastructure required airtight isolation. Every simulation runs in an ephemeral Docker container with no external network access and strict resource limits. One misconfiguration here and we become the outage.
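
A minimal sketch of what such a container launch might look like, expressed as a Python helper that assembles a `docker run` command line. The flags are standard Docker CLI options; the helper name, the default limits, and the image name are assumptions for illustration, and the command is only built here, not executed.

```python
from typing import List


def sandbox_command(
    image: str,
    memory: str = "256m",
    cpus: str = "0.5",
    pids_limit: int = 64,
) -> List[str]:
    """Build a docker run argv for an ephemeral, network-less, resource-capped sandbox."""
    return [
        "docker", "run",
        "--rm",                                 # ephemeral: remove the container on exit
        "--network", "none",                    # no external network access
        "--memory", memory,                     # hard memory cap
        "--cpus", cpus,                         # CPU share cap
        "--pids-limit", str(pids_limit),        # bound the number of processes
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        image,
    ]


# "blastshield-sandbox:latest" is a hypothetical image name.
command = sandbox_command("blastshield-sandbox:latest")
print(" ".join(command))
```

Passing the command as an argument list (for example to `subprocess.run`) rather than as a shell string avoids shell injection through user-controlled values.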

Signal extraction from simulation noise

Raw telemetry generates enormous data volumes. The hardest product problem was training the AI layer to surface actionable signals rather than drown developers in metrics they can't act on.

Simulating production without production access

We can't access a developer's real infrastructure. Replicating production load characteristics from submitted code alone required creative inference and progressive stress escalation models.
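
Progressive stress escalation can be pictured as ramping load in steps until the observed error rate exceeds a budget. The sketch below assumes a simple geometric ramp; `escalate`, `fake_probe`, and all thresholds are hypothetical stand-ins, not the models BlastShield actually uses.

```python
from typing import Callable, List, Optional, Tuple


def escalate(
    probe: Callable[[int], float],
    start: int = 10,
    factor: int = 2,
    max_load: int = 10_000,
    error_budget: float = 0.01,
) -> Tuple[Optional[int], List[Tuple[int, float]]]:
    """Multiply load by `factor` each round until the error rate exceeds the budget.

    Returns the first load level that broke the budget (or None if none did)
    together with the full (load, error_rate) history.
    """
    history = []
    load = start
    while load <= max_load:
        error_rate = probe(load)
        history.append((load, error_rate))
        if error_rate > error_budget:
            return load, history
        load *= factor
    return None, history


def fake_probe(load: int) -> float:
    """Stand-in for a real simulation run: errors appear above 500 concurrent requests."""
    return 0.0 if load <= 500 else 0.2


breaking_point, history = escalate(fake_probe)
print(breaking_point)  # first load level in the ramp above 500
```

With the defaults above the ramp visits 10, 20, 40, 80, 160, 320, and 640, so the fake service first breaks its budget at 640.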

Accomplishments That We're Proud Of

End-to-end simulation pipeline built in a single hackathon cycle
Live product running today: blastshield-demo.duckdns.org
Auto-patch generation producing deployable fixes, not diagnostic warnings
VS Code integration making reliability a first-class part of the development workflow
Risk scoring engine converting raw telemetry into a single actionable production risk number

What We Learned

Reliability isn't a feature you add after deployment. It has to be validated before code ships — and the tooling has never kept pace with how fast code gets written.

AI made that gap critical. A developer with Cursor or Copilot can generate in one hour what used to take a week. But the production environment doesn't care how fast the code was written. Concurrency races, retry storms, and latency cascades don't discriminate between human-written and AI-generated code. They just exploit whatever's there.

The Kiro incident wasn't a story about a rogue AI. It was a story about what happens when autonomous systems act on production without a reliability layer in between.

BlastShield is that layer.

What's Next for BlastShield

CI/CD Pipeline Integration

GitHub Actions and GitLab CI support: every pull request gets an automatic reliability score before merge.

Language-Agnostic Simulation

Expanding runtime support across Python, Go, Rust, and Java stacks.

Team Reliability Dashboards

Risk scores tracked longitudinally across a codebase, with trend lines showing whether a team is building toward reliability or away from it.

BlastShield API

Letting AI coding tools (Kiro, Cursor, Copilot, Amazon Q) call BlastShield natively before suggesting a deployment. Close the loop at the source.

No AI-generated code ships to production without surviving BlastShield first.

Built With

Backend: FastAPI for the simulation engine and API services
AI & reasoning: Amazon Nova as the central reasoning unit; Amazon Bedrock (Claude) for outage analysis and reliability report generation
Cloud infrastructure (AWS): Lambda for simulation orchestration; EC2 for isolated Docker sandbox execution; S3 for artifact and report storage; API Gateway for secure API access
Developer interface: VS Code extension API; React/Next.js for the demo interface
Runtime & DevOps: Docker for sandbox isolation
Python, JavaScript, TypeScript, YAML
PostgreSQL, Redis, GitHub, CI/CD

Updates

Deepesh Jha started this project — Mar 16, 2026 02:48 AM EDT
