Inspiration
The inspiration came from a real incident: an unexpected outage caused by a failed Availability Zone. That single failure raised a question: “What if we could simulate these events before they bring systems down?” That’s when I came across AWS Fault Injection Simulator (FIS). It promised a safe, controlled way to inject faults into running systems and verify that they recover, so I had to try it out.
What it does
The project demonstrates how to design and run fault injection experiments using AWS FIS. It simulates failures like:
EC2 instance termination
Network latency and packet loss
CPU stress
These simulations are monitored in real time using CloudWatch. Recovery is automated: Auto Scaling Groups replace terminated instances and Elastic Load Balancers route traffic away from unhealthy ones, which keeps the application available while faults are active.
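As a rough illustration of how the failures listed above map onto a single FIS experiment template, here is a minimal Python sketch that only builds the request payload and makes no AWS calls. The action IDs (aws:ec2:terminate-instances, aws:ssm:send-command) and the AWS-owned SSM documents (AWSFIS-Run-CPU-Stress, AWSFIS-Run-Network-Latency, AWSFIS-Run-Network-Packet-Loss) are real FIS names. Everything else is an assumption made for the example: the region, the account ID, the role and alarm ARNs, the ChaosReady tag, the durations, and the choice to run the faults one after another.

```python
import json

# Hypothetical placeholders: replace with values from your own account.
REGION = "us-east-1"
ROLE_ARN = "arn:aws:iam::123456789012:role/fis-experiment-role"
ALARM_ARN = "arn:aws:cloudwatch:us-east-1:123456789012:alarm:chaos-demo-5xx"


def ssm_action(document, params, start_after=None, duration="PT2M"):
    """Build an aws:ssm:send-command action that runs an AWS-owned FIS document."""
    action = {
        "actionId": "aws:ssm:send-command",
        "parameters": {
            "documentArn": f"arn:aws:ssm:{REGION}::document/{document}",
            # FIS expects the document parameters as a JSON-encoded string.
            "documentParameters": json.dumps(params),
            "duration": duration,
        },
        "targets": {"Instances": "tagged-instances"},
    }
    if start_after:
        action["startAfter"] = start_after
    return action


def build_template():
    """Return a payload shaped like the FIS CreateExperimentTemplate request."""
    return {
        "description": "CPU stress, network faults, then one instance termination",
        "roleArn": ROLE_ARN,
        "targets": {
            # All instances carrying the (assumed) opt-in tag.
            "tagged-instances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"ChaosReady": "true"},
                "selectionMode": "ALL",
            },
            # A single randomly selected tagged instance for the destructive step.
            "one-instance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"ChaosReady": "true"},
                "selectionMode": "COUNT(1)",
            },
        },
        "actions": {
            "cpu-stress": ssm_action(
                "AWSFIS-Run-CPU-Stress", {"DurationSeconds": "120"}
            ),
            "network-latency": ssm_action(
                "AWSFIS-Run-Network-Latency",
                {"DelayMilliseconds": "200", "DurationSeconds": "120"},
                start_after=["cpu-stress"],
            ),
            "packet-loss": ssm_action(
                "AWSFIS-Run-Network-Packet-Loss",
                {"LossPercent": "10", "DurationSeconds": "120"},
                start_after=["network-latency"],
            ),
            "terminate-instance": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "one-instance"},
                "startAfter": ["packet-loss"],
            },
        },
        # The experiment halts automatically if this alarm goes into ALARM state.
        "stopConditions": [{"source": "aws:cloudwatch:alarm", "value": ALARM_ARN}],
        "tags": {"Project": "chaos-demo"},
    }
```

With boto3 installed and credentials configured, a payload like this could be passed as keyword arguments to the FIS client's create_experiment_template call. Sequencing the actions with startAfter is a design choice for this sketch so that each fault's effect can be read separately in CloudWatch.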
How we built it
Environment Setup: Launched EC2 instances with proper tagging and set up CloudWatch for monitoring.
Experiment Creation: Created and ran fault injection experiments using AWS Console, AWS CLI, and CloudFormation.
Observability: Configured CloudWatch Alarms and SNS for alerts.
Automation: Used IAM roles, SSM documents, and rollback actions for safe, repeatable chaos experiments.
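The observability step above can be sketched as a CloudWatch alarm that both notifies through SNS and serves as the experiment's stop condition. This is an illustrative sketch, not the project's actual configuration: it assumes an Application Load Balancer in front of the instances, and the alarm name, threshold, load balancer dimension, and ARNs are hypothetical. The keys match the parameters of CloudWatch's PutMetricAlarm API, and the code only builds the payloads.

```python
def alarm_params(load_balancer_dimension, sns_topic_arn, threshold=10.0):
    """Build keyword arguments shaped like CloudWatch's PutMetricAlarm request.

    The alarm fires when the targets behind the load balancer return more
    than `threshold` HTTP 5xx responses within one minute.
    """
    return {
        "AlarmName": "chaos-demo-5xx",  # hypothetical name
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_Target_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # No traffic means no 5xx data points; treat that as healthy.
        "TreatMissingData": "notBreaching",
        # SNS topic that receives the alert when the alarm fires.
        "AlarmActions": [sns_topic_arn],
    }


def stop_condition(alarm_arn):
    """Build the FIS stop condition that halts an experiment when the alarm fires."""
    return {"source": "aws:cloudwatch:alarm", "value": alarm_arn}


# Hypothetical identifiers, for illustration only.
params = alarm_params(
    "app/chaos-demo-alb/0123456789abcdef",
    "arn:aws:sns:us-east-1:123456789012:chaos-alerts",
)
condition = stop_condition(
    "arn:aws:cloudwatch:us-east-1:123456789012:alarm:chaos-demo-5xx"
)
```

With boto3, params could be passed as keyword arguments to the CloudWatch client's put_metric_alarm call, and condition would go into the experiment template's stopConditions list. Reusing one alarm for alerting and for stopping the experiment keeps the definition of "unhealthy" in a single place.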
Challenges we ran into
IAM Permissions: Setting up least-privilege execution roles was tricky.
Tagging Consistency: Required a well-thought-out tagging strategy to target resources effectively.
Fear of Failure: Initially hesitant to run destructive tests, even in test environments.
Cost Awareness: Needed to monitor AWS usage to keep experiment costs under control.
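To make the IAM and tagging challenges above more concrete, here is a hedged sketch of what a least-privilege FIS execution role might look like, written as Python dictionaries that serialize to IAM policy JSON. The service principal fis.amazonaws.com and the listed IAM actions are real. The ChaosReady tag key, the exact statement split, and the resource patterns are assumptions for illustration and are not the policy this project used.

```python
import json

# Hypothetical opt-in tag: only instances carrying it may be terminated.
TAG_KEY = "ChaosReady"
TAG_VALUE = "true"

# Trust policy: lets the FIS service assume the execution role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "fis.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def permissions_policy(region="us-east-1"):
    """Build a permissions policy scoped to the actions the experiments need."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Destructive action, restricted to explicitly tagged instances.
                "Sid": "TerminateTaggedInstancesOnly",
                "Effect": "Allow",
                "Action": "ec2:TerminateInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{TAG_KEY}": TAG_VALUE}
                },
            },
            {
                # Only the AWS-owned AWSFIS-* documents may be sent to instances.
                "Sid": "RunFisSsmDocuments",
                "Effect": "Allow",
                "Action": "ssm:SendCommand",
                "Resource": [
                    "arn:aws:ec2:*:*:instance/*",
                    f"arn:aws:ssm:{region}::document/AWSFIS-*",
                ],
            },
            {
                # Read and cancel calls needed to resolve targets, track
                # commands, and evaluate the stop-condition alarm.
                "Sid": "ObserveAndCancel",
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ssm:ListCommands",
                    "ssm:CancelCommand",
                    "cloudwatch:DescribeAlarms",
                ],
                "Resource": "*",
            },
        ],
    }


policy_json = json.dumps(permissions_policy(), indent=2)
```

The tag condition ties the two challenges together: the same tag that selects experiment targets also bounds what the role is allowed to terminate, so a mistyped target cannot reach untagged instances.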
Accomplishments that we're proud of
Successfully ran complex FIS experiments without causing uncontrolled outages.
Automated fault injection using Infrastructure as Code (CloudFormation).
Built a monitoring and recovery setup using native AWS tools.
Gained confidence in testing real failure scenarios in a controlled, repeatable way.
What we learned
Chaos Engineering is about trust, not destruction — it's a way to build confidence in systems.
AWS FIS is tightly integrated with other AWS services and requires precise IAM setups.
Observability and rollback plans are non-negotiable when injecting faults.
Proactive testing helps uncover weak spots you didn’t even know existed.
What's next for Chaos Engineering in AWS with Fault Injection Simulator
Integrate with CI/CD pipelines to run FIS experiments post-deployment.
Explore multi-region chaos testing for high-availability apps.
Use custom SSM documents for more flexible and app-specific fault scenarios.
Build a dashboard to visualize resilience metrics over time.
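As a sketch of the first item above, a post-deployment pipeline step could start an experiment, poll until it reaches a terminal state, and fail the build unless it completed. The function below takes the FIS client as a parameter so it can be exercised without AWS access. The method names and response shape follow boto3's FIS client (start_experiment, get_experiment); the polling interval, timeout, and the pass/fail rule are assumptions for illustration.

```python
import time
import uuid

# Experiment states after which FIS does no further work.
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def run_experiment(fis, template_id, poll_seconds=15, timeout_seconds=1800,
                   sleep=time.sleep):
    """Start an FIS experiment and block until it reaches a terminal state.

    `fis` is expected to behave like boto3.client("fis"). Returns the final
    status string, or raises TimeoutError if the experiment is still running
    after roughly `timeout_seconds`.
    """
    started = fis.start_experiment(
        clientToken=str(uuid.uuid4()),  # idempotency token
        experimentTemplateId=template_id,
    )
    experiment_id = started["experiment"]["id"]

    waited = 0
    while waited <= timeout_seconds:
        response = fis.get_experiment(id=experiment_id)
        status = response["experiment"]["state"]["status"]
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"experiment {experiment_id} did not finish in time")


def gate(status):
    """Map the final status to a process exit code for the pipeline step."""
    # "stopped" usually means a stop condition fired, so it fails the gate too.
    return 0 if status == "completed" else 1
```

In a pipeline, the step might call sys.exit(gate(run_experiment(boto3.client("fis"), template_id))), where template_id is the ID of an existing experiment template. Treating a stopped experiment as a failure is deliberate in this sketch: a triggered stop condition means the system did not tolerate the fault.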