Inspiration

The IAM Security Debt Cycle I've Lived Through

I've watched this happen at every company I've worked at. A developer needs to deploy something urgently. They create an IAM policy with "Action": "*" just to get it working. "We'll tighten this up later," they promise. But later never comes.

Months pass. That "temporary" admin access is now running in production. Nobody remembers what permissions are actually needed. The security debt accumulates silently across hundreds of roles.

Then the wake-up call hits: a security audit reveals massive over-privileged access. Compliance teams flag IAM violations during SOX or PCI audits. AWS sends notifications about unused permissions. But by then, teams are scrambling to understand what permissions are actually needed, and applications break when you try to restrict them.

I built Aegis IAM to break this cycle. I wanted a tool that helps developers create secure policies from day one, validates existing policies against real security standards, and analyzes actual AWS usage to identify what permissions are truly needed vs. what was granted.

What it does

Aegis IAM is an autonomous AI agent powered by Claude 3.5 Sonnet v2 on Amazon Bedrock, with three core capabilities:

🤖 Generate Policy (Prevent Security Debt)

  • Describe what you need in plain English: "Lambda function that reads from DynamoDB table 'users' and writes to S3 bucket 'uploads'"
  • The agent asks clarifying questions to understand your exact requirements
  • Generates both permissions policy AND trust policy with least-privilege principles
  • Validates AWS values in real-time (regions, account IDs, ARNs)
  • Explains every permission and security decision in plain English
  • Provides refinement suggestions to further tighten security
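
For the example request above, the result might look roughly like the pair of policies below. This is an illustrative sketch, not captured Aegis IAM output: the account ID, the region, the statement IDs, and the exact set of read actions are all assumptions that the clarifying questions would normally settle.

```python
import json

# Hypothetical values; in a real session the agent would ask for these.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"

# Permissions policy: read-only on one table, write-only on one bucket.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadUsersTable",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": f"arn:aws:dynamodb:{REGION}:{ACCOUNT_ID}:table/users",
        },
        {
            "Sid": "WriteUploadsBucket",
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::uploads/*",
        },
    ],
}

# Trust policy: only the Lambda service may assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(permissions_policy, indent=2))
    print(json.dumps(trust_policy, indent=2))
```

Note that neither policy contains a bare "*" action, and each resource is scoped to a single table or bucket.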

🛡️ Validate Policy (Identify Existing Debt)

  • Upload any IAM policy JSON for comprehensive security analysis
  • Runs 50+ security checks including:
    • Wildcard action detection ("Action": "*")
    • Wildcard resource detection ("Resource": "*")
    • Missing condition statements (IP restrictions, MFA, VPC)
    • Privilege escalation risks
    • Service-specific security issues
  • Validates against 5 compliance frameworks: PCI DSS, HIPAA, SOX, GDPR, CIS Benchmark
  • Calculates risk score (0-100) with severity breakdown
  • Provides specific, actionable remediation steps for each finding
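
To make the wildcard checks concrete, here is a minimal sketch of how two of them could work. The function name, the finding fields, and the severity labels are hypothetical and are not taken from the actual validator; only the IAM policy shape ("Statement", "Effect", "Action", "Resource") is standard.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """IAM allows a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcards(policy: dict) -> list[dict]:
    """Flag Allow statements that use wildcard actions or resources."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue  # a wildcard in a Deny statement is not over-privilege
        for action in _as_list(stmt.get("Action")):
            if action == "*":
                findings.append({"statement": index, "check": "wildcard_action",
                                 "severity": "critical", "value": action})
            elif action.endswith(":*"):
                findings.append({"statement": index, "check": "service_wildcard_action",
                                 "severity": "high", "value": action})
        if "*" in _as_list(stmt.get("Resource")):
            findings.append({"statement": index, "check": "wildcard_resource",
                             "severity": "high", "value": "*"})
    return findings
```

Each finding records the statement index so that a remediation step can point at the exact statement to fix.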

🔍 Audit Account (Discover Unused Permissions)

  • Autonomously connects to your AWS account via three MCP servers:
    • AWS IAM MCP: Discovers all IAM roles and their policies
    • CloudTrail MCP: Analyzes 90 days of API usage logs
    • AWS API MCP: Checks for SCPs and permission boundaries
  • Identifies permissions that were granted but never used
  • Calculates account-wide risk score
  • Generates right-sized policy recommendations based on actual usage
  • Shows which roles have critical/high/medium/low security issues
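
The core of the unused-permission check is a comparison between what a role was granted and what it was observed doing. A minimal sketch, assuming both sides have already been reduced to IAM action strings (the function name is hypothetical):

```python
from fnmatch import fnmatchcase


def unused_permissions(granted: set[str], used: set[str]) -> set[str]:
    """Return granted action patterns that no observed action matched.

    IAM action names are case-insensitive and may contain '*' and '?'
    wildcards, so both sides are lowercased and compared as glob patterns.
    """
    used_lower = {action.lower() for action in used}
    return {
        pattern
        for pattern in granted
        if not any(fnmatchcase(action, pattern.lower()) for action in used_lower)
    }


granted = {"s3:GetObject", "s3:PutObject", "dynamodb:*", "ec2:RunInstances"}
used = {"s3:GetObject", "dynamodb:Query"}
candidates = unused_permissions(granted, used)
```

Here `candidates` contains `s3:PutObject` and `ec2:RunInstances`. These are candidates for removal rather than proof of non-use: CloudTrail does not record data events such as S3 object-level calls unless they are explicitly enabled on the trail.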

How I built it

Multi-Agent Architecture

  • Three specialized AI agents (Policy Generator, Validator, Auditor)
  • FastAPI backend orchestrates agent workflows
  • React 18 + TypeScript frontend with premium glassmorphism UI

Amazon Bedrock Integration

  • Claude 3.5 Sonnet v2 (anthropic.claude-3-5-sonnet-20241022-v2:0) as the reasoning engine
  • Custom prompt engineering with 200+ lines of behavioral rules
  • Conversational context management across multiple turns
  • Dynamic intent detection (when to ask questions vs. generate policies)

Multi-MCP Integration (The Hard Part)

  • Built custom JSON-RPC client (mcp_client.py) for stdio communication
  • Coordinates three MCP servers simultaneously:
    • awslabs-iam-mcp-server - Role discovery and policy retrieval
    • awslabs-cloudtrail-mcp-server - Event log analysis
    • @aws-mcp/server-aws-api - Account-level policy analysis
  • Handles async communication, timeout management, error recovery
  • Degrades gracefully when an MCP server is unavailable (falls back to sample data)
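
The timeout and fallback behavior described above can be sketched with plain asyncio. This is an assumption-laden illustration, not the project's orchestration code: the function names, the result shape, and the idea of passing each server call as a zero-argument coroutine function are all hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def call_with_fallback(
    name: str,
    call: Callable[[], Awaitable[Any]],
    fallback: Any,
    timeout_s: float = 10.0,
) -> dict:
    """Run one MCP call; on timeout or failure, return sample data instead."""
    try:
        data = await asyncio.wait_for(call(), timeout=timeout_s)
        return {"source": name, "degraded": False, "data": data}
    except (asyncio.TimeoutError, OSError, RuntimeError) as exc:
        return {"source": name, "degraded": True, "data": fallback, "error": repr(exc)}


async def gather_audit_inputs(
    calls: dict[str, Callable[[], Awaitable[Any]]],
    samples: dict[str, Any],
    timeout_s: float = 10.0,
) -> dict[str, dict]:
    """Query all servers concurrently; one slow server does not block the rest."""
    results = await asyncio.gather(
        *(call_with_fallback(name, call, samples.get(name), timeout_s)
          for name, call in calls.items())
    )
    return {result["source"]: result for result in results}
```

Keeping a `degraded` flag on each result lets the UI say which parts of an audit came from live data and which came from samples.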

Security Validation Engine

  • security_validator.py implements 50+ checks
  • Pattern matching for wildcards, privilege escalation, missing conditions
  • Compliance framework mapping (PCI DSS, HIPAA, SOX, GDPR, CIS)
  • Risk scoring algorithm with severity-based penalties
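
A severity-based penalty score can be as simple as the sketch below. The weights, the cap at 100, and the convention that a higher score means higher risk are assumptions made for illustration; the real scoring rules may differ.

```python
from collections import Counter

# Assumed penalty weights per finding; not the project's actual values.
SEVERITY_PENALTY = {"critical": 40, "high": 20, "medium": 10, "low": 5}


def risk_score(findings: list[dict]) -> int:
    """Sum the per-finding penalties and cap the result at 100."""
    total = sum(SEVERITY_PENALTY.get(f["severity"], 0) for f in findings)
    return min(100, total)


def severity_breakdown(findings: list[dict]) -> dict[str, int]:
    """Count findings per severity, always listing all four levels."""
    counts = Counter(f["severity"] for f in findings)
    return {level: counts.get(level, 0) for level in SEVERITY_PENALTY}
```

With these weights, one critical and one high finding give a score of 60, and three or more critical findings saturate at 100.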

Tech Stack

  • Backend: Python 3.13, FastAPI, Boto3, Pydantic, Uvicorn
  • Frontend: React 18, TypeScript, Vite, TailwindCSS, Lucide React
  • AI: Amazon Bedrock (Claude 3.5 Sonnet v2)
  • Integration: Model Context Protocol (MCP), JSON-RPC, AWS SDK

Challenges I ran into

1. Teaching Context Awareness: The agent kept returning JSON when users asked "why" questions. I had to implement sophisticated intent detection - understanding when users want explanations vs. policies vs. refinements. This required extensive prompt engineering and conversation history tracking.

2. Dynamic AWS Value Validation: Users enter invalid regions like "us-five-1" or 4-digit account IDs (AWS account IDs are always 12 digits). Instead of hardcoding valid values, I taught the agent to understand AWS naming patterns (e.g., regions follow [geographic-area]-[direction]-[number]) and validate dynamically.
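
In Aegis IAM this validation is done by the agent itself, but the same naming patterns can be written down as a deterministic shape check. The sketch below is an assumption for illustration: it only tests whether a value looks plausible, not whether the region or account actually exists, and the function names are hypothetical.

```python
import re

# [geographic-area]-[direction]-[number], with an optional "-gov" partition tag.
_REGION_SHAPE = re.compile(
    r"[a-z]{2}(?:-gov)?"
    r"-(?:north|south|east|west|central|northeast|northwest|southeast|southwest)"
    r"-\d{1,2}"
)
_ACCOUNT_ID_SHAPE = re.compile(r"\d{12}")


def is_plausible_region(value: str) -> bool:
    """True if the value has the shape of an AWS region name."""
    return _REGION_SHAPE.fullmatch(value) is not None


def is_valid_account_id(value: str) -> bool:
    """True if the value is exactly 12 digits."""
    return _ACCOUNT_ID_SHAPE.fullmatch(value) is not None
```

A check like this rejects "us-five-1" because "five" is not a direction, and rejects "1234" because it is not 12 digits long.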

3. MCP Protocol Communication: MCP servers use JSON-RPC over stdio, which required building a proper protocol client with request/response matching and timeout handling. On Windows, select() works only on sockets, not on pipes, so I had to implement platform-specific I/O handling.

4. CloudTrail Event Correlation: Matching CloudTrail event names to IAM policy actions isn't straightforward. PutObject in CloudTrail maps to s3:PutObject in IAM, but some services have complex mappings. I had to handle these transformations correctly.
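
A sketch of what such a mapping can look like is below. The default rule takes the service prefix from the CloudTrail eventSource and reuses the eventName; the override tables cover a few well-known exceptions and are illustrative, not exhaustive. The function and table names are hypothetical.

```python
import re

# eventSource hosts whose first label differs from the IAM service prefix.
SERVICE_PREFIX_OVERRIDES = {"monitoring": "cloudwatch", "email": "ses"}

# API calls that are authorized by a differently named IAM action.
ACTION_OVERRIDES = {
    ("s3", "ListObjects"): "ListBucket",
    ("s3", "HeadObject"): "GetObject",
    ("s3", "HeadBucket"): "ListBucket",
}

# Lambda event names carry an API version suffix, e.g. "CreateFunction20150331".
_LAMBDA_VERSION_SUFFIX = re.compile(r"\d{8}(?:v\d+)?$")


def cloudtrail_to_iam_action(event_source: str, event_name: str) -> str:
    """Translate a CloudTrail (eventSource, eventName) pair to an IAM action."""
    service = event_source.split(".", 1)[0]
    prefix = SERVICE_PREFIX_OVERRIDES.get(service, service)
    if prefix == "lambda":
        event_name = _LAMBDA_VERSION_SUFFIX.sub("", event_name)
    action = ACTION_OVERRIDES.get((prefix, event_name), event_name)
    return f"{prefix}:{action}"
```

Keeping the exceptions in plain lookup tables makes it easy to add a service as soon as an audit turns up a mismatch.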

5. Balancing Security vs. Usability: The agent needs to generate policies that are secure but not so restrictive they break applications. This required understanding real-world AWS usage patterns, not just theoretical best practices.

Accomplishments that I'm proud of

Multi-MCP Orchestration: Successfully coordinated three different MCP servers to work together autonomously. The Audit Agent makes intelligent decisions about which tools to call and how to combine results.

Conversational Policy Generation: The agent doesn't just generate policies - it has real conversations, asks clarifying questions, validates inputs, and explains security decisions in plain English.

Production-Ready Validation: 50+ security checks across 5 compliance frameworks with specific remediation steps. This is something security teams could actually use today.

Unused Permission Detection: CloudTrail integration that analyzes 90 days of API usage to identify granted-but-unused permissions - directly reducing attack surface and costs.

Premium UI/UX: Glassmorphism design with animated gradients, real-time security scoring, syntax-highlighted JSON, and smooth transitions that make security analysis actually enjoyable.

What I learned

AI Agents Require Behavioral Design: The hardest part wasn't the Bedrock integration - it was teaching the agent to be helpful without being annoying, to ask questions without interrogating users, and to explain security without sounding like a compliance manual.

MCP is Powerful but Young: The Model Context Protocol is brilliant for giving agents access to external tools, but the ecosystem is still maturing. Documentation is sparse, and debugging stdio communication required deep protocol understanding.

Security is About Trade-offs: Every IAM policy balances security and functionality. Teaching the agent to make good trade-offs required understanding real-world AWS usage patterns and developer workflows.

Prompt Engineering is Critical: I went through 50+ iterations of the system prompt. Small wording changes ("You should" vs. "You MUST") completely changed agent behavior. Getting this right was essential.

What's next for Aegis IAM

For the AWS Community:

  • Open-source the core engine: Make the policy generation and validation logic available for community contributions
  • Custom compliance frameworks: Let organizations define their own security rules beyond standard frameworks
  • CI/CD integration: GitHub Actions and GitLab CI plugins to validate policies before deployment
  • Policy templates library: Community-contributed templates for common AWS architectures

Technical Enhancements:

  • Auto-remediation workflows: One-click fixes for security findings with rollback capabilities
  • Policy drift detection: Continuous monitoring that alerts when policies deviate from secure baseline
  • Multi-account support: Analyze entire AWS Organizations with consolidated reporting
  • Historical trend analysis: Track how IAM security posture changes over time

Why It Matters: IAM misconfigurations are among the most commonly cited causes of cloud security breaches. By making it easier to create secure policies from day one and to identify unused permissions in existing infrastructure, Aegis IAM helps the AWS community break the security debt cycle before it starts.

Built With

  • amazon-bedrock-(claude-3.5-sonnet-v2)
  • async/await
  • aws-api-mcp-server
  • aws-cloudtrail-mcp-server
  • aws-iam-mcp-server
  • aws-sdk
  • boto3
  • fastapi
  • json-rpc
  • lucide-react
  • model-context-protocol-(mcp)
  • pydantic
  • python-3.13
  • react-18
  • restapi
  • strands
  • tailwindcss
  • typescript
  • uvicorn
  • vite