Inspiration

We had built an Iceberg maintenance and migration toolkit — a Spark-based tool that could migrate data from Snowflake and Hive, run table optimizations, and manage snapshot lifecycles. It worked beautifully on our local Docker setup.

Then came the moment that changed everything:

"Can we run this in production?"

Our MVP had generated real interest. But production meant AWS EMR clusters, not Docker containers. It meant VPCs, security groups, IAM roles, auto-scaling policies, and a dozen other AWS services we'd need to orchestrate.
A typical timeline? 2–3 months of infrastructure work before we could even think about deploying our actual application.

We had a different idea: What if we just talked to Claude Code about what we needed?

What it does

The result was a complete Iceberg maintenance and migration platform running on AWS EMR, deployed entirely through AI-guided infrastructure generation.

It can:

  • Deploy Spark applications on EMR with auto-scaling and Iceberg integration
  • Run an external Hive Metastore and Trino on ECS Fargate behind an ALB
  • Store data and metadata in secure, versioned S3 buckets
  • Automate setup, deployment, and cleanup with real-time progress tracking
  • Reduce costs by 60–80% through auto-scaling and termination scripts

How we built it

We built this through conversation with Claude Code, without writing any Terraform by hand.
Claude architected the complete infrastructure:

  • Core: Multi-AZ VPC, EMR 7.0.0, ECS Hive Metastore, S3 encryption
  • Automation: PowerShell wrappers with phase tracking, resource imports, secure key handling
  • Result: 3,000+ lines across 10+ modules, 20+ AWS services, 15-25 min deploys
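
The phase tracking with retries mentioned above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's PowerShell wrappers: the function name `run_phases`, the phase names, and the retry parameters are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Tuple

# A phase is a (name, action) pair; both are hypothetical placeholders here.
Phase = Tuple[str, Callable[[], None]]


def run_phases(
    phases: List[Phase],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Dict[str, int]:
    """Run phases in order, retrying a failing phase with exponential backoff.

    Returns how many attempts each phase needed. Raises RuntimeError when a
    phase still fails after max_attempts, so later phases never run.
    """
    attempts_used: Dict[str, int] = {}
    total = len(phases)
    for index, (name, action) in enumerate(phases, start=1):
        for attempt in range(1, max_attempts + 1):
            # Progress line so the operator can see which phase is active.
            print(f"[{index}/{total}] {name} (attempt {attempt})")
            try:
                action()
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(
                        f"phase '{name}' failed after {attempt} attempts"
                    ) from exc
                # Wait longer after each failure: base, 2x base, 4x base, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
            else:
                attempts_used[name] = attempt
                break
    return attempts_used
```

Each action would wrap one deployment step (for example, a call out to the infrastructure tool for a single module), so a transient failure only repeats that step rather than the whole deploy.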

Challenges we ran into

  1. Bootstrap failures – Claude split init into phases with retry logic
  2. S3 import conflicts – Added auto-import for existing resources
  3. Hive instability – Fixed with ALB for stable DNS
  4. Slow cleanup – Parallelization cut time by 80%
  5. SSH key permissions – Secure ACL scripting for Windows
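
The parallel cleanup in item 4 follows a common pattern: teardown steps that do not depend on each other are submitted to a worker pool instead of running one by one. Below is a minimal Python sketch of that pattern, not the project's actual scripts. The name `cleanup_in_parallel` and the task names are hypothetical, and the sketch assumes the tasks passed in are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


def cleanup_in_parallel(
    tasks: Dict[str, Callable[[], None]],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run independent cleanup tasks concurrently and report each outcome.

    A failing task is recorded rather than raised, so one stuck resource
    does not stop the remaining deletions.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its task name for reporting.
        futures = {pool.submit(task): name for name, task in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                results[name] = "deleted"
            except Exception as exc:
                results[name] = f"failed: {exc}"
    return results
```

Resources with ordering constraints (a VPC cannot be removed before the subnets inside it) would still need to be grouped into sequential stages, with parallelism applied only within each stage.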

Accomplishments that we're proud of

  • Production-grade infra in 1 week (vs 2-3 months manually)
  • Deploy: 15-25 min | Cleanup: 5-10 min
  • 60–80% cost savings via auto-scaling and termination scripts
  • Auto-generated docs with inline rationale
  • Accessible to non-DevOps engineers

What we learnt

AI doesn't just code — it architects, optimizes, and teaches. Claude proactively suggested performance improvements and security best practices. Infrastructure-as-Conversation replaces doc-diving with dialogue-driven development.

What's next

  • Self-healing infrastructure with auto-remediation
  • AI-driven cost optimization
  • Automated compliance reporting

The future of infrastructure isn't just code — it's conversation.
