Inspiration

We wanted to take a simple concept (a URL shortener) and push it to production-grade quality. The MLH Production Engineering hackathon challenged us to think beyond "does it work?" and ask "does it survive?"

What We Built

A full-stack URL shortener service with:

  • 3 load-balanced app replicas behind Nginx
  • Redis caching with cache HIT/MISS headers for low-latency redirects (~2ms on cache hits)
  • PostgreSQL with 3 data models (Users, URLs, Events) and full CRUD APIs
  • Prometheus + Grafana monitoring with dashboards tracking traffic, errors, and cache performance
  • Docker Compose orchestrating 7 containers with health checks and auto-restart policies
  • 25 automated tests at 84% code coverage with GitHub Actions CI
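The cache HIT/MISS behavior listed above can be illustrated with a small cache-aside sketch. This is not the project's actual code: the function name `resolve_short_code`, the `X-Cache` header name, and the use of a plain dict in place of a Redis client are all assumptions made to keep the example self-contained.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_short_code(
    code: str,
    cache: Dict[str, str],
    db_lookup: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (target_url, headers) using a cache-aside lookup.

    `cache` stands in for Redis and `db_lookup` for the database query;
    both are hypothetical stand-ins, not the project's real interfaces.
    """
    cached = cache.get(code)
    if cached is not None:
        # Served from the cache: no database round trip needed.
        return cached, {"X-Cache": "HIT"}

    # Cache miss: fall back to the database.
    target = db_lookup(code)
    if target is not None:
        # Only cache codes that exist, so unknown codes do not fill the cache.
        cache[code] = target
    return target, {"X-Cache": "MISS"}


if __name__ == "__main__":
    fake_db = {"abc123": "https://example.com"}
    cache: Dict[str, str] = {}
    print(resolve_short_code("abc123", cache, fake_db.get))
    print(resolve_short_code("abc123", cache, fake_db.get))
```

The first call goes to the database and populates the cache; the second is answered from the cache. A real deployment would also set an expiry on cached entries so that edited or deleted URLs do not stay stale.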

What We Learned

  • Chaos engineering is humbling. Killing a container and watching it not come back taught us more about Docker restart policies than any tutorial.
  • Caching changes everything. Adding Redis dropped our redirect latency from ~30ms to ~2ms on cache hits.
  • The /urls endpoint was our bottleneck. Returning all 2000 rows crushed performance at 500 concurrent users — pagination would be the next optimization.
  • Production engineering is about the boring stuff. JSON error handling, structured logging, health checks, and runbooks aren't glamorous, but they're what separate a script from a service.
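The pagination idea mentioned for /urls could look roughly like the sketch below. It is a plain-Python illustration under assumed names (`paginate`, `per_page`, `max_per_page`), not the service's code; in a real app the slicing would be pushed into the database query (Peewee queries offer a `paginate(page, paginate_by)` method for this) rather than done in memory.

```python
from typing import Any, Dict, Sequence


def paginate(
    rows: Sequence[Any],
    page: int = 1,
    per_page: int = 50,
    max_per_page: int = 100,
) -> Dict[str, Any]:
    """Return one page of `rows` plus the metadata a client needs to continue."""
    # Clamp inputs so a client cannot request page 0 or an unbounded page size.
    page = max(page, 1)
    per_page = min(max(per_page, 1), max_per_page)

    total = len(rows)
    start = (page - 1) * per_page
    items = list(rows[start:start + per_page])
    total_pages = (total + per_page - 1) // per_page  # ceiling division

    return {
        "items": items,
        "page": page,
        "per_page": per_page,
        "total": total,
        "total_pages": total_pages,
    }


if __name__ == "__main__":
    all_rows = list(range(2000))  # stand-in for the 2000 seeded URL rows
    result = paginate(all_rows, page=2, per_page=50)
    print(result["items"][0], len(result["items"]), result["total_pages"])
```

With 2000 rows and 50 rows per page, each response carries 50 items instead of the full table, which is the point of the optimization.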

How We Built It

We followed an incremental approach across 3 phases:

  1. Phases 0-1: Core Flask app with Peewee ORM, seed data loading, pytest suite, GitHub Actions CI
  2. Phase 2: Dockerized everything, added Locust load testing, chaos mode testing, failure documentation
  3. Phase 3: Scaled to 3 replicas with Nginx load balancer, added Redis caching, Prometheus metrics, Grafana dashboards, comprehensive documentation

Challenges

  • macOS port 5000 conflict: AirPlay Receiver was intercepting our traffic
  • PostgreSQL sequence sync: After seeding 2000 rows with explicit IDs, the auto-increment sequence still tried to start at 1
  • Docker restart policy: docker kill behaves differently than an internal process crash on Docker Desktop for Mac
  • Balancing coverage with real-world testing: The try/except blocks for DB failures are hard to trigger in tests but critical in production
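One common way to resolve the sequence sync problem above is to move the sequence to the current maximum ID after seeding. The helper below only builds the SQL statement; the function name `sequence_sync_sql` and the table name `urls` are assumptions for illustration, while `setval` and `pg_get_serial_sequence` are standard PostgreSQL functions.

```python
def sequence_sync_sql(table: str, pk: str = "id") -> str:
    """Build a statement that moves a serial sequence past the seeded rows.

    After it runs, the next generated ID is MAX(pk) + 1. On an empty table
    the sequence is set to 1, so the next generated ID is 2.
    """
    # Identifiers are interpolated into SQL, so accept only simple names.
    if not table.isidentifier() or not pk.isidentifier():
        raise ValueError("table and pk must be simple identifiers")
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"COALESCE(MAX({pk}), 1)) FROM {table};"
    )


if __name__ == "__main__":
    # 'urls' is a hypothetical table name; run the output once after seeding.
    print(sequence_sync_sql("urls"))
```

The generated statement would be executed once per seeded table, for example at the end of the seed script.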
