## What it does

A production-grade URL shortener API with full lifecycle management:

  • Create short URLs with POST /shorten (custom or auto-generated codes)
  • Redirect via GET /<code> with a 302, logging every click
  • Update URLs with PUT /urls/<id> (title, destination, active status)
  • Deactivate with DELETE /urls/<id> (soft delete, preserving audit trail)
  • Monitor via GET /health (deep check: DB + Redis) and GET /metrics (system stats)

Every operation is validated, rate-limited, cached, logged, and auditable.
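
The lifecycle above (create, redirect with click logging, soft delete) can be sketched with an in-memory store. This is a minimal illustration, not the project's actual code: `UrlStore`, `ShortUrl`, `generate_code`, the 6-character code length, and the alphabet are all assumptions, and a plain dict stands in for PostgreSQL.

```python
import secrets
import string
import time
from dataclasses import dataclass, field
from urllib.parse import urlparse

ALPHABET = string.ascii_letters + string.digits  # assumed code alphabet


def generate_code(length=6):
    """Return a random short code (the default length is an assumption)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_valid_url(url):
    """Accept only absolute http(s) URLs that include a host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


@dataclass
class ShortUrl:
    id: int
    code: str
    destination: str
    is_active: bool = True
    clicks: list = field(default_factory=list)  # stand-in for a click log table


class UrlStore:
    """In-memory stand-in for the database (hypothetical helper)."""

    def __init__(self):
        self._by_code = {}
        self._next_id = 1

    def shorten(self, destination, custom_code=None):
        if not is_valid_url(destination):
            raise ValueError("destination must be an absolute http(s) URL")
        if custom_code is not None:
            if custom_code in self._by_code:
                raise ValueError("code already in use")
            code = custom_code
        else:
            code = generate_code()
            while code in self._by_code:  # retry on the rare collision
                code = generate_code()
        record = ShortUrl(id=self._next_id, code=code, destination=destination)
        self._next_id += 1
        self._by_code[code] = record
        return record

    def resolve(self, code):
        """Return the destination for an active code and log the click."""
        record = self._by_code.get(code)
        if record is None or not record.is_active:
            return None
        record.clicks.append(time.time())
        return record.destination

    def deactivate(self, url_id):
        """Soft delete: flip the flag and keep the row and its click history."""
        for record in self._by_code.values():
            if record.id == url_id:
                record.is_active = False
                return True
        return False
```

Because `deactivate` only flips `is_active`, the record and its click history stay available for auditing while `resolve` stops returning the destination.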

## How we built it

Application Layer:

  • Flask + Peewee ORM + PostgreSQL
  • Redis for caching redirect lookups (300s TTL, invalidated on mutation)
  • Flask-Limiter for rate limiting (200 req/min global, 30 req/min on writes)
  • Structured JSON logging with X-Request-ID for request tracing
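
The caching behaviour (300s TTL, invalidated on mutation) follows the cache-aside pattern. The sketch below illustrates it with a small in-process `TTLCache` standing in for Redis and a dict standing in for the database; the class, the function names, and the `url:<code>` key format are assumptions for illustration only.

```python
import time

REDIRECT_TTL = 300  # seconds, matching the TTL stated above


class TTLCache:
    """Tiny in-process stand-in for Redis get / set-with-expiry / delete."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def lookup_destination(code, cache, db):
    """Cache-aside read: try the cache, fall back to the DB, then populate."""
    key = f"url:{code}"  # assumed key format
    cached = cache.get(key)
    if cached is not None:
        return cached
    destination = db.get(code)
    if destination is not None:
        cache.set(key, destination, REDIRECT_TTL)
    return destination


def update_destination(code, new_destination, cache, db):
    """Write to the DB first, then invalidate so the next read repopulates."""
    db[code] = new_destination
    cache.delete(f"url:{code}")
```

Deleting the key on every mutation, rather than rewriting it, keeps the write path simple; the TTL then acts as a safety net in case an invalidation is ever missed.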

Infrastructure:

  • 3 Gunicorn instances, each with 4 workers × 2 threads (3 × 4 × 2 = 24 concurrent handlers)
  • Nginx load balancer with max_fails / fail_timeout health-aware routing
  • Docker Compose with health checks, resource limits, and restart: always
  • Non-root container user, security headers (X-Content-Type-Options, X-Frame-Options)
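
As a rough sketch of the per-instance Gunicorn settings, a `gunicorn.conf.py` along these lines would give 4 workers with 2 threads each. Only the worker and thread counts come from the list above; the file name, bind address, and the choice of the `gthread` worker class are assumptions.

```python
# gunicorn.conf.py (hypothetical): settings for one of the three instances
bind = "0.0.0.0:8000"     # assumed address and port
workers = 4               # worker processes per instance
threads = 2               # threads per worker
worker_class = "gthread"  # threaded worker type (assumed)

# Illustrative arithmetic only; this would not belong in a real config file.
INSTANCES = 3
total_handlers = INSTANCES * workers * threads  # 3 * 4 * 2 = 24
```

Each instance would then handle up to 8 requests at once, with Nginx spreading traffic across the three instances.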

Quality & Operations:

  • 59 pytest tests at 73% coverage, running on SQLite in-memory for speed
  • GitHub Actions CI that blocks any push dropping below 70% coverage
  • k6 load testing from 50 → 200 → 500 concurrent users
  • Connection pooling via PooledPostgresqlDatabase (20 max, 300s stale timeout)

Documentation:

  • README.md — setup, API reference, architecture diagram
  • RUNBOOK.md — start/stop/restart, troubleshooting, alert response
  • DECISIONS.md — 9 architectural decision records with rationale
  • FAILURE_MODES.md — 9 failure scenarios, capacity limits, known limitations
  • SLO.md — availability, latency, error rate targets with actuals
  • BOTTLENECK_REPORT.md — before/after performance analysis

## Challenges we faced

The 27% Problem. Our first load test at 500 concurrent users produced a 27% error rate. The app was still running on Flask's built-in development server, which is not designed for production traffic and could not keep up with that many concurrent connections. The cause wasn't obvious at first (we suspected the database), but profiling showed the bottleneck was the WSGI server layer itself. Switching to Gunicorn with horizontal scaling (3 instances behind Nginx) dropped the error rate to 0%.
