Inspiration

As developers, we have all shipped code that "works on my machine" only to watch it crash in production at 2 AM. This hackathon challenged us to think differently: don't just build features, build resilience. We wanted to prove that even a simple URL shortener can be made production-grade with the right engineering practices.

The idea of chaos engineering — breaking things on purpose to make them stronger — really excited us. Instead of waiting for failures to happen, we decided to cause them ourselves and build systems that heal automatically.

What it does

ShieldURL is a URL shortener API that refuses to die easily:

  • Shorten URLs via POST /shorten and redirect via GET /
  • Health monitoring at /health for load balancer integration
  • User management at /users with seeded CSV data from MLH
  • Graceful error handling — every error returns clean JSON, never a stack trace
  • Auto-healing containers — kill the app, and Docker brings it back in seconds
  • Automated test gating — broken code can never reach production
  • Idempotent database seeding — seed script runs safely on every container boot
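
For illustration, here is a minimal sketch of the kind of input check POST /shorten could run before storing anything. It uses only the standard library. The function name `validate_url`, the error messages, and the choice to accept only http and https (so FTP URLs are rejected) are assumptions for this example, not ShieldURL's actual rules.

```python
from urllib.parse import urlsplit

# Assumption: only web URLs are shortenable; other schemes such as ftp are rejected.
ALLOWED_SCHEMES = {"http", "https"}


def validate_url(raw):
    """Return (True, None) if raw looks shortenable, else (False, reason)."""
    if not isinstance(raw, str) or not raw.strip():
        return False, "url is required"
    try:
        parts = urlsplit(raw.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs (e.g. unbalanced IPv6 brackets)
        return False, "url is malformed"
    if parts.scheme not in ALLOWED_SCHEMES:
        return False, "url must use http or https"
    if not parts.hostname:
        return False, "url must include a host"
    return True, None
```

A handler could turn the `reason` string into a JSON error body with a 400 status, which keeps the "clean JSON, never a stack trace" behavior in one place.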

How we built it

We started with the MLH Flask + Peewee + PostgreSQL template and built up from there:

  1. Core Service — Built the URL shortener with input validation, short code generation with collision retry logic, and full CRUD endpoints. Added User and Event models matching the MLH seed CSV schema.

  2. Bronze Tier — Added pytest unit tests and a GitHub Actions CI pipeline that runs on every push.

  3. Silver Tier — Wrote integration tests using the Flask test client with an in-memory SQLite database, configured pytest-cov, and set CI to block merges if tests fail or coverage drops below 70%.

  4. Gold Tier — Implemented global JSON error handlers, containerized with Docker + docker-compose using restart: always, wrote an idempotent CSV seed script, and documented six failure modes with recovery strategies.
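
The collision retry logic from step 1 can be sketched roughly as follows. This is a simplified illustration, not the project's code: `generate_short_code`, the 6-character length, the retry limit, and the `exists` callable (a stand-in for a database lookup) are all assumptions.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def generate_short_code(exists, length=6, max_attempts=5):
    """Return a random code for which exists(code) is False.

    `exists` is a hypothetical callable standing in for a database lookup.
    """
    for _ in range(max_attempts):
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if not exists(code):
            return code
    # Give up after a bounded number of tries instead of looping forever.
    raise RuntimeError(f"no free short code after {max_attempts} attempts")


# Usage: an in-memory set plays the role of the urls table.
taken = {"abc123"}
new_code = generate_short_code(lambda code: code in taken)
```

A check-then-insert like this can still race under concurrent requests, so a unique constraint on the short code column remains the real guard; the retry loop just makes collisions invisible to the client in the common case.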

Tech stack: Python, Flask, Peewee ORM, PostgreSQL, SQLite, Docker, GitHub Actions, pytest, pytest-cov, Gunicorn

Challenges we ran into

  • Test isolation with Peewee DatabaseProxy — Getting in-memory SQLite to work with Peewee's proxy pattern was tricky. The in-memory database disappeared every time Flask's teardown hook closed the connection, because an in-memory SQLite database only lives as long as the connection that created it. We solved this by separating the app factory into production and testing modes.

  • Balancing coverage with meaningful tests — It is easy to write tests that hit lines but don't actually verify behavior. We focused on testing real scenarios: invalid URLs, missing fields, short code collisions, and edge cases like FTP URLs and empty strings.

  • Idempotent seeding — The seed script needed to load 3 CSV files (users, urls, events) respecting foreign key order, without duplicating data if run multiple times. We implemented row-count checks before each table insert.

  • Making CI fast — The GitHub Actions workflow needed a PostgreSQL service container, uv installation, and test execution. We optimized the pipeline to run in under 2 minutes.
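
The in-memory SQLite behavior behind the test isolation problem can be reproduced with the standard library alone. The sketch below uses `sqlite3` directly rather than Peewee, and the `urls` table is a made-up example; it only demonstrates the underlying behavior, not how ShieldURL's app factory is written.

```python
import sqlite3


def table_names(conn):
    """List the tables visible on this connection."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [name for (name,) in rows]


# First connection: create a table in a private in-memory database.
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, short_code TEXT)")
tables_before_close = table_names(first)
first.close()  # closing the connection discards the in-memory database

# Second connection: a brand-new, empty database with no tables.
second = sqlite3.connect(":memory:")
tables_after_reopen = table_names(second)
second.close()
```

If a per-request teardown closes the connection, the next request sees the equivalent of `second`: no tables and no rows. That is why a testing mode that keeps one connection open for the duration of a test avoids the problem.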

Accomplishments that we are proud of

  • 83% test coverage with 53 meaningful tests (not just line-hitters)
  • Zero stack traces exposed to clients — every error is clean JSON
  • Container auto-healing — docker kill followed by auto-restart in seconds
  • Idempotent seed script that safely loads MLH CSV data on every container boot
  • Green CI on first push — our pipeline caught nothing because we built quality in from the start
  • Comprehensive failure documentation — 6 failure modes with symptoms, responses, and recovery times
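
As a rough illustration of the idempotent seed script, the sketch below loads three CSV sources in foreign key order and skips any table that already has rows. It uses the standard library `sqlite3` and `csv` modules instead of Peewee and PostgreSQL, and the table columns, the `seed` function, and the sample data are hypothetical, not the MLH schema.

```python
import csv
import io
import sqlite3

# Load order matters: urls reference users, events reference urls.
# Table and column names here are hypothetical.
SEED_ORDER = [
    ("users", ["id", "username"]),
    ("urls", ["id", "user_id", "short_code"]),
    ("events", ["id", "url_id", "event_type"]),
]

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS urls (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    short_code TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    url_id INTEGER NOT NULL REFERENCES urls(id),
    event_type TEXT NOT NULL
);
"""


def seed(conn, csv_sources):
    """Load each CSV into its table unless the table already has rows.

    csv_sources maps table name -> open text file (any iterable of CSV lines).
    Returns the number of rows inserted per table.
    """
    conn.executescript(SCHEMA)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    inserted = {}
    for table, columns in SEED_ORDER:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if count > 0:
            inserted[table] = 0  # already seeded: skip without reading the CSV
            continue
        reader = csv.DictReader(csv_sources[table])
        rows = [tuple(record[col] for col in columns) for record in reader]
        placeholders = ", ".join("?" for _ in columns)
        with conn:  # one transaction per table: commit on success, roll back on error
            conn.executemany(
                f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
                rows,
            )
        inserted[table] = len(rows)
    return inserted


def sample_sources():
    """Tiny stand-ins for the three CSV files."""
    return {
        "users": io.StringIO("id,username\n1,alice\n"),
        "urls": io.StringIO("id,user_id,short_code\n1,1,abc123\n"),
        "events": io.StringIO("id,url_id,event_type\n1,1,click\n"),
    }


conn = sqlite3.connect(":memory:")
first_run = seed(conn, sample_sources())
second_run = seed(conn, sample_sources())
```

The row-count check is deliberately coarse: a table with any rows is treated as seeded. Wrapping each table's insert in a single transaction keeps that safe, since a failed load leaves the table empty and the next boot retries it.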

What we learned

  • Writing tests is not just about coverage numbers — it forces you to think about edge cases you would otherwise miss
  • The restart: always Docker policy is surprisingly powerful for basic resilience
  • Production engineering is a mindset, not a checklist. It changes how you write every line of code.
  • CI that blocks deploys is the single most impactful reliability practice — it is the first line of defense
  • Idempotent scripts are an SRE best practice that makes deployments fearless

What is next for ShieldURL

  • Add rate limiting to prevent abuse
  • Implement URL analytics (click tracking, geographic data)
  • Add database connection pooling with PgBouncer
  • Set up Prometheus metrics and Grafana dashboards for observability
  • Implement distributed tracing for debugging production issues
