ShieldURL — Chaos-Resilient URL Shortener

Screenshots: the /health, /users, /events, and /urls endpoints; all test cases pass with 83% coverage.

Inspiration

As developers, we have all shipped code that "works on my machine" only to watch it crash in production at 2 AM. This hackathon challenged us to think differently: don't just build features, build resilience. We wanted to prove that even a simple URL shortener can be made production-grade with the right engineering practices.

The idea of chaos engineering — breaking things on purpose to make them stronger — really excited us. Instead of waiting for failures to happen, we decided to cause them ourselves and build systems that heal automatically.

What it does

ShieldURL is a URL shortener API that refuses to die easily:

Shorten URLs via POST /shorten and redirect via GET /


Health monitoring at /health for load balancer integration
User management at /users with seeded CSV data from MLH
Graceful error handling — every error returns clean JSON, never a stack trace
Auto-healing containers — kill the app, and Docker brings it back in seconds
Automated test gating — broken code can never reach production
Idempotent database seeding — seed script runs safely on every container boot
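The health check and the "clean JSON, never a stack trace" behavior can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the response keys (status, error, message), the handler names, and the /boom route are assumptions made up for the example.

```python
from flask import Flask, jsonify
from werkzeug.exceptions import HTTPException

app = Flask(__name__)


@app.route("/health")
def health():
    # Assumed response body; a load balancer usually only needs the 200 status.
    return jsonify(status="ok"), 200


@app.route("/boom")  # hypothetical route, present only to trigger a failure
def boom():
    raise RuntimeError("simulated failure")


@app.errorhandler(HTTPException)
def handle_http_error(exc):
    # 404, 405, 400 and similar errors become JSON instead of Flask's HTML page.
    return jsonify(error=exc.name, message=exc.description), exc.code


@app.errorhandler(Exception)
def handle_unexpected_error(exc):
    # Log the details server-side, but send the client a generic message only.
    app.logger.error("Unhandled error", exc_info=exc)
    return jsonify(error="Internal Server Error",
                   message="An unexpected error occurred."), 500
```

With handlers like these, a request to an unknown path returns a JSON 404, and an unhandled exception returns a JSON 500 whose body does not include the exception text.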



How we built it

We started with the MLH Flask + Peewee + PostgreSQL template and built up from there:


Core Service — Built the URL shortener with input validation, short code generation with collision retry logic, and full CRUD endpoints. Added User and Event models matching the MLH seed CSV schema.
Bronze Tier — Added pytest unit tests and a GitHub Actions CI pipeline that runs on every push.
Silver Tier — Wrote integration tests using Flask test client with in-memory SQLite, configured pytest-cov, and set CI to block merges if tests fail or coverage drops below 70%.
Gold Tier — Implemented global JSON error handlers, containerized with Docker + docker-compose using restart: always, wrote an idempotent CSV seed script, and documented six failure modes with recovery strategies.
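As an illustration of the input validation and collision retry described in the Core Service step, here is a small standard-library sketch. The function names, the 6-character code length, the retry limit, and the rule that only http and https URLs are accepted are all assumptions for the example; the set named existing stands in for a database lookup.

```python
import secrets
import string
from urllib.parse import urlparse

ALPHABET = string.ascii_letters + string.digits
CODE_LENGTH = 6    # assumed length
MAX_ATTEMPTS = 5   # assumed retry limit


def is_valid_url(url):
    """Accept only absolute http(s) URLs that include a host."""
    if not isinstance(url, str) or not url.strip():
        return False
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def generate_short_code(length=CODE_LENGTH):
    """Return a random alphanumeric code."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_unique_code(existing, generate=generate_short_code,
                       max_attempts=MAX_ATTEMPTS):
    """Retry on collision; `existing` is a set standing in for the urls table."""
    for _ in range(max_attempts):
        code = generate()
        if code not in existing:
            existing.add(code)
            return code
    raise RuntimeError("could not generate a unique short code")
```

Passing the generator in as a parameter makes the collision path easy to test: a test can supply a function that returns a known duplicate first and a fresh code second.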


Tech stack: Python, Flask, Peewee ORM, PostgreSQL, SQLite, Docker, GitHub Actions, pytest, pytest-cov, Gunicorn

Challenges we ran into


Test isolation with Peewee DatabaseProxy: Getting in-memory SQLite to work with Peewee's proxy pattern was tricky. The in-memory database disappeared every time Flask's teardown hook closed the connection, because an in-memory SQLite database exists only as long as its connection stays open. We solved this by separating the app factory into production and testing modes.
Balancing coverage with meaningful tests — It is easy to write tests that hit lines but don't actually verify behavior. We focused on testing real scenarios: invalid URLs, missing fields, short code collisions, and edge cases like FTP URLs and empty strings.
Idempotent seeding — The seed script needed to load 3 CSV files (users, urls, events) respecting foreign key order, without duplicating data if run multiple times. We implemented row-count checks before each table insert.
Making CI fast — The GitHub Actions workflow needed a PostgreSQL service container, uv installation, and test execution. We optimized the pipeline to run in under 2 minutes.
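The row-count check behind the idempotent seeding can be sketched as below. This is a simplified stand-in, not the project's seed script: it uses the standard-library sqlite3 and csv modules instead of Peewee and PostgreSQL, and the table layouts, column names, and inline CSV text are invented for the example.

```python
import csv
import io
import sqlite3


def seed_table(conn, table, columns, csv_text):
    """Insert CSV rows only if the table is empty; return the rows inserted."""
    # Table and column names come from trusted constants, not user input.
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if count > 0:
        return 0  # already seeded, so skip to keep the script idempotent
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(row[col] for col in columns) for row in reader]
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), original_url TEXT)"
)

USERS_CSV = "id,username\n1,alice\n2,bob\n"
URLS_CSV = "id,user_id,original_url\n1,1,https://example.com\n"

# Parent table first, so foreign keys in urls always point at existing users.
SEED_PLAN = [
    ("users", ["id", "username"], USERS_CSV),
    ("urls", ["id", "user_id", "original_url"], URLS_CSV),
]


def seed_all(connection):
    return [seed_table(connection, *step) for step in SEED_PLAN]


first_run = seed_all(conn)   # inserts the rows
second_run = seed_all(conn)  # finds existing rows and inserts nothing
```

A row-count check is coarse: it skips a table that is only partially loaded. Wrapping each table's insert in a transaction, as above, is one way to avoid ending up in that state.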


Accomplishments that we are proud of


83% test coverage with 53 meaningful tests (not just line-hitters)
Zero stack traces exposed to clients — every error is clean JSON
Container auto-healing — docker kill followed by auto-restart in seconds
Idempotent seed script that safely loads MLH CSV data on every container boot
Green CI on first push — our pipeline caught nothing because we built quality in from the start
Comprehensive failure documentation — 6 failure modes with symptoms, responses, and recovery times


What we learned


Writing tests is not just about coverage numbers — it forces you to think about edge cases you would otherwise miss
The restart: always Docker policy is surprisingly powerful for basic resilience
Production engineering is a mindset, not a checklist. It changes how you write every line of code.
CI that blocks deploys is the single most impactful reliability practice — it is the first line of defense
Idempotent scripts are an SRE best practice that makes deployments fearless


What is next for ShieldURL


Add rate limiting to prevent abuse
Implement URL analytics (click tracking, geographic data)
Add database connection pooling with PgBouncer
Set up Prometheus metrics and Grafana dashboards for observability
Implement distributed tracing for debugging production issues

Built With

docker
flask
github-actions
gunicorn
peewee
postgresql
pytest
python

Submitted to

Production Engineering Hackathon

Created by

I led the back-end development and Site Reliability Engineering (SRE) to hit Gold Tier. I built the REST APIs in Flask, implemented global error handling, wrote automated tests to achieve 83% coverage, and containerized the app with Docker for auto-healing and idempotent database seeding.

Rushikesh Bobade
I handled the database architecture and data pipelines. I built the Peewee ORM models and engineered the idempotent seed.py script to safely parse and bulk-load thousands of rows from CSV files, ensuring zero data duplication on container restarts.

Afroz Khan
I owned the containerization and chaos engineering requirements. I authored the Dockerfile and docker-compose setups, configured auto-healing (restart: always) to recover easily from simulated downtime, and managed the environment configurations.

Shravan Navale
I focused on quality assurance and continuous integration. I wrote comprehensive unit and integration tests using pytest to guarantee robust validation logic. I also set up the GitHub Actions pipeline to automatically run tests and strictly block deployments if code coverage dropped below 70%.

Chaitrali Tikar