What we built

We built a URL shortener that was meant to behave like a production service, not just survive a demo. The final deployment runs on a single DigitalOcean droplet with seven containers: two Flask app replicas behind nginx, PostgreSQL for persistence, Redis on the redirect path, Prometheus scraping both app instances, Grafana for dashboards, and a lightweight Discord monitor for incident signals. The service is live at http://159.89.93.54.

What made this project interesting for us was that the challenge was never just "can we shorten a URL?" It was "can we make it observable, load-test it, break it on purpose, and still explain exactly how it recovers?" That was the part we leaned into.

Why this hackathon clicked for us

Most hackathons end when the demo works on your laptop. This one started there. The rubric pushed us toward production questions: deploy it, instrument it, break it deliberately, and prove it recovers cleanly. That framing made this feel like a real engineering problem rather than a weekend sprint to get something on a screen.

We came in wanting to understand production engineering by actually doing it. We had read the blog posts. This format forced us to build it instead, and that gap between reading and doing turned out to be larger than expected.

The seed data decision

One of the smartest things we did was inspect the seed CSVs before finalizing the schema. That data changed multiple implementation decisions. The seed files contained duplicate usernames, so the username column could not carry a unique constraint. They contained updated event types that did not appear clearly in the written prompt, so our event schema had to support them. And they contained inactive URLs, which turned out to matter for evaluator edge cases.

Designing only from the prompt would have left us with something that looked correct and still failed on real data. Reading the actual files first was probably the single highest-leverage hour of the weekend.
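A check like the one described above can be sketched in a few lines of Python. This is a minimal illustration, not the script we actually ran: the column name `username`, the helper names, and the inline sample data are all hypothetical stand-ins for the real seed files.

```python
import csv
import io
from collections import Counter


def summarize_seed(csv_text, column):
    """Count how often each value appears in one column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


def duplicates(counts):
    """Return the values that appear more than once, sorted for stable output."""
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical sample standing in for a seed file; the real columns may differ.
SAMPLE_USERS = """id,username
1,alice
2,bob
3,alice
"""

user_counts = summarize_seed(SAMPLE_USERS, "username")
print(duplicates(user_counts))
```

Running the same count over an event-type or status column is a quick way to surface values the written prompt never mentions, before any constraint is committed to the schema.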

The hardest bug

The bug that took longest to reason about was the health path. We wanted /health to stay useful even when PostgreSQL was completely unavailable, but Flask's global before_request hook was trying to connect to the database before the route handler even ran. A dependency failure could make the health endpoint fail for the wrong reason entirely.

Once we saw it clearly, the fix was simple: exempt health and metrics endpoints from the global DB connect path and let them manage their own bounded checks. Getting to that understanding took longer than expected because the symptom looked like a broken route when the real problem was one layer earlier in the request lifecycle.
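The shape of that fix can be sketched roughly as follows. This is an assumption-laden illustration rather than our actual code: `connect_db`, the `DB_EXEMPT_ENDPOINTS` set, the `/urls` route, and the error payload are hypothetical names, and the stand-in connection function always fails so the behavior under a database outage is visible.

```python
from flask import Flask, g, jsonify, request

app = Flask(__name__)

# Endpoint names (view function names) that must not depend on the global DB connect.
DB_EXEMPT_ENDPOINTS = {"health", "metrics"}


def connect_db():
    """Hypothetical stand-in for the real connect call; here the database is always down."""
    raise ConnectionError("database unavailable")


@app.before_request
def open_db_connection():
    # Skip the global connect for exempt endpoints so they can answer on their own.
    if request.endpoint in DB_EXEMPT_ENDPOINTS:
        return None
    try:
        g.db = connect_db()
    except ConnectionError:
        # Returning a value from before_request short-circuits the route handler.
        return jsonify(error="database_unavailable"), 503
    return None


@app.route("/health")
def health():
    # Process liveness only; a bounded dependency check could be added here.
    return jsonify(status="ok"), 200


@app.route("/metrics")
def metrics():
    # Placeholder metric line; a real exporter would render its registry here.
    return "app_up 1\n", 200, {"Content-Type": "text/plain"}


@app.route("/urls")
def list_urls():
    # Never reached in this sketch because the global connect fails first.
    return jsonify(urls=[])
```

With the database unreachable, `app.test_client().get("/health")` still answers 200 in this sketch, while `/urls` gets a JSON 503 from the hook instead of a traceback. Keying the exemption on `request.endpoint` rather than the URL path is one possible design choice; matching on `request.path` would work as well.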

The numbers

Reliability: 29 out of 29 automated evaluator tests passed, including the hidden cases. Our own test suite of 124 tests reaches 100% line coverage and runs on every push; GitHub Actions enforces a 70% minimum coverage gate so that a future regression is caught before it ships.

Scalability: All load tests targeted the live droplet through nginx with Redis caching active.

VUs     RPS     p95        Error rate
50      69      1100 ms    0.00%
200     79      3100 ms    0.00%
400     68      9800 ms    0.00%
500*    130     6300 ms    2.52%

* The 500 VU run was driven from an external machine against the live droplet IP; all other runs were driven from the droplet itself.

The first real bottleneck was not redirect traffic but the mixed evaluator-style flow: list endpoints returning full JSON payloads alongside concurrent writes. We documented that bottleneck, the tuning pass that improved it, and the next scaling step in docs/load-testing.md.

Incident response: 31 seconds from killing an app container to the Discord alert firing in the operator channel. The target was five minutes. We ran the kill-and-recovery sequence multiple times to confirm the timing was consistent. We also tested database loss under active load: /health stayed at 200, all DB-backed routes returned structured 503 responses with no tracebacks, and the service recovered automatically once the container restarted.
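The alerting side of a monitor like this can be sketched as a small state machine. Everything here is illustrative and assumed rather than taken from our monitor: the `HealthMonitor` class, the failure threshold of three, and the helper that builds the webhook request are hypothetical. The one external fact relied on is that a Discord webhook accepts a JSON body with a `content` field. The sketch builds the request but does not send it.

```python
import json
import urllib.request


class HealthMonitor:
    """Tracks probe results and decides when to alert or announce recovery."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.alerting = False

    def record(self, healthy):
        """Record one probe result; return "alert", "recovered", or None."""
        if healthy:
            self.consecutive_failures = 0
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None
        self.consecutive_failures += 1
        if not self.alerting and self.consecutive_failures >= self.failure_threshold:
            # Fire once per incident instead of on every failed probe.
            self.alerting = True
            return "alert"
        return None


def build_discord_request(webhook_url, message):
    """Build (but do not send) a POST carrying a Discord webhook JSON body."""
    body = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


monitor = HealthMonitor(failure_threshold=3)
events = [monitor.record(ok) for ok in (True, False, False, False, False, True)]
print(events)
```

In a design like this, time to alert is roughly the probe interval multiplied by the failure threshold, so those two settings are the knobs to tune against an alerting target.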

Documentation: The repo includes architecture notes, API reference, environment variable documentation, deployment and rollback steps, failure mode notes, a runbook, a load-testing report, and a judging pack that maps evidence directly to each scoring tier. We treated it like something we would hand to another engineer, not just a judge.

What we would do differently

Set up the droplet earlier. We stayed local longer than we should have, which delayed a few deployment-specific discoveries that were easy to fix but stressful to find late. We would also rehearse the chaos demo sooner and more often. The recovery behavior was solid in the end, but practicing it earlier would have made the final stretch calmer.

What we took away from this

The most valuable part of the weekend was realizing how different a service feels once you design around failure instead of only around success. A URL shortener is a simple application on paper. Once you start asking what happens when the cache disappears, when the database is down, when an event write fails, or when the app restarts under traffic, it becomes a much more interesting engineering problem.

That is what we built toward all weekend: not just a feature-complete service, but a system we could observe, test, break, recover, and explain with evidence.
