What Inspired Me
The MLH PE Hackathon 2026 was built around a compelling challenge: take a URL shortener template and push it to production-grade quality by completing a series of Production Engineering quests — covering reliability, scalability, incident response, and documentation. With five non-cash prizes on the line, I set my sights on the Scalability Quest and its top prize: a Raspberry Pi 5 Starter Kit. The goal wasn't just to build something that works — it was to build something that scales.
How I Built It
Step 1 — Building the Application
I started by building out the actual URL shortener, using the automated test suite as a guide for what needed to exist. This meant implementing full CRUD endpoints for:
- User management
- URL operations
- Event tracking
Once the core application was functional and tests were passing, I shifted focus entirely to the production engineering quests.
Step 2 — Bronze Tier (Baseline Load Test)
The first tier was straightforward: set up k6 and run a load test with 50 concurrent users, then document the results. There was no hard performance threshold to hit — just establish a baseline. I ran the test, captured the results, and moved on.
Step 3 — Silver Tier (Scale-Out)
This tier raised the stakes:
- Ramp up to 200 concurrent users
- Run 2 instances of the app using Docker Compose
- Put NGINX in front as a load balancer
- Keep p95 response time under 3 seconds
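
For readers unfamiliar with the metric, p95 is the response time that 95% of requests come in at or under. The sketch below uses the nearest-rank method on made-up sample durations; the `percentile` helper and the sample values are illustrative assumptions, and k6 may compute its percentiles slightly differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the sample that covers pct% of the data
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in seconds, not measurements from this project.
durations = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 2.2, 4.8]

p95 = percentile(durations, 95)
meets_silver_target = p95 < 3.0  # silver tier threshold: p95 under 3 seconds
```

In this made-up sample most requests are fast, but one slow outlier is enough to push p95 past the 3-second target, which is why the metric is stricter than an average.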
I set up the multi-container architecture and got everything running — but the 3-second target was elusive. Response times were hovering around 27 seconds locally, and I started questioning whether the target was even realistic for 200 concurrent users.
Then a hypothesis formed: what if the bottleneck is the machine itself, not the code?
I had free DigitalOcean credits, so I spun up a Droplet — 4 GB RAM, 2 vCPUs — and re-ran the exact same test. The results were staggering.
$$\text{Local p95} \approx 26.6s \xrightarrow{\text{DigitalOcean Droplet}} \text{VM p95} \approx 2.57s$$
That's roughly a 10× improvement (26.6s / 2.57s ≈ 10.4), and it confirmed the hypothesis. The local machine was the bottleneck all along. I also re-ran the 50-user bronze test on the VM:
$$\text{p95: } 2.49s \xrightarrow{\text{VM}} 0.85s$$
From this point on, all tests were run on the VM.
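The speedups implied by the p95 figures above can be checked with a few lines of arithmetic. This is only a sanity check on the reported numbers; the `speedup` helper is an illustrative name, not part of the project.

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the 'after' measurement is than the 'before' one."""
    return before_seconds / after_seconds


# p95 values reported above (local machine vs. DigitalOcean Droplet)
silver_speedup = speedup(26.6, 2.57)  # 200-user test
bronze_speedup = speedup(2.49, 0.85)  # 50-user test

print(f"200 users: {silver_speedup:.1f}x faster on the VM")
print(f"50 users:  {bronze_speedup:.1f}x faster on the VM")
```

The gap is much larger at 200 users than at 50, which fits the idea that the local machine was running out of resources as concurrency grew.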
For the silver tier verification, I captured:
- `docker ps` output showing 2 app containers + 1 NGINX container
- A load test screenshot confirming success with 200 users under 3s p95 ✅
Step 4 — Gold Tier (Speed of Light)
The gold tier was the real challenge:
- Handle 500+ concurrent users (or 100 req/s)
- Implement caching
- Identify what was slow and explain how it was fixed
- Keep error rate under 5%
First Run — Chaos
I started by testing the existing 2-instance setup at 500 concurrent users. The result was immediate and dramatic:
- NGINX was throwing errors: `512 worker_connections` wasn't enough
- Error rate: 92% 🚨
- p95: ~4.53s (surprisingly decent, but irrelevant given the error rate)
Fix #1 — NGINX Worker Connections
The fix was simple: increase worker_connections from 512 to 4096. One config change. Re-ran the test:
- Error rate: 0% ✅
- p95: 6.06s
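
A rough way to see why 512 was too low: when NGINX acts as a reverse proxy, each client request can occupy two connections in a worker (one to the client, one to the upstream app), and `worker_connections` counts both. The sketch below is a back-of-the-envelope estimate that assumes a single worker process (NGINX's default; the value actually used in this setup is not stated) and ignores keepalive and file-descriptor limits. `proxy_client_capacity` is a hypothetical helper name.

```python
def proxy_client_capacity(worker_processes, worker_connections, conns_per_client=2):
    """Approximate number of simultaneous proxied clients NGINX can serve.

    Assumes every client needs conns_per_client connections
    (client side + upstream side), which is a simplification.
    """
    return (worker_processes * worker_connections) // conns_per_client


CONCURRENT_USERS = 500  # gold tier load

# worker_processes=1 is an assumption, not a value taken from the project config.
capacity_before = proxy_client_capacity(worker_processes=1, worker_connections=512)
capacity_after = proxy_client_capacity(worker_processes=1, worker_connections=4096)

print(f"before: ~{capacity_before} clients, after: ~{capacity_after} clients")
print("enough before fix:", capacity_before >= CONCURRENT_USERS)
print("enough after fix: ", capacity_after >= CONCURRENT_USERS)
```

Under these assumptions the old limit covers the 200-user silver test but not 500 users, which matches what the load tests showed.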
Further Optimization — Gunicorn Workers & Horizontal Scaling
With the error rate under control, I focused on reducing p95:
| Configuration | p95 | Error Rate |
|---|---|---|
| 2 workers, 2 instances | 6.06s | 0% |
| 4 workers, 2 instances | 6.20s | 0% |
| 8 workers, 2 instances | 6.20s | 0% |
| 8 workers, 4 instances | 5.62s | 0% |
Raising the Gunicorn worker count from 2 to 4 to 8 did not lower p95 at all (it stayed within about 0.15s of the 2-worker result), which was initially puzzling, since more workers should mean more parallel request handling. A likely explanation is that other factors (like randomized sleep delays of 0.1–0.5s between k6 iterations) were introducing noise into the results. What did make a measurable difference was horizontal scaling: going from 2 to 4 app instances brought p95 down to 5.62s with a clean 0% error rate.
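To put the table in relative terms, the snippet below computes each configuration's p95 change against the 2-worker, 2-instance baseline. It uses only the numbers from the table; the `pct_change` helper is an illustrative name.

```python
BASELINE = "2 workers, 2 instances"

# p95 values in seconds, copied from the table above
p95_seconds = {
    "2 workers, 2 instances": 6.06,
    "4 workers, 2 instances": 6.20,
    "8 workers, 2 instances": 6.20,
    "8 workers, 4 instances": 5.62,
}


def pct_change(value, baseline):
    """Percentage change of value relative to baseline (negative means faster)."""
    return (value - baseline) / baseline * 100


changes = {
    name: round(pct_change(p95, p95_seconds[BASELINE]), 1)
    for name, p95 in p95_seconds.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

A shift of a couple of percent from adding workers is small enough to be run-to-run noise, while the drop from adding instances is several times larger.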
Gold Tier Verification
- Caching evidence: curl requests showing initial cache `MISS` followed by cache `HIT` ✅
- Load test screenshot: 500+ users, error rate < 5% ✅
- Bottleneck report: documented the NGINX worker_connections issue and resolution ✅
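
The `MISS` then `HIT` pattern in the caching evidence is what a cache-aside lookup produces. The sketch below is a standard-library stand-in for that behavior, not the project's actual code: the real app lists flask-caching and redis in its stack, while `TTLCache`, `resolve`, `fake_db`, and the sample URL here are all hypothetical names used for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry (stand-in for Redis)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def resolve(short_code, cache, load_from_db):
    """Cache-aside lookup: try the cache first, fall back to the database."""
    cached = cache.get(short_code)
    if cached is not None:
        return cached, "HIT"
    url = load_from_db(short_code)
    cache.set(short_code, url)
    return url, "MISS"


# Hypothetical data standing in for the PostgreSQL table.
fake_db = {"abc123": "https://example.com/landing"}
db_calls = []


def load_from_db(short_code):
    db_calls.append(short_code)  # record how often the "database" is hit
    return fake_db[short_code]


cache = TTLCache(ttl_seconds=60)
first = resolve("abc123", cache, load_from_db)
second = resolve("abc123", cache, load_from_db)
print(first[1], second[1], "db calls:", len(db_calls))
```

The second lookup never touches the database, which is the effect that matters under load: repeated redirects for popular short codes stop costing a query each.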
What I Learned
- The machine matters — a lot. One of the most valuable lessons from this project is that performance testing on underpowered hardware produces misleading results. What looks like a code problem might just be a resource problem.
- Cloud infrastructure is approachable. Setting up a DigitalOcean Droplet and SSH-ing into it from my local machine was new territory, but it turned out to be straightforward and incredibly useful.
- Incremental optimization isn't always linear. More Gunicorn workers didn't always mean better performance — real-world load test results can be noisy, and understanding why is just as important as the numbers themselves.
- NGINX config can make or break your throughput. A single `worker_connections` setting was the difference between a 92% error rate and 0%.
Challenges
The biggest challenge was the silver tier performance wall. I spent a significant amount of time wondering:
Is the 3-second target even achievable? Is there a bug in my code? Did they mean 30 seconds?
I tried various tweaks without meaningful improvement — until I made the call to test on a cloud VM. That decision turned everything around.
The other learning curve was DigitalOcean itself — it was my first time setting up a Droplet and managing a remote environment for performance testing. But it ended up being one of the most rewarding parts of the experience.
Built With
- digitalocean
- docker
- docker-compose
- faker
- flask
- flask-caching
- gunicorn
- k6
- nginx
- peewee
- postgresql
- psycopg2-binary
- python
- python-dotenv
- redis
- uv