Inspiration
We came into this hackathon wanting to go beyond just "building an app that works." Every team can stand up a REST API. We wanted to build one that stays up. The Production Engineering track caught our eye because it's the closest thing to real-world SRE work: you're not just writing code, you're thinking about what happens when things break at 2 AM. We asked ourselves: what if we treated a simple URL shortener like it was serving millions of users? That mindset shaped everything we built.
What it does
Farmers URL Shortener is a full-featured URL shortening API with production-grade infrastructure wrapped around it. At its core, it does what you'd expect: create short URLs, redirect users, track events, manage users with bulk CSV import. But the interesting part is everything around the API:
- Two Flask instances behind an Nginx load balancer with `least_conn` routing
- Redis cache-aside pattern on read-heavy endpoints (`GET /users` with a 30s TTL, `GET /urls` with a 15s TTL), with automatic invalidation on writes (see the sketch below)
- Prometheus scraping both app instances every 10 seconds, with three alert rules (ServiceDown, HighErrorRate, HighLatency)
- Grafana dashboard showing the four golden signals (Traffic, Errors, Latency, and Saturation) auto-provisioned from version-controlled JSON
- Structured JSON logging on every request, so we can actually grep production logs without crying
- Self-healing containers with Docker restart policies
- 70+ tests with 82% code coverage enforced in CI
The whole stack, 8 Docker containers, comes up with a single `docker-compose up -d --build`.
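The cache-aside piece is worth a closer look. Roughly, the read path works like this (a minimal sketch, not our exact code; the key name, helper, and connection settings are illustrative):

```python
import json

import redis

# Illustrative connection; in the real stack this points at the Redis container.
cache = redis.Redis(host="redis", port=6379, decode_responses=True)

def get_users_cached(fetch_from_db, ttl_seconds=30):
    """Cache-aside read: try Redis first, fall back to PostgreSQL on a miss."""
    key = "users:all"  # illustrative key; writes invalidate the users:* prefix
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database entirely
    users = fetch_from_db()  # cache miss: query PostgreSQL
    cache.setex(key, ttl_seconds, json.dumps(users))  # store with the 30s TTL
    return users
```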
How we built it
We followed a phased approach from our master plan (`todo.md`), which we wrote before touching any code. Sellapan led the planning effort and kept us honest with code reviews; every PR got a second pair of eyes before merging.
Phases 1-2 were the foundation: Peewee models, Flask blueprints, all the CRUD endpoints. Prajith and Srijan tag-teamed the implementation, writing the API routes and the test suite in parallel. We caught a ton of edge cases early (integer usernames, duplicate emails, malformed CSVs) because we were writing tests alongside the code, not after.
Phase 3 was reliability: pytest with coverage thresholds, GitHub Actions CI, and graceful JSON error handling everywhere (no more Flask HTML stack traces leaking to users).
Phase 4 was where it got fun. We added prometheus-flask-exporter for metrics, structured logging with python-json-logger, and a deep health check that actually pings both PostgreSQL and Redis. Then we built the Grafana dashboard and ran a "Sherlock Mode" demo: we injected a `time.sleep(0.5)` into `GET /users`, watched the latency spike on the dashboard, diagnosed it without looking at code, fixed it, and watched the dashboard recover. Ravisankar helped us figure out the Prometheus/Grafana provisioning setup and made sure our monitoring config was solid.
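The deep health check is conceptually simple: actually round-trip to each dependency instead of just returning 200. A minimal sketch, with illustrative connection settings:

```python
import redis
from flask import Flask, jsonify
from peewee import PostgresqlDatabase

app = Flask(__name__)
db = PostgresqlDatabase("shortener", host="postgres", user="app", password="app")
cache = redis.Redis(host="redis", port=6379)

@app.route("/health")
def health():
    """Deep health check: report whether PostgreSQL and Redis actually respond."""
    status = {"database": "ok", "cache": "ok"}
    try:
        db.execute_sql("SELECT 1;")  # real round-trip to PostgreSQL
    except Exception:
        status["database"] = "unreachable"
    try:
        cache.ping()  # real round-trip to Redis
    except Exception:
        status["cache"] = "unreachable"
    healthy = all(v == "ok" for v in status.values())
    return jsonify(status), 200 if healthy else 503
```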
Phase 5 was scalability: baseline load test with k6 at 50 VUs, then Nginx + 2 instances at 200 VUs, then Redis caching for the 500-VU tsunami test. Each step had before/after numbers documented in our capacity plan.
Challenges we ran into
The hidden test failures were brutal. We passed 27 out of 29 tests early on, but the last 2 took hours of debugging with no error messages to work from. One turned out to be a CSV bulk import issue: we were using `User.create()`, which crashed on duplicates, instead of `User.get_or_create()`, which handles them gracefully. We only found it by reading the test output character by character.
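The fix itself was small once we saw it (a simplified sketch; the schema, field names, and connection settings are illustrative):

```python
from peewee import CharField, Model, PostgresqlDatabase

db = PostgresqlDatabase("shortener", host="postgres", user="app", password="app")

class User(Model):
    username = CharField(unique=True)  # illustrative schema
    email = CharField()

    class Meta:
        database = db

def import_row(row: dict) -> User:
    # get_or_create returns (instance, created) instead of raising on a
    # duplicate, so one bad CSV row no longer aborts the whole bulk import.
    user, created = User.get_or_create(
        username=row["username"],
        defaults={"email": row["email"]},
    )
    return user
```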
Docker networking on Windows was another headache. PostgreSQL runs on port 5432 inside Docker but we had a local Postgres conflicting on the same port, so we mapped it to 5433 externally. Sounds simple, but it caused confusing "connection refused" errors for about 30 minutes before we figured out the port mapping.
Grafana auto-provisioning was tricky to get right. The dashboard JSON needs a specific datasource UID that matches the provisioned Prometheus datasource, and if they don't match, you get empty panels with no error message. Ravisankar researched the provisioning docs and helped us get the datasource UID wired correctly.
Cache invalidation: one of the two hardest problems in CS, right? We went with SCAN-based prefix invalidation (e.g., delete all keys matching `users:*` on any user write). Simple, but we had to make sure every single write endpoint called `invalidate_cache()`; missing one means stale data. Prajith caught a missing invalidation call on the `PUT /users` endpoint during code review.
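`invalidate_cache()` itself is just a SCAN plus DELETE over a key prefix. A sketch (connection settings illustrative):

```python
import redis

cache = redis.Redis(host="redis", port=6379)  # illustrative connection

def invalidate_cache(prefix: str) -> None:
    """Delete every cached key under a prefix, e.g. invalidate_cache("users:")."""
    # SCAN iterates incrementally instead of blocking Redis the way KEYS would.
    for key in cache.scan_iter(match=prefix + "*"):
        cache.delete(key)
```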
Accomplishments that we're proud of
- 70 tests, 82% coverage, zero flaky tests. Every test runs against a real PostgreSQL database, not mocks. The `autouse` fixture truncates tables between tests so they're fully isolated.
- The Sherlock Mode demo. Injecting a real bug, diagnosing it from the Grafana dashboard alone, fixing it, and watching the metrics recover: that felt like actual SRE work, not a hackathon exercise.
- 8-container orchestration in one command. `docker-compose up -d --build` gives you two load-balanced app servers, a database, a cache, a reverse proxy, metrics collection, dashboards, and alerting. All config is in version control.
- The capacity plan progression. We have real numbers: 50 VUs on a single instance → 200 VUs with Nginx → 500 VUs with Redis caching. Each step has a documented "what was the bottleneck and how did we fix it."
- Cache never crashes the app. Every Redis call is wrapped in `try`/`except`; if Redis goes down, we just hit PostgreSQL directly. We tested this by stopping the Redis container mid-traffic.
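That guardrail boils down to a wrapper like this (a sketch; the timeout value and helper name are illustrative):

```python
import redis

cache = redis.Redis(host="redis", port=6379, socket_timeout=0.1)

def cache_get(key: str):
    """Return the cached value, or None if Redis is down or slow.

    None just means "cache miss"; callers fall through to PostgreSQL.
    """
    try:
        return cache.get(key)
    except redis.RedisError:
        return None  # degrade to the database instead of failing the request
```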
What we learned
- Observability isn't optional; it's how you debug. Before Grafana, we were `print()`-debugging latency issues. After Grafana, we could see exactly which endpoint was slow, when it started, and whether it correlated with traffic spikes. It changed how we think about debugging.
- Load testing reveals problems you'd never find manually. Our API worked perfectly at 1 user. At 50 concurrent users, Flask's dev server fell over. At 200 users, PostgreSQL became the bottleneck. You can't reason your way to these findings; you have to measure.
- Cache-aside is the right first caching pattern. We considered write-through caching but it adds dual-write complexity. Cache-aside is simple: miss → query DB → cache result. On write → invalidate. The database is always the source of truth.
- Test isolation matters more than test count. We had a flaky test early on because one test was leaving data behind that another test depended on. The `autouse=True` cleanup fixture fixed it permanently. Shared mutable state is the enemy.
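That cleanup fixture is only a few lines of `conftest.py` (a sketch; the model import path and table names are illustrative):

```python
import pytest

from app.models import database  # illustrative import path

@pytest.fixture(autouse=True)
def clean_tables():
    """Runs around every test without being requested explicitly (autouse=True)."""
    yield  # run the test first, then clean up behind it
    # Truncate in one statement so foreign keys don't complain about ordering.
    database.execute_sql(
        "TRUNCATE TABLE users, urls, events RESTART IDENTITY CASCADE;"
    )
```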
What's next for Farmers URL Shortener
- Connection pooling with PgBouncer or Peewee's `PooledPostgresqlDatabase`: under 500 VUs, DB connection contention is our next bottleneck (see the sketch after this list)
- Rate limiting at the Nginx layer to protect against abusive clients
- Read replicas for PostgreSQL, separate read and write traffic to scale horizontally
- SLO definitions with error budgets, to formalize our reliability targets (99.9% availability, p95 latency < 500ms)
- The 2 remaining hidden tests: we're at 27/29, and those last two are haunting us
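For the pooling item at the top of this list, Peewee's playhouse extension makes it close to a drop-in change (a sketch; the pool numbers are guesses we'd tune under load, not measured values):

```python
from playhouse.pool import PooledPostgresqlDatabase

# Drop-in replacement for PostgresqlDatabase: reuses connections instead of
# opening one per request. max_connections and stale_timeout need load testing.
database = PooledPostgresqlDatabase(
    "shortener",
    host="postgres",
    user="app",
    password="app",      # illustrative credentials
    max_connections=16,  # hypothetical cap, to be tuned at 500 VUs
    stale_timeout=300,   # seconds before an idle connection is recycled
)
```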
A URL shortener API that creates short links, tracks analytics events, and handles bulk CSV imports. Deployed live at https://walrus-app-mkqo6.ondigitalocean.app

Key endpoints:

- `GET /health`: deep health check (DB + Redis status)
- `POST /users`, `GET /users`, `PUT /users/:id`
- `POST /users/bulk`: CSV import
- `POST /urls`, `GET /urls`, `PUT /urls/:id`
- `GET /<short_code>`: 302 redirect
- `GET /events`: analytics
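For a quick smoke test against the live deployment, a client call looks roughly like this (the `url` request field and `short_code` response field are our assumptions for illustration, not a documented contract):

```python
import requests

BASE = "https://walrus-app-mkqo6.ondigitalocean.app"

# Create a short link. The "url" and "short_code" field names are assumed.
resp = requests.post(f"{BASE}/urls", json={"url": "https://example.com/docs"})
short_code = resp.json().get("short_code")

# The redirect endpoint should answer with a 302 and a Location header.
redirect = requests.get(f"{BASE}/{short_code}", allow_redirects=False)
print(redirect.status_code, redirect.headers.get("Location"))
```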
Team
| Member | Role |
|---|---|
| Prajith | Coding, testing, and debugging |
| Srijan | Coding, testing, and debugging |
| Sellapan | Planning, code reviews, and quality checks |
| Ravisankar | Technology research, documentation, and video demo |