Falcon - Devpost Submission Draft (Polished)

Inspiration

We wanted to build more than a feature-complete URL shortener. Our goal was to build a service that behaves like a production system under real operational pressure: testable, observable, scalable, and recoverable when things fail.

What it does

Falcon is a URL shortener with production engineering guardrails built in.

  • Create, list, resolve, and deactivate short links
  • Expose health, live, and ready endpoints for liveness/readiness checks
  • Publish golden-signal metrics for traffic, latency, errors, and saturation
  • Support controlled incident simulation and alert testing
  • Scale horizontally behind Nginx with shared Redis caching
  • Include operational documentation: runbook, troubleshooting, deploy/rollback, and capacity planning

How we built it

  • Backend: Python, Flask, Peewee ORM, PostgreSQL
  • Scalability path: Redis cache + Nginx load balancing + multi-instance Docker Compose
  • Reliability path: pytest test suite, coverage gates, CI automation, deployment gating
  • Incident path: structured JSON logs, metrics endpoints, Prometheus scraping, Grafana dashboard, webhook alerts
  • Verification path: k6 load tests and reproducible evidence artifacts for each tier objective
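The shared-cache resolution path can be sketched as a cache-aside lookup. In this sketch, plain dicts stand in for Redis and PostgreSQL, and the `resolve` name and `CACHE_TTL` value are illustrative assumptions rather than Falcon's actual code.

```python
# Cache-aside lookup for resolving a short code, with in-memory dicts
# standing in for Redis (shared cache) and PostgreSQL (source of truth).
import time

CACHE_TTL = 60  # seconds a resolved URL stays cached (assumed value)

cache = {}  # stand-in for Redis: code -> (url, expires_at)
database = {"abc123": "https://example.com/long/path"}  # stand-in for Postgres

def resolve(code: str):
    # 1. Try the shared cache first to keep redirect latency low.
    hit = cache.get(code)
    if hit is not None and hit[1] > time.time():
        return hit[0]
    # 2. On a miss (or expiry), fall back to the database of record.
    url = database.get(code)
    if url is not None:
        # 3. Repopulate the cache so every app instance behind the
        #    load balancer benefits from this lookup.
        cache[code] = (url, time.time() + CACHE_TTL)
    return url
```

Because the cache is shared rather than per-instance, any app instance behind Nginx can serve a hit populated by another, so horizontal scaling does not require sticky sessions.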

Challenges we ran into

  • Tuning behavior under concurrent load while keeping latency low and error rate stable
  • Keeping alerting and observability configuration consistent across local and deployment environments
  • Avoiding merge conflicts in regenerated evidence artifacts during parallel work
  • Balancing speed with production-quality standards, including strong docs and operational clarity

Accomplishments that we're proud of

  • Delivered a production-style service, not just a demo prototype
  • Enforced reliability gates with automated tests and coverage checks
  • Reached scale milestones with multi-instance architecture and cache-backed consistency
  • Implemented incident workflows with logs, metrics, dashboard visibility, alerts, and runbooks
  • Treated documentation as code with concrete operator-facing guides

Highlights from latest evidence:

  • Automated checks: 29/29 backend tests passed in the submission portal
  • Test quality: 32 pytest tests passing, 78.56% coverage with a 70% gate
  • Scale evidence: 250-user scenario with p95 well under 2s
  • Gold scale evidence: 500-concurrent-user path achieved with error rate below 5%
  • Incident evidence: alert delivery path validated and under-5-minute objective met

What we learned

  • Production readiness is a systems problem, not a single feature
  • Observability added early saves significant debugging time later
  • Documentation quality directly affects incident response speed and team handoff quality
  • Reproducible evidence automation reduces ambiguity in performance and reliability claims
  • Small architectural decisions (health checks, cache strategy, load balancing) have large operational impact

What's next for Falcon

  • Add autoscaling policies and deeper performance tuning for sustained peak traffic
  • Improve write-path efficiency for visit-count updates under heavy load
  • Add authentication, rate limiting, and abuse protection
  • Add staged rollout and canary deployment workflows
  • Expand alert routing and on-call escalation policies
  • Add long-term metrics retention and trend-based capacity forecasting

Built With

python · flask · peewee · postgresql · redis · nginx · docker · prometheus · grafana · k6 · pytest
