

Inspiration

Modern apps do not fail in obvious ways. They fail under load, during traffic spikes, or when a single component crashes. We wanted to build a system that does not just work for one user, but continues working under stress and failure.

This project was inspired by real production systems where uptime, scalability, and observability are critical. We focused on answering one question: what happens when everything goes wrong?


How we built it

We built a distributed URL shortener designed for resilience and scale.
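At the core of any URL shortener is mapping a database row ID to a compact code. As an illustrative sketch (not our exact production code), a base62 encoding like the following keeps codes short and reversible:

```python
# Base62 alphabet: digits, lowercase, uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Encode a non-negative integer (e.g. a PostgreSQL row ID) as base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(code: str) -> int:
    """Reverse of encode: turn a short code back into the integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the code is derived from the row ID, a redirect is a single keyed lookup rather than a scan.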

  • Backend: Flask API for URL shortening and redirects
  • Database: PostgreSQL for persistent storage
  • Cache: Redis to reduce repeated database queries
  • Scaling: Multiple containerized app instances using Docker
  • Load balancing: Nginx distributes traffic across instances
  • Load testing: k6 simulates hundreds of concurrent users
  • Observability: Prometheus and Grafana for metrics and dashboards
  • Alerts: Automated alerts sent via Discord when failures occur

The architecture follows a horizontally scalable model where traffic is distributed across multiple services:

Client → Nginx → App Instances → Redis / PostgreSQL
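The Nginx layer above can be expressed as a small upstream block. This is an illustrative config fragment; the instance hostnames and port are placeholders for whatever Docker Compose assigns:

```nginx
# Illustrative fragment: app1/app2/app3:5000 are placeholder container names.
upstream app {
    server app1:5000;
    server app2:5000;
    server app3:5000;
}

server {
    listen 80;
    location / {
        proxy_pass http://app;
        proxy_set_header Host $host;
    }
}
```

With round-robin (the default), a crashed instance is retried on the next upstream, which is what lets the system keep serving traffic while a container restarts.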

What we learned

  • Horizontal scaling (adding instances) is often more effective than vertical scaling (adding power to a single server)
  • Caching significantly reduces latency and database load
  • Systems must be designed to fail gracefully, not perfectly
  • Observability is essential for understanding system behavior
  • Reliability comes from testing failure scenarios, not avoiding them

Challenges we ran into

  • Container crashes: ensuring automatic recovery without downtime
  • Load balancing: correctly routing traffic across multiple instances
  • Caching strategy: avoiding stale data while improving performance
  • High concurrency: handling hundreds of simultaneous requests without errors
  • Debugging under load: identifying bottlenecks between CPU, database, and network

One key bottleneck was repeated database reads. By introducing Redis caching, we reduced unnecessary queries and improved response time under heavy load.
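The fix above is the classic cache-aside pattern. Here is a minimal, self-contained sketch; `MemoryCache` is an in-memory stand-in so the example runs without a Redis server, but a `redis.Redis` client exposes the same `get`/`setex` calls:

```python
import time

class MemoryCache:
    """In-memory stand-in for Redis (get/setex only), for demonstration."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        value, expires = self._data.get(key, (None, 0.0))
        return value if time.monotonic() < expires else None
    def setex(self, key, ttl, value):
        self._data[key] = (value, time.monotonic() + ttl)

def resolve(code, cache, db_lookup, ttl=300):
    """Cache-aside read: try the cache first, fall back to the database."""
    url = cache.get(code)
    if url is None:
        url = db_lookup(code)            # hits PostgreSQL only on a miss
        if url is not None:
            cache.setex(code, ttl, url)  # TTL bounds how stale entries can get
    return url
```

The TTL is the knob for the staleness/performance trade-off mentioned under "Caching strategy": a short TTL keeps data fresher, a long one shields the database harder.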


Accomplishments that we're proud of

  • Successfully handled 500+ concurrent users
  • Maintained low latency with an error rate below 5% under load
  • Built a self-healing system that recovers from crashes automatically
  • Implemented real-time monitoring and alerting
  • Designed a system that reflects real-world production architecture

Built with

  • Python
  • Flask
  • PostgreSQL
  • Redis
  • Docker and Docker Compose
  • Nginx
  • k6
  • Prometheus
  • Grafana
