Flash Sale Reservation API

SRE dashboard

Inspiration

Flash sales are one of the hardest concurrency problems in production engineering. When thousands of users hit "Buy" at the same instant, naive systems oversell: two requests read the same remaining count, both decrement it, and one ticket is sold twice. We wanted to build a system that provably never oversells, and to show that it survives being killed mid-transaction.

What it does

A ticket reservation API that handles flash sale events under extreme concurrent load. Users create events with a fixed ticket count, and thousands of concurrent users race to reserve tickets. The system guarantees:

Exactly N tickets sell when demand exceeds supply: never more, never fewer
Duplicate reservations are blocked: one reservation per user per event
Crashes don't corrupt data: in-flight transactions roll back automatically
Containers self-heal: kill any service and it restarts automatically within seconds
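The first two guarantees are easiest to see in a small simulation. This is a minimal in-memory sketch, not the project's code: a `threading.Lock` stands in for the database row lock and a Python `set` stands in for the per-user uniqueness constraint. The names `InMemoryEvent` and `run_flash_sale` are hypothetical.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryEvent:
    """Hypothetical in-memory stand-in for an event row and its reservations."""

    def __init__(self, total_tickets: int) -> None:
        self.remaining = total_tickets
        self.reserved_by: set[str] = set()  # plays the role of UNIQUE(event, user)
        self._row_lock = threading.Lock()   # plays the role of SELECT ... FOR UPDATE

    def reserve(self, user_id: str) -> str:
        # Hold the lock across the whole check-then-decrement so no other
        # thread can act on a stale `remaining` value in between.
        with self._row_lock:
            if user_id in self.reserved_by:
                return "duplicate"
            if self.remaining <= 0:
                return "sold_out"
            self.remaining -= 1
            self.reserved_by.add(user_id)
            return "reserved"


def run_flash_sale(total_tickets: int, attempts: list[str]) -> dict[str, int]:
    """Fire every reservation attempt concurrently and tally the outcomes."""
    event = InMemoryEvent(total_tickets)
    with ThreadPoolExecutor(max_workers=64) as pool:
        outcomes = list(pool.map(event.reserve, attempts))
    return {k: outcomes.count(k) for k in ("reserved", "duplicate", "sold_out")}


if __name__ == "__main__":
    # 2,000 distinct users plus 100 repeat attempts race for 100 tickets.
    users = [f"user-{i}" for i in range(2000)]
    print(run_flash_sale(100, users + users[:100]))
```

Whatever the thread interleaving, the `reserved` tally equals the ticket count, because the duplicate check, the sold-out check, and the decrement all happen under one lock.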

A real-time React dashboard provides SRE-style observability: database health, CPU/RAM, Gunicorn worker count, live RPS charting, and an error log feed.

How we built it

Flask + Peewee ORM on PostgreSQL with SELECT ... FOR UPDATE pessimistic row-level locking inside db.atomic() transactions
Gunicorn with gthread workers and a custom autoscaler that scales 1–6 workers based on real-time RPS
Docker Compose orchestrating the API, PostgreSQL, and the React frontend, with restart: always and persistent volumes
GitHub Actions CI with a coverage gate that fails any push whose test coverage drops below 70%
38 pytest tests at 92% coverage including crash recovery, data consistency, and input boundary validation
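The locking approach listed above comes down to one ordering: lock the event, read the remaining count, decrement, insert the reservation, commit. In Peewee the row lock is requested with `.for_update()` on a select query inside `db.atomic()`. To keep this sketch runnable without a PostgreSQL server it uses Python's built-in `sqlite3` instead, which has no `SELECT ... FOR UPDATE`; as a swapped-in technique it issues `BEGIN IMMEDIATE`, which takes SQLite's database-wide write lock at the start of the transaction. That is coarser than a row lock but follows the same pessimistic lock-then-read-then-write order. The schema and function names are assumptions.

```python
import sqlite3

# Hypothetical schema: one row per event, one row per successful reservation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS event (
    id        INTEGER PRIMARY KEY,
    remaining INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS reservation (
    id       INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL REFERENCES event (id),
    user_id  TEXT    NOT NULL,
    UNIQUE (event_id, user_id)
);
"""


def connect(path: str) -> sqlite3.Connection:
    # isolation_level=None disables sqlite3's implicit transactions so the
    # BEGIN/COMMIT/ROLLBACK statements below are issued explicitly; timeout
    # makes a blocked writer wait up to 30 s for the lock instead of failing.
    conn = sqlite3.connect(path, isolation_level=None, timeout=30)
    conn.executescript(SCHEMA)
    return conn


def reserve(conn: sqlite3.Connection, event_id: int, user_id: str) -> bool:
    """Reserve one ticket. Returns False if the event is missing or sold out;
    raises sqlite3.IntegrityError if this user already holds a reservation."""
    # Take the write lock before reading (stand-in for SELECT ... FOR UPDATE):
    # no other connection using this pattern can start its own
    # check-and-decrement until we commit or roll back.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT remaining FROM event WHERE id = ?", (event_id,)
        ).fetchone()
        if row is None or row[0] <= 0:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "UPDATE event SET remaining = remaining - 1 WHERE id = ?", (event_id,)
        )
        conn.execute(
            "INSERT INTO reservation (event_id, user_id) VALUES (?, ?)",
            (event_id, user_id),
        )
        conn.execute("COMMIT")
        return True
    except BaseException:
        # Any failure, including a duplicate-user IntegrityError, undoes the decrement.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Each worker thread or process should open its own connection with `connect()`; sharing one connection across concurrent requests would defeat the locking.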
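The RPS-based autoscaler listed above is not described in detail, so this is only a sketch of one plausible shape: a pure function maps observed RPS to a worker count clamped to 1 to 6, and the difference is applied through Gunicorn's documented signals, where the master process adds one worker per `SIGTTIN` and removes one per `SIGTTOU`. The per-worker capacity `RPS_PER_WORKER = 50` and all function names are hypothetical, and the signals are POSIX-only.

```python
from __future__ import annotations

import math
import os
import signal

MIN_WORKERS = 1
MAX_WORKERS = 6
RPS_PER_WORKER = 50.0  # hypothetical capacity of one gthread worker


def desired_workers(rps: float, rps_per_worker: float = RPS_PER_WORKER) -> int:
    """Map observed requests/second to a worker count clamped to [1, 6]."""
    needed = math.ceil(rps / rps_per_worker) if rps > 0 else MIN_WORKERS
    return max(MIN_WORKERS, min(MAX_WORKERS, needed))


def scale_signals(current: int, target: int) -> list[signal.Signals]:
    """Signals to send the Gunicorn master to move from `current` to `target`.

    Gunicorn adds one worker per SIGTTIN and removes one per SIGTTOU.
    """
    if target > current:
        return [signal.SIGTTIN] * (target - current)
    return [signal.SIGTTOU] * (current - target)


def apply_scaling(master_pid: int, current: int, rps: float) -> int:
    """Signal the master process and return the new worker count."""
    target = desired_workers(rps)
    for sig in scale_signals(current, target):
        os.kill(master_pid, sig)
    return target
```

A production controller would also want a cooldown between scaling steps, so that a brief spike does not make the worker count oscillate.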

Challenges we faced

Duplicate reservation rollback: when a user tried to reserve twice, the ticket count had already been decremented by the time the duplicate insert raised IntegrityError. We solved this with nested savepoints that roll back to the pre-decrement state on IntegrityError, restoring the count.
RPS measurement across forked workers: Gunicorn forks worker processes, so module-level counters are process-local and each worker sees only its own traffic. We used a lock-protected multiprocessing.Value as a shared request counter and a threading.Lock to guard the RPS computation.
Docker restart behavior on Windows: after docker kill, containers with restart: always recovered more slowly on Docker Desktop than on Linux. We documented this and verified that recovery works correctly in production-like environments.
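The duplicate-reservation fix above relies on savepoints. In Peewee, a `db.atomic()` block nested inside another one is implemented as a savepoint; this sketch reproduces the idea with raw SQLite `SAVEPOINT` / `ROLLBACK TO` statements so it runs without a PostgreSQL server. `BEGIN IMMEDIATE` stands in for `SELECT ... FOR UPDATE`, and the schema and function name are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per event, one row per successful reservation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS event (id INTEGER PRIMARY KEY, remaining INTEGER NOT NULL);
CREATE TABLE IF NOT EXISTS reservation (
    id INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL,
    user_id TEXT NOT NULL,
    UNIQUE (event_id, user_id)
);
"""


def reserve_with_savepoint(conn: sqlite3.Connection, event_id: int, user_id: str) -> str:
    """Return "reserved", "duplicate" or "sold_out".

    Expects a connection opened with isolation_level=None (explicit transactions).
    """
    conn.execute("BEGIN IMMEDIATE")  # outer transaction; stand-in for FOR UPDATE
    try:
        (remaining,) = conn.execute(
            "SELECT remaining FROM event WHERE id = ?", (event_id,)
        ).fetchone()
        if remaining <= 0:
            conn.execute("COMMIT")
            return "sold_out"

        conn.execute("SAVEPOINT reserve_ticket")  # like an inner, nested db.atomic()
        try:
            conn.execute(
                "UPDATE event SET remaining = remaining - 1 WHERE id = ?", (event_id,)
            )
            conn.execute(
                "INSERT INTO reservation (event_id, user_id) VALUES (?, ?)",
                (event_id, user_id),
            )
        except sqlite3.IntegrityError:
            # Undo only the work since the savepoint (the decrement is restored);
            # the outer transaction stays open and is then committed.
            conn.execute("ROLLBACK TO reserve_ticket")
            conn.execute("RELEASE reserve_ticket")
            conn.execute("COMMIT")
            return "duplicate"

        conn.execute("RELEASE reserve_ticket")
        conn.execute("COMMIT")
        return "reserved"
    except BaseException:
        # Anything unexpected (for example a missing event row) aborts everything.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Rolling back to the savepoint undoes only the decrement and the failed insert; because the outer transaction is still open, it could also record the rejected attempt before committing.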
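The cross-worker counter described above can be sketched as follows. `multiprocessing.Value` allocates the integer in shared memory, so every worker forked from the same parent increments the same value under its built-in lock. This only works if the counter is created in the parent before the fork, for example in a Gunicorn config file or with `--preload`; where the project actually creates it is an assumption, and `SharedRequestCounter` and `RpsMeter` are hypothetical names.

```python
import multiprocessing
import threading
import time


class SharedRequestCounter:
    """Request counter shared by forked worker processes.

    Must be created in the parent before workers fork (assumption: e.g. in a
    Gunicorn config file or with --preload).
    """

    def __init__(self) -> None:
        # 'q' = signed long long in shared memory; get_lock() guards it.
        self._count = multiprocessing.Value("q", 0)

    def increment(self) -> None:
        with self._count.get_lock():
            self._count.value += 1

    @property
    def total(self) -> int:
        with self._count.get_lock():
            return self._count.value


class RpsMeter:
    """Turns the ever-growing shared total into requests per second."""

    def __init__(self, counter: SharedRequestCounter) -> None:
        self._counter = counter
        self._lock = threading.Lock()  # guards the sampling state below
        self._last_total = counter.total
        self._last_time = time.monotonic()

    def sample(self) -> float:
        # Threads of a gthread worker may call this concurrently, hence the lock.
        with self._lock:
            now = time.monotonic()
            total = self._counter.total
            elapsed = now - self._last_time
            rps = (total - self._last_total) / elapsed if elapsed > 0 else 0.0
            self._last_total, self._last_time = total, now
            return rps
```

Each worker would call `increment()` once per request (for example from a Flask `before_request` hook). Because the total is shared, any worker's `sample()` reports the global rate over the interval since that worker last sampled.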

What we learned

Pessimistic locking was the safest approach for our inventory system: optimistic retry loops are easy to get subtly wrong, and under burst load any gap in the version check can oversell
Chaos engineering is more useful than unit tests for finding real production failures
A live dashboard makes reliability features much more compelling to demonstrate

Built With

docker
docker-compose
flask
github-actions
gunicorn
peewee
postgresql
pytest
python
react
vite

Updates

Einav Peer started this project — Apr 05, 2026 04:51 AM EDT
