This project was inspired by the real-world constraints at Smadex: making predictions within a few milliseconds while handling massive datasets, sparse user-behavior signals, and highly imbalanced revenue distributions. The challenge was to design something simple, fast, and accurate: a model that could realistically run in production.

🛠️ How We Built It

We built the solution around an approximate k-nearest-neighbor (KNN) model that uses FAISS for high-speed retrieval. The workflow includes:

Lightweight preprocessing (target encoding, numeric flags, scaling)
FAISS IVF (inverted file) indexing for fast approximate neighbor search
A two-headed prediction formula:
- P(buyer): the share of neighbors who are buyers
- E(revenue | buyer): estimated from the revenues of buyer neighbors only
Zero-aware logic so that non-buyers' zero revenues do not dilute the revenue estimate
Streaming, low-memory processing of test data
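The two prediction heads and the zero-aware rule can be sketched in plain Python. This is a minimal illustration, not the project's code. It assumes the index has already returned the training revenues of a query's k neighbors, treats a neighbor as a buyer when its revenue is positive, and combines the heads as P(buyer) × E(revenue | buyer), the usual expected-revenue decomposition when non-buyers contribute zero. The function names and the `min_p_buyer` cutoff are hypothetical; `stream_predictions` consumes batches one at a time, in the spirit of the streaming, low-memory test pass.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def two_head_predict(
    neighbor_revenues: Sequence[float], min_p_buyer: float = 0.0
) -> Tuple[float, float, float]:
    """Return (P(buyer), E(revenue | buyer), expected revenue) for one query."""
    n = len(neighbor_revenues)
    if n == 0:
        return 0.0, 0.0, 0.0
    # Assumption: a neighbor counts as a buyer when its revenue is positive.
    buyer_revs = [r for r in neighbor_revenues if r > 0]
    # Head 1: P(buyer) = share of neighbors that are buyers.
    p_buyer = len(buyer_revs) / n
    # Head 2: E(revenue | buyer) from buyer neighbors only, so the zeros of
    # non-buyers do not drag the revenue estimate down.
    rev_given_buyer = sum(buyer_revs) / len(buyer_revs) if buyer_revs else 0.0
    # Zero-aware rule (hypothetical cutoff): below min_p_buyer, predict exactly 0.
    if p_buyer < min_p_buyer:
        return p_buyer, rev_given_buyer, 0.0
    return p_buyer, rev_given_buyer, p_buyer * rev_given_buyer


def stream_predictions(
    batches: Iterable[Sequence[Sequence[float]]], min_p_buyer: float = 0.0
) -> Iterator[float]:
    """Lazily yield expected revenue; if `batches` is a generator, only the
    current batch of queries needs to be held in memory."""
    for batch in batches:
        for neighbor_revenues in batch:
            yield two_head_predict(neighbor_revenues, min_p_buyer)[2]


print(two_head_predict([0.0, 0.0, 10.0, 30.0]))  # prints (0.5, 20.0, 10.0)
```

One design note: with uniform weights and the same neighbor set for both heads, P(buyer) × E(revenue | buyer) equals the plain mean of the neighbors' revenues. Keeping the heads separate therefore pays off mainly when they are used differently, for example when a cutoff like `min_p_buyer` forces exact zeros or when each head gets its own weighting.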

The full pipeline is optimized for speed, simplicity, and interpretability.
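The idea behind FAISS IVF indexing is to cluster the training vectors with a coarse quantizer, store each vector in the inverted list of its nearest centroid, and at query time scan only the `nprobe` closest lists. The toy `TinyIVF` class below is hypothetical and pure Python. It mirrors the train/add/search steps of FAISS's `IndexIVFFlat` and its `nlist` and `nprobe` parameters to show the mechanism, not FAISS's actual implementation or speed.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean (L2) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVF:
    """Toy inverted-file index: k-means coarse centroids + one list per centroid."""

    def __init__(self, nlist: int, nprobe: int = 1, seed: int = 0) -> None:
        self.nlist, self.nprobe = nlist, nprobe
        self.rng = random.Random(seed)
        self.centroids: List[List[float]] = []
        self.lists: List[List[Tuple[int, Vector]]] = []
        self.ntotal = 0  # number of vectors added so far (used as ids)

    def _closest_lists(self, v: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(v, self.centroids[c]))
        return order[:n]

    def train(self, data: Sequence[Vector], iters: int = 10) -> None:
        # Coarse quantizer: plain k-means, initialised from random data points.
        self.centroids = [list(v) for v in self.rng.sample(list(data), self.nlist)]
        for _ in range(iters):
            buckets: List[List[Vector]] = [[] for _ in range(self.nlist)]
            for v in data:
                buckets[self._closest_lists(v, 1)[0]].append(v)
            for c, members in enumerate(buckets):
                if members:  # an empty bucket keeps its previous centroid
                    self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        self.lists = [[] for _ in range(self.nlist)]

    def add(self, data: Sequence[Vector]) -> None:
        # Each vector goes into the inverted list of its nearest centroid.
        for v in data:
            self.lists[self._closest_lists(v, 1)[0]].append((self.ntotal, v))
            self.ntotal += 1

    def search(self, query: Vector, k: int) -> List[Tuple[float, int]]:
        # Only the nprobe closest lists are scanned, not the whole dataset.
        candidates = [item for c in self._closest_lists(query, self.nprobe)
                      for item in self.lists[c]]
        candidates.sort(key=lambda item: sq_dist(query, item[1]))
        return [(sq_dist(query, v), i) for i, v in candidates[:k]]


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
index = TinyIVF(nlist=2, nprobe=1)
index.train(data)
index.add(data)
print(index.search((10.2, 10.1), k=1))  # nearest stored vector is id 3
```

Raising `nprobe` scans more lists, trading latency for recall; with `nprobe` equal to `nlist`, every list is scanned and the search becomes exact.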

🎓 What We Learned

How much value neighbor-based models deliver when they are engineered well
The tradeoffs between accuracy and inference latency
How critical feature preprocessing is for high-dimensional KNN
How FAISS indexing can turn a slow idea into a production-grade approach
That simple models can outperform complex ones when paired with the right constraints
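The point about preprocessing for KNN is easy to see on a toy example. With raw features, a column with a large numeric range dominates the Euclidean distance and a 0/1 flag barely registers; after scaling, both contribute. The sketch below uses two hypothetical features and z-score standardization as an assumed stand-in for the project's scaling step.

```python
import statistics
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column: (x - mean) / population std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has std 0; fall back to 1 to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def nearest(rows: Sequence[Sequence[float]], query_idx: int) -> int:
    """Index of the row closest to rows[query_idx] in squared Euclidean distance."""
    q = rows[query_idx]
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: sum((a - b) ** 2 for a, b in zip(q, rows[i])))


# Hypothetical features: [sessions_last_30d, is_returning_user]
rows = [[500, 1], [520, 0], [600, 1], [100, 0]]
print(nearest(rows, 0))               # prints 1: the session count dominates
print(nearest(standardize(rows), 0))  # prints 2: the flag now matters too
```

Scaling does not decide which feature should matter; it only stops raw units from deciding it implicitly.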