Plec o UK

MarketPulse UK — Promotional Intelligence for Damm's UK Business

What it does

MarketPulse UK forecasts monthly hectolitre sales for Damm's 5 major UK retailers (Tesco, Sainsbury's, Asda, Morrisons, Waitrose) across 13 standardized SKUs. It compares predictions against budget targets, decomposes the gap into explainable components (seasonality, promo effect, base drift), and recommends specific promotional actions to close deviations.

How we built it

Data pipeline (build_clean_dataset.py): Parses two messy Excel files — historical sales (UK DATA.xlsx) with Spanish calendar dates and anonymized customer codes, and the 2026 Trade Plan with merged cells, irregular calendars, and non-numeric price strings. Standardizes 50+ raw SKU names into 13 canonical products across 5 retailers. Simulates daily promotions from weekly calendars and aggregates to monthly features.

Modeling (train_global_model.py): LightGBM with dual-track auto-selection (with/without lagged features), hyperparameter grid search over depth, learning rate, and regularization. Trains 5 per-retailer models plus one global pooled model, then auto-selects the best per retailer — improving Asda's R² from 0.04 to 0.40 and Sainsbury's from 0.17 to 0.28.

Inference engine (engine.py): Core logic for forecasting, what-if simulation (6 promo scenarios × 3 discount depths × 2 types), budget gap decomposition (4-component waterfall), and auto-generated commercial recommendations prioritized by gap severity.

Dashboard (app.py): Streamlit with 5 tabs — Forecast (time series with train/test split), Deviation (waterfall decomposition), Simulator (what-if scenarios with P&G), Recommendations (priority-sorted actions), and Info (assumptions and architecture).

Challenges

Parsing 5 different Trade Plan formats (Tesco weekly columns, Morrisons date ranges, Asda custom range strings)
SKU name inconsistencies across sheets (e.g., "Estrella 12x330ml", "Estrella 12x330ml NRB", "EX1312SA")
Budget not provided as a dataset — estimated as previous year + 5%
LightGBM's dual-track sometimes picking no-lag models that lose promo signal (mitigated with historical fallback in deviation analysis)

What we learned

Global pooled models significantly help retailers with sparse data
Promotion effect detection requires lagged features — without them, models ignore promo variables
A simple historical fallback can rescue promo effect estimates when the model is uncertain
Gross price per hectolitre should use Ventas Brutas (not Venta Neta) as confirmed by the Damm team

What's next

Integrate external data (UK holidays, weather, search trends) to improve model accuracy
Add weekly forecasting if weekly data becomes available
Real budget integration instead of the estimated proxy

Built With

lightgbm
matplotlib
numpy
pandas
python
scikit-learn
streamlit

Updates

Joel Sanfeliu started this project — May 24, 2026 05:16 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.