MarketPulse UK — Promotional Intelligence for Damm's UK Business

What it does

MarketPulse UK forecasts monthly hectolitre sales for Damm's 5 major UK retailers (Tesco, Sainsbury's, Asda, Morrisons, Waitrose) across 13 standardized SKUs. It compares predictions against budget targets, decomposes the gap into explainable components (seasonality, promo effect, base drift), and recommends specific promotional actions to close deviations.

How we built it

Data pipeline (build_clean_dataset.py): Parses two messy Excel files — historical sales (UK DATA.xlsx) with Spanish calendar dates and anonymized customer codes, and the 2026 Trade Plan with merged cells, irregular calendars, and non-numeric price strings. Standardizes 50+ raw SKU names into 13 canonical products across 5 retailers. Simulates daily promotions from weekly calendars and aggregates to monthly features.

Modeling (train_global_model.py): LightGBM with dual-track auto-selection (with/without lagged features), hyperparameter grid search over depth, learning rate, and regularization. Trains 5 per-retailer models plus one global pooled model, then auto-selects the best per retailer — improving Asda's R² from 0.04 to 0.40 and Sainsbury's from 0.17 to 0.28.

Inference engine (engine.py): Core logic for forecasting, what-if simulation (6 promo scenarios × 3 discount depths × 2 types), budget gap decomposition (4-component waterfall), and auto-generated commercial recommendations prioritized by gap severity.

Dashboard (app.py): Streamlit with 5 tabs — Forecast (time series with train/test split), Deviation (waterfall decomposition), Simulator (what-if scenarios with P&G), Recommendations (priority-sorted actions), and Info (assumptions and architecture).

Challenges

  • Parsing 5 different Trade Plan formats (Tesco weekly columns, Morrisons date ranges, Asda custom range strings)
  • SKU name inconsistencies across sheets (e.g., "Estrella 12x330ml", "Estrella 12x330ml NRB", "EX1312SA")
  • Budget not provided as a dataset — estimated as previous year + 5%
  • LightGBM's dual-track sometimes picking no-lag models that lose promo signal (mitigated with historical fallback in deviation analysis)

What we learned

  • Global pooled models significantly help retailers with sparse data
  • Promotion effect detection requires lagged features — without them, models ignore promo variables
  • A simple historical fallback can rescue promo effect estimates when the model is uncertain
  • Gross price per hectolitre should use Ventas Brutas (not Venta Neta) as confirmed by the Damm team

What's next

  • Integrate external data (UK holidays, weather, search trends) to improve model accuracy
  • Add weekly forecasting if weekly data becomes available
  • Real budget integration instead of the estimated proxy

Built With

Share this project:

Updates