

🛍️ Retail Sales Prediction with ML

Predict transaction-level sales for a retail business using an XGBoost regression model, time-series feature engineering, and SHAP explainability.

Built with: Python · Streamlit · SHAP


📌 Inspiration

  • Retail teams drown in transactions but starve for timely demand signals.
  • I wanted a practical, deployable model that turns raw daily sales into actionable predictions, so planners stock the right SKUs, marketers target the right customers, and finance trusts the forecast.

🛠️ What it does

  • Predicts transaction-level Sales_Amount using engineered time-series features.
  • Compares Random Forest and XGBoost, and uses SHAP explanations to show why the model makes each prediction.
  • Runs as a Streamlit app: upload your CSV or try the default dataset, view MAE, and inspect feature importance.

Results: XGBoost achieved MAE ≈ $8.67 (better than RF at $9.21) on the reference dataset.
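For reference, MAE is the average absolute dollar gap between actual and predicted Sales_Amount over the n test transactions, and the two reported figures correspond to a relative reduction of roughly 6%:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,
\qquad
\frac{\mathrm{MAE}_{\mathrm{RF}} - \mathrm{MAE}_{\mathrm{XGB}}}{\mathrm{MAE}_{\mathrm{RF}}}
= \frac{9.21 - 8.67}{9.21} = \frac{0.54}{9.21} \approx 0.059 \;(\approx 5.9\%)
```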


🧱 How I built it

Data: Kaggle Retail Store Sales Transactions (Date, SKU, Quantity, Sales_Amount, etc.).

Feature engineering: lags (7/14/30), rolling means, day-of-week/month, holiday flags, quantity interactions.
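A minimal pandas sketch of these features follows. It assumes the data have been aggregated to one row per SKU per day with the columns named above, that the 7/14/30 windows are in days, and that the caller supplies the holiday dates; the function name and the `qty_x_weekend` interaction are hypothetical illustrations, not the project's actual code.

```python
import pandas as pd


def add_time_features(df: pd.DataFrame, holidays=None) -> pd.DataFrame:
    """Add lag, rolling-mean, calendar, holiday and quantity features.

    Assumes one row per SKU per day with columns Date, SKU, Quantity, Sales_Amount.
    """
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    # Sort so shift() looks strictly backwards in time within each SKU
    out = out.sort_values(["SKU", "Date"]).reset_index(drop=True)
    grouped = out.groupby("SKU")["Sales_Amount"]

    for lag in (7, 14, 30):
        # Value `lag` rows earlier; equals `lag` days only if the daily series has no gaps
        out[f"lag_{lag}"] = grouped.shift(lag)
        # Rolling mean of past days only: shift(1) excludes the current day's sales
        out[f"roll_mean_{lag}"] = grouped.transform(
            lambda s, w=lag: s.shift(1).rolling(w, min_periods=1).mean()
        )

    # Calendar features
    out["day_of_week"] = out["Date"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = out["Date"].dt.month
    holiday_dates = pd.to_datetime(list(holidays or []))
    out["is_holiday"] = out["Date"].isin(holiday_dates).astype(int)
    # Hypothetical quantity interaction: quantity on weekend days
    out["qty_x_weekend"] = out["Quantity"] * (out["day_of_week"] >= 5).astype(int)
    return out


# Usage with toy data: two SKUs, 40 days each
dates = pd.date_range("2024-01-01", periods=40, freq="D")
toy = pd.DataFrame({
    "Date": list(dates) * 2,
    "SKU": ["A"] * 40 + ["B"] * 40,
    "Quantity": [1] * 80,
    "Sales_Amount": list(range(40)) + list(range(100, 140)),
})
feats = add_time_features(toy, holidays=["2024-01-01"])
```

Grouping by SKU before shifting keeps one SKU's history from leaking into another's lags, and the `shift(1)` inside the rolling mean keeps the current day's target out of its own features.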

Models: RandomForestRegressor and XGBRegressor, with tuned tree depth (both models) and learning rate (XGBoost); MinMax scaling where appropriate.
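The comparison step can be sketched as fitting each candidate on the same training rows and scoring test MAE. To keep the sketch dependent only on scikit-learn, `GradientBoostingRegressor` stands in for `XGBRegressor` (which exposes the same fit/predict interface and can be swapped in directly); the hyperparameter values, helper name, and synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model and return {name: test MAE}, lowest (best) MAE first."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))


models = {
    "random_forest": RandomForestRegressor(n_estimators=200, max_depth=8, random_state=0),
    # Stand-in for xgboost.XGBRegressor(max_depth=..., learning_rate=...)
    "gradient_boosting": GradientBoostingRegressor(
        max_depth=3, learning_rate=0.05, n_estimators=300, random_state=0
    ),
}

# Synthetic data (not the Kaggle dataset); first 400 rows train, last 100 test, no shuffling
rng = np.random.default_rng(42)
X = rng.uniform(size=(500, 3))
y = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 0.5, 500)
scores = compare_models(models, X[:400], y[:400], X[400:], y[400:])
print(scores)
```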

Explainability: SHAP summary plots for the global drivers across all predictions, and force plots for the drivers of each individual prediction.
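The global ranking behind a summary plot can be sketched without any plotting: order features by their mean absolute SHAP value, the quantity SHAP's bar-style summary plot displays. This assumes `shap_values` is an (n_samples × n_features) array, for example from `shap.TreeExplainer(model).shap_values(X)`; the helper name and the toy values are hypothetical.

```python
import numpy as np


def rank_global_drivers(shap_values, feature_names, top_k=5):
    """Rank features by mean |SHAP value| across all rows (global importance)."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]  # largest first
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix (2 predictions x 3 features), not output from the real model
shap_matrix = np.array([[1.0, -4.0, 0.5], [-3.0, 2.0, 0.5]])
ranking = rank_global_drivers(shap_matrix, ["lag_7", "day_of_week", "is_holiday"], top_k=2)
print(ranking)  # [('day_of_week', 3.0), ('lag_7', 2.0)]
```

Taking the absolute value before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.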

App: Streamlit UI for file upload, on-the-fly inference, metrics, and SHAP visuals.

Artifacts: Saved model_xgb.pkl, scaler.pkl, reproducible requirements.txt.
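A minimal sketch of saving and reloading the two artifacts, assuming they are written with Python's standard `pickle` module (the write-up does not say whether pickle or joblib was used). A `RandomForestRegressor` fitted on synthetic data stands in for the XGBoost model; the helper names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler


def save_artifacts(model, scaler, out_dir):
    """Pickle the fitted model and scaler under the file names used by the app."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "model_xgb.pkl").write_bytes(pickle.dumps(model))
    (out / "scaler.pkl").write_bytes(pickle.dumps(scaler))


def load_and_predict(out_dir, X_new):
    """Reload both artifacts and score new rows with the training-time scaling."""
    out = Path(out_dir)
    model = pickle.loads((out / "model_xgb.pkl").read_bytes())
    scaler = pickle.loads((out / "scaler.pkl").read_bytes())
    # transform (never fit) so new data is scaled exactly like the training data
    return model.predict(scaler.transform(X_new))


# Usage: stand-in model trained on synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
y = X @ np.array([0.5, 1.0, -0.2]) + rng.normal(0, 1, 200)
scaler = MinMaxScaler().fit(X)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(scaler.transform(X), y)
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(model, scaler, tmp)
    reloaded_preds = load_and_predict(tmp, X[:5])
original_preds = model.predict(scaler.transform(X[:5]))
```

Saving the scaler next to the model is what lets the app reproduce the training-time preprocessing on uploaded files; since unpickling can execute code, only load `.pkl` files from trusted sources.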


🧗‍♀️ Challenges I ran into

  • Getting stable MAE across user-uploaded files with different SKU mixes and price ranges.
  • Keeping SHAP plots responsive in a web app without GPU acceleration.
  • Preventing data leakage when creating time-based features (strict train/test split by date).
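A strict date-based holdout can be sketched as below, assuming a `Date` column; the helper name and cutoff date are hypothetical. The split alone is not enough: lag and rolling features should look only at earlier rows, and any scaler should be fit on the training portion only.

```python
import pandas as pd


def time_split(df, cutoff, date_col="Date"):
    """Split rows into train (before cutoff) and test (on/after cutoff); no shuffling."""
    dates = pd.to_datetime(df[date_col])
    cutoff = pd.Timestamp(cutoff)
    train = df[dates < cutoff]
    test = df[dates >= cutoff]
    return train, test


# Usage: 10 toy days, hold out everything from 2024-01-08 onward
daily = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "Sales_Amount": range(10),
})
train, test = time_split(daily, "2024-01-08")
print(len(train), len(test))  # 7 3
```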

🏆 Accomplishments that I’m proud of

  • Deployed an end-to-end, explainable forecasting pipeline that non-ML stakeholders can use.
  • Reduced MAE by ~6% ($9.21 → $8.67) by moving from RF to tuned XGBoost on the same data.
  • Clear business mapping: inventory planning, promo timing, and SKU-level revenue targeting.
  • Clean repo with a reproducible training notebook and app.

📚 What I learned

  • Why time-aware splits matter more than shuffled (random) cross-validation for retail data.
  • How SHAP changes stakeholder conversations from “black box” to “business levers.”
  • Practical tradeoffs between model complexity and app latency in Streamlit.
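For cross-validation that respects time order, scikit-learn's `TimeSeriesSplit` is one option (the write-up does not say which tool was used). Unlike `KFold(shuffle=True)`, each validation fold lies strictly after its training window, so the model never trains on days that come after the ones it is scored on. The 12-row toy array assumes the data are already sorted by date.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 date-sorted observations (toy data)
X = np.arange(12).reshape(-1, 1)
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X))
for train_idx, test_idx in folds:
    # Expanding training window, followed by the next block of days
    print(f"train 0..{train_idx[-1]} -> test {test_idx[0]}..{test_idx[-1]}")
# train 0..2 -> test 3..5
# train 0..5 -> test 6..8
# train 0..8 -> test 9..11
```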

🚀 What’s next for Retail Sales Prediction with ML

  • Add price/promo features and external signals (weather, local events) for uplift.
  • Train a global + per-SKU hybrid to balance generalization with SKU idiosyncrasies.
  • Support batch scoring API + scheduled forecasts for production workflows.
  • Extend to weekly/monthly horizons and multi-step forecasting.
  • Add SHAP-based auto-insights: “Top 5 drivers of tomorrow’s variance.”
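The planned auto-insight could start as simply as ranking one prediction's SHAP values by magnitude and turning them into sentences. This sketch assumes `shap_row` holds the SHAP values for a single forecast in the same units as the model output (dollars, if the model predicts Sales_Amount without a target transform); the function name and toy values are hypothetical.

```python
import numpy as np


def top_drivers_text(shap_row, feature_names, top_k=5):
    """Describe the top-k SHAP contributors for a single prediction."""
    shap_row = np.asarray(shap_row, dtype=float)
    order = np.argsort(np.abs(shap_row))[::-1][:top_k]  # largest magnitude first
    lines = []
    for rank, i in enumerate(order, start=1):
        direction = "raises" if shap_row[i] > 0 else "lowers"
        lines.append(f"{rank}. {feature_names[i]} {direction} the forecast by ${abs(shap_row[i]):.2f}")
    return "\n".join(lines)


# Toy SHAP values for one forecast (not from the real model)
insight = top_drivers_text([0.5, 3.0, -2.0], ["lag_7", "is_holiday", "day_of_week"], top_k=2)
print(insight)
# 1. is_holiday raises the forecast by $3.00
# 2. day_of_week lowers the forecast by $2.00
```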

👩‍💼 About the Author

Sweety Seelam | Business Analyst | Aspiring Data Scientist | Passionate about building end-to-end ML solutions for real-world problems
Email: sweetyseelam2@gmail.com
LinkedIn
GitHub
Medium
My Portfolio


🔐 Proprietary & All Rights Reserved

© 2025 Sweety Seelam. All rights reserved.
This project, including its source code, trained models, datasets (where applicable), visuals, and dashboard assets, is protected under copyright and made available for educational and demonstrative purposes only.
Unauthorized commercial use, redistribution, or duplication of any part of this project is strictly prohibited.
