"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 16, 2021

Forecasting Reads - Research papers - Retail

Paper #1 - An industry case of large-scale demand forecasting of hierarchical components

Key Notes

  • Demand forecasting system of electronic components in manufacturing
  • Algorithms leveraged: 1) AdaBoost, 2) ARIMAX, 3) ARIMA, 4) Bayesian Structural Time Series (BSTS), 5) Bayesian Structural Time Series with a Bayesian Classifier (BSTS Classifier), 6) Ensemble of Gradient Boosting (Ensemble), 7) Ridge regression (Ridge), 8) Kernel regression (Kernel), 9) Lasso, 10) Matrix Factorization (MF, section VII of the paper), 11) Neural Network (NN), 12) Poisson regression (Poisson), 13) Random Forest (RF), 14) Support Vector Regression (SVR).

  • Techniques cover 1) data pre-processing, 2) prediction, and 3) model selection
  • Symmetric Mean Absolute Percent Error (SMAPE) serves to evaluate the performance of the models
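
As a rough illustration of the SMAPE metric mentioned above, here is a minimal sketch in plain Python. It assumes the common 0-200% definition (absolute error divided by the mean of the absolute actual and forecast values); these notes do not record which SMAPE variant the paper uses, and the function name `smape` and the zero-denominator convention are my own choices.

```python
def smape(actual, forecast):
    """Symmetric Mean Absolute Percent Error, in percent (0 to 200).

    Assumed variant: mean of |f - a| / ((|a| + |f|) / 2).
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Assumed convention: when both values are zero the point adds no error.
        if denom != 0:
            total += abs(f - a) / denom
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    print(smape([100, 200], [110, 180]))
```

Because the denominator uses both the actual and the forecast, the score stays bounded even when actual demand is zero, which tends to matter for sparse component demand.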

Paper #2 - Learnings from Kaggle’s Forecasting Competitions

Key Notes

  • High-frequency series at weekly, daily, and sub-daily (hourly) levels
  • Three full seasonal periods were required at each frequency
  • Themes covered: i) complex vs. simple models, ii) cross-learning, iii) prediction uncertainty, and iv) ensembling
  • Walmart Store Sales and Rossmann competitions: sales by store/department/week and by store/day, respectively
  • Forecasts of unit sales were required by product/store/day

Data Preprocessing

  • Set NA or negative values to zero.
  • Remove time series with all zero values. 
  • Remove leading zeros.
  • To calculate the feature vectors, the authors use the R package feasts
  • Apply principal components for dimensionality reduction using the prcomp function
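
The first three cleaning steps above can be sketched in plain Python. This is only an illustration under my own assumptions: each series is a list of numbers keyed by a hypothetical series id, missing values appear as `None` or NaN, and the helper names `clean_series` and `preprocess` are made up. The feature-vector and principal-components steps are done in R in the paper and are not reproduced here.

```python
import math


def clean_series(values):
    """Set NA/negative values to zero, then drop leading zeros.

    Returns None when the series is all zeros after cleaning.
    """
    cleaned = []
    for v in values:
        is_missing = v is None or (isinstance(v, float) and math.isnan(v))
        if is_missing or v < 0:
            cleaned.append(0.0)
        else:
            cleaned.append(float(v))

    # Index of the first non-zero observation, if any.
    first = next((i for i, v in enumerate(cleaned) if v != 0), None)
    if first is None:
        return None
    return cleaned[first:]


def preprocess(series_by_id):
    """Apply clean_series to every series and drop the all-zero ones."""
    result = {}
    for key, values in series_by_id.items():
        cleaned = clean_series(values)
        if cleaned is not None:
            result[key] = cleaned
    return result


if __name__ == "__main__":
    raw = {"a": [None, 0, -3, 5, float("nan"), 2], "b": [0, None, -1]}
    print(preprocess(raw))
```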


  • Most of the top performers used ensembles of global XGBoost models to create forecasts, but a few of them did include local XGBoost models as part of their ensemble
  • Holidays and promotions turned out to be essential for obtaining high performance in this competition
  • Global ensemble models outperform local single models

Feature Extraction

  • Day of Week
  • Weekend
  • IsHoliday
  • IsPromotionDay
  • IsMonthEnd
  • IsYearEnd
  • IsQuarterEnd
  • IsLocalHoliday
  • WeekOfYear
  • Rolling Window
  • Average over the last 2 - 3 weeks
  • Moving averages
  • Mean over every 2 weeks
  • Day-over-day incremental differences
  • Added averages / means: weekly average, daily average
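
A few of the features listed above can be sketched with the Python standard library. This is an illustration only: the function names, the `holidays` and `promo_days` sets, and the choice to exclude the current day from the rolling mean (to avoid leaking the target into its own feature) are my assumptions, not something taken from the competition write-ups.

```python
from datetime import date, timedelta


def calendar_features(d, holidays=frozenset(), promo_days=frozenset()):
    """Calendar flags for a single date; holiday/promo sets are caller-supplied."""
    next_day = d + timedelta(days=1)
    month_end = next_day.month != d.month
    return {
        "day_of_week": d.weekday(),          # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "is_holiday": d in holidays,
        "is_promotion_day": d in promo_days,
        "is_month_end": month_end,
        "is_quarter_end": month_end and d.month in (3, 6, 9, 12),
        "is_year_end": next_day.year != d.year,
        "week_of_year": d.isocalendar()[1],  # ISO week number
    }


def rolling_mean(values, window):
    """Mean of the previous `window` values; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(values[i - window:i]) / window)
    return out


def daily_differences(values):
    """Day-over-day change; the first day has no predecessor, so it is None."""
    if not values:
        return []
    return [None] + [values[i] - values[i - 1] for i in range(1, len(values))]


if __name__ == "__main__":
    print(calendar_features(date(2021, 12, 31)))
    print(rolling_mean([1, 2, 3, 4, 5], window=2))
    print(daily_differences([1, 3, 6]))
```

With daily data, `window=14` or `window=21` would correspond to the 2 - 3 week averages in the list.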

Paper #3 - An Empirical Analysis of Feature Engineering for Predictive Modeling

The paper examines the following sixteen engineered features:

  • Counts
  • Differences
  • Distance Between Quadratic Roots
  • Distance Formula
  • Logarithms
  • Max of Inputs
  • Polynomials
  • Power Ratio (such as BMI)
  • Powers
  • Ratio of a Product
  • Rational Differences
  • Rational Polynomials
  • Ratios
  • Root Distance
  • Root of a Ratio (such as Standard Deviation)
  • Square Roots
  • Counts - the count engineered feature counts the number of elements in the feature vector that satisfy a certain condition
  • Statisticians have long used logarithms and power functions to transform the inputs to linear regression
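
To make a few of these feature types concrete, here is a minimal sketch in plain Python. These are generic textbook forms written for illustration; the paper's exact formulas are not recorded in these notes, and all function names are hypothetical.

```python
import math


def count_feature(xs, predicate):
    """Counts: number of elements in the feature vector satisfying a condition."""
    return sum(1 for x in xs if predicate(x))


def ratio(a, b):
    """Ratios: one input divided by another."""
    return a / b


def power_ratio(a, b, power=2):
    """Power ratio, e.g. BMI = weight / height**2."""
    return a / b ** power


def log_feature(x):
    """Logarithms: natural log of a strictly positive input."""
    return math.log(x)


def quadratic_root_distance(a, b, c):
    """Distance between the two real roots of a*x^2 + b*x + c.

    The roots are (-b +/- sqrt(disc)) / (2a), so they differ by sqrt(disc) / |a|.
    Returns None when there are no two real roots to compare.
    """
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return None
    return math.sqrt(disc) / abs(a)


if __name__ == "__main__":
    print(count_feature([0.2, 0.7, 0.9], lambda x: x > 0.5))
    print(power_ratio(70.0, 1.75))            # BMI for 70 kg and 1.75 m
    print(quadratic_root_distance(1, -3, 2))  # roots are 1 and 2
```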

Paper #4 - VEST: Automatic Feature Engineering for Forecasting



  • sku,wkno,saleqty
  • cluster and forecast
  • DTW - Dynamic Time Warping metric for clustering time series
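
For reference, the classic dynamic-programming form of the Dynamic Time Warping distance can be sketched as below. This is a generic textbook version with an absolute-difference cost and no window constraint, not something taken from the VEST paper; the function name `dtw_distance` is my own, and the inputs could be, for example, the weekly saleqty series of two SKUs.

```python
def dtw_distance(s, t):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] is the cheapest alignment cost of s[:i] with t[:j].
    """
    n, m = len(s), len(t)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Same shape, one series stretched in time: DTW treats them as identical.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

The pairwise distances can then feed any clustering method that accepts a precomputed distance matrix, after which a forecast model is fitted per cluster.
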
For offline retail stores, my list of feature variables:

Store level stats
  • Date
  • StoreId
  • Items in Store
  • Traffic Count
  • Holiday / Festival
  • Number of Item Categories
  • Weather
  • Out of Stock Items
  • Cost of Products, Price of each SKU
  • Promotional Offers / Seasonal Information
  • Weather Information on Store on that Day
  • Store operational timings
  • Store Labour Details
Data Product Thinking
  • With 20% more restock of this item, out-of-stock might drop by 10% and traffic might improve by 5% (instead of a blind forecast, provide a collated recommendation)
  • With a 20% reduction in tomorrow's traffic, the corresponding items or % of sales can be presented
  • Multiple models will run behind these decisions to generate the recommendations

Keep Thinking!!!
