"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 08, 2021

Forecasting Notes

Paper #1 - Time Series Forecasting Principles with Amazon Forecast

Types of Forecasting

  • Long term - Strategic planning
  • Short term - Day-to-day business operations
  • Promotions - Seasonal
  • Impact of price and promotions on sales numbers

Key parameters in Retail

  • SKU, timestamp, units sold at the SKU level
  • SKU metadata - color, department, size
  • Price data - price at that point in time
  • Promotional information for the SKU
  • In-stock or purchased product

The sales forecast can be built at the individual SKU level, with units sold as the target and the following candidate features:

Forecast (Target) - Units sold = DayOfWeek + WeekendFlag + PromotionalFlag + IsSeasonalProduct + IsTop10SellerForSeason + IsTop10InOnlineChannel + IsForAllAgeGroups + IsForOld + IsForTeens + IsLowAlcoholic + IsAllWeatherItem + WeatherOfDay + ProductPriceOnTheDay + IsThereBundleOffer
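
A minimal sketch of how a few of these features could feed a regression on units sold. Everything here is an assumption for illustration: the data is synthetic, only three of the listed features are used, and a plain linear model stands in for whatever model is actually chosen.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily history for one hypothetical SKU (all values are made up).
dates = pd.date_range("2021-01-01", periods=56, freq="D")
rng = np.random.default_rng(0)
features = pd.DataFrame(
    {
        "WeekendFlag": (dates.dayofweek >= 5).astype(int),
        "PromotionalFlag": rng.integers(0, 2, size=len(dates)),
        "ProductPriceOnTheDay": rng.uniform(8.0, 12.0, size=len(dates)),
    },
    index=dates,
)

# Assumed ground truth, used only to generate the toy target.
units_sold = (
    50.0
    + 10.0 * features["WeekendFlag"]
    + 15.0 * features["PromotionalFlag"]
    - 2.0 * features["ProductPriceOnTheDay"]
)

# Fit units sold as a function of the flags and the price.
model = LinearRegression().fit(features, units_sold)
coefficients = dict(zip(features.columns, model.coef_))
print(coefficients)
```

Because the toy target is noise-free, the fitted coefficients recover the assumed effects; real sales data would need noise handling, more features, and a proper train/test split.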

Additional time-based features

‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, ‘Is_month_end’, ‘Is_month_start’, ‘Is_quarter_end’, ‘Is_quarter_start’, ‘Is_year_end’, and ‘Is_year_start’.
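
These calendar features can be derived from a timestamp column with pandas. A small sketch, assuming a column named `timestamp` (a hypothetical name) and the helper `add_date_features`, which is made up for this note:

```python
import pandas as pd

def add_date_features(df: pd.DataFrame, column: str = "timestamp") -> pd.DataFrame:
    """Return a copy of df with calendar features derived from `column`."""
    ts = pd.to_datetime(df[column])
    out = df.copy()
    out["Year"] = ts.dt.year
    out["Month"] = ts.dt.month
    out["Week"] = ts.dt.isocalendar().week.astype(int)  # ISO week number
    out["Day"] = ts.dt.day
    out["Dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["Dayofyear"] = ts.dt.dayofyear
    # Boolean period-boundary flags, e.g. "is_month_end" -> "Is_month_end".
    for attr in (
        "is_month_end", "is_month_start",
        "is_quarter_end", "is_quarter_start",
        "is_year_end", "is_year_start",
    ):
        out[attr.capitalize()] = getattr(ts.dt, attr)
    return out

sample = pd.DataFrame({"timestamp": ["2021-06-30", "2021-07-08"]})
feats = add_date_features(sample)
print(feats.T)
```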

Data Insights 

  • Aggregate sales by week, day, quarter, holidays, weekends
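
A sketch of these aggregations with pandas, assuming a daily `units_sold` series. The data is synthetic and the holiday list is a hypothetical example, not taken from the paper.

```python
import numpy as np
import pandas as pd

# Synthetic daily sales: 1, 2, 3, ... units over Q1 2021 (90 days).
dates = pd.date_range("2021-01-01", "2021-03-31", freq="D")
sales = pd.DataFrame({"units_sold": np.arange(1, len(dates) + 1)}, index=dates)

# Hypothetical holiday calendar, for illustration only.
holidays = pd.to_datetime(["2021-01-01", "2021-01-18", "2021-02-15"])
sales["is_weekend"] = sales.index.dayofweek >= 5
sales["is_holiday"] = sales.index.isin(holidays)

weekly = sales["units_sold"].resample("W").sum()
by_quarter = sales.groupby(sales.index.to_period("Q"))["units_sold"].sum()
by_weekday = sales.groupby(sales.index.day_name())["units_sold"].sum()
by_weekend = sales.groupby("is_weekend")["units_sold"].sum()
holiday_sales = sales.loc[sales["is_holiday"], "units_sold"].sum()

print(weekly.head())
print(by_quarter)
print(by_weekday)
print(by_weekend)
print(holiday_sales)
```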

Handling Missing Data

  • Zero filling
  • NaN
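
The two options behave differently downstream. A small pandas sketch on made-up data: zero filling treats a missing day as "no sales", while NaN marks it as unknown, so it is skipped in averages.

```python
import pandas as pd

# Made-up sales with two missing days (2021-07-03 and 2021-07-04).
observed = pd.Series(
    [5.0, 3.0, 4.0],
    index=pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-05"]),
    name="units_sold",
)
full_index = pd.date_range("2021-07-01", "2021-07-05", freq="D")

nan_filled = observed.reindex(full_index)                   # gaps become NaN
zero_filled = observed.reindex(full_index, fill_value=0.0)  # gaps become 0

print(nan_filled.mean())   # NaN days are ignored by mean()
print(zero_filled.mean())  # zero days pull the average down
```

Which one is appropriate depends on why the data is missing: zero fits a day with genuinely no sales, while NaN is the safer choice when the item was unavailable or the record is simply absent.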

The weighted quantile loss (wQuantileLoss) measures how far the forecast at a given quantile is from actual demand in either direction, expressed as a fraction of total demand. Over-forecasts and under-forecasts are penalized differently depending on the quantile.

For the p10 forecast, the true value is expected to be lower than the predicted value 10% of the time

For the p90 forecast, the true value is expected to be lower than the predicted value 90% of the time
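
A sketch of the weighted quantile loss in NumPy, assuming the commonly documented form: twice the summed quantile (pinball) loss divided by the summed absolute demand. The function name and the sample numbers are made up for illustration.

```python
import numpy as np

def weighted_quantile_loss(actual, predicted, tau):
    """wQL at quantile tau: 2 * sum(pinball loss) / sum(|actual|)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    under = np.maximum(actual - predicted, 0.0)  # forecast too low
    over = np.maximum(predicted - actual, 0.0)   # forecast too high
    loss = tau * under + (1.0 - tau) * over
    return 2.0 * loss.sum() / np.abs(actual).sum()

actual = [10, 20, 30]
forecast = [12, 18, 33]

wql_p90 = weighted_quantile_loss(actual, forecast, tau=0.9)
wql_p10 = weighted_quantile_loss(actual, forecast, tau=0.1)
print(wql_p90, wql_p10)
```

The same forecast scores worse as a p10 than as a p90 here because it mostly sits above the actuals, and over-forecasting is weighted by (1 - tau).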

Models 

  • ARIMA
  • Prophet
  • DeepAR+
  • VARMAX (Vector Autoregressive Moving Average with eXogenous regressors)

Link #2 - Time series forecasting

Forecast multiple steps:

  • Single-shot: Make the predictions all at once.
  • Autoregressive: Make one prediction at a time and feed the output back to the model.
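
The two strategies can be contrasted on a toy series. This is a sketch under simplifying assumptions: the series is synthetic, and both models are plain least-squares linear fits on the last observed value rather than neural networks.

```python
import numpy as np

# Synthetic series following x[t+1] = 0.8 * x[t] + 2.
series = [1.0]
for _ in range(49):
    series.append(0.8 * series[-1] + 2.0)
series = np.array(series)
horizon = 3

def design(x):
    """Design matrix with the lagged value and an intercept column."""
    return np.column_stack([x, np.ones_like(x)])

# Autoregressive: fit a one-step model, then feed each prediction back in.
coef_one, *_ = np.linalg.lstsq(design(series[:-1]), series[1:], rcond=None)
autoregressive = []
current = series[-1]
for _ in range(horizon):
    current = coef_one[0] * current + coef_one[1]
    autoregressive.append(current)
autoregressive = np.array(autoregressive)

# Single-shot: fit one model that outputs all `horizon` steps at once.
n = len(series) - horizon
targets = np.column_stack([series[k:n + k] for k in range(1, horizon + 1)])
coef_multi, *_ = np.linalg.lstsq(design(series[:n]), targets, rcond=None)
single_shot = (design(series[-1:]) @ coef_multi).ravel()

print(autoregressive)
print(single_shot)
```

On this noise-free linear series both strategies agree. On real data they usually differ: autoregressive forecasts can accumulate error step by step, while a single-shot model has to learn every horizon directly.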

Evaluation of Time Series Forecasting Models for Estimation of PM2.5 Levels in Air

# pip install shap
# References:
# https://mljar.com/blog/feature-importance-in-random-forest/
# Regression-Enhanced Random Forests: https://arxiv.org/pdf/1904.10416.pdf
# https://www.machinelearningplus.com/machine-learning/feature-selection/
# https://www.yourdatateacher.com/2021/05/05/feature-selection-in-machine-learning-using-lasso-regression/
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
import shap
from matplotlib import pyplot as plt

plt.rcParams.update({'figure.figsize': (12.0, 8.0)})
plt.rcParams.update({'font.size': 14})

# load_boston was removed in scikit-learn 1.2, so the bundled diabetes
# regression dataset is substituted here; any tabular regression data works.
data = load_diabetes()
feature_names = np.array(data.feature_names)  # array, so it can be index-sorted
X = pd.DataFrame(data.data, columns=feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# 1) Built-in importance (mean impurity decrease), sorted for plotting.
print(rf.feature_importances_)
sorted_idx = rf.feature_importances_.argsort()
plt.figure()
plt.barh(feature_names[sorted_idx], rf.feature_importances_[sorted_idx])
plt.xlabel("Random Forest Feature Importance")
plt.show()

# 2) Permutation importance on held-out data. It can be used to overcome
# drawbacks of the default importance computed with mean impurity decrease.
perm_importance = permutation_importance(rf, X_test, y_test, random_state=0)
sorted_idx = perm_importance.importances_mean.argsort()
plt.figure()
plt.barh(feature_names[sorted_idx], perm_importance.importances_mean[sorted_idx])
plt.xlabel("Permutation Importance")
plt.show()

# 3) SHAP values: global bar summary, then the per-sample summary plot.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, plot_type="bar")
shap.summary_plot(shap_values, X_test)


More Reads

Taxonomy of Time Series Forecasting Problems

Time Series Forecasting With Deep Learning: A Survey

Keep Thinking!!!
