Forecasting Notes
Paper #1 - Time Series Forecasting Principles with Amazon Forecast
Types of Forecasting
- Long term: strategic planning
- Short term: day-to-day business operations
- Promotions: seasonal
- Impact of price and promotions on sales numbers
Key data inputs in retail
- SKU, timestamp, and units sold at the SKU level
- SKU metadata: color, department, size
- Price data: the price at each point in time
- Promotion information for each SKU
- In-stock or purchased product information
Sales forecasts can be produced at the individual SKU level.
Forecast target: units sold = f(DayOfWeek, WeekendFlag, PromotionalFlag, IsSeasonalProduct, IsTop10SellerForSeason, IsTop10InOnlineChannel, IsForAllAgeGroups, IsForOld, IsForTeens, IsLowAlcoholic, IsAllWeatherItem, WeatherOfDay, ProductPriceOnTheDay, IsThereBundleOffer)
Additional time-based features
Derived from the timestamp: 'Year', 'Month', 'Week', 'Day', 'Dayofweek', 'Dayofyear', 'Is_month_end', 'Is_month_start', 'Is_quarter_end', 'Is_quarter_start', 'Is_year_end', and 'Is_year_start'.
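The calendar features listed above map almost one-to-one onto attributes of the pandas `.dt` accessor. A minimal sketch, assuming the sales history sits in a pandas DataFrame with a hypothetical `timestamp` column (the function name `add_calendar_features` is also illustrative):

```python
import pandas as pd

def add_calendar_features(df: pd.DataFrame, ts_col: str = "timestamp") -> pd.DataFrame:
    """Derive calendar features from a timestamp column (hypothetical column name)."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col])
    out["Year"] = ts.dt.year
    out["Month"] = ts.dt.month
    # Series.dt.week was removed in pandas 2.0; isocalendar() gives the ISO week number.
    out["Week"] = ts.dt.isocalendar().week.astype(int)
    out["Day"] = ts.dt.day
    out["Dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["Dayofyear"] = ts.dt.dayofyear
    # Boolean calendar flags; "is_month_end".capitalize() -> "Is_month_end"
    for flag in ["is_month_end", "is_month_start", "is_quarter_end",
                 "is_quarter_start", "is_year_end", "is_year_start"]:
        out[flag.capitalize()] = getattr(ts.dt, flag)
    return out

# Usage with a few hypothetical rows spanning a year boundary
sales = pd.DataFrame({
    "timestamp": ["2021-12-31", "2022-01-01", "2022-01-03"],
    "units_sold": [5, 3, 7],
})
features = add_calendar_features(sales)
print(features[["Dayofweek", "Week", "Is_year_end", "Is_year_start"]])
```

Note that the ISO week of 2022-01-01 is still week 52 of 2021, which is worth keeping in mind when grouping by 'Week' across year boundaries.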
Data Insights
- Aggregate sales by day, week, quarter, holidays, and weekends
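These aggregations are one-liners in pandas once the data has a DatetimeIndex. A small sketch on hypothetical daily sales for a single SKU; holiday aggregation is left out because it needs an external holiday calendar, which is not assumed here:

```python
import numpy as np
import pandas as pd

# Hypothetical daily sales for one SKU (illustrative data only); 2022-01-03 is a Monday.
daily = pd.DataFrame(
    {"units_sold": [10, 12, 9, 11, 13, 30, 34, 10, 12]},
    index=pd.date_range("2022-01-03", periods=9, freq="D"),
)

# Weekly totals, with weeks ending on Sunday (labelled by the week-end date)
weekly = daily["units_sold"].resample("W-SUN").sum()

# Quarterly totals, keyed by quarter number
by_quarter = daily["units_sold"].groupby(daily.index.quarter).sum()

# Average daily sales on weekends (Saturday=5, Sunday=6) versus weekdays
day_type = np.where(daily.index.dayofweek >= 5, "weekend", "weekday")
weekend_vs_weekday = daily["units_sold"].groupby(day_type).mean()

print(weekly)
print(by_quarter)
print(weekend_vs_weekday)
```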
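Handling Missing Data
- Zero filling: treat a missing observation as zero units sold
- NaN filling: keep the observation missing instead of assuming zero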
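The difference between the two filling strategies shows up as soon as the gaps are made explicit. A minimal pandas sketch, assuming a hypothetical SKU history where two days simply have no record:

```python
import pandas as pd

# Hypothetical SKU history: 2022-01-03 and 2022-01-04 have no record at all.
observed = pd.Series(
    [5, 7, 6],
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-05"]),
)

# Reindex onto a complete daily calendar so the gaps become explicit NaNs.
full_index = pd.date_range(observed.index.min(), observed.index.max(), freq="D")
with_gaps = observed.reindex(full_index)

# Option 1, zero filling: no record is interpreted as zero units sold.
zero_filled = with_gaps.fillna(0)

# Option 2, NaN filling: the gaps stay missing and are skipped by aggregations.
nan_filled = with_gaps

print(zero_filled.mean())  # average includes the two zeros
print(nan_filled.mean())   # average over observed days only
```

If the missing days really are unknown (for example, the item was out of stock or data was not recorded), zero filling pulls the average down and teaches a model that demand was zero; NaN filling avoids that at the cost of leaving gaps the model must tolerate.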
The weighted quantile loss (wQuantileLoss) measures, for each quantile, how far the forecast is from actual demand in either direction, weighting under- and over-forecasts differently according to the quantile and normalizing the result by total actual demand.
For the p10 forecast, the true value is expected to be lower than the predicted value 10% of the time
For the p90 forecast, the true value is expected to be lower than the predicted value 90% of the time
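Following the definition in the Amazon Forecast documentation, the loss at quantile τ is wQL[τ] = 2 · Σ [τ · max(y − q, 0) + (1 − τ) · max(q − y, 0)] / Σ |y|, summed over all items and time points, where y is the actual value and q the quantile forecast. The sketch below implements that formula, plus a hypothetical `coverage` helper that checks the p10/p90 interpretation above (the share of actuals falling below the quantile forecast). The data are made up for illustration:

```python
import numpy as np

def weighted_quantile_loss(y_true, y_pred_q, tau):
    """wQL at quantile tau: asymmetric absolute error normalized by total actual demand."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred_q = np.asarray(y_pred_q, dtype=float)
    under = np.maximum(y_true - y_pred_q, 0.0)  # actual above the forecast
    over = np.maximum(y_pred_q - y_true, 0.0)   # actual below the forecast
    return 2.0 * np.sum(tau * under + (1.0 - tau) * over) / np.sum(np.abs(y_true))

def coverage(y_true, y_pred_q):
    """Hypothetical helper: fraction of observations where the actual is below the forecast."""
    return float(np.mean(np.asarray(y_true) < np.asarray(y_pred_q)))

# Hypothetical actuals and P10/P90 forecasts
y = [10, 20, 30, 40]
p10 = [8, 15, 32, 35]
p90 = [14, 26, 38, 46]

print(weighted_quantile_loss(y, p10, 0.1), coverage(y, p10))
print(weighted_quantile_loss(y, p90, 0.9), coverage(y, p90))
```

At τ = 0.1, over-forecasting costs nine times as much as under-forecasting, which pushes a good P10 forecast low; on a large sample its coverage should approach 0.1, while four points can only give multiples of 0.25.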
Models
- ARIMA
- Prophet
- DeepAR+
- VARMAX (Vector Autoregressive Moving Average with eXogenous regressors)
Link #2 - Time series forecasting
Forecast multiple steps:
- Single-shot: Make the predictions all at once.
- Autoregressive: Make one prediction at a time and feed the output back to the model.
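To make the two multi-step strategies concrete, here is a minimal sketch that swaps in a plain least-squares linear model over a sliding window (an assumption for illustration; the linked tutorial uses neural networks). The single-shot version fits one model that outputs all `horizon` steps at once; the autoregressive version fits a one-step model and feeds each prediction back into its input window. All function names are hypothetical:

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Slice a 1-D series into (input window, next n_out values) training pairs."""
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    """Least-squares linear map with a bias term, from windows X to targets Y."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, x):
    """Apply the fitted map to a single input window."""
    return np.append(x, 1.0) @ W

def single_shot_forecast(series, n_in, horizon):
    # One model predicts all `horizon` future steps in a single call.
    X, Y = make_windows(series, n_in, horizon)
    W = fit_linear(X, Y)
    return predict_linear(W, series[-n_in:])

def autoregressive_forecast(series, n_in, horizon):
    # One-step model; each prediction is appended to the window and fed back in.
    X, Y = make_windows(series, n_in, 1)
    W = fit_linear(X, Y)
    window = list(series[-n_in:])
    preds = []
    for _ in range(horizon):
        next_val = predict_linear(W, np.array(window[-n_in:]))[0]
        preds.append(next_val)
        window.append(next_val)
    return np.array(preds)

# Usage on a straight-line series: both should be close to [20, 21, 22, 23].
trend = np.arange(20, dtype=float)
print(single_shot_forecast(trend, n_in=3, horizon=4))
print(autoregressive_forecast(trend, n_in=3, horizon=4))
```

The trade-off: single-shot avoids compounding its own errors but needs a model per output step (here, one column of `W` per step), while the autoregressive model is smaller and works for any horizon but can drift as early mistakes are fed back in.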
Evaluation of Time Series Forecasting Models for Estimation of PM2.5 Levels in Air
# pip install shap
# https://mljar.com/blog/feature-importance-in-random-forest/
# Regression-Enhanced Random Forests: https://arxiv.org/pdf/1904.10416.pdf
# https://www.machinelearningplus.com/machine-learning/feature-selection/
# https://www.yourdatateacher.com/2021/05/05/feature-selection-in-machine-learning-using-lasso-regression/
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
import shap
from matplotlib import pyplot as plt

plt.rcParams.update({'figure.figsize': (12.0, 8.0)})
plt.rcParams.update({'font.size': 14})

# load_boston was removed in scikit-learn 1.2; the bundled diabetes
# regression dataset is used here as a stand-in.
data = load_diabetes(as_frame=True)
X = data.data    # pandas DataFrame of features
y = data.target  # pandas Series with the regression target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Built-in importance (mean decrease in impurity), sorted for readability
print(rf.feature_importances_)
sorted_idx = rf.feature_importances_.argsort()
plt.figure()
plt.barh(X.columns[sorted_idx], rf.feature_importances_[sorted_idx])
plt.xlabel("Random Forest Feature Importance")
plt.show()

# Permutation-based importance overcomes drawbacks of the default impurity-based
# importance, which is computed on training data and biased toward high-cardinality features.
perm_importance = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
sorted_idx = perm_importance.importances_mean.argsort()
plt.figure()
plt.barh(X.columns[sorted_idx], perm_importance.importances_mean[sorted_idx])
plt.xlabel("Permutation Importance")
plt.show()

# SHAP values from a tree explainer: global bar summary, then per-sample beeswarm
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, plot_type="bar")
shap.summary_plot(shap_values, X_test)
More Reads
Taxonomy of Time Series Forecasting Problems
Time Series Forecasting With Deep Learning: A Survey
Keep Thinking!!!