Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Forecasting Notes

July 08, 2021

Forecasting Notes

Paper #1 - Time Series Forecasting Principles with Amazon Forecast

Types of Forecasting

Long term - Strategic
Short term - Operational, day-to-day business
Promotions - Seasonal
Impact of price and promotion on sales numbers

Key parameters in Retail

SKU, timestamp, units sold at SKU level
SKU metadata - color, department, size
Price data - price at that point in time
Promotional information for the SKU
In-stock or purchased status of the product

The sales forecast can be built at the individual SKU level.

Forecast (Target) - Units sold = (Day of week) + WeekendFlag + PromotionalFlag + IsSeasonalProduct + IsTop10SellerForseason + IsTop10inOnlinechannel + IsForAllAgegroups + IsforOld + IsforTeens + IsLowAlcoholic + IsAllweatherItem + Weatherofday + ProductPriceontheDay + IsthereBundleOffer
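The formula above treats units sold as a function of calendar, promotion and price inputs. A minimal sketch of turning one raw sales record into a few of those inputs is shown here. The function name, argument names and the sample record are assumptions made up for illustration, and only a handful of the listed flags are covered.

```python
from datetime import date


def build_features(sale_date, on_promotion, price, has_bundle_offer):
    """Map one raw sales record to a few of the model inputs (illustrative only)."""
    day_of_week = sale_date.weekday()  # Monday=0 ... Sunday=6
    return {
        "DayOfWeek": day_of_week,
        "WeekendFlag": int(day_of_week >= 5),  # Saturday or Sunday
        "PromotionalFlag": int(on_promotion),
        "ProductPriceOnTheDay": float(price),
        "IsThereBundleOffer": int(has_bundle_offer),
    }


# Made-up record: a promoted item sold on Saturday, 10 July 2021.
row = build_features(date(2021, 7, 10), on_promotion=True, price=4.99, has_bundle_offer=False)
print(row)
```

Each row produced this way becomes one training example, with units sold for that SKU and day as the target.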

Additional time-based features

‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, ‘Is_month_end’, ‘Is_month_start’, ‘Is_quarter_end’, ‘Is_quarter_start’, ‘Is_year_end’, and ‘Is_year_start’.
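The calendar fields listed above can be derived from a timestamp column with the pandas `.dt` accessor. This is a sketch only: the `Timestamp` and `UnitsSold` column names and the three sample rows are assumptions, not data from the paper.

```python
import pandas as pd

# Boolean calendar flags exposed by the pandas .dt accessor.
FLAG_FIELDS = (
    "is_month_end", "is_month_start",
    "is_quarter_end", "is_quarter_start",
    "is_year_end", "is_year_start",
)


def add_date_features(df, column="Timestamp"):
    """Return a copy of df with calendar features derived from `column`."""
    ts = pd.to_datetime(df[column])
    out = df.copy()
    out["Year"] = ts.dt.year
    out["Month"] = ts.dt.month
    out["Week"] = ts.dt.isocalendar().week.astype(int)  # ISO week number
    out["Day"] = ts.dt.day
    out["Dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["Dayofyear"] = ts.dt.dayofyear
    for flag in FLAG_FIELDS:
        # "is_month_end" -> "Is_month_end", matching the names in the list above
        out[flag.capitalize()] = getattr(ts.dt, flag)
    return out


# Made-up sample rows.
sales = pd.DataFrame({
    "Timestamp": ["2021-06-30", "2021-07-01", "2021-12-31"],
    "UnitsSold": [3, 5, 9],
})
features = add_date_features(sales)
print(features.T)
```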

Data Insights

Aggregate sales by week, day, quarter, holidays, weekends
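A sketch of these aggregations with pandas follows. The column names and the five sample rows are made up for illustration. Holiday aggregation is left out because it needs a holiday calendar for the relevant country.

```python
import pandas as pd

# Made-up daily sales for one SKU.
sales = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2021-07-02", "2021-07-03", "2021-07-04", "2021-07-05", "2021-10-01"]
    ),
    "UnitsSold": [10, 25, 30, 8, 12],
}).set_index("Timestamp")

# Daily and weekly totals (weeks end on Sunday with the "W" rule).
daily = sales["UnitsSold"].resample("D").sum()
weekly = sales["UnitsSold"].resample("W").sum()

# Totals per calendar quarter (1-4).
quarterly = sales["UnitsSold"].groupby(sales.index.quarter).sum()

# Weekend versus weekday totals.
sales["DayType"] = ["weekend" if d >= 5 else "weekday" for d in sales.index.dayofweek]
by_day_type = sales.groupby("DayType")["UnitsSold"].sum()

print(weekly.head())
print(quarterly)
print(by_day_type)
```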

Handling Missing Data

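Zero filling - treat missing entries as zero sales
NaN filling - leave missing entries as NaN so they count as unknown rather than as zero demand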
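The two options can be compared with a small pandas sketch. The series below is made up, with two days missing from the observed data.

```python
import pandas as pd

# Made-up observations: 3 and 4 July are missing.
observed = pd.Series(
    [5, 7, 4],
    index=pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-05"]),
)
full_range = pd.date_range("2021-07-01", "2021-07-05", freq="D")

# NaN filling: missing days stay unknown.
nan_filled = observed.reindex(full_range)

# Zero filling: missing days are treated as days with no sales.
zero_filled = observed.reindex(full_range, fill_value=0)

print(nan_filled.mean())   # mean over the 3 observed days
print(zero_filled.mean())  # mean over all 5 days, pulled down by the zeros
```

Zero filling lowers the average demand the model sees, so it suits cases where a missing record really means nothing was sold. NaN is the safer choice when the item was out of stock or the data was simply not recorded.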

The weighted quantile loss (wQuantileLoss) calculates how far the forecast is from actual demand, in either direction, as a percentage of demand, averaged at each quantile.

For the p10 forecast, the true value is expected to be lower than the predicted value 10% of the time.

For the p90 forecast, the true value is expected to be lower than the predicted value 90% of the time.
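A minimal sketch of the weighted quantile loss, assuming the commonly documented form wQL[tau] = 2 * sum(tau * max(y - q, 0) + (1 - tau) * max(q - y, 0)) / sum(|y|), where y is the actual value and q is the forecast at quantile tau. The function name and the sample numbers are made up for illustration.

```python
def weighted_quantile_loss(actuals, forecasts, tau):
    """Weighted quantile loss for one quantile tau (0 < tau < 1)."""
    penalty = sum(
        tau * max(y - q, 0.0)            # forecast too low
        + (1.0 - tau) * max(q - y, 0.0)  # forecast too high
        for y, q in zip(actuals, forecasts)
    )
    total_demand = sum(abs(y) for y in actuals)
    return 2.0 * penalty / total_demand


# Made-up actual demand and a made-up p90 forecast.
actuals = [10, 20, 30]
p90_forecast = [12, 18, 33]

print(weighted_quantile_loss(actuals, p90_forecast, tau=0.9))
```

At tau = 0.9 an under-forecast is penalised nine times as heavily as an over-forecast of the same size, which is what pushes the p90 forecast above the actual value most of the time. At tau = 0.5 the expression reduces to total absolute error divided by total demand.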

Models

ARIMA
Prophet
DeepAR+
VARMAX (Vector Autoregressive Moving Average with eXogenous regressors)

Link #2 - Time series forecasting

Forecast multiple steps:

Single-shot: Make the predictions all at once.
Autoregressive: Make one prediction at a time and feed the output back to the model.
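The difference between the two strategies can be sketched with toy forecasters in plain Python. Both models below are hypothetical stand-ins (a window average and a straight-line extrapolation), not the neural networks from the linked tutorial. The point is only the call pattern: one call per step with feedback, versus one call for the whole horizon.

```python
def one_step_model(window):
    """Hypothetical one-step forecaster: the average of the input window."""
    return sum(window) / len(window)


def autoregressive_forecast(history, horizon, window_size=3):
    """Predict one step at a time, feeding each prediction back as input."""
    window = list(history[-window_size:])
    predictions = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        predictions.append(next_value)
        window = window[1:] + [next_value]  # slide the window over the new prediction
    return predictions


def single_shot_forecast(history, horizon, window_size=3):
    """Hypothetical multi-output forecaster: emits every future step in one call."""
    window = history[-window_size:]
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return [window[-1] + slope * (step + 1) for step in range(horizon)]


history = [10, 12, 14]
auto_preds = autoregressive_forecast(history, horizon=3)
shot_preds = single_shot_forecast(history, horizon=3)
print(auto_preds)
print(shot_preds)
```

In the autoregressive loop any error in an early step is fed into later steps, so errors can compound over a long horizon. The single-shot version avoids that but has to learn all horizon outputs at once.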

Evaluation of Time Series Forecasting Models for Estimation of PM2.5 Levels in Air

	# pip install shap
	# References:
	# https://mljar.com/blog/feature-importance-in-random-forest/
	# Regression-Enhanced Random Forests
	# https://arxiv.org/pdf/1904.10416.pdf
	# https://www.machinelearningplus.com/machine-learning/feature-selection/
	# https://www.yourdatateacher.com/2021/05/05/feature-selection-in-machine-learning-using-lasso-regression/

	import numpy as np
	import pandas as pd
	from sklearn.datasets import load_diabetes
	from sklearn.model_selection import train_test_split
	from sklearn.ensemble import RandomForestRegressor
	from sklearn.inspection import permutation_importance
	import shap
	from matplotlib import pyplot as plt

	plt.rcParams.update({'figure.figsize': (12.0, 8.0)})
	plt.rcParams.update({'font.size': 14})

	# load_boston was removed in scikit-learn 1.2, so the bundled diabetes
	# regression dataset stands in for it here.
	data = load_diabetes()
	feature_names = np.array(data.feature_names)  # array, so it can be indexed by sorted_idx
	X = pd.DataFrame(data.data, columns=feature_names)
	y = data.target
	X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=12)

	rf = RandomForestRegressor(n_estimators=100, random_state=0)
	rf.fit(X_train, y_train)

	# Default importance: mean decrease in impurity, computed on the training data.
	print(rf.feature_importances_)

	# Sort so the most important feature ends up at the top of the chart.
	sorted_idx = rf.feature_importances_.argsort()
	plt.figure()
	plt.barh(feature_names[sorted_idx], rf.feature_importances_[sorted_idx])
	plt.xlabel("Random Forest Feature Importance")
	plt.show()

	# The permutation based importance can be used to overcome drawbacks of
	# default feature importance computed with mean impurity decrease.
	perm_importance = permutation_importance(rf, X_test, y_test, random_state=0)
	sorted_idx = perm_importance.importances_mean.argsort()
	plt.figure()
	plt.barh(feature_names[sorted_idx], perm_importance.importances_mean[sorted_idx])
	plt.xlabel("Permutation Importance")
	plt.show()

	# SHAP values: per-sample contribution of each feature to the prediction.
	explainer = shap.TreeExplainer(rf)
	shap_values = explainer.shap_values(X_test)
	shap.summary_plot(shap_values, X_test, plot_type="bar")  # mean |SHAP| per feature
	shap.summary_plot(shap_values, X_test)  # distribution of SHAP values per feature
