- Example: predict apple sales that follow a linear trend
- Add a feature indicating the week number; GBDT can then use it to compute something like the mean target value for each week
- Created Generated Tree
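A minimal sketch of the week-number idea, assuming a small made-up table of daily sales. The column names `date`, `apples_sold`, and `week_number` and the data itself are hypothetical. The per-week mean at the end only approximates what a tree split on the week number would learn.

```python
import pandas as pd

# Hypothetical daily sales with a linear upward trend (illustrative data only).
dates = pd.date_range("2023-01-02", periods=28, freq="D")
sales = pd.DataFrame({
    "date": dates,
    "apples_sold": [100 + 2 * i for i in range(28)],
})

# Week number counted from the first date in the data.
sales["week_number"] = (sales["date"] - sales["date"].min()).dt.days // 7

# Roughly what a tree can do with this feature: one mean target value per week.
weekly_mean = sales.groupby("week_number")["apples_sold"].mean()
print(weekly_mean)
```

Counting weeks from the first date, instead of using the calendar week, keeps the feature increasing across year boundaries.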
- Tree based Methods (Decision Tree)
- Non Tree based Methods (NN, Linear Model, KNN)
- Regularization should affect all feature coefficients in equal amounts, which requires features on a comparable scale
- Do proper scaling
- To [0,1]
- sklearn.preprocessing.MinMaxScaler
- X = (X-X.min())/(X.max()-X.min())
- To mean = 0, std = 1
- sklearn.preprocessing.StandardScaler
- X = (X-X.mean())/X.std()
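The two scalers above can be compared against their formulas with a short sketch. The array `X` is made-up data. One detail to keep in mind: `StandardScaler` divides by the population standard deviation, which matches NumPy's default `X.std()` but not the pandas default (sample standard deviation).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative data: two features on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# Scale each column to [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)
X_minmax_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Scale each column to mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(X)
X_std_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_minmax, X_minmax_manual))
print(np.allclose(X_std, X_std_manual))
```

In practice the scaler would usually be fitted on the training data only and then applied to the test data with `transform`.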
Preprocessing Outliers
- Calculate lower and upper bound values (for example, the 1st and 99th percentiles) and clip feature values to that range
- Rank transformation
- Better option than MinMaxScaler when outliers are present, because it moves outliers closer to the other values
- scipy.stats.rankdata
- Log transformation - np.log(1+x)
- Raising to power < 1 - np.sqrt(x+2/3)
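The outlier treatments listed above can be sketched on one small array. The values in `x` are made up, and the 1st and 99th percentiles are only one possible choice of bounds.

```python
import numpy as np
from scipy.stats import rankdata

# Illustrative feature with one extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

# Clipping: bound the values between two chosen percentiles.
lower, upper = np.percentile(x, [1, 99])
x_clipped = np.clip(x, lower, upper)

# Rank transformation: replaces values with their ranks,
# so the gaps between sorted values become equal.
x_rank = rankdata(x)

# Log transformation and raising to a power below 1:
# both compress large values more than small ones.
x_log = np.log(1 + x)
x_sqrt = np.sqrt(x + 2 / 3)

print(x_clipped, x_rank, x_log, x_sqrt, sep="\n")
```

After the rank transformation the outlier sits only one step away from its neighbor, which is why it tends to behave better than min-max scaling on data like this.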
- Creating new features
- Engineer using prior knowledge and logic
- Example: add price per square foot if the price and the size of the plot are provided
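A minimal sketch of the price-per-square-foot feature, assuming a pandas table with hypothetical columns `price` and `size_sqft` and made-up values:

```python
import pandas as pd

# Illustrative listings (made-up values).
houses = pd.DataFrame({
    "price": [300000, 450000, 200000],
    "size_sqft": [1500, 3000, 800],
})

# New feature built from prior knowledge: price divided by plot size.
houses["price_per_sqft"] = houses["price"] / houses["size_sqft"]
print(houses)
```

A ratio like this is hard for tree-based models to approximate from the two raw columns, so supplying it directly can help.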
- Tree based methods don't depend on scaling
- Non-tree methods depend heavily on scaling
- MinMaxScaler - to [0,1]
- StandardScaler - to mean==0, std==1
- Rank - sets spaces between sorted values to be equal
- np.log(1+x) and np.sqrt(1+x) - compress large values toward the rest of the data