"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

January 27, 2018

Day #100 - Ensemble Methods

It took more than a year to reach 100 posts. This is a significant milestone. Hoping to reach 200 soon.
  • Combining different machine learning models for more powerful predictions
  • Averaging or blending
  • Weighted averaging
  • Conditional averaging
  • Bagging
  • Boosting
  • Stacking
  • StackNet
Averaging ensemble methods
  • Combine two results with simple averaging
  • (model1+model2)/2
  • Averaging can yield considerable improvements
  • Models may perform better combined than they do individually
  • Weighted average - (model1 * 0.7 + model2 * 0.3)
  • Conditional average - if the prediction is < 50 use model1, else model2 (see the sketch below)
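A minimal sketch of the three averaging schemes, assuming pred1 and pred2 are prediction arrays from two already-trained models (the values are made up):

import numpy as np

# Hypothetical predictions from two already-trained models
pred1 = np.array([42.0, 55.0, 61.0, 38.0])
pred2 = np.array([48.0, 53.0, 70.0, 35.0])

simple_avg = (pred1 + pred2) / 2                  # simple averaging
weighted_avg = pred1 * 0.7 + pred2 * 0.3          # weighted averaging
conditional = np.where(pred1 < 50, pred1, pred2)  # if < 50 trust model1, else model2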
Bagging
  • Averaging slightly different versions of same model to improve accuracy
  • Example - Random Forest
  • Underfitting - errors due to bias
  • Overfitting - errors due to variance
  • Parameters that control bagging - seed, subsampling or bootstrapping, shuffling, column subsampling, model-specific parameters, number of bags (more bags generally give better results), parallelism
  • BaggingClassifier and BaggingRegressor from sklearn (see the sketch below)
  • Bags are independent of each other, so they can be trained in parallel
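A minimal bagging sketch with sklearn's BaggingClassifier; the dataset and parameter values are only illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # toy data

# Each bag is a decision tree (the default base estimator) trained on a random
# subset of rows and columns; bags are independent, so n_jobs=-1 runs them in parallel.
bagging = BaggingClassifier(
    n_estimators=100,   # number of bags - more bags generally help
    max_samples=0.8,    # row subsampling / bootstrapping
    max_features=0.8,   # column subsampling
    random_state=0,     # seed
    n_jobs=-1,
)
print(cross_val_score(bagging, X, y, cv=5).mean())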
Boosting
  • Weight based boosting
  • A form of weighted averaging of models where each model is built sequentially, taking into account the performance of previous models
  • Each new model is added sequentially based on how well the previous models have done
Weight based boosting
  • Weights can be interpreted as the number of times a certain row appears in the data
  • Rows that contribute to the error get their weights recalculated (increased) each round
  • Parameters - learning rate / shrinkage (trust many models a little rather than one a lot), number of estimators
  • Implementations - AdaBoost (sklearn - Python), LogitBoost (Weka - Java); see the sketch below
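A minimal weight-based boosting sketch with sklearn's AdaBoostClassifier (toy data, illustrative parameter values):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # toy data

# Each round re-weights the rows that previous models got wrong
ada = AdaBoostClassifier(
    n_estimators=200,   # number of estimators
    learning_rate=0.1,  # shrinkage - trust each individual model less
    random_state=0,
)
ada.fit(X, y)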
Residual based boosting
  • The dominant form of boosting used in practice
  • Calculate the error of the predictions and its direction
  • Make the error the new target variable
  • Parameters - Learning Rate, Shrinkage, ETA
  • Number of estimators
  • Row sub sampling
  • Column sub sampling
  • Sub-boosting type - fully gradient based, DART
  • XGBoost
  • LightGBM
  • H2O GBM (handles categorical variables out of the box)
  • CatBoost
  • sklearn's GBM
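A toy sketch of the residual-boosting idea itself (in practice you would reach for XGBoost / LightGBM / CatBoost); the data and parameter values are illustrative:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

learning_rate = 0.1                       # eta / shrinkage
prediction = np.zeros(len(y))
trees = []
for _ in range(100):                      # number of estimators
    residual = y - prediction             # the error becomes the new target
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)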
Stacking
  • Make predictions with a number of models on a hold-out set, then train a different meta-model on those predictions
  • Stacking predictions
  • Splitting training set into two disjoint sets
  • Train several base learners on the first part
  • Make predictions with the base learners on the second (validation) part
  • Use the predictions from the previous step as input features to train a higher-level (meta) learner
  • Train Algo 0 on A (the first part), make predictions for B (the second part) and C (the test set), and save them to B1, C1
  • Train Algo 1 on A, make predictions for B and C, and save them to B1, C1
  • Train Algo 2 on A, make predictions for B and C, and save them to B1, C1
  • Train Algo 3 on B1 and make predictions for C1 (see the sketch below)
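A minimal stacking sketch following the A / B / C recipe above; the data, models and split sizes are only illustrative:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge, LinearRegression

X, y = make_regression(n_samples=1500, n_features=10, noise=5.0, random_state=0)
# A: training part, B: hold-out part, C: "test" part
X_rest, X_C, y_rest, y_C = train_test_split(X, y, test_size=0.33, random_state=0)
X_A, X_B, y_A, y_B = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

base_models = [RandomForestRegressor(random_state=0), Ridge()]

# Train each base learner on A, predict on B and C -> B1 and C1
B1 = np.column_stack([m.fit(X_A, y_A).predict(X_B) for m in base_models])
C1 = np.column_stack([m.predict(X_C) for m in base_models])

# Train the meta-model on B1, predict on C1
meta = LinearRegression().fit(B1, y_B)
final_test_pred = meta.predict(C1)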

Happy Learning!!!

Day #99 - Statistics and distance based features

Stats
  • Calculate statistics of derived features from neighborhood analysis
  • User_id / Page_id / Ad_price / Ad_position
  • Use label encoder
  • These statistics implicitly bring in information from other data points
  • Add the lowest and highest price for the position of the ad
  • maximum and minimum price values
  • Pages user visited
  • Standard deviation of prices
  • Most visited page
  • Many more features
  • Introduce new information (see the groupby sketch below)
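A minimal groupby-statistics sketch; the column names follow the user_id / page_id / ad_price / ad_position example above and the data is made up:

import pandas as pd

df = pd.DataFrame({
    "user_id":     [1, 1, 2, 2, 2, 3],
    "page_id":     [10, 11, 10, 12, 12, 11],
    "ad_price":    [5.0, 7.5, 3.0, 4.0, 9.0, 6.0],
    "ad_position": [1, 2, 1, 3, 2, 1],
})

# Lowest / highest / std of price for each user and ad position
grp = df.groupby(["user_id", "ad_position"])["ad_price"]
df["user_pos_min_price"] = grp.transform("min")
df["user_pos_max_price"] = grp.transform("max")
df["user_pos_std_price"] = grp.transform("std")

# How many distinct pages each user visited
df["user_n_pages"] = df.groupby("user_id")["page_id"].transform("nunique")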
Neighbors
  • Number of houses in 500m, 1000m
  • Average price per sq.m
  • Number of schools / supermarkets / parking lots in 500m / 1000m
  • Distance to closest substation
  • Embrace both group-by and nearest neighbor methods
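A minimal nearest-neighbour feature sketch, assuming hypothetical house coordinates (in metres) and prices per square metre:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
coords = rng.uniform(0, 10_000, size=(1000, 2))      # hypothetical house coordinates (m)
price_per_sqm = rng.uniform(1000, 5000, size=1000)   # hypothetical prices

nn = NearestNeighbors().fit(coords)
idx_500 = nn.radius_neighbors(coords, radius=500, return_distance=False)
idx_1000 = nn.radius_neighbors(coords, radius=1000, return_distance=False)

n_houses_500m = np.array([len(i) - 1 for i in idx_500])   # exclude the house itself
avg_price_1000m = np.array([price_per_sqm[i].mean() for i in idx_1000])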
Matrix Factorizations
  • Approach for feature extraction
  • User / Items mapping matrix
  • User - Attributes matrix
  • R ≈ U × M - the ratings matrix R is approximated by the product of the user matrix U and the item matrix M
  • Row and column related features
  • BOW representations are large sparse vectors
  • Each document is represented by a small dense vector (dimensionality reduction)
  • Matrix factorization methods - SVD, PCA, TruncatedSVD (for sparse matrices)
  • NMF (Non-Negative Matrix Factorization) - Zero or Positive Number
  • NMF makes data suitable for decision trees
  • Used for Dimensionality reduction
Example Code
import numpy as np
from sklearn.decomposition import PCA

# Fit on train and test together, then transform each part
x_all = np.concatenate([x_train, x_test])
pca = PCA(n_components=5).fit(x_all)   # n_components chosen only for illustration
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)

Happy Learning!!!

January 25, 2018

Day #98 - Advanced Hyperparameter tuning

Neural Network Libraries
  • Keras (Easy to learn)
  • TensorFlow (commonly used in production)
  • MXNet
  • PyTorch (Popular in community)
  • sklearn's MLP
Neural Nets
  • Number of neurons per layer
  • Number of layers
  • Optimizers
  • SGD + momentum
  • Adam / Adadelta / Adagrad (In practice lead to more overfitting)
  • Batch size (Huge batch size leads to overfitting)
  • Epochs impact
  • Learning rate - not too high, not too low; a rate at which the network converges
  • Regularization
    • L2/L1 for weights
    • Dropout / Dropconnect
    • Static dropconnect
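A minimal Keras sketch touching the knobs above (layer sizes, dropout, L2 on weights, SGD + momentum, learning rate, batch size); all values are illustrative and x_train / y_train are assumed to exist:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 on weights
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(1),
])
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # SGD + momentum
    loss="mse",
)
# model.fit(x_train, y_train, batch_size=64, epochs=30, validation_split=0.2)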
Linear Models (Scikit-learn)
  • SVC / SVR
  • sklearn wraps liblinear and libsvm
  • Compile yourself for multicore support
  • LogisticRegression / LinearRegression + regularizers
  • SGDClassifier / SGDRegressor
  • Vowpal Wabbit
  • Regularization parameter (C, alpha, lambda)
  • Start with very small value and increase it
  • SVC starts to work slower as C increases
  • Regularization type
    • L1/L2/L1+L2 - try each
    • L1 can be used for feature selection
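A minimal sketch of sweeping the regularization parameter C for LogisticRegression, starting with a very small value and increasing it (toy data):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)  # toy data

# Start with a very small C (strong regularization) and increase it
for C in [0.001, 0.01, 0.1, 1, 10]:
    clf = LogisticRegression(C=C, penalty="l2", solver="liblinear")
    print(C, cross_val_score(clf, X, y, cv=5).mean())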
Happy Learning!!!

January 24, 2018

Day #97 - Hyperparameter tuning

How to tune hyperparameters?
  • Identify which parameters affect the model most
  • Observe the impact of changing a parameter's value
  • Examine and iterate to understand the impact of the changes
Automatic Hyper-parameter tuning libraries
  • Hyperopt
  • Scikit-optimize
  • Spearmint
  • GPyOpt
  • RoBO
  • SMAC3
Hyper parameter tuning
  • Tree Based Models (Gradient Boosted Decision Trees - XGBoost, LightGBM, CatBoost)
  • RandomForest / ExtraTrees
Neural Nets
  • Pytorch, Tensorflow, Keras
Linear Models
  • SVM, Logistic Regression
  • Vowpal Wabbit, FTRL
Approach
  • Define function that will run our model
  • Specify ranges of hyperparameters
  • Choose an adequate range for the search (see the Hyperopt sketch below)
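A minimal Hyperopt sketch of this approach - an objective function that runs the model plus a search space with adequate ranges; the model, data and ranges are only illustrative:

from hyperopt import fmin, tpe, hp
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # toy data

# Function that runs the model for a given set of hyperparameters
def objective(params):
    model = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=0,
    )
    return -cross_val_score(model, X, y, cv=3).mean()  # hyperopt minimizes

# Search space with adequate ranges
space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
    "max_depth":    hp.quniform("max_depth", 2, 12, 1),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=20)
print(best)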
Results
  • Underfitting
  • Overfitting
  • Good Fit and Generalization
Tree based Models
  • GBDT - XGBoost, LightGBM, CatBoost
  • RandomForest, ExtraTrees - Scikit-learn
  • Others - RGF (baidu / fast_rgf)
GBDT
  • XGBoost - max_depth, subsample, colsample_bytree, colsample_bylevel, min_child_weight, lambda, alpha, eta, num_round, seed (see the sketch after this list)
  • LightGBM - max_depth / num_leaves, bagging_fraction, feature_fraction, min_data_in_leaf, lambda_l1, lambda_l2, learning_rate, num_iterations, seed
  • sklearn RandomForest/ExtraTrees - n_estimators, max_depth, max_features, min_samples_leaf, n_jobs, random_state
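A minimal XGBoost sketch using the parameter names above; the values are only illustrative starting points, not recommendations:

import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)  # toy data
dtrain = xgb.DMatrix(X, label=y)

params = {
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "colsample_bylevel": 1.0,
    "min_child_weight": 1,
    "lambda": 1.0,
    "alpha": 0.0,
    "eta": 0.05,
    "seed": 0,
}
booster = xgb.train(params, dtrain, num_boost_round=500)  # num_round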

Happy Learning!!!

January 22, 2018

Day #96 - Mean Encoding - Extensions and Generalizations

  • Compact transformation of categorical variables
  • Powerful basis of feature engineering
Using the target variable in different tasks - regression, multi-class
  • More stats - Percentiles, std, distribution bins
  • Introducing new information from one vs all classifiers in multi-class tasks (N Different encodings)
Domains with many-to-many relationships
  • User to Apps relationships
  • Row for user-app relationship
  • Vector for each app
Time-series
  • Mean of the target for the previous day, previous week, etc.
  • Based on data create more complicated features
Encoding interactions and numerical features
  • model structure, analyzing trees
  • Extract interactions from decision trees (features that appear in neighbouring nodes interact)
  • xgboost, row features
  • Use split points to identify new features
  • Manually add more mean encoded interactions
  • For interactions involving categorical variables, evaluate them and mean-encode the interactions
Correct validation reminder
Local experiments
  • Estimate encodings on X_tr
  • Map them to X_tr and X_val
  • Regularize on X_tr
  • Validate the model on the X_tr / X_val split
Submission
  • Estimate Encoding on whole Train data
  • Map them to Train and Test
  • Regularize on Train
  • Fit on Train
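A minimal mean-encoding sketch following the local-experiment recipe above (estimate the encoding on X_tr, map it to X_tr and X_val); the data is made up and regularization (smoothing, CV loop) is omitted:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "c", "c", "a", "b", "c"],
    "target":   [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
})
X_tr, X_val = train_test_split(df, test_size=0.3, random_state=0)
X_tr, X_val = X_tr.copy(), X_val.copy()

# Estimate the encoding on X_tr only, then map it to X_tr and X_val
global_mean = X_tr["target"].mean()
encoding = X_tr.groupby("category")["target"].mean()
X_tr["category_mean_enc"] = X_tr["category"].map(encoding)
X_val["category_mean_enc"] = X_val["category"].map(encoding).fillna(global_mean)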
Happy Learning!!!