"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 08, 2017

Day #93 - Regularizations

Four methods of regularization for mean (target) encoding
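  • Cross Validation inside training data
    • 4 to 5 folds of K-Fold validation are usually enough
    • Split the training data into K non-intersecting subsets and encode each subset with means computed on the other K-1
    • Leave-one-out scheme is the extreme case, where K equals the number of rows
    • Some target variable leakage is still present in the K-Fold scheme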
  • Smoothing based on size of category
    • A big category has a lot of data points, so its mean can be trusted; a small category is pulled toward the global mean
    • Formula = (mean(target) * nrows + globalmean * alpha) / (nrows + alpha)
    • alpha = category size we can trust
  • Add Random Noise
    • Unstable, hard to make it work
    • Too much noise makes the encoded feature useless
    • Usually used together with LOO (leave-one-out) regularization
  • Sorting and calculating an expanding mean over the data
    • Fix the sorting order of the data
    • Use rows 0 to N-1 to calculate the mean for row N
    • Least leakage of the four methods
 Happy Learning!!!
