December 31, 2017
December 08, 2017
Day #93 - Regularizations
Four methods of regularization for mean (target) encoding
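- Cross-validation loop inside the training data
  - 4 to 5 folds of K-fold validation are usually enough
  - Split the data into K non-intersecting subsets, then encode each subset with target means computed on the other K-1 subsets
  - Leave-one-out scheme is the extreme case, where K equals the number of rows
  - Target variable leakage is still present in the K-fold scheme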
- Smoothing based on the size of the category
  - A big category with a lot of data points can be trusted; a small one is pulled towards the global mean
  - Formula: (mean(target) * nrows + globalmean * alpha) / (nrows + alpha)
  - alpha = the category size we can trust
- Add random noise
  - Unstable, hard to make it work
  - Too much noise makes the encoded feature useless; too little leaves the leakage in place
- Sorting and calculating an expanding mean
  - Fix the sorting order of the data
  - Use rows 0 to N-1 to calculate the mean for row N
  - Least leakage of the four methods
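
A rough sketch of three of the schemes listed above (K-fold, smoothing, and the sorted running mean, often called an expanding mean), in plain Python. Everything here is an assumption made for illustration: the function names, the interleaved fold split, the default alpha, and falling back to the global mean when a category has no earlier rows. It is not taken from any particular library.

```python
from collections import defaultdict


def smoothed_mean_encode(categories, targets, alpha=5.0):
    """Smoothing: (mean(target) * nrows + globalmean * alpha) / (nrows + alpha)."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    # mean(target) * nrows is just the per-category sum of the target.
    return [(sums[c] + global_mean * alpha) / (counts[c] + alpha)
            for c in categories]


def kfold_mean_encode(categories, targets, k=5):
    """Encode each row with category means taken from the other k-1 folds."""
    n = len(targets)
    global_mean = sum(targets) / n
    encoded = [global_mean] * n  # fallback for categories unseen out-of-fold
    for fold in range(k):
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(n):
            if i % k != fold:  # interleaved split (an assumption)
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        for i in range(fold, n, k):  # rows that belong to this fold
            c = categories[i]
            if counts[c]:
                encoded[i] = sums[c] / counts[c]
    return encoded


def expanding_mean_encode(categories, targets):
    """Encode row N from rows 0..N-1 of the same category, in the given order."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    encoded = []
    for c, t in zip(categories, targets):
        # Only earlier rows contribute; the row's own target is added afterwards.
        encoded.append(sums[c] / counts[c] if counts[c] else global_mean)
        sums[c] += t
        counts[c] += 1
    return encoded


cats = ["a", "a", "b", "a", "b", "b"]
y = [1, 1, 0, 1, 0, 0]
print(smoothed_mean_encode(cats, y, alpha=3.0))
print(kfold_mean_encode(cats, y, k=2))
print(expanding_mean_encode(cats, y))
```

In the expanding version the first row of every category has no history, so it only gets the fallback value. That is the price paid for the low leakage.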
Labels:
Data Science,
Data Science Tips