- Train Data (Past), Unseen Test Data (Future)
- Divide into three parts - Train (Past), Validation (Past), Test (Future)
- Underfitting (High Error on Both Training and Validation)
- Overfitting (Doesn't generalize to test data, Low Error on Train, High Error on Validation)
- Ideal (Lowest Error on both Training and Testing Data)
- Hold Out (split the data into training and testing sets, with no overlap between them) - Used on shuffled data
- K-Fold (repeated hold out: split the data into K subsets, train on K - 1 of them and validate on the remaining one, so each subset is used once for validation) - Good choice for a medium amount of data - Used on shuffled data
- Leave One Out: ngroups = len(train) - For very little data (special case of K-Fold where K = number of samples)
- Stratification - Keeps a similar target distribution across the different folds; useful for:
- Small datasets (Do Random Splits)
- Unbalanced datasets
- Multiclass classification
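
The stratification idea above can be sketched with scikit-learn's `StratifiedKFold`. This is only a minimal illustration under assumed conditions: the 90/10 toy target, the placeholder features, and the choice of 5 folds are made up for the example and are not from the notes.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy unbalanced target (assumption): 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, not real data

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_positive_rates = []
for train_idx, valid_idx in skf.split(X, y):
    # Share of class 1 in this validation fold; stratification keeps it
    # close to the overall 10% share.
    fold_positive_rates.append(float(y[valid_idx].mean()))

print(fold_positive_rates)
```

With a plain shuffled split on data this unbalanced, some folds could end up with very few positive samples, which is the situation stratification is meant to avoid.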
# Tip #1
sklearn.model_selection.ShuffleSplit
# Tip #2
sklearn.model_selection.KFold
# Tip #3
sklearn.model_selection.LeaveOneOut
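
As a rough sketch of how the three splitters in the tips differ, the snippet below builds each one on a tiny array and counts the splits it produces. The 10-sample array, the 20% test size, and the 5 folds are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, KFold, LeaveOneOut

# Tiny placeholder dataset (assumption): 10 samples, 1 feature.
X = np.arange(10).reshape(-1, 1)

# Hold out: one random train/test split with 20% of the data held out.
holdout = ShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
# K-Fold: 5 subsets, each used once for validation.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
# Leave one out: K-Fold with K equal to the number of samples.
loo = LeaveOneOut()

n_holdout = holdout.get_n_splits(X)
n_kfold = kfold.get_n_splits(X)
n_loo = loo.get_n_splits(X)
print(n_holdout, n_kfold, n_loo)

# In every K-Fold split, training and validation indices never overlap.
no_overlap = all(
    set(train_idx).isdisjoint(valid_idx)
    for train_idx, valid_idx in kfold.split(X)
)
print(no_overlap)
```

Leave one out fits one model per sample, so it is usually only practical when the dataset is small, which matches the note above.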