- Train Data (Past), Unseen Test Data (Future)
- Divide into three parts - Train (Past), Validation (Past), Test (Future)
- Underfitting (High Error on Both Training and Validation)
- Overfitting (Doesn't generalize to test data, Low Error on Train, High Error on Validation)
- Ideal (Lowest Error on both Training and Testing Data)
- Hold Out (divide data into training / testing, no overlap between training / testing data) - Used on shuffled data
- K-Fold (repeated hold out: split the data into K subsets, train on K-1 subsets, validate on the remaining one, rotate through all K) - Good choice for a medium amount of data - Used on shuffled data
- Leave One Out: ngroups = len(train) - For too little data (special case of K-Fold, K = number of samples)
- Stratification - Keeps a similar target distribution across the different folds. Useful for:
- Small datasets (Do Random Splits)
- Unbalanced datasets
- Multiclass classification
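
The splitting schemes above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the helper names (`holdout_split`, `kfold_split`, `stratified_kfold_split`), the toy `labels` list, and the seeds are hypothetical and not from any library. In practice, scikit-learn's `train_test_split`, `KFold`, `LeaveOneOut`, and `StratifiedKFold` cover the same ground.

```python
import random
from collections import Counter, defaultdict


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once, then cut them into disjoint train/validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(round(n_samples * test_fraction)))
    return indices[n_test:], indices[:n_test]


def kfold_split(n_samples, k, seed=0):
    """Yield (train, validation) index lists; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K disjoint subsets
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def stratified_kfold_split(labels, k, seed=0):
    """K-Fold that deals each class out round-robin so folds share the target mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[(offset + pos) % k].append(idx)
        offset += len(members)
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# Toy unbalanced target (assumed for illustration): 8 negatives, 4 positives.
labels = [0] * 8 + [1] * 4

# Hold out: one shuffled split with no overlap.
train_idx, valid_idx = holdout_split(len(labels), test_fraction=0.25)

# Leave one out is K-Fold with K = number of samples.
loo_folds = list(kfold_split(len(labels), k=len(labels)))

# Stratified K-Fold: print the class counts of each validation fold.
for train_idx, valid_idx in stratified_kfold_split(labels, k=4):
    print(Counter(labels[i] for i in valid_idx))
```

With this toy target, every stratified validation fold holds two negatives and one positive, so each fold keeps the 2:1 class ratio of the full data. A plain shuffled K-Fold gives no such guarantee, which is why stratification matters most for small, unbalanced, or multiclass data.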
Happy Coding and Learning!!!