- Time-based splits
- Validation should mimic the train / test split
- Time-based trend: splits differ significantly when the data has one, so time-based patterns are important. The differences show up:
  - In the generated features
  - In the way the model relies on those features
  - In some kind of target leak
- Random split (row-wise): split randomly by rows; appropriate when rows are independent of each other
- Devise special features for cases where rows depend on each other
- Time-wise: rows before a particular date are training data, rows after that date are test data. Useful features can be based on the target
- Moving window validation
- By ID (e.g. cluster pictures, group them, and then find features)
- Combined (e.g. split by date for each shop independently)
- In most cases data is split by row number, time, or ID
- The logic for feature generation depends on the data-splitting strategy
- Set up your validation to mimic the train / test split of the competition
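
The splitting strategies listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy rows, the column names (`shop_id`, `date`, `target`) and every function name are hypothetical and do not come from any particular competition or library.

```python
import random
from datetime import date, timedelta

# Hypothetical toy data: one row per (shop_id, day). Names are illustrative only.
rows = [
    {"shop_id": shop, "date": date(2024, 1, 1) + timedelta(days=d), "target": shop * 10 + d}
    for shop in range(3)
    for d in range(10)
]


def random_split(rows, test_fraction=0.2, seed=0):
    """Row-wise split: reasonable only when rows are independent of each other."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * test_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def time_split(rows, cutoff):
    """Time-wise split: train strictly before the cutoff, validate on or after it."""
    train = [r for r in rows if r["date"] < cutoff]
    valid = [r for r in rows if r["date"] >= cutoff]
    return train, valid


def moving_window_splits(rows, window_days, n_folds):
    """Moving-window validation: each fold validates on one window of dates
    and trains on everything before that window."""
    last = max(r["date"] for r in rows)
    folds = []
    for k in range(n_folds, 0, -1):
        start = last - timedelta(days=window_days * k) + timedelta(days=1)
        end = start + timedelta(days=window_days)
        train = [r for r in rows if r["date"] < start]
        valid = [r for r in rows if start <= r["date"] < end]
        folds.append((train, valid))
    return folds


def id_split(rows, valid_ids, key="shop_id"):
    """ID-based split: all rows sharing an ID land on the same side."""
    train = [r for r in rows if r[key] not in valid_ids]
    valid = [r for r in rows if r[key] in valid_ids]
    return train, valid


def combined_split(rows, cutoffs, key="shop_id"):
    """Combined split: a separate cutoff date for each shop."""
    train = [r for r in rows if r["date"] < cutoffs[r[key]]]
    valid = [r for r in rows if r["date"] >= cutoffs[r[key]]]
    return train, valid


train, valid = time_split(rows, cutoff=date(2024, 1, 8))
print(len(train), len(valid))
```

Whichever function matches the way the competition's test set was separated from its training set is the one to validate with. Features should then be generated using only information available on the training side of that split.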