- Mimic the train / test split that was used to create the test data
- Perform K-fold validation
- Choose best parameters for models
- Submission Stage (Can't mimic exact train / test split)
- Calculate the mean and standard deviation of leaderboard scores
- Unexpected information in data that lets you make good predictions
- Unusable in real world
- Results of unintentional error
- Incorrect time splits still exist
- Check public and private splits
- Missing feature columns are data leaks
- Use File creation dates
- Resize features / change creation date
- IDs: it makes no sense to include them in the model
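
The K-fold and parameter-selection steps in these notes can be sketched roughly as follows. This is a minimal illustration using only the standard library, not a prescribed method: the helper names (`kfold_indices`, `knn_predict`, `cv_scores`), the toy nearest-neighbour model, the synthetic data, and the candidate values of `k` are all assumptions made for the example. It shuffles the rows at random, so it only applies when the test set was also split at random; for a time-based test split the folds would need to be cut by time instead. The mean and standard deviation here are taken over validation fold scores, which is the local analogue of doing the same with leaderboard scores.

```python
import random
import statistics


def kfold_indices(n_samples, n_splits, seed=0):
    """Shuffle indices once and yield (train_idx, valid_idx) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i in range(n_splits):
        valid_idx = folds[i]
        train_idx = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train_idx, valid_idx


def knn_predict(train_x, train_y, x, k):
    """Toy model (assumption): average the targets of the k nearest points."""
    nearest = sorted(range(len(train_x)), key=lambda j: abs(train_x[j] - x))[:k]
    return sum(train_y[j] for j in nearest) / len(nearest)


def cv_scores(xs, ys, k, n_splits=5):
    """Return the mean squared error on each validation fold for a given k."""
    scores = []
    for train_idx, valid_idx in kfold_indices(len(xs), n_splits):
        tx = [xs[j] for j in train_idx]
        ty = [ys[j] for j in train_idx]
        errors = [(knn_predict(tx, ty, xs[j], k) - ys[j]) ** 2 for j in valid_idx]
        scores.append(sum(errors) / len(errors))
    return scores


# Synthetic data (assumption): a noisy straight line.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Score every candidate parameter with the same folds.
results = {}
for k in (1, 5, 15):
    scores = cv_scores(xs, ys, k)
    results[k] = (statistics.mean(scores), statistics.stdev(scores))
    print(f"k={k}: mean MSE={results[k][0]:.3f}, std={results[k][1]:.3f}")

# Pick the parameter with the lowest mean validation error.
best_k = min(results, key=lambda k: results[k][0])
print("best k:", best_k)
```

Comparing the standard deviation across folds with the gap between candidates gives a rough sense of whether a difference in mean score is meaningful or within fold-to-fold noise.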