"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 05, 2017

Day #84 - Data Leaks and Validations

  • Mimic Train / Test Splot as the test data
  • Perform KFold Validations
  • Choose best parameters for models
  • Submission Stage (Can't mimic exact train / test split)
  • Calculate mean and standard deviations of leader board scores
Data Leaks
  • Unexpected information in data that lets you make good predictions
  • Unusable in real world
  • Results of unintentional error
Time Series
  • Incorrect timesplits still exists
  • Check public and private splits
  • Missing feature columns are data leaks
Unexpected Information
  • Use File creation dates
  • Resize features / change creation date
  • ID's no sense to include in model
Happy Learning and Coding!!!

No comments: