"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 27, 2016

Day #11 - Data Science Learning Notes - Evaluating a model

R Square - Goodness of Fit Measure
  • R Square = 1 - (SSE / SST)
  • SST - Total sum of squares: squared differences between each actual value and the mean of the dependent variable
  • SSE - Sum of squares of error: squared differences between actual and predicted values
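
A minimal Python sketch of the R Square calculation above. The function name `r_squared` and the sample numbers are made up for illustration; they are not from any particular library or dataset.

```python
def r_squared(actual, predicted):
    """R Square = 1 - SSE / SST (illustrative helper, not a library function)."""
    mean_actual = sum(actual) / len(actual)
    # SST: squared deviation of each actual value from the mean
    sst = sum((y - mean_actual) ** 2 for y in actual)
    # SSE: squared deviation of each actual value from its prediction
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - sse / sst


# Made-up sample values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
print(r_squared(actual, predicted))
```

Here SST is 20 and SSE is 0.5, so R Square comes out close to 0.975.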
Adjusted R Square 
  • Adjusted R Square = 1 - ((1 - R Square) * (n - 1) / (n - p - 1))
  • p - Number of independent variables
  • n - Number of records in dataset
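
A small sketch of the adjusted R Square formula, assuming R Square has already been computed. The helper name `adjusted_r_squared` and the input values are hypothetical.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R Square = 1 - (1 - R2) * (n - 1) / (n - p - 1).

    r2: plain R Square, n: number of records, p: number of independent variables.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 records to compute adjusted R Square")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up example: same R Square, more independent variables
print(adjusted_r_squared(0.90, n=50, p=2))
print(adjusted_r_squared(0.90, n=50, p=10))
```

With R Square and n fixed, the adjusted value drops as p grows, which is the penalty for adding independent variables.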
RMSE (Root mean square error)
  • For every record, compute the error (actual - predicted)
  • Square each error and find the mean
  • Take the square root of that mean
  • RMSE should be similar for the training and testing datasets
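
The RMSE steps can be sketched in plain Python as below. The function name `rmse` and the sample values are assumptions for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error (illustrative helper)."""
    # Step 1: error for every record
    errors = [y - p for y, p in zip(actual, predicted)]
    # Step 2: square each error and take the mean
    mean_squared_error = sum(e ** 2 for e in errors) / len(errors)
    # Step 3: square root of the mean
    return math.sqrt(mean_squared_error)


# Made-up sample values
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
print(rmse(actual, predicted))
```

Running the same function on the training and testing predictions gives the two numbers to compare.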
Bias (Underfit)
  • Model can't explain the dataset
  • R Square value is very low
  • Add more independent variables
Variance (Overfit)
  • RMSE high for test dataset, RMSE low for training dataset
  • Cut down the number of independent variables
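
A rough sketch of how the bias and variance checks above could be turned into code. The helper `diagnose_fit` and its thresholds (`gap_ratio`, `low_r2`) are arbitrary assumptions chosen for illustration, not standard cut-offs.

```python
def diagnose_fit(train_rmse, test_rmse, train_r2, gap_ratio=1.5, low_r2=0.3):
    """Return a rough label for the fit; thresholds are illustrative assumptions."""
    # Bias: the model cannot explain even the training data (very low R Square)
    if train_r2 < low_r2:
        return "high bias (underfit)"
    # Variance: test RMSE is much higher than training RMSE
    if test_rmse > gap_ratio * train_rmse:
        return "high variance (overfit)"
    return "reasonable fit"


# Made-up numbers for the three cases
print(diagnose_fit(train_rmse=4.0, test_rmse=4.2, train_r2=0.10))
print(diagnose_fit(train_rmse=1.0, test_rmse=3.0, train_r2=0.90))
print(diagnose_fit(train_rmse=1.0, test_rmse=1.1, train_r2=0.90))
```

The first case suggests adding independent variables, the second suggests cutting them down.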
Collinearity Problem
  • Check the p-value of each independent variable's coefficient to decide whether the null hypothesis (the coefficient is zero) can be rejected
Next Pending Reads
  • Subset Selection Technique
  • Cross Validation Technique
  • Z test / P Test
Happy Learning!!!
