"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 22, 2016

Day #10 Data Science Learning - Correlations

Correlation
  • When two variables are correlated, one can help predict the other, which is what makes correlated variables useful in machine learning
  • Correlation is a mutual relationship between two or more things
  • It shows the interdependence between two variables
  • Measure - How much does one variable change when the other changes?
  • Most commonly used - Pearson correlation coefficient, which measures the strength of a linear relationship
  • Value ranges from -1 to +1
  • Closer to -1 (Negative Correlation) - One value goes up as the other goes down
  • Closer to 0 (No Correlation)
  • Closer to +1 (Positive Correlation) - Both values go up or down together
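To make the -1 to +1 range concrete, here is a minimal sketch of the Pearson coefficient in plain Python: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The function name `pearson` and the sample lists are made up for illustration; in practice a library routine would normally be used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]  # deviations from the mean of x
    dy = [b - mean_y for b in y]  # deviations from the mean of y
    cov = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up illustrative values, not real measurements
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # goes up as x goes up
falling = [10, 8, 6, 4, 2]  # goes down as x goes up
v_shape = [4, 1, 0, 1, 4]   # related to x, but not linearly

print(pearson(x, rising))   # 1.0
print(pearson(x, falling))  # -1.0
print(pearson(x, v_shape))  # 0.0
```

The last case is a reminder that a coefficient near zero only rules out a linear relationship; the values can still depend on each other in some other way.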
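Correlation - Relationship between two values. On its own it does not show that one value causes the other.
Causation - One value is the reason for the change in the other. Example: weight can plausibly influence cholesterol, whereas dress size and cholesterol may move together only because both are related to weight. Always check whether a correlation is incidental.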
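One possible way to check whether a correlation is incidental is to look at the partial correlation, i.e. the correlation that remains after the linear effect of a suspected common driver is removed. The sketch below uses synthetic data generated under the assumption that weight drives both cholesterol and dress size; the numbers, coefficients and the helper names `pearson` and `partial_corr` are invented for illustration and are not real measurements.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data: weight is the only thing that drives the other two columns
rng = random.Random(0)
n = 2000
weight = [rng.gauss(75, 12) for _ in range(n)]
cholesterol = [120 + 1.0 * w + rng.gauss(0, 10) for w in weight]
dress_size = [0.2 * w - 3 + rng.gauss(0, 1.5) for w in weight]

r_raw = pearson(dress_size, cholesterol)
r_partial = partial_corr(dress_size, cholesterol, weight)

print(f"dress size vs cholesterol:           {r_raw:.2f}")
print(f"same pair after controlling for weight: {r_partial:.2f}")
```

In this synthetic setup the raw correlation between dress size and cholesterol is clearly positive, while the partial correlation sits close to zero, which is the pattern expected when a third variable explains the relationship. It only removes linear effects of the variables you think to control for, so it is a hint rather than proof of causation.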

Handling highly correlated variables
  • Remove one variable from each highly correlated pair rather than both; if the correlation is exactly 1 or -1, one of the two variables is completely redundant
  • Permutation feature importance: shuffle the values of one feature at a time and measure how much the model's performance drops
  • Greedy Elimination (GE): iteratively eliminate one feature from the most highly correlated feature pair
  • Recursive Feature Elimination (RFE): repeatedly fit a model and remove the least important features
  • Lasso Regularisation (LR): use L1 regularisation, which shrinks some feature weights to exactly zero, and remove those features
  • Principal Component Analysis (PCA): transform the data set with PCA and keep the components with the highest variance
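The greedy elimination idea from the list above can be sketched in plain Python as follows. This is only one way to do it: the function name `greedy_elimination`, the 0.9 threshold, the rule of dropping the later-listed feature of a pair, and the sample columns are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def greedy_elimination(columns, threshold=0.9):
    """Drop one feature of the most correlated pair until no pair reaches threshold.

    columns: dict mapping feature name -> list of values.
    Returns (kept_names, dropped_names).
    """
    kept = list(columns)
    dropped = []
    while len(kept) > 1:
        # Absolute correlation of every remaining pair of features
        corr = {
            (a, b): abs(pearson(columns[a], columns[b]))
            for i, a in enumerate(kept)
            for b in kept[i + 1:]
        }
        (first, second), worst = max(corr.items(), key=lambda item: item[1])
        if worst < threshold:
            break  # no remaining pair is highly correlated
        # Assumed rule: keep the earlier-listed feature, drop the later one
        kept.remove(second)
        dropped.append(second)
    return kept, dropped


# Made-up illustrative values, not real measurements
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same information as height_cm
    "weight_kg": [70, 55, 80, 60, 75],
    "age_years": [25, 60, 33, 48, 41],
}

kept, dropped = greedy_elimination(data, threshold=0.9)
print(kept)     # ['height_cm', 'weight_kg', 'age_years']
print(dropped)  # ['height_in']
```

Which member of a pair to drop is a design choice. Other reasonable rules include dropping the feature that is more correlated with the remaining features on average, or the one that is less related to the target.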
Ref - Link1, Link2, Link3

Happy Learning!!!
