- If variables are correlated, machine learning can use one to help predict another
- Correlation is a mutual relationship or connection between two or more things
- Correlation shows the interdependence between two variables
- Measure - How much does one variable change when the other changes?
- Commonly used - Pearson correlation coefficient, which measures the strength of the linear relationship
- Value ranges from -1 to +1
- Closer to -1 (Negative Correlation) - one value goes up as the other goes down
- Closer to Zero (No Correlation)
- Closer to 1 (Positive Correlation)
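
As a minimal sketch of the three cases above, the Pearson coefficient can be computed by hand with the standard library. The function name `pearson` and the toy lists are made up for illustration and are not from any real data set.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)

# Toy values, chosen only to show the three cases.
base = [1, 2, 3, 4, 5]
r_up = pearson(base, [2, 4, 6, 8, 10])    # both rise together
r_down = pearson(base, [10, 8, 6, 4, 2])  # one rises, the other falls
r_flat = pearson(base, [3, 1, 4, 1, 3])   # no linear trend

print(round(r_up, 3), round(r_down, 3), round(r_flat, 3))
```

With pandas, `DataFrame.corr()` uses the Pearson method by default and returns the same coefficient for every pair of columns at once.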
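Causation - The actual reason for a change in value. Weight can plausibly affect cholesterol, whereas dress size and cholesterol may move together without one causing the other. Check whether a correlation is causal or merely incidental.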
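
One hedged way to probe whether a correlation is incidental is to check if it survives after controlling for a suspected common cause. The sketch below uses purely synthetic numbers (the coefficients and noise levels are invented, not real medical data) in which weight drives both cholesterol and dress size. It compares the raw correlation with the first-order partial correlation given weight. The helper names `pearson` and `partial_corr` are illustrative.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Synthetic data (assumption): weight is the common cause of both variables.
rng = random.Random(0)
weight = [rng.gauss(70, 10) for _ in range(2000)]
cholesterol = [100 + 1.5 * w + rng.gauss(0, 10) for w in weight]
dress_size = [0.2 * w + rng.gauss(0, 1) for w in weight]

r_raw = pearson(dress_size, cholesterol)
r_partial = partial_corr(dress_size, cholesterol, weight)
print(f"raw: {r_raw:.2f}, controlling for weight: {r_partial:.2f}")
```

In this constructed setup the raw correlation is strong while the partial correlation is close to zero, which is the pattern expected when a third variable explains the relationship. A near-zero partial correlation does not prove the absence of causation in real data; it is only one check.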
Handling highly correlated variables
- Remove one variable from each highly correlated pair; when the correlation is exactly 1 or -1, one of the variables is fully redundant.
- Perform a PCA
- Permutation feature importance
- Greedy Elimination (GE): iteratively eliminate one feature from the most highly correlated feature pair
- Recursive Feature Elimination (RFE): recursively eliminate the least important features, re-ranking importance after each step
- Lasso Regularisation (LR): use L1 regularisation, which drives some feature weights to zero, and remove those features
- Principal Component Analysis (PCA): transform the data set with PCA and keep the components with the highest variance
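
A minimal sketch of greedy elimination, assuming features are given as a dict of column name to list of values. The 0.9 threshold, the tie-breaking rule (drop the member of the pair with the higher mean absolute correlation to the other features), and the function name `greedy_eliminate` are all assumptions made for illustration; other rules are equally valid.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def greedy_eliminate(columns, threshold=0.9):
    """Drop features until no pair has |correlation| above threshold."""
    kept = dict(columns)
    dropped = []
    while len(kept) > 1:
        # Absolute correlation for every remaining pair of features.
        corr = {
            pair: abs(pearson(kept[pair[0]], kept[pair[1]]))
            for pair in combinations(kept, 2)
        }
        (a, b), worst = max(corr.items(), key=lambda item: item[1])
        if worst <= threshold:
            break

        def mean_corr(name):
            values = [v for pair, v in corr.items() if name in pair]
            return sum(values) / len(values)

        # Assumed rule: drop the feature more correlated with everything else.
        victim = a if mean_corr(a) >= mean_corr(b) else b
        dropped.append(victim)
        del kept[victim]
    return list(kept), dropped

# Toy data: the two height columns carry almost the same information.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [40, 25, 33, 51, 29],
}
kept, dropped = greedy_eliminate(features, threshold=0.9)
print("kept:", kept, "dropped:", dropped)
```

The sketch recomputes all pairwise correlations on each pass, which is fine for a handful of features but quadratic in the number of columns. Constant columns would need to be removed first, because their correlation is undefined.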