K-means
- Sensitive to outliers (squared Euclidean distance gives greater weight to more distant points)
- Can't handle categorical data
- Works with Euclidean distance only
- Alternative (K-medoids): restrict each centre to be an actual data point, so centres are picked only from the data
- The cost function is still a sum of squares, but the distance inside it need not be Euclidean
- Use your own custom distance function when the data mixes numerical and categorical variables
- Example: a variable with 25 languages becomes 24 indicator columns, and M/F/N becomes 2 columns. Compute your custom distance on this encoding. Each variable needs one column fewer than its number of categories, because the all-zero combination already represents one category
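A minimal sketch of the points above, assuming a K-medoids-style setup. The helper names (`dummy_encode`, `mixed_distance`, `pick_medoid`), the choice of the last category as the all-zero baseline, and the way the numeric and categorical parts are combined are all illustrative assumptions, not a standard recipe.

```python
import numpy as np

def dummy_encode(value, categories):
    """Encode one categorical value as len(categories) - 1 indicator columns.

    Assumption of this sketch: the last category is the all-zero baseline.
    """
    return [1 if value == c else 0 for c in categories[:-1]]

def mixed_distance(num_a, cat_a, num_b, cat_b, cat_weight=1.0):
    """Hypothetical custom distance for mixed data.

    Numeric part: Euclidean distance. Categorical part: number of
    mismatched attributes, scaled by cat_weight (an arbitrary choice).
    """
    diff = np.asarray(num_a, dtype=float) - np.asarray(num_b, dtype=float)
    mismatches = sum(x != y for x, y in zip(cat_a, cat_b))
    return float(np.linalg.norm(diff) + cat_weight * mismatches)

def pick_medoid(nums, cats):
    """Return the index of the data point with the lowest sum of squared
    custom distances to all points, together with every point's cost."""
    n = len(nums)
    costs = [
        sum(mixed_distance(nums[i], cats[i], nums[j], cats[j]) ** 2
            for j in range(n))
        for i in range(n)
    ]
    return int(np.argmin(costs)), costs

# Toy data: one numeric column plus (language, gender) categories.
nums = [[1.0], [2.0], [3.0], [50.0]]
cats = [("en", "M"), ("en", "F"), ("en", "M"), ("fr", "N")]
medoid_index, costs = pick_medoid(nums, cats)
print(dummy_encode("F", ["M", "F", "N"]), medoid_index)
```

The centre returned here is always one of the rows in the data, which is what lets an arbitrary distance function be used: no mean of categorical values is ever computed.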
Distance measures for numerical variables
- Euclidean-based distance
- Correlation-based distance
- Mahalanobis distance
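The three numerical measures listed above can be sketched with NumPy as follows. The function names are illustrative, the correlation-based distance is taken here as one minus the Pearson correlation (one common definition among several), and the covariance matrix for the Mahalanobis distance is assumed to be supplied by the caller and invertible.

```python
import numpy as np

def euclidean(x, y):
    """Straight-line distance between two numeric vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(1.0 - (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def mahalanobis(x, y, cov):
    """Distance that rescales each direction by the covariance matrix."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(np.asarray(cov, dtype=float)) @ diff))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(euclidean(a, b), correlation_distance(a, b))
print(mahalanobis([2.0, 0.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, and the correlation-based distance ignores differences in scale, which is why `a` and `b` above come out as (almost exactly) zero apart.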
Distance measures for categorical variables
- Matching coefficient and Jaccard's coefficient
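As a rough illustration of the two coefficients on binary (0/1) indicator vectors: the simple matching coefficient counts agreements on both 1s and 0s, while Jaccard's coefficient ignores positions where both are 0. Both are similarities, so a distance can be taken as one minus the coefficient. The function names and the convention of returning 1.0 when neither vector has any 1s are assumptions of this sketch.

```python
def matching_coefficient(x, y):
    """Simple matching: share of positions where the two vectors agree
    (both 1 or both 0)."""
    agree = sum(1 for a, b in zip(x, y) if a == b)
    return agree / len(x)

def jaccard_coefficient(x, y):
    """Jaccard: positions where both are 1, divided by positions where
    at least one is 1. Joint absences (0, 0) are ignored."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: two all-zero vectors are treated as identical.
    return both / either if either else 1.0

x = [1, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0]
print(matching_coefficient(x, y), jaccard_coefficient(x, y))
print(1 - matching_coefficient(x, y), 1 - jaccard_coefficient(x, y))
```

Jaccard tends to suit sparse indicator data (such as many dummy columns that are mostly 0), where shared absences say little about similarity.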
Happy Learning!!!