Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): K Means vs K

March 06, 2023

K Means vs K - Modes

The mean is the average of a set of numbers. The median is the middle value of the data set once it has been sorted.

The mode is the value that occurs most often in a given list of values.
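As a quick illustration of the three definitions, here is a small sketch using Python's standard `statistics` module. The sample list `values` is made up for the example.

```python
from statistics import mean, median, mode

# Made-up sample data for illustration only.
values = [2, 3, 3, 5, 7]

avg = mean(values)         # sum of the values divided by their count
middle = median(values)    # middle value of the sorted list
most_common = mode(values) # value that occurs most often

print(avg, middle, most_common)  # 4 3 3
```

The mode is the only one of the three that also makes sense for categorical values, which is why k-modes uses it in place of the mean.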

K-modes is really only applicable to categorical data, not to sparse numerical data such as bag-of-words or tf-idf vectors.

Silhouette Method - For each point, this method compares the average distance to the other points in its own cluster with the average distance to the points in the nearest other cluster. Silhouette plots for different values of K then let you choose K visually.
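A minimal pure-Python sketch of the silhouette score is shown below, assuming Euclidean distance and at least two clusters. The function names and the toy points are hypothetical; libraries such as scikit-learn provide a ready-made version.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct)
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data: two well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # groups mixed up
print(good > bad)
```

Scores close to 1 indicate compact, well-separated clusters, so one would typically compute the score for several values of K and prefer the K with the highest value.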

K-means clustering for numerical data.
K-prototype clustering on mixed data. (numerical + categorical data)

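Handling categorical variables

It depends on the kind of categorical variable being used. For ordinal variables, such as bad, average and good, it makes sense to use a single variable with the values 0, 1, 2. For nominal variables with no natural order, such a numeric coding would impose an ordering that does not exist.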
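The ordinal case can be sketched with a plain dictionary. The list `ORDER` and the sample `ratings` are made up for the example.

```python
# Levels listed from lowest to highest; the position becomes the numeric code.
ORDER = ["bad", "average", "good"]
codes = {level: rank for rank, level in enumerate(ORDER)}

# Made-up sample column of ordinal ratings.
ratings = ["good", "bad", "average", "good"]
encoded = [codes[r] for r in ratings]

print(encoded)  # [2, 0, 1, 2]
```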

Algos List

Partitioning-based algorithms: k-Prototypes, Squeezer
Hierarchical algorithms: ROCK, Agglomerative single, average, and complete linkage
Density-based algorithms: HIERDENC, MULIC, CLIQUE
Model-based algorithms: SVM clustering, Self-organizing maps

Options for mixed data

Cluster using e.g. k-means or DBSCAN, based only on the continuous features
Use k-prototypes to cluster the mixed data directly
Use FAMD (factor analysis of mixed data) to reduce the mixed data to a set of derived continuous features, which can then be clustered
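
To show how k-prototypes can handle both kinds of feature at once, here is a sketch of the mixed dissimilarity it is usually described with: squared Euclidean distance on the numerical part plus a weighted count of mismatches on the categorical part. The function name, the argument layout and the default weight `gamma=1.0` are assumptions for illustration, not the API of any particular library.

```python
def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Dissimilarity between two records with numeric and categorical parts.

    gamma (an assumed tuning weight) controls how much a categorical
    mismatch counts relative to the numeric distance.
    """
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical_part = sum(x != y for x, y in zip(cat_a, cat_b))
    return numeric_part + gamma * categorical_part

# Made-up records: (numeric features), (categorical features)
d = mixed_dissimilarity((1.0, 2.0), ("red", "small"),
                        (2.0, 4.0), ("red", "large"),
                        gamma=0.5)
print(d)  # 5.5
```

In practice the numeric features are usually scaled first, otherwise the numeric part can dominate the categorical part regardless of the weight.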

The k-means algorithm is the most widely used centre-based partitional clustering algorithm.
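
For reference, a minimal sketch of the basic k-means loop (assign each point to its nearest centre, then move each centre to the mean of its points) could look like the following. The function names are hypothetical and the initial centres are passed in explicitly to keep the example deterministic; real implementations choose them more carefully.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        for p in points
    ]

def kmeans(points, init_centroids, max_iter=100):
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centre = per-dimension mean of the cluster's points.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centre of an empty cluster
        if new_centroids == centroids:
            break  # converged: centres no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Made-up toy data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```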

K-modes modifies k-means by

using a simple matching dissimilarity measure for categorical objects,
replacing means of clusters by modes, and
using a frequency-based method to update the modes.
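
The three changes listed above can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the initial modes are supplied by the caller, and ties between equally frequent categories are broken by first occurrence.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Frequency-based update: most frequent category in each column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def assign(rows, modes):
    """Index of the closest mode for every record."""
    return [
        min(range(len(modes)), key=lambda i: matching_dissimilarity(r, modes[i]))
        for r in rows
    ]

def kmodes(rows, init_modes, max_iter=100):
    modes = [tuple(m) for m in init_modes]
    for _ in range(max_iter):
        labels = assign(rows, modes)
        new_modes = []
        for i, old in enumerate(modes):
            members = [r for r, label in zip(rows, labels) if label == i]
            # Modes replace the means used by k-means.
            new_modes.append(column_modes(members) if members else old)
        if new_modes == modes:
            break  # converged: modes no longer change
        modes = new_modes
    return modes, assign(rows, modes)

# Made-up categorical records: (colour, size, shape)
rows = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
modes, labels = kmodes(rows, [rows[0], rows[2]])
print(modes, labels)
```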

Ref - Link1, Link2

Keep Exploring!!!
