K-means
- Sensitive to outliers (squared Euclidean distance gives greater weight to more distant points)
- Can't handle categorical data
- Works with Euclidean distance only

K-medoids
- Restricts each cluster centre to an actual data point (the medoid)
- Uses the same sum-of-distances style of cost function, but the distance need not be Euclidean
- Use your own custom distance function when the data mix numerical and categorical variables (see the sketch after this list)
- Example: a language attribute with 25 levels becomes 24 dummy columns, and M/F/N becomes 2 columns. Each attribute needs one column less than its number of levels because the all-zero combination already encodes the remaining category
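In practice, Gower dissimilarity is a convenient way to mix numerical and categorical variables without hand-rolling dummy columns. Here is a minimal sketch using daisy() from the cluster package on a hypothetical toy data frame (the column names and values are made up for illustration):

# Gower dissimilarity for mixed numeric/categorical data (hypothetical data)
library(cluster)
people <- data.frame(
  age      = c(25, 47, 31, 52),
  language = factor(c("EN", "FR", "EN", "DE")),
  gender   = factor(c("M", "F", "N", "M"))
)
# daisy() with metric="gower" handles numeric and factor columns together
d <- daisy(people, metric = "gower")
# Feed the dissimilarity matrix straight into pam()
pam(d, k = 2, diss = TRUE)$clustering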
Distance measures for numerical variables
- Euclidean-based distance
- Correlation-based distance
- Mahalanobis distance
Distance measures for categorical variables
- Simple matching coefficient and Jaccard's coefficient
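Each of these measures is available in base R or the cluster package. A minimal sketch on hypothetical random data (the data sizes are chosen only for illustration):

# Distance measures for numerical variables (10 observations, 4 variables)
library(cluster)
num <- matrix(rnorm(40), nrow = 10)
dist(num, method = "euclidean")            # Euclidean distance between observations
as.dist(1 - cor(t(num)))                   # correlation-based distance
mahalanobis(num, colMeans(num), cov(num))  # squared Mahalanobis distance to the mean

# Distance measures for binary categorical variables
bin <- matrix(rbinom(40, 1, 0.5), nrow = 10)
dist(bin, method = "binary")               # 1 - Jaccard coefficient
# Simple matching: treat the columns as nominal factors and use Gower,
# which scores each attribute 0 on a match and 1 on a mismatch
daisy(data.frame(lapply(as.data.frame(bin), factor)), metric = "gower")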
# K-medoids (Non-Hierarchical Clustering)
library(cluster)  # provides pam()

# Read the data frame
food = read.csv("protein.csv")

# Partitioning Around Medoids: pass the data frame
# (dropping the country column) and the number of clusters
pam.result <- pam(food[,-1], 2)
pam.result$clustering
summary(pam.result)

# Same call with the Manhattan distance measure;
# diss=FALSE means food[,-1] holds raw observations,
# not a precomputed dissimilarity matrix
pam.result <- pam(food[,-1], k=2, diss=FALSE, metric="manhattan")
pam.result$clustering
summary(pam.result)

# Plot the clusters in the red-meat/white-meat plane,
# labelling each point with its country name
plot(food$RedMeat, food$WhiteMeat, type="n", xlim=c(3,19), xlab="Red Meat", ylab="White Meat")
text(x=food$RedMeat, y=food$WhiteMeat, labels=food$Country, col=pam.result$clustering+1)
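A natural follow-up, sketched here as an addition rather than part of the original walkthrough: pam() reports the average silhouette width in silinfo$avg.width, which can be used to pick the number of clusters.

# Choose k by maximising the average silhouette width over a range of k
sil <- sapply(2:6, function(k) pam(food[,-1], k)$silinfo$avg.width)
(best.k <- (2:6)[which.max(sil)])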
Happy Learning!!!