"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 21, 2014

Machine Learning Notes - Anomaly Detection - Entropy Computation


This post covers what I learned from a Machine Learning session conducted by my colleague Gopi. It was a really good introduction and gave me a lot of motivation to learn the topic.

Concepts Discussed
  • Homogeneity - Is my data homogeneous?
  • Pick the odd one out (Anomaly detection)
  • Entropy Computation
The session covered a wide variety of examples for finding odd sets and variations. For example, identify the anomaly in the set below:
1,1,1,2
1,2,2,1
1,2,1,1
1,0,1,2

The last row, the only one containing a zero, is the odd one out. Entropy computation proved very useful for identifying such rows.

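Entropy Formula

$$H = -\sum_{i} p_i \log_2 p_i$$

where p_i is the fraction of values in the row that are equal to the i-th distinct value.
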
Formula detailed notes from link

For row (1,1,1,2)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 - 0.5]
 = 0.811

For row (1,2,2,1)
 = -[((2/4)*log2(2/4)) + ((2/4)*log2(2/4))]
 = -[-0.5 - 0.5]
 = 1

For row (1,2,1,1)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 - 0.5]
 = 0.811
  
For row (1,0,1,2)
 = -[((2/4)*log2(2/4)) + ((1/4)*log2(1/4)) + ((1/4)*log2(1/4))]
 = -[-0.5 - 0.5 - 0.5]
 = 1.5
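
The per-row calculation above can be sketched in a few lines of Python. This is only a minimal sketch and not the code used in the session; the helper name `row_entropy` and the variable names are my own. It counts how often each value appears in a row, turns the counts into fractions, and applies the entropy formula with log base 2.

```python
from collections import Counter
from math import log2


def row_entropy(row):
    """Shannon entropy (in bits) of the values in a single row.

    Hypothetical helper written for this post.
    """
    counts = Counter(row)          # how many times each distinct value occurs
    total = len(row)
    return -sum((c / total) * log2(c / total) for c in counts.values())


rows = [(1, 1, 1, 2), (1, 2, 2, 1), (1, 2, 1, 1), (1, 0, 1, 2)]
entropies = [row_entropy(r) for r in rows]

for r, h in zip(rows, entropies):
    print(r, round(h, 3))

# Index of the row with the highest entropy (the candidate anomaly).
most_mixed = max(range(len(rows)), key=lambda i: entropies[i])
print("highest entropy:", rows[most_mixed])
```

Picking the row with the highest entropy works for this small example; on real data a threshold or a comparison against the other rows would probably be needed.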

Excluding the row with the highest entropy leaves a homogeneous data set, so the last row, which has the highest entropy, is the anomaly.
In general, if the data set becomes homogeneous after removing a particular record, then that record is the anomaly.
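
The "remove a record and check homogeneity" idea can also be sketched in Python. I am assuming here that homogeneity is measured by the entropy of all remaining values pooled together (lower means more homogeneous); the session did not prescribe this exact measure, and the function names `entropy` and `pooled_entropy_without` are my own.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a flat list of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def pooled_entropy_without(rows, skip):
    """Entropy of all values pooled together, leaving out rows[skip].

    Assumption: pooled entropy stands in for "homogeneity" of the data set.
    """
    pooled = [v for i, row in enumerate(rows) if i != skip for v in row]
    return entropy(pooled)


rows = [(1, 1, 1, 2), (1, 2, 2, 1), (1, 2, 1, 1), (1, 0, 1, 2)]

# One score per row: how mixed the rest of the data is once that row is removed.
scores = [pooled_entropy_without(rows, i) for i in range(len(rows))]

# The record whose removal leaves the least mixed data is the candidate anomaly.
anomaly_index = min(range(len(rows)), key=lambda i: scores[i])
print("candidate anomaly:", rows[anomaly_index])
```

For the four rows above, removing the last row leaves only the values 1 and 2, which gives the lowest pooled entropy, so this sketch flags the same row as the per-row calculation.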

More Concepts Introduced
  • Conditional Probability
  • ID3 Algorithm
  • Measure Entropy
  • Decision Tree
  • Random Forest
  • Bagging Technique
Happy Learning!!! 
