This post covers what I learned from a Machine Learning session conducted by my colleague Gopi. It was a really good introduction and gave me a lot of motivation to learn the topic.
Concepts Discussed
- Homogeneity - Is my data homogeneous?
- Pick the odd one out (Anomaly detection)
- Entropy Computation
We went through a wide variety of examples for finding odd sets and variations. For example, identify the anomaly in the set below:
1,1,1,2
1,2,2,1
1,2,1,1
1,0,1,2
The last row, the one containing zero, is the odd one out. Identifying such rows using entropy computation was very useful.
The formula is covered in the detailed notes at the link.
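For reference, the per-row calculations in this post follow the standard Shannon entropy formula. Here p_i is the fraction of values in a row that equal the i-th distinct value, and k is the number of distinct values in that row (the symbols H, p_i and k are just notation assumed here):

```latex
H = -\sum_{i=1}^{k} p_i \log_2 p_i
```

A row where every value is the same gives H = 0, and the more evenly the values are spread across distinct symbols, the higher H gets.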
For row (1,1,1,2)
= -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
= -[-0.311 -0.5]
= 0.811
For row (1,2,2,1)
= -[((2/4)*log2(2/4)) + ((2/4)*log2(2/4))]
= -[-0.5 -0.5]
= 1
For row (1,2,1,1)
= -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
= -[-0.311 -0.5]
= 0.811
For row (1,0,1,2)
= -[((2/4)*log2(2/4)) + ((1/4)*log2(1/4)) + ((1/4)*log2(1/4))]
= -[-0.5 -0.5 -0.5]
= 1.5
By excluding the row with the highest entropy, we are left with a homogeneous data set. The last row, which has the highest entropy, is the anomaly.
If the data set becomes homogeneous after removing a particular record, then that record is the anomaly.
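The row-by-row calculation can also be sketched in a few lines of Python. This is a minimal sketch, not the code from the session: the names `row_entropy`, `rows`, `entropies` and `anomaly_index` are made up for illustration, and it simply assumes that the row with the highest entropy is the likely anomaly.

```python
from collections import Counter
from math import log2


def row_entropy(row):
    """Shannon entropy (in bits) of the values in a single row."""
    counts = Counter(row)   # how many times each distinct value occurs
    total = len(row)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# The example data set from this post
rows = [
    (1, 1, 1, 2),
    (1, 2, 2, 1),
    (1, 2, 1, 1),
    (1, 0, 1, 2),
]

entropies = [row_entropy(r) for r in rows]

# Assumption for this sketch: the highest-entropy row is the odd one out
anomaly_index = max(range(len(rows)), key=lambda i: entropies[i])

for r, h in zip(rows, entropies):
    print(r, round(h, 3))
print("Most likely anomaly:", rows[anomaly_index])
```

Running it prints one entropy value per row and flags (1, 0, 1, 2), the row containing zero. Picking the single highest value is a simplification; on a real data set you would probably compare each row against a threshold instead.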
More Concepts Introduced
- Conditional Probability
- ID3 Algorithm
- Measure Entropy
- Decision Tree
- Random Forest
- Bagging Technique
Happy Learning!!!