A decision tree is a supervised learning algorithm that is mostly used for classification problems but can also be used for regression.
For a binary target with positive-class proportion p:

Entropy(p) = -p*log2(p) - (1-p)*log2(1-p)

A high entropy value indicates a high degree of disorder or impurity in the data: entropy is 0 for a pure node and reaches its maximum of 1 when the two classes are split 50/50.
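To make the formula concrete, here is a minimal Python sketch (the function name is my own, not from the post) that computes binary entropy for a given positive-class proportion p:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a binary node with positive-class proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0   -> maximum disorder (50/50 split)
print(binary_entropy(0.9))  # ~0.47 -> mostly one class, low entropy
```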
Gini impurity is an alternative measure of the same idea:

Gini(p) = 1 - (p^2 + (1-p)^2)

The lower the Gini impurity, the purer the node: it is 0 for a pure node and 0.5 for a 50/50 split between two classes.
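The same kind of sketch works for Gini impurity (again, a hypothetical helper name for illustration):

```python
def gini_impurity(p: float) -> float:
    """Gini impurity of a binary node with positive-class proportion p."""
    return 1.0 - (p ** 2 + (1 - p) ** 2)

print(gini_impurity(1.0))  # 0.0 -> pure node
print(gini_impurity(0.5))  # 0.5 -> worst case for two classes
```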
To decide how to split, we look at Entropy or Gini: for each candidate feature we check how well it separates the data with respect to the target class. The goal is to build a tree by making the right choice of feature at the root, then at each internal node, down to the leaf nodes.
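As a quick illustration (not part of the original post), scikit-learn's DecisionTreeClassifier builds such a tree automatically, and its criterion parameter lets you choose Gini or entropy as the splitting measure. A minimal sketch using the Iris toy dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a toy dataset and fit a small tree using entropy as the split criterion.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

# Print the learned root -> branch -> leaf structure as text.
print(export_text(tree, feature_names=load_iris().feature_names))
```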
Information Gain is a statistical property that measures how well a given feature separates the training examples according to their target classification. It is calculated by comparing the entropy of the dataset before and after a split on that feature. The feature with the highest information gain is chosen as the node, and the process is then repeated for further branch nodes (a small worked example follows the list below).
- Information gain is the decrease in entropy after a dataset is split on an attribute.
- Information Gain measures the reduction in entropy (or impurity) achieved because of the split.
- In the decision tree algorithm, at each node, the feature that provides the highest Information Gain is chosen for the split.
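Here is a minimal sketch of how information gain can be computed for a candidate split (the function names and the tiny example labels are my own, for illustration only):

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Entropy of a list of class labels (works for any number of classes)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent_labels, child_label_groups) -> float:
    """Entropy of the parent minus the weighted entropy of the child nodes."""
    total = len(parent_labels)
    weighted_children = sum(
        (len(group) / total) * entropy(group) for group in child_label_groups
    )
    return entropy(parent_labels) - weighted_children

# Splitting a 50/50 parent into one pure group and one mixed group:
parent = [1, 1, 1, 0, 0, 0]
children = [[1, 1, 1, 0], [0, 0]]
print(information_gain(parent, children))  # ~0.46 bits of entropy reduction
```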
Keep Exploring!!!