"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 21, 2023

Decision tree - Summary

A decision tree is a supervised learning algorithm that is mostly used for classification problems but can also be used for regression.

Entropy(p) = - p*log2(p) - (1-p)*log2(1-p)

A high entropy value indicates a high degree of disorder or impurity in the data.
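
A minimal sketch of the entropy formula in Python (the helper name `entropy` is just for illustration):

```python
import math

def entropy(p):
    """Binary entropy for a class probability p, in bits."""
    if p in (0.0, 1.0):          # a pure node has zero entropy
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy(0.5))   # 1.0   -> maximum disorder (50/50 split)
print(entropy(0.9))   # ~0.47 -> mostly one class, low entropy
```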

Gini(p) = 1 - (p^2 + (1-p)^2)

The lower the Gini impurity, the purer the node, and the better the split.
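
A similar sketch for Gini impurity, again with an illustrative helper name:

```python
def gini(p):
    """Binary Gini impurity for a class probability p."""
    return 1 - (p**2 + (1 - p)**2)

print(gini(0.5))   # 0.5 -> maximum impurity for two classes
print(gini(1.0))   # 0.0 -> pure node, the best possible value
```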

We look at Entropy/Gini to judge which features, relative to the target class, give us a good split of the data. The goal is to build a tree with the right choice of feature at the root, then at each branch level, down to the leaf nodes.
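
A quick sketch of this with scikit-learn, using the iris dataset purely as a convenient example; the `criterion` parameter selects Entropy or Gini for scoring the splits:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# criterion="entropy" scores splits by information gain;
# criterion="gini" (the default) uses Gini impurity instead.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# Show the learned structure: root split, branch nodes, leaves.
print(export_text(clf, feature_names=list(iris.feature_names)))
```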

Information Gain is a statistical property that measures how well a given feature separates the training examples according to their target classification. It is calculated by comparing the entropy of the dataset before and after a split. The feature with the highest information gain is chosen as the split at that node. The process is then repeated for further branch nodes (a small sketch follows the list below).

  • Information gain is the decrease in entropy after a dataset is split on an attribute.
  • Information Gain measures the reduction in entropy (or impurity) achieved because of the split. 
  • In the decision tree algorithm, at each node, the feature that provides the highest Information Gain is chosen for the split.
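
A small sketch of that gain computation, assuming a binary split; the helper names `entropy_of` and `information_gain` are illustrative:

```python
import math

def entropy_of(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy_of(left) \
             + (len(right) / n) * entropy_of(right)
    return entropy_of(parent) - weighted

# Toy labels: a split that separates the classes well has high gain.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
print(information_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1]))  # 1.0 (perfect split)
print(information_gain(parent, [0, 0, 1, 1], [0, 0, 1, 1]))  # 0.0 (useless split)
```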

Keep Exploring!!!
