- Add new variables based on certain features
- Label encoding is done usually
- Mean encoding is done as variable count / distinct unique variables
- The proportion of label encoding also is included in this step
- Min encoding with label encoding
- Label encoding - No logical order
- Mean encoding - Classes are separable
- We can reach better loss with sorted trees
- Trees need huge number of splits
- Model tries to treat all categories differently
- Goods - Number of ones in a group
- Bads - Number of zeros
Weight of Evidence = In(Goods/Bads)*100
Count = Goods = sum(target)
Diff = Goods-Bads
Happy Learning!!!