A large coefficient is often a sign of overfitting. To avoid this we perform regularization. Regularization - a penalty on coefficient size to avoid overfitting. Two common penalties:
- L1 - Sum of absolute values (Lasso - Least Absolute Shrinkage and Selection Operator). The L1 constraint region is diamond shaped, so the optimum often lands on a corner where one of the coordinates is zero. This results in variable elimination: features that contribute minimally are dropped (see the first sketch after this list).
- L2 - Sum of squares of values (Ridge). The L2 constraint region is circle shaped, so it shrinks all coefficients in the same proportion but eliminates none.
- Discriminative - In SVM we use a hyperplane to separate the classes. This is an example of the discriminative approach (see the second sketch after this list).
- Probabilistic - Assume the data is generated by a Gaussian distribution. This is backed by the Central Limit Theorem: as more and more independent points are aggregated, their distribution approaches a Normal distribution. Here we fit a Gaussian model to the data.
- Max Likelihood - Choose the parameters that maximize the probability of the observed data; the likelihood tells us how probable it is that a point p belongs to a given distribution.
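To make the L1-vs-L2 contrast concrete, here is a minimal sketch using scikit-learn. The synthetic dataset, alpha values, and feature count are my illustrative assumptions, not from the notes above:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 5 features, but only the first 2 actually drive y.
rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(200)

# Lasso (L1): corners of the diamond-shaped constraint zero out
# the minimally contributing features -> variable elimination.
lasso = Lasso(alpha=0.1).fit(X, y)
print("Lasso coefficients:", lasso.coef_)   # coefs for features 3-5 are exactly 0

# Ridge (L2): the circular constraint shrinks every coefficient
# proportionally but eliminates none.
ridge = Ridge(alpha=0.1).fit(X, y)
print("Ridge coefficients:", ridge.coef_)   # all coefs small but nonzero
```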
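And a quick sketch of the discriminative vs. probabilistic split: LinearSVC learns a separating hyperplane directly, while GaussianNB fits a Gaussian per class via maximum likelihood and asks which distribution a point most probably belongs to. The toy blobs and test point are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import GaussianNB

# Two Gaussian blobs as a toy binary classification problem.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(100, 2) + [2, 2], rng.randn(100, 2) - [2, 2]])
y = np.array([0] * 100 + [1] * 100)

# Discriminative: learn the separating hyperplane directly.
svm = LinearSVC().fit(X, y)

# Probabilistic: fit a Gaussian per class; the fitted means and variances
# are the maximum-likelihood estimates, and prediction picks the class
# whose distribution the point most probably belongs to.
gnb = GaussianNB().fit(X, y)

print("SVM prediction:", svm.predict([[2.0, 1.5]]))
print("GNB prediction:", gnb.predict([[2.0, 1.5]]))
print("GNB class probabilities:", gnb.predict_proba([[2.0, 1.5]]))
```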
Good Read for L2 - "Indeed, using the L2 loss comes from the assumption that the data is drawn from a Gaussian distribution."
Another Read -
- L1 loss function minimizes the absolute differences between the estimated values and the target values. The L1 loss is more robust and is generally less affected by outliers.
- L2 loss function minimizes the squared differences between the estimated and target values. Because the errors are squared, the L2 error will be much larger in the presence of outliers (see the sketch below).
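A tiny numeric sketch of that outlier sensitivity (the arrays below are made-up values):

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # last point is an outlier
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])

l1 = np.abs(y_true - y_pred).mean()      # mean absolute error
l2 = ((y_true - y_pred) ** 2).mean()     # mean squared error

print("L1 loss:", l1)   # ~19.1  -> grows linearly with the outlier
print("L2 loss:", l2)   # ~1805  -> the squared outlier dominates
```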
Happy Learning!!!