These terms come up frequently when discussing a model's performance on the training and test data sets.
Generalization error = Bias² + Variance + Irreducible error
Bias (Under-fitting)
- Bias is high when the model class cannot represent the true data distribution well; it does not decrease with more training data.
- High Bias will lead to under-fitting
How to identify High Bias
- Training Error will be high
- Cross Validation error will also be high, and close to the training error
Variance (Over-fitting)
- Variance is high when the model is overly sensitive to the particular training set, fitting noise rather than signal
- High Variance will lead to over-fitting
How to identify High Variance
- Training Error will be low
- Cross Validation error will be much higher than the training error
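The two symptom checklists above can be seen numerically by comparing training and cross-validation error for a too-simple and a too-flexible model. This is a minimal sketch using only NumPy on synthetic data; the cubic target function, the split sizes, and the polynomial degrees 1 and 15 are illustrative assumptions, not a prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a cubic function
x = rng.uniform(-3, 3, 30)
y = x**3 - 2 * x + rng.normal(0, 1, 30)
x_tr, y_tr = x[:20], y[:20]   # training split
x_cv, y_cv = x[20:], y[20:]   # cross-validation split

def errors(deg):
    """Fit a degree-`deg` polynomial on the training split,
    return (training MSE, cross-validation MSE)."""
    coefs = np.polyfit(x_tr, y_tr, deg)
    mse = lambda xs, ys: np.mean((np.polyval(coefs, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_cv, y_cv)

tr_lo, cv_lo = errors(1)    # degree 1: too simple  -> high bias
tr_hi, cv_hi = errors(15)   # degree 15: too flexible -> high variance

# High bias: both errors are high and close together.
# High variance: training error is low, CV error is much higher.
print(f"degree 1 : train={tr_lo:.2f}  cv={cv_lo:.2f}")
print(f"degree 15: train={tr_hi:.2f}  cv={cv_hi:.2f}")
```

The degree-1 line cannot bend to follow the cubic, so both errors stay high together; the degree-15 polynomial memorizes the 20 training points, so its training error collapses while its cross-validation error does not.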
How to Fix?
- High Variance: add more training data, simplify the model, or add regularization; variance decreases with more training data and increases with more complicated classifiers
- High Bias: use a more complex model or add better features; more data alone will not help, since bias does not depend on training set size
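The claim that variance decreases with more training data can be checked directly: fit the same flexible model on a small and on a large training set and watch the gap between cross-validation and training error shrink. Again a minimal NumPy sketch on synthetic data; the degree-9 model and the sample sizes 20 and 2000 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def cv_train_gap(n_train, deg=9):
    """Gap between CV error and training error for a degree-`deg`
    polynomial fit on n_train noisy samples of a cubic function."""
    x_tr = rng.uniform(-3, 3, n_train)
    y_tr = x_tr**3 - 2 * x_tr + rng.normal(0, 1, n_train)
    x_cv = rng.uniform(-3, 3, 500)
    y_cv = x_cv**3 - 2 * x_cv + rng.normal(0, 1, 500)
    coefs = np.polyfit(x_tr, y_tr, deg)
    mse = lambda xs, ys: np.mean((np.polyval(coefs, xs) - ys) ** 2)
    return mse(x_cv, y_cv) - mse(x_tr, y_tr)

gap_small_data = cv_train_gap(20)     # few samples: large gap (overfit)
gap_big_data = cv_train_gap(2000)     # many samples: gap nearly closes
print(f"gap with   20 samples: {gap_small_data:.2f}")
print(f"gap with 2000 samples: {gap_big_data:.2f}")
```

With only 20 points the degree-9 fit chases noise, so the cross-validation error sits well above the training error; with 2000 points the same model class no longer has enough freedom per sample to overfit, and the two errors converge.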
Happy Learning!!!