Data Science, Database, AI Startups and Domain Learning's (Video-Image-Text-Data-Database): Weekend lessons - Bias and Fairness

May 09, 2021


Key Lessons

What is algorithmic bias?
How we can identify and mitigate bias

Appreciate and recognize the severity of this problem
Image - Watermelon

Based on our culture, we may have inbuilt perceptions
We categorize, simplify, and form generalized representations

Sources of Algorithmic Bias
Facial analysis bias across demographics
Age detection - performed worst on darker-skinned females

Different cultures, different interpretations

Object recognition

Bias correlates with income and geography
World population distribution vs dataset distribution

Types of Bias in Deep Learning Systems
Data does not include all representations
Data does not reflect real-world scenarios
Overly general conclusions are drawn from the data

Interpretation-Driven Bias

Trends in two variables can look related
Example: the trend in CS PhD graduates over time
Unrelated variables can still be correlated
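A minimal sketch of how a shared trend alone produces a strong correlation. The two series below are made-up illustrative numbers (assumed slopes and noise levels, not real CS PhD or other data); neither series drives the other, yet both grow over time.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D series."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical, causally unrelated series that both happen to trend upward.
rng = np.random.default_rng(0)
t = np.arange(20)                                             # 20 time steps (e.g., years)
series_a = 100 + 8.0 * t + rng.normal(0.0, 3.0, size=t.size)  # e.g., a "graduates" count (made up)
series_b = 50 + 2.5 * t + rng.normal(0.0, 1.0, size=t.size)   # an unrelated quantity (made up)

r = pearson_r(series_a, series_b)
print(f"correlation between unrelated series: {r:.3f}")
```

The coefficient comes out close to 1 only because both series follow time; removing the trend (for example, correlating the residuals after fitting a line to each series) would typically leave little relationship.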

Correlation does not capture the fundamental driving force
Overgeneralization
Different perspectives

An improved dataset that accounts for the real-world distribution

Collecting data from only certain situations
Not covering 100% of the possible cases

Class / Feature Imbalances in Data

Real-world distribution vs model (training data) distribution
Frequency of a class in the dataset vs in the real world

Binary classification example
Moving decision boundary
Decision boundary shifts due to class imbalance
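To make the shifting boundary concrete, here is a small worked sketch. It uses an idealised stand-in for a trained classifier (an assumption, not the model from the lecture): the Bayes-optimal threshold for two 1-D Gaussian classes with equal variance, where the class-1 prior p1 plays the role of the class frequency in the training data.

```python
import math

def bayes_boundary(mu0, mu1, sigma, p1):
    """Threshold x* where P(y=1 | x) = P(y=0 | x) for two 1-D Gaussian classes
    N(mu0, sigma^2) and N(mu1, sigma^2) with class-1 prior p1 (assumes mu1 > mu0).
    Inputs above x* are assigned to class 1.

    Solving p1 * N(x; mu1) = (1 - p1) * N(x; mu0) for x gives
        x* = (mu0 + mu1) / 2 + sigma^2 * ln((1 - p1) / p1) / (mu1 - mu0)
    """
    p0 = 1.0 - p1
    return (mu0 + mu1) / 2 + sigma**2 * math.log(p0 / p1) / (mu1 - mu0)

# Hypothetical setup: majority class centred at 0, minority class centred at 2.
for p1 in (0.5, 0.1, 0.01):
    x_star = bayes_boundary(0.0, 2.0, 1.0, p1)
    print(f"minority share {p1}: boundary at x = {x_star:.2f}")
```

With these hypothetical means, the threshold sits at x = 1.00 when the classes are balanced but moves to about 3.30 when the minority class is only 1% of the data. That is past the minority class mean, so most minority examples would be assigned to the majority class.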

Example: detecting cancer from medical images (MRI scans), where positive cases are rare

Mitigation Techniques

Select and feed in class-balanced batches
During learning, the model sees an equal distribution of classes
This yields a more reasonable decision boundary
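One way to build class-balanced batches is sketched below, assuming the class labels fit in memory. The function name and the choice to sample each class with replacement are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def balanced_batches(labels, batch_size, num_batches, seed=0):
    """Yield arrays of sample indices with batch_size // n_classes examples
    per class, drawing each class uniformly at random with replacement."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    per_class = batch_size // len(classes)
    # Precompute the indices belonging to each class.
    by_class = [np.flatnonzero(labels == c) for c in classes]
    for _ in range(num_batches):
        batch = np.concatenate(
            [rng.choice(idx, size=per_class, replace=True) for idx in by_class]
        )
        rng.shuffle(batch)  # mix the classes within the batch
        yield batch

# Hypothetical imbalanced labels: 95 negatives, 5 positives.
labels = np.array([0] * 95 + [1] * 5)
for batch in balanced_batches(labels, batch_size=32, num_batches=3):
    print(np.bincount(labels[batch]))
```

With 95 negatives and 5 positives, every batch of 32 contains 16 examples of each class; the few minority examples are simply repeated more often across batches.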

Weight the likelihood of individual data points during training
More frequent classes - lower weight
Less frequent classes - higher weight
Weight is the inverse of the class frequency
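The inverse-frequency weighting can be sketched as follows. The normalisation n_samples / (n_classes * count) is one common convention (it is the formula scikit-learn documents for class_weight="balanced"); the function names and the use of binary cross-entropy as the per-sample loss are illustrative assumptions.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-sample weights proportional to 1 / (frequency of the sample's class),
    normalised as n_samples / (n_classes * class_count) so they average to 1."""
    labels = np.asarray(labels)
    classes, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    return labels.size / (len(classes) * counts[inverse])

def weighted_bce(y_true, p_pred, weights, eps=1e-7):
    """Binary cross-entropy where each sample's negative log-likelihood
    is scaled by its weight."""
    y_true, p_pred, weights = (np.asarray(a, dtype=float) for a in (y_true, p_pred, weights))
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.sum(weights * losses) / np.sum(weights))

# Hypothetical labels: 90 negatives, 10 positives.
labels = np.array([0] * 90 + [1] * 10)
w = inverse_frequency_weights(labels)
print(w[0], w[-1])
print(weighted_bce(labels, np.full(labels.size, 0.2), w))
```

Here each positive gets weight 5.0 and each negative about 0.56, so both classes contribute the same total weight to the loss.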

Lack of diversity in feature spaces
Example: hair color in face images

1. Ground truth distribution of hair color

2. Ground truth distribution of lipstick

3. Ground truth distribution of Face type

4. Ground truth distribution of Skin color
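Checking ground-truth distributions like these can be as simple as counting attribute values in the annotations. The sketch below assumes each annotation is a dict with keys such as "hair_color" and "skin_tone" (hypothetical field names) and flags values whose share falls below a chosen threshold; the threshold is a judgment call, not a standard.

```python
from collections import Counter

def attribute_distribution(annotations, attribute):
    """Return the fraction of examples taking each value of `attribute`."""
    counts = Counter(a[attribute] for a in annotations)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def underrepresented(distribution, threshold):
    """Return the (sorted) attribute values whose share is below `threshold`."""
    return sorted(v for v, share in distribution.items() if share < threshold)

# Hypothetical annotations for 10 face images.
annotations = (
    [{"hair_color": "black", "skin_tone": "light"}] * 5
    + [{"hair_color": "blond", "skin_tone": "light"}] * 4
    + [{"hair_color": "red", "skin_tone": "dark"}] * 1
)
for attr in ("hair_color", "skin_tone"):
    dist = attribute_distribution(annotations, attr)
    print(attr, dist, "underrepresented:", underrepresented(dist, threshold=0.2))
```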

Bias exists in commercial-grade systems

Improve Fairness

Bias Mitigation
Mitigate bias across the model, the dataset, and the learning pipeline

Evaluate Bias / Fairness
A model is fair with respect to a sensitive variable if its predictions stay the same when conditioned on that variable
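One simple way to condition on a sensitive variable is to split the evaluation set by group and compare metrics. The sketch below (hypothetical helper names and toy data) reports per-group accuracy and positive-prediction rate. Equal positive rates across groups corresponds to the criterion usually called demographic parity, and equal accuracy is another common check; which criterion fits depends on the application.

```python
import numpy as np

def group_metrics(y_true, y_pred, group):
    """Accuracy and positive-prediction rate, computed separately for each
    value of a sensitive attribute `group`."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    metrics = {}
    for g in np.unique(group):
        mask = group == g  # condition on the sensitive variable
        metrics[g.item()] = {
            "accuracy": float(np.mean(y_true[mask] == y_pred[mask])),
            "positive_rate": float(np.mean(y_pred[mask] == 1)),
        }
    return metrics

def max_gap(metrics, key):
    """Largest difference in a metric between any two groups (0 = parity)."""
    values = [m[key] for m in metrics.values()]
    return max(values) - min(values)

# Hypothetical labels and predictions for two groups, "a" and "b".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
group = ["a"] * 4 + ["b"] * 4
m = group_metrics(y_true, y_pred, group)
print(m, "accuracy gap:", max_gap(m, "accuracy"))
```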

Multitask learning / Adversarial Training
Start by specifying the sensitive attribute to debias against
Train the model to jointly predict the output and that attribute
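One common way to write these two objectives, sketched with assumed notation: a shared representation h_θ(x), a task model f_θ built on it that predicts the output y, an attribute head g_φ that predicts the sensitive attribute z from h_θ(x), and a trade-off weight λ ≥ 0. In multitask learning both predictions are trained together; in adversarial training the attribute head becomes an adversary, and the model is pushed to make z hard to recover from its representation.

```latex
% Multitask learning: jointly predict the output y and the attribute z
\min_{\theta,\phi}\; \mathcal{L}_y\big(f_\theta(x), y\big) + \lambda\, \mathcal{L}_z\big(g_\phi(h_\theta(x)), z\big)

% Adversarial training: the adversary (\phi) minimises its loss on z,
% while the model (\theta) minimises the task loss and maximises the adversary's loss
\min_{\theta}\, \max_{\phi}\; \mathcal{L}_y\big(f_\theta(x), y\big) - \lambda\, \mathcal{L}_z\big(g_\phi(h_\theta(x)), z\big)
```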

Latent features such as skin color, pose, and illumination
Use a VAE to learn the underlying latent distribution
Estimate the distribution of the latent variables

Approximate each latent variable's distribution with a histogram
Combine them into an estimated joint distribution
Adaptively adjust the resampling probability (rare latent values are sampled more often)
Distribution of dark vs light skin tones
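The histogram-based resampling can be sketched as below, in the spirit of the debiasing VAE (DB-VAE) approach. It assumes the latent means of every training image have already been computed by a trained VAE encoder (not shown), treats the latent dimensions as independent so the joint density is the product of per-dimension histograms, and gives each image a weight proportional to the product of 1 / (density + alpha). The bin count and alpha values are illustrative assumptions.

```python
import numpy as np

def resampling_probs(latents, bins=10, alpha=0.01):
    """Sampling probability for each example, higher for rare latent values.

    latents: array of shape (n_samples, n_latent_dims), e.g. VAE latent means.
    Each dimension's density is approximated by a histogram, the joint density
    by the product over dimensions, and each sample is weighted by
    prod_d 1 / (density_d + alpha)."""
    latents = np.asarray(latents, dtype=float)
    weights = np.ones(latents.shape[0])
    for d in range(latents.shape[1]):
        density, edges = np.histogram(latents[:, d], bins=bins, density=True)
        # Digitizing against the interior edges gives bin indices 0..bins-1,
        # consistent with np.histogram (the last bin includes its right edge).
        idx = np.digitize(latents[:, d], edges[1:-1])
        weights *= 1.0 / (density[idx] + alpha)
    return weights / weights.sum()

# Hypothetical 1-D latent: 95 common examples near 0, 5 rare ones near 5.
latents = np.concatenate([np.zeros(95), np.full(5, 5.0)])[:, None]
p = resampling_probs(latents)
print("share of sampling mass on rare examples:", p[95:].sum())
batch = np.random.default_rng(0).choice(len(p), size=32, p=p)
```

The probabilities are passed as p to choice when drawing each training batch, so images from rare regions of latent space (for example, under-represented skin tones) are seen more often. Recomputing the latents and probabilities as training progresses is what makes the adjustment adaptive.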

Ref - Link

Happy Learning!!!
