Paper #1 - “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
Key Notes
- Poor data quality in high-stakes domains can have outsized effects on vulnerable communities and contexts
- Data Cascades: compounding events causing negative, downstream effects from data issues, resulting in technical debt over time
- Many researchers have pointed to the undervalued human labour that powers AI models
- Practitioners often work with a set of assumptions about their data during analysis and visualisation
- Other frameworks to discover data bugs and clean data include ActiveClean and BoostClean
- Data cascades are complex and long-term, and they occur frequently and persistently
- Under-valuing of data work is common to all of AI development
- Practitioners viewed data as operations, moved fast, and hacked model performance (through hyperparameters rather than data quality)
- Everyone wants to do the model work, not the data work
- It was difficult to get buy-in from clients and funders to invest in good quality data collection and annotation work
- Lack of adequate training on AI data quality
- Cascades triggered by ‘hardware drifts’
- Cascades triggered by ‘environmental drifts’
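The drift-triggered cascades noted above can be made concrete with a small check. This is only a minimal sketch, not a method from the paper: the function names, the brightness readings, and the threshold of 2.0 are all hypothetical, and the score is a plain standardised mean difference between the data a model was trained on and the data it sees in the field.

```python
from statistics import mean, stdev

def drift_score(reference, live):
    """Shift of the live mean from the reference mean,
    measured in reference standard deviations."""
    spread = stdev(reference)
    if spread == 0:
        raise ValueError("reference values have no spread")
    return abs(mean(live) - mean(reference)) / spread

def has_drifted(reference, live, threshold=2.0):
    """Flag drift when the score exceeds a (hypothetical) threshold."""
    return drift_score(reference, live) > threshold

# Hypothetical image-brightness readings: clean training images
# versus images from a different camera or weather in deployment.
training_brightness = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50]
field_brightness = [0.30, 0.28, 0.33, 0.31, 0.29, 0.32]

print(has_drifted(training_brightness, field_brightness))
```

A check like this only catches a shift in one summary statistic; it is meant to show that drift can be monitored on the data side before it shows up as a drop in model performance.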
Paper #2 - Re-imagining Algorithmic Fairness in India and Beyond
Key Notes
- While Indians are part of the AI workforce, a majority work in services, and engineers do not entirely represent marginalities, limiting re-mediation of distances
- While other axes of discrimination and injustices such as disability status
- Algorithms are powerful in India, where the distance between models and oppressed communities is large
- “rich people problems like cardiac disease and cancer, not poor people’s Tuberculosis, prioritised in AI”
More Reads
- Non-portability of Algorithmic Fairness in India
- Auto-Detect: Data-Driven Error Detection in Tables
- BoostClean: Automated Error Detection and Repair for Machine Learning
- Machine Learning-Based Data Cleaning: Current Solutions and Challenges
- Auto-Data Cleaning
- ActiveClean: Interactive Data Cleaning For Statistical Modeling
Keep Thinking!!!