Very good thread. Summarizing the insights:
Can you detect COVID-19 using Machine Learning? 🤔
— Vladimir Haltakov (@haltakov) June 9, 2021
You have an X-ray or CT scan and the task is to detect if the patient has COVID-19 or not. Sounds doable, right?
None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!
Let's see why 👇 pic.twitter.com/Vrd91ZpXy3
Observations from the papers:
- None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!
- 2212 papers, 415 after initial screening, 62 chosen for detailed analysis, 0 with potential for clinical use
- Many papers were using very small datasets often collected from a single hospital - not enough for real evaluation
- Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults
- Some papers trained and tested on the same data
- Many papers failed to disclose the amount of data they were tested on, or important aspects of how their models work, leading to poor reproducibility and biased results
- Many papers didn't even consult with radiologists.
- Rushing to publish results based on small and bad quality datasets undermines the credibility of ML
- At some point people start figuring out how to fine-tune on the test set
- Datasets are neither diverse enough nor bias-free
- The authors find that COVID-19 detectors often attend to the position of the shoulders and not the lungs. Models can easily learn shortcuts instead of robust features
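Several of these pitfalls (training and testing on the same data, single-hospital datasets, the children vs. adults confound) can be caught with a few cheap checks before any model is trained. Below is a minimal sketch in plain Python, not the method used in any of the papers. The record fields (`image_id`, `hospital`, `age_group`, `label`), the helper names, and the toy records are all hypothetical and made up for illustration.

```python
import random
from collections import Counter, defaultdict


def split_by_group(records, group_key, test_fraction=0.3, seed=0):
    """Hold out whole groups (e.g. hospitals) so no group lands in both splits.

    Needs at least two distinct groups, otherwise the train split is empty.
    """
    groups = sorted({r[group_key] for r in records})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


def leaked_ids(train, test, id_key):
    """Return the IDs that appear in both splits; this should be empty."""
    return {r[id_key] for r in train} & {r[id_key] for r in test}


def label_by_attribute(records, attribute, label_key="label"):
    """Count labels per attribute value to expose possible confounds."""
    table = defaultdict(Counter)
    for r in records:
        table[r[attribute]][r[label_key]] += 1
    return {value: dict(counts) for value, counts in table.items()}


# Toy records (hypothetical): every child is non-COVID, every adult is COVID.
records = [
    {"image_id": "a1", "hospital": "A", "age_group": "child", "label": "non-covid"},
    {"image_id": "a2", "hospital": "A", "age_group": "adult", "label": "covid"},
    {"image_id": "b1", "hospital": "B", "age_group": "child", "label": "non-covid"},
    {"image_id": "b2", "hospital": "B", "age_group": "adult", "label": "covid"},
    {"image_id": "c1", "hospital": "C", "age_group": "child", "label": "non-covid"},
    {"image_id": "c2", "hospital": "C", "age_group": "adult", "label": "covid"},
]

train, test = split_by_group(records, "hospital")
print("leaked image ids:", leaked_ids(train, test, "image_id"))
print("labels per age group:", label_by_attribute(records, "age_group"))
```

If the label table shows that an attribute such as age group predicts the label perfectly, as it does in the toy data, a model can score well without ever looking at the lungs. Holding out whole hospitals gives a rougher but more honest estimate of how a model behaves on data from a site it has never seen.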
Take everything with a pinch of salt. Real-world data is not Kaggle data. Kaggle does not reflect the reality, the quality, or the challenges of the data we see in practice.
Keep Exploring!!!