"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

June 10, 2021

The challenges of putting ML models into production (Healthcare)

Very good thread. Summarizing the insights.

Observations from the papers

  • None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!
  • 2212 papers; 415 after initial screening; 62 chosen for detailed analysis; 0 with potential for clinical use
  • Many papers were using very small datasets often collected from a single hospital - not enough for real evaluation
  • Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults
  • Some papers trained and tested on the same data
  • Many papers failed to disclose how much data they were tested on, or important aspects of how their models work, leading to poor reproducibility and biased results
  • Many papers didn't even consult with radiologists.
  • Rushing to publish results based on small and bad quality datasets undermines the credibility of ML
  • At some point people start figuring out how to fine tune on the test set
  • Datasets are not diverse enough and not free of bias
  • Authors find that COVID-19 detectors often attend to the position of the shoulders and not the lungs. Models can easily learn shortcuts as opposed to robust features
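
Two of the pitfalls above (testing on data the model has effectively already seen, and labels that line up with an unrelated attribute such as age) can be checked before any training. Below is a minimal sketch in plain Python, not taken from the thread or the papers. The record fields (`hospital`, `age_group`, `label`), the toy data, and the helper names `group_split` and `label_counts_by` are all hypothetical and only there for illustration.

```python
import random
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.3, seed=0):
    """Split records so that no group (e.g. a hospital) lands in both sets."""
    groups = sorted({r[group_key] for r in records})
    if len(groups) < 2:
        raise ValueError("need at least two groups to hold one out")
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Hold out at least one group, but always leave at least one for training.
    n_test = min(len(groups) - 1, max(1, round(len(groups) * test_fraction)))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


def label_counts_by(records, attribute, label_key="label"):
    """Count labels per attribute value to expose possible confounders."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r[attribute]][r[label_key]] += 1
    return {k: dict(v) for k, v in table.items()}


# Hypothetical toy data mimicking the children-vs-adults problem.
records = [
    {"hospital": "A", "age_group": "child", "label": "non-covid"},
    {"hospital": "A", "age_group": "child", "label": "non-covid"},
    {"hospital": "B", "age_group": "adult", "label": "covid"},
    {"hospital": "B", "age_group": "adult", "label": "covid"},
    {"hospital": "C", "age_group": "child", "label": "non-covid"},
    {"hospital": "C", "age_group": "adult", "label": "covid"},
]

train, test = group_split(records, "hospital")
train_hospitals = {r["hospital"] for r in train}
test_hospitals = {r["hospital"] for r in test}
confound = label_counts_by(records, "age_group")
print(train_hospitals & test_hospitals)  # empty set: no hospital in both splits
print(confound)
```

In this toy data every child is labelled non-covid and every adult covid, so a model could score perfectly by learning age alone. A table like this is only a first warning sign and does not replace a review of the data with radiologists.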

Take everything with a pinch of salt. Real-world data is not Kaggle data. Kaggle does not reflect the reality, the quality, or the challenges we encounter in real data.

Keep Exploring!!!
