"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

June 10, 2021

The challenges of putting ML models in production (Healthcare)

A very good thread. Summarizing the insights here.

Observations from the papers:

  • None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!
  • The screening funnel: 2212 papers found, 415 left after initial screening, 62 chosen for detailed analysis, 0 with potential for clinical use
  • Many papers were using very small datasets often collected from a single hospital - not enough for real evaluation
  • Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults
  • Some papers trained and tested on the same data
  • Many papers failed to disclose how much data their models were tested on, or important aspects of how the models work, leading to poor reproducibility and biased results
  • Many authors didn't even consult radiologists.
  • Rushing to publish results based on small, poor-quality datasets undermines the credibility of ML
  • At some point people start figuring out how to fine-tune on the test set
  • The datasets are neither diverse enough nor free of bias
  • The authors find that COVID-19 detectors often attend to the position of the shoulders rather than the lungs. Models can easily learn shortcuts instead of robust features
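
Two of the problems above (testing on data the model has already seen, and labels that line up with an unrelated attribute such as age) can be checked before any training happens. Below is a minimal sketch in plain Python, not the method used in the thread or the papers. The record fields (`hospital`, `age_group`, `label`), the toy data, and the helper names `split_by_group` and `label_counts_by` are all made up for illustration.

```python
import random
from collections import Counter, defaultdict


def split_by_group(records, group_key, test_fraction=0.25, seed=0):
    """Split records so that no group (e.g. a hospital) lands in both sets."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    if len(groups) < 2:
        raise ValueError("need at least two groups to hold one out")

    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    # Hold out at least one group, but always leave at least one for training.
    n_test = max(1, round(len(group_ids) * test_fraction))
    n_test = min(n_test, len(group_ids) - 1)
    test_ids = set(group_ids[:n_test])

    train = [r for g in group_ids if g not in test_ids for r in groups[g]]
    test = [r for g in group_ids if g in test_ids for r in groups[g]]
    return train, test


def label_counts_by(records, attribute, label_key="label"):
    """Count labels per attribute value to spot obvious confounders."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[attribute]][rec[label_key]] += 1
    return counts


# Hypothetical toy data that mimics the flawed setup described above:
# every COVID image comes from an adult, every non-COVID image from a child.
records = []
for i in range(8):
    is_adult = i % 2 == 0
    records.append({
        "id": i,
        "hospital": f"H{i % 4}",
        "age_group": "adult" if is_adult else "child",
        "label": "covid" if is_adult else "non-covid",
    })

train, test = split_by_group(records, "hospital")
train_hospitals = {r["hospital"] for r in train}
test_hospitals = {r["hospital"] for r in test}
print("hospital overlap:", train_hospitals & test_hospitals)

for age_group, label_counts in sorted(label_counts_by(records, "age_group").items()):
    print(age_group, dict(label_counts))
```

If an attribute like age group predicts the label perfectly in a table like this, a model can score well without ever looking at the lungs. Holding out whole hospitals is only a rough stand-in for evaluation on truly external data.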

Take everything with a pinch of salt. Real-world data is not Kaggle data. Kaggle does not reflect the reality, the quality, or the challenges we find in real data.

Keep Exploring!!!