ML, biomedical data & trust

Journal club and talk given to Health Data Analytics MSc, February 2023. Reflecting on how to do good machine learning over biomedical data, the pitfalls and good practices

  1. Machine learning, biomedical data & trust. Paul Agapow (Statistics & Data Science Innovation Hub), gsk.com. AI & Big Data Expo, London
  2. Background & disclaimer
     • Previously a health informatician, biomedical ML researcher, bioinformatician, “computer guy”, disease chaser, epi-informatician, phylogeneticist, evolutionary biologist, immunologist, biochemist …
     • Now a director @GSK
     • This presentation does not reflect thought, policy or projects in progress at GSK
     • There are no conflicts of interest
  3. “AI will not replace drug hunters, but drug hunters who don’t use AI will be replaced by those who do.” - Andrew Hopkins, CEO, Exscientia (slide dated 10 June 2021)
  4. (No text on this slide.)
  5. 3 hurdles to using AI/ML in therapy development (07 February 2023)
     • Biological & physiological complexity
     • Insufficient & uneven data
     • A gap between AI/ML practice & medical needs
  6. To make a new drug, you must first solve for everything
  7. The complexity of biology (slide dated 12 July 2021):
     • About 50 trillion cells of 200 types
     • Each cell has 23 pairs of chromosomes
     • In total 6.4 billion basepairs (positions)
     • Organised into about 18,000 genes (or maybe more like 40,000 genes)
     • Genetic material elsewhere in the cell
     • Epigenetic modification
     • 1 million different types of molecules
     • Lifestyle & history
     • Exposure & environment
     • Immune system repertoire & priming
     • …
     Of which we know only a fraction
  8. The data types and sources we need are myriad & varied (Hughes et al. (2010), “Principles of early drug discovery”)
  9. There are many different means to the same end (McKinsey, EvaluatePharma 2022)
     • There are many different modalities of intervention
     • With different (data) considerations & different levels of ML experience
  10. It’s often not the right data
      • Difficult / expensive to generate
      • Unstructured
      • Unlabeled
      • The wrong type
      • Sparse, unevenly sampled
      • WEIRD (drawn from Western, Educated, Industrialised, Rich, Democratic populations)
      • In different formats and silos
  11. A disconnect between AI/ML practice and medical needs: academic focus on problems with low medical value (Melanie Mitchell via Dagmar Monett)
  12. A disconnect between AI/ML practice and medical needs: a tendency to treat biomedicine as simply a data / ML problem
      • There are many models that work perfectly … in the lab
      • Why?
        - Unrealistic or poor training data
        - Emphasis on hitting metrics
  13. The classic analytical tension
      • What we tend to solve: easy things; available, ideal data; ground truth; simplify; “interesting”; “table-land”
      • What we need to solve: useful things; incomplete, messy data; unclear biological reality; uncertain findings; needful; “network-land”
  14. A disconnect between AI/ML practice and medical needs: many “good” models are not fit for production (Laure Wynants via Maarten van Smeden)
  15. COVID was a lightning rod for bad biomedical ML
      • The pandemic prompted a flood of publications & preprints
      • Most were plagued by the usual biomedical AI problems
      • … and were also produced by those outside the field
      • As a general principle, any paper applying ML to COVID is terrible
      • Bad models in a crisis situation are not neutral: they distract, expend effort and carry an opportunity cost
  16. “Interpretable Prediction of Severity & Crucial Factors of COVID Patients”, Zheng et al., BioMed Research International (2021), DOI: 10.1155/2021/8840835
      • What does it purport to do? Find risk factors associated with deterioration of COVID patients
      • Why? Better / faster assessment of incoming patients
      • Who? Patients admitted to two hospitals with a positive PCR test for COVID and a CT scan showing lesions
      • Data? Demographics, bloods, labs, breathing / oxygen scores, manually scored CT scans
  17. Thoughts and questions (not necessarily faults, not all easily answerable)
      • Conflates diagnosis & prognosis
      • The cohort:
        - Suggested this can replace PCR, but the cohort is selected by PCR result
        - The act of taking a CT scan in some ways selects the cohort
        - Unclear when some readings were taken, when we are looking at deterioration
        - Is the training set the set that a model might be used on in the clinic?
        - Not many critical cases, so actually testing for severe cases
        - What’s the split between hospitals?
        - Patients are different already: pre-existing conditions
        - Association with age & general health
        - Old patients running a temperature with lesioned lungs do poorly
      • Clinical use:
        - Will all this data be available in a timely fashion for a model in the clinic?
        - If the severity is based on bloods & oxygenation readings, why not just use them?
        - Information complexity?
      • Validation:
        - Would it work for another time period at the same hospitals? At other hospitals?
      • Analytics:
        - “The impenetrable wall of math”
        - XGBoost is always a good place to start
        - Ensemble methods usually are
        - Feature interaction?
        - Some features overlap (neutrophils, n. ratio, NLR)
        - What features correlate?
        - No attempt to simplify model
        - Any model is interpretable with SHAP
      • Still useful for intrinsic / research purposes
  18. Some principles for better biomedical ML
      • Models will always tell you the truth
        - But it’s the truth conditioned on the data they’ve seen
        - It might not be the truth you think
      • Biomedical data is complex; it always comes with a context
      • Patients are complex; they always come with a medical history
      • How were these patients selected?
      • What is this model actually saying and why?
      • Does this model replicate in other populations?
      • But despite all this, we have to make and actionably interpret models
  19. Why not join us? Academic Press (2021)
  20. Some light reading. Academic Press (2021)
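
Slides 17 and 18 ask whether a model would work at other hospitals and whether it replicates in other populations. One way to probe that is to hold out each hospital in turn, as in the minimal sketch below. Everything in it is an assumption made for illustration: the records are made up, the "model" is a single age cut-off rather than anything from Zheng et al., and the names `leave_one_site_out`, `fit_threshold` and `accuracy` are hypothetical helpers.

```python
from collections import defaultdict


def leave_one_site_out(records, site_key="hospital"):
    """Yield (held_out_site, train_rows, test_rows), holding out each site once."""
    by_site = defaultdict(list)
    for row in records:
        by_site[row[site_key]].append(row)
    for site in sorted(by_site):
        train = [row for other in sorted(by_site) if other != site
                 for row in by_site[other]]
        yield site, train, by_site[site]


def accuracy(rows, threshold):
    """Share of rows where the rule 'age >= threshold' matches the 'severe' label."""
    return sum((row["age"] >= threshold) == row["severe"] for row in rows) / len(rows)


def fit_threshold(train):
    """Toy 'model': the age cut-off with the best accuracy on the training rows."""
    candidates = sorted({row["age"] for row in train})
    return max(candidates, key=lambda t: accuracy(train, t))


# Made-up records: hospital B sees severe cases at younger ages than hospital A.
records = (
    [{"hospital": "A", "age": a, "severe": s}
     for a, s in [(40, False), (50, False), (60, False), (70, True), (80, True)]]
    + [{"hospital": "B", "age": a, "severe": s}
       for a, s in [(30, False), (45, True), (55, True), (65, True), (75, True)]]
)

results = {}
for site, train, test in leave_one_site_out(records):
    threshold = fit_threshold(train)
    results[site] = {
        "threshold": threshold,
        "train_accuracy": accuracy(train, threshold),
        "held_out_accuracy": accuracy(test, threshold),
    }

for site, summary in results.items():
    print(site, summary)
```

On this toy data the cut-off fits its training hospital perfectly and does much worse on the held-out one, which is the "works perfectly … in the lab" pattern from slide 12 in miniature.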

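
Slide 17 also notes that some features overlap (neutrophils, n. ratio, NLR) and asks which features correlate. A rough first check is a pairwise Pearson correlation that flags near-duplicate features, sketched below. The values are invented, the feature names (`neutrophil_ratio`, `nlr`, `crp`) and the 0.9 cut-off are assumptions, and `pearson` and `correlated_pairs` are hypothetical helpers rather than any library's API.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(table, threshold=0.9):
    """Return (feature_a, feature_b, r) for pairs with |r| at or above the threshold."""
    flagged = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Invented readings for five patients; not taken from any real dataset.
features = {
    "neutrophils": [2, 4, 6, 8, 10],
    "neutrophil_ratio": [40, 50, 60, 70, 80],
    "nlr": [1.0, 2.1, 2.9, 4.2, 5.0],
    "crp": [5, 1, 4, 2, 3],
}

for a, b, r in correlated_pairs(features):
    print(f"{a} ~ {b}: r = {r}")
```

Pairs flagged this way are candidates for dropping or merging before fitting, which is one route to the simpler model the slide finds missing. Correlation only catches linear overlap, so it is a starting point rather than a full answer to the feature-interaction question.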