GTC Europe 2017 - How to predict ICU mortality with digital health data

•

0 gefällt mir•84 views

In this presentation, we will discuss modeling electronic health record (EHR) data with deep learning and Deeplearning4j (DL4J). We draw inspiration from recent research showing that carefully designed neural network architectures can learn effectively from the complex, messy data collected in EHRs. Specifically, we describe how to train a long short-term memory recurrent neural network (LSTM RNN) to predict in-hospital mortality among patients hospitalized in the intensive care unit (ICU). Of particular note, our results show that even for a dataset of moderate size, the LSTM is competitive with alternative approaches, including decision trees and multilayer perceptrons, using hand-engineering features. We will also show how to parallelize model training on a Spark cluster. Finally, we will highlight potential extensions of this work and other use cases for EHR data and deep learning. All code and data are publicly available, so that attendees may reproduce our work.

Technologie

How to predict ICU mortality with
digital health data
Max Pumperla
Munich, October 11th 2017

Founded 2014
Clients 14 Enterprises
3,500 GH Forks, 7,200 Stars
300,000+ DL4J downloads/mo.
Team ~35; mostly engineers; 6 PhDs
COMPANY OVERVIEW
Deeplearning4j
Build, train, and deploy neural
networks on JVM
ND4J
High performance linear algebra CPU
and GPU libraries
DataVec
Data ingestion, normalization, and
vectorization

$HEALTHCARE AT SKYMIND ● Part of “Healthy China 2030” initiative ○ Expand health service industry to $2.35 trillion ○ Pilot project with 3600 hospitals in Fuzhou ○ 20 leading industry partners around CEC ○ Fatty liver disease detection ○ Bone fracture detection ○ Other use cases to come ● Intensive care unit (ICU) mortality prediction$

DIGITAL HEALTH DATA ADOPTION
Randall Wetzel, Virtual PICU, Children’s Hospital LA
“The patient is data” The patient!

DATA RECORDED IN EHR IN ICU
● Patient-level info (e.g., age, gender)
● Physiologic measurements (e.g., heart rate)
● Lab results (e.g., glucose)
● Clinical assessments (e.g., glasgow coma scale)
● Medications and treatments (one treatment:
mechanical ventilation)
● Clinical notes
● Diagnoses
● Outcomes (one outcome: mortality)

PHYSIONET DATA
● Data from 12000 ICU stays published 2012
○ Only 4000 labeled records publicly available
○ 4000 unlabeled records used for tuning during
competition (we didn’t use)
○ 4000 test examples not available
● Binary outcome: in-hospital survival or mortality
(~13% mortality)
● Sequences vary in length from hours to weeks
● Observations begin at time of admission, not at
onset of illness
General descriptors
• ID
• age
• gender
• weight
• height
• ICU type (cardiac,
medical, surgical, trauma)
Time-series data
• blood pressure
• blood glucose
• etc.

PHYSIONET CHALLENGE
Goal: Advances toward accurate
patient-specific predictive models
Task: Predict mortality from first 48
hours of data
Challenges:
• Long-term dependencies
• Temporal outcome
• Class imbalances
• Measuring treatment effects
• Missing data
• Irregularly sampled data

PREPROCESSING & DATA IMPUTATION
1. Flatten data: (timestamp, variable name, value)
2. One-hot encoding of categorical variables
3. Rescaling to range [0,1] of real-valued variables
4. Missing values:
a. Carry forward imputation (suggested here)
b. “Missing” flag to indicate imputation
5. Replicate descriptors across time
6. Do not resample to fixed timestamps

RECURRENT NEURAL NETWORKS
● Recurrent neural networks (RNNs) great for modeling temporal
structure in data
● Very flexible in input & output data
•Predict next word (language
modeling)
•Translate from English to
French (machine translation)
•Predict patient outcome from
sequence (today)
Generate beer review from
category, score

BUILDING A MODEL WITH DL4J
● Using long short-term memory networks (LSTMs,
Hochreiter & Schmidhuber, 1997)
● Training done with mini-batch stochastic gradient
descent
source: http://colah.github.io

DISTRIBUTED TRAINING WITH SPARK
● DL4J scale-out module using Spark
● Data-parallel training procedure
● Parameter averaging on master
● Deployment on CDH cluster (Spark on
YARN)

REFERENCES
● Blog article on cloudera blog
● O’Reilly AI course on deep learning for time
series
● Strata NYC 2017 Tutorial: Securely building
deep learning models for digital health data
● Machine Learning for Healthcare
Conference: http://mucmd.org/
● Lipton, et al.: A Critical Review of RNNs
● Harutyunyan, et al. Multitask Learning and
Benchmarking with Clinical Time Series
Data.

max@skymind.io
twitter: maxpumperla
github: maxpumperla

Empfohlen

Predicting Hospital ReadmissionsDerek Christensen

Final-PresentationRevanth Malay

Natural Language Processing to Curate Unstructured Electronic Health RecordsMMS Holdings

Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Bigfinite

Cri big dataPutchong Uthayopas

Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith

Machine learning and Internet of Things, the future of medical preventionPierre Gutierrez

High Performance Computing and the Opportunity with Cognitive TechnologyIBM Watson

Empfohlen

Predicting Hospital ReadmissionsDerek Christensen

Final-PresentationRevanth Malay

Natural Language Processing to Curate Unstructured Electronic Health RecordsMMS Holdings

Maximize Your Understanding of Operational Realities in Manufacturing with Pr...Bigfinite

Cri big dataPutchong Uthayopas

Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith

Machine learning and Internet of Things, the future of medical preventionPierre Gutierrez

High Performance Computing and the Opportunity with Cognitive TechnologyIBM Watson

Computer-Aided Detection (1).pptxMohammedMasliuddin

Meaningful (meta)data at scale: removing barriers to precision medicine researchNolan Nichols

Diabetes prediciton model ppt.pptsatvikpatil5

HPC and Precision Medicine: A New Framework for Alzheimer's and Parkinson'sinside-BigData.com

Introduction to High-performance In-memory Genome Project at HPI Matthieu Schapranow

Big Medical Data – Challenge or Potential?Matthieu Schapranow

A Platform for Integrated Genome Data AnalysisMatthieu Schapranow

Supporting high throughput high-biotechnologies in today’s research environme...Ed Dodds

Predicting Patient Outcomes in Real-Time at HCASri Ambati

Deep Learning for AI (3)Dongheon Lee

Pistoia alliance debates analytics 15-09-2015 16.00Pistoia Alliance

10th Annual Utah's Health Services Research Conference - Data Quality in Mult...Utah's Annual Health Services Research Conference

Working With Large-Scale Clinical DatasetsCraig Smail

Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...Data Con LA

Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...Kevin Mader

Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...Nick Brown

Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit

LIMS in Modern Molecular Pathology by Dr. Perry MaxwellCirdan

Meaningful use stage 3 - Nalashaa capabilitiesNalashaa Healthcare Solutions

Informatics and the merging of research and quality measures with bedside careMike Hogarth, MD, FACMI, FACP

Data science in practiceMax Pumperla

Snakes on a plane - Ship your Python on enterprise machinesMax Pumperla

Weitere ähnliche Inhalte

Ähnlich wie GTC Europe 2017 - How to predict ICU mortality with digital health data

Computer-Aided Detection (1).pptxMohammedMasliuddin

Meaningful (meta)data at scale: removing barriers to precision medicine researchNolan Nichols

Diabetes prediciton model ppt.pptsatvikpatil5

HPC and Precision Medicine: A New Framework for Alzheimer's and Parkinson'sinside-BigData.com

Introduction to High-performance In-memory Genome Project at HPI Matthieu Schapranow

Big Medical Data – Challenge or Potential?Matthieu Schapranow

A Platform for Integrated Genome Data AnalysisMatthieu Schapranow

Supporting high throughput high-biotechnologies in today’s research environme...Ed Dodds

Predicting Patient Outcomes in Real-Time at HCASri Ambati

Deep Learning for AI (3)Dongheon Lee

Pistoia alliance debates analytics 15-09-2015 16.00Pistoia Alliance

10th Annual Utah's Health Services Research Conference - Data Quality in Mult...Utah's Annual Health Services Research Conference

Working With Large-Scale Clinical DatasetsCraig Smail

Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...Data Con LA

Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...Kevin Mader

Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...Nick Brown

Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit

LIMS in Modern Molecular Pathology by Dr. Perry MaxwellCirdan

Meaningful use stage 3 - Nalashaa capabilitiesNalashaa Healthcare Solutions

Informatics and the merging of research and quality measures with bedside careMike Hogarth, MD, FACMI, FACP

Ähnlich wie GTC Europe 2017 - How to predict ICU mortality with digital health data (20)

Computer-Aided Detection (1).pptx

Meaningful (meta)data at scale: removing barriers to precision medicine research

Diabetes prediciton model ppt.ppt

HPC and Precision Medicine: A New Framework for Alzheimer's and Parkinson's

Introduction to High-performance In-memory Genome Project at HPI

Big Medical Data – Challenge or Potential?

A Platform for Integrated Genome Data Analysis

Supporting high throughput high-biotechnologies in today’s research environme...

Predicting Patient Outcomes in Real-Time at HCA

Deep Learning for AI (3)

Pistoia alliance debates analytics 15-09-2015 16.00

10th Annual Utah's Health Services Research Conference - Data Quality in Mult...

Working With Large-Scale Clinical Datasets

Data Con LA 2019 - Best Practices for Prototyping Machine Learning Models for...

Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...

Impact of Big Data & Artificial Intelligence in Drug Discovery & Development ...

Big Data at Geisinger Health System: Big Wins in a Short Time

LIMS in Modern Molecular Pathology by Dr. Perry Maxwell

Meaningful use stage 3 - Nalashaa capabilities

Informatics and the merging of research and quality measures with bedside care

Mehr von Max Pumperla

Data science in practiceMax Pumperla

Snakes on a plane - Ship your Python on enterprise machinesMax Pumperla

Bridging the gap in enterprise AIMax Pumperla

Hitchhiker's guide to the kerasverseMax Pumperla

From Experimentation to Production - Scala & Python APIs for DL4JMax Pumperla

Deep Recommender systems - Shibsted, OsloMax Pumperla

EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4jMax Pumperla

Mehr von Max Pumperla (7)

Data science in practice

Snakes on a plane - Ship your Python on enterprise machines

Bridging the gap in enterprise AI

Hitchhiker's guide to the kerasverse

From Experimentation to Production - Scala & Python APIs for DL4J

Deep Recommender systems - Shibsted, Oslo

EclipseCon 2017 - Introduction to Machine Learning with Eclipse Deeplearning4j

Kürzlich hochgeladen

Automating Google Workspace (GWS) & more with Apps Scriptwesley chun

How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes

Finology Group – Insurtech Innovation Award 2024The Digital Insurer

Artificial Intelligence: Facts and MythsJoaquim Jorge

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls

A Call to Action for Generative AI in 2024Results

Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer

Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC

08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls

The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad

The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los

IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

How to convert PDF to text with Nanonetsnaman860154

A Year of the Servo Reboot: Where Are We Now?Igalia

Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun

Scaling API-first – The story of a global engineering organizationRadu Cotescu

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script

How to Troubleshoot Apps for the Modern Connected Worker

Finology Group – Insurtech Innovation Award 2024

Artificial Intelligence: Facts and Myths

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

2024: Domino Containers - The Next Step. News from the Domino Container commu...

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

A Call to Action for Generative AI in 2024

Tata AIG General Insurance Company - Insurer Innovation Award 2024

Breaking the Kubernetes Kill Chain: Host Path Mount

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men

The Codex of Business Writing Software for Real-World Solutions 2.pptx

The 7 Things I Know About Cyber Security After 25 Years | April 2024

IAC 2024 - IA Fast Track to Search Focused AI Solutions

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

How to convert PDF to text with Nanonets

A Year of the Servo Reboot: Where Are We Now?

Data Cloud, More than a CDP by Matt Robison

Scaling API-first – The story of a global engineering organization

GTC Europe 2017 - How to predict ICU mortality with digital health data

1. How to predict ICU mortality with digital health data Max Pumperla Munich, October 11th 2017

2. Founded 2014 Clients 14 Enterprises 3,500 GH Forks, 7,200 Stars 300,000+ DL4J downloads/mo. Team ~35; mostly engineers; 6 PhDs COMPANY OVERVIEW Deeplearning4j Build, train, and deploy neural networks on JVM ND4J High performance linear algebra CPU and GPU libraries DataVec Data ingestion, normalization, and vectorization

3. HEALTHCARE AT SKYMIND ● Part of “Healthy China 2030” initiative ○ Expand health service industry to $2.35 trillion ○ Pilot project with 3600 hospitals in Fuzhou ○ 20 leading industry partners around CEC ○ Fatty liver disease detection ○ Bone fracture detection ○ Other use cases to come ● Intensive care unit (ICU) mortality prediction

4. DIGITAL HEALTH DATA ADOPTION Randall Wetzel, Virtual PICU, Children’s Hospital LA “The patient is data” The patient!

5. DATA RECORDED IN EHR IN ICU ● Patient-level info (e.g., age, gender) ● Physiologic measurements (e.g., heart rate) ● Lab results (e.g., glucose) ● Clinical assessments (e.g., glasgow coma scale) ● Medications and treatments (one treatment: mechanical ventilation) ● Clinical notes ● Diagnoses ● Outcomes (one outcome: mortality)

6. PHYSIONET DATA ● Data from 12000 ICU stays published 2012 ○ Only 4000 labeled records publicly available ○ 4000 unlabeled records used for tuning during competition (we didn’t use) ○ 4000 test examples not available ● Binary outcome: in-hospital survival or mortality (~13% mortality) ● Sequences vary in length from hours to weeks ● Observations begin at time of admission, not at onset of illness General descriptors • ID • age • gender • weight • height • ICU type (cardiac, medical, surgical, trauma) Time-series data • blood pressure • blood glucose • etc.

7. PHYSIONET CHALLENGE Goal: Advances toward accurate patient-specific predictive models Task: Predict mortality from first 48 hours of data Challenges: • Long-term dependencies • Temporal outcome • Class imbalances • Measuring treatment effects • Missing data • Irregularly sampled data

8. PREPROCESSING & DATA IMPUTATION 1. Flatten data: (timestamp, variable name, value) 2. One-hot encoding of categorical variables 3. Rescaling to range [0,1] of real-valued variables 4. Missing values: a. Carry forward imputation (suggested here) b. “Missing” flag to indicate imputation 5. Replicate descriptors across time 6. Do not resample to fixed timestamps

9. RECURRENT NEURAL NETWORKS ● Recurrent neural networks (RNNs) great for modeling temporal structure in data ● Very flexible in input & output data •Predict next word (language modeling) •Translate from English to French (machine translation) •Predict patient outcome from sequence (today) Generate beer review from category, score

10. RNNS FOR TIME-SERIES DATA

11. RNNS FOR TIME-SERIES DATA

12. RNNS FOR TIME-SERIES DATA

13. RNNS FOR TIME-SERIES DATA

14. RNNS FOR TIME-SERIES DATA

15. RNNS FOR TIME-SERIES DATA

16. RNNS FOR TIME-SERIES DATA

17. BUILDING A MODEL WITH DL4J ● Using long short-term memory networks (LSTMs, Hochreiter & Schmidhuber, 1997) ● Training done with mini-batch stochastic gradient descent source: http://colah.github.io

18. DISTRIBUTED TRAINING WITH SPARK ● DL4J scale-out module using Spark ● Data-parallel training procedure ● Parameter averaging on master ● Deployment on CDH cluster (Spark on YARN)

19. EVALUATION & EXPERIMENTAL RESULTS

20. REFERENCES ● Blog article on cloudera blog ● O’Reilly AI course on deep learning for time series ● Strata NYC 2017 Tutorial: Securely building deep learning models for digital health data ● Machine Learning for Healthcare Conference: http://mucmd.org/ ● Lipton, et al.: A Critical Review of RNNs ● Harutyunyan, et al. Multitask Learning and Benchmarking with Clinical Time Series Data.

21. max@skymind.io twitter: maxpumperla github: maxpumperla