Modeling electronic health records with recurrent neural networks
David C. Kale (1, 2), Zachary C. Lipton (3), Josh Patterson (4)
STRATA - San Jose - 2016
1 University of Southern California
2 Virtual PICU, Children’s Hospital Los Angeles
3 University of California San Diego
4 Patterson Consulting
Outline
• Machine (and deep) learning
• Sequence learning with recurrent neural networks
• Clinical sequence classification using LSTM RNNs
• A real world case study using DL4J
• Conclusion and looking forward
We need functions, brah
Various Inputs and Outputs
“Time underlies many interesting human behaviors”
[Figure: functions mapping various inputs to outputs such as {0,1}, {A,B,C…}, captions, “email mom”, “fire nukes”, “eject pop tart”]
But how do we produce functions?
We need a function for that…
One function-generator:
Programmers
Which are expensive
When/why does this
fail?
• Sometimes the correct function cannot be
encoded a priori — (what is spam?)
• The optimal solution might change over time
• Programmers are expensive
Sometimes We Need to
Learn These Functions From
Data
One Class of Learnable Functions:
Feedforward Neural Network
Artificial Neurons
Activation Functions
• At internal nodes common choices for the activation
function are the sigmoid, tanh, and ReLU functions.
• At output, activation function could be linear
(regression), sigmoid (multilabel classification) or
softmax (multi-class classification)
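As a rough illustration of the activation functions named above, here is a minimal sketch in plain Java. The class and method names are hypothetical and not part of DL4J or of the talk's code.

```java
import java.util.Arrays;

/** Illustrative activation functions; all names here are hypothetical. */
final class Activations {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double tanh(double z) { return Math.tanh(z); }

    static double relu(double z) { return Math.max(0.0, z); }

    /** Softmax over a score vector, shifted by the max for numerical stability. */
    static double[] softmax(double[] z) {
        double max = Arrays.stream(z).max().orElse(0.0);
        double[] out = new double[z.length];
        double sum = 0.0;
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < z.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sigmoid(0.0));
        System.out.println(Arrays.toString(softmax(new double[] {1.0, 2.0, 3.0})));
    }
}
```

Sigmoid outputs can be read as independent per-label probabilities (multilabel), while softmax outputs sum to one across classes (multi-class).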
Training with Backpropagation
• Goal: calculate the rate of change of the loss
function with respect to each parameter (weight) in
the model
• Update the weights by gradient following:
Forward Pass
Backward Pass
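To make the forward pass, backward pass, and weight update concrete, here is a minimal sketch for a single sigmoid neuron trained with cross-entropy loss. It assumes the plain gradient descent rule w ← w − η · ∂L/∂w (the talk's own code uses RMSProp); the class name, method names, and learning rate are hypothetical.

```java
/** One sigmoid neuron trained by gradient descent; all names are hypothetical. */
final class TinyBackprop {
    /** Forward pass: predicted probability for scalar input x. */
    static double forward(double w, double b, double x) {
        return 1.0 / (1.0 + Math.exp(-(w * x + b)));
    }

    /** Cross-entropy loss for a binary label y in {0, 1}. */
    static double loss(double p, double y) {
        return -(y * Math.log(p) + (1.0 - y) * Math.log(1.0 - p));
    }

    /** Backward pass plus one update; returns {w, b} after a step of size lr. */
    static double[] step(double w, double b, double x, double y, double lr) {
        double p = forward(w, b, x);
        double dz = p - y;   // dLoss/dz for sigmoid combined with cross-entropy
        double dw = dz * x;  // chain rule: dLoss/dw = dLoss/dz * dz/dw
        double db = dz;      // dz/db = 1
        return new double[] {w - lr * dw, b - lr * db};
    }

    public static void main(String[] args) {
        double w = 0.0, b = 0.0;
        for (int i = 0; i < 5; i++) {
            double[] wb = step(w, b, 1.0, 1.0, 0.1);
            w = wb[0];
            b = wb[1];
            System.out.println(loss(forward(w, b, 1.0), 1.0));
        }
    }
}
```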
Deep Networks
• Used to be difficult (seemed impossible) to train
nets with many hidden layers
• TLDR: Turns out we just needed to do everything
1000x faster…
Outline
• Machine (and deep) learning
• Sequence learning with recurrent neural networks
• Clinical sequence classification using LSTM RNNs
• A real world case study using DL4J
• Conclusion and looking forward
Feedforward Nets work for
Fixed-Size Data
Less Suitable for Text
We would like to capture temporal/sequential dynamics in the data
• Standard approaches address sequential structure:
– Markov models
– Conditional Random Fields
– Linear dynamical systems
• Problem: we desire a system to learn representations, capture nonlinear structure, and capture long term sequential relationships.
To Model Sequential Data:
Recurrent Neural Networks
Recurrent Net (Unfolded)
Vanishing / Exploding
Gradients
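A toy way to see the problem, under a strong simplifying assumption: for the scalar linear recurrence s_t = w · s_{t-1}, the gradient of s_T with respect to s_0 is w^T, which shrinks toward zero when |w| < 1 and blows up when |w| > 1. The class and method names below are hypothetical.

```java
/** Toy illustration of gradient scaling through time; names are hypothetical. */
final class GradientScale {
    /** d s_T / d s_0 for the scalar linear recurrence s_t = w * s_{t-1}. */
    static double gradientThroughTime(double w, int steps) {
        double g = 1.0;
        for (int t = 0; t < steps; t++) {
            g *= w; // one factor of w per timestep (chain rule)
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(gradientThroughTime(0.5, 10)); // shrinks
        System.out.println(gradientThroughTime(2.0, 10)); // grows
    }
}
```

Real RNNs multiply by Jacobian matrices rather than a scalar, but the same repeated-product effect applies.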
LSTM Memory Cell
(Hochreiter & Schmidhuber, 1997)
Memory Cell with Forget Gate
(Gers et al., 2000)
LSTM Forward Pass
LSTM (full network)
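The forward pass of a memory cell with a forget gate can be sketched as follows. This is a simplified single-unit version with scalar input and state, written for illustration only: it omits the peephole connections that a Graves-style LSTM includes, and the class name and weight layout are assumptions.

```java
/** Single-unit LSTM cell with a forget gate; names and weight layout are hypothetical. */
final class LstmCell {
    // Per-gate parameters in the order: input gate, forget gate, output gate, candidate.
    private final double[] wx; // weights on the input x_t
    private final double[] wh; // weights on the previous hidden state h_{t-1}
    private final double[] b;  // biases

    LstmCell(double[] wx, double[] wh, double[] b) {
        this.wx = wx;
        this.wh = wh;
        this.b = b;
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One timestep; returns {h_t, c_t}. */
    double[] step(double x, double hPrev, double cPrev) {
        double i = sigmoid(wx[0] * x + wh[0] * hPrev + b[0]);   // input gate
        double f = sigmoid(wx[1] * x + wh[1] * hPrev + b[1]);   // forget gate
        double o = sigmoid(wx[2] * x + wh[2] * hPrev + b[2]);   // output gate
        double g = Math.tanh(wx[3] * x + wh[3] * hPrev + b[3]); // candidate value
        double c = f * cPrev + i * g; // keep part of the old memory, add new content
        double h = o * Math.tanh(c);  // expose a gated view of the memory
        return new double[] {h, c};
    }

    public static void main(String[] args) {
        LstmCell cell = new LstmCell(new double[4], new double[4], new double[4]);
        double[] hc = cell.step(1.0, 0.0, 2.0);
        System.out.println(hc[0] + " " + hc[1]);
    }
}
```

The additive update of the cell state c is what lets gradients flow across many timesteps without being repeatedly squashed.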
Large Scale Architecture
[Figure panels: standard supervised learning; image captioning; sentiment analysis; video captioning, natural language translation; part of speech tagging; generative models for text]
Outline
• Machine (and deep) learning
• Sequence learning with recurrent neural networks
• Clinical sequence classification using LSTM RNNs
• A real world case study using DL4J
• Conclusion and looking forward
ICU data generated in hospital
• Patient-level info (e.g., age, gender)
• Physiologic measurements (e.g., heart rate)
– Manually verified observations
– High-frequency measurements
– Waveforms
• Lab results (e.g., glucose)
• Clinical assessments (e.g., Glasgow Coma Scale)
• Medications and treatments
• Clinical notes
• Diagnoses
• Outcomes
• Billing codes
ICU data gathered in EHR
• Patient-level info (e.g., age, gender)
• Physiologic measurements (e.g., vital signs)
– Manually verified observations
– High-frequency measurements
– Waveforms
• Lab results (e.g., glucose)
• Clinical assessments (e.g., Glasgow Coma Scale)
• Medications and treatments
• Clinical notes
• Diagnoses (often buried in free text notes)
• Outcomes
• Billing codes
ICU data in our experiments
• Patient-level info (e.g., age, gender)
• Physiologic measurements (e.g., vital signs)
– Manually verified observations
– High-frequency measurements
– Waveforms
• Lab results (e.g., glucose)
• Clinical assessments (e.g., cognitive function)
• One treatment: mechanical ventilation
• Clinical notes
• Diagnoses (often buried in free text notes)
• Outcomes: in-hospital mortality
• Billing codes
Challenges: sampling rates, missingness
• Sparse, irregular, unaligned sampling in time, across variables
• Sample selection bias (e.g., more likely to record abnormal)
• Entire sequences (non-random) missing
[Figure: HR, RR, and ETCO2 series plotted from admission to discharge. Figures courtesy of Ben Marlin, UMass Amherst]
Challenges: alignment, variable length
• Observations begin at time of admission, not at onset of illness
• Sequences vary in length from hours to weeks (or longer)
• Variable dynamics across patients, even with same disease
• Long-term dependencies: future state depends on earlier condition
[Figure: two HR series, each plotted from admission to discharge. Figures courtesy of Ben Marlin, UMass Amherst]
PhysioNet Challenge 2012
• Task: predict mortality from only first 48 hours of data
• Classic models (SAPS, APACHE, PRISM): expert features + regression
• Useful: quantifying illness at admission, standardized performance
• Not accurate enough to be used for decision support
• Each record includes
– patient descriptors (age, gender, weight, height, unit)
– irregular sequences of ~40 vitals, labs from first 48 hours
– one treatment variable: mechanical ventilation
– binary outcome: in-hospital survival or mortality (~13% mortality)
• Only 4000 labeled records publicly available (“set A”)
• 4000 unlabeled records (“set B”) used for tuning during competition (we didn’t use)
• 4000 test examples (“set C”) not available
• Very challenging task: temporal outcome, unobserved treatment effects
• Winning entry score: minimum(Precision, Recall) = 0.5353
https://www.physionet.org/challenge/2012/
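The min(Precision, Recall) score quoted above can be computed from binary predictions as in the sketch below. The class name and the handling of empty denominators (scored as 0) are assumptions for illustration, not the challenge's official scoring code.

```java
/** min(Precision, Recall) for binary predictions; names are hypothetical. */
final class MinPrecisionRecall {
    static double score(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;        // correctly predicted death
            else if (predicted[i] && !actual[i]) fp++;  // predicted death, patient survived
            else if (!predicted[i] && actual[i]) fn++;  // missed death
        }
        // Assumption: an undefined ratio (zero denominator) is scored as 0.
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        return Math.min(precision, recall);
    }

    public static void main(String[] args) {
        boolean[] actual = {true, true, true, false, false};
        boolean[] predicted = {true, true, false, true, false};
        System.out.println(score(actual, predicted));
    }
}
```

Taking the minimum penalizes a model that buys high recall with many false alarms, or high precision by flagging almost nobody, which matters when only about 13% of patients die.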
PhysioNet Challenge 2012: predict in-hospital mortality from observations $x_1, x_2, x_3, \dots, x_T$ during first 48 hours of ICU stay.
Solution: recurrent neural network (RNN)*
$p(y_{\text{mort}} = 1 \mid x_1, x_2, x_3, \dots, x_T) \approx p(y_{\text{mort}} = 1 \mid s_T)$, with $s_t = f(s_{t-1}, x_t)$
$s_t = \varphi(W s_{t-1} + U x_t + b)$
$y_t = \sigma(V s_t + c)$
• Efficient parameterization: $s_t$ represents exponential # states vs. # nodes
• Can encode (“remember”) longer histories
• During learning, pass future info backward via backprop through time
[Figure: unrolled RNN with inputs $x_1, x_2, \dots, x_T$, hidden states $s_0, s_1, s_2, \dots, s_T$, and outputs $y_1, y_2, \dots, y_T$]
* We actually use a long short-term memory network
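The recurrence on this slide, s_t = φ(W s_{t-1} + U x_t + b) followed by a sigmoid output σ(V s_T + c) at the last step, can be sketched in plain Java as below. This is a simple RNN rather than the LSTM actually used; φ = tanh, s_0 = 0, a single output unit, and all names are assumptions made for illustration.

```java
/** Simple RNN that emits one probability from the final state; names are hypothetical. */
final class SimpleRnn {
    private final double[][] W; // state-to-state weights, size n x n
    private final double[][] U; // input-to-state weights, size n x d
    private final double[] b;   // state bias, length n
    private final double[] v;   // output weights, length n (single output unit)
    private final double c;     // output bias

    SimpleRnn(double[][] W, double[][] U, double[] b, double[] v, double c) {
        this.W = W;
        this.U = U;
        this.b = b;
        this.v = v;
        this.c = c;
    }

    /** Returns sigma(v . s_T + c) after consuming x_1..x_T, starting from s_0 = 0. */
    double predict(double[][] xs) {
        double[] s = new double[b.length];
        for (double[] x : xs) {
            double[] next = new double[s.length];
            for (int i = 0; i < s.length; i++) {
                double z = b[i];
                for (int j = 0; j < s.length; j++) z += W[i][j] * s[j];
                for (int k = 0; k < x.length; k++) z += U[i][k] * x[k];
                next[i] = Math.tanh(z); // assumption: phi = tanh
            }
            s = next;
        }
        double z = c;
        for (int i = 0; i < s.length; i++) z += v[i] * s[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        SimpleRnn rnn = new SimpleRnn(new double[][] {{0.0}}, new double[][] {{1.0}},
                new double[] {0.0}, new double[] {1.0}, 0.0);
        System.out.println(rnn.predict(new double[][] {{0.0}, {1.0}}));
    }
}
```

The same weights are reused at every timestep, so the parameter count does not grow with the length of the stay.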
Outline
• Machine (and deep) learning
• Sequence learning with recurrent neural networks
• Clinical sequence classification using LSTM RNNs
• A real world case study using DL4J
• Conclusion and looking forward
PhysioNet Raw Data
• Set-a
– Directory of single files
– One file per patient
– 48 hours of ICU data
• Format
– Header Line
– 6 Descriptor Values at 00:00
• Collected at Admission
– 37 Irregularly sampled columns
• Over 48 hours
Time,Parameter,Value
00:00,RecordID,132601
00:00,Age,74
00:00,Gender,1
00:00,Height,177.8
00:00,ICUType,2
00:00,Weight,75.9
00:15,pH,7.39
00:15,PaCO2,39
00:15,PaO2,137
00:56,pH,7.39
00:56,PaCO2,37
00:56,PaO2,222
01:26,Urine,250
01:26,Urine,635
01:31,DiasABP,70
01:31,FiO2,1
01:31,HR,103
01:31,MAP,94
01:31,MechVent,1
01:31,SysABP,154
01:34,HCT,24.9
01:34,Platelets,115
01:34,WBC,16.4
01:41,DiasABP,52
01:41,HR,102
01:41,MAP,65
01:41,SysABP,95
01:56,DiasABP,64
01:56,GCS,3
01:56,HR,104
01:56,MAP,85
01:56,SysABP,132
…
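A record in this Time,Parameter,Value layout could be read along the following lines. This is a hedged sketch, not the vectorization code used in the talk: the class name is hypothetical, the descriptor names are taken from the sample above, and lines that do not have exactly three comma-separated fields are silently skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Parsed view of one patient file; class and field names are hypothetical. */
final class PhysioNetRecord {
    // Descriptor names as they appear at 00:00 in the sample record.
    private static final Set<String> DESCRIPTORS = new HashSet<>(
            Arrays.asList("RecordID", "Age", "Gender", "Height", "ICUType", "Weight"));

    final Map<String, Double> descriptors = new LinkedHashMap<>();
    // Parameter name -> list of {minutes since admission, value}, in file order.
    final Map<String, List<double[]>> series = new TreeMap<>();

    static PhysioNetRecord parse(List<String> lines) {
        PhysioNetRecord record = new PhysioNetRecord();
        for (String line : lines) {
            String[] fields = line.trim().split(",");
            if (fields.length != 3 || fields[0].equals("Time")) {
                continue; // skip the header line and anything malformed
            }
            String[] hhmm = fields[0].split(":");
            int minutes = Integer.parseInt(hhmm[0]) * 60 + Integer.parseInt(hhmm[1]);
            double value = Double.parseDouble(fields[2]);
            if (minutes == 0 && DESCRIPTORS.contains(fields[1])) {
                record.descriptors.put(fields[1], value);
            } else {
                record.series.computeIfAbsent(fields[1], k -> new ArrayList<>())
                        .add(new double[] {minutes, value});
            }
        }
        return record;
    }

    public static void main(String[] args) {
        PhysioNetRecord r = parse(Arrays.asList(
                "Time,Parameter,Value", "00:00,Age,74", "00:15,pH,7.39", "01:31,HR,103"));
        System.out.println(r.descriptors + " " + r.series.keySet());
    }
}
```

Keeping the timestamp with each value preserves the irregular sampling, so later steps can decide how to resample or pad.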
Preparing Input Data
• Input was a 3D tensor (3D array)
– Mini-batch as first dimension
– Feature columns as second dimension
– Timesteps as third dimension
• At a mini-batch size of 20, with 43 columns and 202 timesteps
– We have 20 × 43 × 202 = 173,720 values per input tensor
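The arithmetic behind the tensor size, and one way to address a value inside it, can be sketched as follows. The row-major [example][column][timestep] layout is an assumption for illustration and is not necessarily how ND4J stores the data; the names are hypothetical.

```java
/** Size and flat indexing of a [miniBatch][columns][timesteps] tensor; names are hypothetical. */
final class TensorShape {
    /** Total number of values in the tensor. */
    static long count(int miniBatch, int columns, int timesteps) {
        return (long) miniBatch * columns * timesteps;
    }

    /** Row-major offset of [example][column][timestep]; the layout is an assumption. */
    static int offset(int example, int column, int timestep, int columns, int timesteps) {
        return (example * columns + column) * timesteps + timestep;
    }

    public static void main(String[] args) {
        System.out.println(count(20, 43, 202));
    }
}
```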
A Single Training Example
Static vector (one value per vector column):
column   value
albumin  0.0
alp      1.0
alt      0.5
ast      0.0
…
Time series (one value per vector column per timestep):
column   t=0  t=1  t=2  t=3  t=4  …
albumin  0.0  0.0  0.5  0.0  0.0
alp      0.0  0.1  0.0  0.0  0.2
alt      0.0  0.0  0.0  0.9  0.0
ast      0.0  0.0  0.0  0.0  0.4
…
A single training example gets the added dimension of timesteps for each column
PhysioNet Timeseries Vectorization
@RELATION UnitTest_PhysioNet_Schema_ZUZUV
@DELIMITER ,
@MISSING_VALUE -1
@ATTRIBUTE recordid NOMINAL DESCRIPTOR !SKIP !ZERO
@ATTRIBUTE age NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !AVG
@ATTRIBUTE gender NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !ZERO
@ATTRIBUTE height NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !AVG
@ATTRIBUTE weight NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !AVG
@ATTRIBUTE icutype NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !ZERO
@ATTRIBUTE albumin NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
@ATTRIBUTE alp NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
@ATTRIBUTE alt NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
@ATTRIBUTE ast NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
@ATTRIBUTE bilirubin NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
[ more … ]
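Two of the per-column directives in the schema can be pictured with the sketch below. It assumes that !ZEROMEAN_ZEROUNITVARIANCE means standardizing to zero mean and unit variance, and that !PAD_TAIL_WITH_ZEROS means filling the end of a short series with zeros; both readings, and all names, are assumptions rather than the vectorizer's actual implementation.

```java
import java.util.Arrays;

/** Illustrative column transforms; names and semantics are assumptions. */
final class SeriesVectorizer {
    /** Standardizes values using a mean and standard deviation computed elsewhere. */
    static double[] standardize(double[] values, double mean, double std) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column (std == 0) is mapped to all zeros.
            out[i] = (std == 0.0) ? 0.0 : (values[i] - mean) / std;
        }
        return out;
    }

    /** Pads the tail with zeros up to the given length (truncates if already longer). */
    static double[] padTail(double[] values, int length) {
        return Arrays.copyOf(values, length);
    }

    public static void main(String[] args) {
        double[] z = standardize(new double[] {1.0, 2.0, 3.0}, 2.0, 1.0);
        System.out.println(Arrays.toString(padTail(z, 5)));
    }
}
```

In practice the mean and standard deviation would be estimated on the training split only, then reused for validation and test data.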
Uneven Time Steps and Masking
Single input (columns + timesteps):
column   t=0  t=1  t=2  t=3  t=4  …
albumin  0.0  0.0  0.5  0.0  0.0
alp      0.0  0.1  0.0  0.0  0.0
alt      0.0  0.0  0.0  0.9  0.0
ast      0.0  0.0  0.0  0.0  0.0
…
Input mask (only timesteps):
1.0 1.0 1.0 1.0 0.0 0.0
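The mask on this slide marks which timesteps hold real data and which are padding. A minimal sketch of building such a mask, and of using it so that padded steps do not contribute to an average, might look like this; the names are hypothetical and this is not DL4J's masking API.

```java
import java.util.Arrays;

/** Builds and applies a per-timestep mask; names are hypothetical. */
final class InputMask {
    /** 1.0 for the first observedSteps timesteps, 0.0 for the padded tail. */
    static double[] forLength(int observedSteps, int maxSteps) {
        double[] mask = new double[maxSteps];
        for (int t = 0; t < Math.min(observedSteps, maxSteps); t++) {
            mask[t] = 1.0;
        }
        return mask;
    }

    /** Mean of per-timestep values over unmasked steps only. */
    static double maskedMean(double[] perStep, double[] mask) {
        double sum = 0.0;
        double weight = 0.0;
        for (int t = 0; t < perStep.length; t++) {
            sum += perStep[t] * mask[t];
            weight += mask[t];
        }
        return weight == 0.0 ? 0.0 : sum / weight;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(forLength(4, 6)));
    }
}
```

Without the mask, the zeros used for padding would be indistinguishable from genuine zero-valued measurements.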
DL4J
• “The Hadoop of Deep Learning”
– Command line driven
– Java, Scala, and Python APIs
– Apache License 2.0
• Java implementation
– Parallelization (YARN, Spark)
– GPU support
• Also supports multi-GPU per host
• Runtime Neutral
– Local
– Hadoop / YARN + Spark
– AWS
• https://github.com/deeplearning4j/deeplearning4j
RNNs in DL4J
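// Imports as in DL4J releases of that period (0.4.x); package names may differ in later versions.
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.Updater;
import org.deeplearning4j.nn.conf.distribution.UniformDistribution;
import org.deeplearning4j.nn.conf.layers.GravesLSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
    .learningRate(learningRate)
    .rmsDecay(0.95)
    .seed(12345)
    .regularization(true)
    .l2(0.001)
    .list(3)
    // Layers 0 and 1: stacked LSTM layers
    .layer(0, new GravesLSTM.Builder().nIn(iter.inputColumns()).nOut(lstmLayerSize)
        .updater(Updater.RMSPROP)
        .activation("tanh").weightInit(WeightInit.DISTRIBUTION)
        .dist(new UniformDistribution(-0.08, 0.08)).build())
    .layer(1, new GravesLSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
        .updater(Updater.RMSPROP)
        .activation("tanh").weightInit(WeightInit.DISTRIBUTION)
        .dist(new UniformDistribution(-0.08, 0.08)).build())
    // Layer 2: softmax output at each timestep
    .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation("softmax")
        .updater(Updater.RMSPROP)
        .nIn(lstmLayerSize).nOut(nOut).weightInit(WeightInit.DISTRIBUTION)
        .dist(new UniformDistribution(-0.08, 0.08)).build())
    .pretrain(false).backprop(true)
    .build();
// Build and initialize the network from the configuration before training
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
for (int epoch = 0; epoch < max_epochs; ++epoch)
    net.fit(dataset_iter);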
Experimental Results
• Winning entry: min(P,R) = 0.5353 (two others over 0.5)
– Trained on full set A (4K), tuned on set B (4K), tested on set C
– All used extensively hand-engineered features
• Our best model so far: min(P,R) = 0.4907
– 60/20/20 training/validation/test split of set A
– LSTM with 2 x 300-cell layers on inputs
– Different test sets so not directly comparable
– Disadvantage: much smaller training set
– Required no feature engineering or domain knowledge
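A 60/20/20 split of a record set could be produced along these lines. This is a hedged sketch with hypothetical names: it shuffles record indices with a fixed seed and does not stratify by outcome, which may or may not match how the split in the talk was made.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Shuffled 60/20/20 index split; names are hypothetical. */
final class DataSplit {
    /** Returns {train, validation, test} lists of record indices in [0, n). */
    static List<List<Integer>> split(int n, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed)); // fixed seed for repeatability
        int trainEnd = n * 6 / 10; // integer arithmetic avoids rounding surprises
        int validationEnd = n * 8 / 10;
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, trainEnd)));
        parts.add(new ArrayList<>(indices.subList(trainEnd, validationEnd)));
        parts.add(new ArrayList<>(indices.subList(validationEnd, n)));
        return parts;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = split(4000, 12345L);
        System.out.println(parts.get(0).size() + " " + parts.get(1).size() + " " + parts.get(2).size());
    }
}
```

With 4000 labeled records this leaves 2400 for training, which is the "much smaller training set" disadvantage noted above.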
Map sequences into fixed vector representation
• Not perfectly separable in 2D but some cluster structure related to mortality
• Can repurpose “representation” for other tasks (e.g., searching for similar
patients, clustering, etc.)
Final comments
• We believe we could improve performance to well over 0.5
– overfitting: training min(P,R) > 0.6 (vs. test: 0.49)
– smaller or simpler RNN layers, adding dropout, multitask training
• Flexible NN architectures well suited to complex clinical data
– but likely will demand much larger data sets
– may be better matched to “raw” signals (e.g., waveforms)
• More general challenges
– missing (or unobserved) inputs and outcomes
– treatment effects confound predictive models
– outcomes often have temporal components (posing as binary classification ignores that)
• You can try it out: https://github.com/jpatanooga/dl4j-rnn-timeseries-examples/
See related paper to appear at ICLR 2016: http://arxiv.org/abs/1511.03677
Questions?
Thank you for your time and attention
Gibson & Patterson. Deep Learning: A Practitioner’s Approach. O’Reilly, Q2 2016.
Lipton, et al. A Critical Review of RNNs. arXiv.
Lipton & Kale. Learning to Diagnose with LSTM RNNs. ICLR 2016.
Sepp Hochreiter
Father of LSTMs,* renowned beer thief
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9 (8): 1735-1780, 1997.

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Synthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep LearningSynthetic dialogue generation with Deep Learning
Synthetic dialogue generation with Deep Learning
 
Deep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender SystemsDeep Learning for Personalized Search and Recommender Systems
Deep Learning for Personalized Search and Recommender Systems
 
A Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its ApplicationA Brief Introduction on Recurrent Neural Network and Its Application
A Brief Introduction on Recurrent Neural Network and Its Application
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Deep learning
Deep learningDeep learning
Deep learning
 
TypeScript and Deep Learning
TypeScript and Deep LearningTypeScript and Deep Learning
TypeScript and Deep Learning
 
Lstm
LstmLstm
Lstm
 
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural NetsPython for Image Understanding: Deep Learning with Convolutional Neural Nets
Python for Image Understanding: Deep Learning with Convolutional Neural Nets
 
Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤)
Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤)Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤)
Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤)
 
TensorFlow Tutorial Part2
TensorFlow Tutorial Part2TensorFlow Tutorial Part2
TensorFlow Tutorial Part2
 
From neural networks to deep learning
From neural networks to deep learningFrom neural networks to deep learning
From neural networks to deep learning
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNN
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
Time-series forecasting of indoor temperature using pre-trained Deep Neural N...
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 

Ähnlich wie Modeling Electronic Health Records with Recurrent Neural Networks

Lecture artificial neural networks and pattern recognition
Lecture   artificial neural networks and pattern recognitionLecture   artificial neural networks and pattern recognition
Lecture artificial neural networks and pattern recognition
Hưng Đặng
 
The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...
The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...
The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...
InsideScientific
 
Learning to Search Henry Kautz
Learning to Search Henry KautzLearning to Search Henry Kautz
Learning to Search Henry Kautz
butest
 
Learning to Search Henry Kautz
Learning to Search Henry KautzLearning to Search Henry Kautz
Learning to Search Henry Kautz
butest
 
RNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantagesRNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantages
AbhijitVenkatesh1
 

Ähnlich wie Modeling Electronic Health Records with Recurrent Neural Networks (20)

recurrent_neural_networks_april_2020.pptx
recurrent_neural_networks_april_2020.pptxrecurrent_neural_networks_april_2020.pptx
recurrent_neural_networks_april_2020.pptx
 
Deep Learning for EHR Data
Deep Learning for EHR DataDeep Learning for EHR Data
Deep Learning for EHR Data
 
Lecture artificial neural networks and pattern recognition
Lecture   artificial neural networks and pattern recognitionLecture   artificial neural networks and pattern recognition
Lecture artificial neural networks and pattern recognition
 
Lecture artificial neural networks and pattern recognition
Lecture   artificial neural networks and pattern recognitionLecture   artificial neural networks and pattern recognition
Lecture artificial neural networks and pattern recognition
 
Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)Deep Learning Sample Class (Jon Lederman)
Deep Learning Sample Class (Jon Lederman)
 
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYCTed Willke, Senior Principal Engineer, Intel Labs at MLconf NYC
Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC
 
Classification of indoor actions through deep neural networks
Classification of indoor actions through deep neural networksClassification of indoor actions through deep neural networks
Classification of indoor actions through deep neural networks
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 
Deep Learning for AI (3)
Deep Learning for AI (3)Deep Learning for AI (3)
Deep Learning for AI (3)
 
Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0Tsinghua invited talk_zhou_xing_v2r0
Tsinghua invited talk_zhou_xing_v2r0
 
Basics of Data Analysis in Bioinformatics
Basics of Data Analysis in BioinformaticsBasics of Data Analysis in Bioinformatics
Basics of Data Analysis in Bioinformatics
 
SoftComputing6
SoftComputing6SoftComputing6
SoftComputing6
 
Statistics
StatisticsStatistics
Statistics
 
The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...
The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...
The Future of Metabolic Phenotyping Using data bandwidth to maximize N, analy...
 
Golden Rules of Bioinformatics
Golden Rules of BioinformaticsGolden Rules of Bioinformatics
Golden Rules of Bioinformatics
 
Learning to Search Henry Kautz
Learning to Search Henry KautzLearning to Search Henry Kautz
Learning to Search Henry Kautz
 
Learning to Search Henry Kautz
Learning to Search Henry KautzLearning to Search Henry Kautz
Learning to Search Henry Kautz
 
6.2 Jamie Zeitzer
6.2 Jamie Zeitzer6.2 Jamie Zeitzer
6.2 Jamie Zeitzer
 
RNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantagesRNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantages
 
Freedom: The Promise of Telemetry Revisited - Stellar Telemetry Webinar (TSE ...
Freedom: The Promise of Telemetry Revisited - Stellar Telemetry Webinar (TSE ...Freedom: The Promise of Telemetry Revisited - Stellar Telemetry Webinar (TSE ...
Freedom: The Promise of Telemetry Revisited - Stellar Telemetry Webinar (TSE ...
 

Mehr von Josh Patterson

Knitting boar atl_hug_jan2013_v2
Knitting boar atl_hug_jan2013_v2Knitting boar atl_hug_jan2013_v2
Knitting boar atl_hug_jan2013_v2
Josh Patterson
 

Mehr von Josh Patterson (20)

Patterson Consulting: What is Artificial Intelligence?
Patterson Consulting: What is Artificial Intelligence?Patterson Consulting: What is Artificial Intelligence?
Patterson Consulting: What is Artificial Intelligence?
 
What is Artificial Intelligence
What is Artificial IntelligenceWhat is Artificial Intelligence
What is Artificial Intelligence
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
Deep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVecDeep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVec
 
Deep Learning and Recurrent Neural Networks in the Enterprise
Deep Learning and Recurrent Neural Networks in the EnterpriseDeep Learning and Recurrent Neural Networks in the Enterprise
Deep Learning and Recurrent Neural Networks in the Enterprise
 
Building Deep Learning Workflows with DL4J
Building Deep Learning Workflows with DL4JBuilding Deep Learning Workflows with DL4J
Building Deep Learning Workflows with DL4J
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning Models
 
Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015Deep learning with DL4J - Hadoop Summit 2015
Deep learning with DL4J - Hadoop Summit 2015
 
Enterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4JEnterprise Deep Learning with DL4J
Enterprise Deep Learning with DL4J
 
Deep Learning Intro - Georgia Tech - CSE6242 - March 2015
Deep Learning Intro - Georgia Tech - CSE6242 - March 2015Deep Learning Intro - Georgia Tech - CSE6242 - March 2015
Deep Learning Intro - Georgia Tech - CSE6242 - March 2015
 
Vectorization - Georgia Tech - CSE6242 - March 2015
Vectorization - Georgia Tech - CSE6242 - March 2015Vectorization - Georgia Tech - CSE6242 - March 2015
Vectorization - Georgia Tech - CSE6242 - March 2015
 
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
 
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4JGeorgia Tech cse6242 - Intro to Deep Learning and DL4J
Georgia Tech cse6242 - Intro to Deep Learning and DL4J
 
Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242Intro to Vectorization Concepts - GaTech cse6242
Intro to Vectorization Concepts - GaTech cse6242
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
 
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARNMLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
MLConf 2013: Metronome and Parallel Iterative Algorithms on YARN
 
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARNHadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
 
Knitting boar atl_hug_jan2013_v2
Knitting boar atl_hug_jan2013_v2Knitting boar atl_hug_jan2013_v2
Knitting boar atl_hug_jan2013_v2
 
Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012Knitting boar - Toronto and Boston HUGs - Nov 2012
Knitting boar - Toronto and Boston HUGs - Nov 2012
 
LA HUG Dec 2011 - Recommendation Talk
LA HUG Dec 2011 - Recommendation TalkLA HUG Dec 2011 - Recommendation Talk
LA HUG Dec 2011 - Recommendation Talk
 

Kürzlich hochgeladen

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 

Kürzlich hochgeladen (20)

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...

Modeling Electronic Health Records with Recurrent Neural Networks

  • 1. Modeling electronic health records with recurrent neural networks. David C. Kale (1, 2), Zachary C. Lipton (3), Josh Patterson (4). STRATA - San Jose - 2016. Affiliations: (1) University of Southern California; (2) Virtual PICU, Children’s Hospital Los Angeles; (3) University of California San Diego; (4) Patterson Consulting
  • 2. Outline • Machine (and deep) learning • Sequence learning with recurrent neural networks • Clinical sequence classification using LSTM RNNs • A real world case study using DL4J • Conclusion and looking forward
  • 4. Various Inputs and Outputs “Time underlies many interesting human behaviors” {0,1} {A,B,C…} captions, email mom, fire nukes, eject pop tart
  • 5. But how do we produce functions? We need a function for that…
  • 8. When/why does this fail? • Sometimes the correct function cannot be encoded a priori — (what is spam?) • The optimal solution might change over time • Programmers are expensive
  • 9. Sometimes We Need to Learn These Functions From Data
  • 10. One Class of Learnable Functions: Feedforward Neural Network
  • 12. Activation Functions • At internal nodes common choices for the activation function are the sigmoid, tanh, and ReLU functions. • At output, activation function could be linear (regression), sigmoid (multilabel classification) or softmax (multi-class classification)
  • 13. Training with Backpropagation • Goal: calculate the rate of change of the loss function with respect to each parameter (weight) in the model • Update the weights by following the gradient in the direction that reduces the loss
  • 16. Deep Networks • Used to be difficult (seemed impossible) to train nets with many hidden layers • TLDR: Turns out we just needed to do everything 1000x faster…
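The update rule on this slide was an image and did not survive in the transcript. A standard form of the gradient-following update is sketched below; the symbols (learning rate \eta, loss \mathcal{L}, weight w) are chosen here for illustration and are not taken from the slide.

```latex
w \leftarrow w - \eta \, \frac{\partial \mathcal{L}}{\partial w}
```

Backpropagation supplies the partial derivative for every weight; the learning rate controls how far each step moves.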
  • 17. Outline • Machine (and deep) learning • Sequence learning with recurrent neural networks • Clinical sequence classification using LSTM RNNs • A real world case study using DL4J • Conclusion and looking forward
  • 18. Feedforward Nets work for Fixed-Size Data
  • 20. We would like to capture temporal/sequential dynamics in the data • Standard approaches address sequential structure: Markov models, conditional random fields, linear dynamical systems • Problem: We desire a system to learn representations, capture nonlinear structure, and capture long term sequential relationships.
  • 21. To Model Sequential Data: Recurrent Neural Networks
  • 24. LSTM Memory Cell (Hochreiter & Schmidhuber, 1997)
  • 25. Memory Cell with Forget Gate (Gers et al., 2000)
  • 28. Large Scale Architecture [Figure labels: standard supervised learning; image captioning; sentiment analysis; video captioning, natural language translation; part-of-speech tagging; generative models for text]
  • 29. Outline • Machine (and deep) learning • Sequence learning with recurrent neural networks • Clinical sequence classification using LSTM RNNs • A real world case study using DL4J • Conclusion and looking forward
  • 30. ICU data generated in hospital • Patient-level info (e.g., age, gender) • Physiologic measurements (e.g., heart rate) – Manually verified observations – High-frequency measurements – Waveforms • Lab results (e.g., glucose) • Clinical assessments (e.g., Glasgow Coma Scale) • Medications and treatments • Clinical notes • Diagnoses • Outcomes • Billing codes
  • 31. ICU data gathered in EHR • Patient-level info (e.g., age, gender) • Physiologic measurements (e.g., vital signs) – Manually verified observations – High-frequency measurements – Waveforms • Lab results (e.g., glucose) • Clinical assessments (e.g., Glasgow Coma Scale) • Medications and treatments • Clinical notes • Diagnoses (often buried in free text notes) • Outcomes • Billing codes
  • 32. ICU data in our experiments • Patient-level info (e.g., age, gender) • Physiologic measurements (e.g., vital signs) – Manually verified observations – High-frequency measurements – Waveforms • Lab results (e.g., glucose) • Clinical assessments (e.g., cognitive function) • One treatment: mechanical ventilation • Clinical notes • Diagnoses (often buried in free text notes) • Outcomes: in-hospital mortality • Billing codes
  • 33. Challenges: sampling rates, missingness • Sparse, irregular, unaligned sampling in time, across variables • Sample selection bias (e.g., more likely to record abnormal) • Entire sequences (non-random) missing [Figure: HR, RR, and ETCO2 series between admit and discharge] Figures courtesy of Ben Marlin, UMass Amherst
  • 34. Challenges: alignment, variable length • Observations begin at time of admission, not at onset of illness • Sequences vary in length from hours to weeks (or longer) • Variable dynamics across patients, even with same disease • Long-term dependencies: future state depends on earlier condition [Figure: two HR series, each running from admit to discharge] Figures courtesy of Ben Marlin, UMass Amherst
  • 35. PhysioNet Challenge 2012 • Task: predict mortality from only the first 48 hours of data • Classic models (SAPS, APACHE, PRISM): expert features + regression – Useful: quantifying illness at admission, standardized performance – Not accurate enough to be used for decision support • Each record includes: – patient descriptors (age, gender, weight, height, unit) – irregular sequences of ~40 vitals and labs from the first 48 hours – one treatment variable: mechanical ventilation – binary outcome: in-hospital survival or mortality (~13% mortality) • Only 4000 labeled records publicly available (“set A”) • 4000 unlabeled records (“set B”) used for tuning during competition (we did not use them) • 4000 test examples (“set C”) not available • Very challenging task: temporal outcome, unobserved treatment effects • Winning entry score: minimum(Precision, Recall) = 0.5353 https://www.physionet.org/challenge/2012/
  • 36. PhysioNet Challenge 2012: predict in-hospital mortality from observations x_1, x_2, x_3, …, x_T during first 48 hours of ICU stay. Solution: recurrent neural network (RNN)* p(y_mort = 1 | x_1, x_2, x_3, …, x_T) ≈ p(y_mort = 1 | s_T), with s_t = f(s_{t-1}, x_t), where s_t = φ(W s_{t-1} + U x_t + b) and y_t = σ(V s_t + c) • Efficient parameterization: s_t represents exponential # states vs. # nodes • Can encode (“remember”) longer histories • During learning, pass future info backward via backprop through time [Figure: unrolled RNN with inputs x_1, x_2, …, x_T, states s_0, s_1, s_2, …, s_T, and outputs y_1, y_2, …, y_T] * We actually use a long short-term memory network
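The recurrence on this slide can be sketched in a few lines of plain Java. This is only an illustration of the simple RNN equations, not the LSTM the authors actually trained: the class and method names are made up here, φ is assumed to be tanh, the output is assumed to be a single sigmoid unit, and s_0 is assumed to be the zero vector.

```java
/** Minimal sketch of the simple RNN on slide 36 (hypothetical names, not DL4J code). */
final class SimpleRnnSketch {

    /** Logistic sigmoid, used for the output unit. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One recurrent step: s_t = tanh(W s_{t-1} + U x_t + b). tanh is an assumed choice for phi. */
    static double[] step(double[][] w, double[][] u, double[] b, double[] prevState, double[] x) {
        double[] next = new double[b.length];
        for (int i = 0; i < b.length; i++) {
            double sum = b[i];
            for (int j = 0; j < prevState.length; j++) {
                sum += w[i][j] * prevState[j];
            }
            for (int k = 0; k < x.length; k++) {
                sum += u[i][k] * x[k];
            }
            next[i] = Math.tanh(sum);
        }
        return next;
    }

    /** Runs the whole sequence from s_0 = 0 and returns sigmoid(v . s_T + c). */
    static double predict(double[][] w, double[][] u, double[] b,
                          double[] v, double c, double[][] xs) {
        double[] state = new double[b.length]; // s_0 is all zeros (assumption)
        for (double[] x : xs) {
            state = step(w, u, b, state, x);
        }
        double z = c;
        for (int i = 0; i < state.length; i++) {
            z += v[i] * state[i];
        }
        return sigmoid(z);
    }
}
```

Only the final state s_T feeds the prediction here, which mirrors the approximation p(y_mort = 1 | x_1, …, x_T) ≈ p(y_mort = 1 | s_T) on the slide.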
  • 37. Outline • Machine (and deep) learning • Sequence learning with recurrent neural networks • Clinical sequence classification using LSTM RNNs • A real world case study using DL4J • Conclusion and looking forward
  • 38. PhysioNet Raw Data • Set-a – Directory of single files – One file per patient – 48 hours of ICU data • Format – Header line – 6 descriptor values at 00:00 (collected at admission) – 37 irregularly sampled columns (over 48 hours) • Sample file:
      Time,Parameter,Value
      00:00,RecordID,132601
      00:00,Age,74
      00:00,Gender,1
      00:00,Height,177.8
      00:00,ICUType,2
      00:00,Weight,75.9
      00:15,pH,7.39
      00:15,PaCO2,39
      00:15,PaO2,137
      00:56,pH,7.39
      00:56,PaCO2,37
      00:56,PaO2,222
      01:26,Urine,250
      01:26,Urine,635
      01:31,DiasABP,70
      01:31,FiO2,1
      01:31,HR,103
      01:31,MAP,94
      01:31,MechVent,1
      01:31,SysABP,154
      01:34,HCT,24.9
      01:34,Platelets,115
      01:34,WBC,16.4
      01:41,DiasABP,52
      01:41,HR,102
      01:41,MAP,65
      01:41,SysABP,95
      01:56,DiasABP,64
      01:56,GCS,3
      01:56,HR,104
      01:56,MAP,85
      01:56,SysABP,132
      …
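A record in this format is straightforward to read one line at a time. The sketch below parses a single `Time,Parameter,Value` data line, assuming the time field is `HH:MM` elapsed since admission; the class name `PhysioNetLine` and its fields are hypothetical and are not taken from the authors' vectorization code.

```java
/** Sketch: one parsed "Time,Parameter,Value" line from a PhysioNet 2012 record (hypothetical class). */
final class PhysioNetLine {
    final int minutes;      // minutes elapsed since ICU admission
    final String parameter; // e.g. "HR", "GCS", "Weight"
    final double value;

    PhysioNetLine(int minutes, String parameter, double value) {
        this.minutes = minutes;
        this.parameter = parameter;
        this.value = value;
    }

    /** Parses a data line such as "01:31,HR,103". The header line must be skipped by the caller. */
    static PhysioNetLine parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 comma-separated fields: " + line);
        }
        String[] hhmm = fields[0].split(":");
        if (hhmm.length != 2) {
            throw new IllegalArgumentException("Expected HH:MM time: " + fields[0]);
        }
        int minutes = Integer.parseInt(hhmm[0]) * 60 + Integer.parseInt(hhmm[1]);
        return new PhysioNetLine(minutes, fields[1], Double.parseDouble(fields[2]));
    }
}
```

Lines at `00:00` carry the descriptors; everything else is an irregularly sampled observation, and the same timestamp can repeat for one parameter (see the two `Urine` rows at `01:26`).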
  • 39. Preparing Input Data • Input was a 3D tensor (3D array) – Mini-batch as first dimension – Feature columns as second dimension – Timesteps as third dimension • At a mini-batch size of 20, with 43 columns and 202 timesteps, each input tensor holds 20 × 43 × 202 = 173,720 values
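The tensor arithmetic on this slide can be checked with a short sketch. The class and method names are hypothetical, and the row-major offset calculation is an assumption made for illustration; the slide does not say how the values are laid out in memory.

```java
/** Sketch of the [mini-batch, columns, timesteps] shape arithmetic (hypothetical helper). */
final class TensorShapeSketch {

    /** Number of values in one input tensor. */
    static long valueCount(int miniBatch, int columns, int timesteps) {
        return (long) miniBatch * columns * timesteps;
    }

    /** Flat offset of element (example, column, timestep), assuming row-major order. */
    static int flatIndex(int example, int column, int timestep, int columns, int timesteps) {
        return (example * columns + column) * timesteps + timestep;
    }

    public static void main(String[] args) {
        System.out.println(valueCount(20, 43, 202)); // prints 173720
    }
}
```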
  • 40. A Single Training Example: a single training example gets the added dimension of timesteps for each column
      Ordinary vector (vector columns and values):
        albumin  0.0
        alp      1.0
        alt      0.5
        ast      0.0
        …
      With timesteps (vector columns by timesteps):
                 0    1    2    3    4    …
        albumin  0.0  0.0  0.5  0.0  0.0
        alp      0.0  0.1  0.0  0.0  0.2
        alt      0.0  0.0  0.0  0.9  0.0
        ast      0.0  0.0  0.0  0.0  0.4
        …
  • 41. PhysioNet Timeseries Vectorization
      @RELATION UnitTest_PhysioNet_Schema_ZUZUV
      @DELIMITER ,
      @MISSING_VALUE -1
      @ATTRIBUTE recordid NOMINAL DESCRIPTOR !SKIP !ZERO
      @ATTRIBUTE age NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !AVG
      @ATTRIBUTE gender NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !ZERO
      @ATTRIBUTE height NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !AVG
      @ATTRIBUTE weight NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !AVG
      @ATTRIBUTE icutype NUMERIC DESCRIPTOR !ZEROMEAN_ZEROUNITVARIANCE !ZERO
      @ATTRIBUTE albumin NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
      @ATTRIBUTE alp NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
      @ATTRIBUTE alt NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
      @ATTRIBUTE ast NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
      @ATTRIBUTE bilirubin NUMERIC TIMESERIES !ZEROMEAN_ZEROUNITVARIANCE !PAD_TAIL_WITH_ZEROS
      [ more … ]
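The schema tags most columns with a zero-mean, unit-variance transform. A rough sketch of that kind of normalization for one column is shown below. It is not the authors' vectorization code: the class name is hypothetical, the population standard deviation is an assumed choice, and missing-value handling (the schema's -1 marker) is deliberately left out.

```java
/** Sketch of zero-mean, unit-variance scaling for one column (hypothetical helper). */
final class ZScoreSketch {

    /**
     * Returns (x - mean) / std for every element.
     * Uses the population standard deviation and returns zeros for a constant column.
     */
    static double[] normalize(double[] values) {
        int n = values.length;
        double[] out = new double[n];
        if (n == 0) {
            return out;
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        double variance = 0.0;
        for (double v : values) {
            variance += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(variance / n);
        for (int i = 0; i < n; i++) {
            // Guard against division by zero when every value is identical.
            out[i] = (std == 0.0) ? 0.0 : (values[i] - mean) / std;
        }
        return out;
    }
}
```

In practice the mean and standard deviation would be computed once over the training data and reused for validation and test data, which is what the later note about dataset statistics suggests.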
  • 42. Uneven Time Steps and Masking
      Single input (columns + timesteps):
                 0    1    2    3    4    …
        albumin  0.0  0.0  0.5  0.0  0.0
        alp      0.0  0.1  0.0  0.0  0.0
        alt      0.0  0.0  0.0  0.9  0.0
        ast      0.0  0.0  0.0  0.0  0.0
        …
      Input mask (only timesteps):
                 1.0  1.0  1.0  1.0  0.0  0.0
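A mask like the one on this slide can be built from each example's true sequence length. The sketch below is an illustration only: the class and method names are hypothetical, and it does not use any DL4J API. It marks real timesteps with 1.0 and the zero-padded tail with 0.0.

```java
/** Sketch: build a [examples, timesteps] mask from true sequence lengths (hypothetical helper). */
final class SequenceMaskSketch {

    /** mask[i][t] is 1.0 for real timesteps of example i and 0.0 for the padded tail. */
    static double[][] buildMask(int[] lengths, int maxTimesteps) {
        double[][] mask = new double[lengths.length][maxTimesteps]; // Java zero-initializes arrays
        for (int i = 0; i < lengths.length; i++) {
            int real = Math.min(lengths[i], maxTimesteps);
            for (int t = 0; t < real; t++) {
                mask[i][t] = 1.0;
            }
        }
        return mask;
    }
}
```

For an example with 4 real timesteps padded to 6, the mask row is 1.0 1.0 1.0 1.0 0.0 0.0, matching the slide.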
  • 43. DL4J • “The Hadoop of Deep Learning” – Command line driven – Java, Scala, and Python APIs – Apache License 2.0 • Java implementation – Parallelization (YARN, Spark) – GPU support (also supports multi-GPU per host) • Runtime neutral – Local – Hadoop / YARN + Spark – AWS • https://github.com/deeplearning4j/deeplearning4j
  • 44. RNNs in DL4J
      // Two stacked LSTM layers feeding a softmax output layer, all updated with RMSProp.
      // learningRate, lstmLayerSize, nOut, iter, net, dataset_iter and max_epochs
      // are defined elsewhere in the example (not shown on the slide).
      MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
          .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
          .learningRate(learningRate)
          .rmsDecay(0.95)
          .seed(12345)
          .regularization(true)
          .l2(0.001)
          .list(3)
          .layer(0, new GravesLSTM.Builder().nIn(iter.inputColumns()).nOut(lstmLayerSize)
              .updater(Updater.RMSPROP)
              .activation("tanh").weightInit(WeightInit.DISTRIBUTION)
              .dist(new UniformDistribution(-0.08, 0.08)).build())
          .layer(1, new GravesLSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
              .updater(Updater.RMSPROP)
              .activation("tanh").weightInit(WeightInit.DISTRIBUTION)
              .dist(new UniformDistribution(-0.08, 0.08)).build())
          .layer(2, new RnnOutputLayer.Builder(LossFunction.MCXENT).activation("softmax")
              .updater(Updater.RMSPROP)
              .nIn(lstmLayerSize).nOut(nOut).weightInit(WeightInit.DISTRIBUTION)
              .dist(new UniformDistribution(-0.08, 0.08)).build())
          .pretrain(false).backprop(true)
          .build();

      // One pass over the training iterator per epoch.
      for (int epoch = 0; epoch < max_epochs; ++epoch)
          net.fit(dataset_iter);
  • 45. Experimental Results • Winning entry: min(P,R) = 0.5353 (two others over 0.5) – Trained on full set A (4K), tuned on set B (4K), tested on set C – All used extensively hand-engineered features • Our best model so far: min(P,R) = 0.4907 – 60/20/20 training/validation/test split of set A – LSTM with 2 x 300-cell layers on inputs – Different test sets, so not directly comparable – Disadvantage: much smaller training set – Required no feature engineering or domain knowledge
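The min(P,R) score reported here is the smaller of precision and recall for the positive (mortality) class. A minimal sketch of that computation from binary labels follows. The class name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than something stated on the slides.

```java
/** Sketch of the min(precision, recall) score for binary labels (hypothetical helper). */
final class MinPrecisionRecall {

    /** Labels are 1 for in-hospital death and 0 for survival; both arrays must have equal length. */
    static double score(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("Label arrays must have the same length");
        }
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 1) tp++;
            else if (predicted[i] == 1 && actual[i] == 0) fp++;
            else if (predicted[i] == 0 && actual[i] == 1) fn++;
        }
        // Assumed convention: an undefined ratio counts as 0.0.
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        return Math.min(precision, recall);
    }
}
```

Because the metric takes the minimum, a model cannot score well by predicting death for everyone (high recall, low precision) or for almost no one (high precision, low recall).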
  • 46. Map sequences into fixed vector representation • Not perfectly separable in 2D but some cluster structure related to mortality • Can repurpose “representation” for other tasks (e.g., searching for similar patients, clustering, etc.)
  • 47. Final comments • We believe we could improve performance to well over 0.5 – overfitting: training min(P,R) > 0.6 (vs. test: 0.49) – smaller or simpler RNN layers, adding dropout, multitask training • Flexible NN architectures well suited to complex clinical data – but likely will demand much larger data sets – may be better matched to “raw” signals (e.g., waveforms) • More general challenges – missing (or unobserved) inputs and outcomes – treatment effects confound predictive models – outcomes often have temporal components (posing the task as binary classification ignores that) • You can try it out: https://github.com/jpatanooga/dl4j-rnn-timeseries-examples/ • See related paper to appear at ICLR 2016: http://arxiv.org/abs/1511.03677
  • 48. Questions? Thank you for your time and attention. • Gibson & Patterson. Deep Learning: A Practitioner’s Approach. O’Reilly, Q2 2016. • Lipton, et al. A Critical Review of RNNs. arXiv. • Lipton & Kale. Learning to Diagnose with LSTM RNNs. ICLR 2016.
  • 49. Sepp Hochreiter: father of LSTMs,* renowned beer thief. * S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8): 1735-1780, 1997.

Editor's notes

  1. But how do we produce functions? We need a function for that
  2. A neural network is a graphical model of computation. The graph is composed of “nodes” and “edges”, which informally model “neurons” and “synapses”.
  3. A little computation takes place at each node. For a node j: first, calculate a linear combination of the outputs from neurons connected by an incoming edge; we call “a” the input activation. Then apply a (usually nonlinear) activation function; in this example we show the logistic (sigmoid) function.
  4. Some examples of activation functions are the sigmoidal units (sigmoid, hyperbolic tangent) and the rectifier (ReLU), which tends to learn faster because its derivative does not shrink toward zero for positive inputs.
  5. Simple training by stochastic gradient descent. First, randomly sample an example from the dataset. Second, calculate the derivative of some objective / loss function on that example. Third, update the weights to reduce the loss on that example.
  6. Forward pass. First we set the value of each input node from the input vector. Then we calculate the output prediction and the loss with respect to the true label.
  7. Then we calculate the delta value for each node. Delta is the derivative of the loss with respect to that node’s linear input.
  8. Image classification is a case where this has been dramatically successful. Each image is provided without context (think Facebook upload) and is assigned to accurate object categories.
  9. It’s harder to imagine how someone might represent an arbitrarily sized document as a fixed length vector.
  10. A recurrent net is like a feedforward neural network but augmented by the inclusion of recurrent edges. At any given “time step” computation is feedforward, but recurrent edges span adjacent time steps.
  11. We can view the recurrent neural network as a deep network with an output at each layer and weight tying across time steps. Each hidden layer depends both on the input and on the hidden layer at the previous time step. Notice that it’s essentially a feedforward network in this view.
  12. With ReLU activation in the hidden node, the effect of an input on the output decays if the weight on the recurrent edge is small (below 1) and explodes if it is large (above 1).
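The decay and explosion described in this note can be seen with a single hidden node. The sketch below is a toy illustration with hypothetical names: one ReLU node, one recurrent weight, a unit input at the first step, and no further inputs or bias.

```java
/** Toy sketch: how a unit input propagates through one ReLU node with recurrent weight w. */
final class RecurrentGainSketch {

    /** Value carried by the recurrent edge after the given number of steps, starting from 1.0. */
    static double effectAfter(double w, int steps) {
        double h = 1.0; // contribution of a unit input at the first time step
        for (int t = 0; t < steps; t++) {
            h = Math.max(0.0, w * h); // ReLU applied to the recurrent signal
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(effectAfter(0.5, 10)); // prints 9.765625E-4
        System.out.println(effectAfter(2.0, 10)); // prints 1024.0
    }
}
```

With a positive weight the signal is simply multiplied by w at every step, so after t steps it has been scaled by w to the power t; this is the problem the LSTM memory cell was designed to address.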
  13. LSTM cells composed to form a network in the same way as ordinary hidden nodes in normal RNNs
  14. All patients were adults who were admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. ICU stays of less than 48 hours have been excluded. Up to 42 variables were recorded at least once during the first 48 hours after admission to the ICU. Not all variables are available in all cases, however. Six of these variables are general descriptors (collected on admission), and the remainder are time series, for which multiple observations may be available.
  15. One possible solution: recurrent neural nets, which combine Markov structure with hidden states consisting of learned latent features. Real-valued, distributed states can encode many more states/histories for the same number of nodes (vs. discrete hidden states). Learning is done via backpropagation through time.
  16. All patients were adults who were admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. ICU stays of less than 48 hours have been excluded. Up to 42 variables were recorded at least once during the first 48 hours after admission to the ICU. Not all variables are available in all cases, however. Six of these variables are general descriptors (collected on admission), and the remainder are time series, for which multiple observations may be available.
  17. No alignment was attempted per timestep across records; each recorded timestep is simply indexed in order (a simpler way to find long-term dependencies). The alternative was one timestep per second: (60 sec) x (60 min) x (48 h) = 172,800 timesteps (not easy to model).
  18. Dataset statistics inform the vectorization process for ZMZUV and normalization
  19. Todo: [ notes on ZMZUV ]