Thierry Chaussalet: Predictive risk modelling: Does technique or time matter?

Predictive risk modelling
Does technique or time matter?

Professor Thierry Chaussalet
Department of Business Information Systems, ECS
University of Westminster, London
www.healthcareinformatics.org.uk

Nuffield Trust, 13 June 2012

Acknowledgements

- Ian Winkworth, who conducted the analysis

- The Nuffield Trust for advice throughout


Motivation

• If patients at risk of (re-)admission could be identified and
offered early interventions, their lives and long-term health
might be improved by reducing their chances of readmission, and
their cost of care hopefully reduced
• This has led to the development of a flurry of predictive risk
modelling tools:
o Most are based on logistic regression, such as the PARR+
tool (J. Billings et al. 2006); however, many other
algorithms exist, such as neural networks or decision trees
o Most are concerned with predicting the risk of (re-)admission
within the following year; however, readmission
within different time intervals is also of interest

Our objectives

1. To develop and compare alternative statistical/data
mining algorithms (Logistic Regression, Classification
Tree and Neural Network) to predict the
likelihood of a readmission within 12 months, based on
English hospital inpatient admissions data
2. To develop and compare predictive risk models based on
the same three methodologies (logistic regression,
classification trees, neural network) for shorter
timeframes, i.e. 1, 3, 6 and 9 months
3. To explore the benefit of adding a measure of
condition severity to a “PARR-like” model


Standard PARR Model Timeframe

[Figure: timeline of the standard PARR model, showing the prior hospital utilisation period, the triggering year (01/04/2002 to 31/03/2003) and the prediction time period, spanning 01/04/1999 to 31/03/2004]
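The timeframe above can be sketched in code. This is a minimal sketch, assuming that both the three-year prior utilisation window and the readmission horizons (1, 3, 6, 9 and 12 months, as in the objectives) are measured from a single triggering date; the helper names `add_months`, `prior_window` and `readmission_labels` are hypothetical and not part of the study's MySQL/SAS workflow.

```python
import calendar
from datetime import date
from typing import Dict, Optional, Tuple

# Prediction horizons studied in this project (months).
HORIZONS_MONTHS = (1, 3, 6, 9, 12)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months (negative = backwards),
    clamping the day to the last day of the target month."""
    year, month_index = divmod(d.year * 12 + d.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def prior_window(trigger: date, years: int = 3) -> Tuple[date, date]:
    """Hypothetical prior hospital utilisation window: the `years` before the trigger."""
    return add_months(trigger, -12 * years), trigger


def readmission_labels(trigger: date, next_emergency: Optional[date]) -> Dict[int, bool]:
    """For each horizon h, True if the next emergency admission falls after
    the triggering date and no later than h months after it."""
    return {
        h: next_emergency is not None
        and trigger < next_emergency <= add_months(trigger, h)
        for h in HORIZONS_MONTHS
    }


# Example: a triggering admission within the 2002/03 triggering year.
trigger = date(2002, 6, 15)
print(prior_window(trigger))  # (datetime.date(1999, 6, 15), datetime.date(2002, 6, 15))
print(readmission_labels(trigger, date(2002, 9, 1)))  # {1: False, 3: True, 6: True, 9: True, 12: True}
```

Because the labels are nested (a readmission within 1 month is also within 3, 6, 9 and 12 months), shorter horizons always have at most as many positive cases as longer ones.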

Data Extraction and Manipulation

• Data source: Hospital Episode Statistics (HES), which holds all
inpatient episodes of care.
• Data extraction: MySQL was used to
extract a sample of just over 100,000 emergency inpatient
admissions that started and ended between 01/04/2002 and
31/03/2003. The data were then split into training (70%) and
validation (30%) data sets
• Model fitting: SAS Enterprise
Miner was used to fit models to the extracted data [but SPSS
and open source software, e.g. R or RapidMiner, could also be used].
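The 70/30 split can be sketched as follows. This is a minimal sketch, assuming a simple random split of admission records with a fixed seed; the slides do not say whether the SAS Enterprise Miner partition was stratified on the outcome, and `train_validation_split` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_validation_split(
    records: Sequence[T], train_frac: float = 0.7, seed: int = 2012
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into training/validation sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Example with 1,000 hypothetical admission IDs.
train, validation = train_validation_split(range(1000))
print(len(train), len(validation))  # 700 300
```

If a patient can contribute several emergency admissions to the sample, splitting by patient rather than by admission would keep the same patient out of both sets; the slides do not say which approach was used.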

Independent variables

The following independent variables were used in the models:
–Age group at the triggering admission, gender and ethnic origin
–Presence of certain diseases/conditions in the triggering
admission or in the previous three years
–The summed total of disease severity calculated by the
Charlson Comorbidity Severity Index, determined by looking
at all diseases/conditions that the patient had over the previous
three years. The list of diseases used in this measure is on the
next slide
–Prior hospital utilisation variables, such as the number of emergency
inpatient admissions in the previous three years

Condition | Charlson Comorbidity Severity Index (weight) | ICD-10 codes
Ischaemic heart disease | 1 | I21-I25
Congestive heart failure (CHF) | 1 | I50, I110, I130
Peripheral vascular disease (PVD) | 1 | I700-I702, I71-I72, I731-I739, I709, I792, I771, R2
Cerebrovascular disease (CVD) | 1 | I60-I67, I69, G45, H340, R298, R470
Mental illness | 1 | F00-F09, F17-F69, F90-F99
Chronic obstructive pulmonary disease (COPD) | 1 | J43-J44
Connective tissue disease/rheumatoid arthritis (CTDRA) | 1 | M32-M36, M05, M06, M08, I39, I528, I418, I328, J990, G737
Peptic Ulcer | 1 | K25-K28
Mild Liver Disease | 1 | K703, K743-K746, K760, K769
Diabetes without complications | 1 | E100, E101, E106, E108, E109, E110, E111, E116, E118, E119, E120, E121, E126, E128, E129, E130, E131, E136, E138, E139, E140, E141, E146, E148, E149
Hemiplegia | 2 | G041, G114, G801, G802, G81, G82, G830-G834, G839
Renal Failure | 2 | N18-N20, Z940
Diabetes with complications | 2 | E102-E105, E107, E112, E115, E117, E122-E125, E127, E132-E135, E137, E142-E145, E147
Cancer | 2 | All codes beginning with C, D00-D48
Moderate to severe Liver Disease | 3 | I850, I859, I864, I982, K704, K711, K721, K729, K765, K766, K767
Metastatic Cancer | 6 | C77-C80
HIV/AIDS | 6 | B20-B24
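The summed severity score can be sketched as below. This is a minimal sketch covering only a subset of the table's rows; it assumes codes are matched by prefix (so `I50` covers `I509`), that each condition counts once however many matching codes appear, and that no Charlson hierarchy rule (for example metastatic cancer superseding cancer) is applied, since the slides do not describe one. `CONDITIONS`, `matches` and `severity_index` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of the table: condition -> (weight, ICD-10 patterns).
# A pattern is a code prefix ("I50", "C") or an inclusive range ("I21-I25").
CONDITIONS: Dict[str, Tuple[int, List[str]]] = {
    "Ischaemic heart disease": (1, ["I21-I25"]),
    "Congestive heart failure (CHF)": (1, ["I50", "I110", "I130"]),
    "COPD": (1, ["J43-J44"]),
    "Peptic ulcer": (1, ["K25-K28"]),
    "Renal failure": (2, ["N18-N20", "Z940"]),
    "Cancer": (2, ["C", "D00-D48"]),
    "Metastatic cancer": (6, ["C77-C80"]),
    "HIV/AIDS": (6, ["B20-B24"]),
}


def matches(code: str, pattern: str) -> bool:
    """True if an ICD-10 code (e.g. 'I21.9' or 'I219') falls under a pattern."""
    code = code.replace(".", "").upper()
    if "-" in pattern:
        lo, hi = pattern.split("-")
        # Compare the code's leading characters at the precision of the bounds.
        return lo <= code[: len(lo)] <= hi
    return code.startswith(pattern)


def severity_index(codes: List[str]) -> int:
    """Sum the weights of the distinct conditions found among a patient's codes
    (e.g. all diagnoses recorded over the previous three years)."""
    return sum(
        weight
        for weight, patterns in CONDITIONS.values()
        if any(matches(c, p) for c in codes for p in patterns)
    )


print(severity_index(["I21.9", "J44.1", "C78.0"]))  # 10: IHD 1 + COPD 1 + cancer 2 + metastatic 6
```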

Effect of severity index

Patients are more likely to have a readmission if they
have a high severity index total score


The methods used
• Logistic Regression

o Somewhat like regression, but with a binary dependent
variable, which leads to: $P(R) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{n} \beta_n X_n\right)}}$

• Decision Trees
o Partitions the space of the independent variables into a set of
homogeneous regions
o Popular algorithms are CART, CHAID and C4.5
o C4.5 chooses splits using information gain (reduction in entropy),
normalised as the gain ratio
• Neural Network
o Loosely inspired by the brain: many “neurons” in hidden layers
connected through weighted “synapses”
o Mathematically, a generalisation of logistic regression: a network
with no hidden layer and a logistic output unit is a logistic regression
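The logistic regression formula above can be evaluated directly. In this minimal sketch the coefficients and the two predictors are invented for illustration (they are not the study's fitted values), and the 0 to 100 risk score is an assumption in the style of PARR scores.

```python
import math
from typing import Sequence


def readmission_probability(x: Sequence[float], beta0: float, betas: Sequence[float]) -> float:
    """P(R) = 1 / (1 + exp(-(beta0 + sum_n beta_n * x_n)))."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical predictors: emergency admissions in previous 3 years, age 75+ flag.
beta0, betas = -1.5, [0.4, 0.8]  # invented coefficients, for illustration only
p = readmission_probability([2, 1], beta0, betas)
risk_score = 100 * p  # assumed 0-100 risk score scale
print(round(p, 3))  # 0.525
```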

Logistic Regression - Results

• Most significant variables
o Number of emergency admissions within the
previous 3 years
o Age 75 plus at admission
o Number of emergency admissions within the
previous 6 months
o Average number of episodes per emergency
admission spell
o Reference condition in the previous 3 years
o The severity index is also significant


Decision tree – Results
Factor | Factor name in tree | Relative importance in model
The number of emergency admissions within the previous 3 years | NumberOfEMAD_within_3years | 1.000
The severity index total score for conditions in the current admission and in the previous 3 years | Severity_Index | 0.246
The number of emergency admissions within the previous 6 months | NumberOfEMAD_within_6months | 0.068
Whether the patient had an emergency admission due to COPD in the previous 3 years | COPD | 0.062
Whether the patient had a reference condition in the current admission or in the previous 3 years | Ref_condition_prev_3_yrs | 0.060

These factors were also significant in the logistic regression model;
however, factors such as age, ethnic origin and some conditions were
significant in the regression model but not in the tree model


Decision Trees – Results

If a patient had 2 emergency admissions within the previous 3 years and a severity index of 4 or
more in the previous 3 years, then s/he is predicted to have an emergency readmission within 12
months. Of the 780 patients in this group, all of whom were predicted to have a readmission, 62.3%
actually had one.
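A rule like this is the end point of a sequence of splits, each chosen by a purity criterion. The sketch below computes the entropy-based information gain mentioned for C4.5 on the methods slide; it is a minimal illustration with hypothetical counts, it omits C4.5's gain-ratio normalisation, and the slides do not say which criterion SAS Enterprise Miner used for this tree.

```python
import math
from typing import Iterable, Tuple

Counts = Tuple[int, int]  # (readmitted, not readmitted)


def entropy(counts: Counts) -> float:
    """Binary entropy, in bits, of a node's outcome distribution."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:  # 0 * log(0) is taken as 0
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(parent: Counts, children: Iterable[Counts]) -> float:
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split of 100 patients (40 readmitted) into two groups.
print(round(information_gain((40, 60), [(30, 10), (10, 50)]), 3))
```

A split that leaves each child with the same mix of outcomes as the parent has zero gain, while a split that separates readmitted from non-readmitted patients perfectly gains the parent's full entropy.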

Neural Network
Number of hidden layers | 1
Number of hidden neurons | 9
Network architecture | Multilayer Perceptron

Because of their complex structure, neural network results are much more
difficult to interpret
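A multilayer perceptron with one hidden layer of 9 neurons, as specified above, computes its output as below. This is a minimal sketch with random, untrained weights; the tanh hidden activation, the logistic output and the three inputs are assumptions, not the study's SAS Enterprise Miner settings. The output unit is itself a logistic regression, applied to the hidden neurons' outputs instead of the raw inputs, which is one way to see the network as a generalisation of logistic regression.

```python
import math
import random
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mlp_probability(
    x: Sequence[float],
    hidden_weights: List[List[float]],
    hidden_biases: List[float],
    output_weights: List[float],
    output_bias: float,
) -> float:
    """Forward pass of a one-hidden-layer perceptron with a logistic output."""
    # Each hidden neuron: tanh of a weighted sum of the inputs.
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    # Output neuron: a logistic regression on the hidden neurons' outputs.
    return logistic(output_bias + sum(v * h for v, h in zip(output_weights, hidden)))


N_INPUTS, N_HIDDEN = 3, 9  # 9 hidden neurons as on the slide; 3 inputs assumed
rng = random.Random(0)  # random, untrained weights for illustration only
hidden_weights = [[rng.gauss(0, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
hidden_biases = [rng.gauss(0, 1) for _ in range(N_HIDDEN)]
output_weights = [rng.gauss(0, 1) for _ in range(N_HIDDEN)]

p = mlp_probability([2.0, 1.0, 0.0], hidden_weights, hidden_biases, output_weights, 0.0)
print(0.0 < p < 1.0)  # True
```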

Neural Network vs Logistic Regression Results
[Figure: percentage of patients flagged by the neural network and logistic regression models as having an emergency readmission within 12 months who did have a readmission, plotted against the risk score threshold (40 to 95); series: this project's training and validation data for the neural network model, this project's training and validation data for the logistic regression model, and the 2006 PARR paper]
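The quantity plotted above, the percentage of flagged patients who were readmitted at each risk score threshold, can be computed as below. This is a minimal sketch, assuming a patient is flagged when the risk score is at or above the threshold; the scores and outcomes are hypothetical.

```python
from typing import Dict, Optional, Sequence


def pct_flagged_readmitted(
    scores: Sequence[float], readmitted: Sequence[bool], threshold: float
) -> Optional[float]:
    """Percentage of patients with score >= threshold who were readmitted
    (None if nobody is flagged at this threshold)."""
    flagged = [r for s, r in zip(scores, readmitted) if s >= threshold]
    if not flagged:
        return None
    return 100.0 * sum(flagged) / len(flagged)


# Hypothetical risk scores (0-100) and observed 12-month readmissions.
scores = [92, 81, 77, 64, 58, 45, 33]
readmitted = [True, True, False, True, False, False, False]

# One point per threshold on the chart's axis: 40, 45, ..., 95.
curve: Dict[int, Optional[float]] = {
    t: pct_flagged_readmitted(scores, readmitted, t) for t in range(40, 100, 5)
}
print(curve[40], curve[90])  # 50.0 100.0
```

Raising the threshold typically flags fewer patients, with a higher share of them readmitted; this is the trade-off against sensitivity summarised in the conclusions.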

Algorithms comparison for different timeframes
Percentage accuracy in classification of the three modelling
techniques at predicting readmission within 1, 3, 6, 9 and 12 months

[Figure: bar chart of percentage accuracy in classification (%) for readmission within 1, 3, 6, 9 and 12 months; series: neural network model, logistic regression model and classification tree model]

Algorithms comparison for different timeframes
Positive predictive values of the three modelling techniques for predicting
readmission within 1, 3, 6, 9 and 12 months

[Figure: bar chart of positive predictive value (%) for readmission within 1, 3, 6, 9 and 12 months; series: neural network model, logistic regression model and classification tree model]

Conclusions (1)

• The accuracy (and PPV) in classification of the three models
predicting readmission within 12 months are almost identical

Metric | Logistic Regression | Classification Tree | Neural Network
Accuracy | 71.5% | 71.6% | 72.1%
PPV | 67.4% | 66.8% | 66.2%
Sensitivity | 40.1% | 41.7% | 45.4%

• Neural networks identified the highest number of actual readmissions,
with a sensitivity of 45.4%, possibly due to their nonlinear nature
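The three measures in the table are all derived from the same confusion matrix of flagged versus actual readmissions. The sketch below uses hypothetical counts, since the slides do not report the underlying counts, to show how each measure is computed.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, positive predictive value and sensitivity from confusion-matrix counts.

    tp: flagged and readmitted        fp: flagged, not readmitted
    tn: not flagged, not readmitted   fn: not flagged but readmitted
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # share classified correctly
        "ppv": tp / (tp + fp),  # share of flagged patients who were readmitted
        "sensitivity": tp / (tp + fn),  # share of readmissions that were flagged
    }


print(classification_metrics(tp=40, fp=20, tn=100, fn=40))
# {'accuracy': 0.7, 'ppv': 0.6666666666666666, 'sensitivity': 0.5}
```

Sensitivity depends on the missed readmissions (fn), which neither PPV nor accuracy isolates, so models with near-identical accuracy and PPV can still differ in sensitivity, as in the table.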


Conclusions (2)

• The number of emergency admissions in the three years prior to the
triggering emergency admission is the strongest factor in
predicting readmission within 12 months in ALL models. The number
of emergency admissions in the previous 6 months is also an important factor.

• The severity and number of conditions that a patient has also play a
role in predicting readmission in all the models, with
patients who have a reference condition or COPD being
more likely to be readmitted.


Conclusions (3)
• Although the neural network model gives good results at higher risk
scores, its results are much more difficult to
explain to a non-technical audience.

• Classification trees have a strong advantage, as they allow the
important factors to be visualised immediately.

• However, classification trees are not designed to allocate
probabilities of readmission to individuals: patients are sorted
into groups, and each group is then allocated a probability.

• For these reasons, logistic regression often remains the method
that gives the most easily understandable results to a non-technical
audience.

Conclusions (4)
• As the prediction interval to readmission shortens, the PPV of the
logistic regression model decreases, while the
other two models retain relatively stable values irrespective of the
timeframe to readmission. This is particularly true of decision trees.

• This study suggests that alternative algorithms have great potential
in terms of performance, ease of use and robustness across timeframes

• This also opens the door to exploring the benefits of newer, more
sophisticated machine learning techniques, such as support vector
machines and fuzzy approaches

• However, a greater improvement in prediction would probably be
achieved with better and more comprehensive data (e.g. GP and social
care data)
