Thierry Chaussalet: Predictive risk modelling: Does technique or time matter?
1. Predictive risk modelling
Does technique or time matter?
Professor Thierry Chaussalet
Department of Business Information Systems, ECS
University of Westminster, London
www.healthcareinformatics.org.uk
Nuffield Trust, 13 June 2012
3. Motivation
• If patients at risk of (re-)admission could be identified and
offered early interventions, then their lives and long-term health
may be improved by reducing the chances of readmission, and
hopefully their cost of care reduced
• This has led to the development of a flurry of predictive risk
modelling tools:
o Most are based on logistic regression, such as the PARR+
tool (J. Billings et al. 2006); however, there exist many other
algorithms such as neural networks or decision trees
o Most are concerned with predicting the risk of (re-)
admission within the following year; however, readmission
within different time intervals is also of interest
4. Our objectives
1. To develop and compare alternative statistical/data
mining algorithms (Logistic Regression, Classification
Tree and Neural Network) in order to predict the
likelihood of a readmission within 12 months, based on
England hospital inpatient admissions data
2. To develop and compare predictive risk models based on
the three methodologies (logistic regression,
classification trees, neural network) within shorter
timeframes, i.e. 1, 3, 6, and 9 months.
3. In addition, to explore the benefit of adding a measure of
condition severity in a "PARR-like" model
5. Standard PARR Model Timeframe
[Figure: timeline running from 01/04/1999 to 31/03/2004, divided into the prior hospital utilisation period, the triggering year (01/04/2002 to 31/03/2003), and the prediction time period.]
6. Data Extraction and Manipulation
• Data source: Hospital Episode Statistics (HES), which holds all
inpatient episodes of care.
• Software used to extract the data: MySQL was used to
extract a sample of just over 100,000 emergency inpatient
admissions that started and ended between 01/04/2002 and
31/03/2003. The data were then split into training (70%) and
validation (30%) data sets.
• Software used to fit models to the data: SAS Enterprise
Miner was used to fit models to the extracted data (but SPSS
or open source software, e.g. R, RapidMiner, etc., could also
be used).
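The 70/30 split described above can be sketched as follows. This is only an illustration: the slides do not say how the split was drawn, so the random shuffle, the seed, and the function name `split_train_validation` are assumptions, and the integer IDs stand in for real admission records.

```python
import random


def split_train_validation(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and validation lists.

    Assumption: a simple random split; the original study may have split differently.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for extracted admission records.
admission_ids = list(range(1000))
train, validation = split_train_validation(admission_ids)
print(len(train), len(validation))
```

Fixing the seed keeps the split reproducible, which matters when several modelling techniques are compared on the same validation set.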
7. Independent variables
The following independent variables were used in the models:
- Age group at triggering admission, gender and ethnic origin
- Presence of certain diseases/conditions in the triggering
admission or in the previous three years
- The summed total of disease severity calculated by the
Charlson Comorbidity Severity Index, determined by looking
at all diseases/conditions that the patient had over the previous
three years. The list of diseases used in this measure is on the
next slide
- Variables such as the number of emergency inpatient admissions in
the previous three years
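A summed severity score of this kind could be computed roughly as below. This is a sketch under assumptions: the weights shown are a small illustrative subset taken from the original Charlson index, the study's own condition list is not reproduced in this transcript, and the names `CHARLSON_WEIGHTS` and `severity_total` are hypothetical.

```python
# Illustrative subset of weights from the original Charlson index.
# Assumption: the study's actual condition list and weights may differ.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "chronic_pulmonary_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "renal_disease": 2,
    "metastatic_solid_tumour": 6,
}


def severity_total(conditions):
    """Sum the weights of the distinct conditions recorded for a patient.

    Conditions missing from the weight table contribute 0.
    """
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in set(conditions))


# Conditions seen across the triggering admission and the previous three years.
print(severity_total(["diabetes", "renal_disease", "diabetes"]))
```

Each condition is counted once however many episodes mention it, which is why the input is reduced to a set before summing.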
9. Effect of severity index
Patients are more likely to have a readmission if they
have a high severity index total score
10. The methods used
• Logistic Regression
o Somewhat like regression but with a binary dependent
variable; will lead to:
$$p(x) = \frac{1}{1 + e^{-(\beta_0 + \sum_i \beta_i x_i)}}$$
• Decision Trees
o Partition the space of independent variables into a set of
homogeneous regions
o Popular algorithms are CART, CHAID, C4.5
o C4.5 uses the idea of information gain (entropy)
• Neural Network
o Aims at mimicking the brain with many neurons in
hidden layers that connect through "synapses"
o Mathematically, it is a generalisation of logistic regression
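Two of the ideas on this slide can be made concrete with a short sketch: the logistic function that turns a linear score into a probability, and the entropy-based information gain used to choose tree splits. The coefficients, counts and function names are hypothetical, and the code shows plain information gain rather than any particular tool's split criterion.

```python
import math


def logistic(beta0, betas, xs):
    """Logistic model: p(x) = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))


def entropy(pos, neg):
    """Binary entropy in bits of a group with pos/neg outcome counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups.

    Each group is a (pos, neg) tuple of outcome counts.
    """
    n = sum(parent)
    weighted = sum((sum(c) / n) * entropy(*c) for c in children)
    return entropy(*parent) - weighted


# Hypothetical coefficients: intercept -2.0, one predictor with weight 0.8.
print(logistic(-2.0, [0.8], [3.0]))
# Hypothetical split of 10 patients (5 readmitted) into two groups.
print(information_gain((5, 5), [(4, 1), (1, 4)]))
```

A split that separates readmitted from non-readmitted patients perfectly has a gain equal to the parent entropy; a split that leaves the mix unchanged has a gain of zero.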
11. Logistic Regression - Results
• Most significant variables
o Number of emergency admissions within the
previous 3 years
o Age 75 plus at admission
o Number of emergency admissions within the
previous 6 months
o Average number of episodes per emergency
admission spell
o Reference condition in the previous 3 years
o The severity index is also significant
12. Decision tree - Results
Factor name in tree            Relative importance in model   Factor
NumberOfEMAD_within_3years     1.000                          The number of emergency admissions within the previous 3 years
Severity_Index                 0.246                          The severity index total score for conditions in the current admission and in the previous 3 years
NumberOfEMAD_within_6months    0.068                          The number of emergency admissions within the previous 6 months
COPD                           0.062                          Whether the patient had an emergency admission due to COPD in the previous 3 years
Ref_condition_prev_3_yrs       0.060                          Whether the patient had a reference condition in the current admission or in the previous 3 years
These factors were also found significant with logistic regression.
However, factors such as age, ethnic origin and some conditions were
significant in the regression model but not in the tree model.
13. Decision Trees - Results
If a patient had 2 emergency admissions within the previous 3 years and a severity index of 4 or
more in the previous 3 years, then s/he is predicted to have an emergency readmission within 12
months. Of the 780 patients in this group, all of whom were predicted to have a readmission,
62.3% actually had a readmission.
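The rule on this slide translates directly into code, and the quoted percentage can be turned back into an approximate patient count. The function name is hypothetical, and the rule is implemented exactly as worded on the slide (exactly 2 admissions), which is an assumption about how the tree node is defined.

```python
def tree_rule_flags_readmission(emergency_admissions_3y, severity_index_3y):
    """Rule as worded on the slide: 2 emergency admissions in the previous
    3 years and a severity index of 4 or more."""
    return emergency_admissions_3y == 2 and severity_index_3y >= 4


# Figures quoted on the slide for the group selected by this rule.
group_size = 780
ppv = 0.623
readmitted = round(group_size * ppv)

print(tree_rule_flags_readmission(2, 5))
print(readmitted, group_size - readmitted)
```

Because every patient in a leaf receives the same prediction, the 62.3% figure is the positive predictive value of this leaf rather than an individual risk score.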
14. Neural Network
Number of hidden layers: 1
Number of hidden neurons: 9
Network architecture: Multilayer Perceptron
Due to their complex structure, neural network results are a lot more
difficult to interpret.
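To show what a network of this shape computes, here is a minimal forward pass for a multilayer perceptron with one hidden layer of 9 neurons. It is a sketch only: the sigmoid activation, the number of inputs, the random weights and all names are assumptions, and no training is performed.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of sigmoid units feeding a single sigmoid output unit."""
    hidden = [
        sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    output = sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, output


rng = random.Random(0)
n_inputs, n_hidden = 5, 9  # 9 hidden neurons as on the slide; 5 inputs is arbitrary
hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
output_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]
output_bias = rng.uniform(-1, 1)

hidden, risk = mlp_forward([1.0, 0.0, 2.0, 0.0, 1.0], hidden_weights, hidden_biases,
                           output_weights, output_bias)
print(len(hidden), risk)
```

The output unit has the same form as a logistic regression applied to the hidden activations, which is one way to read the earlier remark that a neural network generalises logistic regression. The interpretability problem comes from the hidden layer: no single weight maps to a single input's effect.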
15. Neural Network vs Logistic Regression Results
Percentage of patients flagged by the neural network and logistic regression
models to have an emergency readmission within 12 months that did have a
readmission
[Figure: line chart of the percentage of flagged patients who were readmitted (y-axis, 0% to 100%) against risk score threshold (x-axis, 40 to 95). Series: this project's neural network model (training and validation data), this project's logistic regression model (training and validation data), and the 2006 PARR paper.]
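The quantity plotted here, the share of flagged patients who were actually readmitted at each risk score threshold, can be computed as sketched below. The scores and outcomes are made-up illustrative values, the function name is hypothetical, and flagging at "score greater than or equal to threshold" is an assumption.

```python
def ppv_at_threshold(risk_scores, readmitted, threshold):
    """Share of patients with score >= threshold who were readmitted.

    Returns None when no patient is flagged at this threshold.
    """
    flagged = [r for s, r in zip(risk_scores, readmitted) if s >= threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


# Made-up risk scores (0 to 100) and outcomes (1 = readmitted within 12 months).
scores = [35, 48, 52, 61, 70, 77, 83, 91]
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]

# Same threshold grid as the chart's x-axis: 40, 45, ..., 95.
curve = {t: ppv_at_threshold(scores, outcomes, t) for t in range(40, 100, 5)}
print(curve)
```

Raising the threshold flags fewer patients, so the estimate at the high end of the curve rests on small numbers and becomes noisy.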
16. Algorithms comparison for different timeframes
Percentage accuracy in classification of the three modelling
techniques at predicting readmission within 1, 3, 6, 9 and 12 months
[Figure: horizontal bar chart of percentage accuracy in classification (x-axis, 0% to 100%) for readmission within 1, 3, 6, 9 and 12 months (y-axis). Series: neural network model, logistic regression model, classification tree model.]
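Comparing timeframes requires one binary outcome per window. The sketch below labels a patient as readmitted within 1, 3, 6, 9 and 12 months of a reference date. The choice of reference date, the calendar-month arithmetic and the function names are assumptions; the slides do not specify how the windows were measured.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after d, clamping the day of month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def readmission_labels(reference_date, next_admission, windows=(1, 3, 6, 9, 12)):
    """Binary outcome per window: was the patient readmitted within k months?

    next_admission is None when no later emergency admission was found.
    """
    return {
        k: next_admission is not None
        and reference_date < next_admission <= add_months(reference_date, k)
        for k in windows
    }


# Hypothetical patient: reference date 31/01/2003, next admission 10/05/2003.
labels = readmission_labels(date(2003, 1, 31), date(2003, 5, 10))
print(labels)
```

Shorter windows produce fewer positive outcomes, so the class balance shifts with the timeframe and accuracy alone can look better simply because readmissions become rarer.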
17. Algorithms comparison for different timeframes
Positive predictive values of the three modelling techniques for predicting
readmission within 1, 3, 6, 9 and 12 months
[Figure: horizontal bar chart of positive predictive value (x-axis, 0% to 100%) for readmission within 1, 3, 6, 9 and 12 months (y-axis). Series: neural network model, logistic regression model, classification tree model.]
18. Conclusions (1)
• The accuracy (and PPV) in classification of the three models
predicting readmission within 12 months is almost identical
              Logistic Regression   Classification Tree   Neural Network
Accuracy      71.5%                 71.6%                 72.1%
PPV           67.4%                 66.8%                 66.2%
Sensitivity   40.1%                 41.7%                 45.4%
• The neural network identified the largest share of actual readmissions,
with a sensitivity of 45.4%, possibly due to its nonlinear nature
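For reference, the three measures in the table are all derived from the same confusion matrix. The sketch below uses hypothetical counts, not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, positive predictive value and sensitivity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all patients classified correctly
        "ppv": tp / (tp + fp),           # share of flagged patients who were readmitted
        "sensitivity": tp / (tp + fn),   # share of readmitted patients who were flagged
    }


# Hypothetical counts for 200 validation patients.
metrics = classification_metrics(tp=40, fp=20, tn=100, fn=40)
print(metrics)
```

A model can reach similar accuracy and PPV to its rivals while differing in sensitivity, because sensitivity depends only on how many of the true readmissions it catches.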
19. Conclusions (2)
• The number of emergency admissions in the three years prior to the
triggering emergency admission is the strongest factor in
predicting readmission within 12 months in ALL models. The
number of emergency admissions in the previous 6 months is
also a strong factor.
• The severity and number of conditions that a patient has also play a
role in accurately predicting readmission in all the models, with
those patients who have a reference condition or COPD being
more likely to have a readmission.
20. Conclusions (3)
• Although the neural network model gives good results at higher risk
scores, the results of the technique are much more difficult to
explain to a non-technical audience.
• Classification trees have a strong advantage as they allow us to
visualise the important factors immediately.
• However, classification trees are not designed to allocate
probabilities of readmission to individuals: patients are sorted
into groups and each group is then allocated a probability.
• For these reasons, Logistic Regression often remains the method
which gives the most easily understandable results to a
non-technical audience.
21. Conclusions (4)
• As the prediction interval to readmission decreases, the performance
of the logistic regression model in terms of PPV decreases, while the
other two models retain relatively stable values irrespective of the
timeframe to readmission. This is particularly true of decision trees.
• This study suggests that alternative algorithms have great potential
in terms of performance, ease of use, and robustness across timeframes.
• This also opens the door to exploring the benefits of newer, more
sophisticated machine learning techniques: support vector
machines, fuzzy approaches, etc.
• However, greater prediction improvement would probably be
achieved with better and more comprehensive data (e.g. GP, social
care, etc.)