Data Analytics to Predict Hospital Readmissions

Data Analytics for Readmission: Temporal features, predictive modeling

Joel Saltz, Andrew Post, Doris Gao, Sharath Cholleti, Mark Grand: Emory
David Levine, Sam Hohmann: UHC

Analytic Information Warehouse Project: Tools
and Analytics to Answer Questions such as:
• What fraction of patients with a given category of
principal diagnosis will be readmitted within 30 days?
• What fraction of patients with a given set of diseases
will be readmitted within 30 days?
• How do severity and time course of co-morbidities
affect readmissions?
• How can we best use history of prior hospitalizations
to predict readmissions?
• What are the medical and socio-economic
characteristics of frequently readmitted patients?
• Can we translate insight derived from our patient
population into rules that can be used to manage
patients?

Emory Clinical Data Warehouse
• EUH, EUHM and WW (inpatient encounters)
• Excludes Psych and Rehab encounters

• Encounter location (entity, pavilion, unit)
• Providers
• Discharge disposition
• Primary and secondary ICD9 codes
• Procedure codes
• DRGs
• Medication orders
• Labs
• Vitals
• Insurance status
• Geographic information

Identifying Variables Associated with 30-day
Readmits
• Problem: “Raw” variables in the CDW are difficult to use
for prediction
– Too many diagnosis codes, procedure codes
– Continuous variables (e.g., labs) require interpretation
– Temporal relationships between variables are implicit
• Solution: Transform the data into a much smaller set of
variables using heuristic knowledge
– Categorize diagnosis and procedure codes using code
hierarchies
– Classify continuous variables using standard interpretations
(e.g., high, normal, low)
– Identify temporal patterns (e.g., frequency, duration,
sequence)
– Apply standard data mining techniques
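
The transformation steps listed above can be sketched in a few lines. This is a minimal illustration, not the AIW implementation: the three-digit ICD-9 roll-up (numeric codes only), the reference range, and all function names are assumptions made for the example.

```python
from collections import Counter

def categorize_icd9(code):
    """Roll a numeric ICD-9 code up to its three-digit category (assumed grouping level)."""
    return code.split(".")[0][:3]

def classify_lab(value, low, high):
    """Interpret a continuous lab value against a reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

# Illustrative inputs only; not drawn from the Emory CDW.
codes = ["250.02", "250.00", "585.6", "410.71"]
categories = [categorize_icd9(c) for c in codes]

# Hypothetical reference range (3.5 to 5.0), used purely for illustration.
potassium_values = [3.1, 4.2, 5.6]
interpretations = [classify_lab(v, low=3.5, high=5.0) for v in potassium_values]

# A simple temporal pattern: how often each diagnosis category recurs.
category_frequency = Counter(categories)

print(categories)
print(interpretations)
print(category_frequency.most_common(1))
```

Grouping by code prefix is only one possible hierarchy; a production system would more likely use a maintained code-grouping ontology.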

Clinical Data Warehouse/Analytic Information
Warehouse (AIW)

[Diagram: Clinical Data Warehouse, cloned periodically to the Analytic Information Warehouse; derived information returned]

The CDW/AIW Relationship

• CDW as source of clinical and administrative
data – cloned periodically (e.g., monthly)

• AIW as incubator of algorithms that generate
derived information

AIW Workflow

[Diagram: workflow around the Analytic Information Warehouse. Elements shown:]
• Multiple databases, cloned periodically
• Periodic data extraction
• Data subset, mapped to a standard model
• Calculation of derived variables (transform)
• Augmented data set
• Load into multiple output forms
• Make analyses available in existing tools

Readmissions Analyses (Emory Healthcare)

Derived Variables
• 30-day readmit
• The 9 Emory Enhanced Risk Assessment Tool diagnosis categories
• UHC product lines
• “Disease indicators” (combinations of diagnosis codes, procedure codes, labs
and/or med orders that indicate a condition)
– Obesity
– Uncontrolled diabetes
– End-stage renal disease (ESRD)
– Pressure ulcer
– Sickle cell disease
• Temporal variables derived over multiple encounters
– Multiple MI
– Multiple past 30-day readmissions
– Sickle cell disease
– Diabetes/uncontrolled diabetes
– CKD/ESRD
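
A minimal sketch of how the 30-day readmit flag and one temporal variable (count of past 30-day readmissions) might be derived from encounter dates. The record layout, field names, and sample dates are hypothetical; the flag marks an encounter that is followed by a readmission within the window.

```python
from datetime import date
from itertools import groupby

def flag_30_day_readmits(encounters, window_days=30):
    """Mark each encounter that is followed by a readmission of the same
    patient within `window_days` of discharge."""
    ordered = sorted(encounters, key=lambda e: (e["patient"], e["admit"]))
    flagged = []
    for _, group in groupby(ordered, key=lambda e: e["patient"]):
        visits = list(group)
        for current, following in zip(visits, visits[1:] + [None]):
            readmit = False
            if following is not None:
                gap = (following["admit"] - current["discharge"]).days
                readmit = 0 <= gap <= window_days
            flagged.append({**current, "readmit_30d": readmit})
    return flagged

def prior_readmit_count(flagged, patient, before):
    """Temporal variable: earlier encounters of this patient that were
    followed by a 30-day readmit."""
    return sum(
        1 for e in flagged
        if e["patient"] == patient and e["admit"] < before and e["readmit_30d"]
    )

# Hypothetical encounters for two patients.
encounters = [
    {"patient": "A", "admit": date(2010, 1, 1), "discharge": date(2010, 1, 5)},
    {"patient": "A", "admit": date(2010, 1, 20), "discharge": date(2010, 1, 25)},
    {"patient": "A", "admit": date(2010, 2, 10), "discharge": date(2010, 2, 12)},
    {"patient": "A", "admit": date(2010, 6, 1), "discharge": date(2010, 6, 3)},
    {"patient": "B", "admit": date(2010, 3, 1), "discharge": date(2010, 3, 4)},
]
flagged = flag_30_day_readmits(encounters)
flags = [e["readmit_30d"] for e in flagged]
past_readmits = prior_readmit_count(flagged, "A", date(2010, 6, 1))
```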

Emory Enhanced Risk Assessment
Tool (ERAT) Diagnoses
• Diabetes
• Heart Failure
• Chronic Kidney Disease
• Chronic Obstructive Pulmonary Disease
• Acute Myocardial Infarction
• Stroke
• History of Transplant
• Cancer
• Pulmonary Hypertension

Identifying Variables Associated
with 30-day Readmits
• No variables in the CDW are broadly associated with
(or predictive of) readmits across the entire EHC
population
• Need to drill-down into subpopulations to identify
variables that are associated with readmits
• Ultimately, may be able to derive subpopulation-
specific predictive models of readmissions

3-year+ subset (2008-3/2011)

Analytic Information Warehouse

Association of CKD with 30-day Readmissions
Overall Emory Readmission Rate = 15%

CKD?
Subsequent 30-day readmit?    FALSE     TRUE     Grand Total
30 Day Readmission            19386     7017     26403
No 30 Day Readmission         110058    23460    133518
Grand Total                   129444    30477    159921
Readmission Rate = 21%

ESRD?
Grand Total                   147609    12312    159921
Readmission Rate = 27%

Association of Multiple MI with 30-day Readmissions

Multiple MI?
30 Day Readmission    685     167    852
Grand Total           6457    376    6833
Readmission Rate = 44%
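
The readmission rate quoted for the Multiple MI table can be reproduced from its counts, along with a relative risk against patients without multiple MI. This is a sketch under one assumption: the two count columns are ordered FALSE then TRUE, as in the CKD table.

```python
def column_rate(readmits, total):
    """Share of encounters in a column that were followed by a 30-day readmit."""
    return readmits / total

# Counts copied from the Multiple MI table (column order FALSE, TRUE is assumed).
readmits = {"no_multiple_mi": 685, "multiple_mi": 167}
totals = {"no_multiple_mi": 6457, "multiple_mi": 376}

rate_with = column_rate(readmits["multiple_mi"], totals["multiple_mi"])
rate_without = column_rate(readmits["no_multiple_mi"], totals["no_multiple_mi"])

# Relative risk: how many times more likely a readmit is with multiple MI.
relative_risk = rate_with / rate_without

print(f"with multiple MI: {rate_with:.0%}")
print(f"without multiple MI: {rate_without:.0%}")
print(f"relative risk: {relative_risk:.1f}")
```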

Uncontrolled Diabetes (total n=8696, readmit n=1844, Readmit Rate = 21%)
Has Pressure Ulcer
Pressure ulcer?
No 30 Day Readmission    1053    260    1313
Rate = 33%
Has ESRD

ESRD?
Readmission
Grand Total 4691 1039 5730

Sickle Cell Anemia and 30-day
Readmits
Sickle Cell Anemia
Sickle Cell Anemia?
Readmission
Grand Total 158455 1466 159921

Sickle Cell Crisis
SS Crisis?
Rate = 36%
Grand Total 158731 1190 159921

Association of MRSA with 30-day Readmissions

Overall
MRSA?
30 Day Readmission    25982     421     26403
Grand Total           158344    1577    159921
Readmission Rate = 27%

Stroke
MRSA?
30 Day Readmission    1203    16    1219
Grand Total           5199    42    5241
Readmission Rate = 38%

MI
MRSA?
Readmission Rate = 29%

Use of Temporal Variables in creating useful subsets of data (5 year dataset)

Patient Population                   Number of Encounters    Number of Readmissions    Readmission Rate
Overall Emory                        232645                  34270                     15%
Single MI                            17992                   2804                      16%
Multiple MI                          1355                    492                       36%
CKD                                  45664                   10818                     24%
>= 4 readmissions                    17550                   9459                      54%
Multiple MI and >= 4 readmissions    900                     465                       52%
CKD and >= 4 readmissions            6997                    3606                      52%
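
As a quick check on the 5 year table, the rates can be recomputed from the encounter and readmission counts, and each subset compared with the overall Emory rate. The "lift" column is an added illustration, not a figure from the slides.

```python
# (encounters, readmissions) copied from the 5 year dataset table.
subsets = {
    "Overall Emory": (232645, 34270),
    "Single MI": (17992, 2804),
    "Multiple MI": (1355, 492),
    "CKD": (45664, 10818),
    ">= 4 readmissions": (17550, 9459),
    "Multiple MI and >= 4 readmissions": (900, 465),
    "CKD and >= 4 readmissions": (6997, 3606),
}

def summarize(subsets, baseline_name="Overall Emory"):
    """Return (name, rate, lift over baseline) tuples, highest rate first."""
    base_enc, base_readmit = subsets[baseline_name]
    baseline = base_readmit / base_enc
    rows = []
    for name, (enc, readmit) in subsets.items():
        rate = readmit / enc
        rows.append((name, rate, rate / baseline))
    return sorted(rows, key=lambda r: r[1], reverse=True)

summary = summarize(subsets)
for name, rate, lift in summary:
    print(f"{name:35s} {rate:5.0%}  lift {lift:.1f}x")
```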

Predictive Modeling for Readmission

• Classify inpatient encounters into high, medium,
low risk groups of 30-day readmission based on
patients’ characteristics
• Data preprocessing and mapping of codes
• Predictive modeling
– Random forests (ensemble of decision trees)
– Ranking of the predictions into high to low risk
• Emory specific data sets

Random Forests

• Random forests: an ensemble of tree predictors
• Each tree is grown on a bootstrap sample of the data, considering a
random subset of the variables at each split
• A large number of trees are generated
• All of them vote to classify a test example
• Reference: Leo Breiman, Random Forests, Machine
Learning, 45, 5-32, 2001
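
To make the voting idea concrete, here is a deliberately simplified toy ensemble, not Breiman's full algorithm: each "tree" is a one-level decision stump trained on a bootstrap sample using a random subset of binary variables chosen once per tree, and the stumps vote by majority. The feature meanings and data are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Choose the binary feature whose split misclassifies the fewest rows."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for f in features:
        sides = {0: Counter(), 1: Counter()}
        for row, y in zip(rows, labels):
            sides[row[f]][y] += 1
        # Majority label on each side; fall back to the overall majority.
        leaf = {v: (c.most_common(1)[0][0] if c else default)
                for v, c in sides.items()}
        errors = sum(leaf[row[f]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, leaf)
    return best[1], best[2]

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        feats = rng.sample(all_features, n_features)     # random variable subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Every stump votes; the majority label wins."""
    votes = Counter(leaf[row[f]] for f, leaf in forest)
    return votes.most_common(1)[0][0]

# Hypothetical binary features: (ckd, multiple_mi, prior_readmit).
rows = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (0, 1, 0),
        (1, 1, 0), (0, 0, 1), (1, 0, 0), (0, 0, 0)]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
forest = train_forest(rows, labels)
prediction = predict(forest, (1, 1, 1))
```

A real analysis would use full-depth trees and a maintained library implementation; the stump version only shows where the randomness and the vote enter.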

Random Forest (cont)

• Generalization error depends on the strength of
individual trees and the correlation between them
• Its accuracy is as good as AdaBoost (another robust
algorithm)
• It is relatively robust to noise and outliers
• It gives useful internal estimates of error,
correlation, strength and variable importance

Variables used in Predictive Modeling

• Age, gender, race
• Census tract data: population, population by race,
average household income, persons per household
• Primary and secondary diagnosis codes grouped
using ontologies
• Lab procedure codes grouped using ontologies
• Vitals like heart rate, blood pressure, temperature,
respiratory rate, BMI
• Medications
• Derived variables (next slide)

Derived Variables

• Disease flags
– CKD, MI, HF, COPD, Diabetes, etc.
• Medication flags
– Diabetes medication count, ACE inhibitor, beta
blocker, diuretic, inotropic agent, etc.
• Treatment flags
– Radiotherapy, chemotherapy
• Patient history
– Encounter 90 days earlier, 180 days earlier
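
A sketch of how a few of these derived variables might be computed for one encounter. It assumes the history flags mean "any earlier discharge within the preceding 90 or 180 days"; the drug lists, field names, and sample record are illustrative and incomplete.

```python
from datetime import date, timedelta

# Illustrative, incomplete drug lists (assumptions for the example).
DIABETES_MEDS = {"metformin", "insulin glargine", "glipizide"}
BETA_BLOCKERS = {"metoprolol", "carvedilol", "atenolol"}

def had_prior_encounter(admit, prior_discharges, window_days):
    """True if any earlier discharge falls within the window before admission."""
    start = admit - timedelta(days=window_days)
    return any(start <= d < admit for d in prior_discharges)

def derive_flags(encounter, prior_discharges):
    meds = {m.lower() for m in encounter["med_orders"]}
    return {
        "encounter_prior_90d": had_prior_encounter(encounter["admit"], prior_discharges, 90),
        "encounter_prior_180d": had_prior_encounter(encounter["admit"], prior_discharges, 180),
        "diabetes_med_count": len(meds & DIABETES_MEDS),
        "beta_blocker": bool(meds & BETA_BLOCKERS),
    }

# Hypothetical encounter and history.
encounter = {
    "admit": date(2011, 3, 1),
    "med_orders": ["Metformin", "Insulin glargine", "Metoprolol"],
}
prior_discharges = [date(2010, 6, 1), date(2010, 10, 15)]
flags = derive_flags(encounter, prior_discharges)
print(flags)
```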

BMI Using WHO Simple Classification (1 year subset 4/2010-3/2011)

[Charts: Percent BMI Category for CKD patients with multiple readmits (n=386); Percent BMI Category for CKD female patients with multiple readmits (n=197). RR=1.2]

“30 Day Readmission” represents encounters that were followed by a 30 day readmit
“No 30 Day Readmission” represents other encounters that were not followed by a 30 day readmit
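
For reference, the WHO principal BMI cut-offs (18.5, 25, 30) and a relative risk (RR) calculation can be written as below. The counts passed to `relative_risk` are invented to show the arithmetic and are not the data behind the RR=1.2 on the slide.

```python
def who_bmi_category(bmi):
    """WHO principal BMI categories for adults."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"

def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Ratio of event rates between an exposed and an unexposed group."""
    return (exposed_events / exposed_total) / (unexposed_events / unexposed_total)

categories = [who_bmi_category(b) for b in (17.0, 22.4, 27.9, 34.5)]

# Made-up counts, chosen only to illustrate the formula.
example_rr = relative_risk(30, 100, 25, 100)
print(categories, round(example_rr, 2))
```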

Predictive Modeling Results with Temporal Variable Constrained Dataset: MI data (Emory)
All MI data and Multiple MI data

Data                                    Predicted Risk    # of encounters    # of Readmissions    30-day Readmission rate
All MI data                             High              968                360                  37%
Multiple MI                             High              68                 35                   51%
All MI data (no predictive modeling)                      9674               1648                 17%
Multiple MI (no
Predictive Modeling Results with Temporal Variable Constrained Dataset: CKD data (Emory)
All CKD data and End Stage Renal CKD

Data                                    Predicted Risk    # of encounters    # of Readmissions    Readmission rate
CKD                                     High              2284               950                  42%
End Stage Renal                         High              952                444                  47%
All CKD (no predictive modeling)                          45664              10818                24%
End Stage Renal (no

UHC Data Analyses

• Much larger dataset
• Much less detailed information about each patient
• UHC only has coded data sent by institutions so co-
morbidity related ICD-9 codes may be missing
• Analyses across patient encounters can pick up
chronic co-morbidities that might not be coded in a
particular encounter

Missing Diagnosis Codes in UHC dataset 10/1/2006 - 4/30/2011

Disease          Number of Patients with     Total number    Number of Encounters    Total number
                 missing codes in future     of patients     with missing codes      of encounters
                 encounters
Diabetes         144806 (8.01%)              1807322         311403 (9.4%)           3300804
Heart Failure    197043 (20.1%)              976041          366926 (20.7%)          1765203
MI               171213 (21.8%)              784559          301673 (25.8%)          1168056
Sickle Cell      2870 (10.5%)                27210           11162 (9.9%)            112268

UHC
Use of Temporal Variables in Subsetting Data

Patient Population    # Total Encounters    # Readmitted Patients    Proportion of Patients Readmitted
MI                    310954                47210                    15.2%
Multiple MI           73227                 29017                    39.6%
Non-ESRD              13023536              1735308                  13.3%
ESRD                  510702                142622                   27.9%
CKD                   1334617               316399                   23.7%

UHC
Use of Temporal Variables in Subsetting Data

Patient Population                # Total Patients    # Readmitted Patients    Proportion of Patients Readmitted
Diabetes                          2465049             465526                   18.8%
Uncontrolled Diabetes             388417              78005                    20.0%
ESRD                              510702              142622                   27.9%
Uncontrolled Diabetes and ESRD    48583               14224                    29.8%

UHC “Readmission Hot Spots”

[Chart: Encounters and Patients series; vertical axis from 0 to 1,000,000; horizontal axis categories 1 through 8]

Conclusion

• Integrative dataset analysis can leverage patient
information gathered over many encounters
• Temporal analyses can generate derived variables
that appear to correlate with readmissions
• Hot spots appear to be an important phenomenon
and have the potential of leading to patient-level
interventions
• Predictive modeling has promise of providing
decision support
• Future analysis will look at temporal patterns of
encounters and relationship between LOS and
readmission
