Data Analytics for Readmission: Temporal features, predictive modeling


Joel Saltz, Andrew Post, Doris Gao, Sharath Cholleti, Mark Grand: Emory
David Levine, Sam Hohmann: UHC
Analytic Information Warehouse Project: Tools and Analytics to Answer Questions such as:
• What fraction of patients with a given category of
  principal diagnosis will be readmitted within 30 days?
• What fraction of patients with a given set of diseases
  will be readmitted within 30 days?
• How do the severity and time course of co-morbidities
  affect readmissions?
• How can we best use history of prior hospitalizations
  to predict readmissions?
• What are the medical and socio-economic
  characteristics of frequently readmitted patients?
• Can we translate insight derived from our patient
  population into rules that can be used to manage
  patients?
Emory Clinical Data Warehouse
• EUH, EUHM and WW (inpatient encounters)
• Excludes Psych and Rehab encounters

•   Encounter location (entity, pavilion, unit)
•   Providers
•   Discharge disposition
•   Primary and secondary ICD9 codes
•   Procedure codes
•   DRGs
•   Medication orders
•   Labs
•   Vitals
•   Insurance status
•   Geographic information
Identifying Variables Associated with 30-day Readmits
• Problem: “Raw” variables in the CDW are difficult to use
  for prediction
   – Too many diagnosis and procedure codes
   – Continuous variables (e.g., labs) require interpretation
   – Temporal relationships between variables are implicit
• Solution: Transform the data into a much smaller set of
  variables using heuristic knowledge
   – Categorize diagnosis and procedure codes using code
     hierarchies
   – Classify continuous variables using standard interpretations
     (e.g., high, normal, low)
   – Identify temporal patterns (e.g., frequency, duration,
     sequence)
   – Apply standard data mining techniques
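A minimal Python sketch of the first two transformations, under stated assumptions: continuous lab values are binned into low/normal/high against reference ranges, and ICD-9 codes are rolled up to a category by their three-digit prefix. The reference ranges, test names and prefix table are hypothetical illustrations, not the AIW's actual code hierarchies or lab interpretations.

```python
from typing import Optional

# Hypothetical reference ranges (low, high); real ranges are lab- and assay-specific.
REFERENCE_RANGES = {
    "potassium_mmol_l": (3.5, 5.0),
    "creatinine_mg_dl": (0.6, 1.3),
}

# Hypothetical roll-up of ICD-9 three-digit categories to broader groups.
ICD9_CATEGORY_PREFIXES = {
    "250": "Diabetes",
    "428": "Heart Failure",
    "585": "Chronic Kidney Disease",
    "410": "Acute Myocardial Infarction",
}


def classify_lab(test: str, value: float) -> str:
    """Map a continuous lab value to 'low', 'normal' or 'high'."""
    low, high = REFERENCE_RANGES[test]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def categorize_icd9(code: str) -> Optional[str]:
    """Return the category for an ICD-9 code (e.g. '250.02'), or None if unmapped."""
    prefix = code.split(".")[0]
    return ICD9_CATEGORY_PREFIXES.get(prefix)
```

A production version would use a full code hierarchy rather than a flat prefix table, since some categories span several three-digit codes.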
Clinical Data Warehouse/Analytic Information Warehouse (AIW)

Figure: The CDW/AIW relationship. The Clinical Data Warehouse is cloned periodically into the Analytic Information Warehouse, and derived information is returned to the Clinical Data Warehouse.

• CDW as source of clinical and administrative data, cloned periodically (e.g., monthly)
• AIW as incubator of algorithms that generate derived information
AIW Workflow

Figure: AIW workflow, a cycle of the following steps:
• Multiple databases are cloned periodically into the Analytic Information Warehouse
• Periodic data extraction produces a data subset, mapped to a standard model
• Calculation of derived variables (transform)
• Augmented data set
• Load into multiple output forms
• Make analyses available in existing tools
Readmissions Analyses (Emory Healthcare)
Derived Variables
•   30-day readmit
•   The 9 Emory Enhanced Risk Assessment Tool diagnosis categories
•   UHC product lines
•   “Disease indicators” (combinations of diagnosis codes, procedure codes, labs
    and/or med orders that indicate a condition)
     – Obesity
     – Uncontrolled diabetes
     – End-stage renal disease (ESRD)
     – Pressure ulcer
     – Sickle cell disease
•   Temporal variables derived over multiple encounters
     – Multiple MI
     – Multiple past 30-day readmissions
     – Sickle cell disease
     – Diabetes/uncontrolled diabetes
     – CKD/ESRD
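To illustrate temporal variables derived over multiple encounters, the sketch below computes a per-encounter 30-day readmit flag and a Multiple MI flag. The `Encounter` record, the category string `"MI"`, and the rule "MI coded on at least two of the patient's encounters so far" are assumptions for this example, not the AIW's documented definitions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Set


@dataclass
class Encounter:
    patient_id: str
    admit: date
    discharge: date
    diagnoses: Set[str] = field(default_factory=set)  # grouped categories, e.g. "MI"


def _by_patient(encounters: List[Encounter]) -> Dict[str, List[int]]:
    """Indices of each patient's encounters, sorted by admission date."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, enc in enumerate(encounters):
        groups[enc.patient_id].append(i)
    for idxs in groups.values():
        idxs.sort(key=lambda i: encounters[i].admit)
    return groups


def flag_30_day_readmits(encounters: List[Encounter]) -> List[bool]:
    """True if the patient's next admission starts within 30 days of this discharge."""
    flags = [False] * len(encounters)
    for idxs in _by_patient(encounters).values():
        for cur, nxt in zip(idxs, idxs[1:]):
            gap = (encounters[nxt].admit - encounters[cur].discharge).days
            flags[cur] = 0 <= gap <= 30
    return flags


def flag_multiple_mi(encounters: List[Encounter]) -> List[bool]:
    """True once MI has been coded on at least two of the patient's encounters so far."""
    flags = [False] * len(encounters)
    for idxs in _by_patient(encounters).values():
        mi_count = 0
        for i in idxs:
            if "MI" in encounters[i].diagnoses:
                mi_count += 1
            flags[i] = mi_count >= 2
    return flags
```

Because both flags only look at a patient's own encounter history, they can be computed in one pass per patient after grouping and sorting.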
Emory Enhanced Risk Assessment Tool (ERAT) Diagnoses
• Diabetes
• Heart Failure
• Chronic Kidney Disease
• Chronic Obstructive Pulmonary Disease
• Acute Myocardial Infarction
• Stroke
• History of Transplant
• Cancer
• Pulmonary Hypertension
Identifying Variables Associated with 30-day Readmits
• No variables in the CDW are broadly associated with
  (or predictive of) readmits across the entire Emory
  Healthcare (EHC) population
• Need to drill down into subpopulations to identify
  variables that are associated with readmits
• Ultimately, we may be able to derive subpopulation-specific
  predictive models of readmissions
3-year+ subset (2008-3/2011)




         Analytic Information Warehouse
Association of CKD with 30-day Readmissions
Overall Emory Readmission Rate = 15%

                                  CKD?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission               19386    7017        26403
No 30 Day Readmission           110058   23460       133518
Grand Total                     129444   30477       159921
Readmission Rate (CKD) = 23%

                                  ESRD?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission               23091    3312        26403
No 30 Day Readmission           124518    9000       133518
Grand Total                     147609   12312       159921
Readmission Rate (ESRD) = 27%
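The readmission rate quoted with each table is the TRUE column's readmit count divided by that column's total (for CKD, 7017 / 30477). A small sketch of that arithmetic, adding a relative-risk ratio (rate with the factor divided by rate without it) as one common summary; the `TwoByTwo` type and function names are hypothetical.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    """Counts laid out as in the tables (columns: factor FALSE / TRUE)."""
    readmit_without: int      # 30 Day Readmission, factor FALSE
    readmit_with: int         # 30 Day Readmission, factor TRUE
    no_readmit_without: int   # No 30 Day Readmission, factor FALSE
    no_readmit_with: int      # No 30 Day Readmission, factor TRUE


def readmission_rate(t: TwoByTwo, factor_present: bool = True) -> float:
    """Share of encounters in one column that were followed by a 30-day readmit."""
    if factor_present:
        return t.readmit_with / (t.readmit_with + t.no_readmit_with)
    return t.readmit_without / (t.readmit_without + t.no_readmit_without)


def relative_risk(t: TwoByTwo) -> float:
    """Readmission rate with the factor divided by the rate without it."""
    return readmission_rate(t, True) / readmission_rate(t, False)


ckd = TwoByTwo(19386, 7017, 110058, 23460)
esrd = TwoByTwo(23091, 3312, 124518, 9000)
print(f"CKD:  rate {readmission_rate(ckd):.1%}, RR {relative_risk(ckd):.2f}")
print(f"ESRD: rate {readmission_rate(esrd):.1%}, RR {relative_risk(esrd):.2f}")
```

The same two functions apply unchanged to every FALSE/TRUE table in this deck.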
Association of Multiple MI with 30-day Readmissions

                                  Multiple MI?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission                 685     167          852
No 30 Day Readmission             5772     209         5981
Grand Total                       6457     376         6833
Readmission Rate (Multiple MI) = 44%
Uncontrolled Diabetes (total n=8696, readmit n=1844, Readmit Rate = 21%)

Has Pressure Ulcer
                                  Pressure ulcer?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission                 387     128          515
No 30 Day Readmission             1053     260         1313
Grand Total                       1440     388         1828
Readmission Rate (pressure ulcer) = 33%

Has ESRD
                                  ESRD?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission                1200     327         1527
No 30 Day Readmission             3491     712         4203
Grand Total                       4691    1039         5730
Readmission Rate (ESRD) = 31%
Sickle Cell Anemia and 30-day Readmits

Sickle Cell Anemia
                                  Sickle Cell Anemia?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission               25905     498        26403
No 30 Day Readmission           132550     968       133518
Grand Total                     158455    1466       159921
Readmission Rate (sickle cell anemia) = 34%

Sickle Cell Crisis
                                  SS Crisis?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission               25972     431        26403
No 30 Day Readmission           132759     759       133518
Grand Total                     158731    1190       159921
Readmission Rate (sickle cell crisis) = 36%
Association of MRSA with 30-day Readmissions

Overall
                                  MRSA?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission               25982     421        26403
No 30 Day Readmission           132362    1156       133518
Grand Total                     158344    1577       159921
Readmission Rate (MRSA) = 27%

Stroke
                                  MRSA?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission                1203      16         1219
No 30 Day Readmission             3996      26         4022
Grand Total                       5199      42         5241
Readmission Rate (MRSA) = 38%

MI
                                  MRSA?
Subsequent 30-day readmit?       FALSE    TRUE  Grand Total
30 Day Readmission                 836      16          852
No 30 Day Readmission             5942      39         5981
Grand Total                       6778      55         6833
Readmission Rate (MRSA) = 29%
Use of Temporal Variables in Creating Useful Subsets of Data (5-year dataset)

Patient Population                   Number of    Number of      Readmission
                                     Encounters   Readmissions   Rate
Overall Emory                           232645        34270         15%
Single MI                                17992         2804         16%
Multiple MI                               1355          492         36%
CKD                                      45664        10818         24%
>=4 readmissions                         17550         9459         54%
Multiple MI and >=4 readmissions           900          465         52%
CKD and >=4 readmissions                  6997         3606         52%
Predictive Modeling for Readmission

• Classify inpatient encounters into high, medium,
  low risk groups of 30-day readmission based on
  patients’ characteristics
• Data preprocessing and mapping of codes
• Predictive modeling
  – Random forests (ensemble of decision trees)
  – Ranking of predictions from high to low risk
• Emory-specific data sets
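One way to turn ranked predictions into high, medium and low risk groups is to cut the ranking at fixed fractions. The 10% / 20% cut points below are assumptions chosen for illustration; the slides do not state how the groups were defined.

```python
from typing import List, Sequence


def assign_risk_tiers(scores: Sequence[float], high_frac: float = 0.10,
                      medium_frac: float = 0.20) -> List[str]:
    """Label the top `high_frac` of scores 'high', the next `medium_frac` 'medium', the rest 'low'."""
    n = len(scores)
    n_high = round(n * high_frac)
    n_medium = round(n * medium_frac)
    # Rank encounter indices from highest to lowest predicted risk.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    tiers = ["low"] * n
    for rank, i in enumerate(order):
        if rank < n_high:
            tiers[i] = "high"
        elif rank < n_high + n_medium:
            tiers[i] = "medium"
    return tiers
```

Cutting by rank rather than by a fixed probability threshold keeps the size of each group predictable, which matters when the high-risk group is meant to drive a limited intervention budget.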
Random Forests

• Random forests: an ensemble of tree predictors
• Each tree is grown on a bootstrap sample of the training data, and each
  split considers only a random subset of the variables in the dataset
• A large number of trees are generated
• All trees vote, and the majority class is assigned to a test example
• Reference: Leo Breiman, “Random Forests,” Machine Learning, 45, 5-32, 2001
Random Forests (cont.)

• Generalization error depends on the strength of the
  individual trees and the correlation between them
• Its accuracy is comparable to that of AdaBoost (another
  robust algorithm)
• It is relatively robust to noise and outliers
• It gives useful internal (out-of-bag) estimates of error,
  strength, correlation, and variable importance
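To make the voting mechanism concrete, here is a deliberately small, pure-Python random forest sketch: each tree is grown on a bootstrap sample, each split searches only a random subset of features using Gini impurity, and the forest predicts by majority vote. It omits out-of-bag estimates and depth-unlimited trees, and it is not the implementation used in this work; in practice a library such as scikit-learn's `RandomForestClassifier` would be the usual choice. Labels are assumed to be 0/1 (1 = 30-day readmit).

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

# A node is either a leaf label (int) or a tuple (feature, threshold, left, right).
Node = Union[int, tuple]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features) -> Optional[Tuple[int, float]]:
    """Search only the given features for the split with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth: int, n_features: int, rng: random.Random) -> Node:
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    # Random feature subset, drawn fresh at every split.
    features = rng.sample(range(len(X[0])), min(n_features, len(X[0])))
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, n_features, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, n_features, rng))


def predict_tree(node: Node, row) -> int:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    def __init__(self, n_trees: int = 25, max_depth: int = 3, n_features: int = 1, seed: int = 0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.n_features, self.seed = n_features, seed
        self.trees: List[Node] = []

    def fit(self, X, y) -> "RandomForest":
        rng = random.Random(self.seed)
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, self.n_features, rng))
        return self

    def predict_proba(self, row) -> float:
        """Fraction of trees voting for class 1 (e.g. '30-day readmit')."""
        return sum(predict_tree(t, row) for t in self.trees) / len(self.trees)

    def predict(self, row) -> int:
        return int(self.predict_proba(row) >= 0.5)
```

The vote fraction from `predict_proba` is a natural score to rank encounters from high to low risk.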
Variables used in Predictive Modeling

• Age, gender, race
• Census tract data: population, population by race,
  average household income, persons per household
• Primary and secondary diagnosis codes grouped
  using ontologies
• Lab procedure codes grouped using ontologies
• Vitals such as heart rate, blood pressure, temperature,
  respiratory rate, BMI
• Medications
• Derived variables (next slide)
Derived Variables

• Disease flags
   – CKD, MI, HF, COPD, Diabetes, etc.
• Medication flags
   – Diabetes medication count, ACE inhibitor, beta
     blocker, diuretic, inotropic agent, etc.
• Treatment flags
   – Radiotherapy, chemotherapy
• Patient history
   – Prior encounter within 90 days, prior encounter within 180 days
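A sketch of how a few of these flags might be computed for a single encounter. The medication name sets are hypothetical, illustrative examples (not the project's medication ontology), and "prior encounter within N days" is interpreted here as an earlier admission within N days of the current admission.

```python
from datetime import date
from typing import Dict, Iterable, Union

# Hypothetical, illustrative medication lists (not a real formulary or ontology).
DIABETES_MEDS = {"metformin", "glipizide", "insulin glargine"}
ACE_INHIBITORS = {"lisinopril", "enalapril", "ramipril"}


def derived_flags(
    admit: date,
    prior_admits: Iterable[date],
    med_orders: Iterable[str],
    diagnosis_categories: Iterable[str],
) -> Dict[str, Union[bool, int]]:
    """Compute a few example derived variables for one encounter."""
    days_since = [(admit - d).days for d in prior_admits]
    meds = {m.lower() for m in med_orders}
    cats = set(diagnosis_categories)
    return {
        # Disease flag from grouped diagnosis categories
        "ckd": "CKD" in cats,
        # Medication flags: count of distinct diabetes drugs, any ACE inhibitor
        "diabetes_med_count": len(meds & DIABETES_MEDS),
        "ace_inhibitor": bool(meds & ACE_INHIBITORS),
        # Patient history: an earlier admission within 90 / 180 days
        "prior_encounter_90d": any(0 < d <= 90 for d in days_since),
        "prior_encounter_180d": any(0 < d <= 180 for d in days_since),
    }
```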
BMI Using WHO Simple Classification (1-year subset 4/2010-3/2011)

Figure: Percent BMI category for CKD patients with multiple readmits (n=386) and for CKD female patients with multiple readmits (n=197); chart annotation RR=1.2.
“30 Day Readmission” represents encounters that were followed by a 30 day readmit
“No 30 Day Readmission” represents other encounters that were not followed by a 30 day readmit
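The WHO simple classification computes BMI as weight (kg) divided by height (m) squared, with adult cut-offs at 18.5, 25 and 30. A minimal sketch of the categorization and of the percent-per-category summary plotted on this slide; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def who_bmi_category(value: float) -> str:
    """WHO principal cut-offs for adults."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal range"
    if value < 30.0:
        return "Overweight"
    return "Obese"


def percent_by_category(bmis: Iterable[float]) -> Dict[str, float]:
    """Percent of patients falling in each WHO BMI category."""
    counts = Counter(who_bmi_category(b) for b in bmis)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}
```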
Predictive Modeling Results with Temporal Variable Constrained
Dataset: MI data (Emory)
All MI data and Multiple MI data

Data                                   Predicted   # of         # of           30-day
                                       Risk        Encounters   Readmissions   Readmission Rate
All MI data                            High               968            360   37%
Multiple MI                            High                68             35   51%
All MI data (no predictive modeling)                     9674           1648   17%
Multiple MI (no predictive modeling)                      376            167   44%
Predictive Modeling Results with Temporal Variable Constrained
Dataset: CKD data (Emory)
All CKD data and End Stage Renal CKD

Data                                   Predicted   # of         # of           Readmission
                                       Risk        Encounters   Readmissions   Rate
CKD                                    High              2284            950   42%
End Stage Renal                        High               952            444   47%
All CKD (no predictive modeling)                        45664          10818   24%
End Stage Renal (no predictive modeling)                12312           3312   27%
UHC Data Analyses

• Much larger dataset
• Much less detailed information about each patient
• UHC has only the coded data sent by institutions, so
  co-morbidity-related ICD-9 codes may be missing
• Analyses across patient encounters can pick up
  chronic co-morbidities that might not be coded in a
  particular encounter
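A sketch of the cross-encounter idea: once a chronic condition has been coded for a patient, later encounters that omit it are flagged as missing that code. The tuple layout `(patient_id, sequence_number, categories)` and the category strings are assumptions; this is one plausible reading of the "missing codes in future encounters" counts in the following table, not the documented UHC procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

EncounterRow = Tuple[str, int, Set[str]]  # (patient_id, sequence_number, categories)


def encounters_missing_chronic_code(
    encounters: Iterable[EncounterRow], chronic_category: str
) -> List[Tuple[str, int]]:
    """Return (patient_id, seq) for encounters after the first one coding
    `chronic_category` for that patient that do not themselves code it."""
    by_patient: Dict[str, List[Tuple[int, Set[str]]]] = defaultdict(list)
    for pid, seq, cats in encounters:
        by_patient[pid].append((seq, cats))
    missing: List[Tuple[str, int]] = []
    for pid, encs in by_patient.items():
        encs.sort(key=lambda e: e[0])  # chronological order
        seen = False
        for seq, cats in encs:
            if chronic_category in cats:
                seen = True
            elif seen:
                missing.append((pid, seq))
    return missing
```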
Missing Diagnosis Codes in UHC Dataset (10/1/2006 - 4/30/2011)

Disease         Patients with missing    Total number   Encounters with   Total number
                codes in future          of patients    missing codes     of encounters
                encounters
Diabetes        144806 (8.01%)               1807322    311403 (9.4%)           3300804
Heart Failure   197043 (20.1%)                976041    366926 (20.7%)          1765203
MI              171213 (21.8%)                784559    301673 (25.8%)          1168056
Sickle Cell       2870 (10.5%)                 27210     11162 (9.9%)            112268
UHC
Use of Temporal Variables in Subsetting Data

Patient Population   # Total Encounters   # Readmitted Patients   Proportion of Patients Readmitted
MI                               310954                   47210   15.2%
Multiple MI                       73227                   29017   39.6%

Non-ESRD                       13023536                 1735308   13.3%
ESRD                             510702                  142622   27.9%
CKD                             1334617                  316399   23.7%
UHC
Use of Temporal Variables in Subsetting Data

Patient Population               # Total Patients   # Readmitted Patients   Proportion of Patients Readmitted
Diabetes                                  2465049                  465526   18.8%
Uncontrolled Diabetes                      388417                   78005   20.0%
ESRD                                       510702                  142622   27.9%
Uncontrolled Diabetes and ESRD              48583                   14224   29.3%
Readmission Hot Spots
Figure: UHC “Readmission Hot Spots”, a chart of Encounters and Patients (y-axis from 0 to 1,000,000) across categories 1 to 8 on the x-axis.
Conclusion

• Integrative dataset analysis can leverage patient
  information gathered over many encounters
• Temporal analyses can generate derived variables
  that appear to correlate with readmissions
• Hot spots appear to be an important phenomenon
  and have the potential to lead to patient-level
  interventions
• Predictive modeling shows promise for providing
  decision support
• Future analysis will look at temporal patterns of
  encounters and the relationship between length of
  stay (LOS) and readmission

Weitere ähnliche Inhalte

Mehr von Joel Saltz

AI and whole slide imaging biomarkers
AI and whole slide imaging biomarkersAI and whole slide imaging biomarkers
AI and whole slide imaging biomarkersJoel Saltz
 
Pathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer SurveillancePathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer SurveillanceJoel Saltz
 
Learning, Training,  Classification,  Common Sense and Exascale Computing
Learning, Training,  Classification,  Common Sense and Exascale ComputingLearning, Training,  Classification,  Common Sense and Exascale Computing
Learning, Training,  Classification,  Common Sense and Exascale ComputingJoel Saltz
 
Integrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming DataIntegrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming DataJoel Saltz
 
Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...
Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...
Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...Joel Saltz
 
Extreme Computing, Clinical Medicine and GPUs or Can GPUs Cure Cancer
Extreme Computing, Clinical Medicine and GPUs or Can GPUs Cure CancerExtreme Computing, Clinical Medicine and GPUs or Can GPUs Cure Cancer
Extreme Computing, Clinical Medicine and GPUs or Can GPUs Cure CancerJoel Saltz
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeJoel Saltz
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeJoel Saltz
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineJoel Saltz
 
Machine Learning and Deep Contemplation of Data
Machine Learning and Deep Contemplation of DataMachine Learning and Deep Contemplation of Data
Machine Learning and Deep Contemplation of DataJoel Saltz
 
Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...
Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...
Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...Joel Saltz
 
Tools to Analyze Morphology and Spatially Mapped Molecular Data - Informatio...
Tools to Analyze Morphology and Spatially Mapped Molecular Data -  Informatio...Tools to Analyze Morphology and Spatially Mapped Molecular Data -  Informatio...
Tools to Analyze Morphology and Spatially Mapped Molecular Data - Informatio...Joel Saltz
 
Generation and Use of Quantitative Pathology Phenotype
Generation and Use of Quantitative Pathology PhenotypeGeneration and Use of Quantitative Pathology Phenotype
Generation and Use of Quantitative Pathology PhenotypeJoel Saltz
 
Big Data and Extreme Scale Computing
Big Data and Extreme Scale Computing Big Data and Extreme Scale Computing
Big Data and Extreme Scale Computing Joel Saltz
 
Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...
Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...
Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...Joel Saltz
 
Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Joel Saltz
 
Exascale Computing and Experimental Sensor Data
Exascale Computing and Experimental Sensor DataExascale Computing and Experimental Sensor Data
Exascale Computing and Experimental Sensor DataJoel Saltz
 
High Dimensional Fused-Informatics
High Dimensional Fused-InformaticsHigh Dimensional Fused-Informatics
High Dimensional Fused-InformaticsJoel Saltz
 
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Joel Saltz
 
Data Science, Big Data and You
Data Science, Big Data and YouData Science, Big Data and You
Data Science, Big Data and YouJoel Saltz
 

Mehr von Joel Saltz (20)

AI and whole slide imaging biomarkers
AI and whole slide imaging biomarkersAI and whole slide imaging biomarkers
AI and whole slide imaging biomarkers
 
Pathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer SurveillancePathomics, Clinical Studies, and Cancer Surveillance
Pathomics, Clinical Studies, and Cancer Surveillance
 
Learning, Training,  Classification,  Common Sense and Exascale Computing
Learning, Training,  Classification,  Common Sense and Exascale ComputingLearning, Training,  Classification,  Common Sense and Exascale Computing
Learning, Training,  Classification,  Common Sense and Exascale Computing
 
Integrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming DataIntegrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming Data
 
Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...
Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...
Digital Pathology: Precision Medicine, Deep Learning and Computer Aided Inter...
 
Extreme Computing, Clinical Medicine and GPUs or Can GPUs Cure Cancer
Extreme Computing, Clinical Medicine and GPUs or Can GPUs Cure CancerExtreme Computing, Clinical Medicine and GPUs or Can GPUs Cure Cancer
Extreme Computing, Clinical Medicine and GPUs or Can GPUs Cure Cancer
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase Change
 
Twenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase ChangeTwenty Years of Whole Slide Imaging - the Coming Phase Change
Twenty Years of Whole Slide Imaging - the Coming Phase Change
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision Medicine
 
Machine Learning and Deep Contemplation of Data
Machine Learning and Deep Contemplation of DataMachine Learning and Deep Contemplation of Data
Machine Learning and Deep Contemplation of Data
 
Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...
Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...
Integrative Multi-Scale Analysis in Biomedical Data Science: Tools, Methods a...
 
Tools to Analyze Morphology and Spatially Mapped Molecular Data - Informatio...
Tools to Analyze Morphology and Spatially Mapped Molecular Data -  Informatio...Tools to Analyze Morphology and Spatially Mapped Molecular Data -  Informatio...
Tools to Analyze Morphology and Spatially Mapped Molecular Data - Informatio...
 
Generation and Use of Quantitative Pathology Phenotype
Generation and Use of Quantitative Pathology PhenotypeGeneration and Use of Quantitative Pathology Phenotype
Generation and Use of Quantitative Pathology Phenotype
 
Big Data and Extreme Scale Computing
Big Data and Extreme Scale Computing Big Data and Extreme Scale Computing
Big Data and Extreme Scale Computing
 
Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...
Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...
Spatio-­‐temporal Sensor Integration, Analysis, Classification or Can Exascal...
 
Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014Computational Pathology Workshop July 8 2014
Computational Pathology Workshop July 8 2014
 
Exascale Computing and Experimental Sensor Data
Exascale Computing and Experimental Sensor DataExascale Computing and Experimental Sensor Data
Exascale Computing and Experimental Sensor Data
 
High Dimensional Fused-Informatics
High Dimensional Fused-InformaticsHigh Dimensional Fused-Informatics
High Dimensional Fused-Informatics
 
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
Exascale Challenges: Space, Time, Experimental Science and Self Driving Cars
 
Data Science, Big Data and You
Data Science, Big Data and YouData Science, Big Data and You
Data Science, Big Data and You
 

Data Analytics to Predict Hospital Readmissions

  • 1. Data Analytics for Readmission: Temporal features, predictive modeling Joel Saltz, Andrew Post, Doris Gao, Sharath Cholleti, Mark Grand: Emory David Levine, Sam Hohmann: UHC
  • 2. Analytic Information Warehouse Project: Tools and Analytics to Answer Questions such as: • What fraction of patients with a given category of principal diagnosis will be readmitted within 30 days? • What fraction of patients with a given set of diseases will be readmitted within 30 days? • How does severity and time course of co-morbidities affect readmissions? • How can we best use history of prior hospitalizations to predict readmissions? • What are the medical and socio-economic characteristics of frequently readmitted patients? • Can we translate insight derived from our patient population into rules that can be used to manage patients?
  • 3. Emory Clinical Data Warehouse • EUH, EUHM and WW (inpatient encounters) • Excludes Psych and Rehab encounters • Encounter location (entity, pavilion, unit) • Providers • Discharge disposition • Primary and secondary ICD9 codes • Procedure codes • DRGs • Medication orders • Labs • Vitals • Insurance status • Geographic information
  • 4. Identifying Variables Associated with 30-day Readmits • Problem: “Raw” variables in the CDW are difficult to use for prediction – Too many diagnosis codes, procedure codes – Continuous variables (e.g., labs) require interpretation – Temporal relationships between variables are implicit • Solution: Transform the data into a much smaller set of variables using heuristic knowledge – Categorize diagnosis and procedure codes using code hierarchies – Classify continuous variables using standard interpretations (e.g., high, normal, low) – Identify temporal patterns (e.g., frequency, duration, sequence) – Apply standard data mining techniques
  • 5. Clinical Data Warehouse/Analytic Information Warehouse (AIW) Cloned periodically Clinical Analytic Data Warehouse Information Derived information Warehouse returned The CDW/AIW Relationship • CDW as source of clinical and administrative data – cloned periodically (e.g., monthly) • AIW as incubator of algorithms that generate derived information
  • 6. AIW Workflow Cloned periodically Periodic data extraction Analytic Data subset, Multiple Databases Information mapped to a Warehouse standard model Calculation of Make derived analyses variables available (transform) in existing tools Augmented data set Load into multiple output forms
  • 8. Derived Variables • 30-day readmit • The 9 Emory Enhanced Risk Assessment Tool diagnosis categories • UHC product lines • “Disease indicators” (combinations of diagnosis codes, procedure codes, labs and/or med orders that indicate a condition) – Obesity – Uncontrolled diabetes – End-stage renal disease (ESRD) – Pressure ulcer – Sickle cell disease • Temporal variables derived over multiple encounters – Multiple MI – Multiple past 30-day readmissions – Sickle cell disease – Diabetes/uncontrolled diabetes – CKD/ESRD
  • 9. Emory Enhanced Risk Assessment Tool (ERAT) Diagnoses • Diabetes • Heart Failure • Chronic Kidney Disease • Chronic Obstructive Pulmonary Disease • Acute Myocardial Infarction • Stroke • History of Transplant • Cancer • Pulmonary Hypertension
  • 10. Identifying Variables Associated with 30-day Readmits • No variables in the CDW are broadly associated with (or predictive of) readmits across the entire EHC population • Need to drill-down into subpopulations to identify variables that are associated with readmits • Ultimately, may be able to derive subpopulation- specific predictive models of readmissions
  • 11. 3-year+ subset (2008-3/2011) Analytic Information Warehouse
  • 12. Association of CKD with 30-day Readmissions Overall Emory Readmission Rate = 15% CKD? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 19386 7017 26403 Readmission Rate = 21% No 30 Day Readmission 110058 23460 133518 Grand Total 129444 30477 159921 ESRD? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 23091 3312 26403 Readmission Rate =27% No 30 Day Readmission 124518 9000 133518 Grand Total 147609 12312 159921 Analytic Information Warehouse
  • 13. Association of Multiple MI with 30-day Readmissions Multiple MI? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 685 167 852 No 30 Day Readmission 5772 209 5981 Grand Total 6457 376 6833 Readmission Rate = 44%
  • 14. Uncontrolled Diabetes (total n=8696, readmit n=1844, Readmit Rate = 21%) Has Pressure Ulcer Pressure ulcer? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 387 128 515 Readmission No 30 Day Readmission 1053 260 1313 Rate = 33% Grand Total 1440 388 1828 Has ESRD ESRD? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 1200 327 1527 Readmission No 30 Day Readmission 3491 712 4203 Rate = 32% Grand Total 4691 1039 5730
  • 15. Sickle Cell Anemia and 30-day Readmits Sickle Cell Anemia Sickle Cell Anemia? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 25905 498 26403 Readmission No 30 Day Readmission 132550 968 133518 Rate = 34% Grand Total 158455 1466 159921 Sickle Cell Crisis SS Crisis? Subsequent 30-day readmit? FALSE TRUE Grand Total 30 Day Readmission 25972 431 26403 Readmission Rate = 36% No 30 Day Readmission 132759 759 133518 Grand Total 158731 1190 159921
  • 16. Association of MRSA with 30-day Readmissions
    Overall (readmission rate = 27%):
    | Subsequent 30-day readmit? | MRSA? FALSE | MRSA? TRUE | Grand Total |
    |---|---|---|---|
    | 30 Day Readmission | 25982 | 421 | 26403 |
    | No 30 Day Readmission | 132362 | 1156 | 133518 |
    | Grand Total | 158344 | 1577 | 159921 |
    Stroke (readmission rate = 38%):
    | Subsequent 30-day readmit? | MRSA? FALSE | MRSA? TRUE | Grand Total |
    |---|---|---|---|
    | 30 Day Readmission | 1203 | 16 | 1219 |
    | No 30 Day Readmission | 3996 | 26 | 4022 |
    | Grand Total | 5199 | 42 | 5241 |
    MI (readmission rate = 29%):
    | Subsequent 30-day readmit? | MRSA? FALSE | MRSA? TRUE | Grand Total |
    |---|---|---|---|
    | 30 Day Readmission | 836 | 16 | 852 |
    | No 30 Day Readmission | 5942 | 39 | 5981 |
    | Grand Total | 6778 | 55 | 6833 |
  • 17. Use of Temporal Variables in Creating Useful Subsets of Data (5-year dataset)
    | Patient Population | Number of Encounters | Number of Readmissions | Readmission Rate |
    |---|---|---|---|
    | Overall Emory | 232645 | 34270 | 15% |
    | Single MI | 17992 | 2804 | 16% |
    | Multiple MI | 1355 | 492 | 36% |
    | CKD | 45664 | 10818 | 24% |
    | >=4 readmissions | 17550 | 9459 | 54% |
    | Multiple MI and >=4 readmissions | 900 | 465 | 52% |
    | CKD and >=4 readmissions | 6997 | 3606 | 52% |
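The slide does not spell out how the temporal variables are computed. The Python sketch below shows one plausible way under stated assumptions: an encounter is labeled a 30-day readmit if the patient's next admission begins within 30 days of its discharge; "multiple MI" means MI has been coded on at least two of the patient's encounters up to and including the current one; and the prior-readmission count tallies earlier encounters that were themselves readmissions. All names and definitions here are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class Encounter:
    patient_id: str
    admit: date
    discharge: date
    diagnoses: frozenset       # grouped diagnosis categories, e.g. frozenset({"MI"})
    readmit_30d: bool = False  # derived: followed by a readmission within the window
    multiple_mi: bool = False  # derived: MI coded on >= 2 encounters so far
    prior_readmits: int = 0    # derived: earlier encounters that were readmissions


def derive_temporal_features(encounters, window_days=30):
    """Fill in the derived fields per patient (assumed definitions; see lead-in)."""
    ordered = sorted(encounters, key=lambda e: (e.patient_id, e.admit))
    for _, group in groupby(ordered, key=lambda e: e.patient_id):
        history = list(group)
        mi_count = 0
        readmits_so_far = 0
        for i, enc in enumerate(history):
            if "MI" in enc.diagnoses:
                mi_count += 1
            enc.multiple_mi = mi_count >= 2
            enc.prior_readmits = readmits_so_far
            # Was this encounter itself a readmission after the previous discharge?
            if i > 0 and 0 <= (enc.admit - history[i - 1].discharge).days <= window_days:
                readmits_so_far += 1
            # Is this encounter followed by a readmission?
            if i + 1 < len(history):
                gap = (history[i + 1].admit - enc.discharge).days
                enc.readmit_30d = 0 <= gap <= window_days
    return ordered


def subset_counts(encounters, predicate):
    """Return (number of encounters, number followed by a 30-day readmit) for a subset."""
    subset = [e for e in encounters if predicate(e)]
    return len(subset), sum(e.readmit_30d for e in subset)


# Toy history for two hypothetical patients.
encounters = [
    Encounter("p1", date(2010, 1, 1), date(2010, 1, 5), frozenset({"MI"})),
    Encounter("p1", date(2010, 1, 20), date(2010, 1, 25), frozenset({"MI"})),
    Encounter("p1", date(2010, 6, 1), date(2010, 6, 3), frozenset()),
    Encounter("p2", date(2010, 2, 1), date(2010, 2, 4), frozenset({"CKD"})),
]
labeled = derive_temporal_features(encounters)
print(subset_counts(labeled, lambda e: e.multiple_mi))  # (2, 0)
print(subset_counts(labeled, lambda e: True))           # (4, 1)
```

Because the flags depend only on each patient's own ordered history, subsets such as "Multiple MI and >=4 readmissions" reduce to combining predicates on the derived fields.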
  • 18. Predictive Modeling for Readmission
    • Classify inpatient encounters into high-, medium-, and low-risk groups for 30-day readmission based on patients' characteristics
    • Data preprocessing and mapping of codes
    • Predictive modeling
      – Random forests (ensemble of decision trees)
      – Ranking of the predictions from high to low risk
    • Emory-specific datasets
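The ranking step can be sketched as sorting encounters by predicted readmission probability and cutting the ranked list into groups. The group sizes in this sketch (top 10% high, next 30% medium) are hypothetical; the slides do not state the cut-offs actually used.

```python
def assign_risk_groups(scores, high_frac=0.10, medium_frac=0.30):
    """Rank encounters by predicted probability of a 30-day readmit and bin the ranks.

    scores: dict mapping encounter id -> predicted probability.
    high_frac, medium_frac: hypothetical fractions of encounters per group.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest risk first
    n_high = round(len(ranked) * high_frac)
    n_medium = round(len(ranked) * medium_frac)
    groups = {}
    for rank, enc_id in enumerate(ranked):
        if rank < n_high:
            groups[enc_id] = "high"
        elif rank < n_high + n_medium:
            groups[enc_id] = "medium"
        else:
            groups[enc_id] = "low"
    return groups


scores = {f"e{i}": i / 10 for i in range(10)}  # toy predictions 0.0 .. 0.9
groups = assign_risk_groups(scores)
print(groups["e9"], groups["e8"], groups["e0"])  # high medium low
```

Ranking rather than thresholding the probabilities keeps the size of the high-risk list fixed, which is convenient when the list drives a limited intervention budget.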
  • 19. Random Forests
    • Random forests: an ensemble of tree predictors
    • Each tree is grown on a bootstrap sample of the training data, and each split considers only a random subset of the variables
    • A large number of trees are generated
    • All of the trees vote to classify a test example
    • Reference: Leo Breiman, "Random Forests," Machine Learning, 45, 5-32, 2001
  • 20. Random Forests (cont.)
    • Generalization error depends on the strength of the individual trees and the correlation between them
    • Accuracy is as good as that of AdaBoost (another robust algorithm)
    • Relatively robust to noise and outliers
    • Gives useful internal estimates of error, correlation, strength, and variable importance
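To make the voting and the internal error estimate concrete, here is a deliberately small, standard-library-only random forest: each tree is grown on a bootstrap sample, each split considers a random subset of features, the trees vote by majority, and the out-of-bag (OOB) error serves as the internal error estimate. It is a teaching sketch under simplifying assumptions (0/1 labels, numeric features, depth-limited trees, Gini impurity), not the configuration used for the Emory models; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be used.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y, features):
    """Best (score, feature, threshold, left_idx, right_idx) by weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    return best

def build_tree(X, y, rng, max_features, depth=0, max_depth=4):
    """Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    n_features = len(X[0])
    # Each split looks at a random subset of the features ...
    split = best_split(X, y, rng.sample(range(n_features), max_features))
    if split is None:  # ... and falls back to all features if none of those can split
        split = best_split(X, y, range(n_features))
    if split is None:
        return majority
    _, f, t, left, right = split
    left_tree, right_tree = (
        build_tree([X[i] for i in idx], [y[i] for i in idx],
                   rng, max_features, depth + 1, max_depth)
        for idx in (left, right))
    return (f, t, left_tree, right_tree)

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, max_features=None, seed=0):
    """Grow each tree on a bootstrap sample; remember which rows it saw (for OOB error)."""
    rng = random.Random(seed)
    n = len(X)
    max_features = max_features or max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features)
        forest.append((tree, set(idx)))
    return forest

def predict_forest(forest, row):
    """Every tree votes; the majority class wins."""
    return Counter(predict_tree(t, row) for t, _ in forest).most_common(1)[0][0]

def oob_error(forest, X, y):
    """Internal error estimate: each row is voted on only by trees that did not see it."""
    errors = counted = 0
    for i, (row, label) in enumerate(zip(X, y)):
        votes = Counter(predict_tree(t, row) for t, seen in forest if i not in seen)
        if votes:
            counted += 1
            errors += votes.most_common(1)[0][0] != label
    return errors / counted if counted else float("nan")

# Toy data: feature 0 determines the label, feature 1 is noise.
X = [[i, i % 3] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
forest = fit_forest(X, y, n_trees=50, seed=42)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [18, 0]))
print(f"OOB error estimate: {oob_error(forest, X, y):.2f}")
```

The OOB estimate comes for free because roughly a third of the rows are left out of each bootstrap sample, so no separate validation split is needed for a first error estimate.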
  • 21. Variables Used in Predictive Modeling
    • Age, gender, race
    • Census tract data: population, population by race, average household income, persons per household
    • Primary and secondary diagnosis codes, grouped using ontologies
    • Lab procedure codes, grouped using ontologies
    • Vitals such as heart rate, blood pressure, temperature, respiratory rate, BMI
    • Medications
    • Derived variables (next slide)
  • 22. Derived Variables
    • Disease flags
      – CKD, MI, HF, COPD, diabetes, etc.
    • Medication flags
      – Diabetes medication count, ACE inhibitor, beta blocker, diuretic, inotropic agent, etc.
    • Treatment flags
      – Radiotherapy, chemotherapy
    • Patient history
      – Encounter 90 days earlier, encounter 180 days earlier
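One way to compute flags like these for a single encounter is sketched below. The code groupings, the medication names, and the reading of "encounter 90/180 days earlier" as "any prior admission within the preceding 90/180 days" are all assumptions for illustration; the slide does not give the actual mappings.

```python
from datetime import date

# Hypothetical groupings; the real ontology mappings are not given on the slide.
DISEASE_GROUPS = {"CKD", "MI", "HF", "COPD", "DIABETES"}
DIABETES_MEDS = {"metformin", "insulin glargine", "glipizide"}
FLAGGED_MED_CLASSES = {"ace_inhibitor", "beta_blocker", "diuretic", "inotropic_agent"}


def derived_variables(admit, diagnoses, meds, med_classes, prior_admits):
    """Build derived flags for one encounter.

    admit: admission date of the current encounter
    diagnoses: grouped diagnosis categories coded for the encounter
    meds: medication names ordered during the encounter
    med_classes: drug classes of those medications
    prior_admits: admission dates of the patient's earlier encounters
    """
    features = {f"dx_{d.lower()}": d in diagnoses for d in sorted(DISEASE_GROUPS)}
    features["diabetes_med_count"] = len(DIABETES_MEDS & set(meds))
    for cls in sorted(FLAGGED_MED_CLASSES):
        features[f"med_{cls}"] = cls in med_classes
    # Days between each earlier admission and the current one (assumed lookback rule).
    gaps = [(admit - d).days for d in prior_admits if d < admit]
    features["encounter_prev_90d"] = any(g <= 90 for g in gaps)
    features["encounter_prev_180d"] = any(g <= 180 for g in gaps)
    return features


example = derived_variables(
    admit=date(2011, 3, 1),
    diagnoses={"CKD", "DIABETES"},
    meds=["metformin", "insulin glargine", "lisinopril"],
    med_classes={"ace_inhibitor"},
    prior_admits=[date(2010, 11, 15)],  # 106 days before admission
)
print(example["diabetes_med_count"], example["encounter_prev_90d"], example["encounter_prev_180d"])
```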
  • 23. BMI Using WHO Simple Classification (1-year subset 4/2010-3/2011)
    [Figure: two charts, "Percent BMI Category for CKD patients with multiple readmits (n=386)" and "Percent BMI Category for CKD female patients with multiple readmits (n=197)"; RR=1.2]
    • "30 Day Readmission" represents encounters that were followed by a 30-day readmit
    • "No 30 Day Readmission" represents encounters that were not followed by a 30-day readmit
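The WHO simple classification bins adult BMI (weight in kg divided by the square of height in m) at 18.5, 25, and 30. A minimal sketch of that binning and of the per-group percentage breakdown shown in the charts follows; the function names are hypothetical.

```python
from collections import Counter


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def who_bmi_category(value):
    """WHO simple (principal cut-off) classification for adults."""
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    return "Obese"


def percent_by_category(bmi_values):
    """Percentage of patients in each BMI category for one patient group."""
    counts = Counter(who_bmi_category(v) for v in bmi_values)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


print(who_bmi_category(bmi(70, 1.75)))  # Normal (BMI about 22.9)
print(percent_by_category([17.0, 22.0, 27.5, 31.0]))
```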
  • 24. Predictive Modeling Results with Temporal Variable Constrained Dataset: MI Data (Emory)
    All MI data and Multiple MI data
    | Data | Predicted 30-day Risk | # of Encounters | # of Readmissions | Readmission Rate |
    |---|---|---|---|---|
    | All MI data | High | 968 | 360 | 37% |
    | Multiple MI | High | 68 | 35 | 51% |
    | All MI data (no predictive modeling) | | 9674 | 1648 | 17% |
    | Multiple MI (no predictive modeling) | | 376 | 167 | 44% |
  • 25. Predictive Modeling Results with Temporal Variable Constrained Dataset: CKD Data (Emory)
    All CKD data and End Stage Renal CKD
    | Data | Predicted Risk | # of Encounters | # of Readmissions | Readmission Rate |
    |---|---|---|---|---|
    | CKD | High | 2284 | 950 | 42% |
    | End Stage Renal | High | 952 | 444 | 47% |
    | All CKD (no predictive modeling) | | 45664 | 10818 | 24% |
    | End Stage Renal (no predictive modeling) | | 12312 | 3312 | 27% |
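One compact way to read the two results tables is as a lift: the readmission rate in the predicted high-risk group divided by the rate in the same population without predictive modeling. The sketch below applies that ratio to the counts in the tables; for End Stage Renal it uses 12312 encounters and 3312 readmissions, as in the slide 12 ESRD table. "Lift" is an added summary, not a metric the slides report.

```python
def lift(high_encounters, high_readmits, all_encounters, all_readmits):
    """Readmission rate in the predicted high-risk group relative to the baseline rate."""
    return (high_readmits / high_encounters) / (all_readmits / all_encounters)


# (high-risk encounters, high-risk readmits, all encounters, all readmits)
results = {
    "All MI data": (968, 360, 9674, 1648),
    "Multiple MI": (68, 35, 376, 167),
    "CKD": (2284, 950, 45664, 10818),
    "End Stage Renal": (952, 444, 12312, 3312),
}
for name, counts in results.items():
    print(f"{name}: lift = {lift(*counts):.2f}")
# All MI data: lift = 2.18
# Multiple MI: lift = 1.16
# CKD: lift = 1.76
# End Stage Renal: lift = 1.73
```

Read this way, the temporally constrained subsets start from a higher baseline rate, so the model adds less relative lift there even though the final high-risk lists have the highest absolute readmission rates.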
  • 26. UHC Data Analyses
    • Much larger dataset
    • Much less detailed information about each patient
    • UHC has only the coded data sent by institutions, so co-morbidity-related ICD-9 codes may be missing
    • Analyses across patient encounters can pick up chronic co-morbidities that might not be coded in a particular encounter
  • 27. Missing Diagnosis Codes in UHC Dataset (10/1/2006 - 4/30/2011)
    | Disease | Number of patients with missing codes in future encounters | Total number of patients | Number of encounters with missing codes | Total number of encounters |
    |---|---|---|---|---|
    | Diabetes | 144806 (8.01%) | 1807322 | 311403 (9.4%) | 3300804 |
    | Heart Failure | 197043 (20.1%) | 976041 | 366926 (20.7%) | 1765203 |
    | MI | 171213 (21.8%) | 784559 | 301673 (25.8%) | 1168056 |
    | Sickle Cell | 2870 (10.5%) | 27210 | 11162 (9.9%) | 112268 |
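The statistic in this table (per the speaker notes: patients coded for a disease in a past encounter but not in at least one later encounter) can be computed with a single pass over each patient's ordered encounters. In this sketch the denominators are assumed to be the patients ever coded with the disease and all of those patients' encounters; the slide does not define them precisely. The running `seen` flag is the kind of longitudinal variable that can carry a chronic condition forward into encounters where it was not coded.

```python
from collections import defaultdict


def missing_code_stats(encounters, disease):
    """encounters: iterable of (patient_id, sequence_number, set_of_disease_groups)."""
    by_patient = defaultdict(list)
    for pid, seq, dx in encounters:
        by_patient[pid].append((seq, dx))
    stats = {"patients_missing": 0, "patients_total": 0,
             "encounters_missing": 0, "encounters_total": 0}
    for visits in by_patient.values():
        visits.sort(key=lambda v: v[0])  # chronological order
        if not any(disease in dx for _, dx in visits):
            continue  # patient never coded with this disease (assumed excluded)
        stats["patients_total"] += 1
        stats["encounters_total"] += len(visits)
        seen = False  # carried-forward "history of disease" flag
        missing_here = 0
        for _, dx in visits:
            if disease in dx:
                seen = True
            elif seen:
                missing_here += 1  # coded before, absent now
        stats["encounters_missing"] += missing_here
        stats["patients_missing"] += missing_here > 0
    return stats


toy = [
    ("p1", 1, {"HF"}), ("p1", 2, set()), ("p1", 3, {"HF"}),
    ("p2", 1, set()), ("p2", 2, {"HF"}),
    ("p3", 1, set()),
]
print(missing_code_stats(toy, "HF"))
```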
  • 28. UHC Use of Temporal Variables in Subsetting Data
    | Patient Population | # Total Encounters | # Readmitted Patients | Proportion of Patients Readmitted |
    |---|---|---|---|
    | MI | 310954 | 47210 | 15.2% |
    | Multiple MI | 73227 | 29017 | 39.6% |
    | Non-ESRD | 13023536 | 1735308 | 13.3% |
    | ESRD | 510702 | 142622 | 27.9% |
    | CKD | 1334617 | 316399 | 23.7% |
  • 29. UHC Use of Temporal Variables in Subsetting Data
    | Patient Population | # Total Patients | # Readmitted Patients | Proportion of Patients Readmitted |
    |---|---|---|---|
    | Diabetes | 2465049 | 465526 | 18.8% |
    | Uncontrolled Diabetes | 388417 | 78005 | 20.0% |
    | ESRD | 510702 | 142622 | 27.9% |
    | Uncontrolled Diabetes and ESRD | 48583 | 14224 | 29.8% |
  • 31. UHC "Readmission Hot Spots"
    [Chart: counts of encounters and patients (y-axis, 0 to 1,000,000) across categories 1 to 8 (x-axis)]
  • 34. Conclusion
    • Integrative dataset analysis can leverage patient information gathered over many encounters
    • Temporal analyses can generate derived variables that appear to correlate with readmissions
    • Hot spots appear to be an important phenomenon and have the potential to lead to patient-level interventions
    • Predictive modeling shows promise for providing decision support
    • Future analysis will look at temporal patterns of encounters and the relationship between length of stay (LOS) and readmission

Editor's Notes

  1. Hand off to Andrew at this point.
  2. Talk about how the temporal variables Multiple MI and End Stage Renal help generate subsets of data that separate patients with different characteristics. This is an Emory-specific dataset (a richer set of variables than the current UHC set).
  3. Using the Multiple MI temporal feature, subset the data and develop a model on that subset. Talk about how the temporal-variable-constrained data further helps the predictive model generate a better list of high-risk patients. Overall, we can generate a better final list of high-risk patients with temporal variables than without.
  4. Using the Multiple MI temporal feature, subset the data and develop a model on that subset. Talk about how the temporal-variable-constrained data further helps the predictive model generate a better list of high-risk patients. Overall, we can generate a better final list of high-risk patients with temporal variables than without.
  5. Statistics about patients who had diagnosis codes related to a disease in past encounters but no such codes in at least one future encounter. UHC dataset, 10/1/2006 - 4/30/2011. Motivation for this slide: many encounters are missing valuable information. This information can be captured using temporal/longitudinal variables, and such longitudinal variables improve predictive models.
  6. Talk about how the temporal variables Multiple MI and End Stage Renal help generate subsets of data that separate patients with different characteristics.
  7. Talk about how the temporal variables Multiple MI and End Stage Renal help generate subsets of data that separate patients with different characteristics.