Wrangle 2016: Driving Healthcare Operations with Small Data

By Sandy Ryza, Clover Health

How do you get people with chronic heart conditions to take their medication? Or diagnose complications as early as possible? Healthcare operations--the set of actions that organizations like insurers take to interact with their members--sit in a nebulous shadow realm between social science, medicine, and corporate bureaucracy. In this talk, Sandy will throw some additional nouns that seem more at home in the modern web era, like "machine learning" and "A/B testing," into the mix. He'll also walk attendees through an example of how Clover Health builds and tests models for predicting which of its diabetic members are likely to develop complications.

Wrangle 2016: Driving Healthcare Operations with Small Data

  1. Driving Healthcare Operations with Data Science
  2. "literally a health insurance company"
  3. "Operations"
     Clinical Operations
       ● Close member "gaps in care"
         ○ Not taking their meds
         ○ Not seeing their doctors
         ○ Not getting tested
       ● Document conditions
     Insurance Operations
       ● Approve / deny claims
       ● Approve / deny authorizations
       ● Catch fraud
  5. E.g.
  6. Enter Data Science
  7. Enter Data Science: What should we do? For whom? Did it work?
  8. Case Study: Whom to Call for Home Visits?
  9. Can we predict which of our diabetic members will have complications in the next 6 months?
  10. Timeline: Observation Interval, followed by Prediction Interval
  11. Timeline: Observation Interval (demographic info, lab tests, medications, other diagnoses), followed by Prediction Interval (diagnosed with diabetes complications?)
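  12. Features and Labels

      | Member (features) | Age | Hypertension | hba1c | Label: Diagnosed with Complication in 6-month Interval |
      |-------------------|-----|--------------|-------|--------------------------------------------------------|
      | CP001             | 65  | Yes          | 6.5   | Yes                                                    |
      | CP002             | 77  | No           | 8.3   | No                                                     |
      | CP002             | 84  | Yes          | 7.4   | Yes                                                    |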
  13. Challenge: High Class Imbalance
       ● Historically, only 8% of diabetic members have been diagnosed with complications over a 6-month period.
       ● Easy to get "high" accuracy, but hard to get a decent precision/recall tradeoff.
  14. Approach: High Class Imbalance
       ● Evaluate using area under the ROC curve.
       ● Empirically, tree ensemble models appear to handle the imbalance better than logistic regression.
  15. Challenge: Missing Data
       ● Glycated hemoglobin is clearly an important feature… but we only have measurements for ~60% of members.
       ● Whether we have a measurement correlates with both:
         ○ Diabetes complications.
         ○ How well a model trained without the lab measurement performs.
  16. Approach: Missing Data
       ● Simply hardcode all missing values to something outside the measurement range.
         ○ In our case, 0.0.
       ● This way, tree models can split on "have a measurement" vs. "don't have a measurement".
  17. Final Model: Gradient Boosting Tree Ensemble
  18. Evaluation: AUROC 0.8, Precision 24%, Recall 66%
  19. Most Predictive Features: Glycated Hemoglobin, Age, Hypertension, Takes Insulin
  20. Did it work?
  21. Do we catch more complications if we make calls using the model?
  22. Control Group vs. Treatment Group
  23. Control Group: Call Group (Chosen at Random). Treatment Group: Call Group (Chosen by Model).
  24. FAKE RESULTS

      |                 | Found Complications | Didn't Find Complications |
      |-----------------|---------------------|---------------------------|
      | Control Group   | 8                   | 92                        |
      | Treatment Group | 24                  | 76                        |
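The observation/prediction split on slides 10 to 12 can be sketched in a few lines: features come only from events before a cutoff date, and the label comes only from the window after it. This is a minimal illustration, not Clover Health's pipeline. The event records, member IDs, field names, and the 183-day horizon are all assumptions made up for the example.

```python
from datetime import date, timedelta

# Hypothetical event records: (member_id, event_date, kind, value).
# IDs, dates, and kinds are invented for illustration only.
EVENTS = [
    ("M1", date(2015, 3, 1), "hba1c", 6.5),
    ("M1", date(2015, 9, 15), "complication", None),
    ("M2", date(2015, 2, 10), "hba1c", 8.3),
    ("M2", date(2015, 5, 20), "hba1c", 7.9),
    ("M3", date(2015, 4, 5), "complication", None),
]


def build_example(events, member_id, cutoff, horizon_days=183):
    """Split one member's events at `cutoff` into (features, label)."""
    horizon_end = cutoff + timedelta(days=horizon_days)
    latest_hba1c = None
    latest_date = None
    label = False
    for mid, when, kind, value in events:
        if mid != member_id:
            continue
        if when < cutoff:
            # Observation interval: only data from before the cutoff
            # may be used as a feature.
            if kind == "hba1c" and (latest_date is None or when > latest_date):
                latest_hba1c, latest_date = value, when
        elif when < horizon_end and kind == "complication":
            # Prediction interval: a complication here sets the label.
            label = True
    return {"hba1c": latest_hba1c}, label


cutoff = date(2015, 7, 1)
examples = {m: build_example(EVENTS, m, cutoff) for m in ("M1", "M2", "M3")}
for member, (features, label) in examples.items():
    print(member, features, label)
```

Keeping the two intervals strictly separated is what prevents information from the prediction window leaking into the features.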
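The point of slides 13 and 14 can be checked on a synthetic cohort with the same 8% base rate: a model that never predicts a complication scores 92% accuracy while finding nobody, and a ranking metric such as AUROC exposes that. The helper functions below are a plain-Python sketch written for this illustration, not code from the talk.

```python
def accuracy(labels, preds):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def precision_recall(labels, preds):
    """Precision and recall for the positive class (0.0 when undefined)."""
    tp = sum(1 for l, p in zip(labels, preds) if l and p)
    fp = sum(1 for l, p in zip(labels, preds) if not l and p)
    fn = sum(1 for l, p in zip(labels, preds) if l and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in cohort size, which is fine
    for a small illustration.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic cohort: 8 members with complications out of 100.
labels = [True] * 8 + [False] * 92

# "Model" that always predicts no complication.
always_no = [False] * 100
acc = accuracy(labels, always_no)
prec, rec = precision_recall(labels, always_no)

# The same model expressed as a constant score carries no ranking information.
constant_auc = auroc(labels, [0.0] * 100)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} auroc={constant_auc:.2f}")
```

Accuracy rewards the trivial model, while precision, recall, and AUROC all show that it is useless for deciding whom to call.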
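The sentinel trick from slide 16 can be illustrated with a one-level decision tree (a stump) written from scratch. Missing HbA1c values become 0.0, and the stump is then free to choose a threshold that separates "no measurement" from "has a measurement". The toy values and labels are invented, and the direction of the correlation between missingness and complications in them is an assumption, since the slides do not state it.

```python
MISSING_SENTINEL = 0.0  # value named on slide 16; below any real HbA1c reading


def fill_missing(values, sentinel=MISSING_SENTINEL):
    """Replace missing (None) measurements with the sentinel."""
    return [sentinel if v is None else v for v in values]


def gini(group):
    """Gini impurity of a list of boolean labels."""
    if not group:
        return 0.0
    p = sum(group) / len(group)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split `value <= threshold`."""
    best = None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbours
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Toy data (assumed): three members without a measurement, five with one.
hba1c = [None, None, None, 6.5, 6.9, 7.4, 8.3, 9.1]
complication = [False, False, False, True, True, True, True, False]

filled = fill_missing(hba1c)
threshold, impurity = best_threshold(filled, complication)
print(threshold, round(impurity, 3))
```

On this toy data the chosen threshold falls between the sentinel and the lowest real reading, so the split is effectively a "has a measurement" indicator. A linear model would instead treat 0.0 as a very low reading, which is one reason this trick suits tree models specifically.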
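One way to answer "Did it work?" for a table like the one on slide 24 is a one-tailed Fisher's exact test, which asks how likely it is to see at least this many finds in the treatment group if the model made no difference. The talk does not say which test was used, so the choice of test here is an assumption, and the input counts are the slide's explicitly fake results.

```python
from math import comb


def fisher_one_tailed(control_found, control_missed, treat_found, treat_missed):
    """One-tailed Fisher's exact test for a 2x2 table.

    Returns P(treatment finds >= treat_found) under the null hypothesis
    that finds are distributed between the groups by chance
    (hypergeometric distribution with all margins fixed).
    """
    treat_size = treat_found + treat_missed
    total = control_found + control_missed + treat_size
    found = control_found + treat_found
    denom = comb(total, treat_size)
    numer = 0
    for k in range(treat_found, min(found, treat_size) + 1):
        # k finds land in the treatment group, the rest in control.
        numer += comb(found, k) * comb(total - found, treat_size - k)
    return numer / denom


# Counts from the FAKE RESULTS slide.
control = (8, 92)
treatment = (24, 76)

lift = (treatment[0] / sum(treatment)) / (control[0] / sum(control))
p_value = fisher_one_tailed(*control, *treatment)
print(f"lift={lift:.1f}x p={p_value:.4f}")
```

The one-tailed form fits the question on slide 21, which only asks whether model-chosen calls catch more complications, not whether the two groups differ in either direction.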

Notes

  • I think the best mascot for data science is an ogre beating something with a club.
  • So we have a ton of diabetic members: about a third of our members are diabetic. We'd like to know which of these members are likely to suffer complications in the future, so that we can get them appropriate care.
  • One-tailed