FlorenceAI
Reinventing Data Science at Humana
David Mack, PhD
Cognitive/Machine Learning Principal
AI Engineering, Digital Health and Analytics
A more human way to healthcare™
David Mack, PhD – Cognitive/Machine Learning Principal
I have worked at Humana for 5½ years in clinical and enterprise
data science. For the past 2 years I have been one of the primary
architects and maintainers of Humana’s ML Platform, which now
serves hundreds of data scientists. I love to tinker with
homemade IoT devices, build cool stuff, and learn new things!
Humana’s bold goal is to address the needs of the whole person
Have focused on community partnerships and social determinants of health
Commitment to help our millions of members achieve their best health
Fortune 50 company with $77.2bn consolidated revenue in 2020
Humana has invested significant resources into fighting:
• COVID-19 Pandemic
• Food Insecurity
• Loneliness and Social Isolation
• Inequities in Healthcare
Formed Digital Health and Analytics Organization in 2018
Through advanced analytics, experiential design, data and technology we are
working to meet our associates, members and the communities we serve,
anytime, anywhere, anyhow
What exactly is FlorenceAI*?
A cloud platform for automating and accelerating the delivery
lifecycle of data science solutions at scale in Azure
Key Foundational Pillars
• Feature stores
• Starter code frameworks
• Notebook based workflow
• Prod deployment partnership
• Extensive training curriculum
End-to-end ecosystem benefits
• Empowers data scientists to solve complex problems
• Promotes access to open-source innovation
• Simplifies model consumption with single interface
• Transforms workflows to improve performance
Microsoft Azure Cloud
Foundational Components
Other Key Tools
* Patent Pending
Feature Stores – Quality Ingredients for ML Algorithms
Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years
Extensive Metadata
• Standard descriptions
• Centralized ref tables
• Ratings to identify any quality impacts
• Enables discovery and exploration
Economies of Scale
• Pre-computed for entire population
• Refreshed regularly at different cadences
• Production ready and pre-validated
Flexible but Specific
• Designed to cover most use cases
• Domain expertise in feature design
• Self-service for custom situations
End-to-End Process
• Cohort Design
• Initial Feature Selection
• Model Training Experiments
• Score and Register Best Model
• Record Training Artifacts
• Scoring Code and Testing
• Promote Model and Automate Scoring
Example Problem to Help Trace the Workflow
Predict the most severe stage of Chronic Kidney Disease in the next 6 months
Criteria to Define the Cohort
• Fixed Calendar Date
• 12 months of history, with over 11 months of enrollment
• 6 months looking forward, with continuous enrollment
• Age ≥ 65, Medicare Advantage
• Evidence of CKD stage in Medical Claims or Lab Results
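The cohort criteria above can be sketched as a simple filter. This is a minimal illustration only: every field name is hypothetical (the slides redact the real ones), and it assumes the "over 11 months" rule applies to the 12-month history window and the "continuous enrollment" rule to the 6-month forward window.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical fields; the real feature-store columns are not shown in the slides.
    age: int
    plan_type: str
    months_enrolled_last_12: int         # enrollment within the 12-month history window
    continuously_enrolled_next_6: bool   # enrollment across the 6-month forward window
    has_ckd_stage_evidence: bool         # from medical claims or lab results


def in_cohort(m: Member) -> bool:
    """Apply the cohort criteria as of the fixed calendar date."""
    return (
        m.age >= 65
        and m.plan_type == "Medicare Advantage"
        and m.months_enrolled_last_12 > 11
        and m.continuously_enrolled_next_6
        and m.has_ckd_stage_evidence
    )


members = [
    Member(70, "Medicare Advantage", 12, True, True),   # meets every criterion
    Member(64, "Medicare Advantage", 12, True, True),   # too young
    Member(72, "Medicare Advantage", 9, True, True),    # enrollment gap in history
    Member(80, "Commercial", 12, True, True),           # not Medicare Advantage
    Member(68, "Medicare Advantage", 12, False, True),  # left during the forward window
    Member(75, "Medicare Advantage", 12, True, False),  # no CKD stage evidence
]
cohort = [m for m in members if in_cohort(m)]
print(len(cohort))
```

On a real platform the same predicate would more likely be expressed as a Spark dataframe filter over the feature store rather than a Python list comprehension.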
All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security
Initial Feature Selection and Traditional Model Training
Walkthrough: Initial Feature Selection Notebook
Goal: Identify hundreds of important features among tens of thousands
First Round of Model Experimentation using SparkML
Helper Function to execute the run available in shared “experiment utility”
Arrive at a “Best Model” using SparkML
Different helper function to save the best model and provide more details
Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or Heatmaps to help catch potential mistakes early
Walkthrough: SparkML Helper Functions
Goals: Abstract complexity and standardize logging
Encouraging Reproducibility with Reusable Code
What items are automatically saved to the MLflow run? (Workspace Scoped)
• Hyperparameters
• Relevant Metrics
• MLflow model object
• Evaluation Metric Figure (Downloadable)
What other artifacts are saved to ADLS? (Storage Account Scoped)
• Original Input Schemas before any indexing or feature prep
• Original Training and Test Datasets with just selected features
• String Indexes and Imputation Dictionaries (outside of pipeline models)
• Best Model Scores from both training and test data
Applying Deep Neural Networks to Tabular Data at Scale
Key Distinctions of Deep Neural Networks
Multiclass Example
Learns over repeated passes called “epochs”
What extra things can we do to help us decide which model is the best?
• Use early stopping to minimize training time and combat overfitting
• Use callbacks to log values at the end of each epoch
• Test on smaller chunks of data and scale up as we learn more
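The first two bullets can be sketched without any deep learning framework. The example below is a minimal, framework-free illustration: the list of validation losses stands in for real training, and the function name, `patience` parameter, and callback signature are all assumptions for illustration rather than the platform's helper code.

```python
def train_with_early_stopping(val_losses, patience=2, on_epoch_end=None):
    """Stop once validation loss has not improved for `patience` epochs.

    `val_losses` stands in for real training: one validation loss per epoch.
    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        epochs_run = epoch + 1
        if on_epoch_end is not None:
            on_epoch_end(epoch, loss)  # callback: log a value at the end of each epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss, epochs_run


history = []  # stands in for an experiment tracker
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
result = train_with_early_stopping(
    losses, patience=2, on_epoch_end=lambda e, l: history.append((e, l))
)
print(result)
```

Training stops after the fifth epoch, so the final loss in the list is never reached. That is the trade-off of early stopping: it saves time and limits overfitting, at the risk of missing a later improvement when `patience` is set too low.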
Bayesian Hyperparameter Searching with Hyperopt
Attempts to minimize our loss function
Can set our hyperparameter space and the number of trials we want to run
Used a sample of our training data to go quickly over the 20 trials we chose to run
MLflow has a Handy Comparison Tool to Help us Focus
Quick Insights:
• Complex Layer 1 with Complex Layer 2 doesn’t do well
• Complex Layer 1 with Simpler Layer 2 does much better
Can highlight ranges to focus our attention
Let’s use MORE Data with Distributed Training!
Driver Only
• 1 MM members, 1 worker, 6 sec per epoch
• Lots of trials to narrow down our choices
Petastorm
• 10 MM members, 1 worker, 63 sec per epoch
• Using all the data, but takes forever
Petastorm & Horovod
• 10 MM members, 16 workers, 14 sec per epoch
• Train on all the data much more quickly
We generally see a sqrt(n) speed up over a single worker, where n is the number of workers
Using Petastorm and Horovod, we used all the data and trained 4.5x faster
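The reported figures can be checked against the sqrt(n) rule of thumb with a few lines of arithmetic. The numbers come from the slide; the variable names, and the "efficiency" comparison against ideal linear scaling, are additions for illustration.

```python
import math

workers = 16
single_worker_sec = 63.0  # 10 MM members, 1 worker, seconds per epoch
distributed_sec = 14.0    # 10 MM members, 16 workers, seconds per epoch

rule_of_thumb = math.sqrt(workers)              # speed-up expected from sqrt(n)
observed = single_worker_sec / distributed_sec  # speed-up actually measured
efficiency = observed / workers                 # fraction of ideal linear scaling

print(f"expected ~{rule_of_thumb:.1f}x, observed {observed:.1f}x, "
      f"{efficiency:.0%} of linear scaling")
```

The observed 4.5x is slightly better than the 4x the rule of thumb suggests for 16 workers, and well short of a linear 16x, which is the usual cost of gradient synchronization between workers.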
Walkthrough: Petastorm and Horovod Helper Functions
Goals: Save headaches and empower data scientists to train on all of the data quickly
We Improved the Precision of our Model!
We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes
• SparkML Logistic Regression: weighted F1 score = 0.615 (prw = 0.633, rcw = 0.609)
• TensorFlow NN on all the data: weighted F1 score = 0.615 (prw = 0.646, rcw = 0.602)
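To make the metrics concrete, here is a minimal sketch of support-weighted precision, recall, and F1 for a multiclass problem, assuming that `prw` and `rcw` denote weighted precision and weighted recall (the slides do not define them). The labels are made-up toy data, not Humana results.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over all classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    precision_w = recall_w = f1_w = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = n / total  # each class counts in proportion to its support
        precision_w += weight * precision
        recall_w += weight * recall
        f1_w += weight * f1
    return precision_w, recall_w, f1_w


# Toy labels: the majority class (1) is over-predicted.
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 1, 1, 2]
prw, rcw, f1w = weighted_metrics(y_true, y_pred)
harmonic = 2 * prw * rcw / (prw + rcw)
print(round(prw, 3), round(rcw, 3), round(f1w, 3), round(harmonic, 3))
```

Weighted F1 averages the per-class F1 scores, so it is generally not the harmonic mean of weighted precision and weighted recall. That is why two models can share the same weighted F1 while differing in precision and recall.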
Register, Score, and Preserve the Model Before Deploying it to Production
Scoring with a Spark UDF from MLflow
• This allows us to easily get the scores into a Spark dataframe from any MLflow model
• Can repeat for other types of targets or our training DF
Registering the Model
Model Metadata (Screenshot from Models Tab in DB Workspace)
• First registered in the Data Scientist’s dev DB workspace
• The Data Scientist promotes it to “production” status in the dev workspace after review
• The associated MLflow run is also used to register it in our “production” workspace for automated jobs
• This newly registered model is the official version used for automated scoring
The path within the ADLS storage account contains the version so we can support multiple versions at the same time
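A version-bearing path can be sketched as follows. The folder layout, container, storage account, and model names are all hypothetical; only the idea of embedding the version in the path comes from the slide.

```python
import posixpath


def model_artifact_path(root: str, model_name: str, version: int) -> str:
    """Build a version-specific folder so several model versions can coexist."""
    return posixpath.join(root, model_name, f"v{version}")


# Hypothetical container, storage account, and model name.
root = "abfss://models@examplestorage.dfs.core.windows.net/florence"
paths = [model_artifact_path(root, "ckd_stage", v) for v in (1, 2)]
for p in paths:
    print(p)
```

Because each version writes to its own folder, an automated job pinned to version 1 keeps working while version 2 is being validated.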
Production Deployment Pipeline – Notebook-based Workflow
Key Requirements
• Use Azure DevOps to deploy code to various environments for testing and execution
• Tie execution to specific package versions and LTS non-ML Databricks Runtimes
• Use ADF Parameters to provide flexibility to minimize YAML code duplication
Reusable Framework of 3 notebooks: Feature Engineering, Scoring, Validation
• Upstream Dependency Check to prevent flow of bad data and errors from missing data
• Logging via SQL Server to record both success and failure
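The dependency check and the success/failure logging can be sketched together. This is an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server, and the table names, function names, and `job_log` schema are all hypothetical.

```python
import sqlite3


def missing_upstream(row_counts, required_tables):
    """Return the required upstream tables that are absent or empty."""
    return [t for t in required_tables if row_counts.get(t, 0) == 0]


def run_scoring_job(conn, row_counts, required_tables, score_fn):
    """Run score_fn only when upstream data is present; log either outcome."""
    missing = missing_upstream(row_counts, required_tables)
    if missing:
        status, detail = "FAILED", "missing upstream data: " + ", ".join(missing)
    else:
        try:
            score_fn()
            status, detail = "SUCCEEDED", ""
        except Exception as exc:  # record the failure instead of losing it
            status, detail = "FAILED", repr(exc)
    conn.execute("INSERT INTO job_log (status, detail) VALUES (?, ?)", (status, detail))
    conn.commit()
    return status


conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server logging database
conn.execute("CREATE TABLE job_log (status TEXT, detail TEXT)")
required = ["features_claims", "features_labs"]

first = run_scoring_job(conn, {"features_claims": 100}, required, lambda: None)
second = run_scoring_job(
    conn, {"features_claims": 100, "features_labs": 50}, required, lambda: None
)
log = conn.execute("SELECT status, detail FROM job_log ORDER BY rowid").fetchall()
print(log)
```

Checking dependencies before scoring means a missing refresh produces a logged, explainable failure rather than a silently incomplete set of scores.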
Partnership Between Data Scientists and AI Engineers is Pivotal
Each of the files required for deployment is part of the starter repo, which helps the data scientist keep the end goal in view from the beginning
Each model is initially reviewed and subsequently monitored for AI Bias in key areas
All models are peer reviewed for both domain and technical accuracy prior to production deployment
Early Wins for the Platform
Key Early Wins – big steps forward
Scaling and automating clunky processes
• Scaled from fewer than 40 condition flags on-premises to more than 3x that number in the cloud
• Got contributions from multiple teams following templates
• Now updates over 1 bn rows daily in 1.5 hours for the entire member population
Faster prep, more iterations, better tuning and collaboration
• Reduced feature engineering step on a very large source from hours to a few minutes
• Enabled DS team to iterate on models faster, going from 5+ hours for training to a half hour or less, even for complex GBT models
• Reduced scoring step on prospective members from a week to 30 minutes
Shared resources accelerate everyone
• Hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches
• Flexibility to score at scale in an automated fashion, regardless of algorithm package, with a common output format

More Related Content

What's hot

Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
butest
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
Simplilearn
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
DataWorks Summit
 

What's hot (20)

Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
A Brief Introduction to Machine Learning.pptx
A Brief Introduction to Machine Learning.pptxA Brief Introduction to Machine Learning.pptx
A Brief Introduction to Machine Learning.pptx
 
Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017
 
Non Orthogonal Multiple Acess with SIC
Non Orthogonal Multiple Acess with SICNon Orthogonal Multiple Acess with SIC
Non Orthogonal Multiple Acess with SIC
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Mixed Precision Training Review
Mixed Precision Training ReviewMixed Precision Training Review
Mixed Precision Training Review
 
5G_NR_Overview_Architecture_and_Operating_Modes
5G_NR_Overview_Architecture_and_Operating_Modes5G_NR_Overview_Architecture_and_Operating_Modes
5G_NR_Overview_Architecture_and_Operating_Modes
 
Deep learning - what is it and why now?
Deep learning - what is it and why now?Deep learning - what is it and why now?
Deep learning - what is it and why now?
 
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
PyTorch Python Tutorial | Deep Learning Using PyTorch | Image Classifier Usin...
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!Applying AI to software engineering problems: Do not forget the human!
Applying AI to software engineering problems: Do not forget the human!
 
URLLC for 5G and Beyond: Physical, MAC, and Network Solutions
URLLC for 5G and Beyond: Physical, MAC, and Network SolutionsURLLC for 5G and Beyond: Physical, MAC, and Network Solutions
URLLC for 5G and Beyond: Physical, MAC, and Network Solutions
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 
AlphaGo in Depth
AlphaGo in Depth AlphaGo in Depth
AlphaGo in Depth
 
Machine learning with py torch
Machine learning with py torchMachine learning with py torch
Machine learning with py torch
 
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
PR-159 : Synergistic Image and Feature Adaptation: Towards Cross-Modality Dom...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to FlorenceAI: Reinventing Data Science at Humana

Chapter 10
Chapter 10Chapter 10
Chapter 10
bodo-con
 
renita lobo-CV-Automation
renita lobo-CV-Automationrenita lobo-CV-Automation
renita lobo-CV-Automation
Renita Lobo
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
Florian Wilhelm
 

Similar to FlorenceAI: Reinventing Data Science at Humana (20)

Experimentation to Industrialization: Implementing MLOps
Experimentation to Industrialization: Implementing MLOpsExperimentation to Industrialization: Implementing MLOps
Experimentation to Industrialization: Implementing MLOps
 
Lect7
Lect7Lect7
Lect7
 
Lect7
Lect7Lect7
Lect7
 
Foutse_Khomh.pptx
Foutse_Khomh.pptxFoutse_Khomh.pptx
Foutse_Khomh.pptx
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
 
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video PlatformDeep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
 
Managing Data Science Projects
Managing Data Science ProjectsManaging Data Science Projects
Managing Data Science Projects
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
 
renita lobo-CV-Automation
renita lobo-CV-Automationrenita lobo-CV-Automation
renita lobo-CV-Automation
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
Rsqrd AI: From R&D to ROI of AI
Rsqrd AI: From R&D to ROI of AIRsqrd AI: From R&D to ROI of AI
Rsqrd AI: From R&D to ROI of AI
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Agile Development unleashed
Agile Development unleashedAgile Development unleashed
Agile Development unleashed
 
DevOps 101
DevOps 101DevOps 101
DevOps 101
 
An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...
 
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
 

More from Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
ranjankumarbehera14
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
HyderabadDolls
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 

Recently uploaded (20)

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 

FlorenceAI: Reinventing Data Science at Humana

  • 1. FlorenceAI Reinventing Data Science at Humana David Mack, PhD Cognitive/Machine Learning Principal AI Engineering, Digital Health and Analytics TM A more human way to healthcareTM
  • 2. David Mack, PhD – Cognitive/Machine Learning Principal I have worked at Humana for 5½ years in clinical and enterprise data science. I have been one of the primary architects and maintainers of Humana’s ML Platform for the past 2 years that now serves hundreds of data scientists. I love to tinker with homemade IoT devices, build cool stuff, and learn new things! Humana’s bold goal is to address the needs of the whole person Have focused on community partnerships and social determinants of health Commitment to help our millions of members achieve their best health Fortune 50 company with $77.2bn consolidated revenue in 2020 Humana has invested significant resources into fighting: • COVID-19 Pandemic • Food Insecurity • Loneliness and Social Isolation • Inequities in Healthcare Formed Digital Health and Analytics Organization in 2018 Through advanced analytics, experiential design, data and technology we are working to meet our associates, members and the communities we serve, anytime, anywhere, anyhow
  • 3. What exactly is FlorenceAI*? A cloud platform for automating and accelerating the delivery lifecycle of data science solutions at scale in Azure.
      • Key Foundational Pillars: feature stores; starter code frameworks; notebook-based workflow; prod deployment partnership; extensive training curriculum
      • End-to-end ecosystem benefits: empowers data scientists to solve complex problems; promotes access to open-source innovation; simplifies model consumption with a single interface; transforms workflows to improve performance
      • Diagram labels: Microsoft Azure Cloud; Foundational Components; Other Key Tools
      * Patent Pending
  • 4. Feature Stores – Quality Ingredients for ML Algorithms. Tens of thousands of features are available for training and scoring, with hundreds of instances available across multiple years.
      • Extensive Metadata: standard descriptions; centralized ref tables; ratings to identify any quality impacts; enables discovery and exploration
      • Economies of Scale: pre-computed for the entire population; refreshed regularly at different cadences; production ready and pre-validated
      • Flexible but Specific: designed to cover most use cases; domain expertise in feature design; self-service for custom situations
  • 5. End-to-End Process: Cohort Design; Initial Feature Selection; Model Training Experiments; Score and Register Best Model; Record Training Artifacts; Scoring Code and Testing; Promote Model and Automate Scoring
  • 6. Example Problem to Help Trace the Workflow. Predict the most severe stage of Chronic Kidney Disease in the next 6 months.
      • Criteria to Define the Cohort: 12 months of history; over 11 months of enrollment; 6 months looking forward; continuous enrollment; fixed calendar date; age ≥ 65, Medicare Advantage; evidence of CKD stage in medical claims or lab results
      • All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security
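The cohort windows on slide 6 come down to simple date arithmetic around the fixed calendar date. The sketch below is only an illustration under stated assumptions: the anchor date, the function names, and the reading of "over 11 months of enrollment" as strictly more than 11 enrolled months are hypothetical and do not come from the talk.

```python
from datetime import date


def add_months(first_of_month: date, months: int) -> date:
    """Shift a first-of-month date by a whole number of months."""
    if first_of_month.day != 1:
        raise ValueError("this sketch only handles first-of-month dates")
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def cohort_windows(anchor: date, history_months: int = 12, forward_months: int = 6):
    """Return (history_start, anchor, forward_end) around the fixed calendar date."""
    return (
        add_months(anchor, -history_months),
        anchor,
        add_months(anchor, forward_months),
    )


def meets_criteria(age: int, plan: str, enrolled_months: int, has_ckd_evidence: bool) -> bool:
    """Apply the slide's cohort criteria to one member."""
    # "Over 11 months of enrollment" is read here as strictly more than 11
    # enrolled months in the history window; that reading is an assumption.
    return (
        age >= 65
        and plan == "Medicare Advantage"
        and enrolled_months > 11
        and has_ckd_evidence
    )


# Hypothetical anchor date, chosen only for illustration.
history_start, anchor, forward_end = cohort_windows(date(2021, 1, 1))
```

Features would be drawn from the history window and the target (most severe CKD stage) from the forward window, so the two never overlap.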
  • 7. Initial Feature Selection and Traditional Model Training
  • 8. Walkthrough: Initial Feature Selection Notebook. Goal: identify hundreds of important features among tens of thousands.
  • 9. First Round of Model Experimentation using SparkML. A helper function to execute the run is available in the shared “experiment utility”.
  • 10. Arrive at a “Best Model” using SparkML. A different helper function saves the best model and provides more details. Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or heatmaps to help catch potential mistakes early.
  • 11. Walkthrough: SparkML Helper Functions. Goals: abstract complexity and standardize logging.
  • 12. Encouraging Reproducibility with Reusable Code
      • What items are automatically saved to the MLflow run? Hyperparameters; relevant metrics; MLflow model object; evaluation metric figure (downloadable)
      • What other artifacts are saved to ADLS? Original input schemas before any indexing or feature prep; original training and test datasets with just the selected features; string indexes and imputation dictionaries (outside of pipeline models); best model scores from both training and test data
      • Diagram labels: Storage Account Scoped; Workspace Scoped
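Slides 11 and 12 describe helper functions that hide boilerplate and log every run the same way. The actual helpers are not shown in the transcript, so the following is a minimal sketch of the idea only. `run_experiment`, `InMemoryTracker`, and `toy_fit` are hypothetical names; the tracker interface mirrors three functions that the `mlflow` module does expose (`start_run`, `log_params`, `log_metrics`).

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Tuple


def run_experiment(
    tracker: Any,
    fit_fn: Callable[[Dict[str, Any]], Tuple[Any, Dict[str, float]]],
    params: Dict[str, Any],
) -> Tuple[Any, Dict[str, float]]:
    """Train once and log hyperparameters and metrics in a consistent way.

    `tracker` is anything exposing start_run(), log_params() and log_metrics().
    """
    with tracker.start_run():
        tracker.log_params(params)
        model, metrics = fit_fn(params)
        tracker.log_metrics(metrics)
    return model, metrics


class InMemoryTracker:
    """Stand-in tracker for local dry runs; records what would be logged."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}
        self.metrics: Dict[str, float] = {}
        self.runs = 0

    @contextmanager
    def start_run(self):
        self.runs += 1
        yield self

    def log_params(self, params: Dict[str, Any]) -> None:
        self.params.update(params)

    def log_metrics(self, metrics: Dict[str, float]) -> None:
        self.metrics.update(metrics)


def toy_fit(params: Dict[str, Any]) -> Tuple[str, Dict[str, float]]:
    # Placeholder for a real SparkML fit; returns (model, metrics).
    return "toy-model", {"accuracy": 0.5 + params["reg"]}


tracker = InMemoryTracker()
model, metrics = run_experiment(tracker, toy_fit, {"reg": 0.25})
```

Passing the `mlflow` module itself as `tracker` would send the same calls to a real tracking server. Logging the model object and the evaluation figure (for example with `mlflow.spark.log_model` and `mlflow.log_figure`) is left out to keep the sketch dependency-free.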
  • 13. Applying Deep Neural Networks to Tabular Data at Scale
  • 14. Key Distinctions of Deep Neural Networks (multiclass example). Learns over repeated passes called “epochs”. What extra things can we do to help us decide which model is the best?
      • Use early stopping to minimize training time and combat overfitting
      • Use callbacks to log values at the end of each epoch
      • Test on smaller chunks of data and scale up as we learn more
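The early-stopping and per-epoch callback ideas on slide 14 can be sketched without any deep learning framework. This is a plain-Python illustration of the logic, not the code used in the talk. The function name, the `patience` and `min_delta` parameters, and the loss values are assumptions made for the example.

```python
from typing import Callable, Iterable, Optional, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float],
    patience: int = 2,
    min_delta: float = 0.0,
    on_epoch_end: Optional[Callable[[int, float], None]] = None,
) -> Tuple[int, int]:
    """Walk through per-epoch validation losses and stop when they stall.

    Returns (last_epoch_run, best_epoch), both zero-based.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if on_epoch_end is not None:
            on_epoch_end(epoch, loss)  # e.g. log the value to a tracking server
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` consecutive epochs
    return last_epoch, best_epoch


# Invented loss curve: improves for three epochs, then stalls.
history = []
stop_epoch, best_epoch = early_stopping_epoch(
    [0.9, 0.7, 0.65, 0.66, 0.67, 0.5],
    patience=2,
    on_epoch_end=lambda epoch, loss: history.append((epoch, loss)),
)
```

In Keras the same behavior is available through `tf.keras.callbacks.EarlyStopping` and custom callbacks that implement `on_epoch_end`. Note that the sixth epoch, which would have been better, is never reached: that is the trade-off early stopping makes for shorter training time.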
  • 15. Bayesian Hyperparameter Searching with Hyperopt. Hyperopt attempts to minimize our loss function. We can set our hyperparameter space and the number of trials we want to run. We used a sample of our training data to go quickly over the 20 trials we chose to run.
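A search like the one on slide 15 might be set up roughly as follows. This is a sketch under assumptions: the hyperparameter names, their ranges, and the `toy_loss` stand-in for "train on a sample and return the validation loss" are invented for illustration. Only `fmin`, `tpe.suggest`, `hp`, and `Trials` are actual Hyperopt names, and they are imported lazily so the objective can be exercised on its own.

```python
import math
from typing import Any, Dict


def toy_loss(params: Dict[str, Any]) -> float:
    """Placeholder for training on a data sample and returning validation loss."""
    lr_term = (math.log10(params["learning_rate"]) + 2.0) ** 2
    width_term = abs(params["layer_1_units"] - 64) / 64.0
    return lr_term + width_term


def objective(params: Dict[str, Any]) -> Dict[str, Any]:
    # Hyperopt minimizes the value under "loss"; "ok" is the value of
    # hyperopt.STATUS_OK.
    return {"loss": toy_loss(params), "status": "ok"}


def run_search(max_evals: int = 20):
    """Run a TPE (Bayesian) search over a small, hypothetical space."""
    # Imported here so the rest of the sketch runs without Hyperopt installed.
    from hyperopt import Trials, fmin, hp, tpe

    space = {
        # loguniform takes the bounds in log space.
        "learning_rate": hp.loguniform("learning_rate", math.log(1e-4), math.log(1e-1)),
        "layer_1_units": hp.choice("layer_1_units", [32, 64, 128]),
    }
    trials = Trials()
    best = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
    return best, trials
```

Calling `run_search(max_evals=20)` would mirror the 20 trials mentioned on the slide. For `hp.choice` parameters, `fmin` returns the index of the chosen option rather than the value itself.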
  • 16. MLflow has a Handy Comparison Tool to Help us Focus
      • Quick insights: Complex Layer 1 and Complex Layer 2 don’t do well; Complex Layer 1 with Simpler Layer 2 does much better
      • Can highlight ranges to focus our attention
  • 17. Let’s use MORE Data with Distributed Training!
      • Driver Only: 1 MM members, 1 worker, 6 sec per epoch. Lots of trials to narrow down our choices
      • Petastorm: 10 MM members, 1 worker, 63 sec per epoch. Using all the data, but takes forever
      • Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch. Train on all the data much more quickly
      • We generally see a sqrt(n) speed-up over a single worker
      • Using Petastorm and Horovod, we used all the data and trained 4.5x faster
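The 4.5x figure on slide 17 follows directly from the per-epoch timings, and it can be compared with the sqrt(n) rule of thumb quoted there. The short calculation below uses only the numbers on the slide; the function names are illustrative.

```python
import math


def speedup(single_worker_sec: float, distributed_sec: float) -> float:
    """Ratio of single-worker time to distributed time per epoch."""
    return single_worker_sec / distributed_sec


def sqrt_rule(n_workers: int) -> float:
    """Rule-of-thumb speed-up quoted on the slide for n workers."""
    return math.sqrt(n_workers)


observed = speedup(63, 14)          # Petastorm alone vs. Petastorm & Horovod
rule_of_thumb = sqrt_rule(16)       # 16 workers
efficiency = observed / 16          # fraction of ideal linear scaling
```

The observed speed-up (4.5x) is slightly better than the sqrt(16) = 4x rule of thumb, while still well short of linear scaling across 16 workers.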
  • 18. Walkthrough: Petastorm and Horovod Helper Functions. Goals: save headaches and empower data scientists to train on all of the data quickly.
  • 19. We Improved the Precision of our Model! We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes.
      • SparkML Logistic Regression: weighted f1 score = 0.615 (prw = 0.633, rcw = 0.609)
      • TensorFlow NN on all the data: weighted f1 score = 0.615 (prw = 0.646, rcw = 0.602)
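Slide 19 reports a weighted F1 score alongside prw and rcw, which presumably stand for support-weighted precision and recall. Under that assumption, the sketch below shows how the three numbers are usually derived from a multiclass confusion matrix. The 2x2 matrix is invented for illustration and has nothing to do with the CKD model.

```python
from typing import List, Tuple


def weighted_scores(confusion: List[List[int]]) -> Tuple[float, float, float]:
    """Support-weighted precision, recall and F1.

    confusion[i][j] counts rows whose true class is i and predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    precision_w = recall_w = f1_w = 0.0
    for k in range(n_classes):
        true_pos = confusion[k][k]
        support = sum(confusion[k])                               # true members of class k
        predicted = sum(confusion[i][k] for i in range(n_classes))  # predicted as class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support / total
        precision_w += weight * precision
        recall_w += weight * recall
        f1_w += weight * f1
    return precision_w, recall_w, f1_w


# Invented two-class example.
prw, rcw, f1w = weighted_scores([[8, 2], [1, 9]])
```

Because weighted F1 averages the per-class F1 scores rather than combining the two weighted averages, it need not equal the harmonic mean of prw and rcw. That is how two models can share the same weighted F1 while differing in weighted precision and recall.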
  • 20. Register, Score, and Preserve the Model Before Deploying it to Production
  • 21. Scoring with a Spark UDF from MLflow
      • This allows us to easily get the scores into a Spark dataframe from any MLflow model
      • Can repeat for other types of targets or our training DF
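The scoring pattern on slide 21 typically relies on `mlflow.pyfunc.spark_udf`, which wraps a logged or registered model as a Spark UDF. The slide's own code is not in the transcript, so this is a hedged sketch: the helper names, the model name `ckd_stage`, and the column names are hypothetical, and Spark and MLflow are imported lazily so the URI helper runs on its own.

```python
from typing import Sequence, Union


def registry_uri(model_name: str, version_or_stage: Union[int, str]) -> str:
    """Build a Model Registry URI such as 'models:/ckd_stage/3'."""
    return f"models:/{model_name}/{version_or_stage}"


def score_dataframe(spark, df, model_uri: str, feature_cols: Sequence[str],
                    output_col: str = "prediction"):
    """Append model scores to a Spark DataFrame using an MLflow pyfunc UDF."""
    # Lazy imports: these require a Spark session and MLflow to be available.
    import mlflow.pyfunc
    from pyspark.sql.functions import struct

    predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)
    # The UDF receives the selected feature columns as a single struct.
    return df.withColumn(output_col, predict(struct(*feature_cols)))


# Hypothetical registered model name and version.
uri = registry_uri("ckd_stage", 3)
```

In a notebook this would be called as `score_dataframe(spark, features_df, uri, ["feature_a", "feature_b"])`. The UDF's default result type is a double, so a multiclass model may need an explicit `result_type` argument.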
  • 22. Registering the Model. Model metadata (screenshot from the Models tab in the DB workspace).
      • First registered in the Data Scientist’s dev DB workspace
      • The Data Scientist promotes it to “production” status in the dev workspace after review
      • The associated MLflow run is also used to register it in our “production” workspace for automated jobs
      • This newly registered model is the official version used for automated scoring
      • The path within the ADLS storage account contains the version so we can support multiple versions at the same time
  • 23. Production Deployment Pipeline – Notebook-based Workflow
      • Key Requirements: use Azure DevOps to deploy code to various environments for testing and execution; tie execution to specific package versions and LTS non-ML Databricks Runtimes; use ADF parameters to provide flexibility and minimize YAML code duplication
      • Reusable framework of 3 notebooks: Feature Engineering, Scoring, Validation
      • Upstream dependency check to prevent the flow of bad data and errors from missing data
      • Logging via SQL Server to record both success and failure
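The control flow described on slide 23 (check upstream dependencies, run the three notebooks in order, record success or failure) can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: the real pipeline uses Azure Data Factory, Databricks notebooks, and SQL Server, whereas this example uses plain Python callables and an in-memory SQLite table as a stand-in log store.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def open_log(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) run log table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_log "
        "(step TEXT, status TEXT, detail TEXT, logged_at TEXT)"
    )


def log_event(conn: sqlite3.Connection, step: str, status: str, detail: str = "") -> None:
    """Record one success or failure row."""
    conn.execute(
        "INSERT INTO run_log VALUES (?, ?, ?, ?)",
        (step, status, detail, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def run_pipeline(
    conn: sqlite3.Connection,
    upstream_ready: Callable[[], bool],
    steps: List[Tuple[str, Callable[[], None]]],
) -> bool:
    """Run steps in order, stopping at the first failure. Returns True on success."""
    if not upstream_ready():
        # Refuse to start so bad or missing data never flows downstream.
        log_event(conn, "dependency_check", "failure", "upstream data not ready")
        return False
    log_event(conn, "dependency_check", "success")
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            log_event(conn, name, "failure", repr(exc))
            return False
        log_event(conn, name, "success")
    return True


conn = sqlite3.connect(":memory:")
open_log(conn)
steps = [
    ("feature_engineering", lambda: None),  # placeholders for the three notebooks
    ("scoring", lambda: None),
    ("validation", lambda: None),
]
completed = run_pipeline(conn, lambda: True, steps)
blocked = run_pipeline(conn, lambda: False, steps)
```

Logging failures as well as successes means a missing row is itself a signal: a run that never started is distinguishable from one that started and failed.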
  • 24. Partnership Between Data Scientists and AI Engineers is Pivotal
      • Each of the files required for deployment is part of the starter repo and helps the data scientist keep the end goal in view from the beginning
      • Each model is initially reviewed and subsequently monitored for AI bias in key areas
      • All models are peer reviewed for both domain and technical accuracy prior to production deployment
  • 25. Early Wins for the Platform
  • 26. Key Early Wins – big steps forward
      • Scaling and automating clunky processes: scaled from fewer than 40 condition flags on-premises to over 3x this in the cloud; got contributions from multiple teams following templates; now updates over 1 bn rows daily in 1.5 hours for the entire member population
      • Faster prep, more iterations, better tuning and collaboration: reduced the feature engineering step on a very large source from hours to a few minutes; enabled the DS team to iterate on models faster, going from 5+ hours for training to a half hour or less, even for complex GBT models; reduced the scoring step on prospective members from a week to 30 minutes
      • Shared resources accelerate everyone: hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches; flexibility to score at scale in an automated fashion, regardless of algorithm package, with a common output format
  • 27. A more human way to healthcare™
  • 28. Feedback: Your feedback is important to us. Don’t forget to rate and review the sessions.