Humana strives to help the communities we serve and our individual members achieve their best health – no small task in the past year! We had the opportunity to rethink our existing operations and reimagine what a collaborative ML platform for hundreds of data scientists might look like. The primary goal of our ML Platform, named FlorenceAI, is to automate and accelerate the delivery lifecycle of data science solutions at scale. In this presentation, we will walk through an end-to-end example of how to build a model at scale on FlorenceAI and deploy it to production. Tools highlighted include Azure Databricks, MLFlow, AppInsights, and Azure Data Factory.
We will employ slides, notebooks and code snippets covering problem framing and design, initial feature selection, model design and experimentation, and a framework of centralized production code to streamline implementation. Hundreds of data scientists now use our feature store, which has tens of thousands of features refreshed at daily and monthly cadences across several years of historical data. We already have dozens of models in production and provide fresh daily insights for our Enterprise Clinical Operating Model. Each day, billions of rows of data are generated to give us timely information.
We already have examples of teams operating orders of magnitude faster and at a scale that was out of reach with fixed on-premises resources. Given rapid adoption from a dozen pilot users to over 100 MAU in the first 5 months, we will also share some anecdotes about key early wins created by the platform. We want FlorenceAI to enable Humana’s data scientists to focus their efforts where they add the most value so we can continue to deliver high-quality solutions that remain fresh, relevant and fair in an ever-changing world.
1. FlorenceAI
Reinventing Data Science at Humana
David Mack, PhD
Cognitive/Machine Learning Principal
AI Engineering, Digital Health and Analytics
A more human way to healthcare™
2. David Mack, PhD – Cognitive/Machine Learning Principal
I have worked at Humana for 5½ years in clinical and enterprise
data science. For the past 2 years I have been one of the primary
architects and maintainers of Humana’s ML Platform, which now
serves hundreds of data scientists. I love to tinker with
homemade IoT devices, build cool stuff, and learn new things!
Humana’s bold goal is to address the needs of the whole person
Have focused on community partnerships and social determinants of health
Commitment to help our millions of members achieve their best health
Fortune 50 company with $77.2bn consolidated revenue in 2020
Humana has invested significant resources into fighting:
• COVID-19 Pandemic
• Food Insecurity
• Loneliness and Social Isolation
• Inequities in Healthcare
Formed Digital Health and Analytics Organization in 2018
Through advanced analytics, experiential design, data and technology, we are working to meet our associates, members and the communities we serve, anytime, anywhere, anyhow
3. What exactly is FlorenceAI*?
A cloud platform for automating and accelerating the delivery
lifecycle of data science solutions at scale in Azure
Key Foundational Pillars
• Feature stores
• Starter code frameworks
• Notebook based workflow
• Prod deployment partnership
• Extensive training curriculum
End-to-end ecosystem benefits
• Empowers data scientists to solve complex problems
• Promotes access to open-source innovation
• Simplifies model consumption with single interface
• Transforms workflows to improve performance
[Diagram: Microsoft Azure Cloud, showing foundational components and other key tools]
* Patent Pending
4. Feature Stores – Quality Ingredients for ML Algorithms
Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years

Extensive Metadata
• Standard descriptions
• Centralized ref tables
• Ratings to identify any quality impacts
• Enables discovery and exploration

Economies of Scale
• Pre-computed for entire population
• Refreshed regularly at different cadences
• Production ready and pre-validated

Flexible but Specific
• Designed to cover most use cases
• Domain expertise in feature design
• Self-service for custom situations
5. End-to-End Process
Cohort Design → Initial Feature Selection → Model Training Experiments → Score and Register Best Model → Record Training Artifacts → Scoring Code and Testing → Promote Model and Automate Scoring
6. Example Problem to Help Trace the Workflow
Predict the most severe stage of Chronic Kidney Disease in the next 6 months

Criteria to Define the Cohort
• 12 months of history
• Over 11 months of enrollment
• Continuous enrollment
• 6 months looking forward
• Fixed calendar date
• Age ≥ 65, Medicare Advantage
• Evidence of CKD stage in Medical Claims or Lab Results
All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security.
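As a hedged illustration of how these criteria might translate into code (member_df and every column name below are hypothetical; the real fields are redacted), a PySpark filter could look like:

# Illustrative cohort filter; member_df and all column names are hypothetical.
from pyspark.sql import functions as F

cohort_df = (
    member_df
    .filter(F.col("age") >= 65)                     # Age >= 65
    .filter(F.col("plan_type") == "MA")             # Medicare Advantage
    .filter(F.col("enrolled_months_12m") > 11)      # over 11 months of enrollment
    .filter(F.col("continuously_enrolled") == 1)    # continuous enrollment
    .filter(F.col("has_ckd_evidence") == 1)         # CKD stage in claims or labs
)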
9. First Round of Model Experimentation using SparkML
A helper function to execute the run is available in a shared “experiment utility”.
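The utility’s internals are not shown in the deck, but a minimal sketch of such a helper, assuming a SparkML estimator and MLFlow tracking, might look like:

# Minimal sketch of a shared experiment helper; the name and signature are hypothetical.
import mlflow
import mlflow.spark
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

def run_experiment(estimator, train_df, test_df, run_name):
    """Fit a SparkML estimator, then log params, metrics, and the model to MLFlow."""
    with mlflow.start_run(run_name=run_name):
        model = estimator.fit(train_df)
        preds = model.transform(test_df)
        f1 = MulticlassClassificationEvaluator(metricName="f1").evaluate(preds)
        mlflow.log_params({p.name: v for p, v in estimator.extractParamMap().items()})
        mlflow.log_metric("weighted_f1", f1)
        mlflow.spark.log_model(model, "model")
    return model, f1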
10. Arrive at a “Best Model” using SparkML
A different helper function saves the best model and provides more details.

Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or heatmaps to help catch potential mistakes early.
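The deck shows the resulting figures rather than the helper itself; as a sketch of how a confusion-matrix heatmap could be produced and attached to the run (the function is hypothetical):

# Hypothetical diagnostic helper: log a confusion-matrix heatmap to the MLFlow run.
import matplotlib.pyplot as plt
import mlflow
from sklearn.metrics import confusion_matrix

def log_confusion_heatmap(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    fig, ax = plt.subplots()
    im = ax.imshow(cm, cmap="Blues")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted stage")
    ax.set_ylabel("Actual stage")
    fig.colorbar(im)
    mlflow.log_figure(fig, "confusion_matrix.png")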
12. Encouraging Reproducibility with Reusable Code
What items are automatically saved to the MLFlow run?
• Hyperparameters
• Relevant Metrics
• MLFlow model object
• Evaluation Metric Figure (Downloadable)
What other artifacts are saved to ADLS?
• Original Input Schemas before any indexing or feature prep
• Original Training and Test Datasets with just selected features
• String Indexes and Imputation Dictionaries (outside of pipeline models)
• Best Model Scores from both training and test data
The MLFlow run items above are workspace scoped, while the ADLS artifacts are scoped to the storage account.
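Pulling those pieces together, a hedged sketch of the logging pattern (the path, run name, and variables such as best_model and roc_fig are illustrative):

import mlflow
import mlflow.spark

with mlflow.start_run(run_name="ckd_best_model"):
    # Workspace scoped: lives with the MLFlow run
    mlflow.log_params(best_hyperparams)
    mlflow.log_metrics({"weighted_f1": f1, "weighted_precision": prw})
    mlflow.spark.log_model(best_model, "model")
    mlflow.log_figure(roc_fig, "roc_curve.png")

# Storage account scoped: written directly to ADLS for reproducibility
base = "abfss://<container>@<account>.dfs.core.windows.net/ckd_model/v1"
train_df.write.parquet(f"{base}/train")
test_df.write.parquet(f"{base}/test")
scores_df.write.parquet(f"{base}/scores")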
14. Key Distinctions of Deep Neural Networks
Multiclass example: the network learns over repeated passes through the data called “epochs”.
What extra things can we do to help us decide which model is the best?
• Use early stopping to minimize training time and combat overfitting
• Use callbacks to log values at the end of each epoch
• Test on smaller chunks of data and scale up as we learn more
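A minimal Keras sketch of those tactics, assuming a compiled model and tf.data datasets already exist:

# Early stopping plus a per-epoch MLFlow logging callback (names are illustrative).
import tensorflow as tf
import mlflow

class EpochLogger(tf.keras.callbacks.Callback):
    """Log loss/metrics to MLFlow at the end of every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        mlflow.log_metrics(logs or {}, step=epoch)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(train_ds, validation_data=val_ds, epochs=50,
          callbacks=[early_stop, EpochLogger()])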
15. Bayesian Hyperparameter Searching with Hyperopt
Hyperopt attempts to minimize our loss function. We can set our hyperparameter space and the number of trials we want to run. We used a sample of our training data to go quickly over the 20 trials we chose to run.
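A sketch of that setup (the search space and the train_and_eval helper are hypothetical):

# Bayesian hyperparameter search with Hyperopt's Tree-structured Parzen Estimator.
from hyperopt import fmin, tpe, hp, Trials

space = {
    "layer1_units": hp.choice("layer1_units", [64, 128, 256]),
    "layer2_units": hp.choice("layer2_units", [16, 32, 64]),
    "learning_rate": hp.loguniform("learning_rate", -7, -3),
}

def objective(params):
    # train_and_eval is a hypothetical helper that fits the network on a
    # sample of the training data and returns the validation loss to minimize.
    return train_and_eval(params)

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=20, trials=Trials())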
16. MLFlow has a Handy Comparison Tool to Help us Focus
Quick insights: Complex Layer 1 and Complex Layer 2 don’t do well; Complex Layer 1 with a simpler Layer 2 does much better. We can highlight ranges to focus our attention.
17. Let’s use MORE Data with Distributed Training!
Driver Only: 1 MM members, 1 worker, 6 sec per epoch – lots of trials to narrow down our choices
Petastorm: 10 MM members, 1 worker, 63 sec per epoch – using all the data, but takes forever
Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch – train on all the data much more quickly

We generally see a sqrt(n) speedup over a single worker. Using Petastorm and Horovod, we used all the data and trained 4.5x faster.
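A condensed sketch of the Petastorm-plus-HorovodRunner pattern on Databricks (the cache path, field names, and build_model helper are hypothetical):

# Distributed Keras training over a Spark DataFrame with Petastorm and Horovod.
from petastorm.spark import SparkDatasetConverter, make_spark_converter
from sparkdl import HorovodRunner

spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
               "file:///dbfs/tmp/petastorm_cache")
converter = make_spark_converter(train_df)  # caches the DataFrame as Parquet

def train_fn():
    import horovod.tensorflow.keras as hvd
    import tensorflow as tf
    hvd.init()
    model = build_model()  # hypothetical helper returning a Keras model
    model.compile(optimizer=hvd.DistributedOptimizer(
                      tf.keras.optimizers.Adam(1e-3 * hvd.size())),
                  loss="sparse_categorical_crossentropy")
    # Each worker reads its own shard of the cached Parquet data
    with converter.make_tf_dataset(batch_size=512, num_epochs=1,
                                   cur_shard=hvd.rank(),
                                   shard_count=hvd.size()) as ds:
        ds = ds.map(lambda batch: (batch.features, batch.label))  # hypothetical fields
        model.fit(ds, epochs=10,
                  callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)])

HorovodRunner(np=16).run(train_fn)  # 16 workers, as in the rightmost column above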
19. We Improved the Precision of our Model!
We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes.

SparkML Logistic Regression: weighted F1 = 0.615 (weighted precision = 0.633, weighted recall = 0.609)
TensorFlow NN on all the data: weighted F1 = 0.615 (weighted precision = 0.646, weighted recall = 0.602)
21. Scoring with a Spark UDF from MLFlow
• This allows us to easily get the scores into a Spark dataframe from any MLFlow model
• Can repeat for other types of targets or our training DF
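A minimal sketch of that pattern (the runs:/ URI placeholder and features_df are illustrative):

# Wrap any MLFlow model as a Spark UDF and score a DataFrame in one pass.
import mlflow.pyfunc
from pyspark.sql import functions as F

# A models:/ registry URI works here just as well as a runs:/ URI.
score_udf = mlflow.pyfunc.spark_udf(spark, "runs:/<run_id>/model")

scored_df = features_df.withColumn(
    "prediction", score_udf(F.struct(*features_df.columns)))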
22. Registering the Model
Model metadata (screenshot from the Models tab in the Databricks workspace)

The model is first registered in the Data Scientist’s dev DB workspace. The Data Scientist promotes it to “production” status in the dev workspace after review. The associated MLFlow run is then used to register it in our “production” workspace for automated jobs, and this newly registered model is the official version used for automated scoring.

The path within the ADLS storage account contains the version so we can support multiple versions at the same time.
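A hedged sketch of that promotion flow using the MLFlow registry APIs (the model name and run URI are illustrative):

# Register the reviewed model and promote the new version to Production.
import mlflow
from mlflow.tracking import MlflowClient

result = mlflow.register_model("runs:/<run_id>/model", "ckd_stage_model")

MlflowClient().transition_model_version_stage(
    name="ckd_stage_model", version=result.version, stage="Production")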
23. Production Deployment Pipeline – Notebook-based Workflow
Key Requirements
• Use Azure DevOps to deploy code to various environments for testing and execution
• Tie execution to specific package versions and LTS non-ML Databricks Runtimes
• Use ADF Parameters to provide flexibility to minimize YAML code duplication
Reusable Framework of 3 notebooks: Feature Engineering, Scoring, Validation
An upstream dependency check prevents the flow of bad data and errors from missing data. Logging via SQL Server records both success and failure.
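A hypothetical sketch of such a dependency check inside the framework’s notebooks (the table name, date variable, and threshold are illustrative):

# Fail fast if the feature store partition for the scoring date is incomplete.
from pyspark.sql import functions as F

row_count = (
    spark.table("feature_store.member_features")  # illustrative table name
    .filter(F.col("as_of_date") == score_date)
    .count()
)

if row_count < MIN_EXPECTED_ROWS:  # threshold chosen per model
    # Raising stops the ADF pipeline before bad data flows downstream;
    # the failure is also recorded in the SQL Server run log.
    raise ValueError(f"Upstream data incomplete: {row_count} rows for {score_date}")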
24. Partnership Between Data Scientists and AI Engineers is Pivotal
Each of the required files needed for deployment is part of the starter repo, helping the data scientist keep the end goal in view from the beginning.

Each model is initially reviewed and subsequently monitored for AI bias in key areas. All models are peer reviewed for both domain and technical accuracy prior to production deployment.
26. Key Early Wins – big steps forward
Scaling and automating clunky processes
• Scaled from fewer than 40 condition flags on-premises to over 3x that in the cloud
• Got contributions from multiple teams following templates
• Now updates over 1 bn rows daily in 1.5 hours for the entire member population
Faster prep, more iterations, better tuning and collaboration
• Reduced the feature engineering step on a very large source from hours to a few minutes
• Enabled a DS team to iterate on models faster, going from 5+ hours for training to half an hour or less, even for complex GBT models
• Reduced the scoring step on prospective members from a week to 30 minutes
Shared resources accelerate everyone
• Hundreds of feature stores mean less process/data duplication and more time to
improve model design with a variety of approaches
• Flexibility to score at scale regardless of algorithm package in automated fashion
with a common output format