Catch Me If You Can: Keeping up with ML Models in Production
Shreya Shankar
May 2021
Outline
1 Introduction
2 Case study: deploying a simple model
3 Post-deployment challenges
4 Monitoring and tracing
5 Conclusion
Introductions
BS and MS from Stanford Computer Science
First machine learning engineer at Viaduct, an applied ML startup
  Working with terabytes of time series data
  Building infrastructure for large-scale machine learning and data analytics
  Responsibilities spanning recruiting, SWE, ML, product, and more
Academic and industry ML are different
Most academic ML efforts are focused on training techniques
  Fairness
  Generalizability to test sets with unknown distributions
  Security
Industry places emphasis on an Agile working style
In industry, we want to train few models but do lots of inference
What happens beyond the validation or test set?
The depressing truth about ML IRL
87% of data science projects don't make it to production [1]
Data in the "real world" is not necessarily clean and balanced, like canonical benchmark datasets (ex: ImageNet)
Data in the "real world" is always changing
Showing high performance on a fixed train and validation set ≠ consistent high performance when that model is deployed
[1] https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
Today's talk
Not about how to improve model performance on a validation or test set
Not about feature engineering, machine learning algorithms, or data science
Discussing:
  Case study of challenges faced post-deployment of models
  Showcase of tools to monitor production pipelines and enable Agile teams of different stakeholders to quickly identify performance degradation
  Motivating when to retrain machine learning models in production
General problem statement
Common to experience performance drift after a model is deployed
Unavoidable in many cases
  Time series data
  High-variance data
Intuitively, a model and pipeline should not be stale
  Want the pipeline to reflect the most recent data points
Will simulate performance drift with a toy task: predicting whether a taxi rider will give their driver a high tip or not
Case study description
Using publicly available NYC Taxicab Trip Data (data in public S3 bucket s3://nyc-tlc/trip data/)
For this exercise, we will train and "deploy" a model to predict whether the user gave a large tip
Using Python, Prometheus, Grafana, mlflow, mltrace
Goal of this exercise is to demonstrate what can go wrong post-deployment and how to troubleshoot bugs
Diagnosing failure points in the pipeline is challenging for Agile teams when pipelines are very complex!
Dataset description
Simple toy example using the publicly available NYC Taxicab Trip Data [2]
Tabular dataset where each record corresponds to one taxi ride
  Pickup and dropoff times
  Number of passengers
  Trip distance
  Pickup and dropoff location zones
  Fare, toll, and tip amounts
Monthly files from Jan 2009 to June 2020
Each month is about 1 GB of data → over 100 GB total
[2] https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
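As a hedged illustration, the sketch below reads one month of this data with pandas. The yellow_tripdata_YYYY-MM.csv naming and the s3fs dependency are assumptions about the public bucket's layout; the toy pipeline's own io library wraps this kind of call.

import pandas as pd

# Read one month of yellow-cab trips straight from the public bucket.
# Requires s3fs to be installed for pandas to resolve the s3:// URL.
df = pd.read_csv("s3://nyc-tlc/trip data/yellow_tripdata_2020-01.csv")

print(df.shape)             # millions of rides for a single month
print(df.columns.tolist())  # pickup/dropoff times, fares, tips, ...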
Learning task
Let X be the train covariate matrix (features) and y be the train target vector (labels)
  Xi represents an n-dimensional feature vector corresponding to the ith ride, and yi ∈ {0, 1}, where 1 represents the rider tipping more than 20% of the total fare
Want to learn a classifier f such that f(Xi) ≈ yi for all Xi ∈ X
We do not want to overfit, meaning F1(Xtrain) ≈ F1(Xtest)
We will use a RandomForestClassifier with basic hyperparameters
  n_estimators=100, max_depth=10
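A minimal sketch of this setup, assuming scikit-learn; the real pipeline builds X and y from the taxi features, but synthetic data is enough to show the fit and the overfitting check:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Stand-in for the taxi features: 11 columns, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 11))
y = (X[:, 0] + rng.normal(scale=0.5, size=10_000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

f = RandomForestClassifier(n_estimators=100, max_depth=10)
f.fit(X_train, y_train)

# Overfitting check: F1(Xtrain) should be close to F1(Xtest).
print("train F1:", f1_score(y_train, f.predict(X_train)))
print("test F1:", f1_score(y_test, f.predict(X_test)))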
Toy ML Pipeline
Code lives in toy-ml-pipeline [3], an example of an end-to-end pipeline for our high tip prediction task.
[3] https://github.com/shreyashankar/toy-ml-pipeline
Toy ML pipeline utilities
S3 bucket for intermediate storage
io library
  dumps versioned output of each stage every time a stage is run
  loads the latest version of outputs so we can use them as inputs to another stage
Serve predictions via a Flask application:

global mw
mw = models.RandomForestModelWrapper.load('training/models')

@app.route('/predict', methods=['POST'])
def predict():
    req = request.get_json()
    df = pd.read_json(req['data'])
    df['prediction'] = mw.predict(df)
    result = {'prediction': df['prediction'].to_list()}
    return jsonify(result)
ETL
Cleaning stage reads in raw data for each month and removes:
  Rows with timestamps that don't fall within the month range
  Rows with $0 fares
Featuregen stage computes the following features (a sketch of both stages follows below):
  Trip-based: passenger count, trip distance, trip time, trip speed
  Pickup-based: pickup weekday, pickup hour, pickup minute, work hour indicator
  Basic categorical: pickup location code, dropoff location code, type of ride
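A hedged sketch of both stages, assuming the public TLC yellow-cab column names (tpep_pickup_datetime, fare_amount, tip_amount, ...); the exact filters and feature definitions in the repo may differ in detail:

import pandas as pd

def clean(df: pd.DataFrame, first_day: str, last_day: str) -> pd.DataFrame:
    # Remove rows outside the month range and rows with $0 fares.
    pickup = pd.to_datetime(df["tpep_pickup_datetime"])
    in_range = pickup.between(first_day, last_day)
    return df[in_range & (df["fare_amount"] > 0)]

def featurize(df: pd.DataFrame) -> pd.DataFrame:
    pickup = pd.to_datetime(df["tpep_pickup_datetime"])
    dropoff = pd.to_datetime(df["tpep_dropoff_datetime"])
    out = df[["passenger_count", "trip_distance",
              "PULocationID", "DOLocationID", "RatecodeID"]].copy()
    # Trip-based features.
    out["trip_time"] = (dropoff - pickup).dt.total_seconds()
    out["trip_speed"] = df["trip_distance"] / (out["trip_time"] / 3600)
    # Pickup-based features.
    out["pickup_weekday"] = pickup.dt.weekday
    out["pickup_hour"] = pickup.dt.hour
    out["pickup_minute"] = pickup.dt.minute
    out["work_hours"] = ((pickup.dt.weekday < 5)
                         & pickup.dt.hour.between(9, 17)).astype(int)
    # Label: tip above 20% of the fare (the exact base amount is an assumption).
    out["high_tip_indicator"] = (df["tip_amount"]
                                 > 0.2 * df["fare_amount"]).astype(int)
    return out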
Offline training and evaluation
Train on January 2020, validate on February 2020
We will use the F1 score as our metric:
  precision = (number of true positives) / (number of records predicted to be positive)
  recall = (number of true positives) / (number of positives in the dataset)
  F1 = (2 * precision * recall) / (precision + recall)
Higher F1 score is better; we want few false positives and false negatives
Steps
  1 Split the output of the featuregen stage into train and validation sets
  2 Train and validate the model in a notebook
  3 Train the model for production in another notebook
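As a quick numeric check of these definitions, the snippet below computes precision, recall, and F1 with scikit-learn on toy labels and verifies the F1 identity by hand:

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)  # 3 true positives / 4 predicted positives
r = recall_score(y_true, y_pred)     # 3 true positives / 4 actual positives
print(f1_score(y_true, y_pred))      # 0.75
print(2 * p * r / (p + r))           # same value, from the formula above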
Train and test split code snippet

train_month = '2020_01'
test_month = '2020_02'

# Load latest features for train and test sets
train_df = io.load_output_df(os.path.join('features', train_month))
test_df = io.load_output_df(os.path.join('features', test_month))

# Save train and test sets
print(io.save_output_df(train_df, 'training/files/train'))
print(io.save_output_df(test_df, 'training/files/test'))

Output
s3://toy-applied-ml-pipeline/dev/training/files/train/20210501-165847.pq
s3://toy-applied-ml-pipeline/dev/training/files/test/20210501-170136.pq
Training code snippet

feature_columns = [
    'pickup_weekday', 'pickup_hour', 'pickup_minute', 'work_hours',
    'passenger_count', 'trip_distance', 'trip_time', 'trip_speed',
    'PULocationID', 'DOLocationID', 'RatecodeID',
]
label_column = 'high_tip_indicator'
model_params = {'max_depth': 10}

# Create and train model
mw = models.RandomForestModelWrapper(
    feature_columns=feature_columns, model_params=model_params)
mw.train(train_df, label_column)
Training and test set evaluation

train_score = mw.score(train_df, label_column)
test_score = mw.score(test_df, label_column)

mw.add_metric('train_f1', train_score)
mw.add_metric('test_f1', test_score)

print('Metrics:')
print(mw.get_metrics())

Output
Metrics:
{'train_f1': 0.7310504153094398, 'test_f1': 0.735223195868965}
Train the production model
Simulate deployment in March 2020 (COVID-19 incoming!)
In practice, train models on different windows and do more thorough data science
Here, we will just train a model on February 2020 with the same parameters that we explored before:

train_df = io.load_output_df(os.path.join('features', '2020_02'))

prod_mw = models.RandomForestModelWrapper(
    feature_columns=feature_columns, model_params=model_params)
prod_mw.train(train_df, label_column)

train_score = prod_mw.score(train_df, label_column)
prod_mw.add_metric('train_f1', train_score)

prod_mw.save('training/models', dev=False)
Deploy model
Flask server that runs predict on features passed in via request
Separate file inference.py that repeatedly sends requests containing features (a sketch of one such request follows below)
Logging metrics in the Flask app via Prometheus
Visualizing metrics via Grafana
Demo
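A minimal sketch of the kind of request inference.py might send to the /predict route shown earlier; the host, port, and feature values are illustrative assumptions:

import pandas as pd
import requests

feature_df = pd.DataFrame([{
    'pickup_weekday': 4, 'pickup_hour': 18, 'pickup_minute': 30,
    'work_hours': 0, 'passenger_count': 1, 'trip_distance': 2.5,
    'trip_time': 900, 'trip_speed': 10.0, 'PULocationID': 142,
    'DOLocationID': 236, 'RatecodeID': 1,
}])

# The Flask handler reads req['data'] as a JSON-serialized DataFrame.
resp = requests.post('http://localhost:5000/predict',
                     json={'data': feature_df.to_json()})
print(resp.json()['prediction'])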
Inspect F1 scores at the end of every day

Output
          day  rolling_f1  daily_f1
178123      1    0.576629  0.576629
370840      2    0.633320  0.677398
592741      3    0.649983  0.675877
...
1514827    28    0.659934  0.501860
1520358    29    0.659764  0.537860
1529847    30    0.659530  0.576178
Deeper dive into March F1 scores

          day  rolling_f1  daily_f1
178123      1    0.576629  0.576629
370840      2    0.633320  0.677398
592741      3    0.649983  0.675877
...
1507234    27    0.660228  0.576993
1514827    28    0.659934  0.501860
1520358    29    0.659764  0.537860
1529847    30    0.659530  0.576178

Large discrepancy between the rolling metric and the daily metric
What you monitor is important: the daily metric drops significantly towards the end of the month
Visible effect of COVID-19 on the data
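A hedged sketch of how such a table can be produced, assuming a DataFrame of one month's logged predictions with day, label, and pred columns (toy data here stands in for the real logs):

import numpy as np
import pandas as pd
from sklearn.metrics import f1_score

# Toy stand-in for one month of logged predictions.
rng = np.random.default_rng(0)
df = pd.DataFrame({'day': rng.integers(1, 31, size=5000),
                   'label': rng.integers(0, 2, size=5000)})
df['pred'] = np.where(rng.random(5000) < 0.7, df['label'], 1 - df['label'])

# Daily F1: computed over each day's rides only.
daily = df.groupby('day').apply(lambda g: f1_score(g['label'], g['pred']))

# Rolling F1: computed over all rides up to and including each day.
rolling = pd.Series({d: f1_score(df.loc[df['day'] <= d, 'label'],
                                 df.loc[df['day'] <= d, 'pred'])
                     for d in sorted(df['day'].unique())})

print(pd.DataFrame({'rolling_f1': rolling, 'daily_f1': daily}))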
Evaluating the same model on following months

Output
2020-04  F1: 0.5714705472990737
2020-05  F1: 0.5530868473460906
2020-06  F1: 0.5967621469282887

Performance gets significantly worse! We will need to retrain models periodically.
Challenge: lag
In this example we are simulating a "live" deployment on historical data, so we have no lag
In practice there are many types of lag
  Feature lag: our system only learns about the ride well after it has occurred
  Label lag: our system only learns of the label (fare, tip) well after the ride has occurred
  The evaluation metric will inherently be lagging
  We might not be able to train on the most recent data
Challenge: distribution shift
Data "drifts" over time, and models will need to be retrained to reflect such drift
Open question: how often do you retrain a model?
  Retraining adds complexity to the overall system (more artifacts to version and keep track of)
  Retraining can be expensive in terms of compute
  Retraining can take time and energy from people
Open question: how do you know when the data has "drifted?"
Demo
How do you know when the data has "drifted?"
[Figure from Rabanser et al. [4]]
[4] Rabanser, Günnemann, and Lipton, Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift.
Challenge: quantifying distribution shift
We only have 11 features, so we'll skip the dimensionality reduction step
"Multiple univariate testing seem to offer comparable performance to multivariate testing" (Rabanser et al.)
  Maybe this is specific to their MNIST and CIFAR experiments
  Regardless, we will employ multiple univariate testing
For each feature, we will run a 2-sided Kolmogorov-Smirnov test
  Continuous data, non-parametric test
  Compute the largest difference
    Z = sup_z |Fp(z) − Fq(z)|
  where Fp and Fq are the empirical CDFs of the data we are comparing
Kolmogorov-Smirnov test made simple
For each feature, we will run a 2-sided Kolmogorov-Smirnov test
  Continuous data, non-parametric test
  Compute the largest difference
    Z = sup_z |Fp(z) − Fq(z)|
  where Fp and Fq are the empirical CDFs of the data we are comparing
Example comparing two random normally distributed PDFs (see the sketch below)
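A small sketch of this test with scipy, comparing two normal samples as in the example above; scipy.stats.ks_2samp returns the statistic Z and its p-value:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=10_000)
b = rng.normal(loc=0.1, scale=1.0, size=10_000)  # slightly shifted mean

statistic, p_value = ks_2samp(a, b)
print(statistic, p_value)  # largest CDF gap, and how surprising it is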
How do you know when the data has "drifted?"
Comparing January 2020 and February 2020 datasets using the KS test:

Output
    feature          statistic  p_value
0   pickup_weekday    0.046196  0.000000e+00
2   work_hours        0.028587  0.000000e+00
6   trip_time         0.017205  0.000000e+00
7   trip_speed        0.035415  0.000000e+00
1   pickup_hour       0.009676  8.610133e-258
5   trip_distance     0.005312  5.266602e-78
8   PULocationID      0.004083  2.994877e-46
9   DOLocationID      0.003132  2.157559e-27
4   passenger_count   0.002947  2.634493e-24
10  RatecodeID        0.002616  3.047481e-19
3   pickup_minute     0.000702  8.861498e-02
How do you know when the data has "drifted?"
For our "offline" train and evaluation sets (January and February), we get extremely low p-values!
This method can have a high "false positive rate"
  In my experience, this method has flagged distributions as "significantly different" more than I want it to
In the era of "big data" (we have millions of data points), p-values are not useful to look at [5]
[5] Lin, Lucas, and Shmueli, "Too Big to Fail: Large Samples and the p-Value Problem".
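A quick illustration of this point: hold a practically negligible shift fixed and grow the sample, and the KS p-value collapses toward zero even though nothing meaningful changed.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
for n in [1_000, 100_000, 1_000_000]:
    a = rng.normal(0.00, 1.0, size=n)
    b = rng.normal(0.01, 1.0, size=n)  # shift of 1% of a standard deviation
    print(n, ks_2samp(a, b).pvalue)    # shrinks as n grows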
How do you know when the data has "drifted?"
You don’t really know.
An unsatisfying solution is to retrain the model to be as fresh as possible.
Types of production ML bugs
Data changed or corrupted
Data dependency / infra issues
Logical error in the ETL code
Logical error in retraining a model
Logical error in promoting a retrained model to the inference step
Change in consumer behavior
Many more
Why production ML bugs are hard to find and address
ML code fails silently (few runtime or compiler errors)
Little or no unit or integration testing in feature engineering and ML code
Very few (if any) people have end-to-end visibility on an ML pipeline
Underdeveloped monitoring and tracing tools
So many artifacts to keep track of, especially if you retrain your models
What to monitor?
Agile philosophy: "Continuous Delivery." Monitor proactively!
Especially hard when there are no labels
First-pass monitoring
  Model output distributions
    Averages, percentiles
More advanced monitoring
  Model input (feature) distributions
  ETL intermediate output distributions
  Can get tedious if you have many features or steps in ETL
  Still unclear how to quantify, detect, and act on these changes
Prometheus monitoring for ML

Selling points:
  Easy to call hist.observe(val)
  Easy to integrate with Grafana
  Many applications already use Prometheus

Pain points:
  User (you) needs to define the histogram buckets
  Metric naming, tracking, and visualization clutter
  PromQL is not very intuitive for ML-based tasks
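A minimal prometheus_client sketch of hist.observe(val) for model outputs; the metric name and the bucket edges are assumptions you must choose yourself (one of the pain points above):

from prometheus_client import Histogram, start_http_server

# Predicted tip probabilities live in [0, 1], so even buckets work here.
output_hist = Histogram(
    'model_output',
    'Distribution of predicted high-tip probabilities',
    buckets=[round(0.1 * i, 1) for i in range(11)],
)

start_http_server(8000)  # exposes /metrics for Prometheus to scrape

def record_prediction(prob: float) -> None:
    output_hist.observe(prob)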
PromQL for ML monitoring

Average output [5m]:
  sum(rate(histogram_sum[5m])) / sum(rate(histogram_count[5m])), time series view
Median output [5m]:
  histogram_quantile(0.5, sum by (job, le) (rate(histogram_bucket[5m]))), time series view
Probability distribution of outputs [5m]:
  sum(rate(histogram_bucket[5m])) by (le), heatmap view [6]

[6] Jordan, A simple solution for monitoring ML systems.
Retraining regularly
Suppose we retrain our model on a weekly basis.
Use the MLflow Model Registry [7] to keep track of which model is the best model (a sketch follows below)
  Alternatively, we can also manage versioning and a registry ourselves (what we do now)
At inference time, pull the latest/best model from our model registry
Now, we need some tracing [8]...
[7] https://www.mlflow.org/docs/latest/model-registry.html
[8] Hermann and Del Balso, Scaling Machine Learning at Uber with Michelangelo.
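A hedged sketch of the registry flow, assuming a registry-enabled MLflow tracking server and a trained scikit-learn model; the registered model name is hypothetical, and the version must be transitioned to the Production stage (via the UI or API) before the load below works:

import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in for this week's retrained model.
clf = RandomForestClassifier().fit(np.random.rand(100, 11),
                                   np.random.randint(0, 2, size=100))

# Log the run and register the model under a (hypothetical) registry name.
with mlflow.start_run():
    mlflow.sklearn.log_model(clf, 'model',
                             registered_model_name='high-tip-classifier')

# At inference time, pull whatever version is currently in Production.
prod_model = mlflow.pyfunc.load_model('models:/high-tip-classifier/Production')
print(prod_model.predict(np.random.rand(3, 11)))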
mltrace
Coarse-grained lineage and tracing
Designed specifically for complex data or ML pipelines
Designed specifically for Agile multidisciplinary teams
Alpha release [9] contains a Python API to log run information and a UI to view traces for outputs
[9] https://github.com/loglabs/mltrace
mltrace design principles
Utter simplicity (user knows everything going on)
No need for users to manually set component dependencies: the tool should detect dependencies based on the I/O of a component's run
API designed for both engineers and data scientists
UI designed for people to help triage issues even if they didn't build the ETL or models themselves
  When there is a bug, whose backlog do you put it on?
  Enable people who may not have developed the model to investigate the bug
mltrace concepts
Pipelines are made of components (ex: cleaning, featuregen, split, train)
In ML pipelines, different artifacts are produced (inputs and outputs) when the same component is run more than once
Component and ComponentRun objects [10]
Decorator interface similar to Dagster "solids" [11]
  User specifies component name, input and output variables
Alternative Pythonic interface similar to MLflow tracking [12]
  User creates an instance of a ComponentRun object and calls log_component_run on the object
[10] https://mltrace.readthedocs.io/en/latest
[11] https://docs.dagster.io/concepts/solids-pipelines/solids
[12] https://www.mlflow.org/docs/latest/tracking.html
mltrace integration example

def clean_data(raw_data_filename: str, raw_df: pd.DataFrame,
               month: str, year: str, component: str) -> str:
    # calendar.monthrange returns (weekday of first day, number of days),
    # so only the day count is useful for building the date bounds.
    _, last = calendar.monthrange(int(year), int(month))
    first_day = f'{year}-{month}-01'
    last_day = f'{year}-{month}-{last:02d}'
    clean_df = helpers.remove_zero_fare_and_oob_rows(raw_df, first_day, last_day)
    # Write "clean" df to S3
    output_path = io.save_output_df(clean_df, component)
    return output_path
mltrace integration example

@register('cleaning', input_vars=['raw_data_filename'],
          output_vars=['output_path'])
def clean_data(raw_data_filename: str, raw_df: pd.DataFrame,
               month: str, year: str, component: str) -> str:
    _, last = calendar.monthrange(int(year), int(month))
    first_day = f'{year}-{month}-01'
    last_day = f'{year}-{month}-{last:02d}'
    clean_df = helpers.remove_zero_fare_and_oob_rows(raw_df, first_day, last_day)
    # Write "clean" df to S3
    output_path = io.save_output_df(clean_df, component)
    return output_path
mltrace integration example

from mltrace import create_component, tag_component, register

create_component('cleaning', 'Cleans the raw data with basic OOB criteria.', 'shreya')
tag_component('cleaning', ['etl'])

# Run the function
raw_df = io.read_file(month, year)
raw_data_filename = io.get_raw_data_filename('2020', '01')
output_path = clean_data(raw_data_filename, raw_df, '01', '2020', 'clean/2020_01')
print(output_path)
mltrace UI demo
mltrace immediate roadmap13
UI feature to easily see if a component is stale, e.g. because there are newer versions of the component’s outputs or because its latest run happened weeks or months ago
CLI to interact with mltrace logs
Scala Spark integrations (or a REST API for logging)
Prometheus integrations to easily monitor outputs (see the sketch below)
Possibly other MLOps tool integrations
13 Please email shreyashankar@berkeley.edu if you’re interested in contributing!
Shreya Shankar Keeping up with ML in Production May 2021 45 / 50
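To make the Prometheus roadmap item concrete, here is a minimal sketch of what monitoring a component’s outputs with the official prometheus_client library could look like; the metric name, buckets, and port are illustrative assumptions, not mltrace’s actual integration.

from prometheus_client import Histogram, start_http_server

# Hypothetical metric: distribution of predicted high-tip scores
prediction_scores = Histogram(
    'high_tip_prediction_score',
    'Model output scores for the high-tip classifier',
    buckets=[i / 10 for i in range(11)])

# Expose metrics on an arbitrary port for Prometheus to scrape
start_http_server(8000)

def log_prediction(score: float) -> None:
    # Record each model output; Grafana can then chart the score
    # distribution over time to surface drift
    prediction_scores.observe(score)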
Talk recap
Introduced an end-to-end ML pipeline for a toy task
Demonstrated, via Prometheus metric logging and Grafana dashboard visualizations, the performance drift and degradation that appeared once the model was deployed
Showed via statistical tests that it’s hard to know exactly when to retrain a model (see the sketch below)
Discussed challenges around continuously retraining models and maintaining production ML pipelines
Introduced mltrace, a tool developed for ML pipelines that performs coarse-grained lineage and tracing
Shreya Shankar Keeping up with ML in Production May 2021 46 / 50
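As a reminder of what such a statistical test looks like, below is a minimal sketch using a two-sample Kolmogorov–Smirnov test on a single feature, in the spirit of the shift-detection methods surveyed by Rabanser et al. (see References); the significance threshold and the choice of comparison windows are illustrative assumptions.

import numpy as np
from scipy import stats

def feature_shifted(train_values: np.ndarray,
                    recent_values: np.ndarray,
                    alpha: float = 0.01) -> bool:
    # Two-sample KS test: could both samples plausibly come from
    # the same distribution?
    statistic, p_value = stats.ks_2samp(train_values, recent_values)
    # Caveat: with very large samples, p-values are almost always tiny
    # (the "too big to fail" problem; see Lin et al. in the References),
    # so a raw p-value cutoff is a blunt retraining trigger
    return p_value < alpha

# Example: compare trip distances from training data vs. the last week
# shifted = feature_shifted(train_df['trip_distance'].values,
#                           recent_df['trip_distance'].values)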
Areas of future work in MLOps
Only scratched the surface with the challenges discussed today
Many more of these problems arise around deep learning
Embeddings as features – if someone updates the upstream embedding model, do all the downstream data scientists need to immediately change their models?
"Underspecified" pipelines can pose threats14
How can we enable anyone to build dashboards or monitor pipelines?
ML people know what to monitor; infra people know how to monitor
We should strive to build tools that allow engineers and data scientists to be more self-sufficient
14 D’Amour et al., Underspecification Presents Challenges for Credibility in Modern Machine Learning.
Shreya Shankar Keeping up with ML in Production May 2021 47 / 50
Miscellaneous
Code for this talk: https://github.com/shreyashankar/debugging-ml-talk
Toy ML Pipeline with mltrace, Prometheus, and Grafana: https://github.com/shreyashankar/toy-ml-pipeline/tree/shreyashankar/db
mltrace: https://github.com/loglabs/mltrace
Contact info
Twitter: @sh_reya
Email: shreyashankar@berkeley.edu
Thank you!
Shreya Shankar Keeping up with ML in Production May 2021 48 / 50
References
D’Amour, Alexander, et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning. 2020. arXiv: 2011.03395 [cs.LG].
Hermann, Jeremy, and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. 2018. URL: https://eng.uber.com/scaling-michelangelo/.
Jordan, Jeremy. A simple solution for monitoring ML systems. 2021. URL: https://www.jeremyjordan.me/ml-monitoring/.
Lin, Mingfeng, Henry Lucas, and Galit Shmueli. "Too Big to Fail: Large Samples and the p-Value Problem". In: Information Systems Research 24 (Dec. 2013), pp. 906–917. DOI: 10.1287/isre.2013.0480.
Rabanser, Stephan, Stephan Günnemann, and Zachary C. Lipton. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. 2019. arXiv: 1810.11953 [stat.ML].
Shreya Shankar Keeping up with ML in Production May 2021 49 / 50
Feedback
Your feedback is important to us. Don’t forget to rate and review the
sessions.
Shreya Shankar Keeping up with ML in Production May 2021 50 / 50

More Related Content

What's hot

Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOpsDatabricks
 
MBSE and Model-Based Testing with Capella
MBSE and Model-Based Testing with CapellaMBSE and Model-Based Testing with Capella
MBSE and Model-Based Testing with CapellaObeo
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleDatabricks
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model ServingDatabricks
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...Databricks
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLJordan Birdsell
 
System of systems modeling with Capella
System of systems modeling with CapellaSystem of systems modeling with Capella
System of systems modeling with CapellaObeo
 
Drifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDrifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDatabricks
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 
[Capella Days 2020] Innovating with MBSE – Medical Device Example
[Capella Days 2020] Innovating with MBSE – Medical Device Example[Capella Days 2020] Innovating with MBSE – Medical Device Example
[Capella Days 2020] Innovating with MBSE – Medical Device ExampleObeo
 
Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Obeo
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumSasha Rosenbaum
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsDatabricks
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorDatabricks
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflowDatabricks
 
Seldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 

What's hot (20)

Databricks Overview for MLOps
Databricks Overview for MLOpsDatabricks Overview for MLOps
Databricks Overview for MLOps
 
MBSE and Model-Based Testing with Capella
MBSE and Model-Based Testing with CapellaMBSE and Model-Based Testing with Capella
MBSE and Model-Based Testing with Capella
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
MATS stack (MLFlow, Airflow, Tensorflow, Spark) for Cross-system Orchestratio...
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
System of systems modeling with Capella
System of systems modeling with CapellaSystem of systems modeling with Capella
System of systems modeling with Capella
 
Drifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDrifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in Production
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
[Capella Days 2020] Innovating with MBSE – Medical Device Example
[Capella Days 2020] Innovating with MBSE – Medical Device Example[Capella Days 2020] Innovating with MBSE – Medical Device Example
[Capella Days 2020] Innovating with MBSE – Medical Device Example
 
Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...Capella Days 2021 | An example of model-centric engineering environment with ...
Capella Days 2021 | An example of model-centric engineering environment with ...
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
 
Introducing MLOps.pdf
Introducing MLOps.pdfIntroducing MLOps.pdf
Introducing MLOps.pdf
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and PluginsMonitor Apache Spark 3 on Kubernetes using Metrics and Plugins
Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflow
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Seldon: Deploying Models at Scale
Seldon: Deploying Models at ScaleSeldon: Deploying Models at Scale
Seldon: Deploying Models at Scale
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 

Similar to Catch Me If You Can: Keeping Up With ML Models in Production

Micro Instructional Design for Problem-Based and Game-Based Learning
Micro Instructional Design for Problem-Based and Game-Based LearningMicro Instructional Design for Problem-Based and Game-Based Learning
Micro Instructional Design for Problem-Based and Game-Based LearningAndy Petroski
 
Real World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineReal World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineSrivatsan Srinivasan
 
Slay the Dragons of Agile Measurement
Slay the Dragons of Agile MeasurementSlay the Dragons of Agile Measurement
Slay the Dragons of Agile MeasurementJosiah Renaudin
 
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne FitzpatrickDataScienceConferenc1
 
Experience with an Online Assessment in a Lecture about Fundamentals of Elect...
Experience with an Online Assessment in a Lecture about Fundamentals of Elect...Experience with an Online Assessment in a Lecture about Fundamentals of Elect...
Experience with an Online Assessment in a Lecture about Fundamentals of Elect...Mathias Magdowski
 
SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...
SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...
SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...Sadalit Van Buren
 
vodQA Pune - Innovations in Testing - Agenda
vodQA Pune - Innovations in Testing - AgendavodQA Pune - Innovations in Testing - Agenda
vodQA Pune - Innovations in Testing - AgendavodQA
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
Intro to ML for product school meetup
Intro to ML for product school meetupIntro to ML for product school meetup
Intro to ML for product school meetupErez Shilon
 
How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...
How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...
How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...Edureka!
 
BigMLSchool: ML Platforms and AutoML in the Enterprise
BigMLSchool: ML Platforms and AutoML in the EnterpriseBigMLSchool: ML Platforms and AutoML in the Enterprise
BigMLSchool: ML Platforms and AutoML in the EnterpriseBigML, Inc
 
Psychology and Engineering of Testing
Psychology and Engineering of TestingPsychology and Engineering of Testing
Psychology and Engineering of TestingIlari Henrik Aegerter
 
612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx
612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx
612016 Data Modeling Scoring Guidehttpscourserooma.ca.docxalinainglis
 
Boost Your Data Career with Predictive Analytics! Learn How ?
Boost Your Data Career with Predictive Analytics! Learn How ? Boost Your Data Career with Predictive Analytics! Learn How ?
Boost Your Data Career with Predictive Analytics! Learn How ? Edureka!
 

Similar to Catch Me If You Can: Keeping Up With ML Models in Production (20)

Micro Instructional Design for Problem-Based and Game-Based Learning
Micro Instructional Design for Problem-Based and Game-Based LearningMicro Instructional Design for Problem-Based and Game-Based Learning
Micro Instructional Design for Problem-Based and Game-Based Learning
 
Real World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineReal World End to End machine Learning Pipeline
Real World End to End machine Learning Pipeline
 
Slay the Dragons of Agile Measurement
Slay the Dragons of Agile MeasurementSlay the Dragons of Agile Measurement
Slay the Dragons of Agile Measurement
 
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
 
Agile Development
Agile DevelopmentAgile Development
Agile Development
 
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
 
Experience with an Online Assessment in a Lecture about Fundamentals of Elect...
Experience with an Online Assessment in a Lecture about Fundamentals of Elect...Experience with an Online Assessment in a Lecture about Fundamentals of Elect...
Experience with an Online Assessment in a Lecture about Fundamentals of Elect...
 
SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...
SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...
SharePoint Maturity and Business Process - Why They're Inseparable! - as pres...
 
vodQA Pune - Innovations in Testing - Agenda
vodQA Pune - Innovations in Testing - AgendavodQA Pune - Innovations in Testing - Agenda
vodQA Pune - Innovations in Testing - Agenda
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
Intro to ML for product school meetup
Intro to ML for product school meetupIntro to ML for product school meetup
Intro to ML for product school meetup
 
How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...
How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...
How To Become A Machine Learning Engineer? | Machine Learning Engineer Salary...
 
ML Playbook
ML PlaybookML Playbook
ML Playbook
 
Final Year.pdf
Final Year.pdfFinal Year.pdf
Final Year.pdf
 
BigMLSchool: ML Platforms and AutoML in the Enterprise
BigMLSchool: ML Platforms and AutoML in the EnterpriseBigMLSchool: ML Platforms and AutoML in the Enterprise
BigMLSchool: ML Platforms and AutoML in the Enterprise
 
Psychology and Engineering of Testing
Psychology and Engineering of TestingPsychology and Engineering of Testing
Psychology and Engineering of Testing
 
612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx
612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx
612016 Data Modeling Scoring Guidehttpscourserooma.ca.docx
 
Boost Your Data Career with Predictive Analytics! Learn How ?
Boost Your Data Career with Predictive Analytics! Learn How ? Boost Your Data Career with Predictive Analytics! Learn How ?
Boost Your Data Career with Predictive Analytics! Learn How ?
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Recently uploaded (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

Catch Me If You Can: Keeping Up With ML Models in Production

  • 1. Catch Me If You Can: Keeping up with ML Models in Production Shreya Shankar May 2021 Shreya Shankar Keeping up with ML in Production May 2021 1 / 50
  • 2. Outline 1 Introduction 2 Case study: deploying a simple model 3 Post-deployment challenges 4 Monitoring and tracing 5 Conclusion Shreya Shankar Keeping up with ML in Production May 2021 2 / 50
  • 3. Introductions BS and MS from Stanford Computer Science Shreya Shankar Keeping up with ML in Production May 2021 3 / 50
  • 4. Introductions BS and MS from Stanford Computer Science First machine learning engineer at Viaduct, an applied ML startup Shreya Shankar Keeping up with ML in Production May 2021 3 / 50
  • 5. Introductions BS and MS from Stanford Computer Science First machine learning engineer at Viaduct, an applied ML startup Working with terabytes of time series data Shreya Shankar Keeping up with ML in Production May 2021 3 / 50
  • 6. Introductions BS and MS from Stanford Computer Science First machine learning engineer at Viaduct, an applied ML startup Working with terabytes of time series data Building infrastructure for large-scale machine learning and data analytics Shreya Shankar Keeping up with ML in Production May 2021 3 / 50
  • 7. Introductions BS and MS from Stanford Computer Science First machine learning engineer at Viaduct, an applied ML startup Working with terabytes of time series data Building infrastructure for large-scale machine learning and data analytics Responsibilities spanning recruiting, SWE, ML, product, and more Shreya Shankar Keeping up with ML in Production May 2021 3 / 50
  • 8. Academic and industry ML are different Most academic ML efforts are focused on training techniques Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 9. Academic and industry ML are different Most academic ML efforts are focused on training techniques Fairness Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 10. Academic and industry ML are different Most academic ML efforts are focused on training techniques Fairness Generalizability to test sets with unknown distributions Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 11. Academic and industry ML are different Most academic ML efforts are focused on training techniques Fairness Generalizability to test sets with unknown distributions Security Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 12. Academic and industry ML are different Most academic ML efforts are focused on training techniques Fairness Generalizability to test sets with unknown distributions Security Industry places emphasis on Agile working style Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 13. Academic and industry ML are different Most academic ML efforts are focused on training techniques Fairness Generalizability to test sets with unknown distributions Security Industry places emphasis on Agile working style In industry, we want to train few models but do lots of inference Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 14. Academic and industry ML are different Most academic ML efforts are focused on training techniques Fairness Generalizability to test sets with unknown distributions Security Industry places emphasis on Agile working style In industry, we want to train few models but do lots of inference What happens beyond the validation or test set? Shreya Shankar Keeping up with ML in Production May 2021 4 / 50
  • 15. The depressing truth about ML IRL 87% of data science projects don’t make it to production1 1 https://venturebeat.com/2019/07/19/ why-do-87-of-data-science-projects-never-make-it-into-production/ Shreya Shankar Keeping up with ML in Production May 2021 5 / 50
  • 16. The depressing truth about ML IRL 87% of data science projects don’t make it to production1 Data in the "real world" is not necessarily clean and balanced, like canonical benchmark datasets (ex: ImageNet) 1 https://venturebeat.com/2019/07/19/ why-do-87-of-data-science-projects-never-make-it-into-production/ Shreya Shankar Keeping up with ML in Production May 2021 5 / 50
  • 17. The depressing truth about ML IRL 87% of data science projects don’t make it to production1 Data in the "real world" is not necessarily clean and balanced, like canonical benchmark datasets (ex: ImageNet) Data in the "real world" is always changing 1 https://venturebeat.com/2019/07/19/ why-do-87-of-data-science-projects-never-make-it-into-production/ Shreya Shankar Keeping up with ML in Production May 2021 5 / 50
  • 18. The depressing truth about ML IRL 87% of data science projects don’t make it to production1 Data in the "real world" is not necessarily clean and balanced, like canonical benchmark datasets (ex: ImageNet) Data in the "real world" is always changing Showing high performance on a fixed train and validation set 6= consistent high performance when that model is deployed 1 https://venturebeat.com/2019/07/19/ why-do-87-of-data-science-projects-never-make-it-into-production/ Shreya Shankar Keeping up with ML in Production May 2021 5 / 50
  • 19. Today’s talk Not about how to improve model performance on a validation or test set Shreya Shankar Keeping up with ML in Production May 2021 6 / 50
  • 20. Today’s talk Not about how to improve model performance on a validation or test set Not about feature engineering, machine learning algorithms, or data science Shreya Shankar Keeping up with ML in Production May 2021 6 / 50
  • 21. Today’s talk Not about how to improve model performance on a validation or test set Not about feature engineering, machine learning algorithms, or data science Discussing: Shreya Shankar Keeping up with ML in Production May 2021 6 / 50
  • 22. Today’s talk Not about how to improve model performance on a validation or test set Not about feature engineering, machine learning algorithms, or data science Discussing: Case study of challenges faced post-deployment of models Shreya Shankar Keeping up with ML in Production May 2021 6 / 50
  • 23. Today’s talk Not about how to improve model performance on a validation or test set Not about feature engineering, machine learning algorithms, or data science Discussing: Case study of challenges faced post-deployment of models Showcase of tools to monitor production pipelines and enable Agile teams of different stakeholders to quickly identify performance degradation Shreya Shankar Keeping up with ML in Production May 2021 6 / 50
  • 24. Today’s talk Not about how to improve model performance on a validation or test set Not about feature engineering, machine learning algorithms, or data science Discussing: Case study of challenges faced post-deployment of models Showcase of tools to monitor production pipelines and enable Agile teams of different stakeholders to quickly identify performance degradation Motivating when to retrain machine learning models in production Shreya Shankar Keeping up with ML in Production May 2021 6 / 50
  • 25. General problem statement Common to experience performance drift after a model is deployed Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 26. General problem statement Common to experience performance drift after a model is deployed Unavoidable in many cases Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 27. General problem statement Common to experience performance drift after a model is deployed Unavoidable in many cases Time series data Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 28. General problem statement Common to experience performance drift after a model is deployed Unavoidable in many cases Time series data High-variance data Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 29. General problem statement Common to experience performance drift after a model is deployed Unavoidable in many cases Time series data High-variance data Intuitively, a model and pipeline should not be stale Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 30. General problem statement Common to experience performance drift after a model is deployed Unavoidable in many cases Time series data High-variance data Intuitively, a model and pipeline should not be stale Want the pipeline to reflect most recent data points Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 31. General problem statement Common to experience performance drift after a model is deployed Unavoidable in many cases Time series data High-variance data Intuitively, a model and pipeline should not be stale Want the pipeline to reflect most recent data points Will simulate performance drift with a toy task — predicting whether a taxi rider will give their driver a high tip or not Shreya Shankar Keeping up with ML in Production May 2021 7 / 50
  • 32. Case study description Using publicly available NYC Taxicab Trip Data (data in public S3 bucket s3://nyc-tlc/trip data/) Shreya Shankar Keeping up with ML in Production May 2021 8 / 50
  • 33. Case study description Using publicly available NYC Taxicab Trip Data (data in public S3 bucket s3://nyc-tlc/trip data/) For this exercise, we will train and "deploy" a model to predict whether the user gave a large tip Shreya Shankar Keeping up with ML in Production May 2021 8 / 50
  • 34. Case study description Using publicly available NYC Taxicab Trip Data (data in public S3 bucket s3://nyc-tlc/trip data/) For this exercise, we will train and "deploy" a model to predict whether the user gave a large tip Using Python, Prometheus, Grafana, mlflow, mltrace Shreya Shankar Keeping up with ML in Production May 2021 8 / 50
  • 35. Case study description Using publicly available NYC Taxicab Trip Data (data in public S3 bucket s3://nyc-tlc/trip data/) For this exercise, we will train and "deploy" a model to predict whether the user gave a large tip Using Python, Prometheus, Grafana, mlflow, mltrace Goal of this exercise is to demonstrate what can go wrong post-deployment and how to troubleshoot bugs Shreya Shankar Keeping up with ML in Production May 2021 8 / 50
  • 36. Case study description Using publicly available NYC Taxicab Trip Data (data in public S3 bucket s3://nyc-tlc/trip data/) For this exercise, we will train and "deploy" a model to predict whether the user gave a large tip Using Python, Prometheus, Grafana, mlflow, mltrace Goal of this exercise is to demonstrate what can go wrong post-deployment and how to troubleshoot bugs Diagnosing failure points in the pipeline is challenging for Agile teams when pipelines are very complex! Shreya Shankar Keeping up with ML in Production May 2021 8 / 50
• 45. Dataset description
Simple toy example using the publicly available NYC Taxicab Trip Data [2]
Tabular dataset where each record corresponds to one taxi ride:
Pickup and dropoff times
Number of passengers
Trip distance
Pickup and dropoff location zones
Fare, toll, and tip amounts
Monthly files from Jan 2009 to June 2020
Each month is about 1 GB of data → over 100 GB total
[2] https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
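As a quick orientation (a sketch, not from the deck), a single month can be pulled straight from the public bucket with pandas; the file naming below follows the TLC's published layout and is an assumption:

import pandas as pd

# Hypothetical example: load one month of yellow-cab trips from the public bucket.
# Reading s3:// paths with pandas requires the s3fs package.
df = pd.read_csv("s3://nyc-tlc/trip data/yellow_tripdata_2020-01.csv")
print(df.shape)  # several million rows for a pre-COVID month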
• 51. Learning task
Let X be the train covariate matrix (features) and y be the train target vector (labels)
Xi represents an n-dimensional feature vector for the ith ride, and yi ∈ {0, 1}, where 1 indicates the rider tipped more than 20% of the total fare
We want to learn a classifier f such that f(Xi) ≈ yi for all Xi ∈ X
We do not want to overfit, meaning we want F1(Xtrain) ≈ F1(Xtest)
We will use a RandomForestClassifier with basic hyperparameters: n_estimators=100, max_depth=10
• 52. Toy ML Pipeline
Code lives in toy-ml-pipeline [3], an example of an end-to-end pipeline for our high-tip prediction task.
[3] https://github.com/shreyashankar/toy-ml-pipeline
• 57. Toy ML pipeline utilities
S3 bucket for intermediate storage
io library:
dumps versioned output of each stage every time a stage is run
loads the latest version of outputs so we can use them as inputs to another stage
Serve predictions via a Flask application:

global mw
mw = models.RandomForestModelWrapper.load('training/models')

@app.route('/predict', methods=['POST'])
def predict():
    req = request.get_json()
    df = pd.read_json(req['data'])
    df['prediction'] = mw.predict(df)
    result = {'prediction': df['prediction'].to_list()}
    return jsonify(result)
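As a usage sketch (not necessarily in the repo), a client call against the handler above could look like this; the host, port, and feature subset shown are assumptions, and a real request would carry all 11 features:

import pandas as pd
import requests

# Hypothetical client for the /predict endpoint above.
features = pd.DataFrame([{"trip_distance": 2.1, "passenger_count": 1}])
resp = requests.post(
    "http://localhost:5000/predict",
    json={"data": features.to_json()},  # handler calls pd.read_json(req['data'])
)
print(resp.json()["prediction"])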
• 64. ETL
Cleaning stage reads in raw data for each month and removes:
Rows with timestamps that don't fall within the month range
Rows with $0 fares
Featuregen stage computes the following features (a sketch follows):
Trip-based: passenger count, trip distance, trip time, trip speed
Pickup-based: pickup weekday, pickup hour, pickup minute, work hour indicator
Basic categorical: pickup location code, dropoff location code, type of ride
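A hedged sketch of a few of these features in pandas, assuming the yellow-cab column names tpep_pickup_datetime, tpep_dropoff_datetime, and trip_distance; the work-hours definition (weekday, 9am-5pm) is my assumption:

import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Compute a few of the trip- and pickup-based features listed above."""
    pickup = pd.to_datetime(df["tpep_pickup_datetime"])
    dropoff = pd.to_datetime(df["tpep_dropoff_datetime"])
    df["trip_time"] = (dropoff - pickup).dt.total_seconds()
    df["trip_speed"] = df["trip_distance"] / df["trip_time"].clip(lower=1)
    df["pickup_weekday"] = pickup.dt.weekday
    df["pickup_hour"] = pickup.dt.hour
    df["pickup_minute"] = pickup.dt.minute
    # Assumed definition of the work hour indicator.
    df["work_hours"] = ((pickup.dt.weekday < 5) & pickup.dt.hour.between(9, 17)).astype(int)
    return df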
• 74. Offline training and evaluation
Train on January 2020, validate on February 2020
We will use F1 score as our metric (a tiny worked example follows):
precision = (number of true positives) / (number of records predicted positive)
recall = (number of true positives) / (number of positives in the dataset)
F1 = (2 · precision · recall) / (precision + recall)
Higher F1 is better; we want both few false positives and few false negatives
Steps:
1 Split the output of the featuregen stage into train and validation sets
2 Train and validate the model in a notebook
3 Train the model for production in another notebook
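To make the definitions concrete, here they are computed directly from counts (sklearn's f1_score gives the same result):

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives: 3
precision = tp / np.sum(y_pred == 1)         # 3 of 4 predicted positives: 0.75
recall = tp / np.sum(y_true == 1)            # 3 of 4 actual positives: 0.75
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)                 # 0.75 0.75 0.75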
• 75. Train and test split code snippet

train_month = '2020_01'
test_month = '2020_02'

# Load latest features for train and test sets
train_df = io.load_output_df(os.path.join('features', train_month))
test_df = io.load_output_df(os.path.join('features', test_month))

# Save train and test sets
print(io.save_output_df(train_df, 'training/files/train'))
print(io.save_output_df(test_df, 'training/files/test'))

Output:
s3://toy-applied-ml-pipeline/dev/training/files/train/20210501-165847.pq
s3://toy-applied-ml-pipeline/dev/training/files/test/20210501-170136.pq
• 76. Training code snippet

feature_columns = [
    'pickup_weekday', 'pickup_hour', 'pickup_minute', 'work_hours',
    'passenger_count', 'trip_distance', 'trip_time', 'trip_speed',
    'PULocationID', 'DOLocationID', 'RatecodeID'
]
label_column = 'high_tip_indicator'
model_params = {'max_depth': 10}  # n_estimators defaults to 100 in sklearn

# Create and train model
mw = models.RandomForestModelWrapper(feature_columns=feature_columns,
                                     model_params=model_params)
mw.train(train_df, label_column)
• 77. Training and test set evaluation

train_score = mw.score(train_df, label_column)
test_score = mw.score(test_df, label_column)

mw.add_metric('train_f1', train_score)
mw.add_metric('test_f1', test_score)

print('Metrics:')
print(mw.get_metrics())

Output:
Metrics:
{'train_f1': 0.7310504153094398, 'test_f1': 0.735223195868965}
• 80. Train the production model
Simulate deployment in March 2020 (COVID-19 incoming!)
In practice, train models on different windows / do more thorough data science
Here, we will just train a model on February 2020 with the same parameters we explored before:

train_df = io.load_output_df(os.path.join('features', '2020_02'))

prod_mw = models.RandomForestModelWrapper(feature_columns=feature_columns,
                                          model_params=model_params)
prod_mw.train(train_df, label_column)

train_score = prod_mw.score(train_df, label_column)
prod_mw.add_metric('train_f1', train_score)
prod_mw.save('training/models', dev=False)
• 85. Deploy model
Flask server that runs predict on features passed in via request
Separate file inference.py that repeatedly sends requests containing features
Logging metrics in the Flask app via Prometheus
Visualizing metrics via Grafana
Demo
• 86. Inspect F1 scores at the end of every day

Output:
         day  rolling_f1  daily_f1
178123     1    0.576629  0.576629
370840     2    0.633320  0.677398
592741     3    0.649983  0.675877
...
1514827   28    0.659934  0.501860
1520358   29    0.659764  0.537860
1529847   30    0.659530  0.576178
• 89. Deeper dive into March F1 scores

         day  rolling_f1  daily_f1
178123     1    0.576629  0.576629
370840     2    0.633320  0.677398
592741     3    0.649983  0.675877
...
1507234   27    0.660228  0.576993
1514827   28    0.659934  0.501860
1520358   29    0.659764  0.537860
1529847   30    0.659530  0.576178

Large discrepancy between the rolling metric and the daily metric
What you monitor matters: the daily metric drops significantly toward the end of the month (see the sketch below)
Visible effect of COVID-19 on the data
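A sketch of how these two views of the metric could be computed from a log of scored rides; the column names (day, high_tip_indicator, prediction) are assumptions. The rolling metric pools the whole month so far, which is exactly why it hides the drop the daily metric exposes:

import pandas as pd
from sklearn.metrics import f1_score

def daily_and_rolling_f1(preds: pd.DataFrame) -> pd.DataFrame:
    """Per-day F1 vs. F1 over everything seen so far in the month."""
    rows = []
    for day in sorted(preds["day"].unique()):
        day_df = preds[preds["day"] == day]
        so_far = preds[preds["day"] <= day]
        rows.append({
            "day": day,
            "rolling_f1": f1_score(so_far["high_tip_indicator"], so_far["prediction"]),
            "daily_f1": f1_score(day_df["high_tip_indicator"], day_df["prediction"]),
        })
    return pd.DataFrame(rows)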
• 90. Evaluating the same model on following months

Output:
2020-04 F1: 0.5714705472990737
2020-05 F1: 0.5530868473460906
2020-06 F1: 0.5967621469282887

Performance gets significantly worse! We will need to retrain models periodically.
• 96. Challenge: lag
In this example we are simulating a "live" deployment on historical data, so we have no lag
In practice there are many types of lag:
Feature lag: our system only learns about the ride well after it has occurred
Label lag: our system only learns of the label (fare, tip) well after the ride has occurred
The evaluation metric will inherently be lagging
We might not be able to train on the most recent data
• 102. Challenge: distribution shift
Data "drifts" over time, and models will need to be retrained to reflect such drift
Open question: how often do you retrain a model?
Retraining adds complexity to the overall system (more artifacts to version and keep track of)
Retraining can be expensive in terms of compute
Retraining can take time and energy from people
Open question: how do you know when the data has "drifted"?
• 103. Demo
• 104. How do you know when the data has "drifted"? [4]
[4] Rabanser, Günnemann, and Lipton, Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift.
• 111. Challenge: quantifying distribution shift
We only have 11 features, so we'll skip the dimensionality reduction step
"Multiple univariate testing seem[s] to offer comparable performance to multivariate testing" (Rabanser et al.)
Maybe this is specific to their MNIST and CIFAR experiments
Regardless, we will employ multiple univariate testing
For each feature, we will run a two-sided Kolmogorov-Smirnov test
Continuous data, non-parametric test
Compute the largest difference Z = sup_z |F_p(z) − F_q(z)|, where F_p and F_q are the empirical CDFs of the two samples being compared
• 115. Kolmogorov-Smirnov test made simple
For each feature, we will run a two-sided Kolmogorov-Smirnov test
Continuous data, non-parametric test
Compute the largest difference Z = sup_z |F_p(z) − F_q(z)|, where F_p and F_q are the empirical CDFs of the two samples being compared
Example: comparing the empirical CDFs of two random normally distributed samples
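scipy ships the two-sided two-sample KS test; here is a sketch of running it per feature, where ref_df and new_df would be (say) the January and February featuregen outputs:

import pandas as pd
from scipy.stats import ks_2samp

def ks_by_feature(ref_df: pd.DataFrame, new_df: pd.DataFrame,
                  features: list) -> pd.DataFrame:
    """Two-sided KS test per feature; a larger statistic means a bigger shift."""
    results = []
    for col in features:
        stat, p_value = ks_2samp(ref_df[col].dropna(), new_df[col].dropna())
        results.append({"feature": col, "statistic": stat, "p_value": p_value})
    return pd.DataFrame(results).sort_values("statistic", ascending=False)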
• 116. How do you know when the data has "drifted"?
Comparing the January 2020 and February 2020 datasets using the KS test:

Output:
    feature          statistic  p_value
0   pickup_weekday    0.046196  0.000000e+00
2   work_hours        0.028587  0.000000e+00
6   trip_time         0.017205  0.000000e+00
7   trip_speed        0.035415  0.000000e+00
1   pickup_hour       0.009676  8.610133e-258
5   trip_distance     0.005312  5.266602e-78
8   PULocationID      0.004083  2.994877e-46
9   DOLocationID      0.003132  2.157559e-27
4   passenger_count   0.002947  2.634493e-24
10  RatecodeID        0.002616  3.047481e-19
3   pickup_minute     0.000702  8.861498e-02
• 120. How do you know when the data has "drifted"?
For our "offline" train and evaluation sets (January and February), we get extremely low p-values!
This method can have a high "false positive rate"
In my experience, this method has flagged distributions as "significantly different" more than I want it to
In the era of "big data" (we have millions of data points), p-values are not useful to look at [5]
[5] Lin, Lucas, and Shmueli, "Too Big to Fail: Large Samples and the p-Value Problem".
• 121. How do you know when the data has "drifted"?
You don't really know. An unsatisfying solution is to retrain the model to be as fresh as possible.
• 128. Types of production ML bugs
Data changed or corrupted
Data dependency / infra issues
Logical error in the ETL code
Logical error in retraining a model
Logical error in promoting a retrained model to the inference step
Change in consumer behavior
Many more
• 133. Why production ML bugs are hard to find and address
ML code fails silently (few runtime or compiler errors)
Little or no unit or integration testing in feature engineering and ML code
Very few (if any) people have end-to-end visibility on an ML pipeline
Underdeveloped monitoring and tracing tools
So many artifacts to keep track of, especially if you retrain your models
• 143. What to monitor?
Agile philosophy: "Continuous Delivery." Monitor proactively!
Especially hard when there are no labels
First-pass monitoring:
Model output distributions
Averages, percentiles
More advanced monitoring:
Model input (feature) distributions
ETL intermediate output distributions
Can get tedious if you have many features or steps in ETL
Still unclear how to quantify, detect, and act on changes
• 144. Prometheus monitoring for ML

Selling points                              Pain points
Easy to call hist.observe(val)              User (you) needs to define the histogram buckets
Easy to integrate with Grafana              Metric naming, tracking, and visualization clutter
Many applications already use Prometheus    PromQL is not very intuitive for ML-based tasks
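A minimal sketch of the observe side with prometheus_client; the metric name and bucket edges are assumptions, and choosing the buckets by hand is exactly the pain point above:

from prometheus_client import Histogram, start_http_server

# Buckets must be fixed up front; ten even bins work for probability outputs.
prediction_hist = Histogram(
    "high_tip_prediction_score",
    "Distribution of model output scores",
    buckets=[i / 10 for i in range(11)],
)

start_http_server(8000)  # expose /metrics for Prometheus to scrape

def log_prediction(score: float) -> None:
    prediction_hist.observe(score)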
• 145. PromQL for ML monitoring

Question                                    Query
Average output [5m]                         sum(rate(histogram_sum[5m])) / sum(rate(histogram_count[5m])), time series view
Median output [5m]                          histogram_quantile(0.5, sum by (job, le) (rate(histogram_bucket[5m]))), time series view
Probability distribution of outputs [5m]    sum(rate(histogram_bucket[5m])) by (le), heatmap view [6]

[6] Jordan, A simple solution for monitoring ML systems.
• 148. Retraining regularly
Suppose we retrain our model on a weekly basis.
Use MLflow Model Registry [7] to keep track of which model is the best model
Alternatively, we can manage versioning and a registry ourselves (what we do now)
At inference time, pull the latest/best model from our model registry
Now, we need some tracing [8]...
[7] https://www.mlflow.org/docs/latest/model-registry.html
[8] Hermann and Del Balso, Scaling Machine Learning at Uber with Michelangelo.
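A sketch of the inference-time pull using MLflow's models:/ URI; the registered model name here is an assumption:

import mlflow.pyfunc

# Hypothetical registered name; "Production" is the stage the weekly
# retrain job would promote the best model to.
model = mlflow.pyfunc.load_model("models:/high_tip_classifier/Production")

# feature_df: output of the featuregen stage (assumed to be in scope).
predictions = model.predict(feature_df)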
• 152. mltrace
Coarse-grained lineage and tracing
Designed specifically for complex data or ML pipelines
Designed specifically for Agile multidisciplinary teams
Alpha release [9] contains a Python API to log run information and a UI to view traces for outputs
[9] https://github.com/loglabs/mltrace
• 158. mltrace design principles
Utter simplicity (the user knows everything going on)
No need for users to manually set component dependencies; the tool should detect dependencies based on the I/O of a component's run
API designed for both engineers and data scientists
UI designed to help people triage issues even if they didn't build the ETL or models themselves
When there is a bug, whose backlog do you put it on?
Enable people who may not have developed the model to investigate the bug
• 165. mltrace concepts
Pipelines are made of components (ex: cleaning, featuregen, split, train)
In ML pipelines, different artifacts are produced (inputs and outputs) when the same component is run more than once
Component and ComponentRun objects [10]
Decorator interface similar to Dagster "solids" [11]
User specifies component name, input and output variables
Alternative Pythonic interface similar to MLflow tracking [12]
User creates an instance of a ComponentRun object and calls log_component_run on the object
[10] https://mltrace.readthedocs.io/en/latest
[11] https://docs.dagster.io/concepts/solids-pipelines/solids
[12] https://www.mlflow.org/docs/latest/tracking.html
• 166. mltrace integration example

def clean_data(raw_data_filename: str, raw_df: pd.DataFrame,
               month: str, year: str, component: str) -> str:
    # monthrange returns (weekday of first day, number of days);
    # we only need the day count for the last day of the month.
    _, last = calendar.monthrange(int(year), int(month))
    first_day = f'{year}-{month}-01'
    last_day = f'{year}-{month}-{last:02d}'
    clean_df = helpers.remove_zero_fare_and_oob_rows(raw_df, first_day, last_day)

    # Write "clean" df to S3
    output_path = io.save_output_df(clean_df, component)
    return output_path
• 167. mltrace integration example

@register('cleaning', input_vars=['raw_data_filename'],
          output_vars=['output_path'])
def clean_data(raw_data_filename: str, raw_df: pd.DataFrame,
               month: str, year: str, component: str) -> str:
    _, last = calendar.monthrange(int(year), int(month))
    first_day = f'{year}-{month}-01'
    last_day = f'{year}-{month}-{last:02d}'
    clean_df = helpers.remove_zero_fare_and_oob_rows(raw_df, first_day, last_day)

    # Write "clean" df to S3
    output_path = io.save_output_df(clean_df, component)
    return output_path
• 168. mltrace integration example

from mltrace import create_component, tag_component, register

create_component('cleaning',
                 'Cleans the raw data with basic OOB criteria.',
                 'shreya')
tag_component('cleaning', ['etl'])

# Run function
raw_df = io.read_file(month, year)
raw_data_filename = io.get_raw_data_filename('2020', '01')
output_path = clean_data(raw_data_filename, raw_df, '01', '2020', 'clean/2020_01')
print(output_path)
• 169. mltrace UI demo
• 176. mltrace immediate roadmap [13]
UI feature to easily see if a component is stale:
There are newer versions of the component's outputs
The latest run of a component happened weeks or months ago
CLI to interact with mltrace logs
Scala Spark integrations (or REST API for logging)
Prometheus integrations to easily monitor outputs
Possibly other MLOps tool integrations
[13] Please email shreyashankar@berkeley.edu if you're interested in contributing!
• 181. Talk recap
Introduced an end-to-end ML pipeline for a toy task
Demonstrated post-deployment performance drift and degradation via Prometheus metric logging and Grafana dashboard visualizations
Showed via statistical tests that it's hard to know exactly when to retrain a model
Discussed challenges around continuously retraining models and maintaining production ML pipelines
Introduced mltrace, a tool for coarse-grained lineage and tracing in ML pipelines
• 188. Areas of future work in MLOps
Only scratched the surface with the challenges discussed today
Many more of these problems around deep learning
Embeddings as features: if someone updates the upstream embedding model, do all the data scientists downstream need to immediately change their models?
"Underspecified" pipelines can pose threats [14]
How to enable anyone to build dashboards or monitor pipelines?
ML people know what to monitor; infra people know how to monitor
We should strive to build tools that let engineers and data scientists be more self-sufficient
[14] D'Amour et al., Underspecification Presents Challenges for Credibility in Modern Machine Learning.
• 195. Miscellaneous
Code for this talk: https://github.com/shreyashankar/debugging-ml-talk
Toy ML Pipeline with mltrace, Prometheus, and Grafana: https://github.com/shreyashankar/toy-ml-pipeline/tree/shreyashankar/db
mltrace: https://github.com/loglabs/mltrace
Contact info:
Twitter: @sh_reya
Email: shreyashankar@berkeley.edu
Thank you!
• 196. References
D'Amour, Alexander, et al. Underspecification Presents Challenges for Credibility in Modern Machine Learning. 2020. arXiv: 2011.03395 [cs.LG].
Hermann, Jeremy, and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. 2018. https://eng.uber.com/scaling-michelangelo/.
Jordan, Jeremy. A simple solution for monitoring ML systems. 2021. https://www.jeremyjordan.me/ml-monitoring/.
Lin, Mingfeng, Henry Lucas, and Galit Shmueli. "Too Big to Fail: Large Samples and the p-Value Problem". Information Systems Research 24 (Dec. 2013), pp. 906-917. DOI: 10.1287/isre.2013.0480.
Rabanser, Stephan, Stephan Günnemann, and Zachary C. Lipton. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. 2019. arXiv: 1810.11953 [stat.ML].