Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not
use or distribute these slides for commercial purposes. You may make copies of these
slides and use or distribute them for educational purposes as long as you
cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see
https://creativecommons.org/licenses/by-sa/2.0/legalcode
Welcome
Model Analysis
Model Performance Analysis
Model Analysis Overview
What is next after model training/deployment?
● Is the model performing well?
● Is there scope for improvement?
● Could the data change in the future?
● Has the data changed since you created your training dataset?
Black box evaluation vs model introspection
● Models can be tested on metrics like accuracy and losses like test error without knowing their internal details
● For finer-grained evaluation, models can be inspected part by part
Performance metrics vs optimization objectives
● Performance metrics vary with the task: regression, classification, etc.
● Within a type of task, your performance metrics may differ depending on the end goal
● Performance is measured after a round of optimization
● Machine learning formulates the problem statement into an objective function
● Learning algorithms find optimal values for each variable in order to converge to a local/global minimum
[Figure: loss curves, from https://cs231n.github.io/neural-networks-3/]
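To make the distinction concrete, here is a minimal Keras sketch (the model is hypothetical): the loss is the optimization objective the optimizer minimizes, while the metrics are only computed and reported for evaluation.

import tensorflow as tf

# Minimal sketch (hypothetical model): the loss drives gradient descent;
# the metrics are reported but never used for weight updates.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation='sigmoid')])
model.compile(optimizer='adam',
              loss='binary_crossentropy',                    # optimization objective
              metrics=[tf.keras.metrics.AUC(), 'accuracy'])  # performance metrics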
Top-level aggregate metrics vs slicing
● Most of the time, metrics are calculated on the entire dataset
● Slicing deals with understanding how the model performs on each subset of the data (a toy example follows below)
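A toy illustration of why slicing matters, using pandas with made-up data (the column names are hypothetical): the aggregate accuracy looks acceptable while one slice performs poorly.

import pandas as pd

# Made-up predictions: aggregate accuracy looks fine, store B does not
df = pd.DataFrame({'store': ['A', 'A', 'B', 'B', 'B'],
                   'label': [1, 0, 1, 1, 0],
                   'pred':  [1, 0, 0, 0, 0]})
df['correct'] = (df['label'] == df['pred']).astype(int)
print(df['correct'].mean())                   # aggregate accuracy: 0.6
print(df.groupby('store')['correct'].mean())  # A: 1.00, B: 0.33 (hidden problem)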
Introduction to TensorFlow Model Analysis
Advanced Model Analysis and Debugging
Why should you slice your data?
Your top-level metrics may hide problems
● Your model may not perform well for particular [customers | products |
stores | days of the week | etc.]
Each prediction request is an individual event, maybe an individual
customer
● For example, customers may have a bad experience
● For example, some stores may perform badly
TensorFlow Model Analysis (TFMA)
● Open-source library
● Scalable framework
● Ensures models meet required quality thresholds
● Used to compute and visualize evaluation metrics
● Inspects a model's performance against different slices of data
Architecture
[Diagram: the TFMA pipeline (ExtractAndEvaluate): Read Inputs (tf.Example) → Extractors: Predict (default), Slice Keys (default), Custom Extractor (optional) → Evaluators: MetricsAndPlotsEvaluator (default), AnalysisEvaluator (default), CustomEvaluator (optional), which group by slices and compute/combine metrics → Write Results (metrics, analysis, ...)]
One model vs multiple models over time
[Chart: TensorFlow metrics in TensorFlow Model Analysis, showing metric value per model]
[Chart: TensorFlow metrics in TensorBoard, showing metric value over global steps]
Aggregate vs sliced metrics
● Aggregate metric computed over the entire eval dataset
● Metric "sliced" by different segments of the eval dataset
[ROC plots: Sensitivity vs. 1 - Specificity]
Streaming vs full-pass metrics
● Streaming metrics are approximations computed on mini-batches of data
● TensorBoard visualizes metrics computed over mini-batches
● TFMA gives evaluation results after running through the entire dataset
● Apache Beam is used for scaling to large datasets
Advanced Model Analysis and Debugging
TFMA in Practice
TFMA in practice
● Analyze the impact of different slices of data on various metrics
● How do you track metrics over time?
Step 1: Export EvalSavedModel for TFMA

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_model_analysis as tfma

def get_serve_tf_examples_fn(model, tf_transform_output):
    # Returns a function that parses a serialized tf.Example and applies TFT
    ...

tf_transform_output = tft.TFTransformOutput(transform_output_dir)
signatures = {
    'serving_default': get_serve_tf_examples_fn(model, tf_transform_output)
                           .get_concrete_function(tf.TensorSpec(...)),
}
model.save(serving_model_dir_path, save_format='tf', signatures=signatures)
Step 2: Create EvalConfig

# Specify slicing spec
slice_spec = [slicer.SingleSliceSpec(columns=['column_name']), ...]

# Define metrics
metrics = [tf.keras.metrics.Accuracy(name='accuracy'),
           tfma.metrics.MeanPrediction(name='mean_prediction'), ...]
metrics_specs = tfma.metrics.specs_from_metrics(metrics)

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key=features.LABEL_KEY)],
    slicing_specs=slice_spec,
    metrics_specs=metrics_specs, ...)
Step 3: Analyze model

# Specify the path to the eval graph and to where the result should be written
eval_model_dir = ...
result_path = ...

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=eval_model_dir,
    eval_config=eval_config)

# Run TensorFlow Model Analysis
eval_result = tfma.run_model_analysis(eval_shared_model=eval_shared_model,
                                      output_path=result_path,
                                      ...)

Step 4: Visualize metrics

# Render results
tfma.view.render_slicing_metrics(eval_result)
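To drill into a single slice, render_slicing_metrics also accepts a slicing column; a minimal usage sketch ('column_name' is a placeholder):

# Render metrics for one slicing column ('column_name' is a placeholder)
tfma.view.render_slicing_metrics(eval_result, slicing_column='column_name')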
Model Debugging Overview
Advanced Model Analysis and Debugging
Model robustness
● Robustness is much more than generalization
● Is the model accurate even for slightly corrupted input data?
Robustness metrics
● Split data into train/val/dev sets
● Use metrics specific to regression and classification problems
● Robustness measurement shouldn't take place during training
Model debugging
● Detects and deals with problems in ML systems
● Applies mainstream software engineering practices to ML models
Model Debugging Objectives
● Privacy harms
● Social discrimination
● Security vulnerabilities
● Model decay
● Opaqueness
Model Debugging Techniques
● Sensitivity analysis
● Benchmark models
● Residual analysis
Advanced Model Analysis and Debugging
Benchmark Models
Benchmark models
● Simple, trusted, and interpretable models solving the same problem
● Compare your ML model against these models
● A benchmark model is the starting point of ML development
Advanced Model Analysis and Debugging
Sensitivity Analysis and Adversarial Attacks
Sensitivity analysis
● Simulate data of your choice and see what your model predicts
● See how the model reacts to data it has never seen before (a sketch follows below)
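A minimal sensitivity-analysis sketch, assuming a hypothetical model with a scikit-learn-style predict method and a single example x: perturb one feature and record how the prediction moves.

import numpy as np

# Sketch: perturb one feature of a single example and record the
# prediction at each perturbation (model and x are hypothetical).
def sensitivity(model, x, feature_idx, deltas=np.linspace(-1.0, 1.0, 5)):
    x = np.asarray(x, dtype=float)
    results = []
    for d in deltas:
        x_pert = x.copy()
        x_pert[feature_idx] += d
        results.append((d, float(model.predict(x_pert[None, :])[0])))
    return results  # large output swings => model is sensitive to this feature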
What-If Tool for sensitivity analysis
Random Attacks
● Expose models to high volumes of random input data
● Exploit unexpected software and math bugs
● A great way to start debugging
Partial dependence plots
● Visualize the effect of changing one or more variables on your model's predictions
● PDPbox and PyCEbox are open-source packages for this (a scikit-learn sketch follows below)
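A short partial-dependence sketch using scikit-learn (version 1.0 or later) rather than PDPbox/PyCEbox; the idea is the same: sweep one feature and average the model's predictions.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Fit a model on synthetic data, then plot partial dependence for two features
X, y = make_regression(n_samples=200, n_features=4, random_state=0)
est = GradientBoostingRegressor(random_state=0).fit(X, y)
PartialDependenceDisplay.from_estimator(est, X, features=[0, 1])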
How vulnerable to attacks is your model?
● Attacks are aimed at fooling your model
● Successful attacks could be catastrophic
● Test adversarial examples
● Harden your model
Sensitivity can mean vulnerability
A Famous Example: Ostrich
How vulnerable to attacks is your model?
Example:
A self-driving car crashes because black
and white stickers applied to a stop sign
cause a classifier to interpret it as a Speed
Limit 45 sign.
How vulnerable to attacks is your model?
Example:
A spam detector fails to classify an email as
spam. The spam mail has been designed to
look like a normal email, but is actually
phishing.
How vulnerable to attacks is your model?
Example:
A machine-learning powered scanner scans
suitcases for weapons at an airport. A knife
was developed to avoid detection by
making the system think it is an umbrella.
Informational and Behavioral Harms
● Informational harm: leakage of information
● Behavioral harm: manipulating the behavior of the model
Informational Harms
● Membership Inference: was this person’s data
used for training?
● Model Inversion: recreate the training data
● Model Extraction: recreate the model
Behavioral Harms
● Poisoning: insert malicious data into the training data
● Evasion: craft input data that causes the model to misclassify it
Measuring your vulnerability to attack
● Cleverhans: an open-source Python library to benchmark machine learning systems' vulnerability to adversarial examples
● Foolbox: an open-source Python library that lets you easily run adversarial attacks against machine learning models
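For intuition, a minimal fast-gradient-sign (FGSM) sketch in TensorFlow, assuming a hypothetical Keras classifier (softmax outputs) and integer labels; CleverHans and Foolbox implement this family of attacks in more robust form.

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

# FGSM sketch: nudge the input in the direction that increases the loss
def fgsm(model, x, y, eps=0.01):
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    return x + eps * tf.sign(grad)  # candidate adversarial example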
Attempted defenses against adversarial examples
● Defensive distillation
● Adversarial example searches
Advanced Model Analysis and Debugging
Residual Analysis
Residual analysis
● Measures the differences between the model's predictions and the ground truth
● Randomly distributed errors are good
● Correlated or systematic errors show that a model can be improved
[Residual plots: systematic = bad, random = good]
Residual analysis
● Residuals should not be correlated with another feature
● Adjacent residuals should not be correlated with each other (autocorrelation); a sketch follows below
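A minimal residual-analysis sketch with made-up numbers: residuals should look like uncorrelated noise, so a lag-1 autocorrelation far from zero hints at systematic error the model could still learn.

import numpy as np

# Made-up predictions; residuals should behave like uncorrelated noise
y_true = np.array([3.0, 2.5, 4.1, 5.0, 3.3, 4.8])
y_pred = np.array([2.8, 2.7, 4.0, 4.6, 3.5, 4.9])
residuals = y_true - y_pred

r = residuals - residuals.mean()
lag1 = np.sum(r[:-1] * r[1:]) / np.sum(r * r)
print(lag1)  # near 0 is good; near +/-1 signals correlated (systematic) errors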
Advanced Model Analysis and Debugging
Model Remediation
Remediation techniques
● Data augmentation:
○ Add synthetic data to the training set
○ Helps correct for unbalanced training data
● Interpretable and explainable ML:
○ Overcome the myth of neural networks as a black box
○ Understand how the data is being transformed
Remediation techniques
● Model editing:
○ Applies to decision trees
○ Manual tweaks to adapt to your use case
● Model assertions:
○ Implement business rules that override model predictions (a sketch follows below)
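A minimal model-assertion sketch (all names are hypothetical): a business rule that overrides raw model outputs before they are served.

import numpy as np

# Business rule: a predicted price can never be negative
def apply_assertions(predicted_prices):
    prices = np.asarray(predicted_prices, dtype=float)
    return np.clip(prices, 0.0, None)

print(apply_assertions([12.5, -3.0, 7.2]))  # -> [12.5  0.   7.2]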
Remediation techniques
● Discrimination remediation:
○ Include people with varied backgrounds when collecting training data
○ Conduct feature selection on the training data
○ Use fairness metrics to select hyperparameters and decision cut-off thresholds
Remediation techniques
● Model monitoring:
○ Conduct model debugging at regular intervals
○ Inspect for accuracy, fairness, security problems, etc.
● Anomaly detection:
○ Anomalies can be a warning of an attack
○ Enforce data integrity constraints on incoming data
Fairness
Advanced Model Analysis and Debugging
Fairness indicators
● Open-source library to compute fairness metrics
● Easily scales to datasets of any size
● Built on top of TFMA
What do Fairness Indicators do?
● Compute commonly-identified fairness metrics for classification models
● Compare model performance across subgroups and against other models
● No remediation tools are provided
Evaluate at individual slices
● Overall metrics can hide poor performance on certain parts of the data
● The model may fare well on some metrics and poorly on others
Aspects to consider
● Establish the context and the different user types
● Seek domain experts' help
● Use data slicing widely and wisely
General guidelines
● Compute performance metrics for all slices of data
● Evaluate your metrics across multiple thresholds
● If the decision margin is small, report in more detail
Measuring Fairness
Advanced Model Analysis and Debugging
Positive rate / Negative rate
● Percentage of data points classified as positive/negative
● Independent of ground truth
● Use case: when having equal final percentages across groups is important
True positive rate (TPR) / False negative rate (FNR)
● TPR: percentage of positive data points that are correctly labeled positive
● FNR: percentage of positive data points that are incorrectly labeled negative
● Measure equality of opportunity, where the positive class should be equal across subgroups
● Use case: when it is important that the same percentage of qualified candidates is rated positive in each group
True negative rate (TNR) / False positive rate (FPR)
● TNR: percentage of negative data points that are correctly labeled negative
● FPR: percentage of negative data points that are incorrectly labeled positive
● Measure equality of opportunity, where the negative class should be equal across subgroups
● Use case: when misclassifying something as positive is more concerning than missing positives (a per-group sketch follows below)
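A minimal sketch (made-up arrays) of computing TPR and FPR per subgroup, the comparison these fairness metrics call for:

import numpy as np

# Rates per subgroup; gaps between groups indicate unequal opportunity
def rates_by_group(y_true, y_pred, groups):
    out = {}
    for g in np.unique(groups):
        m = groups == g
        yt, yp = y_true[m], y_pred[m]
        tpr = (yp[yt == 1] == 1).mean() if (yt == 1).any() else float('nan')
        fpr = (yp[yt == 0] == 1).mean() if (yt == 0).any() else float('nan')
        out[g] = {'TPR': float(tpr), 'FPR': float(fpr)}
    return out

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
groups = np.array(['a', 'a', 'a', 'b', 'b', 'b'])
print(rates_by_group(y_true, y_pred, groups))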
Accuracy & Area under the curve (AUC)
● Accuracy: the percentage of data points that are correctly labeled
● AUC: the percentage of data points that are correctly labeled when each class is given equal weight, independent of the number of samples
● Metrics related to predictive parity
● Use case: when precision is critical
Tips
● Good fairness indicators don't always mean the model is fair
● Unfair skews exist if there is a gap in a metric between two groups
● Conduct adversarial testing for rare, malicious examples
● Evaluate continuously throughout development and deployment
About the CelebA dataset
● 200K celebrity images
● Each image has 40 attribute annotations
● Each image has 5 landmark locations
● Assumptions are made about the "smiling" attribute
Fairness indicators in practice
Build a classifier to detect smiling
Evaluate fairness and performance across age groups
Generate visualizations to gain model performance insight
Continuous Evaluation and Monitoring
Why do models need to be monitored?
● Training data is a snapshot of the world at a point in time
● Many types of data change over time, some quickly
● ML Models do not get better with age
● As model performance degrades, you want an early warning
Data drift and shift
● Concept drift: loss of prediction quality
● Concept emergence: a new type of data distribution
● Types of dataset shift:
○ Covariate shift
○ Prior probability shift
How are models monitored?
[Diagram: a monitoring loop spanning Raw Data, Preprocessing, Model, Prediction, Labeling, and Training Data]
Statistical process control
● Models the number of errors as a binomial random variable
● Defines an alert rule
● Method used: Drift Detection Method (DDM)
Sequential analysis
● Method used: Linear Four Rates
● If the data is stationary, the contingency table should remain constant
Error distribution monitoring
● Method used: Adaptive Windowing (ADWIN)
● Calculates the mean error rate in every window of data
● The window size adapts, becoming shorter when the data is not stationary (a toy windowed check follows below)
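A toy windowed-error check (not ADWIN itself, which grows and shrinks the window adaptively): compare the mean error of the most recent window against the previous one and alert on a large gap.

import numpy as np

# Toy drift check: compare mean error of the two most recent windows
def drift_alert(errors, window=100, threshold=0.05):
    errors = np.asarray(errors, dtype=float)
    if errors.size < 2 * window:
        return False                     # not enough data yet
    prev = errors[-2 * window:-window].mean()
    cur = errors[-window:].mean()
    return abs(cur - prev) > threshold   # True => investigate possible drift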
Clustering/novelty detection
● Assign data to a known cluster or detect an emerging concept
● Multiple algorithms available: OLINDDA, MINAS, ECSMiner, and GC3
● Susceptible to the curse of dimensionality
● Does not detect population-level changes
Feature distribution monitoring
● Monitors each feature separately at every window of data
● Use PCA to reduce the number of features
● Algorithms to compare distributions:
○ Pearson correlation in Change of Concept
○ Hellinger Distance in HDDDM (a sketch follows below)
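A minimal Hellinger-distance sketch (the statistic HDDDM uses), comparing a feature's histogram across two windows of synthetic data:

import numpy as np

# Hellinger distance between two normalized histograms (0 = identical)
def hellinger(p, q):
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

rng = np.random.default_rng(0)
ref = np.histogram(rng.normal(0.0, 1, 1000), bins=20, range=(-4, 4))[0]
cur = np.histogram(rng.normal(0.5, 1, 1000), bins=20, range=(-4, 4))[0]
print(hellinger(ref, cur))  # larger value => larger distribution shift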
Model-dependent monitoring
● Concentrate efforts near the decision margin in latent space
● One algorithm is Margin Density Drift Detection (MD3)
● Areas in latent space where classifiers have low confidence matter more
● Reduces the false alarm rate effectively
Google Cloud AI Continuous Evaluation
● Leverages the AI Platform Prediction and Data Labeling services
● Deploy your model to AI Platform Prediction with a model version
● Create an evaluation job
● Inputs and outputs are saved in a BigQuery table
● Run the evaluation job on a few of these samples
● View the evaluation metrics in the Google Cloud console
How often should you retrain?
● Depends on the rate of change
● If possible, automate the detection of model drift and the triggering of model retraining