Explainability for machine learning (ML) algorithms is essential when moving to production, but it’s notoriously difficult to achieve transparency and navigate changing regulations, like the EU’s new AI legislation. In this session, Seldon CEO Alex Housley dives into what recent legislation means for organisations deploying ML, how they can protect their deployments with powerful monitoring capabilities and what to consider when building a robust ML deployment pipeline. With the support of real-world use cases, Alex also gives a practical guide to approaching deployment responsibly, and how to bake monitoring into ML pipelines to minimise risk and future-proof organisations.
2. The unbundling of ML platforms
1. Tech giants build DIY ML platforms from scratch to gain competitive advantage, e.g. Michelangelo, FBLearner, TFX.
2. Specialised tools emerge to solve MLOps challenges, e.g. version control, feature stores, CI/CD, monitoring.
3. Cloud-native is driving hybrid/multi-cloud adoption: more control, reduced vendor lock-in.
Hidden Technical Debt in Machine Learning Systems. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison (Google), NIPS 2015.
[Figure: the small “ML Code” box sits at the centre of a much larger system of supporting components: Configuration, Data Collection, Data Verification, Feature Extraction, Analysis Tools, Machine Resource Management, Process Management Tools, Serving Infrastructure and Monitoring.]
3. AI adoption is accelerating in the enterprise
AI Adoption in the Enterprise 2021 – O’Reilly (oreilly.com): 5,154 global respondents.
How Tech Stacks Up in B2B – Andreessen Horowitz (a16z.com): survey of technology leaders at Fortune 500, Global 2000 and SaaS 50 companies.
4. Data scientists and DevOps must collaborate to productionise models
Siloed teams, in offices and working remotely. Deploying a model takes anywhere from one week to six months.
ML Engineer: $141k average base salary.
5. Production ML has a larger surface area than data prep and training
The nine stages of the machine learning workflow (Amershi, IEEE 2019)
[Diagram: Data Scientists, Data Engineers, ML Engineers, DevOps and Product/Management own different stages across “Day 0”, “Day 1” and “Day 2”, with more metadata captured as the workflow progresses.]
6. Scaling MLOps across the organisation
Team
– 10 users
– < 50 models
– One training system
– Minimal or team-level restrictions

Business Unit
– 50 users
– < 200 models
– 3–4 training systems, multiple frameworks
– Large DevOps team
– Department-level constraints
– Role-based access

Organisation
– > 100 users
– Hundreds or thousands of models
– Multiple platforms and clouds
– Org blueprints
– Compliance
– Higher-level principles and AI ethics
9. Capital One created a ‘Model as a Service’ platform powered by Seldon
Objectives
– Improve the speed-to-market for ML models
– Lower the barrier to entry for developers to get their models into production
– Implement operational efficiencies and economies of scale
“With our MaaS platform running on Seldon, we’ve gone from it taking months to minutes to deploy or update models.”
— Steve Evangelista, Director of Project Management, Capital One
Results
– MVP in less than 90 days
– Deployment process now takes minutes instead of months
– Versioning, vulnerability scanning, containerizing, deployment, testing and promotion to production are all taken care of
– Use cases across the business, including fraud, marketing, finance and customer service
– Rigorous compliance through model management and monitoring
– Developers can work in any language/framework
10. Why not just wrap my models with Flask?
Flask works well in R&D until you need:
– Multiple optimized model servers
– Metrics and tracing
– Lineage and auditability
– Ingress configuration
– Complex inference graphs (ensembles, A/B tests, multi-armed bandits, etc.)
– A scalable solution that is battle-tested by a wide community of open-source and commercial users
11. Model serving: to achieve scale, you need to abstract complex ML concepts into standardised infra components
[Diagram: standardised inference components surround the model, e.g. an adversarial detector asking “is the model being attacked?”]
12. Leverage pre-packaged servers for framework-agnostic model serving
– Leverage out-of-the-box optimized servers that wrap your model artifacts (see the sketch below)
– Enable data scientists to deploy models from their preferred framework
– Each model server is tuned to its framework for optimal performance
– Extend existing pre-packaged servers with simple SDKs
[Diagram: a model artifact is pulled from a central repository (S3, ModelDB, ...) into a reusable server image from the image registry.]
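As a hedged illustration of the pattern, here is a minimal sketch of deploying a scikit-learn artifact with a pre-packaged server, creating a SeldonDeployment custom resource via the Kubernetes Python client. The namespace, names and bucket URI are placeholders; the resource schema follows the Seldon Core docs of the time and should be checked against your version.

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a configured kubeconfig

# SeldonDeployment using the pre-packaged scikit-learn server:
# no Flask wrapper, just a pointer to the model artifact.
seldon_deployment = {
    "apiVersion": "machinelearning.seldon.io/v1",
    "kind": "SeldonDeployment",
    "metadata": {"name": "iris", "namespace": "seldon"},
    "spec": {
        "predictors": [
            {
                "name": "default",
                "replicas": 1,
                "graph": {
                    "name": "classifier",
                    "implementation": "SKLEARN_SERVER",
                    "modelUri": "gs://my-bucket/sklearn/iris",  # placeholder URI
                },
            }
        ]
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="machinelearning.seldon.io",
    version="v1",
    namespace="seldon",
    plural="seldondeployments",
    body=seldon_deployment,
)
```

Because the server is pre-built and framework-aware, the data scientist only supplies the artifact URI; the platform owns packaging, serving and metrics.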
13. The anatomy of e2e enterprise MLOps architectures
14. Canary Tests, Shadows & Rolling Updates
Why does this matter?
Safer, more robust testing in production with zero downtime, minimising risk.

[Diagram: canary lifecycle – Create (100% of traffic to the existing model), Canary (90%/10% traffic split), then Promote (100% to the new model) or Revert and Remove, with resource requests/limits and an autoscaling spec per deployment.]
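To make the canary step concrete, a hedged sketch extending the SeldonDeployment above: the per-predictor `traffic` field is Seldon Core's standard traffic-splitting mechanism, while the predictor names and model URIs are placeholders.

```python
# Two predictors on one SeldonDeployment: 90% of traffic stays on the
# current model, 10% goes to the canary. Promoting means shifting the
# weights to 100/0 and removing the old predictor; reverting is the inverse.
canary_spec = {
    "predictors": [
        {
            "name": "main",
            "replicas": 2,
            "traffic": 90,
            "graph": {
                "name": "classifier",
                "implementation": "SKLEARN_SERVER",
                "modelUri": "gs://my-bucket/sklearn/iris/v1",  # placeholder
            },
        },
        {
            "name": "canary",
            "replicas": 1,
            "traffic": 10,
            "graph": {
                "name": "classifier",
                "implementation": "SKLEARN_SERVER",
                "modelUri": "gs://my-bucket/sklearn/iris/v2",  # placeholder
            },
        },
    ]
}
```

A shadow follows the same shape but mirrors traffic to the new predictor without returning its responses, so it can be evaluated at zero user-facing risk.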
15. Tempo: open-source MLOps SDK for data scientists
https://github.com/SeldonIO/tempo
Powerful inference orchestration logic
● Create custom business logic for models (sketched below).
● Use any Python expressions/libraries to orchestrate component requests.

Data science friendly
● Allow any data science library to be used easily, e.g.:
  ○ Custom models
  ○ Alibi explainers
  ○ Alibi-Detect outlier models
  ○ Multi-armed bandits
● Local testing before hand-off to production
● Python-first, with output to YAML

Pluggable runtimes
● Extendable runtimes:
  ○ Seldon Deploy
  ○ Seldon Core
  ○ Docker with Seldon containers
  ○ KFServing
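A minimal sketch of Tempo's Python-first flow, adapted from the project README of the time; the exact import paths, model URIs and folders are assumptions and may have changed, so treat this as illustrative rather than canonical.

```python
import numpy as np
from tempo.serve.metadata import ModelFramework
from tempo.serve.model import Model
from tempo.serve.pipeline import PipelineModels
from tempo.serve.utils import pipeline, predictmethod

# Two pre-trained artifacts, each served by a framework-optimised runtime.
sklearn_model = Model(
    name="iris-sklearn",
    platform=ModelFramework.SKLearn,
    uri="s3://models/sklearn/iris",      # placeholder URI
    local_folder="./artifacts/sklearn",
)
xgboost_model = Model(
    name="iris-xgboost",
    platform=ModelFramework.XGBoost,
    uri="s3://models/xgboost/iris",      # placeholder URI
    local_folder="./artifacts/xgboost",
)

# Custom orchestration logic in plain Python: fall back to XGBoost
# whenever the scikit-learn model is not confident.
@pipeline(
    name="iris-classifier",
    uri="s3://models/classifier",        # placeholder URI
    models=PipelineModels(sklearn=sklearn_model, xgboost=xgboost_model),
)
class Classifier:
    @predictmethod
    def predict(self, payload: np.ndarray) -> np.ndarray:
        res = self.models.sklearn(input=payload)
        if res.max() > 0.9:
            return res
        return self.models.xgboost(input=payload)
```

The same class can be unit-tested locally, then serialised to YAML and handed to a runtime such as Seldon Core or KFServing without rewriting the logic.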
17. Case Study: Microsoft & Philips Clinical Drift Monitoring During Covid-19
• ICUs were having to make difficult decisions to optimize patient health outcomes.
• Built models to predict outcomes such as patient mortality, length of ventilation and length of stay.
• Challenges: catching changes to model performance; a time-intensive and computationally expensive training pipeline.
• The solution needed to be scalable, repeatable and secure: Azure Databricks, Azure DevOps and Alibi Detect.
18. Making your organisation proactive rather than reactive
– Service metrics
– Statistical performance
– Drift and outliers
– Explainability
19. Service Metrics
– Microservice metrics such as requests per second, latency, CPU usage, memory usage, etc.
– Performance monitoring leveraging Prometheus and ELK
– Seldon Deploy configures metrics with Prometheus (see the sketch below)
[Diagram: a production microservice exposes Model A over REST, gRPC and Kafka APIs, emitting request logs, tracing and model metrics derived from the model weights.]
Why does this matter?
Manage compute costs and response times associated with SLAs.
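A minimal sketch of custom metrics in a Seldon Core Python model wrapper: the optional `metrics()` hook, part of the Seldon Core Python wrapper, is scraped into Prometheus alongside the built-in request metrics. The class and metric names here are illustrative.

```python
import numpy as np

class Model:
    """Illustrative model class, wrapped by seldon-core-microservice."""

    def __init__(self):
        self.batch_size = 0

    def predict(self, X, features_names=None):
        X = np.asarray(X)
        self.batch_size = X.shape[0]
        return X  # placeholder: echoes the input instead of a real model

    def metrics(self):
        # Custom metrics exposed to Prometheus on each request,
        # alongside Seldon's built-in latency/throughput metrics.
        return [
            {"type": "COUNTER", "key": "predictions_total", "value": self.batch_size},
            {"type": "GAUGE", "key": "last_batch_size", "value": self.batch_size},
        ]
```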
20. Statistical Monitoring
– Monitor the impact on business KPIs
– Advanced metrics exposed directly by model servers
– Metrics can be calculated using “feedback” (see the sketch below)
– Custom metrics can be added by extending metrics servers
[Diagram: Model A sends inference data, which is stored and read by a metrics server; a feedback channel sends the correct label, and the metrics server computes statistical metrics. Requests are routed via CloudEvents on KNative infrastructure.]
Why does this matter?
Understand and monitor the impact on your business KPIs.
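A hedged sketch of sending feedback: Seldon Core deployments expose a feedback endpoint through which the ground-truth label for an earlier prediction can be reported, feeding metrics such as accuracy. The host, path and payload shape below follow the Seldon Core REST protocol as documented at the time, and should be checked against your version.

```python
import requests

# Feedback for a prior prediction: the original request, the model's
# response, and the ground-truth label that later became available.
feedback = {
    "request": {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}},
    "response": {"data": {"ndarray": [[0.92, 0.05, 0.03]]}},
    "truth": {"data": {"ndarray": [[1, 0, 0]]}},
}

r = requests.post(
    "http://<ingress-host>/seldon/seldon/iris/api/v1.0/feedback",  # placeholder URL
    json=feedback,
    timeout=10,
)
r.raise_for_status()
```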
21. Outlier Monitoring
– Detecting anomalies in data instances, and flagging/alerting on them
– Identifying potential metadata that could help diagnose outliers
– Do outliers indicate an issue with the model or with the data?
– Outlier detection runs as a separate component and can receive both input and prediction data from the model (see the sketch below)
[Diagram: Model A sends inference data, which is stored; an outlier detector server receives the model input data, stores outlier data, and makes the request plus outlier data available. Requests are routed via CloudEvents on KNative infrastructure.]
Why does this matter?
Outliers are more likely to have a negative impact if acted upon automatically.
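As an illustration, a minimal sketch using Alibi Detect's isolation-forest outlier detector (the library behind the outlier components discussed here); the data and threshold are placeholders.

```python
import numpy as np
from alibi_detect.od import IForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 4))            # placeholder training data
X_live = np.vstack([rng.normal(size=(9, 4)),    # normal traffic
                    [[8.0, -7.5, 9.1, -6.3]]])  # one obvious outlier

od = IForest(n_estimators=100)
od.fit(X_train)
# Pick a threshold such that ~1% of training data would be flagged.
od.infer_threshold(X_train, threshold_perc=99)

preds = od.predict(X_live)
print(preds["data"]["is_outlier"])  # 1 marks instances flagged as outliers
```

Run as a separate server, the same detector scores each inference request asynchronously and raises alerts without adding latency to the prediction path.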
22. Drift
– Over time, live data in production environments diverges from the process that generated the training data.
– Model performance in deployment no longer matches what was observed on held-out training data.
– The goal is to identify drift in the data distribution and in the relationship between inputs and outputs.
Why does this matter?
Model performance correlates directly with business value, and in some use cases with safety.
23. Challenges of online drift detection
– In production, data points arrive in sequence, and we need to detect drift as soon as possible
– So how do we decide whether fluctuations are due to drift or natural variation?
– Statistical hypothesis testing (sketched below)
– Windowing strategies
Why does this matter?
Detecting drift at the right time enables you to improve performance and reduce costs.
[Diagram: Model A sends inference data; a drift detector server receives the model input data and emits drift metrics. Requests are routed via CloudEvents on KNative infrastructure.]
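A minimal sketch of batch drift detection with Alibi Detect's Kolmogorov–Smirnov detector, which runs a two-sample hypothesis test per feature against a reference window; the data, window size and p-value here are placeholders.

```python
import numpy as np
from alibi_detect.cd import KSDrift

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(1000, 4))  # reference window (e.g. training data)

# Feature-wise two-sample KS tests, with multiple-testing correction
# applied across features at the given p-value.
cd = KSDrift(X_ref, p_val=0.05)

X_window = rng.normal(loc=0.5, size=(200, 4))  # shifted live window
preds = cd.predict(X_window, return_p_val=True)
print(preds["data"]["is_drift"])  # 1 if drift is detected for the window
print(preds["data"]["p_val"])     # per-feature p-values
```

The windowing strategy (how many live points to accumulate before testing) trades off detection latency against the statistical power of each test.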
25. Case Study: Explainability for Insurance
Context
Explainability is a critical requirement for all production models. Operations staff require models to be interpretable in order to justify algorithmic decisions.

Before Seldon Deploy
Advanced algorithms could not be deployed to production due to a lack of interpretability.

After Seldon Deploy
Improvements to claims automation and payments processing can be realised, as these models can now be made interpretable.
26. ML models are a black box
[Diagram: inputs such as credit applicant data or a medical image pass through an opaque model to produce outputs such as a lending decision (yes/no) or a medical diagnosis.]
27. Why explain machine learning models?
– Build trust in machine learning outputs
– Increase transparency
– Improve the customer experience
– Check for bias
– Give data scientists insights into how models are working
– Avoid damage to business reputation
– Meet regulatory requirements
Why does this matter?
Lack of explainability is one of the biggest blockers to production ML, and a major source of risk for organisations.
28. Explaining model predictions
Types of explanations
– By scope (local vs global)
– By model type (black-box vs white-box)
– By task (classification, regression, structured prediction)
– By data type (tabular, images, text…)
– By insight (feature attributions, counterfactuals, influential training instances…)
Image credit: Scott Lundberg (https://github.com/slundberg/shap)
Image credit: Barshan et al., RelatIF: Identifying Explanatory Training Examples via Relative Influence (2020)
29. How can we explain the black-box?
Anchors
Feature attribution: what input subsets are sufficient for a prediction to hold? [1]
[1] Ribeiro et al., Anchors: High-Precision Model-Agnostic Explanations (2018)
Image source: Alibi Explain repository home page
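A minimal sketch with Alibi's AnchorTabular explainer; the classifier, dataset and threshold below are placeholders.

```python
from alibi.explainers import AnchorTabular
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Black-box access only: the explainer just needs a predict function.
explainer = AnchorTabular(clf.predict, feature_names=list(data.feature_names))
explainer.fit(data.data, disc_perc=(25, 50, 75))

explanation = explainer.explain(data.data[0], threshold=0.95)
print(explanation.anchor)     # e.g. rules like 'petal width (cm) <= 0.80'
print(explanation.precision)  # fraction of perturbed samples keeping the prediction
```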
30. How can we explain the black-box?
Counterfactuals
How can you (minimally) change the input to obtain a desired prediction? [2, 3]
[2] Wachter et al., Counterfactual Explanations without Opening the Black Box (2017)
[3] Van Looveren A., Klaise J., Interpretable Counterfactual Explanations Guided by Prototypes (2019)
a) Images of digits minimally altered to change a classifier’s prediction
b) A person’s attributes minimally altered to change a classifier’s prediction (low income to high income)
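A hedged sketch of a prototype-guided counterfactual with Alibi [3]; the class name (`CounterFactualProto` in releases of that era), the placeholder model and the parameters are assumptions to check against your Alibi version.

```python
import numpy as np
import tensorflow as tf
from alibi.explainers import CounterFactualProto

tf.compat.v1.disable_v2_behavior()  # CFProto of this era runs in TF1 graph mode

# Placeholder Keras classifier over 4 features and 3 classes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])

X_train = np.random.default_rng(0).normal(size=(500, 4)).astype(np.float32)

# kd-trees over class-wise training data act as the prototypes guiding
# the search for a minimal, in-distribution change of the input.
cf = CounterFactualProto(model, shape=(1, 4), use_kdtree=True)
cf.fit(X_train)

explanation = cf.explain(X_train[:1])
if explanation.cf is not None:
    print(explanation.cf["X"])      # the minimally altered input
    print(explanation.cf["class"])  # the new predicted class
```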
34. Explainability Monitoring
– Explanations are most useful when paired with a monitoring system, e.g. to explain why an outlier may have occurred.
– View model explanations in the UI
– Trigger explanations for specific requests on demand
– Close integration with auditing
36. Critical infrastructure increasingly depends on ML systems
The impact of a bad solution can be worse than no solution at all
– Cybersecurity attacks
– Misuse of personal data
– Software outages
– Algorithmic bias
37. Varying strategies at a national level
38. Mapping Global AI Ethics
Harvard, 2020. Principled Artificial Intelligence. Available at: https://cyber.harvard.edu/publication/2020/principled-ai [accessed 21 October 2020].
39. EU AI Regulation
What does it mean?
– Emphasis on “trustworthy AI”
– Categorising risk: regulating “high risk” AI (e.g. autonomous driving) and prohibiting some uses outright (e.g. mass social scoring).
– Currently focuses more on end-to-end systems, which would apply to the platforms and applied AI projects built within organisations.
– Post-market monitoring of AI systems to evaluate continued compliance with the regulation.
Timespan: expect around two years, given EU leaders want it fast-tracked.
40. Principles for Trusted AI
The 8 LFAI Principles for Trusted AI, (R)REPEATS:
– Robustness
– Reproducibility
– Equitability
– Privacy
– Explainability
– Accountability
– Transparency
– Security
Adopted by open-source projects.
41. Alignment between capabilities and governance, compliance & AI ethics
[Matrix: the eight principles mapped to platform capabilities, including model metadata, request logging, language wrappers, OpenAPI schema APIs, pre-packaged servers, out-of-the-box Prometheus metrics, explainer components, metrics monitoring, RBAC via service accounts, namespaced access, auth via ingress, historical feedback labelling and GitOps integration.]
42. Programmatic governance with open & closed source as policy
1. Ethics frameworks, principles, guidelines (e.g. the LF AI Principles)
2. Regulation, compliance, organisational policy (GDPR, ISO, etc.)
3. Open & closed source tools & frameworks
Ensuring principles by design, which can map up into higher-level organisational principles and policies.
43. Model Metadata Store
Discover: “find available models”
Enrich: “add metadata to models”
Lineage/Audit: “check model history”

[Diagram: the Deploy metadata store syncs with GitOps, with an artifact store via automated metadata extraction from artifacts, and with external customer metadata stores; it covers the model, explainer, drift detector and outlier detector components.]
Why does this matter?
Ensure proper governance, auditing and discoverability of models, for better compliance and risk management.
46. Final thoughts
– As practitioners, we have a growing professional responsibility to our craft
– Democratisation through COSS and cloud-native tools
– Engage your peers in discussions about Responsible AI
– Map Trusted AI principles to your roadmap requirements
47. Get access to production machine learning at scale
– Connect with us at #CogX2021
– Product demos at our virtual booth
– Free trials for delegates