Practical Considerations for Interactive AI: Robustness, Privacy,
Fairness, Transparency
Tom Diethe
tdiethe@amazon.com
Interactive AI CDT Winter School
January 29 2020
Outline
1 Interactive AI at Amazon
2 Robustness & Transparency via Continual Learning
Bayesian Continual Learning
Continual Learning in Practice
3 Algorithmic Privacy
Differential Privacy
Privacy for Text
Experiments on Text Data
Optimizing the Privacy Utility Trade-off
DPareto experiments
4 Algorithmic Fairness
5 Summary
Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 1 / 44
Interactive AI at Amazon
Alexa AI
What is Alexa?
A cloud-based voice service that can help
you with tasks, entertainment, general
information, shopping, and more
The more you talk to Alexa, the more
Alexa adapts to your speech patterns,
vocabulary, and personal preferences
How do we ensure that ...
we create robust and efficient AI systems?
the privacy of customer data is safeguarded?
customers are treated fairly by ML algorithms?
Failure Modes
Unintentional failures: ML system produces a formally correct but completely unsafe
outcome
Outliers/anomalies
Dataset shift
Limited memory
Intentional failures: failure is caused by an active adversary attempting to subvert the
system to attain her goals, such as to:
misclassify the result
infer private training data
steal the underlying algorithm
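Strict stationarity: the joint distribution is invariant to shifts in time,
$$F_X(x_{t_1}, \ldots, x_{t_n}) = F_X(x_{t_1+\tau}, \ldots, x_{t_n+\tau})$$
for all $\tau, t_1, \ldots, t_n$ and for all $n \in \mathbb{N}$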
Sagemaker
Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in shopping)
New tasks may emerge (e.g. new product categories, new marketplaces)
Robustness: How can we adapt to new data whilst retaining existing knowledge?
Transparency: How can we build systems that can signal when they’re going wrong?
Standard approaches:
Train individual models on each task, then train a combination of them
Maintain a single model and use regularization to fix influential parameters
Bayesian Continual Learning [Nguyen 2018]
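Given e.g. data in task $t$ as $D_t = \{(x_t^{(n_t)}, y_t^{(n_t)})\}_{n_t=1}^{N_t}$, and parameters $\theta$ (e.g. BLR, BNN, GP ...)
$$
\begin{aligned}
p(\theta \mid D_{1:T}) &\propto p(\theta)\, p(D_{1:T} \mid \theta) \\
&= p(\theta) \prod_{t=1}^{T} \prod_{n_t=1}^{N_t} p\big(y_t^{(n_t)} \mid \theta, x_t^{(n_t)}\big) \\
&\propto p(\theta \mid D_{1:T-1})\, p(D_T \mid \theta).
\end{aligned}
$$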
Natural recursive algorithm!
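A minimal sketch of the recursion, assuming Bayesian linear regression (BLR) with a known noise variance so that every posterior is Gaussian and available in closed form. The helper `blr_update`, the synthetic tasks and the prior are illustrative assumptions, not part of the talk. Using yesterday's posterior as today's prior should give the same answer as one batch update on all tasks.

```python
import numpy as np

def blr_update(mean, precision, X, y, noise_var=1.0):
    """Gaussian prior N(mean, precision^-1) plus data (X, y) -> Gaussian posterior."""
    new_precision = precision + X.T @ X / noise_var
    new_mean = np.linalg.solve(new_precision, precision @ mean + X.T @ y / noise_var)
    return new_mean, new_precision

rng = np.random.default_rng(0)
d = 3
true_theta = np.array([1.0, -2.0, 0.5])  # hypothetical ground truth

# Four synthetic "tasks" D_1, ..., D_4 (illustrative data only)
tasks = []
for _ in range(4):
    X = rng.normal(size=(20, d))
    y = X @ true_theta + rng.normal(size=20)
    tasks.append((X, y))

prior_mean, prior_precision = np.zeros(d), np.eye(d)

# Recursive route: p(theta | D_{1:t}) is proportional to p(theta | D_{1:t-1}) p(D_t | theta)
seq_mean, seq_precision = prior_mean, prior_precision
for X, y in tasks:
    seq_mean, seq_precision = blr_update(seq_mean, seq_precision, X, y)

# Batch route: p(theta | D_{1:T}) is proportional to p(theta) p(D_{1:T} | theta)
X_all = np.vstack([X for X, _ in tasks])
y_all = np.concatenate([y for _, y in tasks])
batch_mean, batch_precision = blr_update(prior_mean, prior_precision, X_all, y_all)

print(np.allclose(seq_mean, batch_mean))
```

For models without conjugate posteriors (e.g. BNNs) the recursion has to be approximated, which is where the difficulty lies.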
Generative models in continual learning
Generative models in continual learning. Task i consists of items of class i and generated samples from the previous task;
the goal is to generate samples from all previously seen classes
Why is this Useful?
Figure: Fashion-MNIST examples generated by a Wasserstein GAN in Bayesian continual learning
Generative models play an important role in mitigating catastrophic forgetting, as they can be used to generate samples of previous tasks [Wu 2018], a method known as generative replay
For deep learning models this is a form of transparency: a window onto what the model has learnt
Engineering a Continual Learning System
Automating Data Retention Policies:
Sketcher/Compressor: when the data rate is too high
Joiner: when labels arrive late
Shared infrastructure: optimal use of space, like an OS cache
Automating Monitoring and Quality Control:
Data monitoring: dataset shift detection, anomaly detection
Prediction monitoring: monitor performance of models
Automating the ML Life-Cycle:
Trainer and HPO: store provenance, warm start training
Model policy engine: ensure re-training performed at right cadence
“Zero-Touch” Machine Learning
[Figure: architecture diagram. Components: Streams, Sketcher/Sampler, Joiner, Data Monitoring (data statistics, anomaly detection, distribution shift measurement), Trainer, HPO, Model Policy Engine (retrain, rollback), Predictor, Prediction Monitoring (prediction statistics, accuracy, shift), Business Logic (business metrics, costs, desired accuracy), System State DB, Diagnostic Logs, and Shared Infrastructure (Model DB, Training Data Reservoir, Validation Data Reservoir)]
Summary: Continual Learning
Continual Learning
Bayesian methods are a natural fit for continual learning
However it’s tricky to make them work well with deep learning methods
Engineering viewpoint is also required
A first attempt: Can’t I just anonymize my data?
k-anonymity: the information for each person cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release
Suppose a company is audited for salary discrimination
The auditor can see salaries by gender, age and nationality for each department and office
If the auditor has a friend, an ex, or a date working for the company, she will learn the salary of that person
Reducing data granularity reduces the risk, but also reduces accuracy (fidelity in this case)
Office   Dept.  Salary  D.O.B.     Nationality  Gender
London   IT     £#####  May 1985   Portuguese   Female   (original record)
UK       IT     £#####  1980-1985  -            Female   (after reducing granularity)
This still presents a risk of re-identification! If there are 10 females born between 1980 and 1985 in the whole of the UK’s IT department, 9 of them could conspire to learn the salary of the 10th
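A small sketch of how the k in k-anonymity could be measured: group the released records by their quasi-identifiers and take the size of the smallest group. The function name `k_anonymity` and all records below are hypothetical, invented to mirror the table above.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

QUASI_IDENTIFIERS = ["office", "dept", "dob", "nationality", "gender"]

# Hypothetical fine-grained release: every record is unique
fine = [
    {"office": "London", "dept": "IT", "dob": "May 1985", "nationality": "Portuguese", "gender": "Female"},
    {"office": "London", "dept": "IT", "dob": "Jan 1983", "nationality": "British", "gender": "Female"},
    {"office": "Leeds", "dept": "IT", "dob": "Mar 1981", "nationality": "French", "gender": "Female"},
]

# The same records after reducing granularity, as in the second table row
coarse = [
    {"office": "UK", "dept": "IT", "dob": "1980-1985", "nationality": "-", "gender": "Female"}
    for _ in fine
]

k_fine = k_anonymity(fine, QUASI_IDENTIFIERS)      # each person is identifiable
k_coarse = k_anonymity(coarse, QUASI_IDENTIFIERS)  # each person hides among the group
print(k_fine, k_coarse)
```

A larger k does not remove the collusion risk described above: k − 1 members of a group can still pool what they know about themselves.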
Anonymized Data Isn’t
Example 1: In the mid-1990s, the Massachusetts “Group Insurance Commission” (GIC) released “anonymized” data on state employees that showed every hospital visit
The goal was to help researchers. All obvious identifiers such as name, address, and social security number were removed
MIT PhD student Latanya Sweeney decided to attempt to reverse the anonymization and requested a copy of the data
Reidentification
William Weld, then Governor of Massachusetts, assured the public that GIC had protected
patient privacy by deleting identifiers. Sweeney started hunting for the Governor’s hospital
records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts,
population 54,000 and 7 ZIP codes. For $20, she purchased the complete voter rolls from the
city of Cambridge, containing the name, address, ZIP code, birth date, and gender of every
voter. Crossing this with the GIC records, Sweeney found Governor Weld with ease: Only 6
people shared his birth date, only 3 of them men, and of them, only he lived in his ZIP code.
Sweeney sent the Governor’s health records (including diagnoses and prescriptions) to his office.
Anonymized Data Isn’t
Example 2: In 2006, Netflix released data pertaining to how 500,000 of its users rated
movies over a six-year period
Netflix “anonymized” the data before releasing it by removing usernames, but assigned
unique identification numbers to users in order to allow for continuous tracking of user
ratings and trends
Reidentification
Researchers used this information to uniquely identify individual Netflix users by crossing the
data with the public IMDB database. According to the study, if a person has information about
when and how a user rated six movies, that person can identify 99% of people in the Netflix
database.
Differential Privacy
A randomised mechanism $M : X \to Y$ is $\varepsilon$-differentially private if for all neighbouring inputs $x \simeq x'$ (i.e. $\|x - x'\|_1 = 1$) and for all sets of outputs $E \subseteq Y$ we have
$$P[M(x) \in E] \le e^{\varepsilon}\, P[M(x') \in E]$$
[Figure: output densities of $M(D)$ and $M(D')$ for neighbouring datasets; their ratio is bounded by $e^{\varepsilon}$]
Mechanisms:
Randomised response → plausible deniability
Laplace mechanism: e.g. $\tilde{\mu} = \mu + \xi$, $\xi \sim \mathrm{Lap}\big(\tfrac{1}{\varepsilon n}\big)$
Output perturbation
...
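The Laplace mechanism for a mean can be sketched as follows. This assumes every value lies in a known interval `[lower, upper]`, that neighbouring datasets differ in one record, and that `n` is public. Under those assumptions the mean changes by at most `(upper - lower) / n`, which for values in [0, 1] recovers the scale 1/(εn). The function name `private_mean` is hypothetical.

```python
import numpy as np

def private_mean(values, epsilon, rng, lower=0.0, upper=1.0):
    """Release the mean of bounded values with Laplace noise calibrated to epsilon."""
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)  # max change from altering one record
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(values.mean() + noise)

rng = np.random.default_rng(0)
data = rng.uniform(size=1000)  # illustrative data in [0, 1]
noisy = private_mean(data, epsilon=1.0, rng=rng)
print(data.mean(), noisy)
```

Smaller values of `epsilon` give stronger privacy and a noisier estimate; the noise also shrinks as `n` grows.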
Randomized Response [Warner ’65]
Say you want to release a bit x ∈ {Yes, No}. Do the following:
1 flip a coin
2 if tails, respond truthfully with x
3 if heads, flip a second coin and respond “Yes” if heads; respond “No” if tails
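Claim: the above algorithm satisfies $(\log 3)$-differential privacy
$$\frac{\Pr[\text{Response} = \text{Yes} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{Yes} \mid x = \text{No}]} = \frac{1/2 \times 1 + 1/2 \times 1/2}{1/2 \times 0 + 1/2 \times 1/2} = \frac{3/4}{1/4} = 3 \implies e^{\varepsilon} = 3$$
The same bound covers the “No” response: $\frac{\Pr[\text{Response} = \text{No} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{No} \mid x = \text{No}]} = \frac{1/4}{3/4} = \frac{1}{3}$, whose reciprocal is again 3.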
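The two-coin protocol and the ratio calculation above can be checked with a short sketch. The function names are illustrative; `estimate_yes_fraction` is an added assumption showing how an analyst could still recover the population rate from the noisy answers.

```python
import math
import random

def randomized_response(truth, rng):
    """Warner-style response: truthful on tails, otherwise a second fair coin decides."""
    if rng.random() < 0.5:      # first coin is tails: answer truthfully
        return truth
    return rng.random() < 0.5   # first coin is heads: second coin picks Yes/No

def response_probability(response, truth):
    """Exact Pr[Response = response | x = truth] for the protocol above."""
    p_yes = 0.5 * (1.0 if truth else 0.0) + 0.5 * 0.5
    return p_yes if response else 1.0 - p_yes

# Privacy level implied by the worst-case likelihood ratio
epsilon = math.log(response_probability(True, True) / response_probability(True, False))

def estimate_yes_fraction(responses):
    """Debias the observed rate: E[observed] = 1/4 + p/2, so p = 2 * observed - 1/2."""
    observed = sum(responses) / len(responses)
    return 2.0 * observed - 0.5

rng = random.Random(0)
truths = [i < 30_000 for i in range(100_000)]  # hypothetical population, 30% "Yes"
responses = [randomized_response(t, rng) for t in truths]
print(epsilon, estimate_yes_fraction(responses))
```

Each individual answer is deniable, yet the aggregate estimate stays close to the true fraction because the added noise has a known distribution.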
Important Properties
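Robustness to post-processing: if $M$ is $(\varepsilon, \delta)$-DP, then $f(M)$ is $(\varepsilon, \delta)$-DP
Composition: if each $M_i$ is $(\varepsilon_i, \delta_i)$-DP for $i = 1, \ldots, n$, then $g(M_1, \ldots, M_n)$ is $\big(\sum_{i=1}^{n} \varepsilon_i, \sum_{i=1}^{n} \delta_i\big)$-DP
Protects against arbitrary side knowledge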
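The composition property suggests a simple way to keep track of a privacy budget: add up the parameters of every release made on the same data. A minimal sketch, where `compose` and the example budgets are hypothetical:

```python
def compose(budgets):
    """Basic sequential composition: sum the (epsilon, delta) pairs of each mechanism."""
    total_epsilon = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_epsilon, total_delta

# Three hypothetical releases on the same dataset
releases = [(0.5, 1e-6), (0.25, 1e-6), (1.0, 0.0)]
total_epsilon, total_delta = compose(releases)
print(total_epsilon, total_delta)
```

This is the basic bound stated on the slide; tighter accounting methods exist but are not sketched here.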
User-AI system interaction via natural language
User’s goal: meet some specific need with respect to an
issued query x
Agent’s goal: satisfy the user’s request
Privacy violation: occurs when x is used to make personal
inferences, e.g. when unrestricted PII is present
Mechanism: Modify the query to protect privacy whilst
preserving semantics
Our approach: Metric Differential Privacy
Desired Functionality
Intent              Query x                         Modified Query x'
GetWeather          Will it be colder in Cleveland  Will it be colder in Ohio
PlayMusic           Play Cantopop on lastfm         Play C-pop on lastfm
BookRestaurant      Book a restaurant in Milladore  Book a restaurant in Wood County
SearchCreativeWork  I want to watch Manthan film    I want to watch Hindi film
Word Embeddings
Mapping from words into vectors of real numbers (many ways to do this!)
e.g. Neural network based models (e.g. Word2Vec, GloVe, fastText)
Defines a mapping $\phi : W \to \mathbb{R}^n$
Nearest neighbours are often synonyms
Metric Differential Privacy
Recall the definition of DP ...
$$P[M(x) \in E] \le e^{\varepsilon}\, P[M(x') \in E] \quad \text{for } x, x' \in X \text{ s.t. } \|x - x'\|_1 = 1$$
This can be rewritten into a single equation as:
$$\frac{P[M(x) \in E]}{P[M(x') \in E]} \le e^{\varepsilon \|x - x'\|_1}$$
Metric differential privacy generalises this to use any valid metric $d(x, x')$:
$$\frac{P[M(x) \in E]}{P[M(x') \in E]} \le e^{\varepsilon\, d(x, x')}$$
(easy to see that standard DP is metric DP with $d(x, x') = \|x - x'\|_1$)
Privacy in the Space of Word Embeddings [Feyisetan 2019, Feyisetan 2020]
Given:
$w \in W$: word to be “privatised” from word space $W$ (dictionary)
$\phi : W \to Z$: embedding function from word space to embedding space $Z$ (e.g. $\mathbb{R}^n$)
$v = \phi(w)$: corresponding word vector
$d : Z \times Z \to \mathbb{R}$: distance function in embedding space
$\Omega(\varepsilon)$: the D.P. noise sampling distribution (e.g. $\Omega_i(\varepsilon) = \mathrm{Lap}\big(\tfrac{1}{\varepsilon n}\big)$, $i = 1, \ldots, n$ for $\mathbb{R}^n$)
Metric DP Mechanism for word embeddings
1 Perturb the word vector: $v' = v + \xi$ where $\xi \sim \Omega(\varepsilon)$
2 The new vector $v'$ will not be a word (a.s.)
3 Project back to $W$: $w' = \arg\min_{w \in W} d(v', \phi(w))$, return $w'$
What do we need?
$d$ satisfies the axioms of a metric (nonnegative, indiscernibles, symmetry, triangle)
A way to sample using $\Omega$ in the metric space that respects $d$ and gives us $\varepsilon$-metric DP
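The three steps above can be sketched on a toy example. Everything concrete here is an assumption for illustration: the five-word vocabulary, the made-up 2-D embeddings, the choice of the L1 distance for `d`, and independent Laplace noise with scale 1/ε per coordinate. That noise has density proportional to exp(-ε‖z − v‖₁), so the noisy vector satisfies ε-metric DP with respect to L1, and the nearest-neighbour projection is post-processing. This is not necessarily the sampling scheme used in the cited papers.

```python
import numpy as np

# Hypothetical toy vocabulary with made-up 2-D embeddings
VOCAB = ["cleveland", "ohio", "columbus", "paris", "france"]
EMBEDDINGS = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [1.1, 1.1],
    [-1.0, 0.5],
    [-1.1, 0.6],
])

def privatise_word(word, epsilon, rng):
    """Perturb the word vector, then project back to the nearest vocabulary word."""
    v = EMBEDDINGS[VOCAB.index(word)]                      # v = phi(w)
    noisy = v + rng.laplace(scale=1.0 / epsilon, size=v.shape)  # step 1: v' = v + xi
    distances = np.abs(EMBEDDINGS - noisy).sum(axis=1)     # L1 distance to every word
    return VOCAB[int(np.argmin(distances))]                # step 3: nearest word w'

rng = np.random.default_rng(0)
samples = [privatise_word("cleveland", epsilon=5.0, rng=rng) for _ in range(10)]
print(samples)
```

With a large `epsilon` the word is almost always returned unchanged; as `epsilon` shrinks, nearby words and eventually distant ones become likely outputs.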
[Figure: the trade-off between utility and privacy]
Example: Differentially Private SGD
Algorithm 1: Differentially Private SGD
Input: dataset $z = (z_1, \ldots, z_n)$
Hyperparameters: learning rate $\eta$, mini-batch size $m$, number of epochs $T$, noise variance $\sigma^2$, clipping norm $L$
Initialize $w \leftarrow 0$
for $t \in [T]$ do
    for $k \in [n/m]$ do
        Sample $S \subset [n]$ with $|S| = m$ uniformly at random
        Let $g \leftarrow \frac{1}{m} \sum_{j \in S} \mathrm{clip}_L(\nabla \ell(z_j, w)) + \frac{2L}{m} \mathcal{N}(0, \sigma^2 I)$
        Update $w \leftarrow w - \eta g$
return $w$
5+ hyper-parameters affecting both privacy and utility
For deep learning applications we only have empirical utility (not analytic)
How do we find the hyperparameters that give us an optimal trade-off?
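Algorithm 1 can be sketched in NumPy as below. The squared loss on a linear model, the synthetic data and the function names are assumptions made for illustration; the loop structure, clipping and noise term follow the pseudocode. The sketch does no privacy accounting, so it does not report the ε that a given choice of hyperparameters achieves.

```python
import numpy as np

def clip(gradient, clip_norm):
    """Rescale a gradient so that its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(gradient)
    return gradient if norm <= clip_norm else gradient * (clip_norm / norm)

def dp_sgd(X, y, lr, batch_size, epochs, sigma, clip_norm, rng):
    """DP-SGD for least squares: per-example clipping plus Gaussian noise."""
    n, d = X.shape
    w = np.zeros(d)                                   # Initialize w <- 0
    for _ in range(epochs):                           # for t in [T]
        for _ in range(n // batch_size):              # for k in [n/m]
            S = rng.choice(n, size=batch_size, replace=False)
            residuals = X[S] @ w - y[S]
            per_example = residuals[:, None] * X[S]   # gradient of 0.5 * (x.w - y)^2
            clipped = np.array([clip(g, clip_norm) for g in per_example])
            noise = (2 * clip_norm / batch_size) * sigma * rng.standard_normal(d)
            g = clipped.mean(axis=0) + noise
            w = w - lr * g                            # Update w <- w - eta * g
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])                   # hypothetical ground truth
X = rng.normal(size=(200, 3))
y = X @ w_true
w_private = dp_sgd(X, y, lr=0.1, batch_size=20, epochs=20,
                   sigma=1.0, clip_norm=1.0, rng=rng)
print(w_private)
```

Even in this small example, `lr`, `batch_size`, `epochs`, `sigma` and `clip_norm` all change both the accuracy of `w_private` and the privacy guarantee, which is the tuning problem raised above.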
The Privacy-Utility Pareto Front
[Figure: points in hyper-parameter space mapped to the plane of privacy loss (x-axis) against error (y-axis), with the Pareto-optimal points highlighted]
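A point is Pareto-optimal when no other point is at least as good on both objectives and strictly better on one. A minimal sketch, assuming both privacy loss and error are to be minimised; the function name and the example points are hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (both objectives minimised)."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical (privacy loss, error) pairs from different hyperparameter settings
evaluated = [(0.1, 0.9), (0.5, 0.4), (1.0, 0.3), (1.0, 0.5), (2.0, 0.35), (5.0, 0.1)]
front = pareto_front(evaluated)
print(front)
```

Here `(1.0, 0.5)` and `(2.0, 0.35)` are dropped because another setting gives lower error at no greater privacy loss.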
Bayesian Optimization
Gradient-free optimization for black-box functions
Widely used in applications (HPO in ML, scheduling & planning, experimental design ...)
In multi-objective problems, BO aims to learn the Pareto front with a minimal number of
evaluations.
DPareto
Repeat:
1 For each objective (privacy, utility):
    1 Fit a surrogate model (Gaussian process (GP)) using the available dataset
    2 Calculate the predictive distribution using the GP mean and variance functions
2 Use the posterior of the surrogate models to form an acquisition function
3 Collect the next point at the estimated global max. of the acquisition function
until budget exhausted
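The loop above can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the DPareto implementation: the one-dimensional "noise level" input, the `toy_objectives` function, the RBF kernel settings and the candidate grid are all hypothetical, and the acquisition used here (hypervolume gain of an optimistic mean-minus-std prediction) is a simplified stand-in for whatever acquisition the authors use.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (privacy loss, error); both minimized


def pareto_front(points: Sequence[Point]) -> List[Point]:
    front, best = [], float("inf")
    for p in sorted(set(points)):
        if p[1] < best:  # strictly better second objective => non-dominated
            front.append(p)
            best = p[1]
    return front


def hypervolume_2d(front: Sequence[Point], ref: Point) -> float:
    hv, prev = 0.0, ref[1]
    for x, y in front:
        if x < ref[0] and y < prev:
            hv += (ref[0] - x) * (prev - y)
            prev = y
    return hv


def rbf(a: float, b: float, length: float = 0.2) -> float:
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def gp_posterior(xs, ys, xq, noise=1e-4) -> Tuple[float, float]:
    """GP regression on centred targets; returns (mean, std) at query xq."""
    n = len(xs)
    y_bar = sum(ys) / n
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]  # Cholesky factor of K
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]

    def forward(b):  # solve L z = b by forward substitution
        z = []
        for i in range(n):
            z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        return z

    z_y = forward([y - y_bar for y in ys])
    z_k = forward([rbf(xq, x) for x in xs])
    mean = y_bar + sum(a * b for a, b in zip(z_k, z_y))
    var = max(rbf(xq, xq) - sum(a * a for a in z_k), 0.0)
    return mean, math.sqrt(var)


def acquisition(x, xs, obs, current_hv, ref, beta=1.0) -> float:
    """Hypervolume gain if the optimistic prediction at x were observed."""
    optimistic = []
    for k in range(2):  # one GP surrogate per objective
        mean, std = gp_posterior(xs, [o[k] for o in obs], x)
        optimistic.append(mean - beta * std)
    return hypervolume_2d(pareto_front(list(obs) + [tuple(optimistic)]), ref) - current_hv


def dpareto_sketch(objectives: Callable[[float], Point], candidates, init_x, budget, ref):
    xs = list(init_x)
    obs = [objectives(x) for x in xs]
    for _ in range(budget):
        hv = hypervolume_2d(pareto_front(obs), ref)
        remaining = [c for c in candidates if c not in xs]
        nxt = max(remaining, key=lambda c: acquisition(c, xs, obs, hv, ref))
        xs.append(nxt)
        obs.append(objectives(nxt))
    return xs, obs


def toy_objectives(noise_level: float) -> Point:
    """Hypothetical stand-in for 'train a private model, report (epsilon, error)'."""
    return (2.0 * (1.0 - noise_level) ** 2, 0.1 + 0.8 * noise_level ** 2)


grid = [i / 20 for i in range(21)]
xs, obs = dpareto_sketch(toy_objectives, grid, [0.0, 0.5, 1.0], budget=6, ref=(2.5, 1.2))
print(pareto_front(obs))
```

In a real setting each call to the objective function would be an expensive training run with a privacy accountant, which is what makes a sample-efficient search worthwhile.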
DPareto vs Random Sampling
[Figure, three panels. Hypervolume Evolution: Pareto-front (PF) hypervolume against number of sampled points, for MLP1 and MLP2 under random sampling (RS) and Bayesian optimization (BO). MLP2 Pareto Fronts: classification error against ε (log scale) for the initial points, +256 RS and +256 BO. LogReg+SGD Samples: classification error against ε (log scale) for 1500 RS and 256 BO samples.]
Summary: Privacy Enhancing Technologies
Privacy
Privacy risks can be counter-intuitive and tricky to formalize
High-dimensional data and side knowledge make privacy hard
Semantic guarantees (e.g. DP) behave better than syntactic ones (e.g.
k-anonymization)
Differential privacy is a mature privacy-enhancing technology
Metric DP provides local plausible deniability; accuracy can be good even in
cases with an infinite number of outcomes
Empirical privacy-utility trade-off evaluation enables application-specific decisions
Bayesian optimization provides a computationally efficient method to recover the
Pareto front (esp. with a large number of hyper-parameters)
The Need for Algorithmic Fairness
Risks:
1 ML predictors might discriminate against groups of individuals protected by law or by ethics
2 Choosing a model that minimizes the expected loss may be good for the majority population,
but overlook the minority populations
Examples: image classification [Buolamwini & Gebru, 2018] and natural language tasks
[Bolukbasi et al., 2016]
Causes:
1 Training data may contain biases
2 The analysis of the training data may inadvertently introduce biases
Unlike privacy, there is no single agreed-upon definition of fairness!
Statistical Bias
Definition: The difference between an estimator’s expected value and the true value
Is statistical bias an adequate fairness criterion?
“The model summarises the data correctly, if the data is biased it’s not the algorithm’s
fault”
Says nothing about the distribution of errors (variance of estimator)
Biases are inevitable! Take ownership ...
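For reference, the definition above and the standard decomposition of mean squared error can be written as follows, with $\hat{\theta}$ denoting the estimator and $\theta$ the true value (symbols introduced here for illustration). The decomposition shows why zero bias alone says nothing about how errors are spread: the variance term is unconstrained.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta,
\qquad
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \operatorname{Bias}(\hat{\theta})^2 + \operatorname{Var}(\hat{\theta}).
```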
Calibration
Calibrated Classifier [Dawid 1982]
“a forecaster is well calibrated if, for example, of those events to which he assigns a
probability 30 percent, the long-run proportion that actually occurs turns out to be 30
percent”
α-Accuracy: If we do not want a predictor f to downplay a subset S ⊆ X, we require it to be
(approx.) unbiased over S for some small α ∈ [0, 1]:
$\left| \mathbb{E}_{i \sim S}[f_i - p^*_i] \right| \le \alpha$
α-Calibration: for any v ∈ [0, 1], let $S_v = \{i \in S : f_i = v\}$, then:
$\left| \mathbb{E}_{i \sim S_v}[f_i - p^*_i] \right| = \left| v - \mathbb{E}_{i \sim S_v}[p^*_i] \right| \le \alpha$
i.e. we are calibrated on all but a small fraction of items, controlled by α.
Weakness: Guarantees too coarse. E.g. assign every member in S the value $\mathbb{E}_{i \sim S}[p^*_i]$.
This is perfectly calibrated, but “qualified” members of S with large $p^*_i$ will be hurt.
Typically this is applied over large disjoint sets - e.g. race or gender.
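The weakness can be checked numerically. The sketch below is a simplified reading of the two definitions (it checks the bound on every level set, without the "all but a small fraction" relaxation), and the four-person population and its true probabilities `p_star` are made-up values for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def accuracy_gap(f: Sequence[float], p_star: Sequence[float], subset: Sequence[int]) -> float:
    """|E_{i in S}(f_i - p*_i)|: the quantity bounded by alpha-accuracy."""
    return abs(sum(f[i] - p_star[i] for i in subset) / len(subset))


def calibration_gaps(f: Sequence[float], p_star: Sequence[float],
                     subset: Sequence[int]) -> Dict[float, float]:
    """For each predicted value v, |v - E_{i in S_v} p*_i| over the level set S_v."""
    level_sets: Dict[float, List[float]] = defaultdict(list)
    for i in subset:
        level_sets[f[i]].append(p_star[i])
    return {v: abs(v - sum(ps) / len(ps)) for v, ps in level_sets.items()}


# Toy subset S: two "qualified" members (p* = 1) and two others (p* = 0).
p_star = [1.0, 1.0, 0.0, 0.0]
subset = [0, 1, 2, 3]

# Coarse predictor: everyone in S receives the subset average of p*.
average = sum(p_star[i] for i in subset) / len(subset)
f = [average] * len(p_star)

print(accuracy_gap(f, p_star, subset))      # zero gap: unbiased over S
print(calibration_gaps(f, p_star, subset))  # zero gap on the single level set
print(p_star[0] - f[0])                     # yet qualified members are under-scored
```

Both gaps are zero for this constant predictor, yet each qualified member is scored well below their true probability, which is the motivation for a finer-grained notion.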
Multicalibration [Hébert-Johnson 2018]
Stronger notion: ensure calibration on every subpopulation (including qualified members
from before). But ... requires perfect predictions!
Need an intermediary definition that balances protecting subgroups vs information
bottleneck of small samples
Multicalibration Definition
“A predictor f is multicalibrated w.r.t. a family of subpopulations C if it is
calibrated w.r.t. every S ∈ C”, where C are computationally-identifiable subsets
Let $C \subseteq 2^X$ be a collection of subsets of X and α ∈ [0, 1]. A predictor f is
(C, α)-multicalibrated if for all S ∈ C, f is α-calibrated w.r.t. S.
Think of C as a collection of subpopulations where set membership can be determined
efficiently, e.g. through boolean operations or by small decision trees
C can be quite rich, with many overlapping subgroups of a protected group S
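A minimal sketch of checking (C, α)-multicalibration, assuming the true probabilities are known (which they are not in practice) and using the simplified per-level-set calibration bound. The population, the attribute names `group` and `degree`, and the membership tests that define C are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def calibration_gap(f: Sequence[float], p_star: Sequence[float], subset: Sequence[int]) -> float:
    """Largest |v - mean of p* over {i in subset : f_i = v}| across predicted values v."""
    level_sets: Dict[float, List[float]] = defaultdict(list)
    for i in subset:
        level_sets[f[i]].append(p_star[i])
    return max(abs(v - sum(ps) / len(ps)) for v, ps in level_sets.items())


def multicalibration_violations(f: Sequence[float], p_star: Sequence[float],
                                collection: Dict[str, List[int]],
                                alpha: float) -> Dict[str, float]:
    """Subpopulations in C whose calibration gap exceeds alpha (empty dict = multicalibrated)."""
    gaps = {name: calibration_gap(f, p_star, s) for name, s in collection.items() if s}
    return {name: gap for name, gap in gaps.items() if gap > alpha}


# Hypothetical population with two attributes and made-up true probabilities.
people = [
    {"group": "a", "degree": True},
    {"group": "a", "degree": True},
    {"group": "a", "degree": False},
    {"group": "a", "degree": False},
    {"group": "b", "degree": True},
    {"group": "b", "degree": False},
]
p_star = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]

# C is defined by simple boolean tests on the attributes, so membership is cheap to decide.
membership_tests: Dict[str, Callable[[dict], bool]] = {
    "group a": lambda r: r["group"] == "a",
    "group b": lambda r: r["group"] == "b",
    "group a with degree": lambda r: r["group"] == "a" and r["degree"],
}
collection = {name: [i for i, r in enumerate(people) if test(r)]
              for name, test in membership_tests.items()}

coarse = [0.5] * len(people)  # calibrated on each group as a whole
print(multicalibration_violations(coarse, p_star, collection, alpha=0.1))
print(multicalibration_violations(p_star, p_star, collection, alpha=0.1))
```

The coarse predictor passes on both top-level groups but fails on the overlapping subgroup, illustrating why a richer C gives stronger protection.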
Summary: Algorithmic Fairness
Multicalibration
One particular notion of algorithmic fairness
Attractive since it can be applied post hoc
But ... currently limited to small datasets
How does this interact with privacy?
Summary
www.mbmlbook.com
Interactive AI requires more than just smart algorithms!
It also requires us to think about robustness and ethical implications
Future work (potential CDT projects!):
Multicalibration using random forests
Optimize the fairness–utility, privacy–utility, privacy–fairness–utility trade-offs
Build privacy and fairness directly into continual learning systems
Leverage crowdsourcing and active learning to test privacy and fairness hypotheses
Questions?
tdiethe@amazon.com
Practical Considerations for Interactive AI: Robustness, Privacy, Fairness, Transparency

  • 1. Practical Considerations for Interactive AI: Robustness, Privacy, Fairness, Transparency Tom Diethe tdiethe@amazon.com Interactive AI CDT Winter School January 29 2020
  • 2. Outline 1 Interactive AI at Amazon 2 Robustness & Transparency via Continual Learning Bayesian Continual Learning Continual Learning in Practice 3 Algorithmic Privacy Differential Privacy Privacy for Text Experiments on Text Data Optimizing the Privacy Utility Trade-off DPareto experiments 4 Algorithmic Fairness 5 Summary Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 1 / 44
  • 3. Outline 1 Interactive AI at Amazon 2 Robustness & Transparency via Continual Learning 3 Algorithmic Privacy 4 Algorithmic Fairness 5 Summary Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 2 / 44
  • 4. Interactive AI at Amazon Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 3 / 44
  • 5. Alexa AI What is Alexa? A cloud-based voice service that can help you with tasks, entertainment, general information, shopping, and more The more you talk to Alexa, the more Alexa adapts to your speech patterns, vocabulary, and personal preferences Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 4 / 44
  • 6. Alexa AI What is Alexa? A cloud-based voice service that can help you with tasks, entertainment, general information, shopping, and more The more you talk to Alexa, the more Alexa adapts to your speech patterns, vocabulary, and personal preferences How do we ensure that ... we create robust and efficient AI systems? we ensure that the privacy of customer data is safeguarded? customers are treated fairly by ML algorithms? Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 4 / 44
  • 7. Failure Modes Unintentional failures: ML system produces a formally correct but completely unsafe outcome Outliers/anomalies Dataset shift Limited memory Intentional failures: failure is caused by an active adversary attempting to subvert the system to attain her goals, such as to: misclassify the result infer private training data steal the underlying algorithm Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 5 / 44
  • 8. Outline 1 Interactive AI at Amazon 2 Robustness & Transparency via Continual Learning Bayesian Continual Learning Continual Learning in Practice 3 Algorithmic Privacy 4 Algorithmic Fairness 5 Summary Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 6 / 44
  • 9. FX (xt1 , . . . , xtn ) = FX (xt1+τ , . . . , xtn+τ ) for all τ, t1, . . . , tn for all n ∈ N Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 7 / 44
  • 10. Sagemaker Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 8 / 44
  • 11. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 12. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 13. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 14. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 15. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 16. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 17. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 18. Robustness & Transparency via Continual Learning Data arrive continually (Possibly) non-IID Tasks may change over time (e.g. trends/fashions in shopping) New tasks may emerge (e.g. new product categories, new marketplaces) Robustness How can we adapt to new data whilst retaining existing knowledge? Transparency: How can we have systems can signal they’re going wrong? Standard approaches: Train individual models on each task. Train combination Maintain single model and use regularization to fix influential parameters Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 9 / 44
  • 19. Bayesian Continual Learning [Nguyen 2018] Given e.g. data in task t as Dt = x (nt ) t , y (nt ) t Nt n=1 , parameters θ (e.g. BLR, BNN, GP ...) p(θ|D1:T ) ∝ p(θ)p(D1:T |θ) = p(θ) T t−1 NT n=1 p y (nt ) t |θ, x (nt ) t = p(θ|D1:T−1)p(DT |θ). Natural recursive algorithm! Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 10 / 44
  • 20. Bayesian Continual Learning [Nguyen 2018] Given e.g. data in task t as Dt = x (nt ) t , y (nt ) t Nt n=1 , parameters θ (e.g. BLR, BNN, GP ...) p(θ|D1:T ) ∝ p(θ)p(D1:T |θ) = p(θ) T t−1 NT n=1 p y (nt ) t |θ, x (nt ) t = p(θ|D1:T−1)p(DT |θ). Natural recursive algorithm! Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 10 / 44
  • 21. Generative models in continual learning Generative models in continual learning. Task i consists of items of class i and generated samples from the previous task; the goal is to generate samples from all previously seen classes Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 11 / 44
  • 22. Why is this Useful? Fashion-MNIST examples generated by a Wasserstein GAN in Bayesian continual learning Generative models play an important role in mitigating this, as they can be used to generate samples of previous tasks [Wu 2018], a method known as generative replay For deep learning models this is a form of transparency: a window onto what the model has learnt Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 12 / 44
  • 23. Engineering a Continual Learning System Automating Data Retention Policies: Sketcher/Compressor: when the data rate is too high Joiner: when labels arrive late Shared infrastructure: optimal use of space, like an OS cache Automating Monitoring and Quality Control: Data monitoring: dataset shift detection, anomaly detection Prediction monitoring: monitor performance of models Automating the ML Life-Cycle: Trainer and HPO: store provenance, warm start training Model policy engine: ensure re-training performed at right cadence Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 13 / 44
  • 24. “Zero-Touch” Machine Learning Model Policy Engine Streams Model Stream Trainer HPO Data Statistics Data Monitoring Anomaly Detection, Distribution Shift Measurement Retrain Rollback Prediction statistics Prediction Statistics Prediction Monitoring Accuracy, Shift Predictor Business Metrics Business Logic Business metrics Costs Desired accuracy Joiner System State DB Diagnostic Logs Sketcher/ Sampler Predictions Predictions Shared Infrastructure Model DB Training Data Reservoir Validation Data Reservoir Tom Diethe (Amazon) Practical Considerations for Interactive AI January 29 2020 14 / 44
  • 25. Summary: Continual Learning. Bayesian methods are a natural fit for continual learning. However, it is tricky to make them work well with deep learning methods. An engineering viewpoint is also required.
  • 26. Outline: 1 Interactive AI at Amazon; 2 Robustness & Transparency via Continual Learning; 3 Algorithmic Privacy (Differential Privacy; Privacy for Text; Experiments on Text Data; Optimizing the Privacy-Utility Trade-off; DPareto experiments); 4 Algorithmic Fairness; 5 Summary.
  • 27. A first attempt: Can’t I just anonymize my data? k-anonymity: the information for each person cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. Suppose a company is audited for salary discrimination. The auditor can see salaries by gender, age and nationality for each department and office. If the auditor has a friend, an ex, or a date working for the company, she will learn that person’s salary. Example record:
      Office | Dept. | Salary | D.O.B.   | Nationality | Gender
      London | IT    | £##### | May 1985 | Portuguese  | Female
  • 29. A first attempt: Can’t I just anonymize my data? (continued) Reducing data granularity reduces the risk, but also reduces accuracy (fidelity in this case). The same record after generalisation:
      Office | Dept. | Salary | D.O.B.    | Nationality | Gender
      UK     | IT    | £##### | 1980-1985 | -           | Female
    This still presents a risk of re-identification! If there are 10 women born between 1980 and 1985 in the whole of the UK IT department, 9 of them could conspire to learn the salary of the 10th.
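The k-anonymity definition above can be checked mechanically. The following is a minimal sketch, assuming the quasi-identifiers are office, department, date of birth and gender; the records, field names and the helper `k_anonymity` are invented for illustration and are not from the talk.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier
    values; the release is k-anonymous for any k up to this value."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical toy release (all values invented for illustration).
fine = [
    {"office": "London", "dept": "IT", "dob": "May 1985", "gender": "Female"},
    {"office": "London", "dept": "IT", "dob": "Jan 1983", "gender": "Female"},
    {"office": "Leeds", "dept": "IT", "dob": "Mar 1981", "gender": "Female"},
]
# Generalised version: office -> country, D.O.B. -> five-year band.
coarse = [
    {"office": "UK", "dept": "IT", "dob": "1980-1985", "gender": "Female"}
    for _ in fine
]
qi = ["office", "dept", "dob", "gender"]
print(k_anonymity(fine, qi))    # smallest group in the fine-grained release
print(k_anonymity(coarse, qi))  # smallest group after generalisation
```

A larger k only bounds the group size; as the slide notes, k − 1 colluding group members can still isolate the remaining individual.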
  • 30. Anonymized Data Isn’t. Example 1: In the mid-1990s, the Massachusetts “Group Insurance Commission” (GIC) released “anonymized” data on state employees that showed every hospital visit. The goal was to help researchers. All obvious identifiers such as name, address, and social security number were removed. MIT PhD student Latanya Sweeney decided to attempt to reverse the anonymization and requested a copy of the data. Reidentification: William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers. Sweeney started hunting for the Governor’s hospital records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts, population 54,000 and 7 ZIP codes. For $20, she purchased the complete voter rolls from the city of Cambridge, containing the name, address, ZIP code, birth date, and gender of every voter. Crossing this with the GIC records, Sweeney found Governor Weld with ease: only 6 people shared his birth date, only 3 of them were men, and of those, only he lived in his ZIP code. Sweeney sent the Governor’s health records (including diagnoses and prescriptions) to his office.
  • 32. Anonymized Data Isn’t. Example 2: In 2006, Netflix released data on how 500,000 of its users rated movies over a six-year period. Netflix “anonymized” the data before releasing it by removing usernames, but assigned unique identification numbers to users in order to allow continuous tracking of user ratings and trends. Reidentification: researchers used this information to uniquely identify individual Netflix users by crossing the data with the public IMDb database. According to the study, a person who knows when and how a user rated six movies can identify 99% of people in the Netflix database.
  • 35. Differential Privacy. A randomised mechanism $M : X \to Y$ is $\varepsilon$-differentially private if for all neighbouring inputs $x \simeq x'$ (i.e. $\|x - x'\|_1 = 1$) and for all sets of outputs $E \subseteq Y$ we have $P[M(x) \in E] \le e^{\varepsilon} P[M(x') \in E]$. [Figure: output distributions of M(D) and M(D'), whose ratio is bounded by $e^{\varepsilon}$.] Mechanisms: randomised response → plausible deniability; Laplace mechanism, e.g. $\tilde{\mu} = \mu + \xi$, $\xi \sim \mathrm{Lap}\left(\frac{1}{\varepsilon n}\right)$; output perturbation; ...
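As an illustration of the Laplace mechanism for a mean, here is a minimal sketch. It assumes every value is clipped to a known range [lower, upper] and that neighbouring datasets differ by replacing one record, so the mean has sensitivity (upper − lower)/n; the function names and parameters are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """One draw from Lap(0, scale): the difference of two i.i.d.
    exponential variables with mean `scale` is Laplace distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, epsilon, rng, lower=0.0, upper=1.0):
    """epsilon-DP estimate of the mean of values clipped to [lower, upper]."""
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    mu = sum(clipped) / n
    # Assumption: replacing one record moves the mean by at most this much.
    sensitivity = (upper - lower) / n
    return mu + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
data = [0.2, 0.4, 0.6]
print(private_mean(data, epsilon=1.0, rng=rng))
```

With values in [0, 1] the noise scale is 1/(εn), matching the example on the slide: smaller ε or fewer records means more noise.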
  • 37. Randomized Response [Warner ’65]. Say you want to release a bit $x \in \{\text{Yes}, \text{No}\}$. Do the following: (1) flip a coin; (2) if tails, respond truthfully with $x$; (3) if heads, flip a second coin and respond “Yes” if heads, “No” if tails. Claim: the above algorithm satisfies $(\log 3)$-differential privacy: $\frac{\Pr[\text{Response} = \text{Yes} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{Yes} \mid x = \text{No}]} = \frac{1/2 \times 1 + 1/2 \times 1/2}{1/2 \times 0 + 1/2 \times 1/2} = \frac{3/4}{1/4} = 3 \implies e^{\varepsilon} = 3$. The same holds for $\frac{\Pr[\text{Response} = \text{No} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{No} \mid x = \text{No}]}$.
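The claim can be verified by enumerating the protocol's exact output probabilities. The sketch below uses exact fractions; the helper name `response_probability` is an illustrative choice, not from the talk.

```python
from fractions import Fraction
import math

def response_probability(answer, truth):
    """Exact probability that the protocol outputs `answer` given the true bit."""
    half = Fraction(1, 2)
    truthful = half * (1 if answer == truth else 0)  # first coin tails: report the truth
    random_reply = half * half                       # first coin heads: second coin picks the answer
    return truthful + random_reply

# Worst-case likelihood ratio over both outputs and both orderings of the inputs.
ratios = [
    response_probability(a, t1) / response_probability(a, t2)
    for a in ("Yes", "No")
    for t1, t2 in (("Yes", "No"), ("No", "Yes"))
]
worst = max(ratios)
epsilon = math.log(float(worst))
print(worst, epsilon)
```

The worst-case ratio is the quantity the DP definition bounds by $e^{\varepsilon}$, so its logarithm is the privacy parameter of the protocol.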
  • 38. Important Properties. Robustness to post-processing: if $M$ is $(\varepsilon, \delta)$-DP, then $f(M)$ is $(\varepsilon, \delta)$-DP. Composition: if $M_1, \dots, M_n$ are $(\varepsilon_i, \delta_i)$-DP, then $g(M_1, \dots, M_n)$ is $\left(\sum_{i=1}^{n} \varepsilon_i, \sum_{i=1}^{n} \delta_i\right)$-DP. Protects against arbitrary side knowledge.
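The composition property amounts to simple bookkeeping of a privacy budget. A minimal sketch, with an invented helper `compose` and made-up budgets:

```python
def compose(budgets):
    """Basic composition: the (epsilon_i, delta_i) budgets of the individual
    mechanisms add up to the budget of their combination."""
    eps = sum(e for e, _ in budgets)
    delta = sum(d for _, d in budgets)
    return eps, delta

# Three hypothetical releases made from the same data.
total = compose([(0.5, 1e-6), (0.25, 1e-6), (0.25, 0.0)])
print(total)
```

This is only the basic bound stated on the slide; tighter accounting methods exist but are not covered here.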
  • 39. User-AI system interaction via natural language. User’s goal: meet some specific need with respect to an issued query x. Agent’s goal: satisfy the user’s request. Privacy violation: occurs when x is used to make personal inferences, e.g. when unrestricted PII is present. Mechanism: modify the query to protect privacy whilst preserving semantics. Our approach: Metric Differential Privacy.
  • 45. Desired Functionality.
      Intent             | Query x                        | Modified Query x'
      GetWeather         | Will it be colder in Cleveland | Will it be colder in Ohio
      PlayMusic          | Play Cantopop on lastfm        | Play C-pop on lastfm
      BookRestaurant     | Book a restaurant in Milladore | Book a restaurant in Wood County
      SearchCreativeWork | I want to watch Manthan film   | I want to watch Hindi film
  • 46. Word Embeddings. A mapping from words into vectors of real numbers (there are many ways to do this!), e.g. neural-network-based models (Word2Vec, GloVe, fastText). Defines a mapping $\phi : W \to \mathbb{R}^n$. Nearest neighbours are often synonyms.
  • 47. Metric Differential Privacy. Recall the definition of DP: $P[M(x) \in E] \le e^{\varepsilon} P[M(x') \in E]$ for $x, x' \in X$ s.t. $\|x - x'\|_1 = 1$. This can be rewritten into a single equation as: $\frac{P[M(x) \in E]}{P[M(x') \in E]} \le e^{\varepsilon \|x - x'\|_1}$. Metric differential privacy generalises this to use any valid metric $d(x, x')$: $\frac{P[M(x) \in E]}{P[M(x') \in E]} \le e^{\varepsilon d(x, x')}$ (it is easy to see that standard DP is metric DP with $d(x, x') = \|x - x'\|_1$).
  • 50. Privacy in the Space of Word Embeddings [Feyisetan 2019, Feyisetan 2020]. Given: $w \in W$: word to be “privatised” from word space $W$ (dictionary); $\phi : W \to Z$: embedding function from word space to embedding space $Z$ (e.g. $\mathbb{R}^n$); $v = \phi(w)$: corresponding word vector; $d : Z \times Z \to \mathbb{R}$: distance function in embedding space; $\Omega(\varepsilon)$: the DP noise sampling distribution (e.g. $\Omega_i(\varepsilon) = \mathrm{Lap}\left(\frac{1}{\varepsilon n}\right)$, $i = 1, \dots, n$ for $\mathbb{R}^n$). Metric DP mechanism for word embeddings: (1) perturb the word vector: $v' = v + \xi$ where $\xi \sim \Omega(\varepsilon)$; (2) the new vector $v'$ will almost surely not be a word; (3) project back to $W$: $w' = \arg\min_{w \in W} d(v', \phi(w))$, and return $w'$. What do we need? $d$ must satisfy the axioms of a metric (non-negativity, identity of indiscernibles, symmetry, triangle inequality), and we need a way to sample from $\Omega$ in the metric space that respects $d$ and gives us $\varepsilon$-metric DP.
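The three steps of the mechanism (perturb, leave the vocabulary, project back) can be sketched on a toy vocabulary. Assumptions, none of which come from the slides: the 2-d embeddings are invented, d is the Euclidean distance, and the noise has density proportional to exp(−ε‖ξ‖₂), sampled as a uniform direction with a Gamma(n, 1/ε) magnitude. The names `sample_noise` and `privatise` are hypothetical.

```python
import math
import random

# Hypothetical 2-d toy embeddings, invented for illustration only.
EMBEDDINGS = {
    "cleveland": (1.0, 0.9),
    "ohio": (1.1, 1.0),
    "paris": (-1.0, 0.8),
    "france": (-1.1, 1.0),
}

def sample_noise(dim, epsilon, rng):
    """Sample xi with density proportional to exp(-epsilon * ||xi||_2):
    uniform direction on the sphere, Gamma(dim, 1/epsilon) magnitude."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]

def privatise(word, epsilon, rng):
    """Perturb the word vector, then return the nearest vocabulary word."""
    v = EMBEDDINGS[word]
    xi = sample_noise(len(v), epsilon, rng)
    v_prime = [a + b for a, b in zip(v, xi)]
    return min(EMBEDDINGS, key=lambda u: math.dist(v_prime, EMBEDDINGS[u]))

rng = random.Random(0)
print([privatise("cleveland", epsilon=5.0, rng=rng) for _ in range(5)])
```

Larger ε concentrates the output on the original word; smaller ε spreads it over more distant neighbours, trading utility for plausible deniability.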
  • 53. [Figure: “Utility” and “Privacy”.]
  • 54. Example: Differentially Private SGD.
      Algorithm 1: Differentially Private SGD
      Input: dataset $z = (z_1, \dots, z_n)$
      Hyperparameters: learning rate $\eta$, mini-batch size $m$, number of epochs $T$, noise variance $\sigma^2$, clipping norm $L$
      Initialize $w \leftarrow 0$
      for $t \in [T]$ do
        for $k \in [n/m]$ do
          Sample $S \subset [n]$ with $|S| = m$ uniformly at random
          Let $g \leftarrow \frac{1}{m} \sum_{j \in S} \mathrm{clip}_L(\nabla \ell(z_j, w)) + \frac{2L}{m} \mathcal{N}(0, \sigma^2 I)$
          Update $w \leftarrow w - \eta g$
      return $w$
    There are 5+ hyper-parameters affecting both privacy and utility. For deep learning applications we only have empirical (not analytic) utility. How do we find the hyperparameters that give us an optimal trade-off?
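Algorithm 1 can be written out in a few lines for a small model. The sketch below assumes a logistic-regression loss and a made-up toy dataset, and it performs no privacy accounting (it does not compute the ε that a given σ, m and T imply); all function names are illustrative.

```python
import math
import random

def clip(grad, L):
    """Scale grad so that its Euclidean norm is at most L."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, L / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def logistic_grad(z, w):
    """Gradient of the logistic loss for one example z = (features, label in {0, 1})."""
    x, y = z
    p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    return [(p - y) * xi for xi in x]

def dp_sgd(data, eta, m, T, sigma, L, rng):
    n, d = len(data), len(data[0][0])
    w = [0.0] * d
    for _ in range(T):                   # epochs
        for _ in range(n // m):          # mini-batches per epoch
            batch = rng.sample(data, m)  # uniform subset of size m
            g = [0.0] * d
            for z in batch:
                for i, gi in enumerate(clip(logistic_grad(z, w), L)):
                    g[i] += gi / m
            # Gaussian noise scaled as in Algorithm 1: (2L/m) * N(0, sigma^2 I)
            g = [gi + (2.0 * L / m) * rng.gauss(0.0, sigma) for gi in g]
            w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical toy data: a bias feature plus one input, label = 1 if the input is positive.
data = [((1.0, x), 1 if x > 0 else 0) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
w = dp_sgd(data, eta=0.5, m=3, T=20, sigma=0.1, L=1.0, rng=random.Random(0))
print(w)
```

Every argument of `dp_sgd` other than the data is one of the hyper-parameters the slide refers to, which is what makes the privacy-utility trade-off hard to tune by hand.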
  • 55. The Privacy-Utility Pareto Front. [Figure: points from the hyper-parameter space plotted by privacy loss and error, with the Pareto-optimal points marked.]
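Given a set of evaluated hyper-parameter settings, the Pareto-optimal points are those that no other point beats on both privacy loss and error. A minimal sketch, with invented (ε, error) pairs and a hypothetical helper `pareto_front`:

```python
def pareto_front(points):
    """Points not dominated by any other point, minimising both coordinates.
    Each point is a (privacy_loss, error) pair."""
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical (epsilon, classification error) results, one per hyper-parameter setting.
results = [(0.1, 0.60), (0.5, 0.30), (0.5, 0.45), (1.0, 0.32), (2.0, 0.20), (4.0, 0.20)]
front = pareto_front(results)
print(front)
```

The quadratic scan is fine for a few hundred evaluations; the expensive part in practice is producing each point, since every one requires a full private training run.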
  • 64. Bayesian Optimization. Gradient-free optimization for black-box functions. Widely used in applications (HPO in ML, scheduling & planning, experimental design, ...). In multi-objective problems, BO aims to learn the Pareto front with a minimal number of evaluations.
  • 74. DPareto. Repeat until the budget is exhausted: (1) for each objective (privacy, utility): (a) fit a surrogate model (a Gaussian process, GP) using the available dataset; (b) calculate the predictive distribution using the GP mean and variance functions; (2) use the posterior of the surrogate models to form an acquisition function; (3) collect the next point at the estimated global maximum of the acquisition function.
  • 75. DPareto vs Random Sampling. [Figure, three panels. “Hypervolume Evolution”: Pareto-front hypervolume against number of sampled points for MLP1 and MLP2 under random sampling (RS) and Bayesian optimization (BO). “MLP2 Pareto Fronts”: classification error against ε for the initial points, +256 RS and +256 BO. “LogReg+SGD Samples”: classification error against ε for 1500 RS and 256 BO samples.]
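The Pareto-front hypervolume used to compare the two strategies can be computed directly in two dimensions. This is a minimal sketch assuming both objectives are minimised and a reference point chosen by the user; the function `hypervolume_2d` and the example points are illustrative and not taken from the experiments.

```python
def hypervolume_2d(front, reference):
    """Area dominated by a set of 2-d points (both objectives minimised),
    bounded by the reference point. Dominated points contribute nothing."""
    rx, ry = reference
    area, prev_y = 0.0, ry
    for x, y in sorted(front):
        if x < rx and y < prev_y:
            area += (rx - x) * (prev_y - y)
            prev_y = y
    return area

# Hypothetical fronts as (privacy loss, error) pairs with reference point (4, 4).
before = [(1, 3), (2, 1)]
after = before + [(3, 0.5)]
print(hypervolume_2d(before, (4, 4)), hypervolume_2d(after, (4, 4)))
```

A larger hypervolume means the front pushes further towards low privacy loss and low error, so a sampling strategy can be judged by how quickly this number grows with the number of evaluated points.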
  • 76. Summary: Privacy Enhancing Technologies. Privacy risks can be counter-intuitive and tricky to formalize. High-dimensional data and side knowledge make privacy hard. Semantic guarantees (e.g. DP) behave better than syntactic ones (e.g. k-anonymization). Differential privacy is a mature privacy enhancing technology. Metric DP provides local plausible deniability; accuracy can be good even in cases with an infinite number of outcomes. Empirical privacy-utility trade-off evaluation enables application-specific decisions. Bayesian optimization provides a computationally efficient method to recover the Pareto front (esp. with a large number of hyper-parameters).
  • 77. Outline: 1 Interactive AI at Amazon; 2 Robustness & Transparency via Continual Learning; 3 Algorithmic Privacy; 4 Algorithmic Fairness; 5 Summary.
  • 78. The Need for Algorithmic Fairness. Risks: (1) ML predictors might discriminate against groups of individuals protected by law or by ethics; (2) choosing a model that minimizes the expected loss may be good for the majority population but overlook minority populations. Examples: image classification [Buolamwini & Gebru, 2018] and natural language tasks [Bolukbasi et al., 2016]. Causes: (1) training data may contain biases; (2) the analysis of the training data may inadvertently introduce biases. Unlike privacy, there is no single agreed-on definition!
  • 79. Statistical Bias. Definition: the difference between an estimator’s expected value and the true value. Is statistical bias an adequate fairness criterion? “The model summarises the data correctly; if the data is biased it’s not the algorithm’s fault.” It says nothing about the distribution of errors (variance of the estimator). Biases are inevitable! Take ownership ...
  • 84. Calibration. Calibrated classifier [Dawid 1982]: “a forecaster is well calibrated if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent”.
  • 85. Calibration. α-Accuracy: if we want a predictor $f$ not to downplay $S \subseteq X$, we require it to be (approximately) unbiased over $S$ for some small $\alpha \in [0, 1]$: $|\mathbb{E}_{i \sim S}[f_i - p^*_i]| \le \alpha$. α-Calibration: for any $v \in [0, 1]$, let $S_v = \{i \in S : f_i = v\}$; then $|\mathbb{E}_{i \sim S_v}[f_i - p^*_i]| = |v - \mathbb{E}_{i \sim S_v}[p^*_i]| \le \alpha$, i.e. we are calibrated for all but a small number of items (governed by α). Weakness: the guarantees are too coarse. E.g. assign every member of $S$ the value $\mathbb{E}_{i \sim S}[p^*_i]$. This is perfectly calibrated, but “qualified” members of $S$ with large $p^*_i$ will be hurt. Typically this is applied over large disjoint sets, e.g. race or gender.
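Both gaps, and the weakness of calibration, can be made concrete on a tiny example. The sketch below assumes the true probabilities p* are known (which they are not in practice) and that i is drawn uniformly from S; the values and helper names are invented for illustration.

```python
from collections import defaultdict

def accuracy_gap(f, p_star, S):
    """|E_{i~S}[f_i - p*_i]| for a uniform draw i from S."""
    return abs(sum(f[i] - p_star[i] for i in S) / len(S))

def calibration_gap(f, p_star, S):
    """Largest |v - E_{i~S_v}[p*_i]| over the level sets S_v = {i in S : f_i = v}."""
    levels = defaultdict(list)
    for i in S:
        levels[f[i]].append(i)
    return max(accuracy_gap(f, p_star, S_v) for S_v in levels.values())

# Hypothetical true probabilities p*_i for six individuals in S.
p_star = [0.9, 0.9, 0.1, 0.1, 0.1, 0.3]
S = range(len(p_star))

# The constant predictor from the slide: everyone gets the group average.
mean = sum(p_star) / len(p_star)
constant = [mean] * len(p_star)

print(calibration_gap(constant, p_star, S))  # essentially zero: calibrated over S
print(constant[0] - p_star[0])               # yet individual 0 is scored far too low
```

The constant predictor passes the calibration check over S while under-scoring the two “qualified” individuals, which is the coarseness the slide points out.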
  • 86. Multicalibration [Hébert-Johnson 2018]. Stronger notion: ensure calibration on every subpopulation (including the qualified members from before). But ... this requires perfect predictions! We need an intermediate definition that balances protecting subgroups against the information bottleneck of small samples. Multicalibration definition: “A predictor $f$ is multicalibrated w.r.t. a family of subpopulations $C$ if it is calibrated w.r.t. every $S \in C$”, where $C$ are computationally identifiable subsets. Let $C \subseteq 2^X$ be a collection of subsets of $X$ and $\alpha \in [0, 1]$. A predictor $f$ is $(C, \alpha)$-multicalibrated if for all $S \in C$, $f$ is α-calibrated w.r.t. $S$. Think of $C$ as a collection of subpopulations where set membership can be determined efficiently, e.g. through boolean operations or by small decision trees. $C$ can be quite rich, with many overlapping subgroups of a protected group $S$.
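Checking the definition is a loop over the collection C. The sketch below only verifies (C, α)-multicalibration for a given predictor; it is not the learning algorithm from the cited paper, it assumes the true probabilities p* are known, and all values and names are invented for illustration.

```python
from collections import defaultdict

def calibration_gap(f, p_star, S):
    """Largest |v - E_{i~S_v}[p*_i]| over the level sets S_v = {i in S : f_i = v}."""
    levels = defaultdict(list)
    for i in S:
        levels[f[i]].append(i)
    return max(
        abs(sum(f[i] - p_star[i] for i in S_v) / len(S_v))
        for S_v in levels.values()
    )

def is_multicalibrated(f, p_star, C, alpha):
    """True if f is alpha-calibrated with respect to every subpopulation in C."""
    return all(calibration_gap(f, p_star, S) <= alpha for S in C)

# Hypothetical true probabilities and two overlapping subpopulations.
p_star = [0.9, 0.9, 0.1, 0.1, 0.1, 0.3]
everyone = [0, 1, 2, 3, 4, 5]
qualified = [0, 1]

constant = [0.4] * 6                          # group average for everyone
grouped = [0.9, 0.9, 0.15, 0.15, 0.15, 0.15]  # separate averages per subgroup

print(is_multicalibrated(constant, p_star, [everyone], 0.05))
print(is_multicalibrated(constant, p_star, [everyone, qualified], 0.05))
print(is_multicalibrated(grouped, p_star, [everyone, qualified], 0.05))
```

Adding the “qualified” subgroup to C is what rules out the constant predictor, which shows how a richer C gives finer protection.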
  • 87. Summary: Algorithmic Fairness. Multicalibration is one particular notion of algorithmic fairness. It is attractive since it can be applied post hoc. But ... it is currently limited to small datasets. How does this interact with privacy?
  • 88. Outline: 1 Interactive AI at Amazon; 2 Robustness & Transparency via Continual Learning; 3 Algorithmic Privacy; 4 Algorithmic Fairness; 5 Summary.
  • 89. Summary. [www.mbmlbook.com] Interactive AI requires more than just smart algorithms! It also requires us to think about robustness and ethical implications. Future work (potential CDT projects!): multi-calibration using random forests; optimizing the fairness–utility, privacy–utility and privacy–fairness–utility trade-offs; building privacy and fairness directly into continual learning systems; leveraging crowdsourcing and active learning to test privacy and fairness hypotheses.
  • 90. Questions? tdiethe@amazon.com