Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017


Aaron Roth is an Associate Professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory and mechanism design, learning theory, and the intersections of these topics. Together with Cynthia Dwork, he is the author of the book “The Algorithmic Foundations of Differential Privacy.”

Abstract Summary:

Differential Privacy and Machine Learning:
In this talk, we will give a friendly introduction to Differential Privacy, a rigorous methodology for analyzing data subject to provable privacy guarantees that has recently been widely deployed in several settings. The talk will focus specifically on the relationship between differential privacy and machine learning, which is surprisingly rich. This includes both the ability to do machine learning subject to differential privacy, and tools arising from differential privacy that can be used to make learning more reliable and robust (even when privacy is not a concern).



1. Differential Privacy and Machine Learning. Aaron Roth, March 24, 2017.
2. Protecting Privacy is Important: Class action lawsuit accuses AOL of violating the Electronic Communications Privacy Act, seeks $5,000 in damages per user. AOL’s director of research is fired.
3. Protecting Privacy is Important: Class action lawsuit (Doe v. Netflix) accuses Netflix of violating the Video Privacy Protection Act, seeks $2,000 in compensation for each of Netflix’s 2,000,000 subscribers. Settled for an undisclosed sum; the 2nd Netflix Challenge is cancelled.
4. Protecting Privacy is Important: The National Human Genome Research Institute (NHGRI) immediately restricted pooled genomic data that had previously been publicly available.
5. So What is Differential Privacy?
6. So What is Differential Privacy? • Differential Privacy is about promising people freedom from harm. “An analysis of a dataset D is differentially private if the data analyst knows almost no more about Alice after the analysis than he would have known had he conducted the same analysis on an identical data set with Alice’s data removed.”
7. Differential Privacy [Dwork-McSherry-Nissim-Smith 06]. (Diagram: a dataset D containing Alice, Bob, Chris, Donna, Ernie, …, Xavier is fed to an algorithm; for each output r, the probabilities Pr[r] with and without one person’s data have a bounded ratio.)
8. Differential Privacy. X: the data universe. D ⊂ X: the dataset (one element per person). Definition: Two datasets D, D′ ⊂ X are neighbors if they differ in the data of a single individual.
9. Differential Privacy. X: the data universe. D ⊂ X: the dataset (one element per person). Definition: An algorithm M is ε-differentially private if for all pairs of neighboring datasets D, D′, and for all outputs x: Pr[M(D) = x] ≤ (1 + ε) · Pr[M(D′) = x].
10. Some Useful Properties. Theorem (Postprocessing): If M(D) is ε-private, and f is any (randomized) function, then f(M(D)) is ε-private.
11. So… Definition: An algorithm M is ε-differentially private if for all pairs of neighboring datasets D, D′, and for all outputs x: Pr[M(D) = x] ≤ (1 + ε) · Pr[M(D′) = x].
12. So… Definition: An algorithm M is ε-differentially private if for all pairs of neighboring datasets D, D′, and for all outputs x: Pr[M(D) = x] ≤ (1 + ε) · Pr[M(D′) = x].
13. So… Definition: An algorithm M is ε-differentially private if for all pairs of neighboring datasets D, D′, and for all outputs x: Pr[M(D) = x] ≤ (1 + ε) · Pr[M(D′) = x].
14. Some Useful Properties. Theorem (Composition): If M_1, …, M_k are each ε-private, then M(D) ≡ (M_1(D), …, M_k(D)) is kε-private.
15. So… You can go about designing algorithms as you normally would. Just access the data using differentially private “subroutines”, and keep track of your “privacy budget” as a resource. Private algorithm design, like regular algorithm design, can be modular.
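A minimal sketch of what “keeping track of your privacy budget” can look like in code, using the basic composition theorem from slide 14 (k ε-private accesses together cost kε). The class name and interface are illustrative, not from the talk:

    class PrivacyBudget:
        # Track the total epsilon spent across differentially private subroutines.
        def __init__(self, total_epsilon):
            self.remaining = total_epsilon

        def spend(self, epsilon):
            # Basic composition: each epsilon-private access consumes epsilon of the budget.
            if epsilon > self.remaining:
                raise RuntimeError("privacy budget exhausted")
            self.remaining -= epsilon

    # e.g. budget = PrivacyBudget(1.0); call budget.spend(0.1) before each epsilon = 0.1 query.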
16. Some simple operations: Answering Numeric Queries. Def: A numeric function f has sensitivity c if for all neighboring D, D′: |f(D) − f(D′)| ≤ c. Write s_f ≡ c. • E.g. “How many software engineers are in the room?” has sensitivity 1. • “What fraction of people in the room are software engineers?” has sensitivity 1/n.
17. Some simple operations: Answering Numeric Queries. The Laplace Mechanism: M_Lap(D, f, ε) = f(D) + Lap(s_f / ε). Theorem: M_Lap(⋅, f, ε) is ε-private.
18. Some simple operations: Answering Numeric Queries. The Laplace Mechanism: M_Lap(D, f, ε) = f(D) + Lap(s_f / ε). Theorem: The expected error is s_f / ε (can answer “what fraction of people in the room are software engineers?” with error 0.2%).
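A short numpy sketch of the Laplace mechanism as defined above; the query f and its sensitivity s_f are assumed to be supplied (and certified) by the caller, and the function name is illustrative:

    import numpy as np

    def laplace_mechanism(D, f, sensitivity, epsilon):
        # M_Lap(D, f, eps) = f(D) + Lap(s_f / eps)
        return f(D) + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Illustrative parameters: the fraction-of-software-engineers query has sensitivity 1/n;
    # with n = 10,000 and epsilon = 0.05, the expected error is (1/n)/epsilon = 0.002,
    # i.e. the 0.2% figure quoted on the slide.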
19. Some simple operations: Answering Non-numeric Queries. “What is the modal eye color in the room?” R = {Blue, Green, Brown, Red}. • If you can define a function that determines how “good” each outcome is for a fixed input: e.g. q(D, Red) = “fraction of people in D with red eyes”.
20. Some simple operations: Answering Non-numeric Queries. M_Exp(D, R, q, ε): output r ∈ R w.p. ∝ e^(2ε·q(D, r)). Theorem: M_Exp(D, R, q, ε) is (s_q · ε)-private, and outputs r ∈ R such that: E[ |q(D, r) − max_{r*∈R} q(D, r*)| ] ≤ (2 s_q / ε) · ln|R| (can find a color that has frequency within 0.5% of the modal color in the room).
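A numpy sketch of the exponential mechanism exactly as stated on this slide (sampling probability ∝ e^(2ε·q(D, r))); the function name is illustrative, and shifting scores by their maximum is only for numerical stability and does not change the distribution:

    import numpy as np

    def exponential_mechanism(D, R, q, epsilon):
        scores = np.array([q(D, r) for r in R])
        weights = np.exp(2 * epsilon * (scores - scores.max()))  # proportional to e^(2*eps*q)
        probs = weights / weights.sum()
        return R[np.random.choice(len(R), p=probs)]

    # Hypothetical eye-color example: R = ["Blue", "Green", "Brown", "Red"] and
    # q(D, color) = fraction of people in D with that eye color (sensitivity 1/n).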
21. So what can we do with that? Empirical Risk Minimization (*i.e. almost all of supervised learning): Find θ to minimize: L(θ) = Σ_{i=1}^n ℓ(θ, x_i, y_i).
22. So what can we do with that? Empirical Risk Minimization. Simple Special Case: Linear Regression. Find θ ∈ R^d to minimize: Σ_{i=1}^n (⟨θ, x_i⟩ − y_i)².
23. So what can we do with that? • First Attempt: Try the exponential mechanism! • Define q(θ, D) = −(Xθ − y)ᵀ(Xθ − y). • Output θ ∈ R^d w.p. ∝ e^(2ε·q(θ, D)). How? • In this case, this reduces to: output θ* + N(0, (XᵀX)⁻¹ / (2ε)), where θ* = argmin_θ Σ_{i=1}^n (⟨θ, x_i⟩ − y_i)².
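A numpy sketch of the reduction stated on this slide: sampling θ with probability ∝ e^(2ε·q(θ, D)) amounts to ordinary least squares plus Gaussian noise with the covariance written above. The function name is illustrative, and no regularization or boundedness checks are shown:

    import numpy as np

    def private_linear_regression(X, y, epsilon):
        XtX = X.T @ X                                   # assumes X^T X is invertible
        theta_star = np.linalg.solve(XtX, X.T @ y)      # ordinary least-squares solution
        cov = np.linalg.inv(XtX) / (2 * epsilon)        # covariance (X^T X)^{-1} / (2*eps) from the slide
        return np.random.multivariate_normal(theta_star, cov)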
24. So what can we do with that? Empirical Risk Minimization: The General Case (e.g. deep learning). Find θ to minimize: L(θ) = Σ_{i=1}^n ℓ(θ, x_i, y_i).
25. Stochastic Gradient Descent. Convergence depends on the fact that at each round: E[g_t] = ∇L(θ). Algorithm: Let θ_1 = 0^d. For t = 1 to T: pick i at random; let g_t ← ∇ℓ(θ_t, x_i, y_i); let θ_{t+1} ← θ_t − η · g_t.
26. Private Stochastic Gradient Descent. Still have: E[g_t] = ∇L(θ)! (Can still prove convergence theorems, and run the algorithm…) Privacy guarantees can be computed from: 1) the privacy of the Laplace mechanism, 2) preservation of privacy under post-processing, and 3) composition of privacy guarantees. Algorithm: Let θ_1 = 0^d. For t = 1 to T: pick i at random; let g_t ← ∇ℓ(θ_t, x_i, y_i) + Lap(σ)^d; let θ_{t+1} ← θ_t − η · g_t.
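A sketch of the noisy SGD loop above. Here grad_loss is an assumed per-example gradient oracle supplied by the caller, and the gradient clipping and noise calibration needed to turn this into a concrete end-to-end privacy guarantee are not shown:

    import numpy as np

    def private_sgd(grad_loss, data, d, T, eta, sigma):
        theta = np.zeros(d)
        for _ in range(T):
            x, y = data[np.random.randint(len(data))]        # pick i at random
            # perturb the single-example gradient with independent Lap(sigma) noise per coordinate
            g = grad_loss(theta, x, y) + np.random.laplace(0.0, sigma, size=d)
            theta = theta - eta * g                          # gradient step
        return theta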
27. What else can we do? • Statistical Estimation • Graph Analysis • Combinatorial Optimization • Spectral Analysis of Matrices • Anomaly Detection/Analysis of Data Streams • Convex Optimization • Equilibrium computation • Computation of optimal 1-sided and 2-sided matchings • Pareto Optimal Exchanges • …
28. Differential Privacy ⇒ Learning. Theorem*: An ε-differentially private algorithm cannot overfit its training set by more than ε. (*Lots of interesting details missing! See our paper in Science.)
29. thresholdout.py: Thresholdout [DFHPRR15]

    from numpy import *

    def Thresholdout(sample, holdout, q, sigma, threshold):
        sample_mean = mean([q(x) for x in sample])
        holdout_mean = mean([q(x) for x in holdout])
        if abs(sample_mean - holdout_mean) < random.normal(threshold, sigma):
            # q does not overfit
            return sample_mean
        else:
            # q overfits
            return holdout_mean + random.normal(0, sigma)
30. Reusable holdout example
    • Data set with 2n = 20,000 rows and d = 10,000 variables. Class labels in {-1, 1}.
    • Analyst performs stepwise variable selection:
      1. Split data into training/holdout of size n
      2. Select “best” k variables on training data
      3. Only use variables also good on holdout
      4. Build linear predictor out of k variables
      5. Find best k = 10, 20, 30, …
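A toy-sized sketch of steps 1 to 3 of this procedure routed through the Thresholdout function from slide 29. The dimensions, sigma, and threshold below are illustrative stand-ins, much smaller than the 2n = 20,000, d = 10,000 experiment on the slide:

    import numpy as np

    n, d, k = 1000, 200, 10
    X = np.random.randn(2 * n, d)                    # "no signal" case: pure noise features
    y = np.random.choice([-1.0, 1.0], size=2 * n)    # random labels
    sample  = list(zip(X[:n], y[:n]))                # training half
    holdout = list(zip(X[n:], y[n:]))                # holdout half

    # Estimate each variable's correlation with the label through Thresholdout,
    # rather than querying the holdout directly, then keep the k "best" variables.
    corr = [Thresholdout(sample, holdout,
                         lambda row, j=j: row[0][j] * row[1],
                         sigma=0.03, threshold=0.06)
            for j in range(d)]
    best_k = np.argsort(np.abs(corr))[-k:]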
31. Classification after feature selection. No signal: data are random Gaussians, labels are drawn independently at random from {-1, 1}. Thresholdout correctly detects overfitting!
32. Classification after feature selection. Strong signal: 20 features are mildly correlated with the target, remaining attributes are uncorrelated. Thresholdout correctly detects the right model size!
33. So…
    • Differential privacy provides:
      – A rigorous, provable guarantee with a strong privacy semantics.
      – A set of tools and composition theorems that allow for modular, easy design of privacy-preserving algorithms.
      – Protection against overfitting, even when privacy is not a concern.
34. Thanks! To learn more:
    • Our textbook on differential privacy: available for free on my website: http://www.cis.upenn.edu/~aaroth
    • Connections between Privacy and Overfitting:
      – Dwork, Feldman, Hardt, Pitassi, Reingold, Roth, “The Reusable Holdout: Preserving Validity in Adaptive Data Analysis”, Science, August 7, 2015.
      – Dwork, Feldman, Hardt, Pitassi, Reingold, Roth, “Preserving Statistical Validity in Adaptive Data Analysis”, STOC 2015.
      – Bassily, Nissim, Stemmer, Smith, Steinke, Ullman, “Algorithmic Stability for Adaptive Data Analysis”, STOC 2016.
      – Rogers, Roth, Smith, Thakkar, “Max Information, Differential Privacy, and Post-Selection Hypothesis Testing”, FOCS 2016.
      – Cummings, Ligett, Nissim, Roth, Wu, “Adaptive Learning with Robust Generalization Guarantees”, COLT 2016.
