Fairness and Privacy in
AI/ML Systems
Krishnaram Kenthapadi
Amazon AWS AI
LinkedIn-MSR-IISc workshop on
Fairness & Ethics in ML
January 2020 https://www.csa.iisc.ac.in/fate.htm
Massachusetts Group
Insurance Commission
(1997): Anonymized
medical history of state
employees
William Weld vs
Latanya Sweeney
Latanya Sweeney (MIT grad
student): $20 – Cambridge
voter roll
born July 31, 1945
resident of 02138
Uniquely identifiable with ZIP
+ birth date + gender (in the
US population)
Golle, “Revisiting the Uniqueness of Simple Demographics in the US Population”, WPES 2006
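The re-identification described above is a join on quasi-identifiers. A minimal sketch, assuming hypothetical record layouts and made-up rows (the field names `zip`, `birth_date`, `gender`, `record_id`, and `name` are illustrative, not taken from the original datasets):

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")

def link_records(anonymized_records, voter_roll):
    """Return {record_id: name} for records matching exactly one voter."""
    # Index the public voter roll by its quasi-identifier combination.
    index = defaultdict(list)
    for voter in voter_roll:
        key = tuple(voter[q] for q in QUASI_IDENTIFIERS)
        index[key].append(voter["name"])
    matches = {}
    for record in anonymized_records:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # unique combination => re-identified
            matches[record["record_id"]] = names[0]
    return matches

# Hypothetical data: names were removed from the medical records,
# but ZIP, birth date, and gender were kept.
medical = [
    {"record_id": 1, "zip": "02138", "birth_date": "1950-01-01", "gender": "M"},
    {"record_id": 2, "zip": "02139", "birth_date": "1960-02-02", "gender": "F"},
]
voters = [
    {"name": "Person A", "zip": "02138", "birth_date": "1950-01-01", "gender": "M"},
    {"name": "Person B", "zip": "02139", "birth_date": "1960-02-02", "gender": "F"},
    {"name": "Person C", "zip": "02139", "birth_date": "1960-02-02", "gender": "F"},
]
reidentified = link_records(medical, voters)
```

Record 1 is linked to a single voter, while record 2 stays ambiguous because two voters share its quasi-identifiers.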
The Coded Gaze [Joy Buolamwini 2016]
Face detection software: Fails for some darker faces
https://www.youtube.com/watch?v=KB9sI9rY3cA
Gender Shades [Joy Buolamwini & Timnit Gebru, 2018]
• Facial analysis software: Higher accuracy for light-skinned men
• Error rates for dark-skinned women: 20% - 34%
Algorithmic Bias
• Ethical challenges posed by AI systems
• Inherent biases present in society
• Reflected in training data
• AI/ML models prone to amplifying such biases
Laws against Discrimination
Citizenship: Immigration Reform and Control Act
Disability status: Rehabilitation Act of 1973; Americans with Disabilities Act of 1990
Race: Civil Rights Act of 1964
Age: Age Discrimination in Employment Act of 1967
Sex: Equal Pay Act of 1963; Civil Rights Act of 1964
And more...
Fairness Privacy
Transparency Explainability
“Fairness and Privacy by Design” for AI products
AI @ Scale
Case Studies @ LinkedIn*
Fairness
Privacy
Reflections
*Work done while at LinkedIn
© 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved.
Our mission at AWS
Put machine learning in the
hands of every developer
The AWS ML Stack
Broadest and most complete set of Machine Learning capabilities
AI SERVICES
Categories: Vision, Speech, Text, Search (new), Chatbots, Personalization, Forecasting, Fraud (new), Development (new), Contact centers (new)
Services: Amazon Rekognition, Amazon Polly, Amazon Transcribe (+Medical), Amazon Comprehend (+Medical), Amazon Translate, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon CodeGuru, Amazon Textract, Amazon Kendra, Contact Lens for Amazon Connect
ML SERVICES
Amazon SageMaker: Ground Truth, Augmented AI, SageMaker Neo, built-in algorithms, SageMaker Notebooks (new), SageMaker Experiments (new), model tuning, SageMaker Debugger (new), SageMaker Autopilot (new), model hosting, SageMaker Model Monitor (new), SageMaker Studio IDE (new)
ML FRAMEWORKS & INFRASTRUCTURE
Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA
LinkedIn operates the largest professional
network on the Internet
Tell your
story
645M+ members
30M+ companies are represented on LinkedIn
90K+ schools listed (high school & college)
35K+ skills listed
20M+ open jobs on LinkedIn Jobs
280B feed updates
How AI is transforming LinkedIn’s
ecosystem
2 PB+ data processed nearline and offline per day
25 B+ parameters in Machine Learning models
200+ Machine Learning A/B experiments per week
Contributors Advertising Revenue Confirmed Hires
Fairness in
AI @
LinkedIn
Fairness-aware Talent
Search Ranking
Guiding Principle:
“Diversity by Design”
Insights to
Identify Diverse
Talent Pools
Representative
Talent Search
Results
Diversity
Learning
Curriculum
“Diversity by Design” in LinkedIn’s Talent
Solutions
Plan for Diversity
Identify Diverse Talent Pools
Inclusive Job Descriptions / Recruiter Outreach
Representative Ranking for Talent Search
S. C. Geyik, S. Ambler,
K. Kenthapadi, Fairness-
Aware Ranking in Search &
Recommendation Systems with
Application to LinkedIn Talent
Search, KDD’19.
[Microsoft’s AI/ML
conference
(MLADS’18). Distinguished
Contribution Award]
Building Representative
Talent Search at LinkedIn
(LinkedIn engineering blog)
Intuition for Measuring and Achieving Representativeness
Ideal: Top ranked results should follow a desired distribution on
gender/age/…
E.g., same distribution as the underlying talent pool
Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16]
Defined measures (skew, divergence) based on this intuition
Desired Proportions within the Attribute of Interest
Compute the proportions of the values of the attribute (e.g., gender,
gender-age combination) amongst the set of qualified candidates
“Qualified candidates” = Set of candidates that match the search query
criteria
Retrieved by LinkedIn’s Galene search engine
Desired proportions could also be obtained based on legal mandate
/ voluntary commitment
Measuring (Lack of) Representativeness
Skew@k
(Logarithmic) ratio of the proportion of candidates having a given attribute
value among the top k ranked results to the corresponding desired proportion
Variants:
MinSkew: Minimum over all attribute values
MaxSkew: Maximum over all attribute values
Normalized Discounted Cumulative Skew
Normalized Discounted Cumulative KL-divergence
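The measures above can be sketched in a few lines. This is a minimal illustration, assuming the natural logarithm for skew and a 1/log2(i+1) position discount for the normalized discounted KL-divergence; the function names are hypothetical and the exact normalization in the cited paper may differ in detail.

```python
import math
from collections import Counter

def proportions(values):
    """Observed proportion of each attribute value in a list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def skew_at_k(ranked_attrs, desired, value, k):
    """Log ratio of the observed top-k proportion to the desired one."""
    top_k = ranked_attrs[:k]
    observed = top_k.count(value) / len(top_k)
    if observed == 0:
        return float("-inf")  # value absent from the top k
    return math.log(observed / desired[value])

def min_skew_at_k(ranked_attrs, desired, k):
    return min(skew_at_k(ranked_attrs, desired, v, k) for v in desired)

def max_skew_at_k(ranked_attrs, desired, k):
    return max(skew_at_k(ranked_attrs, desired, v, k) for v in desired)

def ndkl(ranked_attrs, desired):
    """Position-discounted KL-divergence of every prefix vs. desired.

    Assumes desired[v] > 0 for every value that appears in the ranking.
    """
    total, normalizer = 0.0, 0.0
    for i in range(1, len(ranked_attrs) + 1):
        observed = proportions(ranked_attrs[:i])
        kl = sum(p * math.log(p / desired[v]) for v, p in observed.items())
        weight = 1.0 / math.log2(i + 1)
        total += weight * kl
        normalizer += weight
    return total / normalizer

# Hypothetical ranking over a two-valued attribute.
ranking = ["m", "m", "m", "f", "m", "f"]
desired = {"m": 0.5, "f": 0.5}
skew_m = skew_at_k(ranking, desired, "m", 4)   # ln(0.75 / 0.5)
skew_f = skew_at_k(ranking, desired, "f", 4)   # ln(0.25 / 0.5)
```

A skew of zero means the top k matches the desired proportion for that value; negative skew means under-representation.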
Fairness-aware Reranking Algorithm (Simplified)
Partition the set of potential candidates into different buckets for
each attribute value
Rank the candidates in each bucket according to the scores assigned
by the machine-learned model
Merge the ranked lists, balancing the representation requirements
and the selection of highest scored candidates
Algorithmic variants based on how we choose the next attribute
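The three steps above can be sketched as a greedy merge. This is one possible variant under stated assumptions, not necessarily the deployed algorithm: at each position it first serves attribute values that have fallen below floor(position × desired proportion), then values still below the ceiling, and breaks ties by model score. The tuple layout and the name `fair_rerank` are hypothetical.

```python
import math
from collections import defaultdict

def fair_rerank(candidates, desired, k):
    """candidates: list of (candidate_id, attribute_value, score)."""
    # Steps 1 and 2: one bucket per attribute value, best score first.
    buckets = defaultdict(list)
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        buckets[cand[1]].append(cand)
    counts = {value: 0 for value in buckets}
    ranked = []
    # Step 3: merge, balancing representation against score.
    for position in range(1, k + 1):
        available = [v for v in buckets if buckets[v]]
        if not available:
            break
        below_min = [v for v in available
                     if counts[v] < math.floor(position * desired.get(v, 0.0))]
        below_max = [v for v in available
                     if counts[v] < math.ceil(position * desired.get(v, 0.0))]
        pool = below_min or below_max or available
        # Among eligible values, take the one whose best candidate scores highest.
        chosen = max(pool, key=lambda v: buckets[v][0][2])
        ranked.append(buckets[chosen].pop(0))
        counts[chosen] += 1
    return ranked

# Hypothetical candidates and a 50/50 desired distribution.
pool = [("m1", "m", 0.9), ("m2", "m", 0.8), ("m3", "m", 0.7),
        ("m4", "m", 0.6), ("f1", "f", 0.5), ("f2", "f", 0.4)]
top4 = fair_rerank(pool, {"m": 0.5, "f": 0.5}, k=4)
```

Because the reranker only consumes scores and attribute values, it can sit behind any scoring model as a post-processing step.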
Architecture
Validating Our Approach
Gender Representativeness
Over 95% of all searches are representative compared to the qualified
population of the search
Business Metrics
A/B test over LinkedIn Recruiter users for two weeks
No significant change in business metrics (e.g., # InMails sent or accepted)
Ramped to 100% of LinkedIn Recruiter users worldwide
Lessons learned
• Post-processing approach desirable
• Model agnostic
• Scalable across different model choices for our application
• Acts as a “fail-safe”
• Robust to application-specific business logic
• Easier to incorporate as part of existing systems
• Build a stand-alone service or component for post-processing
• No significant modifications to the existing components
• Complementary to efforts to reduce bias from training data & during model training
Engineering for Fairness in AI Lifecycle
Problem Formation → Dataset Construction → Algorithm Selection → Training Process → Testing Process → Deployment → Feedback
Is an algorithm an ethical solution to our problem?
Does our data include enough minority samples?
Are there missing/biased features?
Do we need to apply debiasing algorithms to preprocess our data?
Do we need to include fairness constraints in the function?
Have we evaluated the model using relevant fairness metrics?
Are we deploying our model on a population that we did not train/test on?
Does the model encourage feedback loops that can produce increasingly unfair outcomes?
Credit: K. Browne & J. Draper
Engineering for Fairness in AI Lifecycle
S.Vasudevan, K. Kenthapadi, FairScale: A Scalable Framework for Measuring Fairness in AI Applications, 2019
FairScale System Architecture [Vasudevan & Kenthapadi, 2019]
• Flexibility of use (platform agnostic)
• Ad-hoc exploratory analyses
• Deployment in offline workflows
• Integration with ML frameworks
• Scalability
• Diverse fairness metrics
• Conventional fairness metrics
• Benefit metrics
• Statistical tests
Fairness-aware Experimentation
[Saint-Jacques & Sepehri, KDD’19 Social Impact Workshop]
Imagine LinkedIn has 10 members.
Each of them has 1 session a day.
A new product increases sessions by +1 session per member on average.
Both of these are +1 session / member on average!
One is much more unequal than the other. We want to catch that.
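The 10-member example can be made concrete with an inequality measure. The sketch below uses the Atkinson index (with inequality-aversion parameter 1) as one common choice; the slide text does not say which measure is used, and the two session distributions are hypothetical.

```python
import math

def atkinson_index(values):
    """Atkinson index with parameter 1: 1 - geometric mean / arithmetic mean.

    Requires strictly positive values. 0 means perfectly equal.
    """
    arithmetic_mean = sum(values) / len(values)
    geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    return 1.0 - geometric_mean / arithmetic_mean

# Baseline: 10 members with 1 session each.
# Treatment A: every member gains one session.
sessions_even = [2] * 10
# Treatment B: a single member gains ten sessions, the rest gain none.
sessions_uneven = [11] + [1] * 9

mean_even = sum(sessions_even) / 10
mean_uneven = sum(sessions_uneven) / 10
inequality_even = atkinson_index(sessions_even)
inequality_uneven = atkinson_index(sessions_uneven)
```

Both treatments show the same average lift, so an A/B test that reports only the mean cannot tell them apart; the inequality measure can.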
Acknowledgements
LinkedIn Talent Solutions Diversity team, Hire & Careers AI team, Anti-abuse AI team, Data Science
Applied Research team
Special thanks to Deepak Agarwal, Parvez Ahammad, Stuart Ambler, Kinjal Basu, Jenelle Bray, Erik
Buchanan, Bee-Chung Chen, Fei Chen, Patrick Cheung, Gil Cottle, Cyrus DiCiccio, Patrick Driscoll,
Carlos Faham, Nadia Fawaz, Priyanka Gariba, Meg Garlinghouse, Sahin Cem Geyik, Gurwinder Gulati,
Rob Hallman, Sara Harrington, Joshua Hartman, Daniel Hewlett, Nicolas Kim, Rachel Kumar, Monica
Lewis, Nicole Li, Heloise Logan, Stephen Lynch, Divyakumar Menghani, Varun Mithal, Arashpreet
Singh Mor, Tanvi Motwani, Preetam Nandy, Lei Ni, Nitin Panjwani, Igor Perisic, Hema Raghavan,
Romer Rosales, Guillaume Saint-Jacques, Badrul Sarwar, Amir Sepehri, Arun Swami, Ram
Swaminathan, Grace Tang, Ketan Thakkar, Sriram Vasudevan, Janardhanan Vembunarayanan, James
Verbus, Xin Wang, Hinkmond Wong, Ya Xu, Lin Yang, Yang Yang, Chenhui Zhai, Liang Zhang, Yani
Zhang
Privacy in
AI @
LinkedIn
PriPeARL: Framework to
compute robust,
privacy-preserving
analytics
Analytics & Reporting Products at LinkedIn
Profile View
Analytics
Content
Analytics
Ad Campaign
Analytics
All showing
demographics of
members engaging with
the product
Admit only a small # of predetermined query types
Querying for the number of member actions, for a specified time period,
together with the top demographic breakdowns
Example demographic breakdown: Title = “Senior Director”
Example member action: clicks on a given ad
Privacy Requirements
Attacker cannot infer whether a member performed an action
E.g., click on an article or an ad
Attacker may use auxiliary knowledge
E.g., knowledge of attributes associated with the target member (say,
obtained from this member’s LinkedIn profile)
E.g., knowledge of all other members that performed similar action (say, by
creating fake accounts)
Possible Privacy Attacks
Targeting:
Senior directors in US, who studied at Cornell
Matches ~16k LinkedIn members
→ over minimum targeting threshold
Demographic breakdown:
Company = X
May match exactly one person
→ can determine whether the person
clicks on the ad or not
Require minimum reporting threshold
Attacker could create fake profiles!
E.g. if threshold is 10, create 9 fake profiles
that all click.
Rounding mechanism
E.g., report in increments of 10
Still amenable to attacks
E.g. using incremental counts over time to
infer individuals’ actions
Need rigorous techniques to preserve member privacy
(not reveal exact aggregate counts)
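A toy model of the fake-profile attack on a minimum reporting threshold, assuming a breakdown that matches only the target member plus the attacker's fake profiles. The function names and the suppression behavior (returning `None` below the threshold) are illustrative assumptions.

```python
def reported_count(true_count, threshold):
    """Report the exact count only when it meets the threshold."""
    return true_count if true_count >= threshold else None

def attacker_infers_click(threshold, target_clicked):
    """Attacker controls (threshold - 1) fake profiles that all click."""
    fake_clicks = threshold - 1
    true_count = fake_clicks + (1 if target_clicked else 0)
    # A visible report means the target pushed the count over the threshold.
    return reported_count(true_count, threshold) is not None

leak_when_clicked = attacker_infers_click(threshold=10, target_clicked=True)
leak_when_not_clicked = attacker_infers_click(threshold=10, target_clicked=False)
```

Whether the count is shown or suppressed reveals the target's action exactly, which is why thresholding alone is not sufficient.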
Problem Statement
Compute robust, reliable analytics in a privacy-
preserving manner, while addressing the product
needs.
Differential Privacy
Defining Privacy
[Figure: a curator answers queries over a database with your data (+) or without your data (-)]
Databases $D$ and $D'$ are neighbors if they differ in one person’s data.
Differential Privacy: The distribution of the curator’s output $M(D)$ on database $D$ is (nearly) the same as $M(D')$.
Dwork, McSherry, Nissim, Smith [TCC 2006]
$(\varepsilon, \delta)$-Differential Privacy: for all neighboring $D, D'$ and all sets $S$ of outputs,
$$\Pr[M(D) \in S] \le \exp(\varepsilon) \cdot \Pr[M(D') \in S] + \delta.$$
Parameter $\varepsilon$ quantifies information leakage.
Parameter $\delta$ gives some slack.
Dwork, Kenthapadi, McSherry, Mironov, Naor [EUROCRYPT 2006]
Differential Privacy: Random Noise Addition
If the $\ell_1$-sensitivity of $f : D \to \mathbb{R}^n$ is
$$\max_{D, D'} \|f(D) - f(D')\|_1 = s,$$
then adding Laplacian noise to the true output,
$$f(D) + \mathrm{Laplace}^n(s/\varepsilon),$$
offers $(\varepsilon, 0)$-differential privacy.
Dwork, McSherry, Nissim, Smith [TCC 2006]
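A minimal sketch of Laplace noise addition using only the Python standard library. It relies on the fact that the difference of two independent exponential variables with rate ε/s follows a Laplace distribution with scale s/ε. The function name and the example counts are hypothetical.

```python
import random

def laplace_mechanism(true_values, sensitivity, epsilon, rng=None):
    """Add independent Laplace(sensitivity / epsilon) noise to each value."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # exponential rate = 1 / Laplace scale
    return [
        value + rng.expovariate(rate) - rng.expovariate(rate)
        for value in true_values
    ]

# Hypothetical counting queries: one person changes each count by at most 1,
# and here we assume the whole vector has l1-sensitivity 2.
true_counts = [120.0, 45.0]
noisy_counts = laplace_mechanism(true_counts, sensitivity=2.0, epsilon=1.0,
                                 rng=random.Random(0))
```

Smaller ε means a larger noise scale and stronger privacy; the seeded generator here is only for reproducibility of the example and would not be used in practice.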
PriPeARL: A Framework for Privacy-Preserving Analytics
K. Kenthapadi, T. T. L. Tran, ACM CIKM 2018
Pseudo-random noise generation, inspired by differential privacy
Inputs:
● Entity id (e.g., ad creative/campaign/account)
● Demographic dimension
● Stat type (impressions, clicks)
● Time range
● Fixed secret seed
→ Uniformly random fraction
● Cryptographic hash
● Normalize to (0,1)
→ Laplace noise
● Fixed ε
True count + random noise → Noisy count
To satisfy consistency requirements:
● Pseudo-random noise → same query has same result over time, avoid
averaging attack.
● For non-canonical queries (e.g., time ranges, aggregate multiple entities)
○ Use the hierarchy and partition into canonical queries
○ Compute noise for each canonical query and sum up the noisy counts
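The pseudo-random noise pipeline can be sketched as follows. This is an illustration under assumptions, not the production implementation: it uses HMAC-SHA256 as the cryptographic hash, 52 bits of the digest for the uniform fraction, and the Laplace inverse CDF; the key format, function names, and parameter values are hypothetical.

```python
import hashlib
import hmac
import math

def uniform_fraction(secret_seed: bytes, *query_parts: str) -> float:
    """Deterministically map a query to a fraction in the open interval (0, 1)."""
    message = "|".join(query_parts).encode("utf-8")
    digest = hmac.new(secret_seed, message, hashlib.sha256).digest()
    bits = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (bits + 0.5) / 2**52

def laplace_from_uniform(u: float, scale: float) -> float:
    """Inverse CDF of the Laplace(0, scale) distribution."""
    centered = u - 0.5
    return -scale * math.copysign(1.0, centered) * math.log(1.0 - 2.0 * abs(centered))

def noisy_count(true_count, secret_seed, entity_id, dimension, stat_type,
                time_range, epsilon, sensitivity=1.0):
    """Same canonical query => same noise, so repeated queries cannot average it out."""
    u = uniform_fraction(secret_seed, entity_id, dimension, stat_type, time_range)
    return true_count + laplace_from_uniform(u, sensitivity / epsilon)

# Hypothetical canonical query, issued twice.
seed = b"hypothetical-secret-seed"
first = noisy_count(42, seed, "campaign:123", "title=Senior Director",
                    "clicks", "2020-01-01", epsilon=1.0)
second = noisy_count(42, seed, "campaign:123", "title=Senior Director",
                     "clicks", "2020-01-01", epsilon=1.0)
```

A non-canonical query would be answered by summing the noisy counts of the canonical queries it decomposes into; rounding and suppression of small counts are not shown.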
PriPeARL System Architecture
Lessons Learned from Deployment (> 1
year)
Semantic consistency vs. unbiased, unrounded noise
Suppression of small counts
Online computation and performance requirements
Scaling across analytics applications
Tools for ease of adoption (code/API library, hands-on how-to tutorial) help!
Having a few entry points (all analytics apps built over Pinot) → wider adoption
Summary
Framework to compute robust, privacy-preserving analytics
Addressing challenges such as preserving member privacy, product
coverage, utility, and data consistency
Future
Utility maximization problem given constraints on the ‘privacy loss budget’
per user
E.g., noise with larger variance to impressions but less noise to clicks (or conversions)
E.g., more noise to broader time range sub-queries and less noise to granular time
range sub-queries
Reference: K. Kenthapadi, T. Tran, PriPeARL: A Framework for Privacy-
Preserving Analytics and Reporting at LinkedIn, ACM CIKM 2018.
Acknowledgements
Team:
AI/ML: Krishnaram Kenthapadi, Thanh T. L. Tran
Ad Analytics Product & Engineering: Mark Dietz, Taylor Greason, Ian
Koeppe
Legal / Security: Sara Harrington, Sharon Lee, Rohit Pitke
Acknowledgements
Deepak Agarwal, Igor Perisic, Arun Swami
LinkedIn Salary
LinkedIn Salary (launched in Nov, 2016)
Data Privacy Challenges
Minimize the risk of inferring any one
individual’s compensation data
Protection against data breach
No single point of failure
Problem Statement
How do we design the LinkedIn Salary system taking into
account the unique privacy and security challenges,
while addressing the product requirements?
K. Kenthapadi, A. Chudhary, and S.
Ambler, LinkedIn Salary: A System
for Secure Collection and
Presentation of Structured
Compensation Insights to Job
Seekers, IEEE PAC 2017
(arxiv.org/abs/1705.06976)
De-identification Example
Original submission:
Title | Region | Company | Industry | Years of exp | Degree | FoS | Skills | $$
User Exp Designer | SF Bay Area | Google | Internet | 12 | BS | Interactive Media | UX, Graphics, ... | 100K
De-identified slices of the submission, one per cohort:
Title | Region | $$
User Exp Designer | SF Bay Area | 100K
Title | Region | Industry | $$
User Exp Designer | SF Bay Area | Internet | 100K
Title | Region | Years of exp | $$
User Exp Designer | SF Bay Area | 10+ | 100K
Title | Region | Company | Years of exp | $$
User Exp Designer | SF Bay Area | Google | 10+ | 100K
For each cohort: #data points > threshold? Yes ⇒ Copy to Hadoop (HDFS), e.g.:
Title | Region | $$
User Exp Designer | SF Bay Area | 100K
User Exp Designer | SF Bay Area | 115K
... | ... | ...
Note: Original submission stored as encrypted objects.
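The de-identification flow above can be sketched as slicing each submission into cohorts and releasing only cohorts with enough data points. The field names, the cohort list, and the experience bucketing (only the "12 → 10+" case appears in the example) are assumptions for illustration.

```python
from collections import defaultdict

# Attribute combinations that define a cohort (mirrors the example slices).
COHORT_KEYS = [
    ("title", "region"),
    ("title", "region", "industry"),
    ("title", "region", "years_of_exp"),
    ("title", "region", "company", "years_of_exp"),
]

def bucket_years(years):
    """Generalize exact experience; boundaries other than 10+ are hypothetical."""
    return "10+" if years >= 10 else str(years)

def deidentify(submission):
    """Yield (cohort_keys, cohort_values, salary) slices of one submission."""
    generalized = dict(submission,
                       years_of_exp=bucket_years(submission["years_of_exp"]))
    for keys in COHORT_KEYS:
        yield keys, tuple(generalized[k] for k in keys), submission["salary"]

def releasable_cohorts(submissions, threshold):
    """Keep only cohorts with more than `threshold` data points."""
    cohorts = defaultdict(list)
    for submission in submissions:
        for keys, values, salary in deidentify(submission):
            cohorts[(keys, values)].append(salary)
    return {c: s for c, s in cohorts.items() if len(s) > threshold}

# Hypothetical submissions.
submissions = [
    {"title": "User Exp Designer", "region": "SF Bay Area", "company": "Google",
     "industry": "Internet", "years_of_exp": 12, "salary": 100_000},
    {"title": "User Exp Designer", "region": "SF Bay Area", "company": "Other Co",
     "industry": "Internet", "years_of_exp": 3, "salary": 115_000},
]
released = releasable_cohorts(submissions, threshold=1)
```

With a threshold of 1, the title/region and title/region/industry cohorts are released, while cohorts that include company or experience have a single data point each and stay back.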
System
Architecture
Acknowledgements
Team:
AI/ML: Krishnaram Kenthapadi, Stuart Ambler, Xi Chen, Yiqun Liu, Parul
Jain, Liang Zhang, Ganesh Venkataraman, Tim Converse, Deepak Agarwal
Application Engineering: Ahsan Chudhary, Alan Yang, Alex Navasardyan,
Brandyn Bennett, Hrishikesh S, Jim Tao, Juan Pablo Lomeli Diaz, Patrick
Schutz, Ricky Yan, Lu Zheng, Stephanie Chou, Joseph Florencio, Santosh
Kumar Kancha, Anthony Duerr
Product: Ryan Sandler, Keren Baruch
Other teams (UED, Marketing, BizOps, Analytics, Testing, Voice of
Members, Security, …): Julie Kuang, Phil Bunge, Prateek Janardhan, Fiona
Li, Bharath Shetty, Sunil Mahadeshwar, Cory Scott, Tushar Dalvi, and team
Acknowledgements
David Freeman, Ashish Gupta, David Hardtke, Rong Rong, Ram
Beyond
Accuracy
Performance and Cost
Fairness and Bias
Transparency and Explainability
Privacy
Security
Safety
Robustness
Fairness, Explainability &
Privacy: Opportunities
Fairness in ML
Application specific challenges
Conversational AI systems: Unique bias/fairness/ethics considerations
E.g., Hate speech, Complex failure modes
Beyond protected categories, e.g., accent, dialect
Entire ecosystem (e.g., including apps such as Alexa skills)
Two-sided markets: e.g., fairness to buyers and to sellers, or to content
consumers and producers
Fairness in advertising (externalities)
Tools for ensuring fairness (measuring & mitigating bias) in AI lifecycle
Pre-processing (representative datasets; modifying features/labels)
ML model training with fairness constraints
Post-processing
Experimentation & Post-deployment
Explainability in ML
Actionable explanations
Balance between explanations & model secrecy
Robustness of explanations to failure modes (Interaction between ML
components)
Application-specific challenges
Conversational AI systems: contextual explanations
Gradation of explanations
Tools for explanations across AI lifecycle
Pre & post-deployment for ML models
Model developer vs. End user focused
Privacy in ML
Privacy-preserving model training, robust against adversarial
membership inference attacks
Privacy for highly sensitive data: model training & analytics using
secure enclaves, homomorphic encryption, federated learning / on-
device learning, or a hybrid
Privacy-preserving transfer learning (broadly, privacy-preserving
mechanisms for data marketplaces)
Reflections
“Fairness and Privacy by Design” when
building AI products
Collaboration/consensus across key
stakeholders
NYT / WSJ / ProPublica / ToI / The Hindu
test :)
Thanks! Questions?
S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search &
Recommendation Systems with Application to LinkedIn Talent Search, KDD’19
[Microsoft’s AI/ML conference (MLADS’18). Distinguished Contribution Award]
K. Kenthapadi, T. T. L. Tran, PriPeARL: A Framework for Privacy-Preserving
Analytics and Reporting at LinkedIn, CIKM’18
K. Kenthapadi, A. Chudhary, S. Ambler, LinkedIn Salary, IEEE Symposium on
Privacy-Aware Computing (PAC), 2017 [Related: our KDD’18 & CIKM’17 (Best
Case Studies Paper Award) papers]
Our tutorials on privacy, on fairness, and on explainability in industry at
KDD/WSDM/WWW/FAccT/AAAI (combining experiences at Apple, Facebook,
Google, LinkedIn, Microsoft)

𓀤Call On 7877925207 𓀤 Ahmedguda Call Girls Hot Model With Sexy Bhabi Ready Fo...
 
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Prashant Vihar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
Hire↠Young Call Girls in Tilak nagar (Delhi) ☎️ 9205541914 ☎️ Independent Esc...
 
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
Russian Call Girls in %(+971524965298  )#  Call Girls in DubaiRussian Call Girls in %(+971524965298  )#  Call Girls in Dubai
Russian Call Girls in %(+971524965298 )# Call Girls in Dubai
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Rani Bagh Escort Service Delhi N.C.R.
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 

Fairness and Privacy in AI/ML Systems

  • 1. Fairness and Privacy in AI/ML Systems Krishnaram Kenthapadi Amazon AWS AI LinkedIn-MSR-IISc workshop on Fairness & Ethics in ML January 2020 https://www.csa.iisc.ac.in/fate.htm
  • 2. Massachusetts Group Insurance Commission (1997): Anonymized medical history of state employees William Weld vs Latanya Sweeney Latanya Sweeney (MIT grad student): $20 – Cambridge voter roll born July 31, 1945 resident of 02138
  • 3. Uniquely identifiable with ZIP + birth date + gender (in the US population) Golle, “Revisiting the Uniqueness of Simple Demographics in the US Population”, WPES 2006
  • 4. The Coded Gaze [Joy Buolamwini 2016] Face detection software: Fails for some darker faces https://www.youtube.com/watch?v=KB9sI9rY3cA
  • 5. • Facial analysis software: Higher accuracy for light skinned men • Error rates for dark skinned women: 20% - 34% Gender Shades [Joy Buolamwini & Timnit Gebru, 2018]
  • 6.
  • 7. • Ethical challenges posed by AI systems • Inherent biases present in society • Reflected in training data • AI/ML models prone to amplifying such biases Algorithmic Bias
  • 8. Laws against Discrimination Immigration Reform and Control Act Citizenship Rehabilitation Act of 1973; Americans with Disabilities Act of 1990 Disability status Civil Rights Act of 1964 Race Age Discrimination in Employment Act of 1967 Age Equal Pay Act of 1963; Civil Rights Act of 1964 Sex And more...
  • 11. AI @ Scale Case Studies @ LinkedIn* Fairness Privacy Reflections *Work done while at LinkedIn
  • 12. © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved | Our mission at AWS: Put machine learning in the hands of every developer
  • 13. The AWS ML Stack: Broadest and most complete set of Machine Learning capabilities. AI SERVICES (vision, speech, text, search, chatbots, personalization, forecasting, fraud, development, contact centers): Amazon Rekognition, Amazon Polly, Amazon Transcribe (+Medical), Amazon Comprehend (+Medical), Amazon Translate, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon CodeGuru, Amazon Textract, Amazon Kendra, Contact Lens for Amazon Connect. ML SERVICES: Amazon SageMaker, Ground Truth, Augmented AI, SageMaker Neo, built-in algorithms, SageMaker Notebooks, SageMaker Experiments, model tuning, SageMaker Debugger, SageMaker Autopilot, model hosting, SageMaker Model Monitor, SageMaker Studio IDE. ML FRAMEWORKS & INFRASTRUCTURE: Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA.
  • 14. LinkedIn operates the largest professional network on the Internet Tell your story 645M+ members 30M+ companies are represented on LinkedIn 90K+ schools listed (high school & college) 35K+ skills listed 20M+ open jobs on LinkedIn Jobs 280B Feed updates
  • 15. How AI is transforming LinkedIn’s ecosystem 2 PB+ Data processed nearline and offline per day 25 B+ Parameters in Machine Learning models 200+ Machine Learning A/B experiments per week Contributors Advertising Revenue Confirmed Hires
  • 18. Insights to Identify Diverse Talent Pools Representative Talent Search Results Diversity Learning Curriculum “Diversity by Design” in LinkedIn’s Talent Solutions
  • 22. Inclusive Job Descriptions / Recruiter Outreach
  • 23. Representative Ranking for Talent Search. S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD’19. [Microsoft’s AI/ML conference (MLADS’18). Distinguished Contribution Award] Building Representative Talent Search at LinkedIn (LinkedIn engineering blog)
  • 24. Intuition for Measuring and Achieving Representativeness Ideal: Top ranked results should follow a desired distribution on gender/age/… E.g., same distribution as the underlying talent pool Inspired by “Equal Opportunity” definition [Hardt et al, NIPS’16] Defined measures (skew, divergence) based on this intuition
  • 25. Desired Proportions within the Attribute of Interest Compute the proportions of the values of the attribute (e.g., gender, gender-age combination) amongst the set of qualified candidates “Qualified candidates” = Set of candidates that match the search query criteria Retrieved by LinkedIn’s Galene search engine Desired proportions could also be obtained based on legal mandate / voluntary commitment
  • 26. Measuring (Lack of) Representativeness Skew@k (Logarithmic) ratio of the proportion of candidates having a given attribute value among the top k ranked results to the corresponding desired proportion Variants: MinSkew: Minimum over all attribute values MaxSkew: Maximum over all attribute values Normalized Discounted Cumulative Skew Normalized Discounted Cumulative KL-divergence
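The Skew@k measure described on this slide can be sketched as follows. This is a minimal illustration, not the production implementation: the function names, the natural-log base, and the small smoothing constant `eps` (used only to avoid log(0)) are assumptions, and the ranked list is assumed to be non-empty.

```python
import math
from collections import Counter

def skew_at_k(ranked_attrs, desired, value, k, eps=1e-9):
    """Log ratio of the observed proportion of `value` in the top k
    to its desired proportion. `eps` is an assumed smoothing term."""
    top_k = ranked_attrs[:k]
    observed = Counter(top_k)[value] / len(top_k)
    return math.log((observed + eps) / (desired[value] + eps))

def min_max_skew(ranked_attrs, desired, k):
    """MinSkew and MaxSkew: extremes of Skew@k over all attribute values."""
    skews = [skew_at_k(ranked_attrs, desired, v, k) for v in desired]
    return min(skews), max(skews)

# Hypothetical example: top 4 results are 3 "m" and 1 "f",
# while the qualified pool is split evenly.
ranking = ["m", "m", "m", "f"]
desired = {"m": 0.5, "f": 0.5}
lo, hi = min_max_skew(ranking, desired, k=4)
```

A positive skew means the value is over-represented in the top k relative to the desired proportion; a negative skew means it is under-represented.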
  • 27. Fairness-aware Reranking Algorithm (Simplified) Partition the set of potential candidates into different buckets for each attribute value Rank the candidates in each bucket according to the scores assigned by the machine-learned model Merge the ranked lists, balancing the representation requirements and the selection of highest scored candidates Algorithmic variants based on how we choose the next attribute
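The three steps on this slide (partition, rank within buckets, merge) can be sketched as a greedy merge. This is one possible variant under stated assumptions, not necessarily the deployed algorithm: the rule for choosing the next attribute value (first satisfy minimum counts `floor(p * position)`, then stay under `ceil(p * position)`) and all names are illustrative, and every candidate's attribute value is assumed to appear in `desired`.

```python
import math

def rerank(candidates, desired, k):
    """candidates: list of (id, attr_value, score) tuples.
    desired: dict mapping attr_value -> desired proportion."""
    # Step 1: partition candidates into one bucket per attribute value.
    buckets = {v: [] for v in desired}
    for cand in candidates:
        buckets[cand[1]].append(cand)
    # Step 2: rank each bucket by descending model score.
    for v in buckets:
        buckets[v].sort(key=lambda c: c[2], reverse=True)

    # Step 3: merge, balancing representation against score.
    counts = {v: 0 for v in desired}
    result = []
    while len(result) < k:
        pos = len(result) + 1
        available = [v for v in desired if counts[v] < len(buckets[v])]
        if not available:
            break
        # Values that would fall below their minimum count at this position.
        below_min = [v for v in available
                     if counts[v] < math.floor(desired[v] * pos)]
        # Values that have not yet reached their maximum count.
        below_max = [v for v in available
                     if counts[v] < math.ceil(desired[v] * pos)]
        pool = below_min or below_max or available
        # Among eligible values, take the one whose next candidate scores highest.
        best = max(pool, key=lambda v: buckets[v][counts[v]][2])
        result.append(buckets[best][counts[best]])
        counts[best] += 1
    return result

# Hypothetical data: four "a" candidates outscore both "b" candidates.
cands = [("a1", "a", 0.9), ("a2", "a", 0.8), ("a3", "a", 0.7),
         ("a4", "a", 0.6), ("b1", "b", 0.5), ("b2", "b", 0.4)]
top4 = rerank(cands, {"a": 0.5, "b": 0.5}, k=4)
```

A purely score-based ranking would return four "a" candidates here; the merge interleaves the two buckets so the top 4 match the 50/50 desired proportions.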
  • 29. Validating Our Approach Gender Representativeness Over 95% of all searches are representative compared to the qualified population of the search Business Metrics A/B test over LinkedIn Recruiter users for two weeks No significant change in business metrics (e.g., # InMails sent or accepted) Ramped to 100% of LinkedIn Recruiter users worldwide
  • 30. Lessons learned • Post-processing approach desirable • Model agnostic • Scalable across different model choices for our application • Acts as a “fail-safe” • Robust to application-specific business logic • Easier to incorporate as part of existing systems • Build a stand-alone service or component for post-processing • No significant modifications to the existing components • Complementary to efforts to reduce bias from training data & during model training
  • 31. Engineering for Fairness in AI Lifecycle. Problem Formation: Is an algorithm an ethical solution to our problem? Dataset Construction: Does our data include enough minority samples? Are there missing/biased features? Algorithm Selection: Do we need to apply debiasing algorithms to preprocess our data? Training Process: Do we need to include fairness constraints in the function? Testing Process: Have we evaluated the model using relevant fairness metrics? Deployment: Are we deploying our model on a population that we did not train/test on? Feedback: Does the model encourage feedback loops that can produce increasingly unfair outcomes? Credit: K. Browne & J. Draper
  • 32. Engineering for Fairness in AI Lifecycle S.Vasudevan, K. Kenthapadi, FairScale: A Scalable Framework for Measuring Fairness in AI Applications, 2019
  • 33. FairScale System Architecture [Vasudevan & Kenthapadi, 2019] • Flexibility of Use (Platform agnostic) • Ad-hoc exploratory analyses • Deployment in offline workflows • Integration with ML Frameworks • Scalability • Diverse fairness metrics • Conventional fairness metrics • Benefit metrics • Statistical tests
  • 34. Fairness-aware Experimentation [Saint-Jacques & Sepehri, KDD’19 Social Impact Workshop] Imagine LinkedIn has 10 members. Each of them has 1 session a day. A new product increases sessions by +1 session per member on average. Both of these are +1 session / member on average! One is much more unequal than the other. We want to catch that.
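To make the point of this slide concrete, the sketch below compares two hypothetical treatments that both add one session per member on average. The slide does not name an inequality metric or give the two distributions; the Atkinson index and the example numbers are assumptions chosen for illustration.

```python
import math

def atkinson_index(values, epsilon=1.0):
    """Atkinson inequality index for strictly positive values.
    0 means perfectly equal; values closer to 1 mean more unequal."""
    n = len(values)
    mean = sum(values) / n
    if epsilon == 1.0:
        # For epsilon = 1 the index is 1 - geometric mean / arithmetic mean.
        geo_mean = math.exp(sum(math.log(v) for v in values) / n)
        return 1.0 - geo_mean / mean
    power_mean = (sum(v ** (1.0 - epsilon) for v in values) / n) ** (1.0 / (1.0 - epsilon))
    return 1.0 - power_mean / mean

# Baseline: 10 members with 1 session each.
# Treatment A (hypothetical): every member gains 1 session.
treatment_a = [2] * 10
# Treatment B (hypothetical): one member gains 10 sessions, the rest gain none.
treatment_b = [11] + [1] * 9

inequality_a = atkinson_index(treatment_a)
inequality_b = atkinson_index(treatment_b)
```

Both treatments have a mean of 2 sessions per member, so an average-only A/B readout cannot tell them apart, while the inequality measure separates them clearly.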
  • 35. Acknowledgements LinkedIn Talent Solutions Diversity team, Hire & Careers AI team, Anti-abuse AI team, Data Science Applied Research team Special thanks to Deepak Agarwal, Parvez Ahammad, Stuart Ambler, Kinjal Basu, Jenelle Bray, Erik Buchanan, Bee-Chung Chen, Fei Chen, Patrick Cheung, Gil Cottle, Cyrus DiCiccio, Patrick Driscoll, Carlos Faham, Nadia Fawaz, Priyanka Gariba, Meg Garlinghouse, Sahin Cem Geyik, Gurwinder Gulati, Rob Hallman, Sara Harrington, Joshua Hartman, Daniel Hewlett, Nicolas Kim, Rachel Kumar, Monica Lewis, Nicole Li, Heloise Logan, Stephen Lynch, Divyakumar Menghani, Varun Mithal, Arashpreet Singh Mor, Tanvi Motwani, Preetam Nandy, Lei Ni, Nitin Panjwani, Igor Perisic, Hema Raghavan, Romer Rosales, Guillaume Saint-Jacques, Badrul Sarwar, Amir Sepehri, Arun Swami, Ram Swaminathan, Grace Tang, Ketan Thakkar, Sriram Vasudevan, Janardhanan Vembunarayanan, James Verbus, Xin Wang, Hinkmond Wong, Ya Xu, Lin Yang, Yang Yang, Chenhui Zhai, Liang Zhang, Yani Zhang
  • 36. Privacy in AI @ LinkedIn PriPeARL: Framework to compute robust, privacy-preserving analytics
  • 37. Analytics & Reporting Products at LinkedIn: Profile View Analytics, Content Analytics, Ad Campaign Analytics. All showing demographics of members engaging with the product
  • 38. Admit only a small # of predetermined query types Querying for the number of member actions, for a specified time period, together with the top demographic breakdowns Analytics & Reporting Products at LinkedIn
  • 39. Admit only a small # of predetermined query types Querying for the number of member actions, for a specified time period, together with the top demographic breakdowns Analytics & Reporting Products at LinkedIn E.g., Title = “Senior Director” E.g., Clicks on a given ad
  • 40. Privacy Requirements Attacker cannot infer whether a member performed an action E.g., click on an article or an ad Attacker may use auxiliary knowledge E.g., knowledge of attributes associated with the target member (say, obtained from this member’s LinkedIn profile) E.g., knowledge of all other members that performed similar action (say, by creating fake accounts)
  • 41. Possible Privacy Attacks. Targeting: Senior directors in US, who studied at Cornell. Matches ~16k LinkedIn members → over minimum targeting threshold. Demographic breakdown: Company = X. May match exactly one person → can determine whether the person clicks on the ad or not. Require minimum reporting threshold: the attacker could create fake profiles! E.g., if the threshold is 10, create 9 fake profiles that all click. Rounding mechanism (e.g., report in increments of 10): still amenable to attacks, e.g., using incremental counts over time to infer individuals’ actions. Need rigorous techniques to preserve member privacy (not reveal exact aggregate counts)
  • 42. Problem Statement: Compute robust, reliable analytics in a privacy-preserving manner, while addressing the product needs.
  • 46. Differential Privacy. Databases D and D′ are neighbors if they differ in one person’s data. Differential Privacy: The distribution of the curator’s output M(D) on database D is (nearly) the same as M(D′). [Figure: curator output with your data vs. without your data.] Dwork, McSherry, Nissim, Smith [TCC 2006]
  • 47. (ε, δ)-Differential Privacy: The distribution of the curator’s output M(D) on database D is (nearly) the same as M(D′): $\forall S:\ \Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta$. Parameter ε quantifies information leakage; parameter δ gives some slack. [Figure: curator output with your data vs. without your data.] Dwork, McSherry, Nissim, Smith [TCC 2006]; Dwork, Kenthapadi, McSherry, Mironov, Naor [EUROCRYPT 2006]
  • 48. Differential Privacy: Random Noise Addition. If the ℓ1-sensitivity of $f : D \to \mathbb{R}^n$ is $\max_{D,D'} \|f(D) - f(D')\|_1 = s$, then adding Laplacian noise to the true output, $f(D) + \mathrm{Laplace}^n(s/\varepsilon)$, offers (ε, 0)-differential privacy. Dwork, McSherry, Nissim, Smith [TCC 2006]
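The noise-addition rule on this slide can be sketched with the standard library alone. The function name and parameters are illustrative assumptions; the sampler relies on the fact that the difference of two independent exponential variables with mean s/ε follows a Laplace distribution with scale s/ε.

```python
import random

def laplace_mechanism(true_values, sensitivity, epsilon, rng=None):
    """Add independent Laplace(sensitivity / epsilon) noise to each coordinate."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    rate = 1.0 / scale
    # Exp(rate) - Exp(rate) is distributed as Laplace(0, scale).
    return [v + rng.expovariate(rate) - rng.expovariate(rate)
            for v in true_values]

# Hypothetical usage: a count query has l1-sensitivity 1
# (one person changes the count by at most 1).
noisy_counts = laplace_mechanism([120.0, 45.0, 7.0], sensitivity=1.0, epsilon=0.5)
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of accuracy.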
  • 49. PriPeARL: A Framework for Privacy-Preserving Analytics (K. Kenthapadi, T. T. L. Tran, ACM CIKM 2018). Pseudo-random noise generation, inspired by differential privacy. Inputs: ● Entity id (e.g., ad creative/campaign/account) ● Demographic dimension ● Stat type (impressions, clicks) ● Time range ● Fixed secret seed. Pipeline: cryptographic hash → normalize to (0,1) to obtain a uniformly random fraction → Laplace noise with fixed ε → added to the true count to give the noisy count. To satisfy consistency requirements: ● Pseudo-random noise → same query has same result over time, avoiding averaging attacks. ● For non-canonical queries (e.g., time ranges, aggregates over multiple entities): ○ Use the hierarchy and partition into canonical queries ○ Compute noise for each canonical query and sum up the noisy counts
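The pseudo-random noise pipeline on this slide can be sketched as below. This is an illustration under assumptions, not the PriPeARL code: HMAC-SHA256 stands in for the unspecified cryptographic hash, the query sensitivity is taken to be 1, and the function name, field separator, and 52-bit normalization are hypothetical choices.

```python
import hashlib
import hmac
import math

def noisy_count(true_count, entity_id, dimension, stat_type, time_range,
                secret_seed, epsilon):
    """Deterministic Laplace-style noise keyed on the query parameters."""
    # Hash the query parameters with the fixed secret seed (assumed: HMAC-SHA256).
    message = "|".join([entity_id, dimension, stat_type, time_range]).encode()
    digest = hmac.new(secret_seed, message, hashlib.sha256).digest()
    # Normalize 52 bits of the digest to a fraction strictly inside (0, 1).
    bits = int.from_bytes(digest[:8], "big") >> 12
    u = (bits + 0.5) / 2 ** 52
    # Inverse CDF of the Laplace distribution with scale 1 / epsilon.
    centered = u - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, centered) * math.log(1.0 - 2.0 * abs(centered))
    return true_count + noise

# Hypothetical query parameters and seed.
seed = b"fixed-secret-seed"
first = noisy_count(42, "campaign-1", "title=Senior Director", "clicks", "2019-06-01", seed, 1.0)
again = noisy_count(42, "campaign-1", "title=Senior Director", "clicks", "2019-06-01", seed, 1.0)
other_day = noisy_count(42, "campaign-1", "title=Senior Director", "clicks", "2019-06-02", seed, 1.0)
```

Because the noise depends only on the query parameters and the seed, repeating a query returns the same answer, so averaging repeated results reveals nothing new. A non-canonical query, such as a multi-day range, would sum the noisy counts of its canonical sub-queries.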
  • 51. Lessons Learned from Deployment (> 1 year): Semantic consistency vs. unbiased, unrounded noise. Suppression of small counts. Online computation and performance requirements. Scaling across analytics applications: tools for ease of adoption (code/API library, hands-on how-to tutorial) help! Having a few entry points (all analytics apps built over Pinot) → wider adoption
  • 52. Summary: Framework to compute robust, privacy-preserving analytics. Addressing challenges such as preserving member privacy, product coverage, utility, and data consistency. Future: Utility maximization problem given constraints on the ‘privacy loss budget’ per user. E.g., noise with larger variance to impressions but less noise to clicks (or conversions). E.g., more noise to broader time range sub-queries and less noise to granular time range sub-queries. Reference: K. Kenthapadi, T. Tran, PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn, ACM CIKM 2018.
  • 53. Acknowledgements Team: AI/ML: Krishnaram Kenthapadi, Thanh T. L. Tran Ad Analytics Product & Engineering: Mark Dietz, Taylor Greason, Ian Koeppe Legal / Security: Sara Harrington, Sharon Lee, Rohit Pitke Acknowledgements Deepak Agarwal, Igor Perisic, Arun Swami
  • 55. LinkedIn Salary (launched in Nov, 2016)
  • 56. Data Privacy Challenges Minimize the risk of inferring any one individual’s compensation data Protection against data breach No single point of failure
  • 57. Problem Statement: How do we design the LinkedIn Salary system, taking into account the unique privacy and security challenges, while addressing the product requirements? K. Kenthapadi, A. Chudhary, and S. Ambler, LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers, IEEE PAC 2017 (arxiv.org/abs/1705.06976)
  • 58. De-identification Example. Original submission (stored as encrypted objects): Title = User Exp Designer, Region = SF Bay Area, Company = Google, Industry = Internet, Years of exp = 12, Degree = BS, FoS = Interactive Media, Skills = UX, Graphics, ..., $$ = 100K. De-identified cohort records derived from it: (Title, Region, $$) = (User Exp Designer, SF Bay Area, 100K); (Title, Region, Industry, $$) = (User Exp Designer, SF Bay Area, Internet, 100K); (Title, Region, Years of exp, $$) = (User Exp Designer, SF Bay Area, 10+, 100K); (Title, Region, Company, Years of exp, $$) = (User Exp Designer, SF Bay Area, Google, 10+, 100K). #data points > threshold? Yes ⇒ Copy to Hadoop (HDFS), e.g., the (Title, Region, $$) cohort: (User Exp Designer, SF Bay Area, 100K), (User Exp Designer, SF Bay Area, 115K), ...
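The cohort-and-threshold flow in this example can be sketched as follows. It is a simplified illustration, not the LinkedIn Salary pipeline: the field names, the two-level years-of-experience bucketing, and the in-memory grouping are assumptions, and encryption and storage are omitted.

```python
from collections import defaultdict

# Attribute combinations that define a cohort, taken from the slide's example.
COHORT_TYPES = [
    ("title", "region"),
    ("title", "region", "industry"),
    ("title", "region", "years_bucket"),
    ("title", "region", "company", "years_bucket"),
]

def years_bucket(years):
    """Coarsen exact years of experience (bucket boundaries are assumed)."""
    return "10+" if years >= 10 else "<10"

def build_cohorts(submissions, threshold):
    """Group salaries by cohort and keep cohorts with more than `threshold` points."""
    cohorts = defaultdict(list)
    for sub in submissions:
        record = dict(sub, years_bucket=years_bucket(sub["years"]))
        for attrs in COHORT_TYPES:
            key = tuple((a, record[a]) for a in attrs)
            cohorts[key].append(record["salary"])
    return {key: vals for key, vals in cohorts.items() if len(vals) > threshold}

# Hypothetical submissions.
subs = [
    {"title": "User Exp Designer", "region": "SF Bay Area", "company": "Google",
     "industry": "Internet", "years": 12, "salary": 100_000},
    {"title": "User Exp Designer", "region": "SF Bay Area", "company": "Google",
     "industry": "Internet", "years": 11, "salary": 115_000},
    {"title": "User Exp Designer", "region": "SF Bay Area", "company": "Acme",
     "industry": "Internet", "years": 3, "salary": 90_000},
]
released = build_cohorts(subs, threshold=2)
```

With a threshold of 2, only the broad cohorts (title and region, optionally with industry) contain enough data points to be released; the narrower company and experience cohorts are withheld.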
  • 60. Acknowledgements Team: AI/ML: Krishnaram Kenthapadi, Stuart Ambler, Xi Chen, Yiqun Liu, Parul Jain, Liang Zhang, Ganesh Venkataraman, Tim Converse, Deepak Agarwal Application Engineering: Ahsan Chudhary, Alan Yang, Alex Navasardyan, Brandyn Bennett, Hrishikesh S, Jim Tao, Juan Pablo Lomeli Diaz, Patrick Schutz, Ricky Yan, Lu Zheng, Stephanie Chou, Joseph Florencio, Santosh Kumar Kancha, Anthony Duerr Product: Ryan Sandler, Keren Baruch Other teams (UED, Marketing, BizOps, Analytics, Testing, Voice of Members, Security, …): Julie Kuang, Phil Bunge, Prateek Janardhan, Fiona Li, Bharath Shetty, Sunil Mahadeshwar, Cory Scott, Tushar Dalvi, and team Acknowledgements David Freeman, Ashish Gupta, David Hardtke, Rong Rong, Ram
  • 61. Beyond Accuracy Performance and Cost Fairness and Bias Transparency and Explainability Privacy Security Safety Robustness
  • 63. Fairness in ML Application specific challenges Conversational AI systems: Unique bias/fairness/ethics considerations E.g., Hate speech, Complex failure modes Beyond protected categories, e.g., accent, dialect Entire ecosystem (e.g., including apps such as Alexa skills) Two-sided markets: e.g., fairness to buyers and to sellers, or to content consumers and producers Fairness in advertising (externalities) Tools for ensuring fairness (measuring & mitigating bias) in AI lifecycle Pre-processing (representative datasets; modifying features/labels) ML model training with fairness constraints Post-processing Experimentation & Post-deployment
  • 64. Explainability in ML Actionable explanations Balance between explanations & model secrecy Robustness of explanations to failure modes (Interaction between ML components) Application-specific challenges Conversational AI systems: contextual explanations Gradation of explanations Tools for explanations across AI lifecycle Pre & post-deployment for ML models Model developer vs. End user focused
  • 65. Privacy in ML: Privacy-preserving model training, robust against adversarial membership inference attacks. Privacy for highly sensitive data: model training & analytics using secure enclaves, homomorphic encryption, federated learning / on-device learning, or a hybrid. Privacy-preserving transfer learning (broadly, privacy-preserving mechanisms for data marketplaces)
  • 66. Reflections “Fairness and Privacy by Design” when building AI products Collaboration/consensus across key stakeholders NYT / WSJ / ProPublica / ToI / The Hindu test :)
  • 67. Thanks! Questions? S. C. Geyik, S. Ambler, K. Kenthapadi, Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search, KDD’19 [Microsoft’s AI/ML conference (MLADS’18). Distinguished Contribution Award] K. Kenthapadi, T. T. L. Tran, PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn, CIKM’18 K. Kenthapadi, A. Chudhary, S. Ambler, LinkedIn Salary, IEEE Symposium on Privacy-Aware Computing (PAC), 2017 [Related: our KDD’18 & CIKM’17 (Best Case Studies Paper Award) papers] Our tutorials on privacy, on fairness, and on explainability in industry at KDD/WSDM/WWW/FAccT/AAAI (combining experiences at Apple, Facebook, Google, LinkedIn, Microsoft)