Finding and Fixing Bias in Natural Language Processing
Yves Peirsman
Artificial Intelligence
Natural Language Processing
A primer in NLP
Machine translation
Sentiment analysis
Information retrieval
Information extraction
Text classification
We provide consultancy for companies that need guidance in the NLP domain.
We develop software and train custom NLP models for challenging or domain-specific applications.
Training data → Training process → Model
NLP Town
We help annotate training data.
We train models for NLP applications.
We integrate models with workflows.
We provide consultancy for NLP projects.
Bias in Natural Language Processing
A primer in NLP
Training data → Training process → Model
Word Embeddings
Word embeddings allow NLP models to generalize better.
Word Embeddings
Word embeddings capture both general and linguistic knowledge.
Word Embeddings
Word embeddings also encode bias:
● Man is to king as woman is to ___.
● Man is to programmer as woman is to ___.
Experiment:
● Measure the similarity between occupation words and:
○ A set of “male” words: man, son, father, he, him, etc.
○ A set of “female” words: woman, daughter, mother, she, her, etc.
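The occupation experiment can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the three-dimensional vectors are made up for the example, whereas a real experiment would load pretrained embeddings. The bias score of a word is assumed here to be its mean cosine similarity to the “male” words minus its mean cosine similarity to the “female” words.

```python
import math

# Made-up 3-dimensional vectors for illustration only. A real experiment
# would load pretrained embeddings with hundreds of dimensions.
EMBEDDINGS = {
    "man": [0.9, 0.1, 0.3],
    "he": [0.8, 0.2, 0.1],
    "father": [0.85, 0.15, 0.4],
    "woman": [-0.9, 0.1, 0.3],
    "she": [-0.8, 0.2, 0.1],
    "mother": [-0.85, 0.15, 0.4],
    "programmer": [0.4, 0.9, 0.1],
    "nurse": [-0.5, 0.8, 0.2],
}

MALE_WORDS = ["man", "he", "father"]
FEMALE_WORDS = ["woman", "she", "mother"]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, group):
    """Average cosine similarity between `word` and every word in `group`."""
    return sum(cosine(EMBEDDINGS[word], EMBEDDINGS[g]) for g in group) / len(group)


def gender_bias(word):
    """Positive: closer to the male words; negative: closer to the female words."""
    return mean_similarity(word, MALE_WORDS) - mean_similarity(word, FEMALE_WORDS)


for occupation in ["programmer", "nurse"]:
    print(f"{occupation}: {gender_bias(occupation):+.3f}")
```

With real embeddings, the same score can be computed for a long list of occupations and the lists sorted by score to see which occupations lean towards which group.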
Pretrained NLP models
Pretrained language models are a significant recent breakthrough in NLP:
● Language models predict masked words.
● They learn a lot about language.
● This knowledge can be reused in “downstream” tasks.
This movie won her an Oscar for best actress.
The keys to the house are on the table.
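The masked-word objective can be illustrated with a simplified Python sketch that builds one training example: some tokens are hidden behind a [MASK] symbol, and the model must predict the originals. The 15% default masking rate is the one used by BERT; BERT additionally replaces some selected tokens with random tokens or leaves them unchanged, which this sketch omits. The function and parameter names are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, targets).

    targets[i] holds the original token at every masked position and None
    elsewhere; a language model is trained to predict the non-None targets.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    masked, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(token)
        else:
            masked.append(token)
            targets.append(None)
    return masked, targets


sentence = "The keys to the house are on the table .".split()
masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=1)
print(" ".join(masked))
print([t for t in targets if t is not None])
```

To fill in a masked word such as “are” correctly, the model has to track that “keys”, not “house”, is the subject, which is one way these models pick up knowledge about language.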
Pretrained NLP models
ULMFiT (Howard and Ruder 2018)
Pretrained language models
Experiment: association with a large number of positive adjectives
● One of several recent Dutch BERT models
● Association between 240 positive adjectives and hij/zij (he/she):
○ aantrekkelijk (attractive), ambitieus (ambitious), intelligent, slim (smart), knap (good-looking, clever), nauwkeurig (precise), nieuwsgierig (curious), etc.
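The counting step of the hij/zij experiment could look roughly like the Python sketch below. It assumes a scoring function that returns the probability a masked language model assigns to a pronoun in a template such as “[MASK] is {adjective}.”; the template, the function names, and the toy probabilities are hypothetical placeholders, not results from the Dutch BERT model on the slide.

```python
from typing import Callable, Dict, List


def count_associations(
    adjectives: List[str],
    pronoun_prob: Callable[[str, str], float],
) -> Dict[str, int]:
    """Count, per adjective, which pronoun the model prefers.

    `pronoun_prob(pronoun, adjective)` is assumed to return the probability
    that a masked language model fills the [MASK] slot of a template such as
    "[MASK] is {adjective}." with `pronoun`.
    """
    counts = {"hij": 0, "zij": 0, "tie": 0}
    for adjective in adjectives:
        p_hij = pronoun_prob("hij", adjective)
        p_zij = pronoun_prob("zij", adjective)
        if p_hij > p_zij:
            counts["hij"] += 1
        elif p_zij > p_hij:
            counts["zij"] += 1
        else:
            counts["tie"] += 1
    return counts


# Placeholder probabilities standing in for real model output.
TOY_PROBS = {
    ("hij", "ambitieus"): 0.6, ("zij", "ambitieus"): 0.3,
    ("hij", "aantrekkelijk"): 0.2, ("zij", "aantrekkelijk"): 0.7,
    ("hij", "intelligent"): 0.5, ("zij", "intelligent"): 0.4,
}

result = count_associations(
    ["ambitieus", "aantrekkelijk", "intelligent"],
    lambda pronoun, adjective: TOY_PROBS[(pronoun, adjective)],
)
print(result)  # {'hij': 2, 'zij': 1, 'tie': 0}
```

Passing the scorer in as a function keeps the counting logic independent of any particular model or library.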
The problem with bias
Step 1: Identify bias with explainable AI
Challenge
● First, we need to find out whether our models are biased: search for known, but also unexpected, bias
● Explainable AI plays an important role here
Experiment
● A simple classifier for toxic comments
● Example: "Stupid peace of shit stop deleting my stuff asshole go die and fall in a hole go to hell!"
Step 1: Identify bias with explainable AI
● Visualize the classifier features and their weights:
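For a linear classifier, visualizing features and their weights amounts to ranking the learned weight of each word. The Python sketch below trains a tiny bag-of-words logistic regression with plain stochastic gradient descent on a handful of made-up comments; the classifier type and the data are assumptions, since the slide does not specify them. Because “deleting” occurs only in toxic examples here, it receives a positive weight even though the word itself is harmless, which is the kind of unexpected correlation this inspection is meant to surface.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def features(text):
    """Binary bag-of-words features: the set of lowercased tokens."""
    return set(text.lower().split())


def train_logreg(texts, labels, epochs=100, lr=0.5):
    """Logistic regression trained with plain stochastic gradient descent."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = features(text)
            p = sigmoid(bias + sum(weights[f] for f in feats))
            grad = p - y  # derivative of the log loss w.r.t. the logit
            bias -= lr * grad
            for f in feats:
                weights[f] -= lr * grad
    return dict(weights), bias


def top_features(weights, k=5):
    """The k features that push most strongly towards the 'toxic' label."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Made-up training data: 1 = toxic, 0 = not toxic.
texts = [
    "stop deleting my stuff idiot",
    "you are an idiot",
    "stop deleting my edits you fool",
    "go to hell fool",
    "thanks for fixing my edits",
    "great article thanks",
    "i added a source to my stuff",
    "please review my edits",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logreg(texts, labels)
for word, weight in top_features(weights):
    print(f"{word:>10} {weight:+.2f}")
```

For non-linear models, the same question is usually answered with post-hoc explanation methods that score each input word's contribution to a single prediction.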
Step 2: Fixing and avoiding bias
Training data → Training process → Model
Training data: ensure the training data is free of bias.
Bias in annotation
Inform annotators about possible confounding factors, such as dialect.
● Example: if people are informed that a tweet contains African American English dialect, they are less likely to label it as offensive (Sap et al. 2019)
Bias in text
● If you create a new corpus, ensure your texts contain as little bias as possible.
● If you use existing data, try mitigating biases through data augmentation, over- and/or undersampling, etc.
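One common form of data augmentation against gender bias is counterfactual augmentation: add a copy of each training example in which gendered words are swapped. A minimal Python sketch follows; the swap list is a small hypothetical sample, and real lists must handle ambiguous words such as “her”, which can correspond to either “him” or “his”.

```python
import re

# A small, hypothetical swap list. "her" is ambiguous (him/his); this sketch
# always maps it to "him", which a real system would need to resolve.
SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her", "her": "him",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

# Alternating runs of word and non-word characters; joining them restores the text.
PIECES = re.compile(r"\w+|\W+")


def swap_gender(text):
    """Return `text` with every word in SWAPS replaced, keeping capitalization."""
    out = []
    for piece in PIECES.findall(text):
        replacement = SWAPS.get(piece.lower())
        if replacement is None:
            out.append(piece)
        elif piece[0].isupper():
            out.append(replacement.capitalize())
        else:
            out.append(replacement)
    return "".join(out)


def augment(dataset):
    """Add a gender-swapped copy of every (text, label) pair that changes."""
    augmented = list(dataset)
    for text, label in dataset:
        swapped = swap_gender(text)
        if swapped != text:
            augmented.append((swapped, label))
    return augmented


data = [("He thanked his mother.", 0), ("The table is red.", 0)]
print(augment(data))
```

The swapped copy keeps the original label, so the model sees the same content with both genders and has less reason to tie the label to gendered words.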
Training process: pick a training procedure that makes the system blind to bias.
Adversarial training
Train your model to excel at its task, but to fail at predicting “protected variables”, such as gender or race.
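Adversarial training is often written as a minimax objective. In the common formulation below (not necessarily the exact one behind the slide), an encoder with parameters θ_e maps an input x to a representation h, a task head with parameters θ_y predicts the label y, and an adversary with parameters θ_a tries to predict the protected variable z from h; ℓ is a loss such as cross-entropy, and λ > 0 trades task performance against how much information about z is removed.

```latex
h = f(x;\theta_e), \qquad
\mathcal{L}_{\text{task}}(\theta_e,\theta_y) = \mathbb{E}\big[\ell\big(g(h;\theta_y),\, y\big)\big], \qquad
\mathcal{L}_{\text{adv}}(\theta_e,\theta_a) = \mathbb{E}\big[\ell\big(a(h;\theta_a),\, z\big)\big]

\min_{\theta_e,\,\theta_y}\;\max_{\theta_a}\;\Big(\mathcal{L}_{\text{task}}(\theta_e,\theta_y) \;-\; \lambda\,\mathcal{L}_{\text{adv}}(\theta_e,\theta_a)\Big)
```

Maximizing over θ_a is equivalent to the adversary minimizing its own loss, while the encoder is pushed to make that loss large, i.e. to make z hard to predict from h. In practice this is often implemented with a gradient reversal layer between the encoder and the adversary.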
Model: change the weights of the model so that the bias is reduced.
Word embeddings
Transform the embeddings so that bias is removed.
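One well-known way to transform the embeddings is to estimate a gender direction from word pairs and project it out of gender-neutral words (the “neutralize” step of hard debiasing, Bolukbasi et al. 2016). The Python sketch below simplifies the direction estimate to the normalized mean of the pair differences, whereas the original method derives it with principal component analysis over several definitional pairs; the toy vectors and function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def gender_direction(pairs, embeddings):
    """Normalized mean of the difference vectors of (male, female) word pairs.

    Simplification: hard debiasing uses principal component analysis
    instead of a plain mean.
    """
    diffs = [[a - b for a, b in zip(embeddings[m], embeddings[f])] for m, f in pairs]
    mean = [sum(column) / len(diffs) for column in zip(*diffs)]
    return normalize(mean)


def neutralize(vector, direction):
    """Remove the component of `vector` along the unit vector `direction`."""
    projection = dot(vector, direction)
    return [a - projection * d for a, d in zip(vector, direction)]


# Made-up toy vectors.
embeddings = {
    "man": [0.9, 0.1, 0.3], "woman": [-0.9, 0.1, 0.3],
    "he": [0.8, 0.2, 0.1], "she": [-0.8, 0.2, 0.1],
    "programmer": [0.4, 0.9, 0.1],
}

direction = gender_direction([("man", "woman"), ("he", "she")], embeddings)
debiased = neutralize(embeddings["programmer"], direction)
print(direction, debiased)
```

After neutralizing, the occupation word has zero similarity to the gender direction, but, as the slide on limitations notes, bias encoded in other dimensions can survive this step.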
Pretrained models
Fine-tune on unbiased data so that the models “forget” their bias.
None of these methods are foolproof:
● You need to be aware of the bias before you can remove it
● Often only “superficial” bias is removed, while deeper bias remains (Gonen and Goldberg 2019)
As AI developers, it is our responsibility to deploy our systems in such a way that potentially harmful side effects are minimized:
● Effective feedback loops
● Human-in-the-loop AI
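A human-in-the-loop setup with a feedback loop can be as simple as routing low-confidence predictions to a reviewer and storing the reviewer's corrections for the next round of training. The Python sketch below assumes a model that outputs a label with a confidence score; the 0.8 threshold and the class and method names are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Router:
    """Send uncertain predictions to humans and collect their corrections."""

    threshold: float = 0.8
    human_queue: List[str] = field(default_factory=list)
    feedback: List[Tuple[str, str]] = field(default_factory=list)

    def route(self, text: str, label: str, confidence: float) -> str:
        """Return the model's label if confident enough, else queue for review."""
        if confidence >= self.threshold:
            return label
        self.human_queue.append(text)
        return "needs_review"

    def record_correction(self, text: str, corrected_label: str) -> None:
        """Store a reviewer's label so it can be added to the next training set."""
        self.feedback.append((text, corrected_label))
        if text in self.human_queue:
            self.human_queue.remove(text)


router = Router(threshold=0.8)
print(router.route("great article, thanks", "not_toxic", 0.95))
print(router.route("you people are all the same", "toxic", 0.55))
router.record_correction("you people are all the same", "toxic")
print(router.feedback)
```

Lowering the threshold sends fewer items to reviewers but lets more uncertain, and potentially biased, decisions through unchecked.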
http://www.nlp.town yves@nlp.town
Thanks! Questions?

He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processing), presented by Yves Peirsman, CTO at NLP Town
