He Said, She Said: Finding and Fixing Bias in NLP (Natural Language Processing), presented by Yves Peirsman, CTO at NLP Town


Yves Peirsman presents several instances where bias has posed a risk to the successful adoption of NLP systems, and discusses which techniques exist to discover these biases before the systems are put into production.

Finding and Fixing Bias in Natural Language Processing
Yves Peirsman

Artificial Intelligence
Natural Language Processing
A primer in NLP
● Machine translation
● Sentiment analysis
● Information retrieval
● Information extraction
● Text classification

We provide consultancy for companies that need guidance in the NLP domain.
We develop software and train custom NLP models for challenging or domain-specific applications.

NLP Town
Training data Training process Model
● We help annotate training data.
● We train models for NLP applications.
● We integrate models with workflows.
● We provide consultancy for NLP projects.

A primer in NLP
Training data Training process Model

Word Embeddings
Word embeddings allow NLP models to generalize better.
Word embeddings capture both general and linguistic knowledge.
Word embeddings also encode bias:
● Man is to king as woman is to ___.
● Man is to programmer as woman is to ___.
Experiment:
● Measure the similarity between occupations and
○ A set of “male” words: man, son, father, he, him, etc.
○ A set of “female” words: woman, daughter, mother, she, her, etc.
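
The experiment above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the 3-dimensional vectors are invented toy values, and the names `cosine`, `mean_similarity` and `gender_bias` are hypothetical. With real data, the vectors would come from a pretrained embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity(word, group, vectors):
    """Average cosine similarity between one word and a set of words."""
    return sum(cosine(vectors[word], vectors[g]) for g in group) / len(group)

def gender_bias(word, male_words, female_words, vectors):
    """Positive: closer to the "male" set. Negative: closer to the "female" set."""
    return (mean_similarity(word, male_words, vectors)
            - mean_similarity(word, female_words, vectors))

# Toy vectors invented for illustration only (assumption, not real embeddings).
TOY_VECTORS = {
    "man": [1.0, 0.1, 0.0],
    "he": [0.9, 0.0, 0.1],
    "woman": [-1.0, 0.1, 0.0],
    "she": [-0.9, 0.0, 0.1],
    "programmer": [0.4, 0.9, 0.1],
    "nurse": [-0.5, 0.8, 0.2],
}
MALE = ["man", "he"]
FEMALE = ["woman", "she"]

for occupation in ["programmer", "nurse"]:
    score = gender_bias(occupation, MALE, FEMALE, TOY_VECTORS)
    print(occupation, round(score, 3))
```

Averaging over a set of gendered words, rather than comparing with a single pair such as he/she, makes the score less sensitive to the quirks of any one word vector.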

Pretrained NLP models
Pretrained language models are a significant recent breakthrough in NLP:
● Language models predict masked words.
● They learn a lot about language.
● This knowledge can be reused in “downstream” tasks.
This movie won her an Oscar for best actress.
The keys to the house are on the table.

Pretrained NLP models
ULMFiT (Howard and Ruder 2018)

Pretrained language models
Experiment: association with a large number of positive adjectives
● One of several recent Dutch BERT models
● Association between 240 positive adjectives and hij/zij (“he”/“she”):
○ aantrekkelijk (attractive), ambitieus (ambitious), intelligent (intelligent), slim (smart), knap (clever, handsome), nauwkeurig (precise), nieuwsgierig (curious), etc.
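
The slides do not say how the association was computed. One plausible way, sketched here under that assumption, is to mask the pronoun in a template sentence and compare the probabilities a masked language model assigns to "hij" and "zij". The template, the function names and the `fake_mask_prob` scorer are all hypothetical. The scorer returns placeholder numbers, not model output, and would be replaced by a call to a real masked language model.

```python
import math

def pronoun_log_ratio(adjective, mask_prob, template="[MASK] is {adj}."):
    """log P(hij) / P(zij) for the masked slot; > 0 means the model prefers 'hij'."""
    sentence = template.format(adj=adjective)
    return math.log(mask_prob(sentence, "hij") / mask_prob(sentence, "zij"))

def summarize(adjectives, mask_prob):
    """Score every adjective and count how many lean towards 'hij'."""
    scores = {adj: pronoun_log_ratio(adj, mask_prob) for adj in adjectives}
    leaning_hij = sum(1 for s in scores.values() if s > 0)
    return scores, leaning_hij

def fake_mask_prob(sentence, candidate):
    """Placeholder scorer (assumption): fixed numbers so that the sketch runs."""
    return {"hij": 0.6, "zij": 0.3}[candidate]

scores, leaning_hij = summarize(["slim", "knap", "ambitieus"], fake_mask_prob)
print(leaning_hij, "of", len(scores), "adjectives lean towards 'hij'")
```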

Step 1: Identify bias with explainable AI
Challenge
● First we need to find out whether our models are biased: search for known, but also unexpected bias
● An important role for explainable AI
Experiment
● A simple classifier for toxic comments
● Example: "Stupid peace of shit stop deleting my stuff asshole go die and fall in a hole go to hell!"

Step 1: Identify bias with explainable AI
● Visualize the classifier features and their weights:
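
As a rough illustration of this step, the sketch below trains a tiny bag-of-words logistic regression and lists its highest-weighted features. Everything in it is an assumption made for the example: the six training sentences are invented, `identity_term` is a placeholder for a neutral word that happens to occur only in toxic examples, and the functions `train_logreg` and `top_features` are hypothetical helpers, not the classifier used in the talk.

```python
import math
from collections import defaultdict

# Invented toy corpus: (text, label) with 1 = toxic, 0 = not toxic.
TRAIN = [
    ("you are a stupid idiot", 1),
    ("stupid identity_term idiot", 1),
    ("go away stupid identity_term", 1),
    ("you are a kind person", 0),
    ("thank you for the kind help", 0),
    ("what a great person", 0),
]

def tokenize(text):
    return text.lower().split()

def train_logreg(data, epochs=200, lr=0.5):
    """Logistic regression on binary word-presence features, trained with SGD."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in data:
            tokens = set(tokenize(text))
            z = bias + sum(weights[t] for t in tokens)
            p = 1.0 / (1.0 + math.exp(-z))
            error = label - p  # gradient of the log-likelihood w.r.t. z
            bias += lr * error
            for t in tokens:
                weights[t] += lr * error
    return weights, bias

def top_features(weights, k=5):
    """The k features that push a comment most strongly towards 'toxic'."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]

weights, bias = train_logreg(TRAIN)
for feature, weight in top_features(weights):
    print(f"{feature:15s} {weight:+.2f}")
```

In this toy setup the neutral placeholder receives a positive "toxic" weight purely because of the examples it co-occurs with, which is the kind of unexpected bias that inspecting feature weights can surface.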

Step 2: Fixing and avoiding bias
Training data Training process Model

Training data: Ensure the training data is free of bias.

Bias in annotation
Inform annotators about possible confounding factors, such as dialect.
● Example: if people are informed that a tweet contains African American English dialect, they are less likely to label it as offensive (Sap et al. 2019)
Bias in text
● If you create a new corpus, ensure your texts contain as little bias as possible.
● If you use existing data, try mitigating biases through data augmentation, over- and/or undersampling, etc.
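
One simple form of data augmentation for existing text is to add a gender-swapped copy of every sentence that contains a gendered word. The sketch below is a minimal illustration under stated assumptions: the `SWAPS` table is a small hand-picked list, and `swap_gender` and `augment` are hypothetical helper names. Ambiguous words such as "her" (which maps to either "him" or "his") are deliberately left out, and a real system would also need proper tokenization.

```python
# Hand-picked swap table (assumption): only unambiguous word pairs.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

def swap_gender(sentence):
    """Replace each gendered token by its counterpart; other tokens are kept."""
    return " ".join(SWAPS.get(token, token) for token in sentence.split())

def augment(corpus):
    """Return the corpus plus a swapped copy of every sentence that changed."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = swap_gender(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented

corpus = ["he is a programmer", "the keys are on the table"]
for sentence in augment(corpus):
    print(sentence)
```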
Training process: Pick a training procedure that makes the system blind to bias.

Adversarial training
Train your model to shine at your task, but to fail at predicting “protected variables”, such as gender or race.

Model: Change the weights of the model so that the bias is reduced.

Word embeddings
Transform the embeddings so that bias is removed.
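
The slide does not name a specific transformation. One common approach, sketched here as an assumption, is to estimate a "gender direction" from a word pair such as he/she and project it out of every other vector. The toy vectors are invented and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def remove_direction(vector, direction):
    """Subtract the component of `vector` that lies along `direction`."""
    unit = normalize(direction)
    projection = dot(vector, unit)
    return [a - projection * b for a, b in zip(vector, unit)]

# Toy vectors invented for illustration only (assumption).
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
programmer = [0.4, 0.9, 0.1]

gender_direction = subtract(he, she)
debiased = remove_direction(programmer, gender_direction)
print(debiased)
```

After the projection, the debiased vector has no component along the chosen direction, but any bias encoded along other directions is left untouched.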
Pre-trained models
Fine-tune on non-biased data, so that the models “forget” their bias.

None of these methods are foolproof:
● You need to be aware of the bias before you can remove it
● Often only “superficial” bias is removed, but deeper bias remains (Gonen and Goldberg 2019)
As AI developers, it is our responsibility to deploy our systems in such a way that potentially harmful side effects are minimized.
● Effective feedback loops
● Human-in-the-loop AI

http://www.nlp.town yves@nlp.town
Thanks! Questions?
