The presentation deals with ethical issues in several currently widely used machine learning (AI) technologies and algorithms. Each ML application is described in detail: its current state of the art, its specific challenges, and its ethical problems. Current solutions from academic and industrial perspectives are presented. The presentation draws on a mixture of academic and applied sources and aims to be of interest to both students and practitioners.
Ethical Issues in Machine Learning Algorithms. (Part 3)
1. Ethical Issues in
Machine Learning Algorithms
(Part 3)
IEEE Young Professionals Bulgaria,
Vladimir Kanchev, PhD
1
2. Introduction
2
Dr. Kim (2018, May 31). Human ethics for artificial intelligent
beings. An Ethics Scary Tale. Retrieved from
https://aistrategyblog.com/category/utilitarianism/
3. Contents
1. Advances in Data Science (DS) and
Machine learning (ML) fields.
2. Ethics and ethical issues.
3. Current legislation. GDPR.
4. ML data bias, algorithmic bias, and
interpretability issues.
5. Ongoing academic research problems.
3
4. Recent ML ethical issues
Fields of application:
bias in face recognition systems
gender-biased results and chatbot issues in NLP
credit score computation
user profiling and personalization
4
5. Bias in face recognition systems
5
https://bit.ly/2ygssbo
6. Face recognition
Def: a biometric software application capable of
uniquely identifying or verifying a person by
comparing and analyzing patterns based on the
person's facial contours.
https://www.techopedia.com/definition/32071/facial-recognition
6
7. Face recognition
developed, commercialized biometric technology;
can be found on mobile phones
widely used by law enforcement agencies in the USA
and China
non-contact, non-invasive technology
very high accuracy
depends on lighting; can be tricked by make-up
and glasses
7
9. Face recognition algorithms
Wang, Mei, and Weihong Deng. "Deep face recognition: A
survey." arXiv preprint arXiv:1804.06655 (2018).
9
10. Bias in face recognition systems
Bias appears in face recognition systems because
of the use of:
older algorithms
features related to facial features, such as color
racial-biased datasets
deep learning classifiers
10
11. Consequences
inefficiency of video surveillance systems in public
city areas
increased privacy concerns caused by video
surveillance systems
lower accuracy for African-American and Asian men
and women; innocent Black suspects come under
police scrutiny
a major lag in mass implementation and acceptance
of the technology
11
14. Bias cases
Use of machine learning to detect features of the
human face, associated with criminality*:
Some research on the problem in the past, now
abandoned (Cesare Lombroso).
Wu and Zhang (2016)* trained a few classifiers with
two classes – criminal faces (from ID photos) and
non-criminal faces (from their professional pages).
In effect, the authors constructed a smile detector**
– with over 90% accuracy.
Wu, Xiaolin, and Xi Zhang. "Automated inference on criminality using face
images." arXiv preprint arXiv:1611.04135 (2016): 4038-4052.
14
15. Bias cases
15
Wu, Xiaolin, and Xi Zhang. "Automated inference on criminality using face
images." arXiv preprint arXiv:1611.04135 (2016): 4038-4052.
https://bit.ly/2O07D8o
16. Detection of sexual orientation
by face recognition
16
Detection of whether people are gay or straight based
on their photos – 81% accuracy (men) and
74% (women); human judges: 61% and 54%, respectively.
The underlying theory claims that sexual orientation
results from exposure to certain hormones.
Gay men have narrower jaws, longer noses, and larger
foreheads than straight men, while gay women
have larger jaws and smaller foreheads.
Use of a sample of 35 thousand images from a US
dating site; no people of color, no transgender or bisexual people.
Use of a deep learning classifier.
Wang, Yilun, and Michal Kosinski. "Deep neural networks are more accurate
than humans at detecting sexual orientation from facial images." Journal of
Personality and Social Psychology 114.2 (2018).
17. Detection of sexual orientation
by face recognition
Wang, Yilun, and Michal Kosinski. "Deep neural networks are more accurate
than humans at detecting sexual orientation from facial images." Journal of
Personality and Social Psychology 114.2 (2018).
17
18. Dealing with bias
How bias can be prevented:
make training datasets more diverse
add extra face-detection steps and set more
sensitive classifier parameters
do not allow users to search for terms such as
gorilla, chimpanzee, or monkey (Google Photos service)
18
19. Recent ML ethical issues
Fields of application:
bias in face recognition systems
gender-biased results and chatbot issues in NLP
credit score computation
user profiling and personalization
19
21. Ethical issues in NLP
Natural language processing (NLP):
Def: is an AI branch that deals with analyzing,
understanding and generating the languages
that humans use naturally in order
to interface with computers using natural
human languages.
Challenge: Ambiguity of human language.
https://www.webopedia.com/TERM/N/NLP.html
21
22. Human language and NLP
But human language is also:
proxy for human behavior
a sign of membership in a certain group
always context-specific – is related to and
depends on a specific situation, time and place
22
23. Current tasks of NLP
automatic summarization
translation
named entity recognition
part-of-speech tagging
sentiment analysis
speech recognition
topic segmentation
question answering
23
24. Major approaches in NLP
Until the 1980s, hand-written rules; after that,
statistical machine learning came into use.
During the 2010s, DL neural networks and
representation learning; state-of-the-art results.
Now use of word embeddings to capture the
semantic properties of words; increased end-to-end
learning.
24
25. Word embeddings
Word embeddings (WE) is a model that maps
English words to high-dimensional vectors of numbers.
WE:
is trained on a large body of text (corpus) –
e.g., word2vec.
correlates semantic similarity with spatial proximity
– "Man is to Woman as Brother is to Sister".
uses cosine distance to calculate the similarity
between vectors.
is characterized by a social bias, shown by the
Word Embedding Association Test (WEAT).
25
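The cosine-similarity idea on this slide can be sketched in a few lines. The vectors below are tiny hand-made stand-ins for real trained embeddings (real word2vec vectors have hundreds of dimensions), chosen so the "Man is to Woman as Brother is to Sister" offset holds:

```python
import numpy as np

def cosine_similarity(u, v):
    # cosine of the angle between two embedding vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# toy 3-d "embeddings" (illustrative only)
man     = np.array([0.9, 0.1, 0.2])
woman   = np.array([0.8, 0.3, 0.2])
brother = np.array([0.7, 0.1, 0.6])
sister  = np.array([0.6, 0.3, 0.6])

# "Man is to Woman as Brother is to Sister":
# the offset man - woman should align with brother - sister
print(cosine_similarity(man - woman, brother - sister))  # ~1.0
```

In a trained model, semantically parallel word pairs produce nearly parallel difference vectors, which is exactly what WEAT (next slide) exploits to measure bias.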
26. Word embeddings bias
WEAT says that:
Male names and pronouns were closer to words
about career, while female ones were closer to
concepts like homemaking and family.
Young people’s names were closer to pleasant
words, while old people’s names were closer
to unpleasant words.
Male names were closer to words about math and
science, while female names were closer to the arts.
https://bit.ly/2IJXMDP
26
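The WEAT statistic behind these findings can be sketched as follows. The effect-size formula follows the standard WEAT definition (difference of mean associations, normalized by the pooled standard deviation); the word sets and 2-d vectors are toy stand-ins constructed for illustration, not trained embeddings:

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    # s(w, A, B): mean similarity of w to attribute set A minus set B
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # difference of mean associations of the two target sets X and Y,
    # normalized by the pooled standard deviation over X ∪ Y
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y, ddof=1)

# toy vectors chosen so "male" targets lean toward the career word
# and "female" targets toward the family word (illustration only)
male   = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
female = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
career = [np.array([1.0, 0.0])]
family = [np.array([0.0, 1.0])]

print(weat_effect_size(male, female, career, family))  # positive: bias found
```

A positive effect size means the first target set (male names) is more strongly associated with the first attribute set (career) – the pattern WEAT reported on real embeddings.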
27. Word embeddings bias
Examples:
Man: Woman as King: Queen
Man: Computer_Programmer as Woman:
Homemaker
Father: Doctor as Mother: Nurse
Word embeddings can reflect gender, ethnicity,
age, sexual orientation and other biases of texts used
to train the model.
Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to
homemaker? Debiasing word embeddings." Advances in Neural Information
Processing Systems. 2016.
27
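Analogies such as these are computed by vector arithmetic and a nearest-neighbor lookup. A minimal sketch, again with hand-made toy vectors standing in for a trained vocabulary:

```python
import numpy as np

def nearest(query, vocab):
    # return the word whose embedding has the highest cosine
    # similarity to the query vector
    def cos(u, v):
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return max(vocab, key=lambda w: cos(vocab[w], query))

# toy embeddings (illustrative; real models learn these from a corpus)
vocab = {
    "man":   np.array([1.0, 0.0, 0.2]),
    "woman": np.array([0.0, 1.0, 0.2]),
    "king":  np.array([1.0, 0.0, 0.9]),
    "queen": np.array([0.0, 1.0, 0.9]),
}

# the classic analogy: king - man + woman ≈ queen
result = nearest(vocab["king"] - vocab["man"] + vocab["woman"], vocab)
print(result)
```

The same arithmetic, applied to biased embeddings, is what yields "computer programmer − man + woman ≈ homemaker".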
28. Word embeddings bias
Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to
homemaker? debiasing word embeddings." Advances in Neural Information
Processing Systems. 2016
28
29. Word embeddings bias
Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to
homemaker? debiasing word embeddings." Advances in Neural Information
Processing Systems. 2016
29
30. Debiasing word embeddings
Algorithm for debiasing:
1. Learn word embeddings from a text corpus –
obtain vectors for words.
2. Identify bias direction.
Compute the differences between the vectors of gendered word pairs
– he and she, male and female – and average them.
3. Neutralize words that are not gender-specific –
e.g., doctor.
4. Equalize pairs such as girl–boy, grandfather–
grandmother.
Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker?
debiasing word embeddings." Advances in Neural Information Processing Systems.
2016.
30
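Steps 2 and 3 of the debiasing algorithm can be sketched with toy vectors. The numbers are illustrative (real embeddings are trained), and where the paper averages several definitional pairs, this sketch uses a single he–she pair:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# step 2: identify the bias direction from a definitional pair
# (the paper averages several pairs: he-she, male-female, ...)
he  = unit(np.array([0.8, 0.3, 0.1]))
she = unit(np.array([0.3, 0.8, 0.1]))
g = unit(he - she)

# step 3: neutralize a word that should be gender-neutral
doctor = unit(np.array([0.7, 0.4, 0.6]))
doctor_debiased = doctor - np.dot(doctor, g) * g

# after neutralization, the projection onto the bias direction is zero
print(float(np.dot(doctor_debiased, g)))
```

Step 4 (equalizing) then adjusts legitimately gendered pairs such as girl–boy so they differ only along g and are equidistant from every neutralized word.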
31. Discussion
Blind application of word embeddings can amplify
gender biases present in the data.
Word embeddings reflect biases present in
society.
There are similar biases related to race, ethnicity, and
cultural groups.
The focus here is on word embeddings in the English
language.
Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker?
debiasing word embeddings." Advances in Neural Information Processing Systems.
2016.
31
33. Chatbots
Chatbot (conversational agent):
Def: is a computer program or an artificial intelligence which
conducts a conversation via auditory or textual
methods*.
Conversational user interface:
Def: A CI is a hybrid UI that interacts with users combining
chat, voice or any other natural language interface
with graphical UI elements like buttons, images,
menus, videos, etc.**
They are also related to Turing test.
*"What is a chatbot?" techtarget.com. Retrieved 30 January 2017
**https://bit.ly/2CMP8On
33
34. Types of chatbots
Basic bots – use pre-written keywords
ELIZA (1966), Alice (1995)
Text-based Assistant
Facebook M (2015), Google Allo (2016), Slack‘s slackbot
Voice Assistant
Google Assistant (2016), Apple Siri (2011), Google Now
(2012), Amazon Alexa (2014), Microsoft Cortana (2014)
A lot of specialized text-based assistants
customer support bots, news bots, entertainment bots, etc.
34
37. Current situation
Chatbots are good for repetitive, well-defined
(common questions & answers) tasks and scenarios.
They attract a lot of industry interest as a way to reduce
labor expenses.
They are integrated in websites, enterprise systems,
etc.
There are a lot of chatbot development platforms on
the market.
It is hard to reach production quality; often humans
are frustrated when dealing with chatbots.
37
39. Application of chatbots
Chatbots can replace FAQ sections.
They are used in customer service operations,
automatic emailing. Straightforward problems are
solved by chatbots, more complicated ones – by humans.
By using them, customer support agents improve
the shopping process and personalize it.
They have a better response rate than
human support agents.
39
40. Current DS approaches
Gathering data from users – sex, age, habits; thus
aiming to achieve personalization.
Use of large datasets and reinforcement learning.
Aiming to build a feeling of trust and empathy with
human users – use of sentiment analysis.
Extracting intent (the purpose) and entity (object,
context for intent) from user input.
Deciding on the next best action in a conversation
using DL (RNN, LSTM) and the input, training data, and
conversation history (Rasa chatbot).
40
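The intent/entity extraction mentioned above can be illustrated with a deliberately simple rule-based sketch. The intent names, patterns, and example utterances here are hypothetical; real assistants (e.g., Rasa) train statistical classifiers for this instead of hand-written regexes:

```python
import re

# hypothetical intent patterns for illustration only
INTENTS = {
    "book_flight": re.compile(r"\b(book|reserve)\b.*\bflight\b"),
    "check_weather": re.compile(r"\bweather\b"),
}
# hypothetical entity pattern: destination after the word "to"
CITY = re.compile(r"\bto (\w+)\b")

def parse(utterance):
    # return (intent, entity) extracted from a user utterance
    text = utterance.lower()
    intent = next((name for name, pat in INTENTS.items()
                   if pat.search(text)), None)
    m = CITY.search(text)
    return intent, (m.group(1) if m else None)

print(parse("Please book a flight to Paris"))   # ('book_flight', 'paris')
print(parse("What's the weather like today?"))  # ('check_weather', None)
```

The extracted (intent, entity) pair is then what the dialogue policy – e.g., an RNN/LSTM over the conversation history – consumes to pick the next action.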
42. Ethical issues
User information gathered for personalization of
chatbots – data privacy issues
Training chatbots with obscenities and extremist
view data - bias through interaction (MS chatbot
Tay)
Algorithmic NLP bias in chatbots
42
43. Solving ethical issues
Filtering political topics of conversation (as MS
chatbot Zo – the heir of chatbot Tay).
Training with data of diverse topics, encouraging a
diverse set of real users.
Building a diverse team of developers – of a
technical and non-technical background.
Applying a bias-tracking system for developers –
more control, followed by black-box testing of the
ML algorithm.
Providing more transparency of ML algorithms (as in
an open-source community).
https://bit.ly/2Hn6AKH
https://bit.ly/2v4L0XH
43
44. Recent ML ethical issues
Fields of application:
bias in face recognition systems
gender-biased results and chatbot issues in NLP
credit score computation
user profiling and personalization
44
46. Credit score
Credit score is a numeric expression measuring a
person's or company's creditworthiness.
Banks use it to make decisions on credit
applications.
It depends on credit history.
It indicates how dependable an individual or a
company is.
46
47. Scorecard algorithm
Scorecard:
Def: a standard, easy-to-understand credit-scoring
algorithm. A binary problem:
1st class – default – a customer fails to pay installments.
2nd class – a customer pays regular installments for
a given time period.
It consists of:
building and training a statistical or a ML model.
applying the chosen model to assign a score to
every credit application.
47
48. Scorecard algorithm
Use of ML algorithms such as logistic regression, random
forests, boosting, neural networks, and generalized
additive models
Use of the area under the curve (AUC) from ROC
analysis and Gini coefficients for model evaluation
The data should be comprehensive – allowing few
missing values, and including as many data points
as possible from the financial records of customers
and their payment history
48
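The AUC and Gini metrics above can be computed directly from model scores. A minimal sketch using the rank-based formulation of AUC (the probability that a random defaulter receives a higher risk score than a random non-defaulter); the scores below are made-up toy outputs, not a real model:

```python
import numpy as np

def auc(y_true, scores):
    # rank-based AUC: fraction of (defaulter, non-defaulter) pairs
    # where the defaulter's risk score is higher (ties count half)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]   # class 1: default
    neg = scores[y_true == 0]   # class 2: pays on time
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy risk scores from a hypothetical scorecard model
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

a = auc(y, s)
print("AUC:", a, "Gini:", 2 * a - 1)  # Gini coefficient = 2*AUC - 1
```

AUC of 0.5 means random ranking and 1.0 perfect separation; the Gini coefficient rescales this to the 0–1 range commonly quoted in credit-risk reporting.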
52. Current DS issues
52
Customers with no credit history need to be assigned
to predefined groups.
Wide introduction of automated credit scoring aims
at more efficient markets and low-cost
financial services, but introduces algorithmic bias.
Incomplete data can negatively influence the
accuracy of the final results.
54. Ethical issues
protection of personal data - necessary for credit
score calculation
explainability and transparency of the used ML
algorithm
introduction of bias – danger of discrimination for
ethnic minorities by implicit correlation
lack of accuracy, objectivity, and accountability of
credit score computation
54
55. Solving ethical issues
use of interpretable ML algorithms/models
preparation of training data samples to avoid bias
protection of personal data against breaches
through anonymization
training all employees to work with ML algorithms
and know their biases
continuous human supervision of ML algorithms
auditability of AI algorithms
55
56. Recent ML ethical issues
Fields of application:
bias in face recognition systems
gender-biased results and chatbot issues in NLP
credit score computation
user profiling and personalization
56
58. User profiling
A user profile:
Def: is a set of information representing a user via user-related
rules, settings, needs, interests, behaviors, and
preferences*.
Personalization:
Def: a process to change the functionality, information content
or distinctiveness of a system to increase its personal
relevance to an individual**.
S. Henczel (2004). Creating user profiles to improve information quality,
Factiva, 28(3), p. 30.
J. Blom (2000). Personalization-a taxonomy, Conference on Human
Factors in Computing Systems, pp. 313-314.
58
59. User profiling methods
A user profile aims to provide a personalized
service – matching users' requirements, preferences,
and needs with the service delivery.
Approaches to retrieving information about the user:
Explicit method – information is provided
explicitly by the user – static profiling.
Implicit method – analyzes user‘s behavior
pattern to determine user‘s interest – dynamic user
profiling
Hybrid method – a combination of both methods.
59
60. User profiling methods
Content-Based Method – assumes the user
behaves the same way under the same
circumstances.
Vector-space model, Latent Semantic Indexing,
Learning Information Agents, Neural Network Agents …
Collaborative method - assumes that users who
belong to the same group behave similarly.
Memory-Based and Model-Based
Hybrid method – a combination of both methods.
60
61. Current challenges
Generation of an initial user profile for a new user
Continuous update of the profile information to
adapt to user‘s changing preferences, interests and
needs – data drift
Changing regulations to protect user‘s data – GDPR
legislation
61
62. Recommender systems
62
Aim to predict users' interests, recommend items, and
increase companies' sales and revenues.
Use item characteristics (keywords,
categories) and user information (preferences, profiles, etc.);
need a lot of data for training.
Use item-to-item and user-to-user
recommendations to train the RS.
Reduce the feature space by matrix factorization
(SVD) and DL; use injected randomness or
exploration–exploitation to avoid overfitting.
https://bit.ly/2GbUHbV
65. Content personalization
Def: delivering the right message to the right visitor
at the right time.
Main purposes:
to increase visitor engagement
to improve customer experience
to increase conversion rates
to increase customer acquisition
65
https://bit.ly/2XRlYaO
68. Ethical issues
privacy issues during user data gathering
underrepresentation of minorities, societal bias
construction of bubbles around users, political
debates within echo chambers
objectivity of search results (Google) is impaired
due to user profiling and corporate politics
68
69. Solving ethical issues
Transparency of personalization ML algorithms –
users should know how they work and have an
option to change them.
Ensuring interactivity - opportunity to provide
correction actions, when biases are spotted by
users.
Robustness of the ML system against manipulation
- against rumors and false information.
Fast reaction to ethically compromised input.
69
70. Discussion
an ongoing topic of research and public debate among
researchers, practitioners, and general users
a major obstacle to the introduction of many ML
systems
a lack of a standardized set of algorithms for solving
them or for debiasing; only general approaches exist
What do you think is the most important ethical
issue related to the mentioned (or other) ML
technologies?
70
Banksy "tagging robot" – a new street piece (Coney Island, NYC, 2013) on the wall of a former convenience store devastated by Hurricane Sandy. His residency in NYC was named 'Better Out Than In', during which he produced a series of popular graffiti.
The barcode is 13274125 – the DNA code for Homo sapiens.
Banksy is an anonymous British graffiti writer.
Now let's move on to specific ethical issues related to Data Science and Machine Learning algorithms. Some current cases will be shown with their corresponding ethical issues. A list of descriptions will be given for two main sources of ethical issues.
Face recognition replaces eyewitnesses, who are notoriously unreliable.
https://www.media.mit.edu/projects/gender-shades/overview/
MIT project
Cesare Lombroso was an Italian physician and psychiatrist.
His 1876 book Criminal Man argued some people were born criminals - it claimed they were ‘atavistic’, or throwbacks to a primitive stage of evolution. Lombroso believed ‘primitiveness’ could be read from the bodies and habits of such born criminals - for instance, facial features, body type, … .
Make training datasets more diverse with Asian faces; get faces of celebrities on the internet in order to build large training datasets of faces; predominantly white celebrities…
Different accuracy milestones for smile detection, according to race (Google paper*). Their system detects gender first, then race, and finally smiles; danger of gender and race profiling.
*Ryu, Hee Jung, Margaret Mitchell, and Hartwig Adam. "Improving Smiling Detection with Race and Gender Diversity." arXiv preprint arXiv:1712.00193 (2017).
Automatic summarization
Produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.
(Machine) translation
Automatically translate text from one human language to another.
Named entity recognition
Given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization).
Part-of-speech tagging
Given a sentence, determine the part of speech for each word
Sentiment analysis
Extract subjective information usually from a set of documents, often using online reviews to determine "polarity" about specific objects.
Speech recognition
Given a sound clip of a person or people speaking, determine the textual representation of the speech.
Topic segmentation
Given a chunk of text, separate it into segments each of which is devoted to a topic, and identify the topic of the segment.
Question answering
Given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?").
ML algorithms – hidden Markov models, decision trees;
here, use of word alignment and language modeling;
DL algorithms – sequence-to-sequence transformation.
Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.
https://en.wikipedia.org/wiki/Word2vec
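The "linguistic contexts" word2vec reconstructs come from (center, context) pairs extracted with a sliding window. A minimal sketch of the pair-generation step in the skip-gram variant (the toy sentence and window size are illustrative; training the shallow network itself is omitted):

```python
# build skip-gram (center, context) training pairs with window size 2 -
# the raw input that word2vec's shallow two-layer network is trained on
corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2

pairs = []
for i, center in enumerate(corpus):
    # every word within `window` positions of the center is a context word
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((center, corpus[j]))

print(pairs[:4])
```

Because words sharing many context words produce similar training pairs, their learned vectors end up close together, which is exactly the proximity property described above.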
What these results show is that the text is loaded with historical inequality. Embeddings measure the similarity between two words by how often they occur near one another. If most doctors historically have been male, for instance, then words like doctor would appear near male names more often, and would be associated with those names. The standard concern is that the machine might reproduce this inequality: for instance, a résumé-screening algorithm that naïvely used word embeddings to measure how "professional" or "career-oriented" a candidate was might unfairly discriminate against female candidates, simply on the basis of their names.
A projection of word embeddings. The x-axis is parallel to v_he − v_she; the y-axis measures the strength of the gender association.
Chatbots are algorithmic conversational agents which companies are coming up with to interact with their customers
Consider ATMs. When they emerged, many balked at the idea of feeding their money into a machine and preferred to interact with a bank teller. Now, ATMs are the norm, and, unlike tellers, they're available day and night to handle transactions.
Turing test
A criterion of intelligence – the ability of a computer program to communicate with a human judge in such a way that the human cannot distinguish it from a real human.
Basic bots
Inputs for basic chatbots are rather limited. The design of the interface is basic, allowing for basic commands and basic inputs.
Text-based Assistant
The other type of conversational interface is through typing. This is the one you usually experience when you interact with a chatbot. You simply type words to provide the input. Depending on the quality of your input, the chatbot provides you with an answer. The library of tools for building this type of chatbot is more extensive.
Alice – uses heuristic pattern-matching rules (some online versions use a hidden human) – unable to pass the Turing test.
Voice Assistant
While basic bots and text based assistants leverage images and video to convey their message, voice assistants have the difficulty of only relying on voice. While
voice is sufficient for some use cases like re-ordering a frequently purchased item, voice is not a good interface for examining a new product or picking an item
from a menu.
Criteria to evaluate chatbots:
notable skills and flaws, orientation (limitation) to a specific technology, level of humanity, number of supported languages, level of personalization.
Empathy
Empathy is the capability of understanding or feeling what another person is experiencing from within her frame of reference, i.e., the ability to place oneself in the other person’s position
The common objective behind machine learning and traditional statistical learning tools is to learn from data. Both approaches aim to investigate the underlying relationships by using a training dataset. Typically, statistical learning methods assume formal relationships between variables in the form of mathematical equations, while machine learning methods can learn from data without requiring any rules-based programming.
https://www.moodysanalytics.com/risk-perspectives-magazine/managing-disruption/spotlight/machine-learning-challenges-lessons-and-opportunities-in-credit-risk-modeling
Loans that are past due for more than 90 days can be classified as default as per the Basel II definition (Basel Committee on Banking Supervision, 2004)
Companies will have an incentive to alter customers' creditworthiness according to the stage of the economic cycle.
Introduction of bias through the use of alternative data – danger of discrimination against ethnic minorities via implicit correlation with other characteristics.
Use of historical data to build credit scores; lack of historical data for ML models and algorithms.
Implicit method
Here, the accuracy of the user profile depends on the amount of generated data through user-system interaction.
In implicit personalization, information about the user for user profiles is gathered implicitly (e.g. click streams, scrolling, saving). Therefore, the user is unaware of the information gathering process.
In explicit personalization, on the other hand, user profile information is gathered via direct involvement with the user (e.g. questionnaires, ratings and feedback forms). Here, the user is aware of the information gathering process.
In implicit personalization, the accuracy improves with the continuous use of the system by the user. In explicit personalization, on the other hand, accuracy of the personalized information is based on manually provided information that is updated by the user.
D. Kelly and J. Teevan (2003). Implicit feedback for inferring user preference: a bibliography, ACM Special Interest Group on Information Retrieval (SIGIR) forum, 37(2), pp. 18-28
Content-Based Method
User's current behaviour is predicted from his past behaviour. In this scheme, user profiles are represented similarly to queries, and the system selects the items that have high content correlation with the user profile. Content dependence is the main drawback of content-based filtering. Therefore, this method performs badly if the item's content is very limited and cannot be analysed easily by content-based filtering.
D. Godoy and A. Amandi (2005). User profiling in personal information agents: a survey, The Knowledge Engineering Review Journal, 20(4), pp. 329-361
Collaborative method
The collaborative method is based on the rating patterns of similar users. In this method, people with similar rating patterns – in other words, people with similar taste – are referred to as 'like-minded people' [2]. Unlike the content-based method, the collaborative method ignores the item's content and recommends items based only on similar users' item ratings.
Two main drawbacks: sparsity is the situation when there is a lack of available ratings, caused by an insufficient number of users or very few ratings per user. Moreover, the first-rater problem, also referred to as the cold-start problem, can be observed when a new user has a deficient number of ratings.
Memory-based and Model-based methods:
Memory-based and model-based techniques enable users to filter the received information according to the ratings, i.e., the feedback given by the like-minded users of the system. In these techniques, the user can be provided recommendations from categories not previously declared as interesting or relevant by the user, but which have received high ratings from users with similar tastes. In these techniques, the user's profile is a set of ratings that the user has given to a selection of items from the system database.
Hybrid (filtering) method
A hybrid method, also referred to as a hybrid filtering method, uses content-based and collaborative methods to combine the advantages and overcome the limitations of both. This method guarantees the immediate availability of a profile for each user. A system that employs the hybrid method provides a more accurate description of the user's interests and preferences, as it continuously monitors and retrieves user-related information through the user-system interaction [1]. Generally, the hybrid method assigns a new user a default profile with the use of the collaborative method, and further enhances the profile using the content-based method.
User privacy
Selling information to third parties for profit, privacy breaches, disclosure of personal information.