When Recommendation Systems Go Bad
Evan Estola
RecSys 2016
9/17/16
About Me
Evan Estola
Lead Machine Learning Engineer @ Meetup
evan@meetup.com
@estola
We want a world full of real, local community.
Women’s Veterans Meetup, San Antonio, TX
Why Recs at Meetup are Hard
Cold Start
Sparsity
Lies
Schenectady
Data Science impacts lives
Ads you see
Apps you download
Friends’ activity/Facebook feed
News you’re exposed to
If a product is available
If you can get a ride
Price you pay for things
Admittance into college
Job openings you find/get
If you can get a loan
Recommendation Systems: Collaborative Filtering
Completely Normal
Book Recommendations For
Asimov’s Foundation:
Foundation and Empire
Second Foundation
Prelude to Foundation
Forward the Foundation
Foundation’s Edge
Foundation and Earth
https://en.wikipedia.org/wiki/File:Isaac_Asimov_on_Throne.png
Completely Normal Search Engine Results
Query: Obama birth place
1. Honolulu, HI
2. Wikipedia: Obama birth place conspiracy theories
3. Birth Certificate: WhiteHouse.gov
Query: Obama birth certificate fake
1. 10 Facts that show Obama Birth Certificate is FAKE
2. OBAMA’S LAWYERS ADMIT TO FAKING BIRTH CERTIFICATE
3. Video: Proof Obama Birth Certificate is Fake
You just wanted a kitchen scale, now the internet thinks you’re a drug dealer
You purchased: Mini digital pocket kitchen scale!
You probably want:
100 pack subtle resealable baggies
250 perfectly legal ‘cigarette’ paper booklets
Totally reasonable number of small plastic bags
1000 ‘cigar’ wraps
Completely normal product results
https://commons.wikimedia.org/wiki/File:Cigarette_rolling_papers_%287%29.JPG
Orbitz
https://en.wikipedia.org/wiki/File:CitigroupCenterChicago.jpg
Ego
Member/customer/user first
Focus on building the best product, not on being the most clever data scientist
Much harder to spin a positive user story than a story about how smart you are
“Google searches involving black-sounding names are more likely to serve up ads suggestive of a criminal record”
“Black-sounding” names 25% more likely to be served an ad suggesting a criminal record
“NAME arrested?” ads suggest the queried name is associated with an arrest and warrants a background check
Ads for services related to recovering from arrest/incarceration
https://www.technologyreview.com/s/510646/racism-is-poisoning-online-ad-delivery-says-harvard-professor/
Ethics
We have accepted that Machine Learning can seem creepy. How do we prevent it from becoming immoral?
We have an ethical obligation to not teach machines to be prejudiced.
Data Ethics
Awareness
Talk about it!
Identify groups that could be negatively impacted by your work
Make a choice
Take a stand
Interpretable Models
For simple problems, simple solutions are often worth a small concession in performance
Inspectable models make it easier to debug problems in data collection, feature engineering, etc.
Only include features that work the way you want
Don’t include feature interactions that you don’t want
Logistic Regression
StraightDistanceFeature(-0.0311f),
ChapterZipScore(0.0250f),
RsvpCountFeature(0.0207f),
AgeUnmatchFeature(-1.5876f),
GenderUnmatchFeature(-3.0459f),
StateMatchFeature(0.4931f),
CountryMatchFeature(0.5735f),
FacebookFriendsFeature(1.9617f),
SecondDegreeFacebookFriendsFeature(0.1594f),
ApproxAgeUnmatchFeature(-0.2986f),
SensitiveUnmatchFeature(-0.1937f),
KeywordTopicScoreFeatureNoSuppressed(4.2432f),
TopicScoreBucketFeatureNoSuppressed(1.4469f,0.257f,10f),
TopicScoreBucketFeatureSuppressed(0.2595f,0.099f,10f),
ExtendedTopicsBucketFeatureNoSuppressed(1.6203f,1.091f,10f),
ChapterRelatedTopicsBucketFeatureNoSuppressed(0.1702f,0.252f,0.641f),
ChapterRelatedTopicsBucketFeatureNoSuppressed(0.4983f,0.641f,10f),
DoneChapterTopicsFeatureNoSuppressed(3.3367f)
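Why a weight list like this is inspectable can be shown with a minimal sketch. It assumes each feature emits a number and the model is plain logistic regression with no bias term. The four weights are copied from the slide; the feature values, the zero bias, and the function names are hypothetical.

```python
import math

# Weights copied from the slide. The feature values used below are made up.
WEIGHTS = {
    "StateMatchFeature": 0.4931,
    "CountryMatchFeature": 0.5735,
    "FacebookFriendsFeature": 1.9617,
    "GenderUnmatchFeature": -3.0459,
}


def contributions(features):
    """Per-feature contribution (weight * value) to the log-odds."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


def predict(features, bias=0.0):
    """Logistic regression: sigmoid of the bias plus the summed contributions."""
    z = bias + sum(contributions(features).values())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical member/group pair: same state and country, has friends in the
# group, and no gender mismatch.
example = {
    "StateMatchFeature": 1.0,
    "CountryMatchFeature": 1.0,
    "FacebookFriendsFeature": 1.0,
    "GenderUnmatchFeature": 0.0,
}

# Print contributions, largest magnitude first, then the final probability.
for name, value in sorted(contributions(example).items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:28s} {value:+.4f}")
print(f"p(join) = {predict(example):.3f}")
```

Because every term is a named weight times a named value, a surprising recommendation can be traced back to the feature that caused it.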
Feature Engineering and Interactions
● Good Feature:
○ Join! You’re interested in Tech x Meetup is about Tech
● Good Feature:
○ Don’t join! Group is intended only for Women x You are a Man
● Bad Feature:
○ Don’t join! Group is mostly Men x You are a Woman
● Horrible Feature:
○ Don’t join! Meetup is about Tech x You are a Woman
Meetup is not interested in propagating gender stereotypes
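The distinction between good and bad interactions can be sketched as a feature builder that only emits explicitly chosen crosses. This is an illustration, not Meetup's code: the dictionary keys (`interests`, `topics`, `gender`, `intended_gender`) and the feature names are hypothetical.

```python
def build_features(user, group):
    """Emit only the interactions we decided we want.

    There is deliberately no feature that crosses the user's gender with the
    group's topic, so the model has nothing to learn a stereotype from.
    """
    shared_topics = set(user["interests"]) & set(group["topics"])
    intended = group.get("intended_gender")  # None when the group is open to all
    gender = user.get("gender")
    return {
        # Good: user's interests crossed with the group's topics.
        "topic_match": 1.0 if shared_topics else 0.0,
        # Good: the group states who it is for and the user is outside that.
        "intended_gender_unmatch": (
            1.0 if intended and gender and intended != gender else 0.0
        ),
    }


tech_group = {"topics": ["tech"]}
woman = {"interests": ["tech"], "gender": "f"}
man = {"interests": ["tech"], "gender": "m"}

# Both members get identical features for an open tech group.
print(build_features(woman, tech_group) == build_features(man, tech_group))
```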
Ensemble Models and Data Segregation
Ensemble Models: Combine outputs of several classifiers for increased accuracy
If you have features that are useful but you’re worried about interactions (and your model creates them automatically), use ensemble modeling to restrict the features to separate models.
Ensemble Model, Data Segregation
Model1 data: *Interests, Searches, Friends, Location -> Model1 Prediction
Model2 data: *Gender, Friends, Location -> Model2 Prediction
Final model data: Model1 Prediction, Model2 Prediction -> Final Prediction
https://commons.wikimedia.org/wiki/File:Animation2.gif
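The data-segregation layout can be sketched with two small logistic scorers and a combiner. Everything here is assumed for illustration: the feature names, all weights, and the combiner form are hypothetical, and a real system would fit each model on its own columns.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_model(weights):
    """Logistic scorer that can only read the features named in `weights`."""
    def predict(row):
        return sigmoid(sum(w * row[name] for name, w in weights.items()))
    return predict


# Model1 sees interests but never gender (hypothetical weights).
model1 = make_model({"interest_match": 2.0, "search_match": 1.0,
                     "friends_in_group": 0.5, "same_city": 0.3})
# Model2 sees gender but never interests (hypothetical weights).
model2 = make_model({"gender_unmatch": -3.0, "friends_in_group": 0.5,
                     "same_city": 0.3})


def final_prediction(row, w1=3.0, w2=3.0, bias=-3.0):
    """The combiner sees only the two base predictions, not the raw features."""
    return sigmoid(w1 * model1(row) + w2 * model2(row) + bias)


row = {"interest_match": 1.0, "search_match": 0.0, "friends_in_group": 1.0,
       "same_city": 1.0, "gender_unmatch": 0.0}
print(f"model1={model1(row):.3f} model2={model2(row):.3f} "
      f"final={final_prediction(row):.3f}")
```

Neither base model is given both gender and interests, so neither can fit a gender-by-interest interaction on its own.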
“Women less likely to be shown ads for high-paid jobs on Google, study shows”
Carnegie Mellon ‘AdFisher’ project
Fake profiles, track ads
Ad: career coaching for “200k+” executive jobs
Male group: 1852 impressions
Female group: 318
https://www.theguardian.com/technology/2015/jul/08/women-less-likely-ads-high-paid-jobs-google-study
Diversity Controlled Testing
Same technique can work to find bias in your own models!
Generate Test Data
Randomize sensitive feature in real data set
Run Model
Evaluate for unacceptable biased treatment
What about automating this?
FairTest algorithm - Florian Tramèr
Still needs you to decide what features are bad
Humanity required
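The generate, randomize, run, evaluate loop above can be sketched as follows. This is a toy setup: the two models, the row fields (`interest_match`, `gender`), and the data are hypothetical stand-ins for a real model and data set, and deciding what gap counts as unacceptable is still left to a human.

```python
import random


def randomize_sensitive(rows, sensitive, values, seed=0):
    """Copy each row, overwriting the sensitive feature with a random value."""
    rng = random.Random(seed)
    return [dict(row, **{sensitive: rng.choice(values)}) for row in rows]


def mean_score_by_group(model, rows, sensitive):
    """Average model score for each value of the sensitive feature."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[sensitive], (0.0, 0))
        totals[row[sensitive]] = (total + model(row), count + 1)
    return {group: total / count for group, (total, count) in totals.items()}


def max_gap(means):
    """Largest difference in mean score between any two groups."""
    return max(means.values()) - min(means.values())


def fair_model(row):
    # Hypothetical model that ignores gender.
    return 0.8 if row["interest_match"] else 0.2


def biased_model(row):
    # Hypothetical model that halves the score for one gender.
    return fair_model(row) * (0.5 if row["gender"] == "f" else 1.0)


# Toy stand-in for a real data set.
real_rows = [{"interest_match": i % 2 == 0, "gender": "m"} for i in range(1000)]
test_rows = randomize_sensitive(real_rows, "gender", ["f", "m"])

for name, model in [("fair", fair_model), ("biased", biased_model)]:
    gap = max_gap(mean_score_by_group(model, test_rows, "gender"))
    print(f"{name:6s} gap between groups: {gap:.3f}")
```

Since the sensitive value is assigned at random, any systematic gap between groups must come from how the model uses that feature, not from the underlying data.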
“‘Holy F**K’: When Facial Recognition Algorithms Go Wrong”
Google Photos Service
Automatic image tagging
Tagged African American couple as “gorillas”
http://www.fastcompany.com/3048093/fast-feed/holy-fk-when-facial-recognition-algorithms-go-wrong
● Twitter bot
● “Garbage in, garbage out”
● Responsibility?
“In the span of 15 hours Tay referred to feminism as a "cult" and a "cancer," as well as noting "gender equality = feminism" and "i love feminism now." Tweeting "Bruce Jenner" at the bot got similar mixed response, ranging from "caitlyn jenner is a hero & is a stunning, beautiful woman!" to the transphobic "caitlyn jenner isn't a real woman yet she won woman of the year?"”
Tay.ai
Twitter taught Microsoft’s AI chatbot to be a racist asshole in less than a day
http://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist
Diverse test data
Outliers can matter
The real world is messy
Some people will mess with you
Some people look/act different than you
Defense
Diversity
Design
“There’s software used across the country to predict future criminals. And it’s biased against blacks.”
Algorithm for predicting repeat offenders, used in deciding how harsh the sentence for a crime should be
Proprietary model; undisclosed algorithm, features, etc.
Claims to not use race as a factor
Nearly twice as likely to falsely label black defendants as likely future criminals
More likely to mislabel white defendants as low risk
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
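The two findings above are statements about per-group error rates: the false positive rate (labeled high risk but did not reoffend) and the false negative rate (labeled low risk but did reoffend). The sketch below shows how such rates could be computed. The records are made-up toy data, not ProPublica's, and the function name and tuple layout are hypothetical.

```python
from collections import Counter


def error_rates_by_group(records):
    """records: iterable of (group, predicted_high_risk, reoffended) tuples."""
    counts = Counter(records)
    rates = {}
    for group in sorted({g for g, _, _ in counts}):
        fp = counts[(group, True, False)]   # flagged, did not reoffend
        tn = counts[(group, False, False)]  # not flagged, did not reoffend
        fn = counts[(group, False, True)]   # not flagged, reoffended
        tp = counts[(group, True, True)]    # flagged, reoffended
        rates[group] = {
            "false_positive_rate": fp / (fp + tn) if fp + tn else float("nan"),
            "false_negative_rate": fn / (fn + tp) if fn + tp else float("nan"),
        }
    return rates


# Made-up toy records for two hypothetical groups "A" and "B".
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", False, True)] * 3 + [("A", True, True)] * 7
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", False, True)] * 5 + [("B", True, True)] * 5
)

for group, group_rates in error_rates_by_group(toy_records).items():
    print(group, group_rates)
```

A model can leave out the sensitive attribute entirely and still show a gap like this, which is why the rates need to be measured per group.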
You know racist computers are a bad idea
Don’t let your company invent racist computers
@estola
