"You can't just turn the crank"
Machine learning for fighting abuse on the consumer web
David Freeman
Research Scientist/Engineer, Facebook Inc.
ScAINet 2018
Atlanta, GA USA, 11 May 2018
The consumer web
What do they try to do?
Malware, Payment Fraud, Scraping, Click Fraud, Phishing, Spam, Social Engineering, Fake Products, Scams, "Like" Fraud, Promotion Fraud, Identity Theft
What do we see?
Fake Reviews, Misinformation, Financial Theft, Account Resale
Fundamental question: Which requests are bad?
• Perfect for machine learning!
What could possibly go wrong?
Machine learning workflow
Label
Train
Validate
Launch
Measure
Profit! Lots!
How do we obtain labeled data?
(hint: not from your users)
Machine learning workflow
Label
Labeling: Gold standard
Objective measurement
• Human labeling of random samples.
• Labelers don't always know what they're looking for
• Labelers are inconsistent (with themselves and each other)
• Labelers get tired (esp. if most samples are good)
• Apply crowdsourcing best practices:
  • Precise definitions, multiple labeling, ML-assisted sampling
• But will it scale?
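The "multiple labeling" practice on this slide can be sketched as a majority vote with an agreement check. This is a minimal illustration, not the speaker's method: the function name `aggregate_labels`, the `min_agreement` threshold, and the request IDs are all hypothetical.

```python
from collections import Counter

def aggregate_labels(votes, min_agreement=0.75):
    """Majority-vote each sample; flag low-agreement samples for review.

    votes: dict mapping sample id -> list of labels from different labelers.
    Returns (labels, needs_review); labels maps id -> winning label.
    """
    labels, needs_review = {}, []
    for sample_id, sample_votes in votes.items():
        winner, count = Counter(sample_votes).most_common(1)[0]
        if count / len(sample_votes) >= min_agreement:
            labels[sample_id] = winner
        else:
            # Labelers disagree: the label definition may be imprecise.
            needs_review.append(sample_id)
    return labels, needs_review

# Hypothetical votes from three or four labelers per request.
votes = {
    "req-1": ["bad", "bad", "bad"],
    "req-2": ["good", "bad", "good"],
    "req-3": ["good", "good", "good", "good"],
}
labels, needs_review = aggregate_labels(votes)
print(labels)
print(needs_review)
```

Samples that land in `needs_review` are a cheap way to spot definitions that labelers interpret differently.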
Labeling: Silver standard
Automatic labeling
• Find high-precision signals of badness
  • Examples: unusual user-agent, malformed header
• DO NOT BLOCK ON THESE SIGNALS
  • They are controlled by the adversary
  • When the adversary adapts you will lose visibility
• Automatically generate signals using anomaly detection.
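One way to read this slide: a high-precision signal produces training labels, and nothing is blocked on it. The sketch below uses value rarity as a crude stand-in for a real anomaly detector; `rare_value_signal`, the `max_share` cutoff, and the traffic data are assumptions made for illustration.

```python
from collections import Counter

def rare_value_signal(requests, field, max_share=0.01):
    """Return ids of requests whose value for `field` is unusually rare.

    Rarity is only a stand-in for a real anomaly detector.
    """
    counts = Counter(r[field] for r in requests)
    total = len(requests)
    return {r["id"] for r in requests if counts[r[field]] / total <= max_share}

# Hypothetical traffic: 199 browser requests and one odd client.
requests = [{"id": i, "user_agent": "Mozilla/5.0"} for i in range(199)]
requests.append({"id": 199, "user_agent": "bot/0.1"})

flagged = rare_value_signal(requests, "user_agent")

# Use the signal as a training label only; do NOT block on it.
# Unflagged requests stay unlabeled: the signal says nothing about them.
training_labels = {request_id: "bad" for request_id in flagged}
print(training_labels)
```

Keeping the signal out of the blocking path means the adversary has no reason to change it, so it keeps producing labels.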
Labeling: Bronze standard
Be scrappy
• Use whatever you have!
  • CS data, rules, other models
• Mitigate risks of blindness and feedback loops:
  • Oversample manually labeled examples
  • Oversample false positives and false negatives when retraining.
  • Undersample positive examples from previous iterations of this model.
  • Sample and label examples near the decision boundary
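The feedback-loop mitigations on this slide can be expressed as per-example training weights plus a picker for borderline examples. This is a sketch under stated assumptions: the weight values, the `source` and `was_error` fields, and both function names are hypothetical, and real weights would need tuning.

```python
def training_weight(example, manual=5.0, error=3.0, self_labeled=0.2):
    """Weight one example for retraining (all weights are illustrative)."""
    if example["source"] == "manual":
        return manual            # oversample human-labeled examples
    if example["was_error"]:
        return error             # oversample known false positives/negatives
    if example["source"] == "previous_model" and example["label"] == 1:
        return self_labeled      # undersample the model's own past positives
    return 1.0

def near_boundary(scored, threshold=0.5, margin=0.1, k=2):
    """Pick up to k ids whose score is closest to the decision threshold."""
    close = [(abs(score - threshold), ex_id) for ex_id, score in scored
             if abs(score - threshold) <= margin]
    return [ex_id for _, ex_id in sorted(close)[:k]]

examples = [
    {"id": "a", "source": "manual", "label": 1, "was_error": False},
    {"id": "b", "source": "rules", "label": 0, "was_error": True},
    {"id": "c", "source": "previous_model", "label": 1, "was_error": False},
    {"id": "d", "source": "rules", "label": 0, "was_error": False},
]
weights = {ex["id"]: training_weight(ex) for ex in examples}
to_label = near_boundary([("x", 0.48), ("y", 0.95), ("z", 0.55), ("w", 0.02)])
print(weights)
print(to_label)
```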
Labeling: Iron standard
Have your users do the work
• Users are terrible at reporting.
• Product flows bias reporting.
• Reports can be gamed.
• Reports can serve as a directional measure.
Assembling a training set
Labeling is just the beginning
• Segment the problem
  • e.g. status with link from country X
• Downsample intelligently
  • if your distribution is lumpy, sample from all the lumps
• Learning the prior vs. focusing on the bad stuff
  • no golden rule here -- you have to experiment
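"Sample from all the lumps" can be sketched as stratified downsampling: cap the number of examples taken from each segment so that small segments are not drowned out. The segment key (country), the cap of 20, and the function name are assumptions. The returned weights are one possible way to recover the original prior later; the slide leaves that choice open.

```python
import random
from collections import defaultdict

def stratified_downsample(examples, segment_of, per_segment, seed=0):
    """Sample up to `per_segment` examples from every segment ("lump").

    Returns (example, weight) pairs; weight = segment size / sample size,
    so the original prior can be recovered by weighting if desired.
    """
    rng = random.Random(seed)
    segments = defaultdict(list)
    for ex in examples:
        segments[segment_of(ex)].append(ex)
    sample = []
    for items in segments.values():
        k = min(per_segment, len(items))
        weight = len(items) / k
        sample.extend((ex, weight) for ex in rng.sample(items, k))
    return sample

# Hypothetical lumpy distribution: one huge segment and two small ones.
examples = ([{"country": "US"}] * 1000
            + [{"country": "X"}] * 30
            + [{"country": "Y"}] * 5)
sample = stratified_downsample(examples, lambda ex: ex["country"], per_segment=20)
print(len(sample))
```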
Refreshing your data
Don't forget the past!
[Diagram: Training set 1 → Model v1; Training set 2 → Model v2; Training set N → Model vN]
Mitigation:
• Keep old attacks around (exponential decay?)
• Keep old models around (raise thresholds?)
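The "exponential decay?" idea for keeping old attacks around could look like the sketch below: older attack examples stay in the training set with a weight that halves every `half_life_days`. The 90-day half-life and the attack names are hypothetical; the slide itself poses decay as an open question.

```python
def decay_weight(age_days, half_life_days=90.0):
    """Weight of a labeled attack example that is `age_days` old."""
    return 0.5 ** (age_days / half_life_days)

# Hypothetical attack examples with their ages in days.
attacks = [("spam-wave", 0), ("fake-login", 90), ("old-scrape", 180)]
weights = {name: decay_weight(age) for name, age in attacks}
print(weights)  # {'spam-wave': 1.0, 'fake-login': 0.5, 'old-scrape': 0.25}
```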
How do you know your model is ready to go?
Machine learning workflow
Train
Validate
Validating Performance
Don't trust offline replay
• Labels aren't perfect
  • Often miss on recall
• Models interact with each other
• Use offline precision-recall (P-R) and ROC curves to stack-rank model candidates
[Figure labels: Model A, Model B, FP]
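Stack-ranking candidates on offline precision-recall could be sketched as follows: compare models by the best recall each one reaches at a fixed minimum precision. The scores, labels, and the 0.9 precision floor are made up for illustration. In line with the slide, the result only orders the candidates; it is not an estimate of production performance.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when flagging every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def best_recall_at_precision(scores, labels, min_precision):
    """Highest recall over all thresholds that meet the precision floor."""
    best = 0.0
    for threshold in sorted(set(scores)):
        p, r = precision_recall(scores, labels, threshold)
        if p >= min_precision:
            best = max(best, r)
    return best

# Hypothetical labeled validation set and two candidate models' scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
candidates = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3],
    "model_b": [0.9, 0.4, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6],
}
ranked = sorted(
    candidates,
    key=lambda name: best_recall_at_precision(candidates[name], labels, 0.9),
    reverse=True,
)
print(ranked)
```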
The Perils of A/B Testing
• Fundamental A/B testing assumption:
  Experiment effects are independent of the cohorts chosen
Start with a small experiment
[Diagram: cohorts A and B, with an X marker]
• Looks good so far...
Roll it out to (almost) everyone: Option 1
[Diagram: cohorts A and B, with an X marker]
• Did the adversary give up or iterate?
Roll it out to (almost) everyone: Option 2
[Diagram: cohorts A and B]
• Now your experiment is a vulnerability
Using Shadow Mode
• Run new model online in "log-only" mode
• Evaluate performance where the new model disagrees with the old one.
  • ideally via sampling & labeling
• Push based on FP/FN tradeoff
[Figure labels: Prod model, New model, FP]
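The shadow-mode evaluation can be sketched as follows: both models score the same live events, only the production decision is enforced, and the events where the two decisions differ are sampled for labeling. The field names, thresholds, and function names are assumptions made for illustration.

```python
import random

def disagreements(events, prod_threshold=0.5, new_threshold=0.5):
    """Split logged events into those flagged only by one of the two models."""
    only_new, only_prod = [], []
    for event in events:
        prod_flag = event["prod_score"] >= prod_threshold
        new_flag = event["new_score"] >= new_threshold
        if new_flag and not prod_flag:
            only_new.append(event)    # candidate new catches or new FPs
        elif prod_flag and not new_flag:
            only_prod.append(event)   # candidate new misses or fixed FPs
    return only_new, only_prod

def sample_for_labeling(events, k, seed=0):
    """Draw up to k events uniformly at random for human labeling."""
    return random.Random(seed).sample(events, min(k, len(events)))

# Hypothetical log: the new model runs in log-only mode next to prod.
events = [
    {"id": 1, "prod_score": 0.90, "new_score": 0.95},
    {"id": 2, "prod_score": 0.20, "new_score": 0.80},
    {"id": 3, "prod_score": 0.70, "new_score": 0.10},
    {"id": 4, "prod_score": 0.10, "new_score": 0.05},
]
only_new, only_prod = disagreements(events)
to_label = sample_for_labeling(only_new + only_prod, k=10)
print([e["id"] for e in only_new], [e["id"] for e in only_prod])
```

Labeling only the disagreements concentrates labeling effort on the events where switching models would change an outcome.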
How do you figure out if it worked?
Machine learning workflow
Launch
Measure
True Positives Don't Matter
What's happening here?
[Plot: precision over time]
[Plots: TP over time vs. FP over time]
• Really want # of good users affected
• Solution: use one minus specificity (aka FPR): $1 - \frac{TN}{FP + TN} = \frac{FP}{FP + TN}$
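A small worked example of why precision can mislead while FPR does not: if the adversary sends less traffic, true positives fall and precision drops, even though the number of good users affected is unchanged. The counts below are invented for illustration, and this is only one scenario consistent with the plots.

```python
def precision(tp, fp):
    """Share of flagged requests that were actually bad."""
    return tp / (tp + fp)

def false_positive_rate(fp, tn):
    """One minus specificity: share of good requests that were flagged."""
    return fp / (fp + tn)

# Hypothetical daily counts for the same model at the same threshold.
during_attack = {"tp": 900, "fp": 100, "tn": 99_900}
after_attack = {"tp": 100, "fp": 100, "tn": 99_900}

for name, day in [("during attack", during_attack), ("after attack", after_attack)]:
    p = precision(day["tp"], day["fp"])
    fpr = false_positive_rate(day["fp"], day["tn"])
    print(f"{name}: precision={p:.2f}, FPR={fpr:.4f}")
```

Precision falls from 0.90 to 0.50 between the two days, while FPR stays at 0.0010 because the same 100 good requests are flagged on both.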
Not so fast!
Machine learning workflow
Profit! Adapt!
What not to Do (I)
Show the adversary what your limits are
Message 500 people 🛑
Message 400 people 🛑
Message 300 people ✅
What to Do (I)
Don't give immediate feedback
• Introduce delay in blocking response (and/or)
• Undo the damage without telling the user.
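"Introduce delay in blocking response" could be sketched as scheduling enforcement at a random time after detection, so the adversary cannot tie a block to the action that triggered it. The delay bounds and the function name are hypothetical.

```python
import random

def schedule_block(detected_at, rng, min_delay_s=600, max_delay_s=86_400):
    """Return the time (in seconds) at which to enforce a block.

    A random delay after detection hides which action triggered it.
    """
    return detected_at + rng.uniform(min_delay_s, max_delay_s)

rng = random.Random(0)
detected_at = 1_000.0
enforce_at = schedule_block(detected_at, rng)
print(enforce_at - detected_at)  # delay in seconds, between 600 and 86400
```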
"We don't want to be the ones solving the CAPTCHAs"
What not to Do (II)
Look for specific content to block
What to Do (II)
Focus on bad behavior, not only bad content
What to Do (III)
Use data the adversary doesn't know/control
What to Do (IV)
Defense in depth
• Scoring at Entry Points: prevent access to accounts
• Clustering, Anomaly Detection: prevent accounts from doing damage
• User Reporting: find false negatives
• Behavioral Analysis: detect bad activity
[Diagram axes: increasing speed; more information available]
Takeaways
• Think about each step of the ML process.
• It's hard to build a good training set.
• Adversarial adaptation breaks many assumptions.
• Control the data & the response.
Thanks to: Hervé Robert, Isaac Fullinwider, Henry Lu, Sagar Patel, Hongyang Li, Nektarios Leontiadis
