Quora ML Workshop: Content Moderation & Machine Learning


Presentation by Alana Glassco, anti-abuse engineer at Smyte, at Quora ML Workshop: Protecting Online Spaces with Applied Machine Learning, on September 27, 2017.

Technology

Be Nice, Be Respectful:
Protecting Online Spaces with Applied
Machine Learning

Content Moderation &
Machine Learning
Common Pitfalls & How to Avoid Them
Alana Glassco
Anti-abuse Engineer at Smyte
Alana@smyte.com

Content Policies
● Context is key
● Not black & white
● Designed for humans, not machines

Understand the problem
● Business goals
● Nature of the problem
● Is ML a good fit?

For example...
Case 1:
● Business goals
○ Enforce company values
○ Gain good press
● Nature of the problem
○ Short-term
○ High FP cost
● Is ML a good fit?
○ No
Case 2:
● Business goals
○ Reduce bad press
○ Recover advertising loss
● Nature of the problem
○ Long-term
○ High FN cost
● Is ML a good fit?
○ Yes
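
The FP (false positive) versus FN (false negative) cost contrast in the two cases can be made concrete with a small sketch. This is not from the talk: the function names, the toy scores and labels, and the cost values are all assumptions for illustration. It shows how the same scored items lead to a different decision threshold depending on which kind of error is more expensive.

```python
def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Cost of flagging every item whose score is at or above the threshold."""
    cost = 0.0
    for score, is_bad in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_bad:
            cost += fp_cost  # good content was actioned (false positive)
        elif not flagged and is_bad:
            cost += fn_cost  # bad content was missed (false negative)
    return cost


def pick_threshold(scores, labels, fp_cost, fn_cost):
    """Return the candidate threshold with the lowest total cost."""
    # float("inf") represents "never flag anything".
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )


# Hypothetical model scores and human-reviewed labels (True = violates policy).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [False, False, True, False, True, True]

high_fp_cost = pick_threshold(scores, labels, fp_cost=10.0, fn_cost=1.0)
high_fn_cost = pick_threshold(scores, labels, fp_cost=1.0, fn_cost=10.0)
print(high_fp_cost, high_fn_cost)  # 0.7 0.4
```

With a high FP cost the cheapest threshold moves up (fewer items flagged); with a high FN cost it moves down (more items flagged).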

Get the right training data
● Understand policies in practice
● “Free” data won’t cut it
● Invest in a human review team

Example: building a “spam” classifier

                   | Repetitive content      | Keyword stuffing     | Artificial traffic            | Scams / phishing
Behavioral signals | Bots / fake accounts    | Real users           | Bots / fake accounts          | Bots or real users
Optics             | Looks fine in isolation | Easy to identify     | Invisible w/o account signals | Looks bad to a trained reviewer
Severity           | Harms reputation        | Harms search results | Harms ranking                 | Harms users
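
One way to read the spam example is that "spam" is several distinct problems, and some of them cannot be labeled from a single item alone. The sketch below encodes the four spam types from the slide as data. The `SpamType` class, the `CONTENT_ONLY_OPTICS` set, and the rule for "needs extra context" are assumptions made for illustration, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpamType:
    name: str
    actors: str    # "Behavioral signals" row on the slide
    optics: str    # how the content looks to a reviewer
    severity: str  # what the abuse harms


SPAM_TYPES = [
    SpamType("repetitive content", "bots / fake accounts",
             "looks fine in isolation", "harms reputation"),
    SpamType("keyword stuffing", "real users",
             "easy to identify", "harms search results"),
    SpamType("artificial traffic", "bots / fake accounts",
             "invisible w/o account signals", "harms ranking"),
    SpamType("scams / phishing", "bots or real users",
             "looks bad to a trained reviewer", "harms users"),
]

# Assumption: these optics mean a reviewer can judge one item on its own.
CONTENT_ONLY_OPTICS = {"easy to identify", "looks bad to a trained reviewer"}


def needs_extra_context(spam_type):
    """True if labeling likely needs account-level or cross-item signals."""
    return spam_type.optics not in CONTENT_ONLY_OPTICS


needs_context = [t.name for t in SPAM_TYPES if needs_extra_context(t)]
print(needs_context)  # ['repetitive content', 'artificial traffic']
```

Under these assumptions, training data for the two listed types would need to include behavioral or account signals, not just the content itself.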

Design a solution
● Model selection
● Implementation
● Maintenance & retraining
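
For the maintenance & retraining point, one possible approach (an assumption, not something the slides specify) is to have human reviewers check a sample of the model's decisions and watch precision over time. The function names, the 0.8 threshold, and the sample data below are hypothetical.

```python
def precision(reviewed):
    """Precision over (model_flagged, reviewer_says_bad) pairs.

    Returns None when the model flagged nothing in the sample.
    """
    flagged = [bad for model_flagged, bad in reviewed if model_flagged]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


def should_retrain(reviewed, min_precision=0.8):
    """Signal retraining when reviewed precision drops below the threshold."""
    p = precision(reviewed)
    return p is not None and p < min_precision


# Hypothetical weekly samples of flagged items checked by human reviewers.
last_week = [(True, True)] * 9 + [(True, False)] * 1  # precision 0.9
this_week = [(True, True)] * 7 + [(True, False)] * 3  # precision 0.7

print(should_retrain(last_week))  # False
print(should_retrain(this_week))  # True
```

This only tracks false positives among flagged items; catching missed abuse (false negatives) would need a separate review sample of unflagged content.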

Quora ML Workshop: Content Moderation & Machine Learning

1. Be Nice, Be Respectful: Protecting Online Spaces with Applied Machine Learning

3. Content Moderation & Machine Learning Common Pitfalls & How to Avoid Them Alana Glassco Anti-abuse Engineer at Smyte Alana@smyte.com

4. Content Policies ● Context is key ● Not black & white ● Designed for humans, not machines

5. Content moderation flow

6. Content moderation flow

7. Tips & tricks

8. Understand the problem ● Business goals ● Nature of the problem ● Is ML a good fit?

9. For example... Case 1: ● Business goals ○ Enforce company values ○ Gain good press ● Nature of the problem ○ Short-term ○ High FP cost ● Is ML a good fit? ○ No. Case 2: ● Business goals ○ Reduce bad press ○ Recover advertising loss ● Nature of the problem ○ Long-term ○ High FN cost ● Is ML a good fit? ○ Yes

10. Get the right training data ● Understand policies in practice ● “Free” data won’t cut it ● Invest in a human review team

11. Example: building a “spam” classifier ● Repetitive content: bots / fake accounts; looks fine in isolation; harms reputation ● Keyword stuffing: real users; easy to identify; harms search results ● Artificial traffic: bots / fake accounts; invisible w/o account signals; harms ranking ● Scams / phishing: bots or real users; looks bad to a trained reviewer; harms users

12. Design a solution ● Model selection ● Implementation ● Maintenance & retraining

13. Questions? alana@smyte.com