Beyond Collaborative Filtering: ML & Recommendations at Meetup
Evan Estola 
Machine Learning Engineer 
Meetup.com 
evan@meetup.com 
@estola
Meetup: what are you?
Why Meetup data is cool 
● Real people meeting up 
● Every meetup could change someone's life 
● No ads, just do the best thing 
● Oh, and more than 125 million RSVPs by 18 million members
● 3 million RSVPs in the last 30 days
○ About 1 per second
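The "1 per second" figure comes from spreading the 30-day RSVP count evenly over the window. A quick back-of-the-envelope check (it gives the average rate only, assuming a uniform spread that real traffic will not have):

```python
# Average RSVP rate implied by "3 million RSVPs in the last 30 days".
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def rsvps_per_second(total_rsvps: int, days: int) -> float:
    """Average number of RSVPs per second over a window of `days` days."""
    return total_rsvps / (days * SECONDS_PER_DAY)


rate = rsvps_per_second(3_000_000, 30)
print(f"{rate:.2f} RSVPs per second")  # prints "1.16 RSVPs per second"
```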
Tools at Meetup 
● Hive - SQL on Hadoop 
● Spark - Distributed Scala on Hadoop cluster 
● Scala - Recommendations service 
● R - Data analysis, Model building 
● Python - Scripting, Data organizing 
● Java - Backend of our web stack
Collaborative Filtering 
● Classic recommendations approach 
● Users who like this also like that
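A minimal sketch of the "users who like this also like that" idea, assuming item-to-item cosine similarity over shared members. The slides do not say which CF variant Meetup used, and the member and group names below are hypothetical:

```python
from collections import defaultdict
from math import sqrt


def item_similarities(memberships: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Cosine similarity between every pair of items that share at least one user.

    Each item (e.g. a group) is represented by the set of users who have it.
    """
    users_by_item: dict[str, set[str]] = defaultdict(set)
    for user, items in memberships.items():
        for item in items:
            users_by_item[item].add(user)

    sims: dict[tuple[str, str], float] = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                score = overlap / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims


def also_liked(item: str, memberships: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """The k items most similar to `item`: "users who like this also like that"."""
    sims = item_similarities(memberships)
    scored = [(other, s) for (a, other), s in sims.items() if a == item]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties broken by name
    return scored[:k]


# Hypothetical members and the groups they joined.
memberships = {
    "ann": {"hiking", "climbing"},
    "bob": {"hiking", "climbing", "photography"},
    "cat": {"hiking", "photography"},
    "dan": {"board-games"},
}
print(also_liked("hiking", memberships))       # climbing and photography, about 0.82 each
print(also_liked("board-games", memberships))  # []: no co-members, nothing to recommend
```

The `board-games` group gets no recommendations because nobody else shares it, a small-scale picture of the sparsity and cold-start weaknesses of CF.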
Weaknesses of CF 
● Sparsity 
● Cold Start 
● Coverage 
● Diversity
Why Recs at Meetup are hard 
● Incomplete Data (topics) 
● Cold start 
● Asking user for data is hard 
● Going to meetups is scary 
● Sparsity 
○ Location 
○ Low RSVPs per person
○ Membership density: 0.001%
○ Compare to the Netflix Prize dataset: 1%
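Sparsity here means the fraction of the user-item matrix that is filled in. A sketch of that calculation using the published Netflix Prize training-set size. Meetup's member and group counts are not given, so only the slide's 0.001% figure is converted:

```python
def density(num_observed: int, num_users: int, num_items: int) -> float:
    """Fraction of cells in the user-item matrix that hold an observed interaction."""
    return num_observed / (num_users * num_items)


# Netflix Prize training data: 100,480,507 ratings from 480,189 users on 17,770 movies.
netflix = density(100_480_507, 480_189, 17_770)
print(f"Netflix Prize density: {netflix:.2%}")  # prints "Netflix Prize density: 1.18%"

# A density of 0.001% means about 1 observed membership per 100,000 member-group pairs.
meetup_density = 0.001 / 100
print(f"1 in {round(1 / meetup_density):,} member-group pairs")  # prints "1 in 100,000 member-group pairs"
```

By this measure, Meetup membership data is roughly a thousand times sparser than the Netflix Prize data.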
Supervised Learning/Classification 
● “Inferring a function from labeled training data”
● Joined Meetup group / didn’t join Meetup group
Preprocessing 
● Schenectady 
● Fake RSVP boosts (+100 guests!) 
● Outliers 
● Bucketing 
● Etc., etc.
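The slide only names these steps. To illustrate outlier handling plus bucketing, here is a sketch for a hypothetical "guests per RSVP" feature; the cap and bucket edges are made-up values, not Meetup's:

```python
from bisect import bisect_right

# Hypothetical settings: illustrative only, not Meetup's actual values.
MAX_GUESTS = 5                  # anything above this is treated as a fake boost / outlier
GUEST_BUCKET_EDGES = [1, 2, 4]  # buckets: 0 | 1 | 2-3 | 4+


def clip(value: float, lower: float, upper: float) -> float:
    """Clamp `value` into [lower, upper] so outliers cannot dominate a feature."""
    return max(lower, min(value, upper))


def bucketize(value: float, edges: list[float]) -> int:
    """Index of the bucket containing `value`; `edges` must be sorted ascending.

    Returns 0 for value < edges[0] and len(edges) for value >= edges[-1].
    """
    return bisect_right(edges, value)


def guest_count_feature(guests: int) -> int:
    """Clip a raw RSVP guest count, then bucket it into a small categorical feature."""
    return bucketize(clip(guests, 0, MAX_GUESTS), GUEST_BUCKET_EDGES)


for raw in [0, 1, 3, 5, 100]:
    print(raw, "->", guest_count_feature(raw))
```

Clipping before bucketing means a fake "+100 guests" boost lands in the same bucket as a legitimate large RSVP instead of becoming an extreme value of its own.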
Problem definition and assumptions 
● Assumption: if you’re not in a given group, you don’t want to be
○ Negative samples: groups you’re not in
○ Also a good classifier...
● Membership rate << expected error rate
○ Always predicting “no join” would already look very accurate
○ Solution: sample to 50/50 join/no-join
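A sketch of the labeling assumption and the 50/50 rebalancing, assuming membership data is available as a hypothetical member-to-groups mapping:

```python
import random


def build_balanced_training_set(
    memberships: dict[str, set[str]],
    all_groups: set[str],
    seed: int = 0,
) -> list[tuple[str, str, int]]:
    """Build (member, group, label) rows with a 50/50 join/no-join split.

    Positives: groups the member joined (label 1).
    Negatives: groups the member is not in (label 0), following the assumption that
    non-membership means "doesn't want to be", then downsampled to match the positives.
    """
    rng = random.Random(seed)
    positives = [(m, g, 1) for m, groups in memberships.items() for g in sorted(groups)]
    negatives = [
        (m, g, 0)
        for m, groups in memberships.items()
        for g in sorted(all_groups)
        if g not in groups
    ]
    kept = rng.sample(negatives, k=min(len(positives), len(negatives)))
    rows = positives + kept
    rng.shuffle(rows)
    return rows


# Hypothetical data: 2 memberships out of 8 possible member-group pairs.
memberships = {"ann": {"g1"}, "bob": {"g2"}}
rows = build_balanced_training_set(memberships, {"g1", "g2", "g3", "g4"})
print(sum(label for _, _, label in rows), "positives out of", len(rows))  # prints "2 positives out of 4"
```

Enumerating every non-membership only works on toy data; at a 0.001% membership rate, a real pipeline would more likely sample random member-group pairs directly as negatives.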
Ranking 
● Model output label no longer literally true (sampling changed the base rate)
○ Luckily, we’d rather rank all of the results anyway
● Use a classifier that gives you a useful output: a score you can rank by
○ Fancy black box
○ Logistic Regression
■ Easier to explain
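Why the rebalanced model is still fine for ranking: keeping only a fraction r of negatives multiplies every candidate's odds by the same factor 1/r, so predicted probabilities are inflated but their order is not. The sketch below applies the standard prior correction for this kind of sampling and shows that the ranking is unchanged; the logits and keep rate are hypothetical:

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def corrected_probability(sample_logit: float, negative_keep_rate: float) -> float:
    """Map a score from a model trained on downsampled negatives back to the original base rate.

    Keeping a fraction r of negatives multiplies the odds by 1/r, so the
    original-data logit is the training-sample logit plus log(r).
    """
    return sigmoid(sample_logit + math.log(negative_keep_rate))


def rank(scores: dict[str, float]) -> list[str]:
    """Candidate names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical logits from a model trained on a 50/50 sample.
logits = {"group_a": 2.0, "group_b": -0.5, "group_c": 0.7}
keep_rate = 1e-5  # hypothetical fraction of negatives kept during sampling

calibrated = {g: corrected_probability(z, keep_rate) for g, z in logits.items()}
print(rank(logits))      # ['group_a', 'group_c', 'group_b']
print(rank(calibrated))  # same order: the correction shifts every logit equally
```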
Ensemble Learning 
“... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”
Ensemble Learning 
● Topic match (original algorithm) 
● Collaborative Filtering on Topics 
● Social algorithm 
● Other simple features (Popularity, Gender…) 
● Add the output of each algorithm as a feature in a Logistic Regression model
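Feeding base-algorithm outputs into a logistic regression is a form of stacking. A dependency-free sketch, assuming three hypothetical base scores per member-group pair and a tiny hand-made dataset; a production system would use a proper library rather than this hand-rolled gradient descent:

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent on mean log-loss plus (l2 / 2) * ||w||^2 (bias not penalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted join probability for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Each row holds the outputs of hypothetical base algorithms for one member-group pair.
FEATURES = ["topic_match", "topic_cf", "social"]
X = [
    [0.9, 0.1, 0.0], [0.8, 0.0, 1.0], [0.7, 0.2, 0.0],  # joined
    [0.1, 0.1, 0.0], [0.2, 0.0, 1.0], [0.0, 0.2, 0.0],  # did not join
]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(X, y)
for name, weight in zip(FEATURES, w):
    print(f"{name:12s} {weight:+.2f}")
```

Each learned weight says how much the model trusts the corresponding base algorithm; here the topic-match score, the only one that separates joiners from non-joiners, gets a positive weight.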
Logistic Regression Output (learned weights)
● TopicScore 4.14 
● ExtendedTS 0.47 
● RelatedTS 0.66 
● FbFriends 2.02 
● 2ndFbFriends 0.09 
● AgeUnmatch -2.40 
● GenUnmatch -3.37 
● Distance -0.04 
● StateMatch 0.54 
● CountyMatch 0.41 
● ZipScore 0.06 
● RsvpScore 0.02
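These coefficients are what make the model easy to explain. A sketch that scores candidate groups with the slide's weights and breaks a score into per-feature contributions. The intercept and the feature scaling are not given on the slide, and the feature values below are hypothetical:

```python
# Coefficients as listed on the slide. The intercept and the scaling of each feature
# are not given, so this linear score is only useful for ranking and explanation.
WEIGHTS = {
    "TopicScore": 4.14,
    "ExtendedTS": 0.47,
    "RelatedTS": 0.66,
    "FbFriends": 2.02,
    "2ndFbFriends": 0.09,
    "AgeUnmatch": -2.40,
    "GenUnmatch": -3.37,
    "Distance": -0.04,
    "StateMatch": 0.54,
    "CountyMatch": 0.41,
    "ZipScore": 0.06,
    "RsvpScore": 0.02,
}


def linear_score(features: dict[str, float]) -> float:
    """Weighted sum of feature values (the logit, up to the unknown intercept)."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


def explain(features: dict[str, float]) -> list[tuple[str, float]]:
    """Per-feature contributions, largest magnitude first: why a group ranks where it does."""
    contributions = [(name, WEIGHTS[name] * value) for name, value in features.items()]
    return sorted(contributions, key=lambda c: abs(c[1]), reverse=True)


# Hypothetical feature values for two candidate groups for one member.
group_a = {"TopicScore": 0.8, "FbFriends": 1.0, "Distance": 5.0}
group_b = {"TopicScore": 0.8, "GenUnmatch": 1.0, "Distance": 5.0}

print(linear_score(group_a) > linear_score(group_b))  # True
print(explain(group_a)[0][0])                          # TopicScore
```

Assuming the features are on comparable scales, the signs read directly: topic overlap and Facebook friends push a group up, while age and gender mismatches push it down.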
Facebook Likes 
● Lots of information, but how to use it?
● Map Likes to topics, and let model training take care of the rest!
Mapping FB Likes to Meetup Topics 
● Text-based?
○ Go (game) vs. Go (lang)?
○ Burton?
● Data approach!
○ Grab the most popular topics across all members with the same Like
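A sketch of the data approach, assuming hypothetical member-to-Likes and member-to-topics mappings: count topics across every member who shares the Like.

```python
from collections import Counter


def top_topics_for_like(
    like: str,
    member_likes: dict[str, set[str]],
    member_topics: dict[str, set[str]],
    k: int = 3,
) -> list[tuple[str, int]]:
    """Most common Meetup topics among members who share a given Facebook Like."""
    counts: Counter[str] = Counter()
    for member, likes in member_likes.items():
        if like in likes:
            counts.update(member_topics.get(member, set()))
    return counts.most_common(k)


# Hypothetical members.
member_likes = {"m1": {"Burton"}, "m2": {"Burton"}, "m3": {"Burton", "Go"}, "m4": {"Go"}}
member_topics = {
    "m1": {"Meeting New People", "Snowboarding"},
    "m2": {"Meeting New People", "Coffee"},
    "m3": {"Meeting New People", "Snowboarding"},
    "m4": {"Go (game)"},
}
print(top_topics_for_like("Burton", member_likes, member_topics))
# prints [('Meeting New People', 3), ('Snowboarding', 2), ('Coffee', 1)]
```

"Meeting New People" comes out on top simply because it is common everywhere, which is the problem the normalization step addresses.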
Normalization 
● Top topics for Burton-likers
○ Meeting New People, Coffee, blah blah
○ The most popular topics still dominate
● Solution: normalize by each topic’s expected occurrence in the sample
Normalization 
● For members with a given Like, compare the percentage who have each topic to the percentage expected from the total population
● Total population
○ 20% “Meeting New People”
○ 2% “Snowboarding”
● Burton:
○ 20% “Meeting New People” (1× expected)
○ 9% “Snowboarding” (4.5× expected)
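The comparison is a lift ratio: a topic's share among members with the Like divided by its share in the whole population. A sketch with hypothetical counts chosen to reproduce the slide's percentages (the `min_count` threshold is an added assumption, meant to damp noise from rarely seen topics):

```python
def topic_lift(
    like_topic_counts: dict[str, int],
    n_like: int,
    population_topic_counts: dict[str, int],
    n_population: int,
    min_count: int = 1,
) -> list[tuple[str, float]]:
    """Ratio of each topic's share among members with a Like to its share in the population.

    lift = (count_like / n_like) / (count_pop / n_population); 1.0 means "as expected".
    Topics seen fewer than `min_count` times among Like-holders are skipped.
    """
    lifts = {}
    for topic, count in like_topic_counts.items():
        pop_count = population_topic_counts.get(topic, 0)
        if count < min_count or pop_count == 0:
            continue
        lifts[topic] = (count * n_population) / (n_like * pop_count)
    return sorted(lifts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical counts chosen to match the slide's percentages.
population = {"Meeting New People": 200, "Snowboarding": 20}  # out of 1,000 members
burton = {"Meeting New People": 20, "Snowboarding": 9}         # out of 100 Burton-likers
print(topic_lift(burton, 100, population, 1_000))
# prints [('Snowboarding', 4.5), ('Meeting New People', 1.0)]
```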
Results 
● Generate top topics for all Likes
○ Path from member to Like to topic to group
● Add a Facebook-Like-based topic match feature to the model
● It gets a positive weight
○ Very good sign!
● Deploy/split test
○ TBD
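The slides do not define the new feature precisely. One plausible sketch of following the member → Like → topic → group path, using a hypothetical scoring rule (the share of a group's topics reached through the member's Likes):

```python
def like_topic_match(
    member_likes: set[str],
    like_to_topics: dict[str, list[str]],
    group_topics: set[str],
) -> float:
    """Hypothetical feature: share of a group's topics reachable via member -> Like -> topic."""
    if not group_topics:
        return 0.0
    inferred: set[str] = set()
    for like in member_likes:
        inferred.update(like_to_topics.get(like, []))
    return len(inferred & group_topics) / len(group_topics)


# Hypothetical output of the Like -> top-topics step.
like_to_topics = {"Burton": ["Snowboarding", "Skiing"]}
print(like_topic_match({"Burton"}, like_to_topics, {"Snowboarding", "Hiking"}))  # 0.5
```

A value like this would become one more column alongside TopicScore and the other features, with the model learning its weight.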
Summary 
● Supervised Learning for Ranking as Recommendations is cool
● Simple, interpretable models are cool 
● Feature engineering is cool
Thanks! 
Smart people come work with me. 
http://www.meetup.com/jobs/

Evan Estola – Data Scientist, Meetup.com at MLconf ATL
