Machine Learning and Data at Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola
My Background
● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
○ Information Retrieval
○ Data Mining
○ Math
■ Linear Algebra
■ Graph Theory
You
● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?
What this talk is
● Super secret peek into Meetup!
● Meetup recommendations examples
● How we do recommendations (model/features)
● Lessons learned/what’s next
What this talk isn’t
● What is a data scientist?
● What is big data?
● How does matrix factorization or gradient-boosted decision trees or MapReduce or this framework I hope you’ll use work?
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and 114 million RSVPs by >14 million members
● 2.7 million RSVPs in the last 30 days
○ ~1/second
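The ~1/second figure follows from the 30-day count. A quick arithmetic check, assuming RSVPs were spread evenly over the period (the slide does not claim this):

```python
# Back-of-the-envelope check of the "~1/second" figure.
rsvps_last_30_days = 2_700_000
seconds_in_30_days = 30 * 24 * 60 * 60  # 2,592,000 seconds

rsvps_per_second = rsvps_last_30_days / seconds_in_30_days
print(f"{rsvps_per_second:.2f} RSVPs per second")
```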
Data at Meetup
● User data
● Site monitoring/performance
● A/B testing
● Recommendations*
“Everything is a recommendation”
● Not my phrase
● Not actually true yet
● Working on it
Recommendation
Topic Recommendations
● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive
Algorithm:
○ Group local meetups by topic
○ Select topic with most groups
○ Remove those groups
○ Repeat
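The four steps above can be sketched as a greedy loop. This is a minimal illustration, not Meetup's code: the input shape (a dict from group name to topic set), the `max_topics` limit, and the alphabetical tie-break are all assumptions.

```python
from collections import defaultdict

def recommend_topics(group_topics, max_topics=5):
    """Greedy topic picker: repeatedly take the topic covering the most
    remaining local groups, then drop those groups and repeat.

    group_topics maps a group name to the set of its topics
    (a hypothetical input shape, not Meetup's actual schema).
    """
    remaining = {g: set(t) for g, t in group_topics.items()}
    picked = []
    while remaining and len(picked) < max_topics:
        # Group the remaining local meetups by topic.
        by_topic = defaultdict(set)
        for group, topics in remaining.items():
            for topic in topics:
                by_topic[topic].add(group)
        if not by_topic:
            break
        # Select the topic with the most groups (ties broken alphabetically).
        best = min(by_topic, key=lambda t: (-len(by_topic[t]), t))
        picked.append(best)
        # Remove those groups before the next round.
        for group in by_topic[best]:
            del remaining[group]
    return picked

local_groups = {
    "NYC Python": {"tech", "python"},
    "Data Science NYC": {"tech", "data"},
    "Brooklyn Hikers": {"outdoors", "hiking"},
    "Trail Runners": {"outdoors", "running"},
    "Hack Night": {"tech"},
    "Book Club": {"books"},
}
print(recommend_topics(local_groups, max_topics=3))
# ['tech', 'outdoors', 'books']
```

Removing the covered groups each round is what keeps the list from being dominated by near-duplicates of the most popular topic.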
Group/Event Recommendations
● Replaced a topic-only system
● Inputs:
○ Member, location, topics, Facebook friends? Demographics?
● Outputs:
○ Ranking
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
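One simple form of "users who like this also like this" is item-to-item co-occurrence counting. The sketch below uses made-up membership data and a hypothetical `also_joined` helper; it illustrates the idea and is not the approach Meetup settled on.

```python
from collections import Counter

def also_joined(memberships, group, top_n=3):
    """Count which other groups are joined by members of `group`.

    memberships maps a member id to the set of groups they joined
    (hypothetical toy data).
    """
    counts = Counter()
    for groups in memberships.values():
        if group in groups:
            counts.update(g for g in groups if g != group)
    # Sort by count descending, then by name, for a deterministic result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:top_n]]

memberships = {
    "alice": {"NYC Python", "Data Science NYC"},
    "bob": {"NYC Python", "Data Science NYC", "Hack Night"},
    "carol": {"NYC Python", "Hack Night"},
    "dave": {"Brooklyn Hikers"},
}
print(also_joined(memberships, "NYC Python"))
# ['Data Science NYC', 'Hack Night']
```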
Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Groups/person
○ Membership: 0.001%
○ Compare to Netflix: 1%
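To make the sparsity figures concrete: density is the share of possible member-group pairs that actually exist, so 0.001% means roughly one pair in 100,000. The counts in this sketch are toy numbers picked to reproduce that ratio; they are not Meetup's or Netflix's real counts.

```python
def density(num_interactions, num_users, num_items):
    """Fraction of the user-by-item matrix that is filled in."""
    return num_interactions / (num_users * num_items)

# Toy numbers chosen only to illustrate the calculation.
d = density(num_interactions=50, num_users=1_000, num_items=5_000)
print(f"{d:.3%}")
```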
Supervised Learning/Classification
● “Inferring a function from labeled training data”
● Joined Meetup/Didn’t join Meetup
● “Features”
Topic Match
State Match
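As a rough sketch of what labeled training data could look like here: each (member, group) pair becomes a row of numeric features plus a joined/didn't-join label. The record fields and the `extract_features` helper are hypothetical; only the feature ideas (topic match, state match) come from the slides.

```python
def extract_features(member, group):
    """Turn one (member, group) pair into numeric features.

    The field names are hypothetical, not Meetup's schema.
    """
    shared_topics = member["topics"] & group["topics"]
    return {
        "TopicMatch": 1.0 if shared_topics else 0.0,
        "StateMatch": 1.0 if member["state"] == group["state"] else 0.0,
    }

member = {"topics": {"python", "hiking"}, "state": "NY"}
groups = [
    {"name": "NYC Python", "topics": {"python", "tech"}, "state": "NY", "joined": True},
    {"name": "SF Knitting", "topics": {"knitting"}, "state": "CA", "joined": False},
]

# Each training row is (features, label); label 1 = joined, 0 = didn't join.
training_rows = [(extract_features(member, g), int(g["joined"])) for g in groups]
for features, label in training_rows:
    print(features, label)
```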
Logistic Regression
● Score
○ “Probability”
○ Ranking
● Fast + Easy
● Weights!
Group recommendation weights
● TopicMatch 1.21
● TopicMatchExtended 0.17
● FacebookFriends 0.15
● SecondDegreeFacebook 0.79
● AgeUnmatch -2.20
● GenderUnmatch -2.6
● StateMatchFeature 0.44
● CityMatch 0.02
● DistanceBucket <2 1.39
● DistanceBucket 2-5 0.83
● DistanceBucket 5-10 0.60
● DistanceBucket >10 n/a
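The weights above can be turned into a score by summing the weights of the active features and passing the total through a sigmoid. In this sketch the intercept, the treatment of every feature as a 0/1 indicator, and the dictionary key spellings are assumptions; the slide gives only the weight values.

```python
import math

# Weights copied from the slide. The intercept is an assumption (the slide
# does not give one); it shifts the scores but does not change the ranking.
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket<2": 1.39,
    "DistanceBucket2-5": 0.83,
    "DistanceBucket5-10": 0.60,
}
INTERCEPT = 0.0  # hypothetical

def score(features):
    """Logistic regression score: sigmoid of the weighted feature sum."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Three made-up candidate groups for one member, as 0/1 indicator features.
candidates = {
    "nearby, topic match": {"TopicMatch": 1, "StateMatchFeature": 1, "DistanceBucket<2": 1},
    "far, topic match": {"TopicMatch": 1, "StateMatchFeature": 1, "DistanceBucket5-10": 1},
    "nearby, age mismatch": {"TopicMatch": 1, "AgeUnmatch": 1, "DistanceBucket<2": 1},
}
ranking = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
print(ranking)
# ['nearby, topic match', 'far, topic match', 'nearby, age mismatch']
```

Because the sigmoid is monotonic, sorting by score is the same as sorting by the raw weighted sum, which is part of why the weights are easy to read directly.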
Making up features
● “Zipscore”
● All topics not created equal
● Facebook likes
Real data is gross
● Preprocessing is critical!
○ missing data
○ outliers
○ log scale
○ bucketing
○ selection/sampling (not introducing bias)
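A minimal sketch of the first four preprocessing steps applied to single values. The default, the cap, the helper names, the use of miles, and the handling of exact bucket boundaries are all illustrative assumptions; the bucket edges mirror the DistanceBucket features on the weights slide.

```python
import math

def preprocess_member_count(raw, default=0, cap=10_000):
    """Fill a missing value, clip outliers, then apply a log scale.

    The default and cap are illustrative, not values from the talk.
    """
    value = default if raw is None else raw
    value = max(0, min(value, cap))
    return math.log1p(value)

def distance_bucket(miles):
    """Bucket a distance into the ranges used by the DistanceBucket features.

    The unit (miles) and boundary handling are assumptions.
    """
    if miles < 2:
        return "<2"
    if miles < 5:
        return "2-5"
    if miles <= 10:
        return "5-10"
    return ">10"

print(preprocess_member_count(None), preprocess_member_count(5_000_000))
print([distance_bucket(d) for d in (0.5, 3, 7.5, 42)])
```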
Cleaning data
● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
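A sketch of how two of these cleanup rules might look in code. It reads "Schenectady" and "Beverly Hills" as the placeholder ZIP codes 12345 and 90210, which is an interpretation the slide does not spell out; the guest cap and the record schema are hypothetical as well.

```python
# ZIP codes people often type when they do not want to give a real one.
# Mapping the slide's bullets to these codes is an assumption.
SUSPICIOUS_ZIPS = {"12345", "90210"}
MAX_GUESTS = 5  # hypothetical cap on guests counted per RSVP

def clean_rsvp(rsvp):
    """Return a cleaned copy of an RSVP record (hypothetical schema)."""
    cleaned = dict(rsvp)
    # Cap inflated guest counts such as "+100 guests".
    cleaned["guests"] = min(max(rsvp.get("guests", 0), 0), MAX_GUESTS)
    # Flag locations that look like placeholder ZIP codes.
    cleaned["zip_trusted"] = rsvp.get("zip") not in SUSPICIOUS_ZIPS
    return cleaned

print(clean_rsvp({"member": 1, "guests": 100, "zip": "12345"}))
```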
TO THE FUTURE!
● Hadoop
● Clicks
● Impressions
● People to people recommendations?
● Recommending people to groups?
Thanks!
Smart people, come work with me.
http://www.meetup.com/jobs/
Special thanks:
● Chris Halpert
● Victor J Wang
