Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup
Collaborative filtering and other common recommendation algorithms are powerful techniques in some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning, which avoids some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach, with few examples to follow that were neither academic nor toy. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.
3. Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and >125 million rsvps by 18 million members
● 3 million rsvps in the last 30 days
○ 1/second
5. Tools at Meetup
● Hive - SQL on Hadoop
● Spark - Distributed Scala on Hadoop cluster
● Scala - Recommendations service
● R - Data analysis, Model building
● Python - Scripting, Data organizing
● Java - Backend of our web stack
9. Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Low rsvp/person
○ Membership: 0.001%
○ Compare to Netflix Prize Dataset: 1%
12. Problem definition and assumptions
● Assumption: if you’re not in a given group, you don’t want to be
○ Negative samples: groups you’re not in
○ Also a good classifier...
● Membership << expected error rate
○ Solution: sample to 50/50 join/no-join
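The 50/50 sampling step above can be sketched as follows. This is a minimal illustration, not Meetup's pipeline: the function name balanced_sample, the (member_id, group_id) pair format, and the toy data are all assumptions made for the example.

```python
import random

def balanced_sample(joins, non_joins, seed=0):
    """Downsample both classes to the same size so labels are 50/50."""
    rng = random.Random(seed)
    n = min(len(joins), len(non_joins))
    # Label 1 = member joined the group, label 0 = member is not in the group.
    sample = [(pair, 1) for pair in rng.sample(joins, n)]
    sample += [(pair, 0) for pair in rng.sample(non_joins, n)]
    rng.shuffle(sample)
    return sample

# Hypothetical toy data: (member_id, group_id) pairs.
joins = [(1, 10), (2, 11)]
non_joins = [(m, g) for m in range(1, 6) for g in range(10, 20)
             if (m, g) not in joins]

training = balanced_sample(joins, non_joins)
```

With membership as sparse as it is here, the non-join class is the one that gets cut down; every join can be kept.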
13. Ranking
● After 50/50 sampling, the model’s output label is no longer literally true
○ Luckily, we’d rather rank all of the results anyway
● Use a classifier that gives you a useful output
○ Fancy black box
○ Logistic Regression
■ Easier to explain
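Ranking by classifier output can be sketched like this. It is a rough illustration under stated assumptions: the weights, bias, feature values, and group names are made up, and the two features stand in for whatever the real model uses.

```python
import math

def score(weights, bias, features):
    """Logistic regression output: a value in (0, 1) used only for ordering."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def rank_groups(weights, bias, candidates):
    """Return group names sorted from highest to lowest score."""
    return sorted(candidates,
                  key=lambda g: score(weights, bias, candidates[g]),
                  reverse=True)

# Hypothetical weights and per-group feature vectors for one member.
weights, bias = [2.0, 0.5], -1.0
candidates = {
    "hiking": [0.9, 0.2],
    "chess": [0.1, 0.8],
    "go": [0.5, 0.5],
}

ranking = rank_groups(weights, bias, candidates)
```

Because only the order matters, it is fine that the scores from a model trained on a resampled set are not true join probabilities.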
15. Ensemble Learning
“... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”
16. Ensemble Learning
● Topic match (original algorithm)
● Collaborative Filtering on Topics
● Social algorithm
● Other simple features (Popularity, Gender…)
● Add output of algorithms as features into Logistic Regression model
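Feeding the outputs of the individual algorithms into one logistic regression might look like the sketch below. Everything here is an assumption for illustration: the feature order, the toy scores, and the plain stochastic gradient descent trainer, which stands in for whatever fitting procedure was actually used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y  # gradient of log loss wrt z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Hypothetical rows: [topic_match, topic_cf, social, popularity],
# each the output of one sub-algorithm for a (member, group) pair.
rows = [
    [0.9, 0.8, 0.7, 0.5],
    [0.8, 0.6, 0.9, 0.4],
    [0.7, 0.9, 0.6, 0.6],
    [0.1, 0.2, 0.0, 0.5],
    [0.2, 0.1, 0.1, 0.6],
    [0.0, 0.3, 0.2, 0.4],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = joined, 0 = did not join

weights, bias = train_logistic(rows, labels)
```

The learned weights show how much each sub-algorithm contributes, which is part of what makes this kind of ensemble easy to explain.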
18. Facebook Likes
● Lots of information, but how to use?
● Map to topics, let training the model take care of the rest!
19. Mapping FB Likes to Meetup Topics
● Text based?
○ Go(game) vs Go(lang)?
○ Burton?
● Data approach!
○ Grab most popular topics across all members with the same like
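The data approach above amounts to counting topics among members who share a like. A minimal sketch, assuming in-memory dictionaries; the names top_topics_for_like, member_likes, and member_topics and the toy data are hypothetical.

```python
from collections import Counter

def top_topics_for_like(like, member_likes, member_topics, k=3):
    """Most common topics among members who have the given like."""
    counts = Counter()
    for member, likes in member_likes.items():
        if like in likes:
            counts.update(member_topics.get(member, ()))
    return [topic for topic, _ in counts.most_common(k)]

# Hypothetical toy data keyed by member id.
member_likes = {1: {"Burton"}, 2: {"Burton", "Go"}, 3: {"Go"}}
member_topics = {
    1: ["Meeting New People", "Snowboarding"],
    2: ["Meeting New People", "Snowboarding", "Coffee"],
    3: ["Board Games", "Meeting New People"],
}

burton_topics = top_topics_for_like("Burton", member_likes, member_topics, k=2)
```

No text matching is involved, so an ambiguous like such as "Go" is resolved by what its fans actually follow.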
20. Normalization
● Top topics for Burton-Likers
○ Meeting New People, Coffee, bla bla
○ Most popular still dominates
● Solution: Normalize based on expected topic occurrence in sample
21. Normalization
● For members with a given Like, compare percent with each topic to expected among total population
● Total population
○ 20% “Meeting New People”
○ 2% “Snowboarding”
● Burton:
○ 20% “Meeting New People”
○ 9% “Snowboarding”
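One way to make that comparison is a simple ratio of the two percentages. The slides do not say which formula was used, so the ratio and the name topic_lift are assumptions; the numbers are the ones from this slide.

```python
def topic_lift(like_share, population_share):
    """Ratio of a topic's share among likers to its share overall."""
    return {
        topic: share / population_share[topic]
        for topic, share in like_share.items()
        if population_share.get(topic)  # skip topics with no baseline
    }

# Shares taken from the slide.
population = {"Meeting New People": 0.20, "Snowboarding": 0.02}
burton = {"Meeting New People": 0.20, "Snowboarding": 0.09}

lift = topic_lift(burton, population)
ranked = sorted(lift, key=lift.get, reverse=True)
```

Under this ratio, “Meeting New People” comes out at 1 (no more common among Burton likers than anywhere else), while “Snowboarding” comes out at roughly 4.5, so it moves to the top.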
22. Results
● Generate top topics for all likes
○ Path from member to like to topic to group
● Add Facebook Like based topic match feature to model
● Positive weight
○ Very good sign!
● Deploy/Split test
○ TBD
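The member to like to topic to group path can be turned into a single number per (member, group) pair. This sketch is only one possible definition: the overlap fraction, the name like_topic_match, and the toy mapping are assumptions, not the feature as it was built.

```python
def like_topic_match(member_likes, like_topics, group_topics):
    """Fraction of a group's topics reachable from the member's likes."""
    inferred = set()
    for like in member_likes:
        inferred.update(like_topics.get(like, ()))
    if not group_topics:
        return 0.0
    return len(inferred & set(group_topics)) / len(group_topics)

# Hypothetical mapping from a like to its top normalized topics.
like_topics = {"Burton": ["Snowboarding", "Skiing"]}

snow_group = like_topic_match(["Burton"], like_topics, ["Snowboarding", "Hiking"])
chess_group = like_topic_match(["Burton"], like_topics, ["Chess"])
```

The value would be one more column in the model's feature vector, alongside the other algorithm outputs.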
23. Summary
● Supervised Learning for Ranking as Recommendations is cool
● Simple, interpretable models are cool
● Feature engineering is cool