RecSys 2012 Industry Track - Sumanth Kolar, StumbleUpon
It's human nature to be curious, to learn new things, to want to find out more. Discovery is an innate human need, and with the rise of the Web, the urge to learn more has increased by leaps and bounds. According to David Hornik, investor at August Capital, "The massive scale of the Web not only creates huge challenges for search, it also cripples discovery. Gone are the good old days in which fortuity would lead to the unearthing of interesting new websites." Indeed, we live in the age of "infovores" and there is definitely a need for a service that provides serendipity.
Providing serendipitous discovery that can inform, entertain and enlighten our users is of utmost importance to StumbleUpon. This talk will focus on how StumbleUpon uses machine learning techniques such as collaborative filtering, active learning, decision trees, Bayesian models and more to solve complex problems involving classification, user behavior analysis, modeling, anti-spam and recommendations. An average StumbleUpon user spends over 7 hours per month using the product, equating to hundreds of varied recommendations and ample feedback. The talk will also provide insights into some of StumbleUpon's rich data and how we can use scale to accomplish what would otherwise not be possible. We will look at innovative ways that StumbleUpon figures out the right metrics to evaluate recommender systems - a very complex problem. We will also discuss our research on StumbleUpon's mobile activity, which is growing 800% year over year and is the fastest growing part of our business, and how mobile recommendations are unique and important.
Bio: As Engineering Director at StumbleUpon, Sumanth Kolar leads the applied research team, overseeing recommendations, anti-spam, content analysis, user modeling, data sciences and infrastructure. Sumanth tackles very interesting and challenging research problems as StumbleUpon delivers more than 1 billion personalized recommendations a month to its more than 25 million users. Prior to joining the company in 2009, Sumanth engineered bidding and computer vision systems at Yahoo! and Adobe Research. Sumanth holds a master's degree in computer science from the University of California at Santa Cruz.
2. StumbleUpon’s Mission
Help users find content they did not expect to find.
Be the best way to discover new and interesting things from across the Web.
3. How StumbleUpon works
1. Register  2. Tell us your interests  3. Start Stumbling and rating web pages
We use your interests and behavior to recommend new content for you!
4. Discovery is very different from search
Discovery at StumbleUpon | Search
Serendipitous | Intent driven
One at a time | List of articles
Never repeats | Always repeats
Constantly adapting | Fixed results
Tailored for you | Impersonal
There is an ongoing shift from search to discovery.
7. What are the key challenges to
good recommendations?
8. Pillars of good recommendations
Understand who the user is and what he is interested in.
Separate good content from the bad.
Explore various techniques for matching users to content.
Learn from your recommendations.
9. Pillars of good recommendations
Understand who the user is and what he is interested in.
Separate good content from the bad.
Explore various techniques for matching users to content.
Learn from your recommendations.
12. Continually Enhance a User’s Interest Graph
Analyze a user’s StumbleUpon history to expand on interest preferences:
• Add/remove topics
• Follow/block particular domains
13. Continually Enhance a User’s Interest Graph
Leverage social network data:
• Find friends & people to follow
• Find content trending in your social circles
• Find additional interests
14. Continually Enhance a User’s Interest Graph
Mine internal StumbleUpon rating and sharing data to suggest other stumblers and topics.
15. Enhanced Interest Graph
[Diagram: the user at the center of a denser interest graph, connected to topics (Italian Food/Recipes, Cooking, Cars, Vintage Cars), trending news, friends, and domains such as nasa.gov and 1x.com]
16. Pillars of good recommendations
Understand who the user is and what he is
interested in.
Separate good content from the bad.
Explore various techniques for matching users
to content.
Learn from your recommendations.
17. Sampling
On average, hundreds of URLs are ingested into the StumbleUpon pipeline every minute.
• Key sampling goals:
1. Determine which URLs to sample and which to skip completely
2. Examine sampling results to identify good URLs
• URL features used when sampling:
• Known domain performance (ratings, time spent)
• Content-related features (#images, #ads, URL length, etc.)
• User features of the discoverer (spammer vs. trusted user)
18. Recommendations at StumbleUpon: Sampling
[Diagram: a Random Forest votes Yes/No on whether to recommend a webpage; the classifier is built from user feedback (time spent, ratings), e.g. Good/35 sec, Good/22 sec, Bad/15 sec, Good/45 sec, Good/14 sec, Good/28 sec]
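One way to picture the sampling vote on slide 18 is a tiny hand-rolled forest of decision stumps over the URL features named on slide 17. The feature names, thresholds, and majority rule below are illustrative stand-ins, not StumbleUpon's actual model.

```python
# Illustrative sketch: a few decision stumps over URL features each cast a
# yes/no vote, and the page is sampled on a majority. Thresholds are invented.

def stump_votes(url_features):
    """Each 'tree' here is a single threshold test casting one vote."""
    return [
        url_features["domain_avg_rating"] > 0.6,   # known domain performance
        url_features["num_ads"] < 5,               # content-related feature
        url_features["url_length"] < 100,          # content-related feature
        url_features["discoverer_trust"] > 0.5,    # spammer vs. trusted user
    ]

def should_sample(url_features):
    """Recommend sampling the URL when a majority of stumps vote yes."""
    votes = stump_votes(url_features)
    return sum(votes) > len(votes) / 2

page = {"domain_avg_rating": 0.8, "num_ads": 2,
        "url_length": 40, "discoverer_trust": 0.9}
print(should_sample(page))  # → True
```

A real random forest would learn these splits from the feedback table on the slide rather than hard-coding them.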
19. Leveraging In-Network Experts
• Users who thumb up good content and thumb down bad content
• For example:
– Joe DiMaggio – Baseball
– Julia Child – Food/Cooking
– Da Vinci – Art and Architecture
• Ratings from experts are more trustworthy and earn more weight.
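The expert weighting idea can be sketched as a weighted average of thumb ratings. The +1/-1 thumb encoding and the specific weights below are assumptions for illustration, not StumbleUpon's values.

```python
# Sketch of expert-weighted page scoring: thumbs are encoded +1 (up) / -1
# (down), and ratings from experts carry a larger (assumed) weight.

EXPERT_WEIGHT = 3.0
DEFAULT_WEIGHT = 1.0

def page_score(ratings):
    """ratings: list of (thumb, is_expert) pairs; returns a score in [-1, 1]."""
    num = den = 0.0
    for thumb, is_expert in ratings:
        w = EXPERT_WEIGHT if is_expert else DEFAULT_WEIGHT
        num += w * thumb
        den += w
    return num / den if den else 0.0

# One expert thumb-down counterbalances two non-expert thumb-ups:
print(page_score([(1, False), (1, False), (-1, True), (1, True)]))  # → 0.25
```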
20. Recommendations at StumbleUpon: Experts
[Plot: P(Thumb Up | Page Quality) as a function of page quality: nearly flat for a non-expert, rising sharply with quality for an expert]
21. Pillars of good recommendations
Understand who the user is and what he is interested in.
Separate good content from the bad.
Explore various techniques for matching users to content.
Learn from your recommendations.
23. Like-Minded Users
• Find users who like content similar to the content you like
• Signals can be ratings, time spent, interests, etc.
• Use the content they’ve liked
24. PLSI-Based Like-Minded Users
[Diagram: raw interests clustered into latent topics: Action Movies, Classic Movies, Comedy Movies → Movies; Vintage Cars → Cars; Astronomy, Space Exploration, Robotics → Space; Physics, Neuroscience → Science]
25. Like-Minded Users: Scaling Challenges
Total pairwise similarity calculations
= 50K users × 5 million users × 1K features
= 250 trillion
A Probabilistic Latent Semantic Indexing (PLSI) based similarity framework computes these ~250 trillion calculations in less than an hour.
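A sketch of why latent topics help: once each user is summarized as a small mixture P(z | user) over K latent topics (which PLSI's EM fit would produce; the fitting itself is omitted here), like-mindedness becomes a cheap K-dimensional comparison instead of a 1K-raw-feature one. The topic mixtures below are toy values.

```python
import math

# Compare users through assumed K=4 latent-topic mixtures rather than raw
# feature vectors. Topic order (assumed): [space, cars, science, movies].

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

alice = [0.70, 0.10, 0.10, 0.10]   # mostly "space"
bob   = [0.60, 0.20, 0.10, 0.10]   # also mostly "space"
carol = [0.05, 0.05, 0.10, 0.80]   # mostly "movies"

print(cosine(alice, bob) > cosine(alice, carol))  # → True
```

With K ≈ 64 instead of 1K features, the per-pair cost drops by over an order of magnitude, and clustering by dominant topic can prune most pairs entirely.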
27. Different methods perform differently for different users at different times
[Chart: 100% stacked bars for Users 1–5 showing each user's mix of recommendation methods: trending, follow, bias domains, experts, news, like-minded]
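The per-user method mix above can be sketched as weighted sampling over recommendation sources. The source names and weights below are illustrative, not learned values.

```python
import random

# Sketch of per-user method mixing: each recommendation source gets a
# user-specific weight, and the next stumble's source is drawn in
# proportion to those weights.

def pick_method(method_weights, rng=random):
    """Sample one recommendation source according to its weight."""
    methods = list(method_weights)
    weights = [method_weights[m] for m in methods]
    return rng.choices(methods, weights=weights, k=1)[0]

user1 = {"like_minded": 0.5, "experts": 0.2, "trending": 0.2, "news": 0.1}
print(pick_method(user1))  # e.g. "like_minded"
```

In practice the weights themselves would be learned from each user's feedback and could vary by time of day, per the speaker notes on mood.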
29. Pillars of good recommendations
Understand who the user is and what he is interested in.
Separate good content from the bad.
Explore various techniques for matching users to content.
Learn from your recommendations.
30. Two Main Signals from Recommendations
Rating | Time Spent
Both present numerous challenges…
31. Ratings: Volume Decay
[Chart: # ratings vs. time, declining: users rate more during their initial experience]
Why is this happening?
32. Time Spent
[Diagram: a stumble session across pages of images, video, and text, with times T1–T5 sec spent on each]
• Ratings are sparse: < 10% of recommendations have explicit ratings.
• Use time spent to decide whether a stumble was skipped.
• Time spent on videos is longer than on images.
• Solution: estimate p(Like | Timespent), with a model based on user and content patterns.
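A minimal sketch of the p(Like | Timespent) idea, assuming a logistic curve whose midpoint shifts by content type, since expected dwell on video is longer than on images. The midpoints and slope are invented for illustration, not a fitted model.

```python
import math

# Sketch: probability of a like as a logistic function of dwell time, with a
# content-type-specific midpoint (assumed values, not fitted parameters).

MIDPOINT_SEC = {"image": 8.0, "text": 20.0, "video": 60.0}
SLOPE = 0.3

def p_like(seconds, content_type):
    x = SLOPE * (seconds - MIDPOINT_SEC[content_type])
    return 1.0 / (1.0 + math.exp(-x))

# 30 seconds is a strong positive signal on an image, a weak one on a video:
print(p_like(30, "image") > p_like(30, "video"))  # → True
```

A production model would condition on user patterns as well as content type, per the slide.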
33. Challenges: Time Spent on Different Devices
[Chart: median and 5th-percentile time spent per stumble, compared across the Stumble Bar, the installed plugin, and mobile/tablets]
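One hedged way to handle the device differences above is to normalize dwell time against a per-device median before flagging skips, so a raw-seconds cutoff doesn't mislabel mobile stumbles. The medians and threshold are invented values for illustration.

```python
# Sketch of device-aware dwell handling: scale each observation by its
# surface's (assumed) median before thresholding skips.

DEVICE_MEDIAN_SEC = {"stumble_bar": 12.0, "plugin": 18.0, "mobile": 25.0}

def normalized_dwell(seconds, device):
    return seconds / DEVICE_MEDIAN_SEC[device]

def looks_like_skip(seconds, device, threshold=0.25):
    """Flag stumbles whose dwell is well below the device's typical dwell."""
    return normalized_dwell(seconds, device) < threshold

print(looks_like_skip(2, "stumble_bar"))  # → True (2s vs. a 12s median)
```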
34. Pillars of good recommendations
Understand who the user is and what he is interested in.
Separate good content from the bad.
Explore various techniques for matching users to content.
Learn from your recommendations.
38. Many other interesting problems…
• Dupe detection
• Anti-spam
• News
• Topic classification
• Metrics, quality analysis
• Trending
• Search
• User biases, mood
• Many more…
We are HIRING!!!
Editor's Notes
At the end of this talk, you will have a good understanding of the problems with discovery, some solutions, and some data insights.
Our goal is to show content that you did not know you would like: to surprise you and enlighten you, and basically to enable exploration and discovery.
During signup, we ask interesting questions to learn more about you, which helps solve the cold-start problem.
Think of discovery as search without a query term, plus added complexities (nothing repeats, etc.). For example, if you want to learn about astronomy or genetic algorithms, it's hard to do on search or any other service: way more work.
When I started a couple of years back, we were 6M users and 15 employees. We are growing rapidly, especially on mobile. Talk about time spent and how users are super hooked.
Users are good at choosing topics that they like; we have had repeated success at increasing the topics they pick. But the problem is more about having them pick the right topics for them (Arts vs. AI). It's not simple to build a user experience that accounts for that and gets us that data. This is a huge area of research for StumbleUpon: how do we get as much as possible from the user without losing them or setting completely different expectations than what the product is?
Now we have a basic version of the interest graph: some topics you like.
StumbleSense: based on your likes/dislikes we build a sense for other things you may like, and suggest topics, domains, etc. that we think you will like as you stumble along. This makes interest elicitation part of the core product. We learn about the user, and the user understands the product a lot better; there is a dialogue back and forth. Notice that we give the reason why something was recommended: transparency is very important.
Leverage other networks you are part of to get data about what you like and jumpstart the interest graph.
Also show suggested stumblers, interests, etc.
A denser interest graph. Affinity and confidence in each interest vary, and depending on that we can exploit or explore.
When new content is discovered/ingested, how do we determine if it's good or not? You will always have exceptions that need to be handled. For example, domains such as YouTube, basically UGC in which content is diverse: you need to build models that account for that. Also consider user features of the raters/discoverers; just because a spammer rated cnn.com, you can't ignore it. Look at multiple sources of information and decide whether the URL is worth sampling or not.
Now, one way of doing this is to use a random forest with content features.
And we can also sample to experts. That's one huge advantage SU has: the fact that we can decide which site to send and get data for that URL. But sometimes you could be recommending bad content to the expert; you get around this by telling the expert that we think he is an expert and we need more data from him about the URL. Again, transparency for the win: transparency allows us to set the right expectations.
One way of defining experts is users who thumb up high-quality pages and thumb down low-quality pages. There are multiple ways to find high-quality pages: have a seed of experts pick URLs and use them to find other experts; or look at your current quality scores, see which user ratings are most predictive of them, and use those users as experts; or use social endorsement, having users rate others as experts, or using external data sources similar to what Klout is doing (a very hard problem).
How do you match the right content to the right user? User expectations are very different. When you say you like cars and I say I like cars, we are not talking about the same thing. We need a deeper understanding of the interest graph.
One solution is to find other users that are similar to you. But just because you are similar to me in physics does not mean I would like the music you listen to.
One solution: figure out latent topics and then use them to cluster and find similar users.
Now we have an interest graph that is both explicit and implicit
Different users have varying method mixes. We learn the mix and balance it, but this needs to account for mood: for example, we see that you like stumbling news in the morning and videos on the weekend. But there are always exceptions.
Context, i.e., showing why a recommendation was shown to a user, is very important. There should be a back and forth; recommendations should be very transparent. The context can be that your friend on Facebook liked it, or that it is trending in Politics.
The immediate conclusion is that the quality of recommendations is not good, but this covers both thumbs up and thumbs down. Stumbling is cheap, so clicking the stumble button is easier than rating; one could argue that we are doing a really good job and the marginal utility of rating is not high. Solutions: use other data, such as time spent, to figure out what you like; get users to rate more ;); and work very closely with product on what we can do to remind the user that their ratings matter.
Now we know we need to use time spent: for the last stumble, time spent. Great, we have a solution.