3. Our Mission
“To share and grow the world’s knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of topics
• ...
13. Implicit vs. Explicit
● Many have acknowledged that implicit feedback is more useful
● Is implicit feedback really always more useful?
● If so, why?
14. ● Implicit data is (usually):
  ○ More dense, and available for all users
  ○ More representative of user behavior than of user reflection
  ○ More related to the final objective function
  ○ Better correlated with A/B test results
    ■ E.g. rating vs. watching
Implicit vs. Explicit
15. ● However
  ○ It is not always the case that direct implicit feedback correlates well with long-term retention
    ■ E.g. clickbait
● Solution:
  ○ Combine different forms of implicit + explicit feedback to better represent the long-term goal
Implicit vs. Explicit
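The clickbait fix above can be sketched as a single training label that blends implicit signals (click, dwell time) with explicit ones (upvote, rating). All signal names and weights below are invented for illustration; they are not Quora's actual scheme.

```python
# Sketch: blend implicit + explicit feedback into one training label so
# that a click alone (clickbait-like behavior) is not rewarded.
# Weights and thresholds are illustrative assumptions.

def combined_label(clicked, read_seconds, upvoted, rating=None):
    """Return a value in [0, 1] approximating long-term worth."""
    value = 0.0
    if clicked:
        value += 0.2              # weak implicit signal
    if read_seconds > 30:
        value += 0.3              # dwell time filters out clickbait
    if upvoted:
        value += 0.3              # explicit endorsement
    if rating is not None:
        value += 0.2 * (rating / 5.0)  # explicit rating, scaled to [0, 0.2]
    return value
```

A clicked-but-abandoned item scores much lower than one that was actually read, which is exactly the gap a click-only label cannot capture.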
17. Defining training/testing data
● Training a simple binary classifier for good/bad answers
● Defining positive and negative labels → a non-trivial task
● Is this a positive or a negative?
  ○ a funny, uninformative answer with many upvotes
  ○ a short, uninformative answer by a well-known expert in the field
  ○ a very long, informative answer that nobody reads/upvotes
  ○ an informative answer with grammar/spelling mistakes
  ○ ...
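One way to resolve the ambiguous cases above is to require agreement between several signals rather than trusting any single one. The thresholds and signal names here are made-up assumptions, purely to illustrate the idea:

```python
# Sketch: heuristic positive/negative labeling for a good/bad answer
# classifier. A label is positive only when multiple independent
# signals agree; all thresholds are illustrative.

def label_answer(upvotes, is_expert_author, length_chars, is_informative):
    """Return 1 (positive) or 0 (negative) for a training example."""
    score = 0
    score += int(upvotes >= 10)         # popularity signal
    score += int(is_expert_author)      # author credibility
    score += int(length_chars >= 200)   # enough substance to be useful
    score += int(is_informative)        # content/editorial judgment
    return 1 if score >= 3 else 0       # require agreement of signals
```

Under this rule, a funny but uninformative answer with many upvotes is still labeled negative, because popularity alone is only one vote out of four.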
19. Training a model
● A model will learn according to:
  ○ Training data (e.g. implicit and explicit)
  ○ Target function (e.g. probability of a user reading an answer)
  ○ Metric (e.g. precision vs. recall)
● Example 1 (made up):
  ○ Optimize the probability of a user going to the cinema to watch a movie and rate it “highly”, using purchase history and previous ratings. Use NDCG of the ranking as the final metric, counting only movies rated 4 or higher as positives.
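The metric in Example 1 can be sketched as binary-relevance NDCG, where a movie counts as relevant only if its rating is 4 or higher. This is a minimal from-scratch version with log2 discounting, not any particular library's implementation:

```python
import math

# Sketch: NDCG over a ranked list of movies, treating ratings >= 4 as
# relevance 1 and everything else as 0 (binary relevance).

def ndcg(ranked_ratings):
    """ranked_ratings: ratings of items in the order the model ranked them."""
    rels = [1 if r >= 4 else 0 for r in ranked_ratings]
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(rels, reverse=True)  # best possible ordering
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Ranking the 5-star movie first gives NDCG 1.0; burying it behind low-rated movies pushes the score below 1.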
20. Example 2 - Quora’s feed
● Training data = implicit + explicit
● Target function: value of showing a story to a user ~ weighted sum of actions:
  v = Σ_a v_a · 1{y_a = 1}
● Predict probabilities for each action, then compute the expected value:
  v_pred = E[V | x] = Σ_a v_a · p(a | x)
● Metric: any ranking metric
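The expected-value scoring rule above can be sketched in a few lines: predict a probability per action, weight each by its value, and rank stories by the sum. The action names, weights, and probabilities are invented for illustration; they are not Quora's actual values.

```python
# Sketch of v_pred = sum_a v_a * p(a | x): score each story by the
# expected value of the actions a user might take on it.
# Weights are illustrative assumptions (note hide has negative value).

ACTION_WEIGHTS = {"upvote": 5.0, "share": 10.0, "click": 1.0, "hide": -20.0}

def expected_value(action_probs):
    """action_probs: dict mapping action name -> predicted p(a | x)."""
    return sum(ACTION_WEIGHTS[a] * p for a, p in action_probs.items())

# Two candidate stories with (made-up) per-action predicted probabilities:
stories = {
    "story_1": {"upvote": 0.10, "share": 0.01, "click": 0.50, "hide": 0.01},
    "story_2": {"upvote": 0.02, "share": 0.00, "click": 0.70, "hide": 0.10},
}
ranked = sorted(stories, key=lambda s: expected_value(stories[s]), reverse=True)
```

Even though story_2 has the higher click probability, its hide probability drags its expected value down, so story_1 ranks first; this is the same anti-clickbait effect as weighting actions by long-term value.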
26. Ensembles
● The Netflix Prize was won by an ensemble
  ○ Initially BellKor was using GBDTs
  ○ BigChaos introduced an ANN-based ensemble
● Most practical applications of ML run an ensemble
  ○ Why wouldn’t you?
  ○ At least as good as the best of your methods
  ○ Can add completely different approaches (e.g. CF and content-based)
  ○ You can use many different models at the ensemble layer: LR, GBDTs, RFs, ANNs...
27. Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. don’t know whether the way to go is Factorization Machines, Tensor Factorization, or RNNs?
  ○ Treat each model as a “feature”
  ○ Feed them into an ensemble
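The "model as a feature" idea is classic stacking: each base model's score becomes one input to a simple top-level model. Below is a minimal sketch with stand-in base models and a fixed-weight logistic regression on top; in practice the base models would be FMs, RNNs, etc., and the ensemble weights would be learned.

```python
import math

# Sketch of stacking: base-model scores are treated as features for a
# logistic-regression ensemble. Base "models" are stand-in functions.

def cf_score(x):        # stand-in for a collaborative-filtering model
    return x["cf"]

def content_score(x):   # stand-in for a content-based model
    return x["content"]

def ensemble_predict(x, w_cf=2.0, w_content=2.0, bias=-1.0):
    """Logistic regression over base-model outputs (weights illustrative)."""
    z = bias + w_cf * cf_score(x) + w_content * content_score(x)
    return 1.0 / (1.0 + math.exp(-z))
```

Because the ensemble only sees scores, completely different approaches (CF and content-based here) plug in uniformly, which is exactly why it is "at least as good as the best of your methods" once the weights are fit.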
30. Need for feature engineering
In many cases an understanding of the domain will lead to
optimal results.
Feature Engineering
31. Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...
32. Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated into features?
• Features that relate to the answer quality itself
• Interaction features (upvotes/downvotes, clicks, comments…)
• User features (e.g. expertise in topic)
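Concretely, the three feature families above might be assembled into one feature vector per answer like this. Every field name is invented for illustration; these are not Quora's actual features.

```python
# Sketch: one feature vector covering the three families on the slide.
# All field names (text, upvotes, expertise, ...) are assumptions.

def answer_features(answer, author):
    return {
        # answer-quality features
        "length": len(answer["text"]),
        "has_formatting": int("\n" in answer["text"]),
        # interaction features
        "net_votes": answer["upvotes"] - answer["downvotes"],
        "comments": answer["comments"],
        # user (author) features
        "author_topic_expertise": author["expertise"].get(answer["topic"], 0.0),
    }
```

Keeping the families separate in the code mirrors the slide's breakdown and makes it easy to ablate one family at a time when debugging the model.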
35. Model debuggability
● Value of a model = the value it brings to the product
● Product owners/stakeholders have expectations on the product
● It is important to be able to answer why something failed
● Bridge the gap between product design and ML algorithms
● Model debuggability is so important it can determine:
  ○ The particular model to use
  ○ The features to rely on
  ○ The implementation of tools
39. Executing A/B tests
● Measure differences in metrics across statistically identical populations that each experience a different algorithm.
● Product decisions should always be data-driven
● Overall Evaluation Criteria (OEC) = member retention
● Use long-term metrics whenever possible
  ○ Short-term metrics can be informative and allow faster decisions
  ○ But they are not always aligned with the OEC
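Deciding such a test typically comes down to a significance check on the metric difference between the two populations. A minimal sketch using a two-proportion z-test on a retention-style rate (the counts below are made up):

```python
import math

# Sketch: two-proportion z-test for an A/B test on a binary metric
# (e.g. fraction of members retained) across control (a) and treatment (b).

def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)   # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Illustrative counts: 40% vs 43% retention on 10k users per arm.
z = two_proportion_z(4000, 10000, 4300, 10000)
```

|z| > 1.96 corresponds to significance at the 5% level (two-sided); with these made-up counts the lift would clear that bar easily.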
40. Offline testing
● Measure model performance using (IR) metrics
● Offline performance = an indication used to decide on follow-up A/B tests
● A critical (and mostly unsolved) issue is how well offline metrics correlate with A/B test results.
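One pragmatic way to probe that correlation is to look back at past experiments: rank them by offline metric and by online lift, and compare the two orderings with a Spearman rank correlation. The experiment numbers below are invented for illustration, and this simple formula assumes no ties.

```python
# Sketch: Spearman rank correlation between an offline metric and
# observed A/B lift across past experiments (tie-free formula).

def spearman(xs, ys):
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

offline_ndcg = [0.61, 0.64, 0.58, 0.70]  # offline metric per experiment (made up)
ab_lift = [0.2, 0.5, -0.1, 0.9]          # online lift per experiment (made up)
rho = spearman(offline_ndcg, ab_lift)
```

A rho near 1 would suggest the offline metric orders experiments the same way the A/B tests do, which is the property that makes offline testing useful for pruning candidates.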
42. Distributing Recommender Systems
● Most of what people do in practice can fit into a multi-core machine
  ○ As long as you use:
    ■ Smart data sampling
    ■ Offline schemes
    ■ Efficient parallel code
  ○ (… but not Deep ANNs)
● Do you care about costs? How about latencies or system complexity/debuggability?
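The "smart data sampling" point can be made concrete: keep all (rare) positives and downsample the much larger negative class so training fits on one machine. The 10% rate and the `label` field are illustrative assumptions.

```python
import random

# Sketch: keep every positive example, keep each negative with
# probability neg_rate, so the training set shrinks ~10x while
# preserving all the rare positive signal. Seeded for reproducibility.

def downsample(examples, neg_rate=0.1, seed=42):
    rng = random.Random(seed)
    return [ex for ex in examples
            if ex["label"] == 1 or rng.random() < neg_rate]
```

If the downstream model needs calibrated probabilities, the predicted scores would have to be corrected for this sampling rate; for pure ranking, the relative order is usually what matters.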
45. ● Recommender Systems are about much more than just predicting a rating
● Designing a “real-life” recsys means paying attention to issues such as:
  ○ Feature engineering
  ○ The training dataset
  ○ Metrics
  ○ Experimentation and A/B testing
  ○ System scalability
  ○ ...
● Lots of room for improvement & research