3. Our Mission
"To share and grow the world's knowledge"
● Millions of questions & answers
● Millions of users
● Thousands of topics
● ...
10. Implicit vs. Explicit
● Many have acknowledged that implicit feedback is more useful
● Is implicit feedback really always more useful?
● If so, why?
11. Implicit vs. Explicit
● Implicit data is (usually):
  ● More dense, and available for all users
  ● Better representative of user behavior vs. user reflection
  ● More related to the final objective function
  ● Better correlated with AB test results
  ● E.g. rating vs. watching
12. Implicit vs. Explicit
● However
  ● It is not always the case that direct implicit feedback correlates well with long-term retention
  ● E.g. clickbait
● Solution:
  ● Combine different forms of implicit + explicit feedback to better represent the long-term goal
14. Training a model
● The model will learn according to:
  ● Training data (e.g. implicit and explicit feedback)
  ● Target function (e.g. probability of a user reading an answer)
  ● Metric (e.g. precision vs. recall)
● Example 1 (made up):
  ● Optimize the probability of a user going to the cinema to watch a movie and rate it "highly" by using purchase history and previous ratings. Use NDCG of the ranking as the final metric, counting only movies rated 4 or higher as positives.
15. Example 2 - Quora's feed
● Training data = implicit + explicit
● Target function: value of showing a story to a user ~ weighted sum of actions: v = Σ_a v_a · 1{y_a = 1}
● Predict probabilities for each action, then compute the expected value: v_pred = E[V | x] = Σ_a v_a · p(a | x)
● Metric: any ranking metric
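The expected-value target above can be sketched in a few lines. Note the action set and the weights `v_a` below are made-up illustrative values, not Quora's actual weights; the per-action probabilities would come from separate models in practice.

```python
# Sketch of the expected-value ranking target:
#   v_pred = E[V | x] = sum_a v_a * p(a | x)
# ACTION_WEIGHTS (the v_a) are hypothetical values for illustration.
ACTION_WEIGHTS = {"upvote": 5.0, "share": 3.0, "click": 1.0, "downvote": -10.0}

def expected_value(action_probs):
    """Combine per-action predicted probabilities p(a | x) into one score."""
    return sum(ACTION_WEIGHTS[a] * p for a, p in action_probs.items())

def rank_stories(stories):
    """Sort candidate stories by descending expected value."""
    return sorted(stories, key=lambda s: -expected_value(s["probs"]))

stories = [
    {"id": "story_a", "probs": {"upvote": 0.02, "share": 0.01, "click": 0.30, "downvote": 0.01}},
    {"id": "story_b", "probs": {"upvote": 0.10, "share": 0.05, "click": 0.20, "downvote": 0.02}},
]
ranked = rank_stories(stories)
```

With these weights, story_b wins despite fewer predicted clicks, because upvotes and shares carry more value: this is exactly how the weighted sum encodes the long-term goal rather than raw engagement.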
17. Supervised/Unsupervised Learning
● Unsupervised learning as dimensionality reduction
● Unsupervised learning as feature engineering
● The "magic" behind combining unsupervised/supervised learning
  ● E.g. 1: clustering + kNN
  ● E.g. 2: Matrix Factorization
● MF can be interpreted as
  ● Unsupervised:
    ● Dimensionality reduction a la PCA
    ● Clustering (e.g. NMF)
  ● Supervised:
    ● Labeled targets ~ regression
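The dual reading of MF can be seen on a toy user-item matrix; this is a minimal sketch using scikit-learn's NMF (the matrix values are made up):

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy user-item rating matrix (rows = users, cols = items); zeros = unobserved.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Unsupervised view: NMF compresses R into 2 latent dimensions (PCA-like
# dimensionality reduction); the non-negative factors also read as soft
# cluster memberships.
model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
U = model.fit_transform(R)   # user factors, shape (4, 2)
V = model.components_        # item factors, shape (2, 4)

# Supervised view: the reconstruction U @ V is fit against the observed
# ratings, i.e. a regression on labeled targets.
R_hat = U @ V
```

Here the dominant latent component of each user row acts as a cluster label: the first two users land in one "taste" cluster, the last two in the other.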
18. Supervised/Unsupervised Learning
● One of the "tricks" in Deep Learning is how it combines unsupervised/supervised learning
  ● E.g. stacked autoencoders
  ● E.g. training of convolutional nets
20. Ensembles
● The Netflix Prize was won by an ensemble
  ● Initially BellKor was using GBDTs
  ● BigChaos introduced an ANN-based ensemble
● Most practical applications of ML run an ensemble
  ● Why wouldn't you?
  ● At least as good as the best of your methods
  ● Can add completely different approaches (e.g. CF and content-based)
  ● You can use many different models at the ensemble layer: LR, GBDTs, RFs, ANNs...
21. Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. don't know if the way to go is to use Factorization Machines, Tensor Factorization, or RNNs?
  ● Treat each model as a "feature"
  ● Feed them into an ensemble
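The "model as a feature" idea is classic stacking. A minimal sketch, using a synthetic dataset and base models chosen only for illustration (the talk does not prescribe these):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Each base model's predicted probability becomes one input "feature"
# for a logistic-regression ensemble layer.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Out-of-fold predictions on the training set avoid leaking labels
# into the ensemble layer.
train_feats = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=3, method="predict_proba")[:, 1]
    for m in base_models
])
test_feats = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_models
])

ensemble = LogisticRegression().fit(train_feats, y_tr)
accuracy = ensemble.score(test_feats, y_te)
```

Swapping a base model for an FM, tensor factorization, or an RNN only changes one line of the list; the ensemble layer is untouched, which is the whole point.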
24. Outputs will be inputs
● Ensembles turn any model into a feature
  ● That's great!
  ● That can be a mess!
● Make sure the output of your model is ready to accept data dependencies
  ● E.g. can you easily change the distribution of the value without affecting all other models depending on it?
● Avoid feedback loops
● Can you treat your ML infrastructure as you would your software one?
25. ML vs Software
● Can you treat your ML infrastructure as you would your software one?
● Yes and no
  ● You should apply best Software Engineering practices (e.g. encapsulation, abstraction, cohesion, low coupling…)
  ● However, Design Patterns for Machine Learning software are not well known/documented
27. Feature Engineering
● Main properties of a well-behaved ML feature
  ● Reusable
  ● Transformable
  ● Interpretable
  ● Reliable
● Reusability: you should be able to reuse features in different models, applications, and teams
● Transformability: besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), Σ f_t over a time window…)
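Transformability in practice often looks like this pandas sketch: one raw feature reused as several derived columns. The feature name and numbers below are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw feature: daily upvote counts for one answer.
upvotes = pd.Series(
    [3, 10, 0, 7, 25, 4, 1],
    index=pd.date_range("2016-01-01", periods=7, freq="D"),
    name="upvotes",
)

# Derived features: log(f), max(f) and sum(f_t) over a 3-day window,
# all built from the same underlying feature without re-engineering it.
features = pd.DataFrame({
    "upvotes": upvotes,
    "log_upvotes": np.log1p(upvotes),                       # log1p handles zeros
    "max_upvotes_3d": upvotes.rolling(3, min_periods=1).max(),
    "sum_upvotes_3d": upvotes.rolling(3, min_periods=1).sum(),
})
```

If the raw feature is logged with a stable name and clear semantics (interpretability), every team can derive its own transformations like these instead of recomputing from raw events.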
28. Feature Engineering
● Main properties of a well-behaved ML feature
  ● Reusable
  ● Transformable
  ● Interpretable
  ● Reliable
● Interpretability: in order to do any of the previous, you need to be able to understand the meaning of features and interpret their values
● Reliability: it should be easy to monitor and detect bugs/issues in features
29. Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...
30. Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated into features?
• Features that relate to the answer quality itself
• Interaction features (upvotes/downvotes, clicks, comments…)
• User features (e.g. expertise in the topic)
32. Machine Learning Infrastructure
● Whenever you develop any ML infrastructure, you need to target two different modes:
  ● Mode 1: ML experimentation
    ● Flexibility
    ● Ease of use
    ● Reusability
  ● Mode 2: ML production
    ● All of the above + performance & scalability
● Ideally you want the two modes to be as similar as possible
● How to combine them?
33. Machine Learning Infrastructure: Experimentation & Production
● Option 1:
  ● Favor experimentation and only invest in productionizing once something shows results
  ● E.g. have ML researchers use R and then ask Engineers to implement things in production when they work
● Option 2:
  ● Favor production and have "researchers" struggle to figure out how to run experiments
  ● E.g. implement highly optimized C++ code and have ML researchers experiment only through data available in logs/DB
35. Machine Learning Infrastructure: Experimentation & Production
● Good intermediate options:
  ● Have ML "researchers" experiment in IPython Notebooks using Python tools (scikit-learn, Theano…). Use the same tools in production whenever possible; implement optimized versions only when needed.
  ● Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools
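The abstraction-layer idea can be sketched as a thin, stable interface that hides which backend does the work. Everything below (class names, the trivial mean-predictor backend) is hypothetical; in a real system the "optimized" branch would dispatch to native code:

```python
class MeanPredictor:
    """Readable reference implementation, used during experimentation."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class Model:
    """Stable interface: researchers and production code depend only on this,
    so the backend can be swapped without touching callers."""
    def __init__(self, backend="reference"):
        # A real "optimized" backend (e.g. C++ bindings) would be chosen here;
        # the reference implementation stands in for both in this sketch.
        self._impl = MeanPredictor()

    def fit(self, X, y):
        self._impl.fit(X, y)
        return self

    def predict(self, X):
        return self._impl.predict(X)


model = Model().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
preds = model.predict([[10], [20]])
```

Keeping the interface scikit-learn-shaped (`fit`/`predict`) is one way to make the experimentation and production modes "as similar as possible", as the previous slide asks.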
37. Model debuggability
● Value of a model = value it brings to the product
  ● Product owners/stakeholders have expectations on the product
● It is important to be able to answer why something failed
● Bridge the gap between product design and ML algorithms
● Model debuggability is so important it can determine:
  ● The particular model to use
  ● Features to rely on
  ● Implementation of tools
40. Distributing ML
● Most of what people do in practice can fit into a multicore machine
  ● Smart data sampling
  ● Offline schemes
  ● Efficient parallel code
● Dangers of "easy" distributed approaches such as Hadoop/Spark
  ● Do you care about costs? How about latencies?
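"Smart data sampling" is often what makes single-machine ML feasible. One standard technique is reservoir sampling, which keeps a uniform sample of size k from a stream too large to hold in memory; the stream below is just a stand-in:

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = rng or random.Random(0)  # seeded only so the sketch is reproducible
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = rng.randint(0, i)        # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Stand-in for a log stream with a million events.
sample = reservoir_sample(range(1_000_000), k=100)
```

A 100-element uniform sample of a million-event stream fits trivially in memory, which is frequently enough to train and iterate on a model locally before any distributed machinery is considered.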
41. Distributing ML
● Example of optimizing computations to fit them into one machine
  ● Spark implementation: 6 hours, 15 machines
    ● Developer time: 4 days
  ● C++ implementation: 10 minutes, 1 machine
● Most practical applications of Big Data can fit into a (multicore) implementation
43. Data Scientists and ML Engineers
● We all know the definition of a Data Scientist
● Where do Data Scientists fit in an organization?
  ● Many companies are struggling with this
● It is valuable to have strong DS who can bring value from the data
● Strong DS with solid engineering skills are unicorns, and finding them is not scalable
  ● DS need engineers to bring things to production
  ● Engineers have too much on their plate to be willing to "productionize" cool DS projects
44. The data-driven ML innovation funnel
Data Research → ML Exploration - Product Design → AB Testing
45. Data Scientists and ML Engineers
● Solution:
  ● (1) Define different parts of the innovation funnel
    ● Part 1. Data research & hypothesis building -> Data Science
    ● Part 2. ML solution building & implementation -> ML Engineering
    ● Part 3. Online experimentation, AB testing analysis -> Data Science
  ● (2) Broaden the definition of ML Engineers to include everyone from coding experts with high-level ML knowledge to ML experts with good software skills
[Diagram: Data Research -> Data Science; ML Solution -> ML Engineering; AB Testing -> Data Science]
47. ● Make sure you teach your model what you want it to learn
● Ensembles and the combination of supervised/unsupervised techniques are key in many ML applications
● It is important to focus on feature engineering
● Be thoughtful about
  ● your ML infrastructure/tools
  ● how you organize your teams