Optimal Learning 
for Fun and Profit with MOE 
Scott Clark 
MLconf SF 2014 
11/14/14 
Joint work with: Eric Liu, Peter Frazier, Norases Vesdapunt, Deniz Oktay, Jialei Wang 
sclark@yelp.com @DrScottClark
Outline of Talk 
● Optimal Learning 
○ What is it? 
○ Why do we care? 
● Multi-armed bandits 
○ Definition and motivation 
○ Examples 
● Bayesian global optimization 
○ Optimal experiment design 
○ Uses to extend traditional A/B testing 
○ Examples 
● MOE: Metric Optimization Engine 
○ Examples and Features
What is optimal learning? 
Optimal learning addresses the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive. 
Prof. Warren Powell - optimallearning.princeton.edu 
What is the most efficient way to collect information? 
Prof. Peter Frazier - people.orie.cornell.edu/pfrazier 
How do we make the most money, as fast as possible? 
Me - @DrScottClark
Part I: 
Multi-Armed Bandits
What are multi-armed bandits? 
THE SETUP 
● Imagine you are in front of K slot machines. 
● Each one is set to "free play" (but you can still win $$$) 
● Each has a possibly different, unknown payout rate 
● You have a fixed amount of time to maximize payout 
GO!
What are multi-armed bandits? 
THE SETUP 
(math version)
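A standard way to write this setup is sketched below. The notation ($K$, $p_k$, $a_t$, $r_t$, $T$) is an assumption chosen to match the bullets of the setup slide, not necessarily the symbols used on the original slide.

```latex
% Assumed notation: K arms, unknown success probabilities p_k, time horizon T.
\begin{align*}
  &\text{At each } t = 1, \dots, T \text{ choose an arm } a_t \in \{1, \dots, K\}, \\
  &\text{observe a reward } r_t \sim \mathrm{Bernoulli}(p_{a_t}), \\
  &\text{and aim to maximize } \mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big].
\end{align*}
```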
Real World Bandits 
Why do we care? 
● Maps well onto Click Through Rate (CTR) 
○ Each arm is an ad or search result 
○ Each click is a success 
○ Want to maximize clicks 
● Can be used in experiments (A/B testing) 
○ Want to find the best solutions, fast 
○ Want to limit how often bad solutions are used
Tradeoffs 
Exploration vs. Exploitation 
Gaining more knowledge about the system 
vs. 
Getting largest payout with current knowledge
Naive Example 
Epsilon First Policy 
● Sample sequentially εT < T times 
○ only explore 
● Pick the best and sample for t = εT+1, ..., T 
○ only exploit
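The epsilon-first policy can be simulated in a few lines. This is a minimal sketch, assuming Bernoulli arms; the function name `epsilon_first` and the parameter `explore_pulls` (standing in for εT) are hypothetical names introduced for illustration.

```python
import random

def epsilon_first(payout_rates, total_pulls, explore_pulls, seed=0):
    """Simulate an epsilon-first policy on Bernoulli arms.

    payout_rates: true win probability per arm (unknown to the policy).
    explore_pulls: number of initial exploration pulls (the slide's eps*T).
    Returns (total_wins, pulls_per_arm).
    """
    rng = random.Random(seed)
    k = len(payout_rates)
    pulls = [0] * k
    wins = [0] * k
    best_arm = None
    total_wins = 0
    for t in range(total_pulls):
        if t < explore_pulls:
            arm = t % k  # only explore: cycle through the arms in order
        else:
            if best_arm is None:
                # only exploit: commit once to the best observed win ratio
                ratios = [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(k)]
                best_arm = ratios.index(max(ratios))
            arm = best_arm
        reward = 1 if rng.random() < payout_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total_wins += reward
    return total_wins, pulls

# Three arms, nine exploration pulls, then exploitation for the remaining pulls.
total_wins, pulls = epsilon_first([0.5, 0.8, 0.2], total_pulls=1000, explore_pulls=9)
```

Because the policy commits after a fixed number of pulls, everything after exploration depends on a handful of noisy observations.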
Example (K = 3, t = 0) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 0         0         0
WINS:                  0         0         0
RATIO:                 -         -         -
Example (K = 3, t = 1) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 1         0         0
WINS:                  1         0         0
RATIO:                 1         -         -
Example (K = 3, t = 2) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 1         1         0
WINS:                  1         1         0
RATIO:                 1         1         -
Example (K = 3, t = 3) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 1         1         1
WINS:                  1         1         0
RATIO:                 1         1         0
Example (K = 3, t = 4) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 2         1         1
WINS:                  1         1         0
RATIO:                 0.5       1         0
Example (K = 3, t = 5) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 2         2         1
WINS:                  1         2         0
RATIO:                 0.5       1         0
Example (K = 3, t = 6) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 2         2         2
WINS:                  1         2         0
RATIO:                 0.5       1         0
Example (K = 3, t = 7) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 3         2         2
WINS:                  2         2         0
RATIO:                 0.66      1         0
Example (K = 3, t = 8) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 3         3         2
WINS:                  2         3         0
RATIO:                 0.66      1         0
Example (K = 3, t = 9) 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 3         3         3
WINS:                  2         3         1
RATIO:                 0.66      1         0.33
Example (K = 3, t > 9) 
Exploit! 
Profit! 
Right?
What if our observed ratio is a poor approx? 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.5   p = 0.8   p = 0.2
Observed Information
PULLS:                 3         3         3
WINS:                  2         3         1
RATIO:                 0.66      1         0.33
What if our observed ratio is a poor approx? 
                       Arm 1     Arm 2     Arm 3
Unknown payout rate:   p = 0.9   p = 0.5   p = 0.5
Observed Information
PULLS:                 3         3         3
WINS:                  2         3         1
RATIO:                 0.66      1         0.33
Fixed exploration fails 
Regret is unbounded! 
Amount of exploration 
needs to depend on data 
We need better policies!
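To make the failure concrete, take the scenario with true rates p = 0.9, 0.5, 0.5 and the same nine observations. The second arm has the best observed ratio (1), so the policy commits to an arm whose true rate is 0.5. The definition of regret below is the standard one and is an assumption about what the slide means by the term.

```latex
% Assumed definition: expected regret = expected payout of always playing
% the best arm minus the expected payout of the policy.
\begin{align*}
  R(T) &= T \max_k p_k - \mathbb{E}\Big[\sum_{t=1}^{T} r_t\Big] \\
  % Exploit phase only (t = 10, ..., T): stuck on an arm with p = 0.5
  % while the best arm has p = 0.9.
  R_{\text{exploit}}(T) &= (T - 9)\,(0.9 - 0.5) = 0.4\,(T - 9)
\end{align*}
```

The loss grows linearly with T, and no amount of further play corrects it because the policy never explores again.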
What should we do? 
Many different policies 
● Weighted random choice (another naive approach) 
● Epsilon-greedy 
○ Best arm so far with P=1-ε, random otherwise 
● Epsilon-decreasing* 
○ Best arm so far with P=1-(ε * exp(-rt)), random otherwise 
● UCB-exp* 
● UCB-tuned* 
● BLA* 
● SoftMax* 
● etc, etc, etc (60+ years of research) 
*Regret bounded as t->infinity
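The two epsilon policies above differ only in how the exploration probability changes over time, which the following sketch makes explicit. The helper names (`run_policy`, `epsilon_greedy`, `epsilon_decreasing`) and the parameter values are hypothetical and chosen for illustration; the arms are assumed to be Bernoulli.

```python
import math
import random

def run_policy(payout_rates, total_pulls, explore_prob, seed=0):
    """Play Bernoulli arms; explore_prob(t) is the chance of a random pull at step t."""
    rng = random.Random(seed)
    k = len(payout_rates)
    pulls = [0] * k
    wins = [0] * k
    for t in range(total_pulls):
        if rng.random() < explore_prob(t):
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            # exploit: best observed ratio so far; unpulled arms are tried first
            ratios = [wins[i] / pulls[i] if pulls[i] else float("inf") for i in range(k)]
            arm = ratios.index(max(ratios))
        reward = 1 if rng.random() < payout_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins

def epsilon_greedy(epsilon):
    """Constant exploration probability epsilon."""
    return lambda t: epsilon

def epsilon_decreasing(epsilon, rate):
    """Exploration probability epsilon * exp(-rate * t), shrinking over time."""
    return lambda t: epsilon * math.exp(-rate * t)

rates = [0.5, 0.8, 0.2]
greedy_pulls, greedy_wins = run_policy(rates, 5000, epsilon_greedy(0.1))
decay_pulls, decay_wins = run_policy(rates, 5000, epsilon_decreasing(1.0, 0.01))
```

Passing the schedule as a function keeps the simulation loop identical for both policies, so only the exploration rule varies between runs.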
Bandits in the Wild 
What if... 
● Hardware constraints limit real-time knowledge? (batching) 
● Payoff noisy? Non-binary? Changes in time? (dynamic content) 
● Parallel sampling? (many concurrent users) 
● Arms expire? (events, news stories, etc) 
● You have knowledge of the user? (logged in, contextual history) 
● The number of arms increases? Continuous? (parameter search) 
Every problem is different. 
This is an active area of research.
Part II: 
Global Optimization
THE GOAL 
● Optimize some objective function 
○ CTR, revenue, delivery time, or some combination thereof 
● given some parameters 
○ config values, cutoffs, ML parameters 
● CTR = f(parameters) 
○ Find best parameters 
● We want to sample the underlying function as few times as possible 
(more mathy version)
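One common way to state this goal is shown below. The symbols ($x$, $A$, $f$) are assumptions introduced here, not necessarily the notation of the original slide.

```latex
% Assumed notation: x = parameter vector, A = allowed parameter domain,
% f = objective function (e.g. CTR as a function of the parameters).
\[
  x^{*} = \operatorname*{arg\,max}_{x \in A} f(x),
  \qquad \text{where each evaluation of } f(x) \text{ is noisy and expensive (for example, one A/B test).}
\]
```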
Metric Optimization Engine 
A global, black box method for parameter optimization 
History of how past parameters have performed -> MOE -> New, optimal parameters
What does MOE do? 
● MOE optimizes a metric (like CTR) given some parameters as inputs (like scoring weights) 
● Given the past performance of different parameters, MOE suggests new, optimal parameters to test 
Results of A/B tests run so far -> MOE -> New, optimal values to A/B test
Example Experiment 
Biz details distance in ad 
● Setting a different distance cutoff for each category to show "X miles away" text in biz_details ad 
● For each category we define a maximum distance 
Parameters + Obj Func 
distance_cutoffs = { 
    'shopping': 20.0, 
    'food': 14.0, 
    'auto': 15.0, 
    …} 
objective_function = { 
    'value': 0.012, 
    'std': 0.00013 
} 
MOE 
New Parameters 
distance_cutoffs = { 
    'shopping': 22.1, 
    'food': 7.3, 
    'auto': 12.6, 
    …} 
Run A/B Test
Why do we need MOE? 
● Parameter optimization is hard 
○ Finding the perfect set of parameters takes a long time 
○ Hope it is well behaved and try to move in the right direction 
○ Not possible as number of parameters increases 
● Intractable to find best set of parameters in all situations 
○ Thousands of combinations of program type, flow, category 
○ Finding the best parameters manually is impossible 
● Heuristics quickly break down in the real world 
○ Dependent parameters (changes to one change all others) 
○ Many parameters at once (location, category, map, place, ...) 
○ Non-linear (complexity and chaos break assumptions) 
MOE solves all of these problems in an optimal way
How does it work? 
MOE 
1. Build Gaussian Process (GP) with points sampled so far 
2. Optimize covariance hyperparameters of GP 
3. Find point(s) of highest Expected Improvement within parameter domain 
4. Return optimal next best point(s) to sample
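The steps above can be sketched with NumPy as follows. This is a minimal illustration, not MOE's implementation: the function names, the squared-exponential kernel, the 1-D grid of candidate points, and the fixed hyperparameters (step 2 is skipped here) are all assumptions.

```python
import math
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-6, **kernel_args):
    """Step 1: posterior mean and standard deviation of a zero-mean GP."""
    k_tt = sq_exp_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    k_ts = sq_exp_kernel(x_train, x_test, **kernel_args)
    k_ss_diag = sq_exp_kernel(x_test, x_test, **kernel_args).diagonal()
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    var = np.maximum(k_ss_diag - np.sum(v * v, axis=0), 0.0)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best_so_far):
    """Step 3: closed-form EI for maximization under a Gaussian posterior."""
    ei = np.zeros_like(mean)
    ok = std > 1e-12  # EI is left at zero where the posterior has no spread
    z = (mean[ok] - best_so_far) / std[ok]
    erfc = np.vectorize(math.erfc, otypes=[float])
    cdf = 0.5 * erfc(-z / math.sqrt(2.0))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    # Clamp at zero to absorb floating-point round-off in the far tail.
    ei[ok] = np.maximum((mean[ok] - best_so_far) * cdf + std[ok] * pdf, 0.0)
    return ei

def suggest_next_point(x_train, y_train, candidates, **kernel_args):
    """Steps 1, 3 and 4 with fixed hyperparameters."""
    mean, std = gp_posterior(x_train, y_train, candidates, **kernel_args)
    ei = expected_improvement(mean, std, best_so_far=y_train.max())
    return candidates[int(np.argmax(ei))], ei

x_seen = np.array([0.0, 1.0, 3.0])
y_seen = np.sin(x_seen)  # stand-in for an expensive metric such as CTR
grid = np.linspace(0.0, 3.0, 301)
next_x, ei = suggest_next_point(x_seen, y_seen, grid, length_scale=1.0)
```

Searching a fixed grid only works in very low dimension; it is used here to keep the sketch short.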
Gaussian Processes
[Figure: Rasmussen and Williams, GPML, gaussianprocess.org]
Gaussian Processes
[Figure panels: Prior, Posterior]
Optimizing Covariance Hyperparameters 
Finding the GP model that fits best 
● All of these GPs are created with the same initial data 
○ with different hyperparameters (length scales) 
● Need to find the model that is most likely given the data 
○ Maximum likelihood, cross validation, priors, etc 
Rasmussen and Williams Gaussian Processes for Machine Learning
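As one illustration of the maximum likelihood option, the sketch below scores a few candidate length scales by their log marginal likelihood and keeps the best one. The function name, the squared-exponential kernel, the toy data, and the candidate values are assumptions for illustration; a real system would typically use a gradient-based optimizer rather than a short list of candidates.

```python
import math
import numpy as np

def log_marginal_likelihood(x, y, length_scale, signal_var=1.0, noise_var=1e-4):
    """Log p(y | x, hyperparameters) for a zero-mean GP with a squared-exponential kernel."""
    diff = x[:, None] - x[None, :]
    cov = signal_var * np.exp(-0.5 * (diff / length_scale) ** 2) + noise_var * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # -1/2 y^T K^-1 y  -  1/2 log|K|  -  n/2 log(2 pi), with 1/2 log|K| = sum(log(diag(chol)))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(chol)))
            - 0.5 * len(x) * math.log(2.0 * math.pi))

x = np.linspace(0.0, 5.0, 12)
y = np.sin(x)  # toy observations standing in for measured metric values
candidates = [0.1, 0.3, 1.0, 3.0]
scores = {ls: log_marginal_likelihood(x, y, ls) for ls in candidates}
best_length_scale = max(scores, key=scores.get)
```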
Optimizing Covariance Hyperparameters 
Rasmussen and Williams Gaussian Processes for Machine Learning
Find point(s) of highest expected improvement 
We want to find the point(s) that are expected to beat the best point seen so far, by the most. 
[Jones, Schonlau, Welch 1998] 
[Clark, Frazier 2012]
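For a single candidate point with a Gaussian posterior, expected improvement has a well-known closed form. It is written here for maximization, with notation that is an assumption rather than the slide's own; Jones, Schonlau and Welch state the equivalent form for minimization.

```latex
% Assumed notation: f^* = best value observed so far (maximization),
% \mu(x), \sigma(x) = GP posterior mean and standard deviation at x,
% \Phi, \varphi = standard normal CDF and PDF.
\begin{align*}
  \mathrm{EI}(x) &= \mathbb{E}\big[\max\big(f(x) - f^{*},\, 0\big)\big] \\
                 &= \big(\mu(x) - f^{*}\big)\,\Phi(z) + \sigma(x)\,\varphi(z),
  \qquad z = \frac{\mu(x) - f^{*}}{\sigma(x)}
\end{align*}
```

The first term rewards points whose predicted mean already beats the best value; the second rewards points where the model is uncertain.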
Tying it all Together #1: A/B Testing 
Users 
Experiment Framework 
(users -> cohorts) 
(cohorts -> % traffic, params) 
● Optimally assign traffic fractions for experiments (Multi-Armed Bandits) 
● Optimally suggest new cohorts to be run (Bayesian Global Optimization) 
Metric System (batch) 
Logs, Metrics, Results 
MOE 
Multi-Armed Bandits 
Bayesian Global Opt 
App 
cohorts -> params 
params -> objective function 
optimal cohort % traffic 
optimal new params 
daily/hourly batch 
time consuming and expensive
Tying it all Together #2 
Expensive Batch Systems 
Machine Learning Framework 
complex regression, deep learning system, etc 
● Optimally suggest new hyperparameters for the framework to minimize loss (Bayesian Global Optimization) 
Metrics 
Error, Loss, Likelihood, etc 
MOE 
Bayesian Global Opt 
Big Data 
framework output 
time consuming and expensive 
Hyperparameters 
optimal hyperparameters
What is MOE doing right now? 
MOE is now live in production 
● MOE is informing active experiments 
● MOE is successfully optimizing towards all given metrics 
● MOE treats the underlying system it is optimizing as a black box, allowing it to be easily extended to any system
MOE is Open Source! 
github.com/Yelp/MOE
MOE is Fully Documented 
yelp.github.io/MOE
MOE has Examples 
yelp.github.io/MOE/examples.html
● Multi-Armed Bandits 
○ Many policies implemented and more on the way 
● Global Optimization 
○ Bayesian Global Optimization via Expected Improvement on GPs
MOE is Easy to Install 
● yelp.github.io/MOE/install.html#install-in-docker 
● registry.hub.docker.com/u/yelpmoe/latest 
A MOE server is now running at http://localhost:6543
Questions? 
sclark@yelp.com 
@DrScottClark 
github.com/Yelp/MOE
References 
Gaussian Processes for Machine Learning 
Carl Edward Rasmussen and Christopher K. I. Williams. 2006. 
Massachusetts Institute of Technology. 55 Hayward St., Cambridge, MA 02142. 
http://www.gaussianprocess.org/gpml/ (free electronic copy) 
Parallel Machine Learning Algorithms In Bioinformatics and Global Optimization (PhD Dissertation) 
Part II, EPI: Expected Parallel Improvement 
Scott Clark. 2012. 
Cornell University, Center for Applied Mathematics. Ithaca, NY. 
https://github.com/sc932/Thesis 
Differentiation of the Cholesky Algorithm 
S. P. Smith. 1995. 
Journal of Computational and Graphical Statistics. Volume 4. Number 2. p134-147 
A Multi-points Criterion for Deterministic Parallel Global Optimization based on Gaussian Processes. 
David Ginsbourger, Rodolphe Le Riche, and Laurent Carraro. 2008. 
Département 3MI. École Nationale Supérieure des Mines. 158 cours Fauriel, Saint-Etienne, France. 
{ginsbourger, leriche, carraro}@emse.fr 
Efficient Global Optimization of Expensive Black-Box Functions 
Jones, D.R., Schonlau, M., Welch, W.J. 1998. 
Journal of Global Optimization, 13, 455-492.
Use Cases 
● Optimizing a system's click-through or conversion rate (CTR). 
○ MOE is useful when evaluating CTR requires running an A/B test on real user traffic, and getting statistically significant results requires running this test for a substantial amount of time (hours, days, or even weeks). Examples include setting distance thresholds, ad unit properties, or internal configuration values. 
○ http://engineeringblog.yelp.com/2014/10/using-moe-the-metric-optimization-engine-to-optimize-an-ab-testing-experiment-framework.html 
● Optimizing tunable parameters of a machine-learning prediction method. 
○ MOE can be used when calculating the prediction error for one choice of the parameters takes a long time, which might happen because the prediction method is complex and takes a long time to train, or because the data used to evaluate the error is huge. Examples include deep learning methods or hyperparameters of features in logistic regression.
More Use Cases 
● Optimizing the design of an engineering system. 
○ MOE helps when evaluating a design requires running a complex physics-based numerical simulation on a supercomputer. Examples include designing and modeling airplanes, the traffic network of a city, a combustion engine, or a hospital. 
● Optimizing the parameters of a real-world experiment. 
○ MOE can help guide design when every experiment needs to be physically created in a lab or very few experiments can be run in parallel. Examples include chemistry, biology, or physics experiments or a drug trial. 
● Any time sampling a tunable, unknown function is time consuming or expensive.

Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 

Scott Clark, Software Engineer, Yelp at MLconf SF

  • 1. Optimal Learning for Fun and Profit with MOE Scott Clark MLconf SF 2014 11/14/14 Joint work with: Eric Liu, Peter Frazier, Norases Vesdapunt, Deniz Oktay, JaiLei Wang sclark@yelp.com @DrScottClark
  • 2. Outline of Talk ● Optimal Learning ○ What is it? ○ Why do we care? ● Multi-armed bandits ○ Definition and motivation ○ Examples ● Bayesian global optimization ○ Optimal experiment design ○ Uses to extend traditional A/B testing ○ Examples ● MOE: Metric Optimization Engine ○ Examples and Features
  • 3. What is optimal learning? Optimal learning addresses the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive. Prof. Warren Powell - optimallearning.princeton.edu What is the most efficient way to collect information? Prof. Peter Frazier - people.orie.cornell.edu/pfrazier How do we make the most money, as fast as possible? Me - @DrScottClark
  • 5. What are multi-armed bandits? THE SETUP ● Imagine you are in front of K slot machines. ● Each one is set to "free play" (but you can still win $$$) ● Each has a possibly different, unknown payout rate ● You have a fixed amount of time to maximize payout GO!
  • 6. What are multi-armed bandits? THE SETUP (math version)
  • 7. Real World Bandits Why do we care? ● Maps well onto Click Through Rate (CTR) ○ Each arm is an ad or search result ○ Each click is a success ○ Want to maximize clicks ● Can be used in experiments (A/B testing) ○ Want to find the best solutions, fast ○ Want to limit how often bad solutions are used
  • 8. Tradeoffs Exploration vs. Exploitation Gaining more knowledge about the system vs. Getting largest payout with current knowledge
  • 9. Naive Example Epsilon First Policy ● Sample sequentially εT < T times ○ only explore ● Pick the best and sample for t = εT+1, ..., T ○ only exploit
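The epsilon-first policy above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `epsilon_first`, its parameters, and the choice to cycle through the arms in order during exploration are assumptions, and arms are modelled as Bernoulli payouts.

```python
import random


def epsilon_first(payout_rates, total_pulls, epsilon, seed=0):
    """Simulate epsilon-first on Bernoulli arms; return (total_wins, pulls, wins)."""
    rng = random.Random(seed)
    k = len(payout_rates)
    pulls = [0] * k
    wins = [0] * k
    explore_pulls = int(epsilon * total_pulls)  # the first eps*T pulls only explore
    best_arm = None
    for t in range(total_pulls):
        if t < explore_pulls:
            arm = t % k  # explore: sample the arms sequentially
        else:
            if best_arm is None:
                # exploit: commit once to the best observed win ratio
                best_arm = max(
                    range(k),
                    key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0,
                )
            arm = best_arm
        pulls[arm] += 1
        if rng.random() < payout_rates[arm]:
            wins[arm] += 1
    return sum(wins), pulls, wins


total_wins, pulls, wins = epsilon_first([0.5, 0.8, 0.2], total_pulls=1000, epsilon=0.1)
```

Because the policy commits after a fixed number of samples, an unlucky exploration phase locks in a poor arm for every remaining pull.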
  • 10. Example (K = 3, t = 0) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 0 / 0 / -; arm 2: 0 / 0 / -; arm 3: 0 / 0 / -
  • 11. Example (K = 3, t = 1) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 1 / 1 / 1; arm 2: 0 / 0 / -; arm 3: 0 / 0 / -
  • 12. Example (K = 3, t = 2) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 1 / 1 / 1; arm 2: 1 / 1 / 1; arm 3: 0 / 0 / -
  • 13. Example (K = 3, t = 3) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 1 / 1 / 1; arm 2: 1 / 1 / 1; arm 3: 1 / 0 / 0
  • 14. Example (K = 3, t = 4) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 2 / 1 / 0.5; arm 2: 1 / 1 / 1; arm 3: 1 / 0 / 0
  • 15. Example (K = 3, t = 5) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 2 / 1 / 0.5; arm 2: 2 / 2 / 1; arm 3: 1 / 0 / 0
  • 16. Example (K = 3, t = 6) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 2 / 1 / 0.5; arm 2: 2 / 2 / 1; arm 3: 2 / 0 / 0
  • 17. Example (K = 3, t = 7) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 3 / 2 / 0.66; arm 2: 2 / 2 / 1; arm 3: 2 / 0 / 0
  • 18. Example (K = 3, t = 8) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 3 / 2 / 0.66; arm 2: 3 / 3 / 1; arm 3: 2 / 0 / 0
  • 19. Example (K = 3, t = 9) Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 3 / 2 / 0.66; arm 2: 3 / 3 / 1; arm 3: 3 / 1 / 0.33
  • 20. Example (K = 3, t > 9) Exploit! Profit! Right?
  • 21. What if our observed ratio is a poor approx? Unknown payout rates: p = 0.5, p = 0.8, p = 0.2. Observed information (pulls / wins / ratio): arm 1: 3 / 2 / 0.66; arm 2: 3 / 3 / 1; arm 3: 3 / 1 / 0.33
  • 22. What if our observed ratio is a poor approx? Unknown payout rates: p = 0.9, p = 0.5, p = 0.5. Observed information (pulls / wins / ratio): arm 1: 3 / 2 / 0.66; arm 2: 3 / 3 / 1; arm 3: 3 / 1 / 0.33
  • 23. Fixed exploration fails Regret is unbounded! Amount of exploration needs to depend on data We need better policies!
  • 24. What should we do? Many different policies ● Weighted random choice (another naive approach) ● Epsilon-greedy ○ Best arm so far with P=1-ε, random otherwise ● Epsilon-decreasing* ○ Best arm so far with P=1-(ε * exp(-rt)), random otherwise ● UCB-exp* ● UCB-tuned* ● BLA* ● SoftMax* ● etc, etc, etc (60+ years of research) *Regret bounded as t->infinity
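The epsilon-greedy and epsilon-decreasing rules listed above differ only in how the exploration probability changes with time. The sketch below is an illustration under stated assumptions: the names `choose_arm` and `run_bandit` are hypothetical, arms are Bernoulli, and an arm that has never been pulled is treated as having an observed ratio of 0. Setting `decay_rate=0` gives plain epsilon-greedy; a positive `decay_rate` gives the ε * exp(-rt) schedule from the slide.

```python
import math
import random


def choose_arm(pulls, wins, epsilon, rng, t=0, decay_rate=0.0):
    """Best arm so far with probability 1 - eps_t, a uniformly random arm otherwise."""
    eps_t = epsilon * math.exp(-decay_rate * t)  # constant when decay_rate == 0
    if rng.random() < eps_t:
        return rng.randrange(len(pulls))
    return max(
        range(len(pulls)),
        key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0,
    )


def run_bandit(payout_rates, total_pulls, epsilon, decay_rate=0.0, seed=0):
    """Simulate the policy on Bernoulli arms and return per-arm (pulls, wins)."""
    rng = random.Random(seed)
    k = len(payout_rates)
    pulls = [0] * k
    wins = [0] * k
    for t in range(total_pulls):
        arm = choose_arm(pulls, wins, epsilon, rng, t, decay_rate)
        pulls[arm] += 1
        if rng.random() < payout_rates[arm]:
            wins[arm] += 1
    return pulls, wins


greedy_pulls, greedy_wins = run_bandit([0.5, 0.8, 0.2], 2000, epsilon=0.1)
decay_pulls, decay_wins = run_bandit([0.5, 0.8, 0.2], 2000, epsilon=1.0, decay_rate=0.01)
```

Unlike epsilon-first, the amount of exploitation given to each arm here depends on the data observed so far.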
  • 25. Bandits in the Wild What if... ● Hardware constraints limit real-time knowledge? (batching) ● Payoff noisy? Non-binary? Changes in time? (dynamic content) ● Parallel sampling? (many concurrent users) ● Arms expire? (events, news stories, etc) ● You have knowledge of the user? (logged in, contextual history) ● The number of arms increases? Continuous? (parameter search) Every problem is different. This is an active area of research.
  • 26. Part II: Global Optimization
  • 27. THE GOAL ● Optimize some objective function ○ CTR, revenue, delivery time, or some combination thereof ● given some parameters ○ config values, cutoffs, ML parameters ● CTR = f(parameters) ○ Find best parameters ● We want to sample the underlying function as few times as possible (more mathy version)
  • 28. Metric Optimization Engine A global, black box method for parameter optimization. History of how past parameters have performed -> MOE -> New, optimal parameters
  • 29. What does MOE do? ● MOE optimizes a metric (like CTR) given some parameters as inputs (like scoring weights) ● Given the past performance of different parameters, MOE suggests new, optimal parameters to test. Results of A/B tests run so far -> MOE -> New, optimal values to A/B test
  • 30. Example Experiment: Biz details distance in ad ● Setting a different distance cutoff for each category to show “X miles away” text in biz_details ad ● For each category we define a maximum distance. Parameters + Obj Func: distance_cutoffs = { 'shopping': 20.0, 'food': 14.0, 'auto': 15.0, …} objective_function = { 'value': 0.012, 'std': 0.00013 } -> MOE -> New Parameters: distance_cutoffs = { 'shopping': 22.1, 'food': 7.3, 'auto': 12.6, …} -> Run A/B Test
  • 31. Why do we need MOE? ● Parameter optimization is hard ○ Finding the perfect set of parameters takes a long time ○ Hope it is well behaved and try to move in the right direction ○ Not possible as number of parameters increases ● Intractable to find best set of parameters in all situations ○ Thousands of combinations of program type, flow, category ○ Finding the best parameters manually is impossible ● Heuristics quickly break down in the real world ○ Dependent parameters (changes to one change all others) ○ Many parameters at once (location, category, map, place, ...) ○ Non-linear (complexity and chaos break assumptions) MOE solves all of these problems in an optimal way
  • 32. How does it work? MOE 1. Build Gaussian Process (GP) with points sampled so far 2. Optimize covariance hyperparameters of GP 3. Find point(s) of highest Expected Improvement within parameter domain 4. Return optimal next best point(s) to sample
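Step 1 above, building a Gaussian Process from the points sampled so far, can be sketched for one-dimensional inputs in pure Python. This is not MOE's implementation: it assumes a zero prior mean, a squared-exponential covariance with a fixed `length_scale`, and a small `noise_var` on the diagonal, and every function name here is hypothetical. It follows the standard Cholesky-based posterior computation described in Rasmussen and Williams.

```python
import math


def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower, rhs):
    """Forward substitution: solve L y = rhs."""
    y = []
    for i in range(len(rhs)):
        s = sum(lower[i][k] * y[k] for k in range(i))
        y.append((rhs[i] - s) / lower[i][i])
    return y


def solve_lower_transpose(lower, rhs):
    """Back substitution: solve L^T x = rhs."""
    n = len(rhs)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - s) / lower[i][i]
    return x


def gp_posterior(xs, ys, x_new, length_scale=1.0, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at x_new given (xs, ys)."""
    n = len(xs)
    cov = [
        [sq_exp_kernel(xs[i], xs[j], length_scale) + (noise_var if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    lower = cholesky(cov)
    alpha = solve_lower_transpose(lower, solve_lower(lower, ys))
    k_star = [sq_exp_kernel(x, x_new, length_scale) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(lower, k_star)
    variance = sq_exp_kernel(x_new, x_new, length_scale) - sum(vi * vi for vi in v)
    return mean, max(variance, 0.0)


mean, variance = gp_posterior([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], x_new=1.5)
```

The posterior variance is near zero at sampled points and returns to the prior variance far from them, which is what lets step 3 trade off exploring uncertain regions against exploiting promising ones.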
  • 33. Rasmussen and Williams GPML gaussianprocess.org Gaussian Processes
  • 35. Optimizing Covariance Hyperparameters Finding the GP model that fits best ● All of these GPs are created with the same initial data ○ with different hyperparameters (length scales) ● Need to find the model that is most likely given the data ○ Maximum likelihood, cross validation, priors, etc Rasmussen and Williams Gaussian Processes for Machine Learning
  • 36. Optimizing Covariance Hyperparameters Rasmussen and Williams Gaussian Processes for Machine Learning
  • 37. Find point(s) of highest expected improvement We want to find the point(s) that are expected to beat the best point seen so far, by the most. [Jones, Schonlau, Welch 1998] [Clark, Frazier 2012]
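For a single candidate point whose GP posterior is Gaussian with a given mean and standard deviation, expected improvement has a well-known closed form. The sketch below assumes the objective is being maximized (as with CTR) and uses a hypothetical function name. It covers only the one-point case, not the multi-point (parallel) variant from the cited work or MOE's own implementation.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Closed-form EI for maximization at one point with posterior N(mean, std^2)."""
    if std <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mean - best_so_far) * cdf + std * pdf


# A point predicted to match the current best still has positive EI when uncertain.
ei_uncertain = expected_improvement(mean=0.012, std=0.002, best_so_far=0.012)
ei_certain = expected_improvement(mean=0.012, std=0.0, best_so_far=0.012)
```

Evaluating this quantity over candidate parameters and picking the maximizer is one simple way to choose the next point to sample.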
  • 38. Tying it all Together #1: A/B Testing ● Optimally assign traffic fractions for experiments (Multi-Armed Bandits) ● Optimally suggest new cohorts to be run (Bayesian Global Optimization) [Diagram labels: Users; App; Experiment Framework (users -> cohorts) (cohorts -> % traffic, params); Metric System (batch); Logs, Metrics, Results; MOE: Multi-Armed Bandits, Bayesian Global Opt; cohorts -> params; params -> objective function; optimal cohort % traffic; optimal new params; daily/hourly batch; time consuming and expensive]
  • 39. Tying it all Together #2: Expensive Batch Systems ● Optimally suggest new hyperparameters for the framework to minimize loss (Bayesian Global Optimization) [Diagram labels: Machine Learning Framework (complex regression, deep learning system, etc); Big Data; Metrics (Error, Loss, Likelihood, etc); MOE: Bayesian Global Opt; framework output; Hyperparameters; optimal hyperparameters; time consuming and expensive]
  • 40. What is MOE doing right now? MOE is now live in production ● MOE is informing active experiments ● MOE is successfully optimizing towards all given metrics ● MOE treats the underlying system it is optimizing as a black box, allowing it to be easily extended to any system
  • 41. MOE is Open Source! github.com/Yelp/MOE
  • 42. MOE is Fully Documented yelp.github.io/MOE
  • 43. MOE has Examples yelp.github.io/MOE/examples.html
  • 44. ● Multi-Armed Bandits ○ Many policies implemented and more on the way ● Global Optimization ○ Bayesian Global Optimization via Expected Improvement on GPs
  • 45. MOE is Easy to Install ● yelp.github.io/MOE/install.html#install-in-docker ● registry.hub.docker.com/u/yelpmoe/latest A MOE server is now running at http://localhost:6543
  • 47. References ● Gaussian Processes for Machine Learning. Carl Edward Rasmussen and Christopher K. I. Williams. 2006. Massachusetts Institute of Technology. 55 Hayward St., Cambridge, MA 02142. http://www.gaussianprocess.org/gpml/ (free electronic copy) ● Parallel Machine Learning Algorithms in Bioinformatics and Global Optimization (PhD dissertation), Part II, EPI: Expected Parallel Improvement. Scott Clark. 2012. Cornell University, Center for Applied Mathematics. Ithaca, NY. https://github.com/sc932/Thesis ● Differentiation of the Cholesky Algorithm. S. P. Smith. 1995. Journal of Computational and Graphical Statistics, 4(2), 134-147. ● A Multi-points Criterion for Deterministic Parallel Global Optimization Based on Gaussian Processes. David Ginsbourger, Rodolphe Le Riche, and Laurent Carraro. 2008. Département 3MI, École Nationale Supérieure des Mines. 158 cours Fauriel, Saint-Étienne, France. {ginsbourger, leriche, carraro}@emse.fr ● Efficient Global Optimization of Expensive Black-Box Functions. D. R. Jones, M. Schonlau, and W. J. Welch. 1998. Journal of Global Optimization, 13, 455-492.
  • 48. Use Cases ● Optimizing a system's click-through or conversion rate (CTR). ○ MOE is useful when evaluating CTR requires running an A/B test on real user traffic, and getting statistically significant results requires running this test for a substantial amount of time (hours, days, or even weeks). Examples include setting distance thresholds, ad unit properties, or internal configuration values. ○ http://engineeringblog.yelp.com/2014/10/using-moe-the-metric-optimization-engine-to-optimize-an-ab-testing-experiment-framework.html ● Optimizing tunable parameters of a machine-learning prediction method. ○ MOE can be used when calculating the prediction error for one choice of the parameters takes a long time, which might happen because the prediction method is complex and takes a long time to train, or because the data used to evaluate the error is huge. Examples include deep learning methods or hyperparameters of features in logistic regression.
  • 49. More Use Cases ● Optimizing the design of an engineering system. ○ MOE helps when evaluating a design requires running a complex physics-based numerical simulation on a supercomputer. Examples include designing and modeling airplanes, the traffic network of a city, a combustion engine, or a hospital. ● Optimizing the parameters of a real-world experiment. ○ MOE can help guide design when every experiment needs to be physically created in a lab or very few experiments can be run in parallel. Examples include chemistry, biology, or physics experiments or a drug trial. ● Any time sampling a tunable, unknown function is time consuming or expensive.