This document discusses using Thompson sampling for search query recommendation. It introduces the multi-armed bandit problem and shows how Thompson sampling can be applied to solve it. The key aspects covered are:
1) Thompson sampling frames query recommendation as a multi-armed bandit problem, balancing exploration of new queries against exploitation of popular ones.
2) It models the success probability of each query with a Beta distribution and decides which query to recommend next by drawing random samples from these distributions.
3) An experiment on real search-log data tested Thompson sampling for query recommendation with different numbers of queries to identify, showing that it quickly finds the most popular queries.
13. Hidden code of the slot machine (the player never sees this):
import random

def play():
    x = random.random()  # uniform draw, 0 <= x < 1
    y = 0.49             # the hidden win probability
    if x < y:
        return True      # win
    else:
        return False     # lose
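As an added usage note: the player only observes the outcomes of repeated plays, for example:

outcomes = [play() for _ in range(100)]
print(sum(outcomes), 100 - sum(outcomes))  # roughly 49 wins and 51 losses on average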
(Observed)
I played 10 times -- won 5 and lost 5
I played 100 times -- won 45 and lost 55
(Estimate)
#Learn#
Chance of having μ
μ
"prior"
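To make the learning step concrete, here is a minimal added sketch (not from the original slides) of updating a Beta prior over μ with the 45 wins and 55 losses observed above; it uses SciPy:

from scipy.stats import beta

a, b = 1, 1                       # uniform Beta(1, 1) prior over the win probability mu
wins, losses = 45, 55             # observations from the 100 plays
posterior = beta(a + wins, b + losses)

print(posterior.mean())           # ~0.45, close to the hidden 0.49
print(posterior.interval(0.95))   # 95% credible interval for mu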
21. The motivation of Thompson-S (2)
[Figure: PDFs of Beta(20,10) and Beta(60,40) over [0, 1]]
See a good one; "learn more"
22. Intuition (underdog, but worth learning)
[Figure: PDFs of Beta(4,6) and Beta(60,40) over [0, 1]]
23. The motivation of Thompson-S (1)
[Figure: PDFs of Beta(10,15) and Beta(60,40) over [0, 1]]
Avoid exploring a "low potential" arm early on.
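As a rough, added illustration of slides 21-23: Thompson sampling explores an arm exactly when that arm's random draw beats the favorite's draw, so we can estimate how often each challenger Beta would be picked over the favorite Beta(60,40):

import random

def prob_challenger_wins(challenger, favorite, trials=100_000):
    """Monte Carlo estimate of P(draw from challenger > draw from favorite)."""
    wins = 0
    for _ in range(trials):
        if random.betavariate(*challenger) > random.betavariate(*favorite):
            wins += 1
    return wins / trials

favorite = (60, 40)
for challenger in [(20, 10), (4, 6), (10, 15)]:
    p = prob_challenger_wins(challenger, favorite)
    print(f"Beta{challenger} beats Beta{favorite} with probability ~{p:.3f}")

A wide underdog such as Beta(4,6) still wins this comparison occasionally, so it keeps getting explored, while a clearly separated Beta(10,15) almost never does, which is how Thompson sampling avoids wasting plays on low-potential arms.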
25. Algorithm
Init: a = 1, b = 1; Sx = Fx = 0 for all arms x.
Each arm x corresponds to a Beta(Sx + a, Fx + b) prior.
1. Draw a random number from each arm based on Beta(Sx + a, Fx + b).
2. Play the arm x' with the highest number.
3. If a reward is seen: Sx' += 1, else Fx' += 1.
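A minimal runnable sketch of this algorithm (added here as an illustration; the reward probabilities below are made up, and in the query-recommendation setting each arm would be a candidate query whose reward is a successful recommendation):

import random

def thompson_sampling(true_probs, rounds=10_000, a=1, b=1):
    """Run Thompson sampling over arms with the given true reward probabilities."""
    n = len(true_probs)
    S = [0] * n  # observed successes per arm
    F = [0] * n  # observed failures per arm
    for _ in range(rounds):
        # 1. Draw a random number from each arm's Beta(Sx + a, Fx + b).
        draws = [random.betavariate(S[x] + a, F[x] + b) for x in range(n)]
        # 2. Play the arm with the highest draw.
        x = max(range(n), key=lambda i: draws[i])
        # 3. Update that arm's success/failure counts.
        if random.random() < true_probs[x]:
            S[x] += 1
        else:
            F[x] += 1
    return S, F

S, F = thompson_sampling([0.49, 0.60, 0.40])
print([s + f for s, f in zip(S, F)])  # the 0.60 arm should receive most of the plays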
38. Hsieh, C., Neufeld, J., Holloway King, T., and Cho, J.
"Efficient Approximate Thompson Sampling for Search Query Recommendation."
The 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015).
About the author & download: http://oak.cs.ucla.edu/~chucheng/
Speaker notes
Small training period? => low recall
One related keyword is allowed
(2) We have more items than we can display.
Find an old problem so that you can stand on the shoulders of giants.
At each time step, you choose which slot machine to play.
You are assuming the information you learn from the 30% is reliable.
One related keyword is allowed
Use the slot machine as the example; don't use Related Search for now.
William R. Thompson
“Learn” is more interesting; “earn” is simple.
You learn from observing “rewriting”
Use 80% as the example.
You don't know the code. You only observe what happens afterwards, so you need to make an assumption.
The simple assumption is to assume the prior is a normal distribution.
The Beta distribution is the conjugate prior of the binomial likelihood function.
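To spell that conjugacy out (a short derivation added here; S and F denote observed successes and failures, and a, b the Beta prior parameters, matching the Sx + a and Fx + b counts in the algorithm):

\[
  p(\mu \mid D) \;\propto\; \underbrace{\mu^{S}(1-\mu)^{F}}_{\text{binomial likelihood}}
  \cdot \underbrace{\mu^{a-1}(1-\mu)^{b-1}}_{\mathrm{Beta}(a,b)\text{ prior}}
  \;=\; \mu^{S+a-1}(1-\mu)^{F+b-1}
  \;\propto\; \mathrm{Beta}(\mu;\, S+a,\, F+b)
\]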
Beta(1,1) is the uniform distribution: if you randomly draw a number from it, the density at 0, 0.5, and 1 is the same.
When alpha and beta are large enough, the PDF looks like a bell curve (approximately normal).
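A quick check of both claims (an added illustration using SciPy; Beta(60,40) echoes the parameters from the earlier slides):

from scipy.stats import beta

# Beta(1, 1) has constant density 1 on [0, 1].
print([beta.pdf(x, 1, 1) for x in (0.0, 0.5, 1.0)])  # [1.0, 1.0, 1.0]

# With larger parameters the PDF is bell-shaped and concentrated around its mean.
print(beta.mean(60, 40), beta.std(60, 40))           # ~0.6 with a small spread (~0.05)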
75.46%
10.9%
3.5%
0.188%
TS is a strategy of MAP
Jello
Multiple choices to display
TS is a strategy of MAP
M=1: how quickly can we find the best one?
Gamma = when there is no response, how much discount would you like to apply to the penalty count (beta)?