In this talk Ilias will discuss some variations of Multi-Armed Bandits (MABs), a less popular but important area of Machine Learning. MABs enable us to build adaptive systems that find solutions to tasks by interacting with their environment. A MAB solves a task by acquiring useful knowledge at every step of an iterative process, while balancing the exploration-exploitation dilemma. MABs are used to tackle practical problems such as selecting appropriate online ads and personalised content for users, assigning people to cohorts in controlled trials, supporting decision making, and more. In these kinds of problems, solutions need to be identified as quickly as possible, since errors can be costly. Ilias will discuss examples from industry and academia, as well as some of the related work at Atlassian.
2. Motivation
Increase awareness of some very useful but lesser-known techniques
Demo some current work at Atlassian
Connect it with some research from my past
Hopefully, there will be something useful for everybody — apologies for the few equations and loose notation
6. Many solutions… (a minimal sketch of strategies 1 and 5 follows this list)
1. ε-greedy: the best arm is selected for a proportion 1−ε of the trials, and a random arm in a proportion ε of the trials.
2. ε-greedy with variable ε
3. Pure exploration first, then pure exploitation.
4. …
5. Thompson sampling (draw from the estimated Beta distributions)
6. Upper Confidence Bound (UCB)
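As a concrete illustration of strategies 1 and 5, here is a minimal sketch for Bernoulli-reward arms. The arm probabilities, ε value, and step counts are invented for the example; this is not code from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = [0.3, 0.5, 0.7]   # hypothetical Bernoulli arms; arm 2 is best
n_arms, n_steps, eps = len(true_probs), 5000, 0.1

# epsilon-greedy: explore with probability eps, otherwise exploit
counts, values = np.zeros(n_arms), np.zeros(n_arms)
for t in range(n_steps):
    if rng.random() < eps:
        a = rng.integers(n_arms)              # explore: random arm
    else:
        a = int(np.argmax(values))            # exploit: best estimate so far
    r = rng.random() < true_probs[a]          # Bernoulli reward
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]  # incremental mean update

# Thompson sampling: one Beta(successes+1, failures+1) posterior per arm
wins, losses = np.zeros(n_arms), np.zeros(n_arms)
for t in range(n_steps):
    a = int(np.argmax(rng.beta(wins + 1, losses + 1)))  # draw from each posterior
    r = rng.random() < true_probs[a]
    wins[a] += r
    losses[a] += 1 - r

print("eps-greedy pulls:", counts, " Thompson pulls:", wins + losses)
```

Both strategies concentrate their pulls on the best arm over time; ε-greedy keeps exploring at a fixed rate, while Thompson sampling explores less as the posteriors sharpen.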
15. Disadvantages
Reaching significance for non-winning arms takes longer
Unclear stopping criteria
Hard to order non-winning arms and reliably assess their impact
Advantages
Reaching significance for the winning arm is faster
Best arm can change over time
There are no false positives in the long term
19. Contextual Multi-Armed Bandits
We introduce a notion of proximity or similarity between arms, by associating each arm with context vectors:
$A \to \{x_{A,1}, x_{A,2}, x_{A,3}, \ldots\}$
$B \to \{x_{B,1}, x_{B,2}, x_{B,3}, \ldots\}$
20. LinUCB
L. Li, W. Chu, J. Langford, R. E. Schapire, "A Contextual-Bandit Approach to Personalized News Article Recommendation", WWW, 2010.
The UCB is some expectation plus some confidence level:
$\mu_a(t) + \sigma_a(t)$
We assume there is some unknown vector $\theta^*$, the same for each arm, for which:
$E[r_{a,t} \mid x_{a,t}] = x_{a,t}^\top \theta^*$
21. $E[r_{a,t} \mid x_{a,t}] = x_{a,t}^\top \theta^*$
$\mu_a(t) + \sigma_a(t)$
Using least squares:
$X_t := \{x_{a(1),1}, x_{a(2),2}, \ldots, x_{a(t),t}\}^\top$
$y_t := \{r_{a(1),1}, r_{a(2),2}, \ldots, r_{a(t),t}\}^\top$
$C_t := X_t^\top X_t$
$\hat{\theta}_t := C_t^{-1} X_t^\top y_t$
$\hat{\mu}_a(t) := x_{a,t}^\top \hat{\theta}_t$
so that
$\hat{\mu}_a(t) = x_{a,t}^\top C_t^{-1} X_t^\top y_t$
22. The upper confidence bound is some expectation plus some confidence level:
$\mu_a(t) + \sigma_a(t)$
$\hat{\mu}_a(t) := x_{a,t}^\top C_t^{-1} X_t^\top y_t$
$\hat{\sigma}_a(t) := \sqrt{x_{a,t}^\top C_t^{-1} x_{a,t}}$
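Putting the formulas above together, here is a minimal numpy sketch of LinUCB. The ridge term `lam` (added so $C_t$ is invertible from the first step) and the simulated linear-reward environment are assumptions, not part of the original slides or paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_arms, n_steps, lam = 5, 4, 2000, 1.0
theta_star = rng.normal(size=d)          # the unknown theta*, shared by all arms

# C accumulates X^T X (plus a small ridge); b accumulates X^T y
C = lam * np.eye(d)
b = np.zeros(d)

for t in range(n_steps):
    contexts = rng.normal(size=(n_arms, d))          # x_{a,t} for each arm
    C_inv = np.linalg.inv(C)
    theta_hat = C_inv @ b                            # theta_hat = C^{-1} X^T y
    mu_hat = contexts @ theta_hat                    # mu_hat_a = x^T theta_hat
    sigma_hat = np.sqrt(np.einsum("ad,dk,ak->a", contexts, C_inv, contexts))
    a = int(np.argmax(mu_hat + sigma_hat))           # UCB: expectation + confidence
    x = contexts[a]
    r = x @ theta_star + rng.normal(scale=0.1)       # noisy linear reward
    C += np.outer(x, x)                              # C_t = X_t^T X_t (+ ridge)
    b += r * x                                       # X_t^T y_t

print("theta*   ", np.round(theta_star, 2))
print("theta_hat", np.round(np.linalg.solve(C, b), 2))
```

The confidence width shrinks in directions of the context space that have been sampled often, so the algorithm keeps exploring directions it is still uncertain about.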
23. L. Li, W. Chu, J. Langford, R. E. Schapire, "A Contextual-Bandit Approach to Personalized News Article Recommendation", WWW, 2010.
25. • How can we locate the city of Bristol from tweets?
• 10K candidate locations organised in a 100×100 grid
• At every step we get tweets from one location and count mentions of "Bristol"
• Challenge: find the target in sub-linear time complexity!
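The slides show this demo as figures; as a toy stand-in, the setup could be simulated like this. The Poisson reward model and its decay scale are invented for illustration; only the 100×100 grid of candidate locations comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
side = 100                                  # 100x100 grid -> 10K candidate locations
grid = np.stack(np.meshgrid(np.arange(side), np.arange(side)), -1).reshape(-1, 2)
target = grid[rng.integers(len(grid))]      # the unknown location of Bristol

def pull(location):
    """Simulated feedback: number of 'Bristol' mentions in tweets from this
    location, modelled as a Poisson count decaying with distance to the target."""
    dist = np.linalg.norm(location - target)
    return rng.poisson(10.0 * np.exp(-dist / 10.0))

print(pull(target), pull(grid[0]))          # many mentions at the target, few far away
```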
27. The kernel trick! (No, it's not just for SVMs.)
John Shawe-Taylor & Nello Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
28. LinUCB:
$\hat{\mu}_a(t) := x_{a,t}^\top \hat{\theta}_t$
$\hat{\sigma}_a(t) := \sqrt{x_{a,t}^\top C_t^{-1} x_{a,t}}$
$C_t := X_t^\top X_t$

KernelUCB:
$\hat{\mu}_a(t) = k_{x,t}^\top K_t^{-1} y_t$
$\hat{\sigma}_a(t) = \sqrt{k_{x,t}^\top K_t^{-2} k_{x,t}}$
$K_t = X_t X_t^\top$

M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, "Finite-Time Analysis of Kernelised Contextual Bandits", UAI, 2013.
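A minimal sketch of these KernelUCB updates on a toy version of the grid search. The grid size, RBF bandwidth, regularisation `eps` (added so $K_t$ is invertible), and the simulated reward are all assumptions; this is not the CompLACS implementation linked later.

```python
import numpy as np

rng = np.random.default_rng(3)
side = 30                                     # toy 30x30 grid (instead of 100x100), for speed
grid = np.stack(np.meshgrid(np.arange(side), np.arange(side)), -1).reshape(-1, 2).astype(float)
target = grid[rng.integers(len(grid))]

def rbf(A, B, bw=5.0):
    """RBF kernel matrix between rows of A and rows of B (bandwidth is an assumption)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw**2))

X, y = [], []                                 # contexts pulled so far, observed rewards
eps = 1e-6                                    # small ridge so K_t is invertible
for t in range(150):
    if not X:
        a = rng.integers(len(grid))           # first pull: arbitrary arm
    else:
        Xt = np.array(X)
        K_inv = np.linalg.inv(rbf(Xt, Xt) + eps * np.eye(len(Xt)))
        k = rbf(grid, Xt)                     # k_{x,t} for every candidate arm at once
        mu_hat = k @ K_inv @ np.array(y)      # mu_hat = k^T K^{-1} y
        sigma_hat = np.sqrt(((k @ K_inv) ** 2).sum(1))  # k^T K^{-2} k, since K^{-1} is symmetric
        a = int(np.argmax(mu_hat + sigma_hat))
    x = grid[a]
    reward = np.exp(-np.linalg.norm(x - target) / 5.0) + rng.normal(scale=0.01)
    X.append(x); y.append(reward)

print("target:", target, " last pull:", X[-1])
```

The only change from LinUCB is that inner products of contexts are replaced by kernel evaluations, so arms can be "similar" in a nonlinear sense, e.g. nearby cells of the grid.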
29. • The last few steps of the algorithm before it locates Bristol.
• KernelUCB with an RBF kernel converges after ~300 iterations (instead of >>10K).
30. Target is the red dot.
We locate it using KernelUCB with RBF kernel.
KernelUCB code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
31. What if we have a high-dimensional space?
Hashing trick
Implementation in Vowpal Wabbit, by J. Langford, et al.
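The idea of the hashing trick, in a minimal form: hash each raw feature into a fixed-size vector instead of maintaining an explicit vocabulary. The dimension and feature strings below are made up, and Python's built-in `hash` stands in for the fast dedicated hash (MurmurHash) that Vowpal Wabbit actually uses.

```python
import numpy as np

def hash_features(tokens, dim=2**10):
    """Map arbitrarily many string features into a fixed-size vector.
    Collisions are tolerated; dim trades memory for collision rate."""
    v = np.zeros(dim)
    for tok in tokens:
        h = hash(tok)                     # illustration only; use a stable hash in practice
        sign = 1 if (h >> 31) & 1 == 0 else -1   # signed hashing reduces collision bias
        v[h % dim] += sign
    return v

x = hash_features(["word=bristol", "user=alice", "hour=17"])
print(x.shape)                            # fixed dimension, regardless of vocabulary size
```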
33. References
M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of
Kernelised Contextual Bandits”, UAI, 2013.
L. Li, W. Chu, J. Langford, R. E. Schapire, “A Contextual-Bandit Approach to
Personalized News Article Recommendation”, WWW, 2010.
John Shawe-Taylor & Nello Cristianini, "Kernel Methods for Pattern Analysis", Cambridge University Press, 2004.
Implementation of KernelUCB in the CompLACS toolkit:
http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
https://en.wikipedia.org/wiki/Multi-armed_bandit
https://github.com/JohnLangford/vowpal_wabbit/wiki/Contextual-Bandit-Example
34. Thank you - We are hiring!
Dr Ilias Flaounas
Senior Data Scientist
<first>.<last>@atlassian.com