Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Examples include large datasets containing sequential data that capture document dynamics, and modern IR systems that observe user dynamics through interactivity. Existing IR techniques are limited in their ability to optimize over changes, learn with a minimal computational footprint, and remain responsive and adaptive.
The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. Dynamic IR Modeling is the statistical modeling of IR systems that can adapt to change. It is a natural follow-up to previous statistical IR modeling tutorials with a fresh look on state-of-the-art dynamic retrieval models and their applications including session search and online advertising. The tutorial covers techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and presents to fellow researchers and practitioners a handful of useful algorithms and tools for solving IR problems incorporating dynamics.
http://www.dynamic-ir-modeling.org/
A newer version of this tutorial presented at WSDM 2015 can be found here http://www.slideshare.net/marcCsloan/dynamic-information-retrieval-tutorial-wsdm-2015
This SIGIR 2014 version places a greater emphasis on the underlying theory and features a guest lecture on evaluation by Dr. Emine Yilmaz. The newer WSDM 2015 version presents a wider range of applications of DIR in state-of-the-art research and includes a guest lecture on evaluation by Prof. Charles Clarke.
@inproceedings{Yang:2014:DIR:2600428.2602297,
author = {Yang, Hui and Sloan, Marc and Wang, Jun},
title = {Dynamic Information Retrieval Modeling},
booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
series = {SIGIR '14},
year = {2014},
isbn = {978-1-4503-2257-7},
location = {Gold Coast, Queensland, Australia},
pages = {1290--1290},
numpages = {1},
url = {http://doi.acm.org/10.1145/2600428.2602297},
doi = {10.1145/2600428.2602297},
acmid = {2602297},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}
4. Dynamic Information Retrieval
[Diagram: a user with an information need explores a space of documents, observing some of them.]
Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
5. Evolving IR
Paradigm shifts in IR as new models emerge
e.g. VSM → BM25 → Language Model
Different ways of defining the relationship between query and document
Static → Interactive → Dynamic
Evolution in modeling user interaction with the search engine
6. Outline
Introduction
Static IR
Interactive IR
Dynamic IR
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
7. Conceptual Model – Static IR
[Diagram: Static IR within Interactive IR within Dynamic IR]
No feedback
8. Characteristics of Static IR
Does not learn directly from the user
Parameters updated periodically
12. Outline
Introduction
Static IR
Interactive IR
Dynamic IR
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
13. Conceptual Model – Interactive IR
[Diagram: Static IR within Interactive IR within Dynamic IR]
Exploit feedback
15. Interactive Recommender Systems
Learn the user's taste interactively!
At the same time, provide good recommendations!
16. Example - Multi Page Search
Ambiguous query
17. Example - Multi Page Search
Topic: Car
18. Example - Multi Page Search
Topic: Animal
19. Example – Interactive Search
Click on 'car' webpage
20. Example – Interactive Search
Click on 'Next Page'
21. Example – Interactive Search
Page 2 results: Cars
22. Example – Interactive Search
Click on 'animal' webpage
23. Example – Interactive Search
Page 2 results: Animals
24. Example – Dynamic Search
Topic: Guitar
25. Example – Dynamic Search
Diversified Page 1
Topics: Cars, animals, guitars
26. Toy Example
Multi-page search scenario
User image-searches for "jaguar"
Rank two of the four results over two pages:
r = 0.5, r = 0.51, r = 0.9, r = 0.49
27. Toy Example – Static Ranking
Ranked according to the PRP
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49
28. Toy Example – Relevance Feedback
Interactive search
Improve the 2nd page based on feedback from the 1st page
Use clicks as relevance feedback
Rocchio¹ algorithm on terms in the image webpage:
w_q' = α w_q + (β/|D_r|) Σ_{d∈D_r} w_d − (γ/|D_n|) Σ_{d∈D_n} w_d
The new query is closer to relevant documents and further from non-relevant documents
¹Rocchio, J. J., '71; Baeza-Yates & Ribeiro-Neto, '99
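A minimal sketch of this Rocchio update over bag-of-words weight vectors (the coefficient values and the toy vectors below are illustrative assumptions, not values from the tutorial):
```python
from collections import Counter

def rocchio(query_vec, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio update: move the query toward relevant docs, away from non-relevant ones."""
    new_q = Counter({t: alpha * w for t, w in query_vec.items()})
    for docs, coeff in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for t, w in doc.items():
                new_q[t] += coeff * w / len(docs)
    # negative term weights are usually clipped to zero
    return {t: w for t, w in new_q.items() if w > 0}

# toy usage: the click on the car page acts as positive feedback
q = {"jaguar": 1.0}
clicked = [{"jaguar": 1.0, "car": 0.8, "engine": 0.5}]
print(rocchio(q, clicked, []))
```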
29. Toy Example – Relevance Feedback
Ranked according to the PRP and Rocchio
Page 1 ranked as before: 1. r = 0.9, 2. r = 0.51; * marks the clicked result
Page 2 is re-ranked toward results similar to the click
30. Toy Example – Relevance Feedback
No click when searching for animals
Page 1 as before: 1. r = 0.9, 2. r = 0.51
Page 2: ? — with no feedback, relevance feedback gives no guidance
31. Toy Example – Value Function
Optimize both pages using dynamic IR
Bellman equation for the value function, simplified example:
V_t(θ_t, Σ_t) = max_{s_t} [ θ_{s_t} + E( V_{t+1}(θ_{t+1}, Σ_{t+1}) | C_t ) ]
θ_t, Σ_t = relevance and covariance of documents for page t
C_t = clicks on page t
V_t = 'value' of the ranking on page t
Maximize value over all pages based on estimating feedback
32. Toy Example - Covariance
Covariance matrix represents similarity between the four images:
1    0.8  0.1  0
0.8  1    0.1  0
0.1  0.1  1    0.95
0    0    0.95 1
33. Toy Example – Myopic Value
For the myopic ranking, V₂ = 16.380
[Page 1 ranking shown]
34. Toy Example – Myopic Ranking
Page 2 ranking stays the same regardless of clicks
35. Toy Example – Optimal Value
For the optimal ranking, V₂ = 16.528
[Page 1 ranking shown]
36. Toy Example – Optimal Ranking
If the car is clicked, the Jaguar logo is more relevant on the next page
37. Toy Example – Optimal Ranking
In all other scenarios, rank the animal first on the next page
38. Interactive vs Dynamic IR
Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback received
Dynamic:
• Optimizes over all interaction
• Long-term gains
• Models future user feedback
• Also used at the beginning of interaction
39. Outline
Introduction
Static IR
Interactive IR
Dynamic IR
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
40. Conceptual Model – Dynamic IR
[Diagram: Static IR within Interactive IR within Dynamic IR]
Explore and exploit feedback
41. Characteristics of Dynamic IR
Rich interactions
Query formulation
Document clicks
Document examination
eye movement
mouse movements
etc.
42. Characteristics of Dynamic IR
Temporal dependency
[Diagram: at each iteration i, the information need I drives a query qᵢ, which yields ranked documents Dᵢ and clicked documents Cᵢ, for iterations 1, 2, …, n]
43. Characteristics of Dynamic IR
Overall goal
Optimize over all iterations for the goal
IR metric or user satisfaction
Optimal policy
44. Dynamic IR
Dynamic IR explores actions
Dynamic IR learns from the user and adjusts its actions
May hurt performance in a single stage, but improves over all stages
45. Applications to IR
Dynamics are found in many different aspects of IR
Dynamic Users: users change behaviour over time; user history
Dynamic Documents: information filtering; document content change
Dynamic Queries: changing query definitions, e.g. 'Twitter'
Dynamic Information Needs: topic ontologies evolve over time
Dynamic Relevance: seasonal/time-of-day change in relevance
46. User Interactivity in DIR
Modern IR interfaces: facets, verticals
Personalization: responsive to a particular user; complex log data
Mobile: richer user interactions
Ads: adaptive targeting
47. Big Data
Data set sizes are always increasing
Computational footprint of learning to rank
Rich, sequential data
Example: a complex user behaviour model found in log data, taking into account reading, skipping and re-reading behaviours; uses a POMDP¹
¹Yin He et al., '11
48. Online Learning to Rank
Learning to rank iteratively on sequential data
Clicks as implicit user feedback/preference
Often uses multi-armed bandit techniques
Examples: uses click models to interpret clicks and a contextual bandit to improve learning¹; pairwise comparison of rankings using a dueling bandits formulation²
¹Katja Hofmann et al., '11
²Yisong Yue et al., '09
49. Evaluation
Use complex user interaction data to assess rankings
Compare ranking techniques in online testing
Minimise user dissatisfaction
Examples: modelled cursor activity and correlated it with eye tracking to validate good or bad abandonment¹; interleave search results from two ranking algorithms to determine which is better²
¹Jeff Huang et al., '11
²Olivier Chapelle et al., '12
50. Filtering and News
Adaptive techniques to personalize information filtering or news recommendation
Understand the complex dynamics of real-world events in search logs
Capture temporal document change¹
Examples: uses relevance feedback to adapt threshold sensitivity over time in information filtering to maximise overall utility²; detected patterns and memes in news cycles and modeled how information spreads³
¹Dennis Fetterly et al., '03
²Stephen Robertson, '02
³Jure Leskovec et al., '09
51. Advertising
Behavioural targeting and personalized ads
Learn when to display new ads
Maximise profit from available ads
Examples: uses a POMDP and ad correlation to find the optimal ad to display to a user¹; a dynamic click model that can interpret complex user behaviour in logs and apply the results to tail queries and unseen ads²
¹Shuai Yuan et al., '12
²Zeyuan Allen Zhu et al., '10
52. Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
53. Outline
Introduction
Theory and Models
Why not use supervised learning
Markov Models
Session Search
Reranking
Evaluation
54. Why not use Supervised Learning for Dynamic IR Modeling?
Lack of enough training data
Dynamic IR problems contain a sequence of dynamic interactions
E.g. a series of queries in a session
Rare to find repeated sequences (close to zero)
Even in large query logs (WSCD 2013 & 2014, query logs from Yandex)
The chance of finding repeated adjacent query pairs is also low:
Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%
55. Our Solution
Try to find an optimal solution through a sequence of dynamic interactions
Trial and error: learn from repeated, varied attempts which are continued until success
No supervised learning
57. Recap – Characteristics of Dynamic IR
Rich interactions
Query formulation, document clicks, document examination, eye movements, mouse movements, etc.
Temporal dependency
Overall goal
58. What is a Desirable Model for Dynamic IR?
Model interactions, which means it needs to have placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies;
Represent Markov properties to handle the temporal dependency.
A model in a trial-and-error setting will do!
A Markov Model will do!
59. Outline
Introduction
Theory and Models
Why not use supervised learning
Markov Models
Session Search
Reranking
Evaluation
60. Markov Process
Markov Property¹ (the "memoryless" property)
The next state of a system depends only on its current state:
Pr(Sᵢ₊₁|Sᵢ,…,S₀) = Pr(Sᵢ₊₁|Sᵢ)
Markov Process
A stochastic process with the Markov property, e.g.
s₀ → s₁ → … → sᵢ → sᵢ₊₁ → …
¹A. A. Markov, '06
61. Family of Markov Models
Markov Chain
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-armed Bandit
62. Markov Chain (S, M)
Discrete-time Markov process
State S – a web page; transition probability M
Example: Google PageRank¹
[Diagram: pages A–E, each with its PageRank, connected by links]
PageRank(S) = (1 − α)/N + α Σ_{Y∈Π} PageRank(Y)/L(Y)
N = # of pages; L(Y) = # of outlinks of Y; Π = pages linking to S; the (1 − α)/N term is the random jump factor
PageRank: how likely a random web surfer will land on a page
The stable-state distribution of such an MC is PageRank
¹L. Page et al., '99
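A short power-iteration sketch of this stationary distribution, assuming a toy link graph (the graph itself is an illustrative assumption, not from the slides):
```python
def pagerank(links, alpha=0.85, iters=50):
    """Power iteration for PageRank(S) = (1-alpha)/N + alpha * sum of PR(Y)/L(Y) over in-links Y."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {}
        for p in pages:
            inflow = sum(pr[y] / len(links[y]) for y in pages if p in links[y])
            new[p] = (1 - alpha) / n + alpha * inflow
        pr = new
    return pr

# toy graph: A and B link to each other, both link to C, C links back to A
print(pagerank({"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}))
```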
63. Hidden Markov Model (S, M, O, e)
A Markov chain whose states are hidden; observable symbols are emitted with some probability according to the states¹.
[Diagram: hidden states s₀ → s₁ → s₂ → … with transition probabilities pᵢ, each emitting an observation oᵢ with emission probability eᵢ]
sᵢ – hidden state; pᵢ – transition probability; oᵢ – observation; eᵢ – observation probability (emission probability)
¹Leonard E. Baum et al., '66
64. An HMM example for IR
Construct an HMM for each document¹
[Diagram: states s₀, s₁, s₂, … emit the query terms t₀, t₁, t₂]
sᵢ – "Document" or "General English"; pᵢ – a₀ or a₁; tᵢ – query term; eᵢ – P(t|D) or P(t|GE)
Document-to-query relevance:
P(D|q) ∝ Π_{t∈q} ( a₀ P(t|GE) + a₁ P(t|D) )
¹Miller et al., '99
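A minimal sketch of this two-state scoring rule (the toy corpus and the mixture weights a₀, a₁ are illustrative assumptions):
```python
import math

def hmm_score(query, doc_tokens, corpus_tokens, a1=0.8):
    """log P(D|q) up to a constant: product over query terms of the two-state mixture."""
    a0 = 1.0 - a1
    score = 0.0
    for t in query:
        p_doc = doc_tokens.count(t) / len(doc_tokens)        # P(t|D), "Document" state
        p_ge = corpus_tokens.count(t) / len(corpus_tokens)   # P(t|GE), "General English" state
        score += math.log(a0 * p_ge + a1 * p_doc + 1e-12)    # small floor avoids log(0)
    return score

corpus = "jaguar car speed jaguar animal jungle".split()
print(hmm_score(["jaguar", "car"], "jaguar car speed".split(), corpus))
```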
65. Markov Decision Process (S, M, A, R, γ)
An MDP extends an MC with actions and rewards¹
[Diagram: s₀ →(a₀, r₀)→ s₁ →(a₁, r₁)→ s₂ →(a₂, r₂)→ s₃ → …]
sᵢ – state; aᵢ – action; rᵢ – reward; pᵢ – transition probability
¹R. Bellman, '57
66. Definition of MDP
A tuple (S, M, A, R, γ)
S: state space
M: transition matrix, M_a(s, s') = P(s'|s, a)
A: action space
R: reward function, R(s, a) = immediate reward for taking action a at state s
γ: discount factor, 0 < γ ≤ 1
Policy π: π(s) = the action taken at state s
The goal is to find an optimal policy π* maximizing the expected total reward.
67. Policy
Policy: π(s) = a
According to which, select an action a at state s.
π(s₀) = move right and up
π(s₁) = move right and up
π(s₂) = move right
[Slide altered from Carlos Guestrin's ML lecture]
68. Value of Policy
Value: Vπ(s) — the expected long-term reward starting from s
Start from s₀, taking actions π(s₀), π(s₁), …, branching over the possible next states s₁, s₁', s₁'', …:
Vπ(s₀) = E[R(s₀) + γR(s₁) + γ²R(s₂) + γ³R(s₃) + γ⁴R(s₄) + …]
Future rewards discounted by γ ∈ [0,1)
[Slide altered from Carlos Guestrin's ML lecture]
71. Computing the value of a policy
Vπ(s₀) = E_π[R(s₀,a) + γR(s₁,a) + γ²R(s₂,a) + γ³R(s₃,a) + …]
       = E_π[R(s₀,a) + γ Σ_{t=1}^{∞} γ^{t−1} R(s_t,a)]
       = R(s₀,a) + γ E_π[Σ_{t=1}^{∞} γ^{t−1} R(s_t,a)]
       = R(s₀,a) + γ Σ_{s'} M_{π(s)}(s, s') Vπ(s')
The value function at the current state is the immediate reward plus the discounted expected value over possible next states s'.
72. Optimality — Bellman Equation
The Bellman equation¹ for an MDP is a recursive definition of the optimal value function V*(·) (the state-value function):
V*(s) = max_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V*(s') ]
Optimal policy:
π*(s) = argmax_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V*(s') ]
¹R. Bellman, '57
73. Optimality — Bellman Equation
The Bellman equation can be rewritten using the action-value function Q:
V*(s) = max_a Q(s,a)
Q(s,a) = R(s,a) + γ Σ_{s'} M_a(s,s') V*(s')
Optimal policy:
π*(s) = argmax_a Q(s,a)
This is the relationship between V and Q.
74. MDP algorithms
Model-based approaches (solve the Bellman equation):
Value Iteration
Policy Iteration
Modified Policy Iteration
Prioritized Sweeping
Model-free approaches:
Temporal Difference (TD) Learning
Q-Learning
All yield the optimal value V*(s) and the optimal policy π*(s)
[Bellman, '57; Howard, '60; Puterman and Shin, '78; Singh & Sutton, '96; Sutton & Barto, '98; Richard Sutton, '88; Watkins, '92]
[Slide altered from Carlos Guestrin's ML lecture]
75. Value Iteration¹
Initialization: initialize V₀(s) arbitrarily
Loop (iteration):
V_{i+1}(s) ← max_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V_i(s') ]
π(s) ← argmax_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V_i(s') ]
Stopping criterion: π(s) is good enough
¹Bellman, '57
77. Greedy Value Iteration
Algorithm
1. For each state s ∈ S: initialize V₀(s) arbitrarily
2. i ← 0
3. Repeat
   3.1 i ← i + 1
   3.2 For each s ∈ S:
       V_i(s) ← max_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V_{i−1}(s') ]
   until ∀s |V_i(s) − V_{i−1}(s)| < ε
4. For each s ∈ S:
   π(s) ← argmax_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V_i(s') ]
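A runnable sketch of this loop on a tiny, made-up two-state MDP (the transition and reward numbers are illustrative assumptions):
```python
def value_iteration(states, actions, M, R, gamma=0.9, eps=1e-6):
    """Greedy value iteration: sweep all states until values change by less than eps."""
    V = {s: 0.0 for s in states}
    while True:
        new_V = {s: max(R[s][a] + gamma * sum(M[s][a][s2] * V[s2] for s2 in states)
                        for a in actions) for s in states}
        if all(abs(new_V[s] - V[s]) < eps for s in states):
            break
        V = new_V
    # final greedy policy extraction, as in step 4
    policy = {s: max(actions, key=lambda a: R[s][a] + gamma *
                     sum(M[s][a][s2] * V[s2] for s2 in states)) for s in states}
    return V, policy

# toy 2-state, 2-action MDP
S, A = ["s0", "s1"], ["stay", "go"]
M = {"s0": {"stay": {"s0": 1.0, "s1": 0.0}, "go": {"s0": 0.2, "s1": 0.8}},
     "s1": {"stay": {"s0": 0.0, "s1": 1.0}, "go": {"s0": 0.9, "s1": 0.1}}}
R = {"s0": {"stay": 0.0, "go": 1.0}, "s1": {"stay": 2.0, "go": 0.0}}
print(value_iteration(S, A, M, R))
```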
81. Policy Iteration
Algorithm
1. For each state s ∈ S: V(s) ← 0, π₀(s) ← arbitrary policy; i ← 0
2. Repeat
   2.1 (Policy evaluation) Repeat
       For each s ∈ S:
           V'(s) ← V(s)
           V(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V(s')
       until ∀s |V(s) − V'(s)| < ε
   2.2 (Policy improvement) For each s ∈ S:
       π_{i+1}(s) ← argmax_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V(s') ]
   2.3 i ← i + 1
   until π_i = π_{i−1}
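For contrast with value iteration, a sketch of this evaluate-then-improve loop, reusing the toy MDP structure from the previous example (same illustrative assumptions):
```python
def policy_iteration(states, actions, M, R, gamma=0.9, eps=1e-6):
    """Alternate full policy evaluation with greedy policy improvement until stable."""
    V = {s: 0.0 for s in states}
    policy = {s: actions[0] for s in states}
    while True:
        # policy evaluation: iterate the fixed-policy Bellman equation to convergence
        while True:
            new_V = {s: R[s][policy[s]] + gamma *
                     sum(M[s][policy[s]][s2] * V[s2] for s2 in states) for s in states}
            done = all(abs(new_V[s] - V[s]) < eps for s in states)
            V = new_V
            if done:
                break
        # policy improvement: act greedily with respect to the evaluated V
        new_policy = {s: max(actions, key=lambda a: R[s][a] + gamma *
                             sum(M[s][a][s2] * V[s2] for s2 in states)) for s in states}
        if new_policy == policy:
            return V, policy
        policy = new_policy
```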
82. Modified Policy Iteration
The "policy evaluation" step in Policy Iteration is time-consuming, especially when the state space is large.
Modified Policy Iteration computes an approximate policy evaluation by running just k iterations:
Greedy Value Iteration (k = 1) ← Modified Policy Iteration → Policy Iteration (k = ∞)
83. Modified Policy Iteration
Algorithm
1. For each state s ∈ S: V(s) ← 0, π₀(s) ← arbitrary policy; i ← 0
2. Repeat
   2.1 Repeat k times
       For each s ∈ S:
           V(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V(s')
   2.2 For each s ∈ S:
       π_{i+1}(s) ← argmax_a [ R(s,a) + γ Σ_{s'} M_a(s,s') V(s') ]
   2.3 i ← i + 1
   until π_i = π_{i−1}
84. MDP algorithms (recap of slide 74)
85. Temporal Difference Learning
Monte Carlo sampling can be used for model-free policy iteration:
estimate Vπ(s) in "policy evaluation" by the average reward of trajectories from s.
However, some of the trajectories can be reused, so we estimate via an expectation over the next state:
Vπ(s) ← r + γ E[Vπ(s') | s, a]
The simplest estimate: Vπ(s) ← r + γ Vπ(s')
A smoothed version: Vπ(s) ← α(r + γ Vπ(s')) + (1 − α) Vπ(s)
TD-learning rule: Vπ(s) ← Vπ(s) + α( r + γ Vπ(s') − Vπ(s) )
r is the immediate reward, α is the learning rate, and r + γVπ(s') − Vπ(s) is the temporal difference.
[Richard Sutton, '88; Singh & Sutton, '96; Sutton & Barto, '98]
86. Temporal Difference Learning
Algorithm
1. For each state s ∈ S: initialize Vπ(s) arbitrarily
2. For each episode in the state sequence
   2.1 Initialize s
   2.2 Repeat
       2.2.1 take action a at state s according to π
       2.2.2 observe the immediate reward r and the next state s'
       2.2.3 Vπ(s) ← Vπ(s) + α( r + γ Vπ(s') − Vπ(s) )
       2.2.4 s ← s'
       until s is a terminal state
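A compact sketch of this TD(0) evaluation rule on simulated episodes (the random-walk environment and the parameter values are illustrative assumptions):
```python
import random

def td0_evaluate(episodes, alpha=0.1, gamma=0.9):
    """TD(0): after each observed (s, r, s') step, nudge V(s) toward r + gamma*V(s')."""
    V = {}
    for episode in episodes:
        for s, r, s_next in episode:
            v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
            V[s] = V.get(s, 0.0) + alpha * (r + gamma * v_next - V.get(s, 0.0))
    return V

def random_walk_episode(n=5):
    """Walk on states 0..n-1 under a fixed random policy; reward 1 on reaching the end."""
    s, steps = 0, []
    while s < n - 1:
        s_next = s + random.choice([-1, 1]) if s > 0 else s + 1
        steps.append((s, 1.0 if s_next == n - 1 else 0.0,
                      None if s_next == n - 1 else s_next))
        s = s_next
    return steps

print(td0_evaluate([random_walk_episode() for _ in range(2000)]))
```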
88. Q-Learning
Algorithm
1. For each s ∈ S and a ∈ A: initialize Q₀(s,a) arbitrarily
2. i ← 0
3. For each episode in the state sequence
   3.1 Initialize s
   3.2 Repeat
       3.2.1 i ← i + 1
       3.2.2 select an action a at state s according to Q_{i−1}
       3.2.3 take action a; observe the immediate reward r and the next state s'
       3.2.4 Q_i(s,a) ← Q_{i−1}(s,a) + α( r + γ max_{a'} Q_{i−1}(s',a') − Q_{i−1}(s,a) )
       3.2.5 s ← s'
       until s is a terminal state
4. For each s ∈ S: π(s) ← argmax_a Q_i(s,a)
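A sketch of tabular Q-learning with ε-greedy action selection (the environment interface, the chain environment and the parameter values are illustrative assumptions):
```python
import random
from collections import defaultdict

def q_learning(env_step, start_state, actions, episodes=200,
               alpha=0.1, gamma=0.9, epsilon=0.3):
    """Tabular Q-learning; env_step(s, a) must return (reward, next_state_or_None)."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = start_state
        while s is not None:
            # epsilon-greedy: explore with probability epsilon, otherwise exploit Q
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            r, s_next = env_step(s, a)
            best_next = 0.0 if s_next is None else max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return dict(Q)

# tiny chain environment: states 0..3, move left/right, reward on reaching state 3
def step(s, a):
    s2 = min(s + 1, 3) if a == "right" else max(s - 1, 0)
    return (1.0, None) if s2 == 3 else (0.0, s2)

print(q_learning(step, 0, ["left", "right"]))
```
The learned policy is then π(s) = argmax_a Q(s, a), as in step 4 above.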
89. Apply an MDP to an IR Problem
We can model IR systems using a Markov Decision Process:
Is there a temporal component?
States – What changes with each time step?
Actions – How does your system change the state?
Rewards – How do you measure feedback or effectiveness in your problem at each time step?
Transition probability – Can you determine this? If not, a model-free approach is more suitable.
90. Apply an MDP to an IR Problem - Example
User agent in session search
States – the user's relevance judgement
Actions – new queries
Rewards – information gained
91. Apply an MDP to an IR Problem - Example
Search engine's perspective
What if we can't directly observe the user's relevance judgement?
Click ≠ relevance
92. Family of Markov Models
Markov Chain
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-armed Bandit
93. POMDP Model¹
[Diagram: hidden states s₀ → s₁ → s₂ → s₃ → … with actions a₀, a₁, a₂ and rewards r₀, r₁, r₂; each hidden state emits an observation o₁, o₂, o₃, and the agent maintains a belief over the hidden states]
¹R. D. Smallwood et al., '73
94. POMDP Definition
A tuple (S, M, A, R, γ, O, Θ, B)
S: state space
M: transition matrix
A: action space
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observation set
An observation is a symbol emitted according to a hidden state.
Θ: observation function
Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a).
B: belief space
A belief is a probability distribution over the hidden states.
95. POMDP → Belief Update
The agent uses a state estimator to update its belief about the hidden states:
b' = SE(b, a, o')
b'(s') = P(s' | o', a, b) = P(s', o' | a, b) / P(o' | a, b)
       = Θ(s', a, o') Σ_s M(s, a, s') b(s) / P(o' | a, b)
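A small sketch of this state estimator for a discrete POMDP (the two-state numbers in the usage lines are illustrative assumptions):
```python
def belief_update(b, a, o, M, Theta, states):
    """b'(s') proportional to Theta(s',a,o) * sum_s M(s,a,s') * b(s); then normalize."""
    unnorm = {s2: Theta[s2][a][o] * sum(M[s][a][s2] * b[s] for s in states)
              for s2 in states}
    z = sum(unnorm.values())  # z = P(o' | a, b), the normalizer
    return {s2: v / z for s2, v in unnorm.items()}

# two hidden states, one action 'a', one observation 'o'
states = ["rel", "nonrel"]
M = {"rel": {"a": {"rel": 0.9, "nonrel": 0.1}},
     "nonrel": {"a": {"rel": 0.2, "nonrel": 0.8}}}
Theta = {"rel": {"a": {"o": 0.7}}, "nonrel": {"a": {"o": 0.1}}}
print(belief_update({"rel": 0.5, "nonrel": 0.5}, "a", "o", M, Theta, states))
```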
96. POMDP → Bellman Equation
The Bellman equation for a POMDP:
V(b) = max_a [ r(b, a) + γ Σ_{o'} P(o' | a, b) V(b') ]
A POMDP can be transformed into a continuous belief MDP (B, M', A, r, γ):
B: the continuous belief space
M': transition function M'_a(b, b') = Σ_{o'∈O} 1_{a,o'}(b', b) P(o' | a, b),
where 1_{a,o'}(b', b) = 1 if SE(b, a, o') = b', and 0 otherwise
A: action space
r: reward function r(b, a) = Σ_{s∈S} b(s) R(s, a)
97. Solving POMDPs – The Witness Algorithm¹
The optimal policy of a POMDP = the optimal policy of its belief MDP
A variation of the value iteration algorithm
¹L. Kaelbling et al., '98
98. Policy Tree
• A policy tree of depth i is an i-step non-stationary policy
• As if we ran value iteration until the i-th iteration
[Diagram: the root action a(h) with i steps to go; each observation o₁ … o_l leads to a subtree with i−1 steps to go, down to a single step]
99. Value of a Policy Tree
We can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state.
V_h(b) = Σ_{s∈S} b(s) V_h(s)
V_h(s) = R(s, a(h)) + γ Σ_{s'∈S} M_{a(h)}(s, s') Σ_{o_k∈O} Θ(s', a(h), o_k) V_{o_k(h)}(s')
a(h) is the action at the root node of h; o_k(h) is the (i−1)-step subtree associated with o_k under the root node of h.
100. Idea of the Witness Algorithm
For each action a, compute Γᵢᵃ, the set of candidate i-step policy trees with action a at their roots.
The optimal value function at the i-th step, Vᵢ*(b), is the upper surface of the value functions of all i-step policy trees.
101. Optimal value function
Vᵢ*(b) = max_{h∈H} V_h(b)
Geometrically, Vᵢ*(b) is piecewise linear and convex.
[Figure: an example for a two-state POMDP. The simplex constraint b(s₁) + b(s₂) = 1 makes the belief space one-dimensional; the value functions V_h1(b), …, V_h5(b) are lines and the upper surface is Vᵢ*. Dominated trees can be pruned from the set of policy trees.]
102. Outline of the Witness Algorithm
Algorithm
1. H₁ ← {}
2. i ← 1
3. Repeat
   3.1 i ← i + 1
   3.2 For each a in A: Γᵢᵃ ← witness(H_{i−1}, a)   (the inner loop)
   3.3 Prune ∪_a Γᵢᵃ to get Hᵢ
   until sup_b |Vᵢ(b) − V_{i−1}(b)| < ε
103. Inner Loop of the Witness Algorithm
1. Select a belief b arbitrarily. Generate a best i-step policy tree hᵢ. Add hᵢ to an agenda.
2. In each iteration
   2.1 Select a policy tree h_new from the agenda.
   2.2 Look for a witness point b using Z_a and h_new.
   2.3 If such a witness point b is found,
       2.3.1 Calculate the best policy tree h_best for b.
       2.3.2 Add h_best to Z_a.
       2.3.3 Add all the alternative trees of h_best to the agenda.
   2.4 Else remove h_new from the agenda.
3. Repeat the above iteration until the agenda is empty.
105. Applying POMDP to Dynamic IR
POMDP → Dynamic IR:
Environment: documents
Agents: user, search engine
States: queries, the user's decision-making status, relevance of documents, etc.
Actions: provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology
Observations: queries, clicks, document lists, snippets, terms, etc.
Rewards: evaluation measures (such as DCG, NDCG or MAP); clicking information
Transition matrix: given in advance or estimated from training data
Observation function: problem dependent; estimated from sample datasets
106. Session Search Example - States [J. Luo et al., '14]
S_RT (Relevant & Exploitation): e.g. scooter price → scooter stores
S_RR (Relevant & Exploration): e.g. Hartford visitors → Hartford Connecticut tourism
S_NRT (Non-Relevant & Exploitation): e.g. Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): e.g. distance New York Boston → maps.bing.com
107. Session Search Example - Actions [J. Luo et al., '14]
(A_u, A_se)
User actions (A_u):
Add query terms (+Δq)
Remove query terms (−Δq)
Keep query terms (q_theme)
Clicked documents
SAT-clicked documents
Search engine actions (A_se):
Increase/decrease/keep term weights
Switch query expansion on or off
Adjust the number of top documents used in PRF
etc.
108. Multi Page Search Example - States & Actions
State: relevance of documents
Action: ranking of documents
Observation: clicks
Belief: multivariate Gaussian
Reward: DCG over 2 pages
[Xiaoran Jin et al., '13]
109. SIGIR Tutorial, July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Exercise
110. Family of Markov Models
Markov Chain
Hidden Markov Model
Markov Decision Process
Partially Observable Markov Decision Process
Multi-Armed Bandit
111. Multi Armed Bandits (MAB)
[Figure: a row of slot machines] "Which slot machine should I select in this round?" → Reward
112. Multi Armed Bandits (MAB)
"I won! Is this the best slot machine?" → Reward
113. MAB Definition
A tuple (S, A, R, B)
S: the hidden reward distribution of each bandit
A: choose which bandit to play
R: reward for playing a bandit
B: belief space, our estimate of each bandit's distribution
114. Comparison with Markov Models
A single-state Markov Decision Process
No transition probability
Similar to a POMDP in that we maintain a belief state
Action = choose a bandit; does not affect the state
Does not 'plan ahead' but intelligently adapts
Somewhere between interactive and dynamic IR
115. Markov Multi Armed Bandits
[Figure: each slot machine k evolves as its own Markov process; "Which slot machine should I select in this round?" → Reward]
116. Markov Multi Armed Bandits
[Figure: as above, but the chosen action drives the corresponding Markov process]
117. MAB Policy Reward
An MAB algorithm describes a policy π for choosing bandits
Maximise rewards from the chosen bandits over all time steps
Minimize regret:
Regret = Σ_{t=1}^{T} [ Reward(a*) − Reward(a_{π(t)}) ]
The cumulative difference between the optimal reward and the actual reward
118. Exploration vs Exploitation
Exploration
Try out bandits to find which has the highest average reward
Too much exploration leads to poor performance
Exploitation
Play bandits that are known to pay out higher reward on average
MAB algorithms balance exploration and exploitation
Start by exploring more to find the best bandits
Exploit more as the best bandits become known
120. MAB – Index Algorithms
Gittins index¹
Play the bandit with the highest 'Dynamic Allocation Index'
Modelled using an MDP but suffers the 'curse of dimensionality'
ε-greedy²
Play the highest-reward bandit with probability 1 − ε
Play a random bandit with probability ε
UCB (Upper Confidence Bound)³
Play the bandit i with the highest  x̄ᵢ + √(2 ln t / Tᵢ)
The chance of playing infrequently played bandits increases over time
¹J. C. Gittins, '89
²Nicolò Cesa-Bianchi et al., '98
³P. Auer et al., '02
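A sketch of this UCB1 rule on simulated Bernoulli bandits (the arm probabilities and horizon are illustrative assumptions):
```python
import math
import random

def ucb1(arm_probs, horizon=10000):
    """UCB1: play each arm once, then the arm maximizing mean + sqrt(2 ln t / T_i)."""
    k = len(arm_probs)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1  # initialization: try every arm once
        else:
            i = max(range(k), key=lambda j: sums[j] / counts[j] +
                    math.sqrt(2 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < arm_probs[i] else 0.0
        counts[i] += 1
        sums[i] += reward
    return counts  # play counts concentrate on the best arm over time

print(ucb1([0.2, 0.5, 0.55]))
```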
121. MAB use in IR
Choosing ads to display to users¹
Each ad is a bandit; the user click-through rate is the reward
Recommending news articles²
Each news article is a bandit; similar to the information filtering case
Diversifying search results³
Each rank position is an MAB dependent on higher ranks; documents are bandits chosen at each rank
¹Deepayan Chakrabarti et al., '09
²Lihong Li et al., '10
³Radlinski et al., '08
122. MAB Variations
Contextual Bandits¹
The world has some context x ∈ X (e.g. user location)
Learn a policy π: X → A that maps contexts to arms (online or offline)
Dueling Bandits²
Play two (or more) bandits at each time step
Observe relative rather than absolute reward
Learn an ordering of the bandits
Mortal Bandits³
The value of bandits decays over time
Exploitation > exploration
¹Lihong Li et al., '10
²Yisong Yue et al., '09
³Deepayan Chakrabarti et al., '09
123. Comparison of Markov Models
MC – a fully observable stochastic process
HMM – a partially observable stochastic process
MDP – a fully observable decision process
MAB – a decision process, either fully or partially observable
POMDP – a partially observable decision process
Model  | Actions | Rewards | States
MC     | No      | No      | Observable
HMM    | No      | No      | Unobservable
MDP    | Yes     | Yes     | Observable
POMDP  | Yes     | Yes     | Unobservable
MAB    | Yes     | Yes     | Fixed
124. SIGIR Tutorial, July 7th 2014
Grace Hui Yang
Marc Sloan
Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Exercise
125. Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
126. TREC Session Tracks (2010-2012)
Given a series of queries {q₁, q₂, …, q_n}, the top-10 retrieval results {D₁, …, D_{n−1}} for q₁ to q_{n−1}, and click information,
the task is to retrieve a list of documents for the current/last query, q_n.
Relevance judgment is based on how relevant the documents are for q_n, and how relevant they are for the information needs of the entire session (in the topic description).
There is no need to segment the sessions.
127. TREC 2012 Session 6
In a session, queries change constantly:
1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions
Information need:
You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
128. Query change is an important form of feedback
We define query change as the syntactic editing change between two adjacent queries:
Δq_i = q_i − q_{i−1}
It includes +Δq_i, the added terms, and −Δq_i, the removed terms.
The unchanged/shared terms are called q_theme, the theme terms.
q₁ = "bollywood legislation"
q₂ = "bollywood law"
---------------------------------------
Theme term = "bollywood"
Added (+Δq) = "law"
Removed (−Δq) = "legislation"
129. Where do these query changes come from?
Given the TREC Session settings, we consider two sources of query change:
the previous search results that a user viewed/read/examined
the information need
Example: Kurosawa → Kurosawa wife
'wife' is not in any previous results, but it is in the topic description
However, knowing the information need before the search is difficult to achieve
130. Previous search results can influence query change in quite complex ways
Merck lobbyists → Merck lobbying US policy
D₁ contains several mentions of 'policy', such as:
"A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …"
These mentions are about Canadian policies, while the user adds US policy in q₂
Our guess is that the user might be inspired by 'policy', but he/she prefers a different sub-concept from 'Canadian policy'
Therefore, among the added terms 'US policy', 'US' is the novel term, and 'policy' is not, since it appeared in D₁
The two terms should be treated differently
131. Applying MDP to Session Search
We propose to model session search as a Markov decision process (MDP)
Two agents: the user and the search engine
Environment: search results
States: queries
Actions:
User actions: add/remove/keep query terms
Search engine actions: increase/decrease/keep term weights
132. Search Engine Agent's Actions
Term type | ∈ D_{i−1}? | Action    | Example
q_theme   | Y          | increase  | "pocono mountain" in s6
q_theme   | N          | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq       | Y          | decrease  | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq       | N          | increase  | 'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq       | Y          | decrease  | 'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq       | N          | no change | 'legislation' in s32: bollywood legislation → bollywood law
133. Query Change retrieval Model (QCM)
The Bellman equation gives the optimal value for an MDP:
V*(s) = max_a [ R(s,a) + γ Σ_{s'} P(s'|s,a) V*(s') ]
The reward function is used as the document relevance score and is derived backwards from the Bellman equation:
Score(q_i, d) = P(q_i|d) + γ P(q_i | q_{i−1}, D_{i−1}, a) max_{D_{i−1}} P(q_{i−1}|D_{i−1})
P(q_i|d) is the current reward/relevance score, P(q_i | q_{i−1}, D_{i−1}, a) is the query transition model, and max_{D_{i−1}} P(q_{i−1}|D_{i−1}) is the maximum past relevance.
134. Calculating the Transition Model
According to the query change and the search engine actions, the score expands as:
Score(q_i, d) = log P(q_i|d)
  + Σ_{t∈q_theme} [1 − P(t|d*_{i−1})] log P(t|d)         (increase weights for theme terms)
  + Σ_{t∈+Δq, t∉d*_{i−1}} idf(t) log P(t|d)              (increase weights for novel added terms)
  − Σ_{t∈+Δq, t∈d*_{i−1}} P(t|d*_{i−1}) log P(t|d)       (decrease weights for old added terms)
  − Σ_{t∈−Δq} P(t|d*_{i−1}) log P(t|d)                   (decrease weights for removed terms)
log P(q_i|d) is the current reward/relevance score.
135. Maximizing the Reward Function
Generate a maximally rewarded document, denoted d*_{i−1}, from D_{i−1}
That is, the document(s) most relevant to q_{i−1}
The relevance score can be calculated as:
P(q_{i−1}|d_{i−1}) = 1 − Π_{t∈q_{i−1}} {1 − P(t|d_{i−1})}
P(t|d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|
From several options, we choose to use only the document with top relevance: max_{D_{i−1}} P(q_{i−1}|D_{i−1})
136. Scoring the Entire Session
The overall relevance score for a session of queries is aggregated recursively:
Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d)
                      = Score(q_n, d) + γ[Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d)]
                      = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d)
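A sketch of this discounted aggregation, assuming a per-query `score(q, d)` function is already available (the term-overlap stub below is an illustrative placeholder, not QCM itself):
```python
def session_score(queries, d, score, gamma=0.9):
    """Score_session(q_n, d) = sum over i of gamma^(n-i) * Score(q_i, d)."""
    n = len(queries)
    return sum(gamma ** (n - i) * score(q, d) for i, q in enumerate(queries, start=1))

# illustrative per-query scorer: fraction of query terms present in the document
def score(q, d):
    terms = q.split()
    return sum(t in d for t in terms) / len(terms)

session = ["pocono mountains", "pocono mountains hotels"]
print(session_score(session, "hotels in the pocono mountains", score))
```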
154. Model – Bellman Equation
Optimize the page-1 ranking s₁ to improve the page-2 utility U(s₂):
V(θ₁, Σ₁, 1) = max_{s₁} [ λ θ_{s₁}·W₁ + max_{s₂} (1 − λ) ∫ θ_{s₂}·W₂ P(r) dr ]
155. λ
Balances exploration and exploitation in page 1
Tuned for different queries:
Navigational
Informational
λ = 1 for non-ambiguous search
156. Approximation
Monte Carlo sampling:
V ≈ max_{s₁} [ λ θ_{s₁}·W₁ + max_{s₂} (1 − λ) (1/S) Σ_{r∈O} θ_{s₂}·W₂ P(r) ]
Sequential ranking decision
157. Experiment Data
Difficult to evaluate without access to live users
Simulated using 3 TREC collections and relevance judgements:
WT10G – explicit ratings
TREC8 – clickthroughs
Robust – difficult (ambiguous) search
158. User Simulation
Rank M documents
Simulate user clicks according to the relevance judgements
Update the page 2 ranking
Measure at pages 1 and 2:
Recall
Precision
nDCG
MRR
BM25 – prior ranking model
165. Results
Similar results across data sets and metrics
2nd-page gain outweighs 1st-page losses
Outperformed Maximum Marginal Relevance using MRR to measure diversity
BM25-U is simply the no-exploration case
Similar results when M = 5
174. Different Approaches to Evaluation
Online Evaluation
Design interactive experiments
Use users' actions to evaluate quality
Inherently dynamic in nature
Offline Evaluation
Controlled laboratory experiments
The user's interaction with the engine is only simulated
Recent work has focused on dynamic IR evaluation
175. Online Evaluation
Standard click metrics
Clickthrough rate
Probability that the user skips over results they have considered (pSkip)
Most recently: result interleaving
Click/No click
Evaluate
176. What is result interleaving?
A way to compare rankers online
Given the two rankings produced by two methods, present a combination of the rankings to users
Team Draft Interleaving (Radlinski et al., 2008)
Interleaving two rankings
Input: two rankings ("can be seen as teams who pick players")
Repeat:
o Toss a coin to see which team (ranking) picks next
o The winner picks their best remaining player (document)
o The loser picks their best remaining player (document)
Output: one ranking (2 teams of 5)
Credit assignment: the ranking providing more of the clicked results wins
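A compact sketch of this team-draft procedure (the list contents are illustrative; ties and duplicate documents are handled in the simplest way):
```python
import random

def team_draft_interleave(rank_a, rank_b):
    """Build one interleaved list; remember which team contributed each document."""
    interleaved, teams = [], {}
    a, b = list(rank_a), list(rank_b)
    while a or b:
        # coin toss decides which team picks first this round
        order = [("A", a), ("B", b)] if random.random() < 0.5 else [("B", b), ("A", a)]
        for team, ranking in order:
            while ranking and ranking[0] in teams:
                ranking.pop(0)  # skip documents already picked by the other team
            if ranking:
                doc = ranking.pop(0)
                teams[doc] = team
                interleaved.append(doc)
    return interleaved, teams

def credit(clicked_docs, teams):
    """The team that contributed more of the clicked documents wins this impression."""
    a = sum(1 for d in clicked_docs if teams.get(d) == "A")
    b = sum(1 for d in clicked_docs if teams.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"

ranked, teams = team_draft_interleave(["d1", "d2", "d3"], ["d4", "d1", "d5"])
print(ranked, credit(["d4"], teams))
```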
177. Team Draft Interleaving
Ranking A:
1. Napa Valley – The authority for lodging... www.napavalley.com
2. Napa Valley Wineries - Plan your wine... www.napavalley.com/wineries
3. Napa Valley College www.napavalley.edu/homex.asp
4. Been There | Tips | Napa Valley www.ivebeenthere.co.uk/tips/16681
5. Napa Valley Wineries and Wine www.napavintners.com
6. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
Ranking B:
1. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
2. Napa Valley – The authority for lodging... www.napavalley.com
3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
4. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com
5. NapaValley.org www.napavalley.org
6. The Napa Valley Marathon www.napavalleymarathon.org
Presented Ranking:
1. Napa Valley – The authority for lodging... www.napavalley.com
2. Napa Country, California – Wikipedia en.wikipedia.org/wiki/Napa_Valley
3. Napa: The Story of an American Eden... books.google.co.uk/books?isbn=...
4. Napa Valley Wineries – Plan your wine... www.napavalley.com/wineries
5. Napa Valley Hotels – Bed and Breakfast... www.napalinks.com
6. Napa Valley College www.napavalley.edu/homex.asp
7. NapaValley.org www.napavalley.org
178.-179. Team Draft Interleaving (continued)
[Same rankings as above, now with a click] B wins!
Repeat over many different queries!
180. Offline Evaluation
Controlled laboratory experiments
The user's interaction with the engine is only simulated
Ask experts to judge each query result
Predict how users behave when they search
Aggregate judgments to evaluate
181. Offline Evaluation
Until recently: metrics assumed that the user's information need was not affected by the documents read
E.g. Average Precision, NDCG, …
• Users are more likely to stop searching when they see a highly relevant document
• Lately: metrics that incorporate the effect of the relevance of documents seen by the user on user behavior
Based on devising more realistic user models
EBU, ERR [Yilmaz et al. CIKM10, Chapelle et al. CIKM09]
182. Modeling User Behavior
Cascade-based models
[Figure: a ranked result list for the query "black powder ammunition"]
• The user views search results from top to bottom
• At each rank i, the user has a certain probability of being satisfied
• The probability of satisfaction is proportional to the relevance grade of the document at rank i
• Once the user is satisfied with a document, he terminates the search
185. Expected Reciprocal Rank [Chapelle et al. CIKM09]
[Figure: for the query "black powder ammunition", the user walks down the ranked list; at each item: Relevant? no → view next item; somewhat/highly → stop]
186. Expected Reciprocal Rank [Chapelle et al. CIKM09]
φ(r): utility of finding the "perfect" document at rank r, with φ(r) = 1/r
ERR = Σ_{r=1}^{n} (1/r) P(user stops at position r)
    = Σ_{r=1}^{n} (1/r) Π_{i=1}^{r−1} (1 − Rᵢ) R_r
Rᵢ = (2^{gᵢ} − 1) / 2^{g_max} is the probability of relevance of (and of stopping at) document i, where gᵢ is the relevance grade of the i-th document.
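A direct sketch of this formula over graded judgments (the grade list and g_max = 4, a common 5-grade scale, are illustrative assumptions):
```python
def err(grades, g_max=4):
    """ERR = sum over ranks r of (1/r) * P(stop at r), under the cascade user model."""
    total, p_not_stopped = 0.0, 1.0
    for r, g in enumerate(grades, start=1):
        stop_prob = (2 ** g - 1) / 2 ** g_max   # R_i: prob. the doc satisfies the user
        total += p_not_stopped * stop_prob / r
        p_not_stopped *= 1 - stop_prob
    return total

# graded relevance of a 5-document ranking, grades in 0..4
print(err([4, 0, 2, 1, 0]))
```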
189. Measuring "goodness"
The user steps down a ranked list of documents and observes each one of them until a decision point, where they either
a) abandon the search, or
b) reformulate
While stepping down or sideways, the user accumulates utility
190. Evaluation over a single ranked list
[Figure: a ranked list and a chain of query reformulations: "kenya cooking traditional swahili", "kenya cooking traditional", "kenya swahili traditional food recipes"]
192. Session DCG [Järvelin et al. ECIR 2008]
DCG over each ranked list: DCG(RL) = Σ_{r=1}^{k} (2^{rel(r)} − 1) / log_b(r + b − 1)
Discount each reformulation (e.g. "kenya cooking traditional swahili", then "kenya cooking traditional"):
sDCG = (1 / log_c(1 + c − 1)) DCG(RL₁) + (1 / log_c(2 + c − 1)) DCG(RL₂) + …
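A short sketch of this session-level aggregation (the log bases b and c and the toy grades are illustrative assumptions):
```python
import math

def dcg(grades, b=2):
    """DCG(RL) = sum over ranks r of (2^rel(r) - 1) / log_b(r + b - 1)."""
    return sum((2 ** g - 1) / math.log(r + b - 1, b)
               for r, g in enumerate(grades, start=1))

def sdcg(session_grades, b=2, c=4):
    """Session DCG: discount the j-th reformulation's DCG by 1 / log_c(j + c - 1)."""
    return sum(dcg(grades, b) / math.log(j + c - 1, c)
               for j, grades in enumerate(session_grades, start=1))

# two ranked lists (reformulations) in one session, grades 0..3
print(sdcg([[3, 1, 0], [2, 2, 0]]))
```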
193. Model-based measures
A probabilistic space of users following different paths [Yang and Lad ICTIR 2009; Kanoulas et al. SIGIR 2011]:
Ω is the space of all paths
P(ω) is the probability of a user following a path ω in Ω
M_ω is a measure over a path ω
194. Probability of a path
[Figure: a grid of non-relevant (N) and relevant (R) documents across queries Q1, Q2, Q3; a path's probability is (1) the probability of reformulating at rank 3 × (2) the probability of abandoning at reformulation 2]
195. Expected Global Utility [Yang and Lad ICTIR 2009]
1. The user steps down the ranked results one by one
2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks, and reformulates
3. Gains something from relevant documents, accumulating utility
196. [Figure: the Q1/Q2/Q3 grid again; (1) the probability of abandoning the session at reformulation i is geometric with parameter p_reform]
197. [Figure: (2) the probability of reformulating at rank j is geometric with parameter p_down]
198. Expected Global Utility [Yang and Lad ICTIR 2009]
The probability of a user following a path ω:
P(ω) = P(r₁, r₂, ..., r_K)
rᵢ is the stopping and reformulation point in list i
Assumption: the stopping positions in each list are independent, so
P(r₁, r₂, ..., r_K) = P(r₁)P(r₂)...P(r_K)
Use a geometric distribution (as in RBP) to model the stopping and reformulation behaviour:
P(rᵢ = r) = (1 − p) p^{r−1}
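A tiny sketch of this independence assumption with the RBP-style geometric stopping model (the persistence value p = 0.8 is an illustrative assumption):
```python
def path_probability(stop_ranks, p=0.8):
    """P(path) = product over lists of P(r_i), with geometric P(r) = (1-p) * p^(r-1)."""
    prob = 1.0
    for r in stop_ranks:
        prob *= (1 - p) * p ** (r - 1)
    return prob

# a user who stops at rank 3 in the first list and rank 2 after reformulating
print(path_probability([3, 2]))
```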
199. Conclusions
Recent focus on evaluating the dynamic nature of the search process
Interleaving
New offline evaluation metrics
ERR, EBU
Session evaluation metrics
200. Outline
Introduction
Theory and Models
Session Search
Reranking
Guest Talk: Evaluation
Conclusion
201. Conclusions
Dynamic IR describes a new class of interactive model
It incorporates rich feedback and temporal dependency, and is goal-oriented
The family of Markov models and multi-armed bandit theory are useful in building DIR models
Applicable to a range of IR problems
Useful in applications such as session search and evaluation
202. Dynamic IR Book
Published by Morgan & Claypool
‘Synthesis Lectures on Information Concepts, Retrieval, and
Services’
Due March/April 2015 (in time for SIGIR 2015)
203. Acknowledgment
We thank Dr. Emine Yilmaz for giving the guest speech.
We sincerely thank Dr. Xuchu Dong for his help in preparing the tutorial.
We also thank the following colleagues for their comments and suggestions:
Dr. Jamie Callan
Dr. Ophir Frieder
Dr. Fernando Diaz
Dr. Filip Radlinski
206. References
Static IR
Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.
Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.
A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
207. References
Interactive IR
Relevance Feedback in Information Retrieval. Rocchio, J. J. The SMART Retrieval System (pp. 313-23), 1971.
A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.
Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
208. References
Dynamic IR
A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. SIGIR '99, pages 214-221.
Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al. KDD, 2009.
209. References
Dynamic IR
Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al. WSDM, 2010.
A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
Inferring search behaviors using partially observable markov model with duration (POMD). Yin He et al. WSDM, 2011.
No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
210. References
Dynamic IR
Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. WWW '12, pages 11-20.
Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. SIGIR '13, pages 453-462.
Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. SIGIR, 2013.
Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. WWW '13.
Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. CIKM, 2013, pages 1411-1420.
Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. SIGIR '14.
211. References
Markov Processes
A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.
Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966.
212. References
Markov Processes
Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162-175, 1991.
Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992.
Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.
Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
213. References
Markov Processes
Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003.
VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. NIPS, 2004, pages 1081-1088.
Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. The MIT Press, 2005.
Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006.
214. References
Markov Processes
The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006.
Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.
Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.
Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.