SIGIR 2011 Workshop Summary on Query Representation

Summary of Papers of the SIGIR 2011 Workshop on Query Representation and Understanding
Chetana Gavankar

Ricardo Campos, Alipio Jorge, Gael Dias:
"Using Web Snippets and Query-logs to Measure Implicit Temporal Intents in Queries"

Types of temporal queries
1. Atemporal: queries not sensitive to time, e.g. plan my trip.
2. Temporal unambiguous: queries tied to a concrete time period, e.g. Haiti earthquake in 2010.
3. Temporal ambiguous: queries with multiple instances over time, e.g. Cricket World Cup, which occurs every four years.

Web snippets and Query Logs
Content-related resources (web content approach): require only the set of web search results.
Query-log resources (similar year-qualified queries): imply that some versions of the query have already been issued.

Identifying implicit temporal queries
1. Web snippets (temporal evidence within web pages):
$TA(q) = \sum_{f \in I} w_f \cdot f(q)$
$I = \{TSnippet(.), TTitle(.), TUrl(.)\}$
Each feature is weighted differently using $w_f$: 18.14 for TTitle(.), 50.91 for TSnippet(.) and 30.95 for TUrl(.).
If TA(q) < 10%, the query is atemporal.
Dates appearing in the query and in the documents may not match.
$TSnippet(q) = \frac{\#\ \text{snippets retrieved with dates}}{\#\ \text{snippets retrieved}}$
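
The weighted sum for web snippets can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and it assumes that each feature value is a fraction in [0, 1] and that the three weights are normalised by their sum (100), which the slide does not state explicitly.

```python
# Weights w_f as listed on the slide (they sum to 100).
WEIGHTS = {"TTitle": 18.14, "TSnippet": 50.91, "TUrl": 30.95}
ATEMPORAL_THRESHOLD = 0.10  # "TA(q) < 10%" from the slide


def ta_score(features):
    """Weighted sum TA(q); features maps a feature name to a fraction in [0, 1]."""
    # Assumption: weights are divided by their total so TA(q) stays in [0, 1].
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS) / total_weight


def is_atemporal(features):
    """Apply the 10% threshold from the slide."""
    return ta_score(features) < ATEMPORAL_THRESHOLD


# Hypothetical query: 5% of titles, 20% of snippets and 0% of URLs contain dates.
example = {"TTitle": 0.05, "TSnippet": 0.20, "TUrl": 0.0}
print(round(ta_score(example), 4), is_atemporal(example))
```

With these made-up feature values the score lands just above the threshold, so the query would not be labelled atemporal.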

Identifying implicit temporal queries
2. Web query logs: temporal activity can be recorded from the date and time of the request and from user activity.
Number of times query q is pre- or post-qualified by year y:
$WA(q, y) = \#(y, q) + \#(q, y)$
$\alpha(q) = \frac{\sum_y WA(q, y)}{\sum_x \#(x, q) + \sum_x \#(q, x)}$
If the query is qualified with a single year, then $\alpha(q) = 1$.
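
A minimal sketch of the ratio α(q), assuming the log has already been reduced to counts of the qualifiers that appear before and after the query. The function name, the input layout and the rule that a "year" is a four-digit token between 1900 and 2099 are assumptions made for illustration.

```python
import re

# Assumption: a year qualifier is a four-digit token from 1900 to 2099.
YEAR = re.compile(r"^(19|20)\d{2}$")


def alpha(pre_counts, post_counts):
    """pre_counts[x] = #(x, q) and post_counts[x] = #(q, x) for each qualifier x."""
    # Numerator: sum over years y of WA(q, y) = #(y, q) + #(q, y).
    year_total = sum(
        count
        for counts in (pre_counts, post_counts)
        for qualifier, count in counts.items()
        if YEAR.match(qualifier)
    )
    # Denominator: all pre- and post-qualifications of the query.
    all_total = sum(pre_counts.values()) + sum(post_counts.values())
    return year_total / all_total if all_total else 0.0


# Hypothetical counts for one query.
pre = {"2010": 3, "cheap": 1}
post = {"2011": 2, "tickets": 4}
print(alpha(pre, post))
```

Here 5 of the 10 qualified versions of the query carry a year, so the ratio is 0.5.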

Results
Temporal information is more frequent in web snippets than in either of the Google and Yahoo! query logs.
Most queries have a TSnippet(.) value around 20%, whereas TLogYahoo(.) and TLogGoogle(.) are mostly close to 0%.

Conclusion
➔ Future dates are more common in snippets than in query logs.
➔ A query containing dates does not necessarily have temporal intent (observed in the web query logs of Google and Yahoo!), e.g. the movie October Sky.
➔ Web snippets are statistically more relevant than query logs in terms of temporal intent.

Rishiraj Saha Roy, Niloy Ganguly, Monojit Choudhury, Naveen Singh:
"Complex Network Analysis Reveals Kernel-Periphery Structure in Web Search Queries"

Search Queries
Search query language: a bag of segments
Word co-occurrence network: an edge exists between units i and j if $P_{ij} > P_i P_j$
Eight complex network models for query logs:
● Query Unrestricted WordNet (local) and (global)
● Query Restricted WordNet (local) and (global)
● Query Unrestricted SegmentNet (local) and (global)
● Query Restricted SegmentNet (local) and (global)
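
The edge rule P_ij > P_i P_j can be illustrated as follows. This sketch assumes that probabilities are estimated as the fraction of queries containing a unit (or a pair of units); it does not model the local/global or restricted/unrestricted variants, and the function name and sample queries are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(queries):
    """queries: list of queries, each given as a list of units (words or segments)."""
    n = len(queries)
    unit_count = Counter()
    pair_count = Counter()
    for query in queries:
        units = sorted(set(query))  # count each unit once per query
        unit_count.update(units)
        pair_count.update(combinations(units, 2))
    edges = set()
    for (a, b), count in pair_count.items():
        p_ab = count / n
        p_a, p_b = unit_count[a] / n, unit_count[b] / n
        # Keep the edge only if the pair co-occurs more often than chance.
        if p_ab > p_a * p_b:
            edges.add((a, b))
    return edges


sample_queries = [
    ["how to", "bake bread"],
    ["how to", "bake bread"],
    ["free", "music"],
    ["how to", "music"],
]
print(sorted(cooccurrence_edges(sample_queries)))
```

In this toy log, "how to" and "music" co-occur once but no more often than their individual frequencies predict, so that pair gets no edge.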

Kernel and Peripheral lexicons
Two regimes in the degree distribution (DD) of the word co-occurrence network:
1. Kernel lexicon (K-Lex, or modifiers):
• Units popular in queries (high degree)
• Generic and domain independent
2. Peripheral lexicon (P-Lex, or heads): rare units with degree much lower than those in the kernel
K-Lex (popular segments) P-Lex (rarer segments)
how to matthew brodrick
wiki accessories
free police officer
and who is
in australia epson tx800
videos star trek next gen

Degree Distribution
|N| = number of nodes, |E| = number of edges
C = average clustering coefficient
d = mean shortest path length between nodes
$C_{rand}$ and $d_{rand}$ are the corresponding values in a random graph
$C_{rand} \sim k'/|N|$, $d_{rand} \sim \ln(|N|)/\ln(k')$
k' = average degree of the graph
Degree distribution p(k) = (number of nodes with degree k) / (total number of nodes)
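
These quantities can be computed directly from an adjacency structure. The sketch below assumes an undirected graph stored as a dict of neighbour sets with symmetric entries and an average degree above 1 (so that ln k' is positive); all names and the toy graph are illustrative.

```python
import math
from collections import Counter


def degree_distribution(adjacency):
    """p(k) = (number of nodes with degree k) / (total number of nodes)."""
    n = len(adjacency)
    counts = Counter(len(neighbours) for neighbours in adjacency.values())
    return {k: c / n for k, c in counts.items()}


def average_clustering(adjacency):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for neighbours in adjacency.values():
        k = len(neighbours)
        if k < 2:
            continue  # convention: nodes with fewer than 2 neighbours contribute 0
        links = sum(1 for a in neighbours for b in neighbours
                    if a < b and b in adjacency[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adjacency)


def random_graph_baselines(adjacency):
    """Return (C_rand, d_rand) = (k'/|N|, ln|N| / ln k')."""
    n = len(adjacency)
    k_avg = sum(len(v) for v in adjacency.values()) / n
    return k_avg / n, math.log(n) / math.log(k_avg)


toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(degree_distribution(toy))
print(average_clustering(toy), random_graph_baselines(toy))
```

Comparing C with C_rand and d with d_rand is the usual way to check for the small-world property: C much larger than C_rand together with d close to d_rand.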

Conclusion
● Like natural language (NL), queries reflect a kernel-periphery distinction.
● Unlike NL, query networks lack the small-world property that supports quickly retrieving words from the mind.
● It is more difficult to understand the context of a segment in a query.
● The peripheral network consists of a large number of small disconnected components.
● The capability of peripheral units to exist by themselves makes POS identification hard in queries.
● Socio-cultural factors govern the kernel-periphery distinction in queries.

Lidong Bing, Wai Lam:
"Investigation of Web Query Refinement via Topic Analysis and Learning with Personalization"

Web Query Refinement
● Query refinement operations:
  ● Substitution
  ● Expansion
  ● Deletion
  ● Stemming
  ● Spelling correction
  ● Abbreviation expansion
  ● ...
● Generate candidate queries first, then score the quality of these candidates.

Latent Topic Analysis in Query Log
Query log record: (user_id, query, clicked_url, time)
Pseudo-document generation: queries related to the same host are aggregated. General sites such as “en.wikipedia.org” are not suitable for latent topic analysis and are eliminated.
The Latent Dirichlet Allocation (LDA) algorithm is used to conduct latent semantic topic analysis on the collection of host-based pseudo-documents.
Z = set of latent topics $z_i$
Each $z_i$ is associated with a multinomial distribution over terms
$P(t_k \mid z_i)$ = probability of term $t_k$ given topic $z_i$
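
The pseudo-document step can be sketched as below; the LDA step itself is not shown. The function name, the whitespace tokenisation and the sample records are assumptions, and the set of general hosts contains only the example named on the slide.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hosts considered too general for topic analysis (example from the slide).
GENERAL_HOSTS = {"en.wikipedia.org"}


def host_pseudo_documents(records):
    """records: iterable of (user_id, query, clicked_url, time) tuples."""
    docs = defaultdict(list)
    for _user_id, query, clicked_url, _time in records:
        host = urlparse(clicked_url).netloc.lower()
        if not host or host in GENERAL_HOSTS:
            continue  # skip malformed URLs and general sites
        # Assumption: query terms are obtained by whitespace splitting.
        docs[host].extend(query.split())
    return dict(docs)


sample_log = [
    ("u1", "python tutorial", "https://docs.python.org/3/tutorial/", "t1"),
    ("u2", "python list sort", "https://docs.python.org/3/howto/sorting.html", "t2"),
    ("u1", "alan turing", "https://en.wikipedia.org/wiki/Alan_Turing", "t3"),
]
print(host_pseudo_documents(sample_log))
```

Each resulting term list is one host-based pseudo-document that a topic model could then take as input.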

Personalization
$\pi^u = \{\pi^u_1, \pi^u_2, \ldots, \pi^u_{|Z|}\}$ = profile of user u
$\pi^u_i = P(z_i \mid u)$ = probability that user u prefers topic $z_i$
Generate a user-based pseudo-document U for user u.
$\{P(z_1 \mid U), P(z_2 \mid U), \ldots, P(z_{|Z|} \mid U)\}$ = profile of u
Candidate query q: $t_1, \ldots, t_n$
Topic of term $t_r$ = $z_r$

Topic-based scoring with personalization
Candidate query score:
The model parameter $P(z_j \mid z_i)$ captures the relationship between two topics.
With a personal profile:
$P(z_1 \mid u)$ = probability that user u prefers topic $z_1$

Conclusion
The framework that considers personalization achieves the best performance.
With user profiles, the topic-based scoring part is more reliable.
