SIGIR 2011 Workshop Summary on Query Representation
1. Summary of Papers of
SIGIR 2011 Workshop on Query
Representation and Understanding
Chetana Gavankar
2. Ricardo Campos, Alipio Jorge, Gael Dias:
"Using Web Snippets and Query-logs to
Measure Implicit Temporal Intents in
Queries"
3. Types of Temporal Queries
1. Atemporal: queries not sensitive to
time, e.g. "plan my trip".
2. Temporal unambiguous: queries that refer to one
concrete time period, e.g. "Haiti earthquake
in 2010".
3. Temporal ambiguous: queries with
multiple instances over time, e.g. "Cricket
World Cup", which occurs every four years.
4. Web snippets and Query Logs
Content-related resources, based on a web content approach:
they simply require the set of web search results.
Query-log resources, based on similar year-qualified queries:
they imply that some versions of the query have already been issued.
5. Identifying implicit temporal queries
1. Web snippets
(temporal evidence within web pages):
$TA(q) = \sum_{f \in I} w_f \, f(q)$
I = {TSnippet(.), TTitle(.), TUrl(.)}
Each feature is weighted differently through $w_f$:
18.14 for TTitle(.), 50.91 for TSnippet(.) and 30.95 for TUrl(.)
If TA(q) < 10%, the query is treated as atemporal.
Dates appearing in the query and in the documents may not match.
$TSnippet(q) = \frac{\#\,\text{snippets retrieved with dates}}{\#\,\text{snippets retrieved}}$
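The weighted combination above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three weights are read as percentages (they sum to 100) and rescaled to sum to 1, each feature value is taken to be a fraction in [0, 1], and the function names `ta_score`, `is_atemporal` and `dated_fraction` are hypothetical, not from the paper.

```python
# Weights reported on the slide; they sum to 100, so they are read as percentages.
WEIGHTS = {"TTitle": 18.14, "TSnippet": 50.91, "TUrl": 30.95}
ATEMPORAL_THRESHOLD = 0.10  # "TA(q) < 10%" from the slide


def dated_fraction(n_with_dates, n_retrieved):
    """Fraction of retrieved items (snippets, titles or URLs) that contain a date."""
    return n_with_dates / n_retrieved if n_retrieved else 0.0


def ta_score(features):
    """TA(q) = sum over f of w_f * f(q), with weights rescaled to sum to 1 (assumption)."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] / total * features[name] for name in WEIGHTS)


def is_atemporal(features):
    """Apply the 10% threshold from the slide."""
    return ta_score(features) < ATEMPORAL_THRESHOLD


# Hypothetical query: 20 of 100 snippets carry a date, no dated titles or URLs.
example = {
    "TTitle": 0.0,
    "TSnippet": dated_fraction(20, 100),
    "TUrl": 0.0,
}
print(ta_score(example), is_atemporal(example))
```

With these made-up feature values the score sits just above the threshold, so the query would not be labeled atemporal.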
6. Identifying implicit temporal queries
2. Web query logs: temporal activity can be
recorded from the date and time of the request and from user
activity.
The number of times query q is pre- or post-qualified by year y is
$WA(q, y) = \#(y, q) + \#(q, y)$
$\alpha(q) = \frac{\sum_y WA(q, y)}{\sum_x \#(x, q) + \sum_x \#(q, x)}$
If the query is qualified with a single year, then α(q) = 1.
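A minimal sketch of how WA(q, y) and α(q) could be counted from a list of logged query strings. Several things here are assumptions made for illustration: a qualifier x is a single token placed directly before or after q, a "year" is any four-digit token from 1900 to 2099, each log entry counts as one issued query, and the helper names are hypothetical.

```python
import re
from collections import Counter

YEAR = re.compile(r"^(19|20)\d{2}$")  # assumption: years are 4-digit tokens, 1900-2099


def qualifier_counts(log, q):
    """Count single-token qualifiers x seen as 'x q' (pre) or 'q x' (post)."""
    pre, post = Counter(), Counter()
    q_tokens = q.lower().split()
    for entry in log:
        tokens = entry.lower().split()
        if len(tokens) != len(q_tokens) + 1:
            continue  # not q plus exactly one qualifier
        if tokens[1:] == q_tokens:
            pre[tokens[0]] += 1
        elif tokens[:-1] == q_tokens:
            post[tokens[-1]] += 1
    return pre, post


def wa(log, q, year):
    """WA(q, y) = #(y, q) + #(q, y)."""
    pre, post = qualifier_counts(log, q)
    return pre[year] + post[year]


def alpha(log, q):
    """Share of all qualified versions of q whose qualifier is a year."""
    pre, post = qualifier_counts(log, q)
    total = sum(pre.values()) + sum(post.values())
    if total == 0:
        return 0.0  # q was never issued with a qualifier
    by_year = sum(c for x, c in pre.items() if YEAR.match(x))
    by_year += sum(c for x, c in post.items() if YEAR.match(x))
    return by_year / total


# Made-up log for illustration only.
log = [
    "haiti earthquake 2010",
    "2010 haiti earthquake",
    "haiti earthquake relief",
    "haiti earthquake 2010",
    "haiti",
]
print(wa(log, "haiti earthquake", "2010"), alpha(log, "haiti earthquake"))
```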
7. Results
Temporal information is more frequent in web snippets than
in either of the query logs (Google and Yahoo!).
Most queries have a TSnippet(.) value of around 20%, while
TLogYahoo(.) and TLogGoogle(.) are mostly near 0%.
8. Conclusion
➔ Future dates are more common in snippets than in query logs.
➔ A query containing dates does not necessarily
have temporal intent (observed in the web query logs of
Google and Yahoo!), e.g. the movie "October Sky".
➔ Web snippets are statistically more relevant in terms
of temporal intent than query logs.
9. Rishiraj Saha Roy, Niloy Ganguly, Monojit
Choudhury, Naveen Singh:
"Complex Network Analysis Reveals
Kernel-Periphery Structure in Web
Search Queries"
10. Search Queries
Search query language: bag of segments
Word co-occurrence network: an edge exists between units i and j if Pij > Pi Pj
Eight complex network models for query logs:
● Query Unrestricted WordNet (local) and (global)
● Query Restricted WordNet (local) and (global)
● Query Unrestricted SegmentNet (local) and (global)
● Query Restricted SegmentNet (local) and (global)
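The edge rule Pij > Pi Pj can be illustrated with a small Python sketch. It assumes that Pi is estimated as the fraction of queries containing unit i and Pij as the fraction containing both i and j; the slide does not say how the probabilities are estimated, and the sketch does not model the local/global or restricted/unrestricted variants. The function name `cooccurrence_edges` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(queries):
    """Return undirected edges (i, j) for which P_ij > P_i * P_j.

    `queries` is a list of queries, each given as a list of units
    (words or segments).
    """
    n = len(queries)
    unit_count, pair_count = Counter(), Counter()
    for query in queries:
        units = sorted(set(query))  # count each unit once per query
        unit_count.update(units)
        pair_count.update(combinations(units, 2))
    edges = set()
    for (i, j), c in pair_count.items():
        if c / n > (unit_count[i] / n) * (unit_count[j] / n):
            edges.add((i, j))
    return edges


# Made-up segmented queries for illustration only.
queries = [
    ["how to", "cook"],
    ["how to", "cook"],
    ["free", "music"],
    ["free", "cook"],
]
print(sorted(cooccurrence_edges(queries)))
```

In this toy log, "free" and "cook" co-occur once but less often than chance would predict from their individual frequencies, so no edge is drawn between them.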
11. Kernel and Peripheral lexicons
Two regimes in the degree distribution (DD) of the word co-occurrence network:
1. Kernel lexicon (K-Lex, or modifiers):
• Units popular in queries (high degree)
• Generic and domain independent
2. Peripheral lexicon (P-Lex, or heads): rare units
with degree much lower than those in the kernel
K-Lex (popular segments) P-Lex (rarer segments)
how to matthew brodrick
wiki accessories
free police officer
and who is
in australia epson tx800
videos star trek next gen
12. Degree Distribution
|N| = number of nodes, |E| = number of edges
C = average clustering coefficient
d = mean shortest path length between nodes
Crand and drand are the corresponding values in a random graph
Crand ~ k'/|N|, drand ~ ln(|N|)/ln(k')
k' = average degree of the graph
Degree distribution p(k)
= (number of nodes with degree k) / (total number of nodes)
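The quantities defined on this slide can be computed from an edge list as follows. This is a sketch under assumptions: edges are unique and undirected, isolated nodes are ignored (only nodes that appear in some edge are counted), and k' is computed as 2|E|/|N|. The function name `degree_stats` is hypothetical.

```python
import math
from collections import Counter


def degree_stats(edges):
    """Return (p_k, k_avg, c_rand, d_rand) for a set of undirected edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # nodes that appear in at least one edge
    if n == 0:
        raise ValueError("the graph has no edges")
    k_avg = 2 * len(edges) / n  # k' = average degree
    histogram = Counter(degree.values())
    p_k = {k: count / n for k, count in sorted(histogram.items())}
    c_rand = k_avg / n  # Crand ~ k' / |N|
    # drand ~ ln|N| / ln k'; undefined when k' <= 1, reported as infinity here
    d_rand = math.log(n) / math.log(k_avg) if k_avg > 1 else float("inf")
    return p_k, k_avg, c_rand, d_rand


# Tiny made-up graph for illustration only.
edges = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")}
p_k, k_avg, c_rand, d_rand = degree_stats(edges)
print(p_k, k_avg, c_rand, d_rand)
```

Comparing the measured C and d of a query network against `c_rand` and `d_rand` is the usual way to check for small-world behaviour.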
14. Conclusion
● Like natural language (NL), queries reflect a kernel-periphery distinction.
● Unlike NL, query networks lack the small-world property that helps in
quickly retrieving words from the mind.
● It is more difficult to understand the context of a segment in a query.
● The peripheral network consists of a large number of small
disconnected components.
● The capability of peripheral units to exist by themselves
makes POS identification hard in queries.
● Socio-cultural factors govern the kernel-periphery
distinction in queries.
15. Lidong Bing, Wai Lam:
"Investigation of Web Query Refinement
via Topic Analysis and Learning with
Personalization"
16. Web Query Refinement
● Query Refinement
● Substitution
● Expansion
● Deletion
● Stemming
● Spelling correction
● Abbreviation expansion
......................
● Generate some candidate queries first, and score
the quality of these candidates.
17. Latent Topic Analysis in Query Log
Query log record: (user_id, query, clicked_url, time)
Pseudo-document generation: queries related to the same host are
aggregated. General sites like "en.wikipedia.org" are not suitable for
latent topic analysis and are eliminated.
The Latent Dirichlet Allocation (LDA) algorithm is used to conduct the latent
semantic topic analysis on the collection of host-based pseudo-documents.
Z = set of latent topics zi
Each zi is associated with a multinomial distribution over terms
P(tk|zi) = probability of term tk given topic zi
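The pseudo-document step described on this slide can be sketched as below. The record layout follows the (user_id, query, clicked_url, time) tuple from the slide; everything else is an assumption for illustration: the host is taken from the URL's network location, queries are tokenized on whitespace, and the set of general hosts contains only the one example the slide names. The function name `host_pseudo_documents` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Only the example named on the slide; a real exclusion list would be longer.
GENERAL_HOSTS = {"en.wikipedia.org"}


def host_pseudo_documents(records, general_hosts=GENERAL_HOSTS):
    """Aggregate query terms by the host of the clicked URL."""
    docs = defaultdict(list)
    for user_id, query, clicked_url, time in records:
        host = urlparse(clicked_url).netloc.lower()
        if not host or host in general_hosts:
            continue  # skip unparsable URLs and general sites
        docs[host].extend(query.lower().split())
    return dict(docs)


# Made-up log records for illustration only.
records = [
    ("u1", "python list sort", "http://docs.python.org/3/howto/sorting.html", "t1"),
    ("u2", "python dict", "http://docs.python.org/3/library/stdtypes.html", "t2"),
    ("u1", "alan turing", "http://en.wikipedia.org/wiki/Alan_Turing", "t3"),
]
print(host_pseudo_documents(records))
```

The resulting token lists, one per host, are the kind of input an LDA implementation would take to estimate P(tk|zi).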
18. Personalization
$\pi^u = \{\pi^u_1, \pi^u_2, \ldots, \pi^u_{|Z|}\}$ = profile of the user u,
$\pi^u_i = P(z_i|u)$ = probability that the user u prefers the
topic $z_i$
Generate a user-based pseudo-document U for user u.
$\{P(z_1|U), P(z_2|U), \ldots, P(z_{|Z|}|U)\}$ = profile of u.
Candidate query q: $t_1, \ldots, t_n$
Topic of term $t_r$ = $z_r$
19. Topic based scoring with
personalization
Candidate query score:
The model parameter P(zj|zi) captures the relationship between two
topics
With personal profile
P(z1|u) = probability that user u prefers the topic z1