Query logs record the actual usage of search systems and
their analysis has proven critical to improving search engine
functionality. Yet, despite the deluge of information, query
log analysis often suffers from the sparsity of the query space.
Based on the observation that most queries pivot around a
single entity that represents the main focus of the user’s
need, we propose a new model for query log data called the
entity-aware click graph. In this representation, we decompose
queries into entities and modifiers, and measure their
association with clicked pages. We demonstrate the benefits
of this approach on the crucial task of understanding which
websites fulfill similar user needs, showing that using this
representation we can achieve a higher precision than other
query log-based approaches.
1. Mendes, Mika, Zaragoza, Blanco. Measuring Website Similarity using an Entity-Aware Click Graph 1/16
Measuring Website Similarity using
an Entity-Aware Click Graph
Pablo N. Mendes¹, Peter Mika², Hugo Zaragoza², Roi Blanco²
¹ Freie Universität Berlin
² Yahoo! Research Barcelona
Nov 1st 2012, Maui, CIKM 2012
Introduction: query log analysis
● Query logs record user interaction with Web
search engines
● Query log analysis has been proven critical to
improving search
● For search engines
– Ranking, autosuggest, “Also try”, etc.
● For site owners
– insight into user needs, allows optimizing Web
presence, etc.
Introduction: website similarity
● Click graph: relating queries and websites,
edges are clicks
[Figure: a click graph and the site similarity graph (SG) derived from it]
● Allows modeling website relatedness based on
shared queries leading to each website pair
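A minimal sketch of how a site similarity graph could be derived from a click graph. The click data is invented, and the Jaccard overlap of query sets is an assumption chosen for illustration; the slides do not state which relatedness measure is used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical click log: (query, clicked site) pairs.
clicks = [
    ("brad pitt movies", "imdb.com"),
    ("brad pitt movies", "rottentomatoes.com"),
    ("brad pitt", "imdb.com"),
    ("forrest gump cast", "imdb.com"),
    ("forrest gump cast", "rottentomatoes.com"),
    ("cheap flights", "kayak.com"),
]

def site_similarity_graph(clicks):
    """Return {(site_a, site_b): Jaccard overlap of the sites' query sets}."""
    queries_by_site = defaultdict(set)
    for query, site in clicks:
        queries_by_site[site].add(query)

    graph = {}
    for a, b in combinations(sorted(queries_by_site), 2):
        shared = queries_by_site[a] & queries_by_site[b]
        if shared:  # only sites with at least one shared query get an edge
            union = queries_by_site[a] | queries_by_site[b]
            graph[(a, b)] = len(shared) / len(union)
    return graph

sg = site_similarity_graph(clicks)
```

In this toy log only imdb.com and rottentomatoes.com are connected; kayak.com shares no query with any other site and stays isolated.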
Problems: Sparsity
● 44% of queries occur only once even when
considering a full year of data [1]
● Using “shared queries” as a relatedness
measure becomes difficult in the
long tail.
[1] Baeza-Yates. Relating content through web usage. In HT ’09, 2009.
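As a rough illustration of how sparsity can be measured, the sketch below computes the share of distinct queries that occur exactly once in a log. The sample log is invented, and counting over distinct queries (rather than over query volume) is an assumption; the slide does not say which definition underlies the 44% figure.

```python
from collections import Counter

def singleton_fraction(queries):
    """Fraction of distinct queries that appear exactly once in the log."""
    counts = Counter(queries)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)

# Hypothetical query log.
log = ["brad pitt", "brad pitt", "forrest gump cast", "cheap flights maui", "imdb"]
frac = singleton_fraction(log)
```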
Problems: partial overlaps
● Breaking up into words distorts semantics
– “Forest” vs “Forest Gump”
– “Pitt” vs “Brad Pitt”
Introduction
● More than 62% of queries contain an entity name or type [20]
[20] Pound, Mika, & Zaragoza. Ad-hoc object retrieval in the web of data. In WWW’10, 2010.
Entity-aware Click Graph
● Websites can share
entities and/or
modifiers
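The decomposition of queries into entities and modifiers might be sketched as follows. The entity list, the longest-match lookup, and the function names are all assumptions made for illustration; the slides do not describe how entities are actually recognized in queries.

```python
from collections import defaultdict

# Hypothetical entity dictionary (the deck uses entities from Freebase).
ENTITIES = {"brad pitt", "forrest gump"}

def split_query(query, entities=ENTITIES):
    """Split a query into (entity, modifier) using the longest matching
    contiguous token span; entity is None when nothing matches."""
    tokens = query.lower().split()
    n = len(tokens)
    for length in range(n, 0, -1):          # try longest spans first
        for start in range(n - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in entities:
                rest = tokens[:start] + tokens[start + length:]
                return span, " ".join(rest)
    return None, " ".join(tokens)

def entity_aware_click_graph(clicks):
    """Count clicks on (entity, site) and (modifier, site) edges."""
    entity_edges = defaultdict(int)
    modifier_edges = defaultdict(int)
    for query, site in clicks:
        entity, modifier = split_query(query)
        if entity is not None:
            entity_edges[(entity, site)] += 1
        if modifier:
            modifier_edges[(modifier, site)] += 1
    return entity_edges, modifier_edges

# Hypothetical click log.
clicks = [
    ("brad pitt movies", "imdb.com"),
    ("forrest gump movies", "rottentomatoes.com"),
    ("brad pitt", "imdb.com"),
]
entity_edges, modifier_edges = entity_aware_click_graph(clicks)
```

In this toy log the two sites share no full query and no entity, yet both are reached through the modifier "movies", so they become connected once queries are decomposed.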
Entity-aware Website Similarity Graph
● More connected
● Preserves semantics
● Allows analysis of
how websites relate
to entities and modifiers
Experiments
● Website similarity
– Find top K similar sites
– Evaluation: two sites are “similar” if they are in the
same category in ODP (Open Directory Project)
● Website characteristics from the searcher POV
– What entities lead to a website
– What context words lead to a website
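The evaluation criterion above could be scored roughly as below: a retrieved site counts as a hit when it has the same ODP category as the target site. The category labels, the site ranking, and the single-category-per-site simplification are assumptions for illustration only.

```python
def precision_at_k(ranked_sites, target_category, category_of, k):
    """Share of the top-k retrieved sites whose category equals the target's."""
    top = ranked_sites[:k]
    if not top:
        return 0.0
    hits = sum(1 for site in top if category_of.get(site) == target_category)
    return hits / len(top)

# Hypothetical ODP-style category assignments.
category_of = {
    "imdb.com": "Arts/Movies",
    "rottentomatoes.com": "Arts/Movies",
    "metacritic.com": "Arts/Movies",
    "kayak.com": "Recreation/Travel",
}

# Hypothetical ranking of sites similar to imdb.com.
ranked = ["rottentomatoes.com", "kayak.com", "metacritic.com"]
p_at_2 = precision_at_k(ranked, category_of["imdb.com"], category_of, k=2)
p_at_3 = precision_at_k(ranked, category_of["imdb.com"], category_of, k=3)
```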
Dataset Statistics: Query Log
● 1 month of queries from Yahoo!, 45M sessions
● 5M entities from Freebase
Results 1
● Similarity edge prediction
Results 1
● Similarity edge prediction with credit to partial
category overlap
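One possible way to give credit for partial category overlap is to score two category paths by the length of their shared prefix. This is only a sketch under that assumption; the slides do not specify how partial credit is computed, and the category paths below are invented.

```python
def partial_category_credit(cat_a, cat_b):
    """Shared leading path segments divided by the longer path's length."""
    a = cat_a.split("/")
    b = cat_b.split("/")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))

# Hypothetical ODP-style category paths.
same_branch = partial_category_credit("Arts/Movies/Titles", "Arts/Movies/Reviews")
unrelated = partial_category_credit("Arts/Movies", "Recreation/Travel")
```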
Results 2
[Figure: websites plotted by the entropy of their entity distribution against the entropy of their modifier distribution. Labeled regions: many entities and few modifiers; many entities and many modifiers; few entities and many modifiers.]
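The entropy of a website's entity or modifier distribution can be computed as sketched below. Shannon entropy in bits is assumed, and the click counts are invented; the slides do not state the logarithm base or any smoothing.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )

# Hypothetical click counts for one website.
entity_counts = Counter({"brad pitt": 5, "forrest gump": 5})
modifier_counts = Counter({"movies": 10})

entity_entropy = entropy(entity_counts)      # clicks spread over two entities
modifier_entropy = entropy(modifier_counts)  # all clicks use one modifier
```

A site reached through many different entities but always the same modifier would show high entity entropy and low modifier entropy, which corresponds to the "many entities, few modifiers" region.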
Results 2
Conclusion
● Recognizing entities in Web search logs allows for
click graphs that account for internal composition of
queries
● New similarity graphs built from entity-aware click
graphs enable more robust and flexible
similarity analysis (evaluated for website similarity)
● Future:
– Exploit the knowledge base (e.g. type hierarchy)
– More complex queries
– etc