The Internet is the largest source of information in the world. Search engines help people navigate the huge space of available data in order to acquire new skills and knowledge. In this paper, we present an in-depth analysis of sessions in which people explicitly search for new knowledge on the Web based on the log files of a popular search engine. We investigate within-session and cross-session developments of expertise, focusing on how the language and search behavior of a user on a topic evolves over time. In this way, we identify those sessions and page visits that appear to significantly boost the learning process. Our experiments demonstrate a strong connection between clicks and several metrics related to expertise. Based on models of the user and their specific context, we present a method capable of automatically predicting, with good accuracy, which clicks will lead to enhanced learning. Our findings provide insight into how search engines might better help users learn as they search.
This work, carried out jointly with Jaime Teevan, Ryen White and Susan Dumais, has been accepted for full oral presentation at the 7th ACM International Conference on Web Search and Data Mining (WSDM). The full version of the paper is available at: http://dl.acm.org/citation.cfm?id=2556195.2556217
Lessons from the Journey: A Query Log Analysis of Within-Session Learning (WSDM'14)
1. Lessons from the Journey
A Query-log Analysis of Within-session Learning
Carsten Eickhoff
Jaime Teevan
Ryen White
Susan Dumais
2. Learning by Searching
• Domain expertise generally seems to help with in-domain searches
• Domain expertise can slowly change over time
• Here, we measure this effect at finer granularity
4. Explicit Learning Sessions
• Learning happens all the time
• We look at explicit knowledge acquisition sessions
• Two types of informational needs:
– Procedural: learn how to do something
• E.g.: Ehow.com, YouTube tutorials, …
– Declarative: learn about something
• E.g.: Wikipedia.com, documentaries, …
5. Finding Indicator Terms
• Group sessions that end at Ehow vs. Wikipedia
• Find query terms that occur more frequently in knowledge acquisition sessions
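The indicator-term step above could be sketched roughly as follows. The slides do not say how "more frequently" is scored, so the smoothed frequency ratio, the function name `indicator_terms`, the `min_count` cutoff and the toy sessions are all assumptions made for illustration.

```python
from collections import Counter


def indicator_terms(target_sessions, background_sessions, top_k=3, min_count=2):
    """Rank query terms by how much more frequent they are in target sessions.

    Each session is a list of query strings. The score is an add-one
    smoothed ratio of relative term frequencies (an assumed scoring,
    not necessarily the one used in the paper).
    """
    def term_counts(sessions):
        counts = Counter()
        for session in sessions:
            for query in session:
                counts.update(query.lower().split())
        return counts

    target = term_counts(target_sessions)
    background = term_counts(background_sessions)
    target_total = sum(target.values())
    background_total = sum(background.values())
    vocab_size = len(set(target) | set(background))

    scores = {}
    for term, count in target.items():
        if count < min_count:
            continue  # ignore terms too rare to be reliable indicators
        p_target = (count + 1) / (target_total + vocab_size)
        p_background = (background[term] + 1) / (background_total + vocab_size)
        scores[term] = p_target / p_background
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data (hypothetical): sessions ending at a how-to site vs. other sessions.
procedural = [["how to tie a tie", "how to tie a bow tie"], ["how to bake bread"]]
other = [["weather seattle", "seattle news"], ["cheap flights to boston"]]
print(indicator_terms(procedural, other, top_k=2))
```

On realistic logs, a stop-word filter or a session-level (rather than term-level) count would likely be needed so that topical words from a few long sessions do not dominate the ranking.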
6. Selecting Sessions
• Based on a set of 26.7 million sessions
• Select sessions that contain indicator terms in at least 50% of queries
– Dproc
– Ddecl
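The selection rule above (indicator terms in at least 50% of a session's queries) can be sketched as a simple filter. The function name `select_sessions`, the whitespace tokenization and the toy sessions are assumptions; the slides only give the threshold.

```python
def select_sessions(sessions, indicator_terms, min_fraction=0.5):
    """Keep sessions in which at least `min_fraction` of the queries
    contain one or more indicator terms.

    Each session is a list of query strings; empty sessions are skipped.
    """
    indicators = {term.lower() for term in indicator_terms}
    selected = []
    for session in sessions:
        if not session:
            continue
        hits = sum(1 for query in session
                   if indicators & set(query.lower().split()))
        if hits / len(session) >= min_fraction:
            selected.append(session)
    return selected


# Toy data (hypothetical): "how" stands in for a procedural indicator term.
sessions = [
    ["how to fix a flat tire", "bike repair shop", "how to patch inner tube"],
    ["seattle weather", "how to dress for rain", "umbrella store", "bus schedule"],
    [],
]
d_proc = select_sessions(sessions, ["how"])
print(len(d_proc))
```

Running the same filter once with procedural and once with declarative indicator terms would yield two session sets, analogous to Dproc and Ddecl.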
10. Session Properties
• Knowledge acquisition sessions are long, topically diverse and more exploratory
• Extended sets are noisy and mimic the full collection
11. Within-session Learning
• General upward trend for domain count and query complexity
• This trend is strongest for learning sessions
12. Sustained Learning
• What happens beyond the session boundary?
• Domain expertise metrics are more likely to increase further after within-session learning
13. Page Visits Spark Learning
• We study the origin of new query terms
• (Where) did added terms occur previously in the same session?
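A minimal sketch of the term-origin question above: for every term added relative to the previous query, look up where it was seen earlier in the same session. The session layout (dicts with `query`, `snippets` and `pages` keys), the three source labels and the whitespace tokenization are assumptions for illustration, not the paper's data format.

```python
def tokenize(text):
    """Lower-case whitespace tokenization (a deliberate simplification)."""
    return set(text.lower().split())


def added_term_origins(session):
    """For each term added relative to the previous query, list the places
    in the same session where it occurred earlier.

    `session` is a list of steps; each step is a dict with a 'query' string
    and optional 'snippets' and 'pages' lists of strings (assumed layout).
    Returns (term, sources) pairs, where sources is a subset of
    ['query', 'snippet', 'page'] or ['unseen'].
    """
    seen = {"query": set(), "snippet": set(), "page": set()}
    previous_query = set()
    origins = []
    for step in session:
        query_terms = tokenize(step["query"])
        if seen["query"]:  # the first query has no predecessor
            for term in sorted(query_terms - previous_query):
                sources = [name for name, terms in seen.items() if term in terms]
                origins.append((term, sources or ["unseen"]))
        # Record everything the user was exposed to in this step.
        seen["query"] |= query_terms
        for snippet in step.get("snippets", []):
            seen["snippet"] |= tokenize(snippet)
        for page in step.get("pages", []):
            seen["page"] |= tokenize(page)
        previous_query = query_terms
    return origins


# Toy session (hypothetical content).
session = [
    {"query": "sourdough bread",
     "snippets": ["Sourdough starter guide", "bread hydration basics"],
     "pages": ["feed the starter daily and use the levain when it peaks"]},
    {"query": "sourdough starter levain ratio"},
]
for term, sources in added_term_origins(session):
    print(term, sources)
```

As the slides note under future directions, term presence on a page or in a snippet does not prove the user actually read it, so these origins are candidates rather than confirmed causes.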
14. The Effect of Page Visits
• Condition the probabilities that an expertise metric increases (P+), stays equal (P=) or decreases (P-) on the click status of the previous SERP
• Clicks more often result in metric increases
• Click duration has no significant effect
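The conditioning described above can be sketched as a small counting routine that estimates P+, P= and P- separately for query transitions that followed a click and those that did not. The tuple layout `(clicked, metric_before, metric_after)`, the function name and the numbers are toy assumptions; they are not results from the paper.

```python
from collections import Counter


def conditional_change_probabilities(transitions):
    """Estimate P+, P= and P- of an expertise metric, conditioned on
    whether the previous SERP received a click.

    `transitions` is an iterable of (clicked, metric_before, metric_after)
    tuples, one per consecutive query pair (assumed layout).
    """
    counts = {True: Counter(), False: Counter()}
    for clicked, before, after in transitions:
        if after > before:
            label = "P+"
        elif after < before:
            label = "P-"
        else:
            label = "P="
        counts[bool(clicked)][label] += 1

    result = {}
    for clicked, counter in counts.items():
        total = sum(counter.values())
        result[clicked] = {
            label: (counter[label] / total if total else 0.0)
            for label in ("P+", "P=", "P-")
        }
    return result


# Toy transitions (made-up metric values, e.g. query length in terms).
transitions = [
    (True, 3, 4), (True, 3, 3), (True, 2, 5), (True, 4, 3),
    (False, 3, 3), (False, 3, 2), (False, 2, 3), (False, 4, 4),
]
probabilities = conditional_change_probabilities(transitions)
print(probabilities[True])
print(probabilities[False])
```

Comparing `probabilities[True]["P+"]` with `probabilities[False]["P+"]` mirrors the comparison on the slide; on real logs a significance test would be needed before drawing conclusions.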
15. Summary
• Introduced procedural/declarative needs
• Noted evidence of within-session learning
• Learning is sustained across the session boundary
• Page visits seem to have a strong influence
16. Future Directions
• Ranking to Learn
– Learning potential is spread evenly across the SERP
– Predictors of learning potential may serve as ranking criteria
• Qualitative study of query reformulation
– Here: Term presence is assumed to imply causality
– Better: Study what the user really sees (e.g., via eye gaze tracking)