Lessons from the Journey: A Query Log Analysis of Within-Session Learning (WSDM'14)


The Internet is the largest source of information in the world. Search engines help people navigate the huge space of available data in order to acquire new skills and knowledge. In this paper, we present an in-depth analysis of sessions in which people explicitly search for new knowledge on the Web, based on the log files of a popular search engine. We investigate within-session and cross-session developments of expertise, focusing on how a user's language and search behavior on a topic evolve over time. In this way, we identify those sessions and page visits that appear to significantly boost the learning process. Our experiments demonstrate a strong connection between clicks and several metrics related to expertise. Based on models of the user and their specific context, we present a method capable of automatically predicting, with good accuracy, which clicks will lead to enhanced learning. Our findings provide insight into how search engines might better help users learn as they search. This work, done jointly with Jaime Teevan, Ryen White and Susan Dumais, has been accepted for full oral presentation at the 7th ACM International Conference on Web Search and Data Mining (WSDM). The full version of this paper is available at: http://dl.acm.org/citation.cfm?id=2556195.2556217

Lessons from the Journey
A Query-log Analysis of Within-session Learning

Carsten Eickhoff
Jaime Teevan
Ryen White
Susan Dumais

Learning by Searching
• Domain expertise seems to be generally useful for in-domain searches
• Domain expertise can slowly change over time

• Here, we measure this effect at finer granularity

Studying Expertise over Time
[Figure: expertise (vertical axis) plotted over time (horizontal axis)]

Explicit Learning Sessions
• Learning happens all the time
• We look at explicit knowledge acquisition sessions
• Two types of information needs:
– Procedural: learn how to do something
• E.g.: eHow.com, YouTube tutorials, …

– Declarative: learn about something
• E.g.: Wikipedia.org, documentaries, …

Finding Indicator Terms
• Group sessions that end at eHow vs. Wikipedia
• Find query terms that occur more frequently in knowledge acquisition sessions
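The indicator-term step can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the paper: the scoring function (an add-one smoothed ratio of relative term frequencies), the `min_count` cut-off and the function name `indicator_terms` are all assumptions made for the example. Sessions are represented simply as lists of query strings.

```python
from collections import Counter


def indicator_terms(target_sessions, background_sessions, min_count=2, top_k=5):
    """Rank query terms by how over-represented they are in target sessions.

    Assumption: the score is a smoothed relative-frequency ratio; the
    original work may use a different statistic.
    """
    def term_counts(sessions):
        counts = Counter(
            term
            for queries in sessions
            for query in queries
            for term in query.lower().split()
        )
        return counts, sum(counts.values())

    target, target_total = term_counts(target_sessions)
    background, background_total = term_counts(background_sessions)
    vocab_size = len(set(target) | set(background))

    scores = {}
    for term, count in target.items():
        if count < min_count:
            continue  # ignore terms that are too rare to be reliable
        p_target = (count + 1) / (target_total + vocab_size)
        p_background = (background[term] + 1) / (background_total + vocab_size)
        scores[term] = p_target / p_background
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: sessions ending at a how-to site vs. other sessions.
howto = [["how to tie a tie", "how to tie a bow tie"], ["how to bake bread"]]
other = [["weather london", "london news"], ["cheap flights", "flights to paris"]]
ranked = indicator_terms(howto, other)
```

In this toy data "how" ranks above "to", because "to" also appears in the background sessions and is therefore less distinctive.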

Selecting Sessions
• Based on a set of 26.7 million sessions
• Select sessions that contain indicator terms in at least 50% of queries
– Dproc (procedural sessions)
– Ddecl (declarative sessions)
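The selection rule above can be written down in a few lines. The sketch below assumes sessions are lists of query strings and that a query "contains" an indicator term if any of its whitespace-separated tokens matches one; the function name `select_sessions` is hypothetical. Running it once with procedural indicator terms and once with declarative ones would yield sets playing the role of Dproc and Ddecl.

```python
def select_sessions(sessions, indicator_terms, min_fraction=0.5):
    """Keep sessions in which at least `min_fraction` of the queries
    contain one or more indicator terms."""
    indicators = {term.lower() for term in indicator_terms}
    selected = []
    for queries in sessions:
        if not queries:
            continue  # an empty session cannot satisfy the criterion
        hits = sum(1 for q in queries if indicators & set(q.lower().split()))
        if hits / len(queries) >= min_fraction:
            selected.append(queries)
    return selected


# Hypothetical toy data.
sessions = [
    ["how to tie a tie", "tie knots"],        # 1 of 2 queries match: kept
    ["weather", "news", "how to cook rice"],  # 1 of 3 queries match: dropped
    [],
]
d_proc = select_sessions(sessions, ["how"])
```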

Session Properties

• Knowledge acquisition sessions are long, topically diverse and more exploratory
• Extended sets are noisy and mimic the full collection

Within-session Learning

• General upward trend for domain count and query complexity
• This trend is strongest for learning sessions
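One simple way to quantify such a within-session trend is the least-squares slope of a per-query metric over query position. The sketch below is only an illustration: it uses query length in terms as a crude stand-in for query complexity, which is an assumption and not the metric definition used in the paper, and the helper names are hypothetical.

```python
def trend_slope(values):
    """Least-squares slope of `values` against their positions 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0  # a single observation has no trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def query_length(query):
    """Stand-in complexity proxy (assumption): number of query terms."""
    return len(query.split())


# Hypothetical session whose queries grow more specific over time.
session = ["python", "python list sort", "python list sort stable key function"]
slope = trend_slope([query_length(q) for q in session])
```

A positive slope means the metric tends to rise over the course of the session; comparing average slopes between learning sessions and other sessions would mirror the comparison on the slide.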

Sustained Learning
• What happens beyond the session boundary?
• Domain expertise metrics are more likely to increase further after within-session learning

Page Visits Spark Learning
• We study the origin of new query terms
• (Where) did added terms occur previously in the same session?
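Tracing where an added query term was seen earlier in the session could look roughly like this. The session layout (a list of steps with `query`, `serp_text` and `page_text` fields), the whitespace tokenization and the rule that a visited page takes priority over a SERP as origin are all assumptions made for the sketch.

```python
def tokenize(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def new_term_origins(steps):
    """For each term added after the first query, report whether it was
    previously seen on a visited page, on a SERP, or nowhere in the session."""
    seen_query_terms, seen_serp_terms, seen_page_terms = set(), set(), set()
    origins = []
    for i, step in enumerate(steps):
        query_terms = tokenize(step["query"])
        if i > 0:  # every term of the first query is trivially new
            for term in sorted(query_terms - seen_query_terms):
                if term in seen_page_terms:
                    origin = "page"
                elif term in seen_serp_terms:
                    origin = "serp"
                else:
                    origin = "unseen"
                origins.append((term, origin))
        seen_query_terms |= query_terms
        seen_serp_terms |= tokenize(step.get("serp_text", ""))
        seen_page_terms |= tokenize(step.get("page_text", ""))
    return origins


# Hypothetical two-query session.
steps = [
    {
        "query": "sourdough bread",
        "serp_text": "sourdough starter recipe",
        "page_text": "autolyse the dough before shaping",
    },
    {"query": "sourdough autolyse starter hydration"},
]
origins = new_term_origins(steps)
```

Here "autolyse" is attributed to a visited page, "starter" to a result page, and "hydration" has no earlier occurrence in the session.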

The Effect of Page Visits
• Condition P+, P= and P- (the probabilities that an expertise metric increases, stays the same, or decreases) on the click status of the previous SERP
• Clicks more often result in metric increases
• Click duration has no significant effect
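Estimating these conditional probabilities from logged observations is straightforward counting. The sketch assumes each observation is a pair of (whether the previous SERP received a click, change in an expertise metric between consecutive queries); the input format and the function name are hypothetical.

```python
from collections import Counter


def change_probabilities(observations):
    """Estimate P+, P= and P- of a metric change, conditioned on click status.

    `observations` is an iterable of (clicked, delta) pairs.
    """
    counts = {True: Counter(), False: Counter()}
    for clicked, delta in observations:
        label = "P+" if delta > 0 else "P-" if delta < 0 else "P="
        counts[bool(clicked)][label] += 1

    result = {}
    for clicked, counter in counts.items():
        total = sum(counter.values())
        result[clicked] = {
            label: (counter[label] / total if total else 0.0)
            for label in ("P+", "P=", "P-")
        }
    return result


# Hypothetical observations: (clicked on previous SERP, metric change).
observations = [(True, 1), (True, 2), (True, 0), (True, -1), (False, 0), (False, -1)]
probs = change_probabilities(observations)
```

Comparing `probs[True]["P+"]` with `probs[False]["P+"]` corresponds to the clicked versus non-clicked comparison on the slide.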

Summary
• Introduced procedural/declarative needs

• Noted evidence of within-session learning
• Learning is sustained across session boundary
• Page visits seem to have a strong influence

Future Directions
• Ranking to Learn
– Learning potential is spread evenly across the SERP
– Predictors of learning potential may serve as ranking criteria

• Qualitative study of query reformulation
– Here: term presence is assumed to imply causality
– Better: study what the user really sees (e.g., via eye gaze tracking)
