An interdisciplinary journey with the SAL spaceship – results and challenges in the emerging field of Search As Learning (SAL)
Keynote at HELMeTO2022 conference, Palermo, Italy on recent research in Search As Learning (SAL), at the intersection of machine learning and cognitive psychology.
Professor for Data & Knowledge Engineering (HHU Düsseldorf) & Scientific Director of Knowledge Technologies for the Social Sciences (GESIS, Cologne) at Heinrich-Heine-University Düsseldorf & GESIS (Cologne)
HELMETO Conference, Palermo
Stefan Dietze, 22.09.2022
Informal and microlearning on the Web
▪ Anything can be a learning resource
▪ The activity makes the difference (not only
the resource): i.e. how a resource is being
used
▪ Learning Analytics data in online/non-
learning environments?
o Activity streams,
o Social graphs (and their evolution),
o Behavioural traces (mouse movements,
keystrokes)
o ...
▪ Research challenges:
o How to detect "learning"?
o How to detect learning-specific notions
such as "competences", "learning
performance" etc.?
SAL = "Search As Learning"
Research challenges at the intersection of AI/ML,
HCI & cognitive psychology
▪ Detecting coherent search missions?
▪ Detecting learning throughout search, i.e.
detecting “informational” search missions (as
opposed to “transactional” or “navigational”
missions)
▪ How competent is the user? Predict/understand
the knowledge state of users based on in-session
behavior/interactions
▪ How well does a user achieve their learning
goal/information need? Predict knowledge gain
throughout the search session
Hoppe, A., Holtz, P., Kammerer, Y., Yu, R., Dietze, S., Ewerth, R., Current Challenges for Studying Search as Learning Processes, 7th Workshop on
Learning & Education with Web Data (LILE2018), in conjunction with ACM Web Science 2018 (WebSci18), Amsterdam, NL, 27 May, 2018.
"SAL Spaceship" – an interdisciplinary, conceptual SAL framework
Von Hoyer, J., Hoppe, A., Kammerer, Y., Otto, C., Pardi, G., Rokicki, M., Yu, R., Dietze, S., Ewerth, R., Holtz, P., The SAL Spaceship: Towards a comprehensive model
of psychological and technological facets of search as learning (SAL), Frontiers in Psychology, Section Human-Media Interaction, 2022.
SAL Spaceship
Focus: Learner
• Model focuses on "informational" search
• Cf. search intent taxonomy by Broder, 2002
• Parts of it are applicable to other types of search intents ("transactional", "navigational")
Can we detect "learning" in Web Search?
Yu, R., Limock, Dietze, S., Still Haven’t Found What You’re Looking For - Detecting the Intent of Web Search Missions from User Interaction Features. CoRR abs/2207.01256
▪ Segmentation of real-world query logs (AOL dataset)
into logical and physical sessions
▪ Segments from Hagen et al. 2013 (2881 logical
sessions, 1378 missions, average 6.5 queries per
session)
▪ Task: supervised classification of mission intent into
transactional, navigational and informational
▪ Basic machine learning models (decision tree, random forest, logistic regression, SVM)
▪ 22 features in 3 categories (query, browsing,
mission)
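The physical-session step of the segmentation can be sketched as follows, assuming a physical session ends whenever the gap between two consecutive queries of the same user exceeds a fixed timeout. The 30-minute threshold, the `Query` record and the function name are illustrative assumptions, not taken from the paper; logical sessions and missions additionally need a notion of "same information need", which this sketch does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Query:
    user_id: str
    text: str
    time: datetime


def physical_sessions(queries, timeout=timedelta(minutes=30)):
    """Group queries into per-user sessions, split at inactivity gaps > timeout."""
    sessions = []
    last_seen = {}  # user_id -> (time of last query, index of that user's open session)
    for q in sorted(queries, key=lambda q: (q.user_id, q.time)):
        prev = last_seen.get(q.user_id)
        if prev is None or q.time - prev[0] > timeout:
            sessions.append([q])        # gap too long (or first query): new session
            idx = len(sessions) - 1
        else:
            idx = prev[1]
            sessions[idx].append(q)     # still within the timeout: same session
        last_seen[q.user_id] = (q.time, idx)
    return sessions


# Toy log entries (invented for illustration).
log = [
    Query("u1", "altitude sickness", datetime(2006, 3, 1, 10, 0)),
    Query("u1", "altitude sickness symptoms", datetime(2006, 3, 1, 10, 5)),
    Query("u1", "cheap flights", datetime(2006, 3, 1, 14, 0)),
    Query("u2", "tornado", datetime(2006, 3, 1, 10, 2)),
]
sessions = physical_sessions(log)
print([[q.text for q in s] for s in sessions])
```

A pure time-gap rule is cheap but blind to topic changes inside a session, which is why the mission level needs the additional query and browsing features.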
Understanding knowledge gain/state of users during search
Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
Data collection - summary
▪ Crowdsourced collection of search session data
▪ 10 search topics (e.g. “Altitude sickness”,
“Tornados”), incl. pre- and post-tests
▪ Approx. 1000 distinct crowd workers & 100
sessions per topic
▪ Tracking of user behavior through 76 features
in 5 categories (session, query, search engine
result page (SERP), browsing, mouse traces)
Understanding knowledge gain/state of users during search
Some results
▪ 70% of users exhibited a knowledge gain (KG)
▪ Negative relationship between KG of users and
topic popularity (avg. accuracy of workers in
knowledge tests) (R = -0.87)
▪ Amount of time users actively spent on web pages
explains 7% of the variance in their KG
▪ Query complexity explains 25% of the variance in
the KG of users
▪ Topic-dependent behavior: search behavior
correlates more strongly with search topic than with
KG/KS
Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
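Statements about a share of variance in KG can be read as squared correlation coefficients. The sketch below illustrates that reading, assuming knowledge gain is scored as post-test minus pre-test (an assumption; the paper's exact scoring may differ). All numbers in the toy data are invented.

```python
from math import sqrt


def knowledge_gain(pre_score, post_score):
    """Assumed definition: improvement from pre-test to post-test."""
    return post_score - pre_score


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (invented): per-user test accuracy and active reading time in minutes.
pre = [0.2, 0.5, 0.4, 0.7, 0.3]
post = [0.6, 0.6, 0.7, 0.8, 0.8]
active_minutes = [12, 3, 9, 2, 15]

gains = [knowledge_gain(a, b) for a, b in zip(pre, post)]
r = pearson_r(active_minutes, gains)
print(f"r = {r:.2f}, variance explained = {r * r:.0%}")

# Applying the same reading to the reported R = -0.87: 0.87 squared is about 0.76.
print(f"{(-0.87) ** 2:.2f}")
```

The sign of R gives the direction of the relationship, while its square gives the explained share, so a strongly negative R still corresponds to a large share of explained variance.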
Predicting knowledge gain/state during web search
▪ Same session data as Gadiraju et al., 2018
▪ Stratification of users into classes: user knowledge state (KS)
and knowledge gain (KG) are each split into {low, moderate, high}, where
"moderate" is the band mean ± 0.5 SD, "low" lies below it and "high" above it
▪ Supervised multiclass classification
(Naive Bayes, logistic regression, SVM, random forest, multilayer perceptron)
▪ KG prediction performance results (after 10-fold cross-validation)
▪ Considers in-session features (behavioural traces) only
Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
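The stratification rule can be sketched as follows, assuming the class boundaries are mean - 0.5 SD and mean + 0.5 SD of the observed scores. Whether the population or the sample standard deviation was used is not stated; this sketch uses the population SD, and the function name and toy scores are illustrative.

```python
from statistics import mean, pstdev


def stratify(scores, width=0.5):
    """Map each score to 'low', 'moderate' or 'high' around mean ± width * SD."""
    mu, sd = mean(scores), pstdev(scores)
    lower, upper = mu - width * sd, mu + width * sd
    labels = []
    for s in scores:
        if s < lower:
            labels.append("low")
        elif s > upper:
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Toy knowledge-gain scores (invented for illustration).
kg = [0.0, 0.1, 0.3, 0.3, 0.4, 0.7, 0.9]
labels = stratify(kg)
print(list(zip(kg, labels)))
```

Because the thresholds depend on the score distribution, the resulting classes are relative to the study population rather than absolute competence levels.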
Predicting knowledge gain/state during SAL: Features
[Figure: behavioral features]
Predicting knowledge gain/state during web search
▪ Feature importance (knowledge gain prediction task)
▪ Feature importance (knowledge state prediction task)
Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
Does topic familiarity influence search/learning behaviour?
Davari, M., Yu, R., Dietze, S., Understanding the Influence of Topic Familiarity on Search Behavior in Digital Libraries, EARS 2019 – International Workshop
on ExplainAble Recommendation and Search, collocated with SIGIR2019, Paris, July 2019.
▪ Small lab study (N=25) using eye-tracking data in
digital library / scholarly literature search (SowiPort)
▪ 50 sessions (for each user one on familiar topic, one
on unfamiliar topic)
▪ 2344 web pages viewed (SERPs, actual docs); 2.6 M
rows of eye tracking data
▪ Familiar tasks: more fixated terms, longer sessions
and less query term variance (indicators for prior
competence)
▪ Unfamiliar tasks: more focus on SERPs (rather than
actual resources)
▪ Yet in unfamiliar tasks, only 51.2% of query terms
are fixated before acquisition (compared to 60.7%
for familiar tasks)
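One plausible reading of the last statistic is the share of query terms that a user had already fixated on screen before typing them into a query. The sketch below computes that share under this assumed interpretation; the data layout (timestamped fixated terms and timestamped queries), the function name and the toy values are hypothetical and not taken from the study.

```python
def share_fixated_before_use(fixations, queries):
    """fixations: list of (timestamp, term); queries: list of (timestamp, query text).

    Returns the fraction of query-term occurrences whose term was fixated
    at some point before the query was issued.
    """
    first_fixation = {}
    for ts, term in fixations:
        term = term.lower()
        if term not in first_fixation or ts < first_fixation[term]:
            first_fixation[term] = ts

    used = seen_before = 0
    for ts, text in queries:
        for term in text.lower().split():
            used += 1
            if term in first_fixation and first_fixation[term] < ts:
                seen_before += 1
    return seen_before / used if used else 0.0


# Toy session (invented): timestamps in seconds since session start.
fixations = [(1.0, "migration"), (2.5, "policy"), (9.0, "integration")]
queries = [(0.5, "migration"), (5.0, "migration policy"), (8.0, "integration policy")]
share = share_fixated_before_use(fixations, queries)
print(share)
```

In this toy session three of the five query-term occurrences were fixated beforehand; terms typed without a prior fixation would have come from the user's own vocabulary.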
Understanding the impact of (learning) resource characteristics
Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions.
Information Retrieval Journal (2021): 1-29
▪ Understanding the relation between Web resource features (e.g. resource complexity, language, length)
and a user’s knowledge state (KS) and knowledge gain (KG).
▪ Understanding the topic-specificity of individual features, i.e. dependency between feature performance
and information needs (topics)
▪ Building generalizable ML models that can be used in real-world search environments on unseen topics for
predicting learning & competence from both behavioral and resource-centric features
▪ Approach/experimental setup: same dataset from SIGIR2018, but additional features and feature selection
strategy (maximise correlation with target variable and generalizability)
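To make "unseen topics" concrete: one common way to estimate generalisation across topics is leave-one-topic-out evaluation, where every session of one topic is held out for testing. The sketch below only builds such splits. It is an assumed protocol for illustration, the article's actual evaluation setup may differ, and the session dictionaries and field names are hypothetical.

```python
def leave_one_topic_out(sessions):
    """Yield (held_out_topic, train, test) with all sessions of one topic held out."""
    topics = sorted({s["topic"] for s in sessions})
    for topic in topics:
        train = [s for s in sessions if s["topic"] != topic]
        test = [s for s in sessions if s["topic"] == topic]
        yield topic, train, test


# Toy sessions (invented): two features and a knowledge-gain class per session.
sessions = [
    {"topic": "Altitude sickness", "features": [0.2, 14], "kg_class": "high"},
    {"topic": "Altitude sickness", "features": [0.1, 5], "kg_class": "low"},
    {"topic": "Tornados", "features": [0.4, 9], "kg_class": "moderate"},
]
splits = list(leave_one_topic_out(sessions))
for topic, train, test in splits:
    print(topic, len(train), len(test))
```

Splitting by topic rather than by session keeps topic-specific behaviour out of the training data, so a feature that only works for one topic cannot inflate the test score.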
Characteristics of (learning) resources (instead of behaviour)
[Table: Web resource features & correlation coefficients (highlighted: p > 0.05)]
Characteristics of (learning) resources (instead of behaviour)
▪ Model performance on knowledge state prediction & knowledge gain prediction
▪ Significant improvements across all classes
[Results for KS. New: this work (IRJ21); Baseline: SIGIR2018 (previous slides)]
Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions.
Information Retrieval Journal (2021): 1-29
How does multimodality affect knowledge gain/state prediction?
Otto, C., Yu, R., Pardi, G., von Hoyer, J., Rokicki, M., Hoppe, A., Holtz, P., Kammerer, Y., Dietze, S., Ewerth, R., Predicting Knowledge Gain during Web Search
based on Multimedia Resource Consumption, 22nd International Conference on Artificial Intelligence in Education (AIED2021), Springer, 2021.
▪ Lab study for data collection (N=113)
▪ Topic: “Lightning & thunderstorms” (causal chain of events, including
declarative and procedural knowledge)
▪ Knowledge test: 10-item multiple-choice test (pre/post)
▪ Tracking of behavioral features & text features
(all 110 features from IRJ)
▪ Additionally: multimedia features & eye tracking
o Detecting learning frames (actual reading) in the screencast (as opposed to
navigation/procrastination)
o Detecting key structure (heading, menu, content list, text, images …)
o Classifying image types (Infographics, Indoor Photo, Map, Outdoor
Photo, Technical Drawing, Information Visualization);
classifier trained on weak labels (images crawled through
Google Image Search)
▪ Results for knowledge gain prediction (TI = text features, MI = multimedia features)
▪ Feature importance (Mean Decrease in Impurity) in RF model
IRJ2021
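Mean Decrease in Impurity scores a feature by how much the splits on that feature reduce node impurity, averaged over the trees of the forest. The sketch below shows the core quantity for a single split, using Gini impurity. It is a worked illustration of the formula on toy labels, not the pipeline used in the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy knowledge-gain labels (invented) at one tree node.
parent = ["low", "low", "high", "high"]
perfect = impurity_decrease(parent, ["low", "low"], ["high", "high"])
useless = impurity_decrease(parent, ["low", "high"], ["low", "high"])
print(perfect, useless)
```

A split that separates the classes completely removes all of the parent's impurity (0.5 here), while a split that leaves both children as mixed as the parent yields a decrease of 0.0; features that repeatedly produce the former kind of split end up with high importance.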
Facilitating SAL research through public research data
https://data.uni-hannover.de/dataset/sal-dataset
Otto, C., Rokicki, M., Pardi, G., Gritz, W., Hienert, D., Yu, R., Hoyer, J., Hoppe, A., Dietze, S., Holtz, P., Kammerer, Y., Ewerth, R., SaL-Lightning Dataset: Search and Eye
Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search, ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR2022).
Learning on the Web beyond Google et al.
E.g.: Are Twitter users not learning too?
https://ai4sci-project.org/
[Figure: tweet categories: Science claim, Science reference, Science relevance, No science]
Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., "SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online
Discourse", CIKM2022
Learning on the Web beyond Google et al.
Science discourse is on the rise
▪ AI4Sci project: understanding and classification of science discourse online (news, social Web)
https://ai4sci-project.org/
▪ Percentage of tweets containing
links to scientific articles (journals,
publishers, science blogs etc)
▪ Uses list of > 30 K science web
domains
▪ Data source: TweetsKB
(https://data.gesis.org/tweetskb/),
> 10 bn tweets archived since 2013
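The percentage described above can be sketched as a simple domain lookup. The three domains below are a hypothetical stand-in for the list of > 30 K science web domains, and the tweet layout (a dictionary with an already expanded `urls` list) is an assumption for illustration, not the TweetsKB schema.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the real list of science web domains.
SCIENCE_DOMAINS = {"nature.com", "sciencemag.org", "arxiv.org"}


def links_to_science(url, domains=SCIENCE_DOMAINS):
    """True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def science_share(tweets, domains=SCIENCE_DOMAINS):
    """Fraction of tweets with at least one link to a listed science domain."""
    if not tweets:
        return 0.0
    hits = sum(any(links_to_science(u, domains) for u in t["urls"]) for t in tweets)
    return hits / len(tweets)


# Toy tweets (invented for illustration).
tweets = [
    {"urls": ["https://www.nature.com/articles/xyz"]},
    {"urls": ["https://example.com/cat-video"]},
    {"urls": []},
    {"urls": ["http://arxiv.org/abs/2207.01256", "https://example.com"]},
]
share = science_share(tweets)
print(f"{share:.0%}")
```

Matching on the host with a leading dot avoids false positives such as `notnature.com`, which a plain substring test would count as a science link.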
[Figure legend: SciBERT classifier; Heuristic: Sci term; Sci subdomain]
SciTweets dataset & classifier
▪ Ground truth dataset, heuristics-based sampling
strategy and annotation framework for testing
classification models
▪ 1261 expert-labeled tweets across all
classes/labels
▪ Baseline classifiers based on SciBERT transformer
model (fine-tuned/tested on SciTweets)
▪ Ongoing: analysis of large-scale science discourse
and its evolution
https://ai4sci-project.org/
Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse,
CIKM2022
The SAL Spaceship in the context of ubiquitous online learning
Von Hoyer, J., Hoppe, A., Kammerer, Y., Otto, C., Pardi, G., Rokicki, M., Yu, R., Dietze, S., Ewerth, R., Holtz, P., The SAL Spaceship: Towards a comprehensive
model of psychological and technological facets of search as learning (SAL), Frontiers in Psychology, Section Human-Media Interaction, 2022.
Key take-aways
▪ Knowledge acquisition (learning) is a ubiquitous activity on the Web
▪ Search as learning = specific case of informal/microlearning during Web search & browsing
▪ Behavioural traces (e.g. scrolling, queries, browsing, mouse traces etc.) are crucial indicators for
distinguishing learning from other activities
▪ Behavioural traces also facilitate user modeling/classification: prediction of knowledge state
(competence) and gain (learning) without any prior knowledge of the user
▪ Resource features (e.g. complexity, language) improve classification significantly
▪ Multimodal features are likely to provide useful indicators
▪ Learning is also ubiquitous on social platforms (science discourse as a specific case)
▪ Data is very costly (lab studies, crowdsourced session data)
References
• Hoppe, A., Holtz, P., Kammerer, Y., Yu, R., Dietze, S., Ewerth, R., Current Challenges for Studying Search as Learning Processes, 7th Workshop on Learning & Education with Web Data (LILE2018), in conjunction with ACM Web Science 2018 (WebSci18), Amsterdam, NL, 27 May, 2018.
• Von Hoyer, J., Hoppe, A., Kammerer, Y., Otto, C., Pardi, G., Rokicki, M., Yu, R., Dietze, S., Ewerth, R., Holtz, P., The SAL Spaceship: Towards a comprehensive model of psychological and technological facets of search as learning (SAL), Frontiers in Psychology, Section Human-Media Interaction, 2022.
• Yu, R., Limock, Dietze, S., Still Haven’t Found What You’re Looking For - Detecting the Intent of Web Search Missions from User Interaction Features. CoRR abs/2207.01256
• Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
• Davari, M., Yu, R., Dietze, S., Understanding the Influence of Topic Familiarity on Search Behavior in Digital Libraries, EARS 2019 – International Workshop on ExplainAble Recommendation and Search, collocated with SIGIR2019, Paris, July 2019.
• Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
• Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions. Information Retrieval Journal (2021): 1-29
• Otto, C., Yu, R., Pardi, G., von Hoyer, J., Rokicki, M., Hoppe, A., Holtz, P., Kammerer, Y., Dietze, S., Ewerth, R., Predicting Knowledge Gain during Web Search based on Multimedia Resource Consumption, 22nd International Conference on Artificial Intelligence in Education (AIED2021), Springer, 2021.
• Otto, C., Rokicki, M., Pardi, G., Gritz, W., Hienert, D., Yu, R., Hoyer, J., Hoppe, A., Dietze, S., Holtz, P., Kammerer, Y., Ewerth, R., SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search, ACM CHIIR2022.
• Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse, CIKM2022
Acknowledgements & thanks
▪ All co-authors
▪ Knowledge Technologies for the Social Sciences @ GESIS
http://gesis.org/en/kts
▪ Data & Knowledge Engineering group at HHU
https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering
▪ SALIENT project team
https://projects.tib.eu/salient/
▪ AI4Sci project team
https://ai4sci-project.org/
▪ Funders: BMBF, Leibniz Association, ANR
▪ The HELMeTO team