An interdisciplinary journey with the SAL spaceship – results and challenges in the emerging field of Search As Learning (SAL)

  1. An interdisciplinary journey with the SAL spaceship – results and challenges in the emerging field of Search As Learning (SAL). HELMeTO Conference, Palermo. Stefan Dietze, 22.09.2022
  2. Informal and microlearning on the Web
     ▪ Anything can be a learning resource
     ▪ The activity makes the difference (not only the resource), i.e. how a resource is being used
     ▪ Learning Analytics data in online/non-learning environments?
       o Activity streams
       o Social graphs (and their evolution)
       o Behavioural traces (mouse movements, keystrokes)
       o ...
     ▪ Research challenges:
       o How to detect “learning”?
       o How to detect learning-specific notions such as “competences”, “learning performance” etc.?
  3. SAL = “Search As Learning”
     Research challenges at the intersection of AI/ML, HCI & cognitive psychology
     ▪ Detecting coherent search missions?
     ▪ Detecting learning throughout search? Detecting “informational” search missions (as opposed to “transactional” or “navigational” missions)
     ▪ How competent is the user? Predict/understand the knowledge state of users based on in-session behavior/interactions
     ▪ How well does a user achieve his/her learning goal/information need? Predict knowledge gain throughout the search session
     Hoppe, A., Holtz, P., Kammerer, Y., Yu, R., Dietze, S., Ewerth, R., Current Challenges for Studying Search as Learning Processes, 7th Workshop on Learning & Education with Web Data (LILE2018), in conjunction with ACM Web Science 2018 (WebSci18), Amsterdam, NL, 27 May, 2018.
  4. What is the “SAL Spaceship”?
  5. “SAL Spaceship” – an interdisciplinary, conceptual SAL framework
     Von Hoyer, J., Hoppe, A., Kammerer, Y., Otto, C., Pardi, G., Rokicki, M., Yu, R., Dietze, S., Ewerth, R., Holtz, P., The SAL Spaceship: Towards a comprehensive model of psychological and technological facets of search as learning (SAL), Frontiers in Psychology, Section Human-Media Interaction, 2022.
  6. SAL Spaceship Focus: Learner
     • Model focuses on “informational” search
     • Cf. search intent taxonomy by Broder, 2002
     • Parts of it applicable to other types of search intents (“transactional”, “navigational”)
  7. Can we detect “learning” in Web Search?
     ▪ Segmentation of real-world query logs (AOL dataset) into logical and physical sessions
     ▪ Segments from Hagen et al. 2013 (2881 logical sessions, 1378 missions, average 6.5 queries per session)
     ▪ Task: supervised classification of mission intent into transactional, navigational and informational
     ▪ Basic machine learning models (DT, RF, LR, SVM)
     ▪ 22 features in 3 categories (query, browsing, mission)
     Yu, R., Limock, Dietze, S., Still Haven’t Found What You’re Looking For - Detecting the Intent of Web Search Missions from User Interaction Features. CoRR abs/2207.01256
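The first bullet above mentions segmenting query logs into physical sessions. A minimal sketch of one common way to do this, assuming a physical session ends after a fixed period of inactivity. The 30-minute threshold, the function name `split_physical_sessions` and the sample log are illustrative assumptions, not values taken from the slides or from Hagen et al. 2013.

```python
from datetime import datetime, timedelta


def split_physical_sessions(timestamps, max_gap=timedelta(minutes=30)):
    """Group chronologically sorted query timestamps into sessions.

    A new session starts whenever the pause since the previous query
    exceeds max_gap (an assumed threshold, not a value from the slides).
    """
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions


# Hypothetical query log of a single user
log = [
    datetime(2006, 3, 1, 9, 0),
    datetime(2006, 3, 1, 9, 5),
    datetime(2006, 3, 1, 9, 20),
    datetime(2006, 3, 1, 14, 0),
    datetime(2006, 3, 1, 14, 10),
]
sessions = split_physical_sessions(log)
print([len(s) for s in sessions])  # session sizes
```

Splitting a physical session further into logical sessions (queries sharing one information need) requires comparing query content and is not covered by this sketch.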
  8. Can we detect „learning“ in Web Search? Yu, R., Limock, Dietze, S., Still Haven’t Found What You’re Looking For - Detecting the Intent of Web Search Missions from User Interaction Features. CoRR abs/2207.01256
  9. Can we detect „learning“ in Web Search? Yu, R., Limock, Dietze, S., Still Haven’t Found What You’re Looking For - Detecting the Intent of Web Search Missions from User Interaction Features. CoRR abs/2207.01256
  10. Understanding knowledge gain/state of users during search
      Data collection - summary
      ▪ Crowdsourced collection of search session data
      ▪ 10 search topics (e.g. “Altitude sickness”, “Tornados”), incl. pre- and post-tests
      ▪ Approx. 1000 distinct crowd workers & 100 sessions per topic
      ▪ Tracking of user behavior through 76 features in 5 categories (session, query, SERP (search engine result page), browsing, mouse traces)
      Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
  11. Understanding knowledge gain/state of users during search
      Some results
      ▪ 70% of users exhibited a knowledge gain (KG)
      ▪ Negative relationship between KG of users and topic popularity (avg. accuracy of workers in knowledge tests) (R = -.87)
      ▪ The amount of time users actively spent on web pages explains 7% of the variance in their KG
      ▪ Query complexity explains 25% of the variance in the KG of users
      ▪ Topic-dependent behavior: search behavior correlates more strongly with search topic than with KG/KS
      Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
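The results above are stated as correlations and shares of explained variance. A minimal sketch of how such numbers relate, assuming knowledge gain is measured as post-test score minus pre-test score and that a single behavioural feature is used as predictor (in which case the share of explained variance is the squared Pearson correlation). All scores and the `active_minutes` feature are hypothetical illustration data, not results from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pre-/post-test scores (fraction of correct answers) per session
pre = [0.2, 0.5, 0.4, 0.7, 0.3]
post = [0.6, 0.6, 0.7, 0.7, 0.8]
active_minutes = [12.0, 3.0, 8.0, 2.0, 15.0]  # hypothetical behavioural feature

# Assumed definition: knowledge gain = post-test score - pre-test score
knowledge_gain = [after - before for before, after in zip(pre, post)]

r = pearson_r(active_minutes, knowledge_gain)
print(f"r = {r:.2f}, variance explained = {r * r:.0%}")
```

Under the same single-predictor reading, the reported R = -.87 would correspond to about 76% of explained variance, since 0.87² ≈ 0.757.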
  12. SAL Spaceship Focus: Information Retrieval Backend
  13. Predicting knowledge gain/state during web search
      ▪ Same session data as Gadiraju et al., 2018
      ▪ Stratification of users into classes: user knowledge state (KS) and knowledge gain (KG) into {low, moderate, high} using (low < (mean ± 0.5 SD) < high)
      ▪ Supervised multiclass classification (Naive Bayes, logistic regression, SVM, random forest, multilayer perceptron)
      ▪ KG prediction performance results (after 10-fold cross-validation)
      ▪ Considers in-session features (behavioural traces) only
      Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
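The stratification rule above (low < mean ± 0.5 SD < high) can be sketched as follows. This is an illustration under assumptions: the function name `stratify`, the use of the sample standard deviation and the example scores are not taken from the slides, which do not state these details.

```python
from statistics import mean, stdev


def stratify(scores, width=0.5):
    """Label each score as low / moderate / high relative to mean ± width * SD."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD; an assumption, the slides do not specify
    lower, upper = m - width * sd, m + width * sd
    labels = []
    for score in scores:
        if score < lower:
            labels.append("low")
        elif score > upper:
            labels.append("high")
        else:
            labels.append("moderate")
    return labels


# Hypothetical knowledge gain scores of seven users
gains = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = stratify(gains)
print(list(zip(gains, labels)))
```

The resulting class labels would then serve as targets for the multiclass classifiers listed on the slide.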
  14. Predicting knowledge gain/state during SAL: Features
      Behavioral features
      Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
  15. Predicting knowledge gain/state during web search
      ▪ Feature importance (knowledge gain prediction task)
      Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
  16. Predicting knowledge gain/state during web search
      ▪ Feature importance (knowledge state prediction task)
      Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
  17. Does topic familiarity influence search/learning behaviour?
      ▪ Small lab study (N=25) using eye-tracking data in digital library / scholarly literature search (SowiPort)
      ▪ 50 sessions (for each user, one on a familiar topic and one on an unfamiliar topic)
      ▪ 2344 web pages viewed (SERPs, actual docs); 2.6 M rows of eye-tracking data
      ▪ Familiar tasks: more fixated terms, longer sessions and less query term variance (indicators of prior competence)
      ▪ Unfamiliar tasks: more focus on SERPs (rather than actual resources)
      ▪ Yet in unfamiliar tasks, only 51.2% of query terms are fixated before acquisition (compared to 60.7% for familiar tasks)
      Davari, M., Yu, R., Dietze, S., Understanding the Influence of Topic Familiarity on Search Behavior in Digital Libraries, EARS 2019 – International Workshop on ExplainAble Recommendation and Search, collocated with SIGIR2019, Paris, July 2019.
  18. Search/learning behavior is important, but how about the resources?
  19. Search/learning behavior is important, but how about the resources?
  20. Understanding the impact of (learning) resource characteristics
      ▪ Understanding the relation between Web resource features (e.g. resource complexity, language, length) and a user’s knowledge state (KS) and knowledge gain (KG)
      ▪ Understanding the topic-specificity of individual features, i.e. the dependency between feature performance and information needs (topics)
      ▪ Building generalizable ML models that can be used in real-world search environments on unseen topics for predicting learning & competence from both behavioral and resource-centric features
      ▪ Approach/experimental setup: same dataset as in SIGIR2018, but with additional features and a feature selection strategy (maximise correlation with target variable and generalizability)
      Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions. Information Retrieval Journal (2021): 1-29
  21. Characteristics of (learning) resources (instead of behaviour)
      Web resource features & correlation coefficients (highlighted: p > 0.05)
      Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions. Information Retrieval Journal (2021): 1-29
  22. Characteristics of (learning) resources (instead of behaviour)
      ▪ Model performance on knowledge state prediction & knowledge gain prediction
      ▪ Significant improvements across all classes
      Figure/table labels: KS; New: this work (IRJ21); Baseline: SIGIR2018 (previous slides)
      Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions. Information Retrieval Journal (2021): 1-29
  23. How does multimodality affect the knowledge (gain/state) prediction?
      ▪ Lab study for data collection (N=113)
      ▪ Topic: “Lightning & thunderstorms” (causal chain of events, including declarative and procedural knowledge)
      ▪ Knowledge test: 10-item multiple-choice test, pre-/post
      ▪ Tracking of behavioral features & text features (all 110 features from IRJ)
      ▪ Additionally: multimedia features & eye tracking
        o Detecting learning frames (actual reading) in the screencast (as opposed to navigation/procrastination)
        o Detecting key structure (heading, menu, content list, text, images …)
        o Classifying image types (Infographics, Indoor Photo, Map, Outdoor Photo, Technical Drawing, Information Visualization); classifier trained through weak labels (images crawled through Google Image Search)
      Otto, C., Yu, R., Pardi, G., von Hoyer, J., Rokicki, M., Hoppe, A., Holtz, P., Kammerer, Y., Dietze, S., Ewerth, R., Predicting Knowledge Gain during Web Search based on Multimedia Resource Consumption, 22nd International Conference on Artificial Intelligence in Education (AIED2021), Springer, 2021.
  24. How does multimodality affect the knowledge (gain/state) prediction?
      ▪ Results for knowledge gain prediction (TI = text features, MI = multimedia features)
      ▪ Feature importance (Mean Decrease in Impurity) in RF model
      Figure label: IRJ2021
      Otto, C., Yu, R., Pardi, G., von Hoyer, J., Rokicki, M., Hoppe, A., Holtz, P., Kammerer, Y., Dietze, S., Ewerth, R., Predicting Knowledge Gain during Web Search based on Multimedia Resource Consumption, 22nd International Conference on Artificial Intelligence in Education (AIED2021), Springer, 2021.
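Mean Decrease in Impurity ranks a feature by how much the splits on that feature reduce class impurity, accumulated over the trees of a random forest. A minimal sketch of the building block, the Gini impurity decrease of one threshold split, is given here. The feature names (`reading_time`, `image_count`), thresholds and labels are hypothetical and not taken from the study; a full forest would sum such decreases per feature, weighted by the share of samples reaching each node, and average over trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting the samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - children


# Hypothetical sessions: knowledge gain class plus two candidate features
kg_class = ["low", "low", "low", "high", "high", "high"]
reading_time = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
image_count = [4, 1, 5, 2, 4, 1]

decrease_reading = impurity_decrease(reading_time, kg_class, threshold=5.0)
decrease_images = impurity_decrease(image_count, kg_class, threshold=3)
print(decrease_reading, decrease_images)
```

In this toy data the split on `reading_time` separates the classes perfectly and therefore yields the larger decrease, which is what a higher importance score expresses.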
  25. Facilitating SAL research through public research data
      https://data.uni-hannover.de/dataset/sal-dataset
      Otto, C., Rokicki, M., Pardi, G., Gritz, W., Hienert, D., Yu, R., von Hoyer, J., Hoppe, A., Dietze, S., Holtz, P., Kammerer, Y., Ewerth, R., SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search, ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR2022).
  26. SAL Spaceship … can fly!
  27. Learning on the Web beyond Google et al.
      E.g.: Are Twitter users not learning too?
      Figure labels: Science claim, Science reference, Science relevance, No science
      https://ai4sci-project.org/
      Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse, CIKM2022
  28. Learning on the Web beyond Google et al.: Science discourse is on the rise
      ▪ AI4Sci project: understanding and classification of science discourse online (news, social Web) https://ai4sci-project.org/
      ▪ Percentage of tweets containing links to scientific articles (journals, publishers, science blogs etc.)
      ▪ Uses a list of > 30 K science web domains
      ▪ Data source: TweetsKB (https://data.gesis.org/tweetskb/), > 10 bn tweets archived since 2013
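The percentage described above can be sketched as a domain lookup over the links in each tweet. This is a minimal illustration under assumptions: the three-entry `SCIENCE_DOMAINS` set stands in for the list of > 30 K domains, and the tweet records (dictionaries with a `urls` key) are a hypothetical format, not the actual TweetsKB schema.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the list of > 30 K science web domains
SCIENCE_DOMAINS = {"nature.com", "arxiv.org", "sciencedirect.com"}


def links_to_science(urls, domains=SCIENCE_DOMAINS):
    """Return True if any URL points to a listed domain or one of its subdomains."""
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in domains):
            return True
    return False


def science_share(tweets):
    """Percentage of tweets containing at least one link to a science domain."""
    if not tweets:
        return 0.0
    hits = sum(links_to_science(tweet["urls"]) for tweet in tweets)
    return 100.0 * hits / len(tweets)


# Hypothetical tweets, reduced to the URLs they contain
tweets = [
    {"urls": ["https://www.nature.com/articles/xyz"]},
    {"urls": ["https://example.com/news"]},
    {"urls": []},
    {"urls": ["http://export.arxiv.org/abs/2207.01256"]},
]
share = science_share(tweets)
print(f"{share:.1f}% of tweets link to a science domain")
```

Computing this share per month or year over the archive would give the kind of trend line the slide title refers to; shortened links would have to be resolved first, which this sketch does not attempt.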
  29. Learning on the Web beyond Google et al.: Science discourse is on the rise
      Figure labels: SciBERT classifier; Heuristic: Sci term; Sci subdomain
      https://ai4sci-project.org/
  30. SciTweets dataset & classifier
      ▪ Ground truth dataset, heuristics-based sampling strategy and annotation framework for testing classification models
      ▪ 1261 expert-labeled tweets across all classes/labels
      ▪ Baseline classifiers based on the SciBERT transformer model (fine-tuned/tested on SciTweets)
      ▪ Ongoing: analysis of large-scale science discourse and its evolution
      https://ai4sci-project.org/
      Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse, CIKM2022
  31. The SAL Spaceship in the context of ubiquitous online learning
      Von Hoyer, J., Hoppe, A., Kammerer, Y., Otto, C., Pardi, G., Rokicki, M., Yu, R., Dietze, S., Ewerth, R., Holtz, P., The SAL Spaceship: Towards a comprehensive model of psychological and technological facets of search as learning (SAL), Frontiers in Psychology, Section Human-Media Interaction, 2022.
  32. Key take-aways
      ▪ Knowledge acquisition (learning) is a ubiquitous activity on the Web
      ▪ Search as learning = specific case of informal/microlearning during Web search & browsing
      ▪ Behavioural traces (e.g. scrolling, queries, browsing, mouse traces etc.) are crucial indicators for distinguishing learning from other activities
      ▪ Behavioural traces also facilitate user modeling/classification: prediction of knowledge state (competence) and gain (learning) without any prior knowledge of the user
      ▪ Resource features (e.g. complexity, language) improve classification significantly
      ▪ Multimodal features are likely to provide useful indicators
      ▪ Learning is also ubiquitous on social platforms (science discourse as a specific case)
      ▪ Data is very costly (lab studies, crowdsourced session data)
  33. References
      • Hoppe, A., Holtz, P., Kammerer, Y., Yu, R., Dietze, S., Ewerth, R., Current Challenges for Studying Search as Learning Processes, 7th Workshop on Learning & Education with Web Data (LILE2018), in conjunction with ACM Web Science 2018 (WebSci18), Amsterdam, NL, 27 May, 2018.
      • Von Hoyer, J., Hoppe, A., Kammerer, Y., Otto, C., Pardi, G., Rokicki, M., Yu, R., Dietze, S., Ewerth, R., Holtz, P., The SAL Spaceship: Towards a comprehensive model of psychological and technological facets of search as learning (SAL), Frontiers in Psychology, Section Human-Media Interaction, 2022.
      • Yu, R., Limock, Dietze, S., Still Haven’t Found What You’re Looking For - Detecting the Intent of Web Search Missions from User Interaction Features. CoRR abs/2207.01256
      • Gadiraju, U., Yu, R., Dietze, S., Holtz, P., Analyzing Knowledge Gain of Users in Informational Search Sessions on the Web. ACM CHIIR 2018.
      • Davari, M., Yu, R., Dietze, S., Understanding the Influence of Topic Familiarity on Search Behavior in Digital Libraries, EARS 2019 – International Workshop on ExplainAble Recommendation and Search, collocated with SIGIR2019, Paris, July 2019.
      • Yu, R., Gadiraju, U., Holtz, P., Rokicki, M., Kemkes, P., Dietze, S., Predicting User Knowledge Gain in Informational Search Sessions. ACM SIGIR 2018.
      • Yu, R., Tang, R., Rokicki, M., Gadiraju, U., Dietze, S., Topic-independent Modeling of User Knowledge in Informational Search Sessions. Information Retrieval Journal (2021): 1-29
      • Otto, C., Yu, R., Pardi, G., von Hoyer, J., Rokicki, M., Hoppe, A., Holtz, P., Kammerer, Y., Dietze, S., Ewerth, R., Predicting Knowledge Gain during Web Search based on Multimedia Resource Consumption, 22nd International Conference on Artificial Intelligence in Education (AIED2021), Springer, 2021.
      • Otto, C., Rokicki, M., Pardi, G., Gritz, W., Hienert, D., Yu, R., von Hoyer, J., Hoppe, A., Dietze, S., Holtz, P., Kammerer, Y., Ewerth, R., SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search, ACM CHIIR2022.
      • Hafid, S., Schellhammer, S., Bringay, S., Todorov, K., Dietze, S., SciTweets - A Dataset and Annotation Framework for Detecting Scientific Online Discourse, CIKM2022
  34. Acknowledgements & thanks
      ▪ All co-authors
      ▪ Knowledge Technologies for the Social Sciences @ GESIS http://gesis.org/en/kts
      ▪ Data & Knowledge Engineering group at HHU https://www.cs.hhu.de/en/research-groups/data-knowledge-engineering
      ▪ SALIENT project team https://projects.tib.eu/salient/
      ▪ AI4Sci project team https://ai4sci-project.org/
      ▪ Funders: BMBF, Leibniz Association, ANR
      ▪ The HELMeTO team
  35. @stefandietze https://stefandietze.net http://gesis.org/en/kts