Relevance Clues: Developing an experimental research design to investigate a surrogate’s relevance clues
Christiane Behnert
Hamburg University of Applied Sciences
PhD Student Workshop IWiSt – DIPF
10th and 11th October 2017 in Hildesheim
OUTLINE
1. Problem statement
2. Research questions
3. Related research
4. Approach
5. First results
6. Experimental design
7. References
8. Discussion
1. PROBLEM STATEMENT
→ Today’s academic search systems integrate additional data into their search results
presentation.
• Factors for popularity based on the wisdom of crowds principle [Surowiecki, 2005]
• Popularity data in academia [Plassmeier et al., 2015]
- Number of citations (e.g., Google Scholar)
- Number of downloads (e.g., ACM Digital Library)
- Circulation counts
- Number of copies in a library
2. RESEARCH QUESTIONS
What are the clues and criteria by which users judge a search result’s relevance?
RQ I. Which clues within a surrogate do users use to judge its relevance?
RQ II. Which clues affect the relevance decision, and to what extent?
RQ III. What influence does the use situation (e.g., the user’s location, time pressure)
have on the relevance judgment?
RQ IV. What relevance criteria can be determined by the answers to RQs I-III, and
how can they be weighed against each other?
3. RELATED RESEARCH
3.1. Relevance criteria [Mizzaro, 1997; Saracevic, 2016]
− Factors and criteria influence users' relevance judgments
− System-oriented view: Topical match between query and document
− User-oriented view: Beyond topical relevance [e.g., Barry & Schamber, 1998]
Validity, recency, availability, and credibility of the information source
− Credibility and quality as important factors to filter and judge information on the web
[Rieh & Belkin, 2000]
− Cognitive authority [Wilson, 1983]
→ Author's (academic) impact [Rieh, 2009]
3. RELATED RESEARCH
3.2. Document representations
− Predictive and evaluative relevance judgments [Rieh, 2002]
− Studies on relevance criteria involving predictive judgments [Saracevic, 2016] reveal:
→ Titles and abstracts provide most of the relevance clues
→ Great importance of topicality and content-related criteria
− Studies in the Web search context: snippets, hyperlinks as surrogates
→ Information quality and cognitive authority [Rieh, 2002]
➔ At the time of this earlier research, additional/popularity data were not part of search result presentations.
➔ To date, no studies have been published that examine relevance judgments of surrogates containing this kind of data while also taking the user perspective into account.
3. RELATED RESEARCH
3.3. User models on relevance criteria (1/2)
[Figure: Document Selection Model for Academic Users in Agricultural Economics. Source: Wang & Soergel, 1998, p. 118]
3. RELATED RESEARCH
3.3. User models on relevance criteria (2/2)
[Figure: Model of judgment of information quality and cognitive authority. Source: Rieh, 2002, p. 158]
4. APPROACH
The research goal is pursued in five steps:
(1) an extensive review of related research,
(2) the development of an initial user model of relevance criteria from which hypotheses can be formulated,
(3) an empirical examination through online experiments with human participants,
(4) the statistical analysis of the data,
(5) the revision of the initial user model in light of the results.
5. FIRST RESULTS
Systematic examination of 47 empirical studies on relevance criteria (1988–2016) 1/3
(a) data collection method [Döring & Bortz, 2016]
(b) number and status of participants
(c) whether or not the relevance criteria were weighted against each other
(d) whether or not the criteria were presented to the participants prior to their judgments
(e) context of search tasks (e.g., everyday-life information seeking (ELIS) or academic)
[Figure: Distribution of data collection methods, n=47 — Survey/Questionnaire: 25; Observation & survey/questionnaire: 22]

Data collection method | n
Survey / questionnaire (Q) | 25
Other survey or questionnaire method (Q) | 24
Think-aloud method (Q) | 18
Diary method (Q) | 3
Focus group discussion (Q) | 2
Observation (O) & survey / questionnaire | 22
Other observation method (O) | 16
Log file analysis (O) | 4
Eye tracking (O) | 2
Table 1. Methods of data collection
5. FIRST RESULTS
Systematic examination of 47 empirical studies on relevance criteria (1988–2016) 2/3
EXCURSUS:
Experimental research designs aim to establish causality between variables (stimuli or
independent variables, and effects or dependent variables) [Sedlmeier & Renkewitz, 2007]:
Three prerequisites for causal conclusions:
− Covariance
− Temporal precedence
− Exclusion of alternative explanations
(confounding variables)
→ Manipulation of independent variables
→ Controlling possible confounding variables
- Randomization
- Counterbalancing
Table 2. Types of experimental studies [modified after Berger & Wolbring, 2015]: experiment, quasi-experiment, and non-experimental study / natural experiment, distinguished by whether randomization, manipulation, and grouping are present.
5. FIRST RESULTS
Systematic examination of 47 empirical studies on relevance criteria (1988–2016) 3/3
Source | Number of participants | Criteria weighted | Criteria presented | Context
Regazzi (1988) | 32 | Yes | No | ACAD
Choi & Rasmussen (2002) | 38 | Yes | Yes | ACAD
Toms, O’Brien, Kopak, & Freund (2005) | 48 | No | No | ACAD/ELIS
De Sabbata & Reichenbacher (2012) | 242 | Yes | Yes | ELIS
Kim, Kazai, & Zitouni (2013) | 28 | No | Yes | ELIS
Hamid, Thom, & Iskandar (2016) | 48 | Yes | Yes | Work
Table 3. Studies on relevance criteria with experimental research designs
6. EXPERIMENTAL DESIGN
Overview
6.1. Dependent and independent variables
6.2. Multifactorial mixed design
6.3. Controlling for potential confounding variables
6.4. Sample
6.5. Questionnaires and measuring
6. EXPERIMENTAL DESIGN
6.1. Dependent and independent variables
Dependent variable: Predictive relevance judgment of a search result with respect to a given task
No. | IV | Level A | Level B | Level C | Presentation
1 | Matched query terms (title, abstract, journal) | Low number | High number | n.a. | Query, surrogate
2 | Recency (publication date; in relation to field, information need, etc.) | "Old" document | "New" document | n.a. | Surrogate
3 | Location / availability | Near / open access | Far / not open access | n.a. | Situation, query, surrogate
4 | No. of citations (document) | Low number | High number | Not provided | Surrogate
5 | Author's reputation (h-index) | Low number | High number | Not provided | Surrogate
6 | Usage (no. of downloads / circulations) | Low number | High number | Not provided | Surrogate
7 | Use situation: user location | On-site | Off-site | n.a. | Situation
8 | Use situation: time pressure | Low | High | n.a. | Situation
Table 4. Potential relevance clues within a surrogate as independent variables (IVs).
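As an illustration of how the surrogate-level variables in Table 4 could be operationalised, the following sketch encodes IVs 1–6 as factors and enumerates the full factorial crossing. All names and level labels are assumptions made for this sketch, not part of the study materials.

```python
# Hypothetical encoding of IVs 1-6 from Table 4; names and level labels are
# illustrative assumptions, not the actual study materials.
from itertools import product

FACTORS = {
    "matched_query_terms": ["low", "high"],                       # IV 1
    "recency": ["old", "new"],                                    # IV 2
    "availability": ["near/open access", "far/not open access"],  # IV 3
    "citations": ["low", "high", "not provided"],                 # IV 4
    "author_h_index": ["low", "high", "not provided"],            # IV 5
    "usage": ["low", "high", "not provided"],                     # IV 6
}

# Full factorial crossing: 2 x 2 x 2 x 3 x 3 x 3 = 216 combinations.
conditions = [dict(zip(FACTORS, combo)) for combo in product(*FACTORS.values())]
print(len(conditions))  # 216
```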
6. EXPERIMENTAL DESIGN
6.2. Multifactorial mixed design
Between-subjects and within-subjects design
• Comparison between subjects: each subject is assigned to exactly one condition
• Comparison within subjects: each subject is exposed to all conditions
6 independent variables with 2 or 3 levels each
=> 2 × 2 × 2 × 3 × 3 × 3 = 216 experimental conditions for each of the four groups
→ Selection of conditions with a Latin square (see the sketch below Table 5)
Multifactorial design
• Several clues → multiple factors
 | Group 1 | Group 2 | Group 3 | Group 4
User location | On-site | On-site | Off-site | Off-site
Time pressure | Low | High | Low | High
Table 5. Independent variables 7 and 8 as between-subjects design
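A minimal sketch of how conditions could be selected and ordered with a simple cyclic Latin square, combined with the four between-subjects groups from Table 5. The group encoding, the number of surrogates per person, and all names are assumptions for illustration only.

```python
# Minimal sketch: cyclic Latin square for counterbalancing which conditions a
# participant sees, plus the four between-subjects groups from Table 5.
# All names and the per-person count are illustrative assumptions.

def cyclic_latin_square(n):
    """Row i is a rotation of 0..n-1: each value occurs once per row and once per column."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

GROUPS = [
    {"location": "on-site",  "time_pressure": "low"},   # Group 1
    {"location": "on-site",  "time_pressure": "high"},  # Group 2
    {"location": "off-site", "time_pressure": "low"},   # Group 3
    {"location": "off-site", "time_pressure": "high"},  # Group 4
]

def assign_participant(pid, conditions, per_person=12):
    """Pick a between-subjects group and a counterbalanced subset of within-subjects conditions."""
    group = GROUPS[pid % len(GROUPS)]            # cycle through the four groups
    square = cyclic_latin_square(len(conditions))
    row = square[pid % len(conditions)]          # participant-specific rotation of the 216 conditions
    selected = [conditions[k] for k in row[:per_person]]
    return group, selected

# Example usage with placeholder conditions (e.g., the 216 combinations sketched above):
conditions = list(range(216))
group, selected = assign_participant(pid=7, conditions=conditions)
```

In practice, assignment to the four groups would be randomised rather than cycled; randomisation is sketched under 6.3 below.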
6. EXPERIMENTAL DESIGN
6.3. Controlling for potential confounding variables
(a) Characteristics of the test subjects
- Sample as homogeneous as possible
- Randomised assignment of a test person to a task (experimental condition)
(b) Test situation
- Identical instructions and test interface for all subjects
- Randomisation of tasks (conditions)
- Presentation of surrogates in a randomised order (→ learning effects)
- Revealing the true purpose of the experiment to test persons only after they have completed their tasks (→ expectation effects)
(c) Researcher (→ Rosenthal effect)
- No physical contact or interaction between researcher and test subjects
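A compact sketch of the randomisation steps listed above; the function and variable names are assumptions, not the actual test interface.

```python
# Sketch of randomised assignment and presentation order; names are assumptions.
import random

def randomise_session(participant_id, groups, tasks, surrogates, seed=None):
    """Randomly assign a group and task, and shuffle the surrogate presentation order."""
    rng = random.Random(seed)
    group = rng.choice(groups)       # randomised assignment to an experimental group
    task = rng.choice(tasks)         # randomised task (experimental condition)
    order = list(surrogates)         # copy so the master list stays intact
    rng.shuffle(order)               # randomised order to counter learning effects
    return {"participant": participant_id, "group": group, "task": task, "order": order}
```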
6. EXPERIMENTAL DESIGN
6.4. Sample
Sample characteristics:
− Academic context
− People with experience of academic search engines and scientific information systems
- Master's students
- PhD students
- Research assistants
- Academic staff at universities or research institutes
Sample size:
− At least 400
− Exact size still to be determined, e.g., via an a priori power analysis (see the sketch below)
Incentives:
− 10 EUR Amazon gift vouchers
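One way to firm up the "at least 400" estimate would be an a priori power analysis; the sketch below uses statsmodels for an F test over the four between-subjects groups, with effect size, alpha, and power chosen purely as illustrative assumptions.

```python
# Sketch of an a priori power analysis for the four-group between-subjects factor.
# Effect size, alpha, and power are illustrative assumptions, not study decisions.
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(
    effect_size=0.2,  # assumed small-to-medium effect (Cohen's f)
    alpha=0.05,       # significance level
    power=0.8,        # desired statistical power
    k_groups=4,       # the four groups from Table 5
)
print(round(n_total))  # required total number of participants under these assumptions
```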
6. EXPERIMENTAL DESIGN
6.5. Questionnaires and measuring
Pre-task questionnaire:
− Demographic data (age, gender)
− Status, research domain
− Self-assessed experience with searching in academic information systems
− "Big Five" personality test
Measuring:
− Relevance is not binary, but graded
→ Scale assessment from "not relevant" (0 points) to "relevant" (100 points)
Post-task questionnaire:
− Open questions regarding relevance judgments, criteria, etc.
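Since each participant provides graded 0–100 judgments for several surrogates, one plausible analysis (step 4 of the approach) is a linear mixed model with participants as a random effect. The column names and input file below are assumptions about how the data might be logged, not the actual study data.

```python
# Hedged sketch of one possible analysis of the graded 0-100 relevance judgments.
# Column names and the input file are assumptions about the logged data.
import pandas as pd
import statsmodels.formula.api as smf

# Expected long format: one row per judged surrogate per participant.
df = pd.read_csv("judgments.csv")

model = smf.mixedlm(
    "relevance ~ citations + recency + availability + location + time_pressure",
    data=df,
    groups=df["participant"],  # repeated judgments nested within participants
)
print(model.fit().summary())
```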
7. REFERENCES
Barry, C. L., & Schamber, L. (1998). Users' criteria for relevance evaluation: A cross-situational comparison. Information Processing & Management, 34(2–3), 219–236.
Berger, R., & Wolbring, T. (2015). Kontrafaktische Kausalität und eine Typologie sozialwissenschaftlicher Experimente. In M. Keuschnigg & T. Wolbring (Eds.), Experimente in den Sozialwissenschaften (pp. 34–52). Baden-Baden: Nomos.
Döring, N., & Bortz, J. (2016). Forschungsmethoden und Evaluation in den Sozial- und Humanwissenschaften (5th ed.). Berlin, Heidelberg: Springer.
Kelly, D., & Crescenzi, A. (2016). From design to analysis: Conducting controlled laboratory experiments with users. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '16) (pp. 1207–1210). New York, NY: ACM Press.
Mizzaro, S. (1997). Relevance: The whole history. Journal of the American Society for Information Science, 48(9), 810–832.
Plassmeier, K., Borst, T., Behnert, C., & Lewandowski, D. (2015). Evaluating popularity data for relevance ranking in library information systems. In Proceedings of the 78th ASIS&T Annual Meeting (Vol. 51).
Rieh, S. Y. (2002). Judgment of information quality and cognitive authority in the Web. Journal of the American Society for Information Science and Technology, 53(2), 145–161.
Rieh, S. Y. (2009). Credibility and cognitive authority of information. In Encyclopedia of Library and Information Sciences (3rd ed., pp. 1337–1344). CRC Press.
Rieh, S. Y., & Belkin, N. (2000). Interaction on the Web: Scholars' judgement of information quality and cognitive authority. Proceedings of the 63rd Annual Meeting of ASIS, 37, 25–38.
Saracevic, T. (2016). The notion of relevance in information science: Everybody knows what relevance is. But, what is it really? (G. Marchionini, Ed.). Synthesis Lectures on Information Concepts, Retrieval, and Services, 50. Morgan & Claypool.
Sedlmeier, P., & Renkewitz, F. (2007). Forschungsmethoden und Statistik in der Psychologie (pp. 123–180). München: Pearson Studium.
Surowiecki, J. (2005). The wisdom of crowds. New York, NY: Anchor Books.
Wang, P., & Soergel, D. (1998). A cognitive model of document use during a research project. Study I. Document selection. Journal of the American Society for Information Science, 49(2), 115–133.
Wilson, P. (1983). Second-hand knowledge: An inquiry into cognitive authority. Westport, CT; London: Greenwood Press.
8. DISCUSSION
Selection of independent variables
→ Focus on data as potential relevance clues that have not been investigated yet?
Selection of topics, tasks, search queries
→ How closely related to (or independent of) the participants' domain should, or can, they be?
→ Would test subjects recognize fakes, i.e., manipulated elements?
THANK YOU VERY MUCH!
I APPRECIATE YOUR FEEDBACK!
Christiane Behnert
Hamburg University of Applied Sciences, Germany
christiane.behnert@haw-hamburg.de
http://searchstudies.org/christiane-behnert/