Matt Lease
• School of Information @mattlease
University of Texas at Austin ml@utexas.edu
Joint work with
Yinglong Zhang Jin Zhang Jacek Gwizdka
Multidimensional Relevance Modeling
via Psychometrics & Crowdsourcing
slides: www.slideshare.net/mattlease
Saracevic’s ‘97 Salton Award address
“…the human-centered side was often highly critical
of the systems side for ignoring users... [when]
results have implications for systems design &
practice. Unfortunately… beyond suggestions,
concrete design solutions were not delivered.
“…the systems side by and large ignores the user
side and user studies… the stance is ‘tell us what
to do and we will.’ But nobody is telling...
“Thus, there are not many interactions…”
Matt Lease <ml@utexas.edu> 2/20
Primary Research Question
• What is relevance?
– What factors constitute it? Can we quantify their
relative importance? How do they interact?
• Old IR question, many studies, little agreement
• Potential impacts?
– Further understanding of cognitive relevance
– Guide IR engineering toward inferring key factors
– Foster multi-dimensional evaluation of IR systems
Secondary Research Question
• How can we measure/ensure quality of
subjective relevance judgments?
– How can we distinguish valid subjectivity vs. human
error in judging disagreements (traditional or online)?
• Potential impacts
– Help explain/reduce judging disagreements
– Enable evaluation wrt. distribution of opinions
– Encourage other subjective data collection in HCOMP
Psychology to the Rescue!
• A Guide to Behavioral Experiments
on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research:
Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk : A New Source of
Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
August 12, 2012
Contributions
• Describe a simple, reliable, scalable method for
collecting diverse (subjective), multi-dimensional
relevance judgments from online participants
– Online survey techniques from psychometrics
– Data available online
• Describe a rigorous, positivist, data-driven framework
for inferring & modeling multi-dimensional relevance
– Structural equation modeling (SEM) from psychometrics
– Run the experiment & let the data speak for itself!
– Implemented in standard R libraries available online
An example model of multi-dimensional relevance
Experimental Design
• Define some search tasks
• Pick some documents to be judged
• Hypothesize some relevance dimensions
• Ask participants to answer some questions
• Analyze data via Structural Equation Modeling (SEM)
– Use Exploratory Factor Analysis (EFA) to assess question-
factor relationships, then prune “bad” questions
– Use Confirmatory Factor Analysis (CFA) to assess
correlations, test significance, & compare models
– Cousin to graphical models in statistics/AI
Collecting multi-dimensional relevance
judgments
• Participant picks one of several pre-defined topics
– You want to plan a one week vacation in China
• Participant assigned a Web page to judge
– We wrote a query for each topic, submitted to a popular
search engine, and did stratified sampling of results
• Participant answers a set of Likert-scale questions
– I think the information in this page is incorrect
– It’s difficult to understand the information in this page
– …
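The stratified sampling of search results mentioned above could look like the following minimal sketch. It assumes results are stratified by rank band; the band boundaries, the per-band sample size, the example URLs, and the function name `stratified_sample` are illustrative assumptions, not details given in the slides.

```python
import random

def stratified_sample(ranked_urls, band_edges, per_band, seed=0):
    """Sample `per_band` results from each rank band (stratum).

    band_edges such as [0, 10, 50, 100] define the half-open rank bands
    [0, 10), [10, 50), [50, 100). All values here are hypothetical.
    """
    rng = random.Random(seed)
    sample = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        stratum = ranked_urls[lo:hi]
        k = min(per_band, len(stratum))  # never ask for more than exists
        sample.extend(rng.sample(stratum, k))
    return sample

# Hypothetical ranked result list for one topic's query.
results = [f"https://example.com/page{i}" for i in range(100)]
pages = stratified_sample(results, band_edges=[0, 10, 50, 100], per_band=3)
```

Sampling within bands, rather than taking only the top results, is one way to make sure participants judge pages of varying quality.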
What Questions might we ask?
• What factors do you think impact relevance…
• We hypothesize same 5 factors as Xu & Chen ’06
– Topicality, reliability, novelty, understandability, & scope
– Choose same to make revised mechanics & any
difference in findings maximally clear
• Assume factors are incomplete & imperfect
– Positivist approach: do these factors explain
observed data better than other alternatives:
uni-dimensional relevance or another set of factors?
How do we ask the questions?
• Ask 3+ questions per hypothesized dimension
– Ask repeated, similar questions, & change polarity
– Randomize question order (don’t group questions)
– Over-generate questions to allow for later pruning
– Exclude participants failing self-consistency checks
• Usual stuff
– Use clear, familiar, non-leading wording
– Balance Likert response scale
– Pre-test survey in-house, then pilot study online
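The survey mechanics above (changing polarity, randomizing order, excluding inconsistent participants) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 7-point scale, the tolerance of 2 points, and the item ids `u1`/`u2` are all assumptions.

```python
import random

SCALE_MAX = 7  # assumed 7-point Likert scale (1 = strongly disagree)

def reverse_score(response, scale_max=SCALE_MAX):
    """Flip a negatively worded item so that higher always means 'more'."""
    return scale_max + 1 - response

def is_self_consistent(answers, item_pairs, tolerance=2):
    """Check that paired items (one positive, one negative wording) agree.

    answers: dict of question id -> raw response
    item_pairs: list of (positive_id, negative_id) tuples
    tolerance: largest allowed gap after reverse scoring (assumed value)
    """
    for pos_id, neg_id in item_pairs:
        gap = abs(answers[pos_id] - reverse_score(answers[neg_id]))
        if gap > tolerance:
            return False
    return True

def randomized_order(question_ids, seed):
    """Return a per-participant shuffled copy of the question ids."""
    order = list(question_ids)
    random.Random(seed).shuffle(order)
    return order

# Hypothetical items: u1 = "easy to understand", u2 = "difficult to understand".
pairs = [("u1", "u2")]
careful = {"u1": 6, "u2": 2}    # agrees with itself after reverse scoring
careless = {"u1": 7, "u2": 7}   # contradicts itself
keep_careful = is_self_consistent(careful, pairs)
keep_careless = is_self_consistent(careless, pairs)
```

A participant who strongly agrees with both a statement and its opposite fails the check and would be excluded before analysis.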
Structural Equation Modeling (SEM)
• Based on Sewall Wright’s path analysis (1921)
– A factor model is parameterized by factor loadings,
covariances, & residual error terms
• Graphical representation: path diagram
– Observed variables in boxes
– Latent variables in ovals
– Directed edges denote
causal relationships
– Residual error terms
implicitly assumed
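In common textbook notation (the symbols below are a standard convention, not taken from the slides), the three parameter groups named above can be written out. Each vector of observed responses x is modeled as factor loadings Λ applied to latent factors ξ, plus residual errors δ. Assuming the residuals are uncorrelated with the factors, the model-implied covariance of the responses follows:

```latex
x = \Lambda \xi + \delta, \qquad
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta,
\quad \text{where } \Phi = \operatorname{Cov}(\xi), \;
\Theta = \operatorname{Cov}(\delta)
```

Here Φ holds the factor covariances and Θ the residual error (co)variances, usually taken to be diagonal. Fitting the model means choosing Λ, Φ, and Θ so that Σ is close to the covariance matrix observed in the survey data.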
Exploratory Factor Analysis (EFA) – 1 of 2
• Is the sample large enough for EFA?
– Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy
– Bartlett’s Test of Sphericity
• Principal Axis Factoring (PAF) to find eigenvalues
– Assume some large, constant # of latent factors
– Assume each factor has a connecting edge to each question
– Estimate factor model parameters by least squares (or maximum likelihood, ML)
• Promax (oblique) rotation, which allows factors to correlate
• Prune factors via Parallel Analysis
– Create random data with same # factors & questions
– Create correlation matrix and find eigenvalues
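The two sample-adequacy checks named at the top of this slide can be computed directly from the correlation matrix of the responses. The sketch below uses the usual textbook formulas with NumPy; the toy matrix `R` and the sample size of 100 are made up for illustration.

```python
import numpy as np

def bartlett_sphericity(corr, n_obs):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)  # log of the determinant of corr
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

def kmo(corr):
    """Return the overall Kaiser-Meyer-Olkin measure for a correlation matrix."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # partial correlations (off-diagonal entries)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)

# Toy correlation matrix: three questions, every pair correlated at 0.5.
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
stat, dof = bartlett_sphericity(R, n_obs=100)
overall_kmo = kmo(R)
```

Bartlett's statistic is compared against a chi-square distribution with `dof` degrees of freedom; a significant result suggests the questions are correlated enough to factor. KMO ranges from 0 to 1, and values closer to 1 indicate data better suited to factor analysis.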
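Parallel analysis, as outlined on the EFA slides, can be sketched as follows. This is a simplified version under stated assumptions: it takes eigenvalues of the full correlation matrix rather than the reduced matrix used in principal axis factoring, it generates random normal data with the same number of participants and questions as the real data, and it keeps factors whose eigenvalues exceed the 95th percentile of the random ones. The synthetic `responses` are made up for illustration.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, percentile=95, seed=0):
    """Count factors whose eigenvalues exceed those of random data.

    data: (participants x questions) array of responses.
    """
    rng = np.random.default_rng(seed)
    n_obs, n_items = data.shape
    corr = np.corrcoef(data, rowvar=False)
    observed = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending

    simulated = np.empty((n_sims, n_items))
    for i in range(n_sims):
        noise = rng.standard_normal((n_obs, n_items))
        eig = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))
        simulated[i] = np.sort(eig)[::-1]
    threshold = np.percentile(simulated, percentile, axis=0)

    # Retain leading factors only while they beat the random benchmark.
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, threshold

# Synthetic survey: 300 participants, 6 questions driven by 2 latent factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
loadings = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                     [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
responses = factors @ loadings.T + 0.6 * rng.standard_normal((300, 6))
k, observed, threshold = parallel_analysis(responses)
```

Plotting `observed` and `threshold` against factor index gives the scree plot with its random-data baseline; `k` is the number of factors to keep when re-running EFA.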
Exploratory Factor Analysis (EFA) – 2 of 2
• Perform Parallel Analysis
– Create random data w/ same # of factors & questions
– Create correlation matrix and find eigenvalues
• Create Scree Plot of Eigenvalues
• Re-run EFA for reduced factors
• Compute Pearson correlations
• Discard questions with:
– Weak factor loading
– Strong cross-factor loading
– Lack of logical interpretation
• Kenny’s Rule: need >= 2 questions per factor for EFA
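The first two discard criteria and the two-questions-per-factor rule can be applied mechanically to a loading matrix, as in the sketch below. The cut-offs (0.5 for a weak primary loading, 0.3 for a strong cross-loading) and the example loadings are assumptions chosen for illustration; the slides do not state thresholds. The third criterion, lack of logical interpretation, needs human judgment and is not modeled.

```python
def prune_questions(loadings, min_primary=0.5, max_cross=0.3):
    """Split questions into kept and discarded based on factor loadings.

    loadings: dict of question id -> list of loadings, one per factor.
    Thresholds are assumed cut-offs, not values from the slides.
    """
    kept, discarded = {}, {}
    for qid, row in loadings.items():
        magnitudes = sorted((abs(v) for v in row), reverse=True)
        primary = magnitudes[0]
        cross = magnitudes[1] if len(magnitudes) > 1 else 0.0
        if primary < min_primary:
            discarded[qid] = "weak factor loading"
        elif cross > max_cross:
            discarded[qid] = "strong cross-factor loading"
        else:
            # Record the index of the factor this question loads on.
            kept[qid] = max(range(len(row)), key=lambda j: abs(row[j]))
    return kept, discarded

def factors_with_too_few_questions(kept, n_factors, minimum=2):
    """List factors left with fewer than `minimum` questions after pruning."""
    counts = [0] * n_factors
    for factor_index in kept.values():
        counts[factor_index] += 1
    return [j for j, c in enumerate(counts) if c < minimum]

# Hypothetical loadings for five questions on two factors.
example = {
    "q1": [0.82, 0.05],
    "q2": [0.74, 0.12],
    "q3": [0.31, 0.22],   # weak on both factors
    "q4": [0.55, 0.48],   # loads on both factors
    "q5": [0.08, 0.79],
}
kept, discarded = prune_questions(example)
short = factors_with_too_few_questions(kept, n_factors=2)
```

In this toy case the second factor is left with a single question, which is why over-generating questions per dimension up front matters.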
Question-Factor Loadings (Weights)
CFA: Assess and Compare Models
• First-order baseline model uses a single
latent factor to explain observed data
• Posited hierarchical factor model
uses 5 relevance dimensions
Confirmatory Factor Analysis (CFA)
• Null model assumes observations are independent
– Covariances between questions fixed at 0; all means and
variances left free
• Comparison stats
– Non-Normed Fit Index (NNFI)
– Comparative Fit Index (CFI)
– Root Mean Square Error of Approximation (RMSEA)
– Standardized Root Mean Square Residual (SRMR)
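Three of the comparison statistics above are simple functions of the chi-square values of the fitted model and the null model. The sketch below uses the commonly published formulas; the chi-square values, degrees of freedom, and sample size are made up for illustration, and some software divides by N rather than N - 1 in RMSEA. SRMR is omitted because it needs the full residual correlation matrix.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n_obs):
    """Compute NNFI, CFI, and RMSEA from model and null-model chi-squares."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    nnfi = (ratio_null - ratio_model) / (ratio_null - 1)

    # Misfit beyond what the degrees of freedom alone would explain.
    excess_model = max(chi2_model - df_model, 0.0)
    excess_null = max(chi2_null - df_null, 0.0)
    denom = max(excess_model, excess_null)
    cfi = 1.0 if denom == 0 else 1 - excess_model / denom

    rmsea = math.sqrt(excess_model / (df_model * (n_obs - 1)))
    return {"NNFI": nnfi, "CFI": cfi, "RMSEA": rmsea}

# Made-up chi-square values, for illustration only.
indices = fit_indices(chi2_model=120.0, df_model=80,
                      chi2_null=1500.0, df_null=105, n_obs=301)
```

Higher NNFI and CFI and lower RMSEA indicate better fit, so computing these for the single-factor baseline and the five-dimension model gives a direct comparison.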
Our model of multi-dimensional relevance
Future Directions
• More data-driven positivist research into factors
– Different user groups, search scenarios, devices, etc.
– Need more data to support normative claims
• Train/test operational systems for varying factors
– Identify/extend detected features for each dimension
– Personalize search results for individual preferences
• Improve judging agreement by making task more
natural and/or assessing impact of latent factors?
• Intra-subject vs. inter-subject aggregation?
– Other methods for ensuring subjective data quality?
Thank You!
ir.ischool.utexas.edu
Slides: www.slideshare.net/mattlease

Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SIGIR 2014 Presentation
