SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
How do Gain and Discount Functions Affect the
Correlation between DCG and User Satisfaction?
Julián Urbano Mónica Marrero
ECIR 2015
Vienna, March 30th
Discount d(i ; k) Gain g(r)
Zipfian: 1/݅ Linear: ‫ݎ‬
Linear: ሺ݇ ൅ ݅ െ 1ሻ/݇ Exp(2): 2௥
െ 1
Constant: 1 Exp(3): 3௥
െ 1
Log(2): 1/ logଶ ݅ ൅ 1 Exp(5): 5௥
െ 1
Log(3): 1/ logଷ ݅ ൅ 2 Bin(1): Iሾ‫ݎ‬ ൒ 1ሿ
Log(5): 1/ logହሺ݅ ൅ 4ሻ Bin(2): Iሾ‫ݎ‬ ൒ 2ሿ
Discount functions
Rank i
Discountd(i)
0.00.20.40.60.81.0
1 2 3 4 5
Zipfian
Linear
Constant
Log(2)
Log(3)
Log(5)
Gain functions
Relevance r
Gaing(r)
0510152025
0 1 2
Linear
Exp(2)
Exp(3)
Exp(5)
Bin(1)Bin(2)
Documents
Information
Need
Real World Cranfield
IR System
Topic
Relevance
Judgments
IR System
Documents
GAP
DCG
ERR
Static
Component
Dynamic
Component
Test
Collection
Effectiveness
Measures
Time to complete task, Idle time,
Success rate, Frustration, Satisfaction,
Ease of use, Ease of learning…
Precision, Average Precision, Reciprocal Rank,
Q-measure, Discounted Cumulative Gain,
Rank-Biased Precision, Time-Biased Gain…
Live Observation
What Gain and Discount for DCG are
better to predict user satisfaction?
• First, let’s normalize DCG scores (this is not nDCG!)
• One system with DCG=φ. What does it mean?
• Intuition: φ·100% of users will be satisfied
• P(Sat|DCG= φ)= φ
• Two systems with ΔDCG=Δφ. What does it mean?
• Intuition: users will prefer the (supposedly) better one
• P(Pref|ΔDCG=Δφ)=1
P(Sat) and P(Pref) depend on the systems,
not on how we evaluate them. Yet, there are
many different ways to compute effectiveness
Experiment
• Collect user preferences between two systems
• Map DCG onto P(Sat)
• Map ΔDCG onto P(Pref)
• Music recommendation task
• Ad-hoc, informational, enjoyable by assessors
• Preferences less confounded by interface effects
• All data from MIREX (TREC-like for Music IR)
• Datasets from 2007–2012
• 3-point relevance scale: {0, 1, 2}
• 4115 examples
• Uniformly covering the [0,1] range of ΔDCG
• 432 unique queries
• 5636 unique documents
• Crowdsourced with Crowdflower
• Trap examples for quality control
• 113 subjects
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Zipfian discount
Difference in DCG
Probabilitythatusersagree
Gains
Linear
Exp(2)
Exp(3)
Exp(5)
Bin(1)
Bin(2)
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Linear discount
Difference in DCG
Probabilitythatusersagree
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Log(2) discount
Difference in DCG
Probabilitythatusersagree
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Log(3) discount
Difference in DCG
Probabilitythatusersagree
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Log(5) discount
Difference in DCG
Probabilitythatusersagree
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Constant discount
Difference in DCG
Probabilitythatusersagree
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Zipfian discount
DCG
Probabilityofusersatisfaction
Gains
Linear
Exp(2)
Exp(3)
Exp(5)
Bin(1)
Bin(2)
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Linear discount
DCG
Probabilityofusersatisfaction
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Log(2) discount
DCG
Probabilityofusersatisfaction
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Log(3) discount
DCG
Probabilityofusersatisfaction
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Log(5) discount
DCG
Probabilityofusersatisfaction
0.0 0.2 0.4 0.6 0.8 1.0
0.00.20.40.60.81.0
Constant discount
DCG
Probabilityofusersatisfaction
Results (1 system): DCG predicting user satisfaction Results (2 systems): ΔDCG predicting user preference
Results: bias of Gain and Discount functions
• Diagonal: how far is P(Sat|DCG) from the ideal diagonal?
• Intuitiveness of DCG scores
• Endpoint: how far is P(Sat|DCG) from the ideal 0% and 100%?
• User disagreement and goodness of the DCG user model
• Top: how far is P(Pref|ΔDCG) from the ideal 100%?
• Discriminative power
Summary and Implications
• New method to map system effectiveness onto user satisfaction
• Sample application to DCG for a music recommendation task
• Gain functions that emphasize highly relevant documents underestimate
user satisfaction. Linear gain is better than exponential
• All discount functions bias the prediction of user satisfaction
• This task might be too enjoyable to observe discount effect
• Size (of the DCG difference) does matter
• Non-parametric statistics (eg. Sign test, Wilcoxon test) and just looking at
the ranking of systems (eg. Kendall τ) oversimplify the evaluation problem
• Zero-point null hypothesis testing (ie. H0 : ΔDCG=0) is not reasonable
• Future work will investigate this method for Text IR
• Provide a common framework, based on P(Sat) and P(Pref), to evaluate
with informational and navigational queries using appropriate measures
Data and code
available online
‫݇@ܩܥܦ‬ ൌ
∑ ݃ ‫ݎ‬௜ ⋅ ݀ ݅	; ݇௞
௜ୀଵ
∑ ݃ ‫ݎ‬௠௔௫ ⋅ ݀ ݅	; ݇௞
௜ୀଵ
0.060.100.14
Diagonal bias
Bias
Bin(2)
Bin(1)
Exp(5)
Exp(3)
Exp(2)
Linear
0.060.100.14
Zipfian
Linear
Log(2)
Log(3)
Log(5)
Constant
0.060.100.14
Gain Discount
0.140.180.22
Endpoint bias
Bias
Bin(2)
Bin(1)
Exp(5)
Exp(3)
Exp(2)
Linear
0.140.180.22
Zipfian
Linear
Log(2)
Log(3)
Log(5)
Constant
0.140.180.22
Gain Discount
0.460.500.54
Top bias
Bias
Bin(2)
Bin(1)
Exp(5)
Exp(3)
Exp(2)
Linear
0.460.500.54
Zipfian
Linear
Log(2)
Log(3)
Log(5)
Constant
0.460.500.54
Gain Discount

Weitere ähnliche Inhalte

Andere mochten auch

A Comparison of the Optimality of Statistical Significance Tests for Informat...
A Comparison of the Optimality of Statistical Significance Tests for Informat...A Comparison of the Optimality of Statistical Significance Tests for Informat...
A Comparison of the Optimality of Statistical Significance Tests for Informat...Julián Urbano
 
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...Julián Urbano
 
HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)
HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)
HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)Suneal Saini
 
Hplc presentation final
Hplc presentation    finalHplc presentation    final
Hplc presentation finalOvesh Gaikwad
 
Hplc presentation for class
Hplc presentation for classHplc presentation for class
Hplc presentation for classDr. Ravi Sankar
 
HPLC Principle,Instrumentation and Application
HPLC Principle,Instrumentation and ApplicationHPLC Principle,Instrumentation and Application
HPLC Principle,Instrumentation and ApplicationAlakesh Pradhan
 
Water Pollution2 By Meenaxi & Shradha
Water Pollution2  By Meenaxi & ShradhaWater Pollution2  By Meenaxi & Shradha
Water Pollution2 By Meenaxi & Shradhasubzero64
 
Principles and application of chromatography
Principles and application of chromatographyPrinciples and application of chromatography
Principles and application of chromatographysuniu
 
HPLC - High Performance Liquid Chromatography
HPLC - High Performance Liquid ChromatographyHPLC - High Performance Liquid Chromatography
HPLC - High Performance Liquid ChromatographyDivya Basuti
 
Lead Generation on SlideShare: A How-to Guide
Lead Generation on SlideShare: A How-to GuideLead Generation on SlideShare: A How-to Guide
Lead Generation on SlideShare: A How-to GuideSlideShare
 

Andere mochten auch (13)

A Comparison of the Optimality of Statistical Significance Tests for Informat...
A Comparison of the Optimality of Statistical Significance Tests for Informat...A Comparison of the Optimality of Statistical Significance Tests for Informat...
A Comparison of the Optimality of Statistical Significance Tests for Informat...
 
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
MIREX 2010 Symbolic Melodic Similarity: Local Alignment with Geometric Repres...
 
HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)
HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)
HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (HPLC)
 
Hplc presentation final
Hplc presentation    finalHplc presentation    final
Hplc presentation final
 
HPLC
HPLCHPLC
HPLC
 
Hplc presentation for class
Hplc presentation for classHplc presentation for class
Hplc presentation for class
 
HPLC Principle,Instrumentation and Application
HPLC Principle,Instrumentation and ApplicationHPLC Principle,Instrumentation and Application
HPLC Principle,Instrumentation and Application
 
Water Pollution2 By Meenaxi & Shradha
Water Pollution2  By Meenaxi & ShradhaWater Pollution2  By Meenaxi & Shradha
Water Pollution2 By Meenaxi & Shradha
 
Hplc
HplcHplc
Hplc
 
Principles and application of chromatography
Principles and application of chromatographyPrinciples and application of chromatography
Principles and application of chromatography
 
HPLC - High Performance Liquid Chromatography
HPLC - High Performance Liquid ChromatographyHPLC - High Performance Liquid Chromatography
HPLC - High Performance Liquid Chromatography
 
Environmental pollution
Environmental pollutionEnvironmental pollution
Environmental pollution
 
Lead Generation on SlideShare: A How-to Guide
Lead Generation on SlideShare: A How-to GuideLead Generation on SlideShare: A How-to Guide
Lead Generation on SlideShare: A How-to Guide
 

Ähnlich wie How Do Gain and Discount Functions Affect the Correlation between DCG and User Satisfaction?

Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
Introduction to Genetic Algorithms
Introduction to Genetic AlgorithmsIntroduction to Genetic Algorithms
Introduction to Genetic AlgorithmsDr. C.V. Suresh Babu
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Zbigniew Jerzak
 
Synthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingSynthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingAdam Doyle
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceHansol Kang
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics EnvironmentIan Foster
 
【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning FrameworksTakeo Imai
 
DSDT Meetup October 2017
DSDT Meetup October 2017DSDT Meetup October 2017
DSDT Meetup October 2017DSDT_MTL
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Amplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqDAmplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqDCPqD
 
Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...Sara Granados Cabeza
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
 

Ähnlich wie How Do Gain and Discount Functions Affect the Correlation between DCG and User Satisfaction? (20)

Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Introduction to Genetic Algorithms
Introduction to Genetic AlgorithmsIntroduction to Genetic Algorithms
Introduction to Genetic Algorithms
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
Synthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-makingSynthesis of analytical methods data driven decision-making
Synthesis of analytical methods data driven decision-making
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
Relational algebra
Relational algebraRelational algebra
Relational algebra
 
【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks【論文紹介】Relay: A New IR for Machine Learning Frameworks
【論文紹介】Relay: A New IR for Machine Learning Frameworks
 
DSDT Meetup October 2017
DSDT Meetup October 2017DSDT Meetup October 2017
DSDT Meetup October 2017
 
Dsdt meetup oct24
Dsdt meetup oct24Dsdt meetup oct24
Dsdt meetup oct24
 
Dsdt meetup oct24
Dsdt meetup oct24Dsdt meetup oct24
Dsdt meetup oct24
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Software metrics
Software metricsSoftware metrics
Software metrics
 
TINET_FRnOG_2008_public
TINET_FRnOG_2008_publicTINET_FRnOG_2008_public
TINET_FRnOG_2008_public
 
ga-2.ppt
ga-2.pptga-2.ppt
ga-2.ppt
 
Amplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqDAmplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqD
 
Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...Efficient architecture to condensate visual information driven by attention ...
Efficient architecture to condensate visual information driven by attention ...
 
Advanced Analytics in Banking, CITI
Advanced Analytics in Banking, CITIAdvanced Analytics in Banking, CITI
Advanced Analytics in Banking, CITI
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 

Mehr von Julián Urbano

Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...Julián Urbano
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowJulián Urbano
 
The Treatment of Ties in AP Correlation
The Treatment of Ties in AP CorrelationThe Treatment of Ties in AP Correlation
The Treatment of Ties in AP CorrelationJulián Urbano
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationJulián Urbano
 
Crawling the Web for Structured Documents
Crawling the Web for Structured DocumentsCrawling the Web for Structured Documents
Crawling the Web for Structured DocumentsJulián Urbano
 
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
The University Carlos III of Madrid at TREC 2011 Crowdsourcing TrackThe University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
The University Carlos III of Madrid at TREC 2011 Crowdsourcing TrackJulián Urbano
 
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...Julián Urbano
 
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...Julián Urbano
 
Symbolic Melodic Similarity (through Shape Similarity)
Symbolic Melodic Similarity (through Shape Similarity)Symbolic Melodic Similarity (through Shape Similarity)
Symbolic Melodic Similarity (through Shape Similarity)Julián Urbano
 
Evaluation in Audio Music Similarity
Evaluation in Audio Music SimilarityEvaluation in Audio Music Similarity
Evaluation in Audio Music SimilarityJulián Urbano
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalJulián Urbano
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityJulián Urbano
 
How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...Julián Urbano
 
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...Julián Urbano
 
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...Julián Urbano
 
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...Julián Urbano
 
Audio Music Similarity and Retrieval: Evaluation Power and Stability
Audio Music Similarity and Retrieval: Evaluation Power and StabilityAudio Music Similarity and Retrieval: Evaluation Power and Stability
Audio Music Similarity and Retrieval: Evaluation Power and StabilityJulián Urbano
 
Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...
Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...
Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...Julián Urbano
 
Improving the Generation of Ground Truths based on Partially Ordered Lists
Improving the Generation of Ground Truths based on Partially Ordered ListsImproving the Generation of Ground Truths based on Partially Ordered Lists
Improving the Generation of Ground Truths based on Partially Ordered ListsJulián Urbano
 

Mehr von Julián Urbano (20)

Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...Statistical Significance Testing in Information Retrieval: An Empirical Analy...
Statistical Significance Testing in Information Retrieval: An Empirical Analy...
 
Your PhD and You
Your PhD and YouYour PhD and You
Your PhD and You
 
Statistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and HowStatistical Analysis of Results in Music Information Retrieval: Why and How
Statistical Analysis of Results in Music Information Retrieval: Why and How
 
The Treatment of Ties in AP Correlation
The Treatment of Ties in AP CorrelationThe Treatment of Ties in AP Correlation
The Treatment of Ties in AP Correlation
 
A Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR EvaluationA Plan for Sustainable MIR Evaluation
A Plan for Sustainable MIR Evaluation
 
Crawling the Web for Structured Documents
Crawling the Web for Structured DocumentsCrawling the Web for Structured Documents
Crawling the Web for Structured Documents
 
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
The University Carlos III of Madrid at TREC 2011 Crowdsourcing TrackThe University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
 
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
What is the Effect of Audio Quality on the Robustness of MFCCs and Chroma Fea...
 
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
Evaluation in (Music) Information Retrieval through the Audio Music Similarit...
 
Symbolic Melodic Similarity (through Shape Similarity)
Symbolic Melodic Similarity (through Shape Similarity)Symbolic Melodic Similarity (through Shape Similarity)
Symbolic Melodic Similarity (through Shape Similarity)
 
Evaluation in Audio Music Similarity
Evaluation in Audio Music SimilarityEvaluation in Audio Music Similarity
Evaluation in Audio Music Similarity
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...How Significant is Statistically Significant? The case of Audio Music Similar...
How Significant is Statistically Significant? The case of Audio Music Similar...
 
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
Towards Minimal Test Collections for Evaluation of Audio Music Similarity and...
 
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track: Noteboo...
 
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Mu...
 
Audio Music Similarity and Retrieval: Evaluation Power and Stability
Audio Music Similarity and Retrieval: Evaluation Power and StabilityAudio Music Similarity and Retrieval: Evaluation Power and Stability
Audio Music Similarity and Retrieval: Evaluation Power and Stability
 
Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...
Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...
Bringing Undergraduate Students Closer to a Real-World Information Retrieval ...
 
Improving the Generation of Ground Truths based on Partially Ordered Lists
Improving the Generation of Ground Truths based on Partially Ordered ListsImproving the Generation of Ground Truths based on Partially Ordered Lists
Improving the Generation of Ground Truths based on Partially Ordered Lists
 

Kürzlich hochgeladen

G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 

Kürzlich hochgeladen (20)

G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 

How Do Gain and Discount Functions Affect the Correlation between DCG and User Satisfaction?

  • 1. How do Gain and Discount Functions Affect the Correlation between DCG and User Satisfaction? Julián Urbano Mónica Marrero ECIR 2015 Vienna, March 30th Discount d(i ; k) Gain g(r) Zipfian: 1/݅ Linear: ‫ݎ‬ Linear: ሺ݇ ൅ ݅ െ 1ሻ/݇ Exp(2): 2௥ െ 1 Constant: 1 Exp(3): 3௥ െ 1 Log(2): 1/ logଶ ݅ ൅ 1 Exp(5): 5௥ െ 1 Log(3): 1/ logଷ ݅ ൅ 2 Bin(1): Iሾ‫ݎ‬ ൒ 1ሿ Log(5): 1/ logହሺ݅ ൅ 4ሻ Bin(2): Iሾ‫ݎ‬ ൒ 2ሿ Discount functions Rank i Discountd(i) 0.00.20.40.60.81.0 1 2 3 4 5 Zipfian Linear Constant Log(2) Log(3) Log(5) Gain functions Relevance r Gaing(r) 0510152025 0 1 2 Linear Exp(2) Exp(3) Exp(5) Bin(1)Bin(2) Documents Information Need Real World Cranfield IR System Topic Relevance Judgments IR System Documents GAP DCG ERR Static Component Dynamic Component Test Collection Effectiveness Measures Time to complete task, Idle time, Success rate, Frustration, Satisfaction, Ease of use, Ease of learning… Precision, Average Precision, Reciprocal Rank, Q-measure, Discounted Cumulative Gain, Rank-Biased Precision, Time-Biased Gain… Live Observation What Gain and Discount for DCG are better to predict user satisfaction? • First, let’s normalize DCG scores (this is not nDCG!) • One system with DCG=φ. What does it mean? • Intuition: φ·100% of users will be satisfied • P(Sat|DCG= φ)= φ • Two systems with ΔDCG=Δφ. What does it mean? • Intuition: users will prefer the (supposedly) better one • P(Pref|ΔDCG=Δφ)=1 P(Sat) and P(Pref) depend on the systems, not on how we evaluate them. Yet, there are many different ways to compute effectiveness Experiment • Collect user preferences between two systems • Map DCG onto P(Sat) • Map ΔDCG onto P(Pref) • Music recommendation task • Ad-hoc, informational, enjoyable by assessors • Preferences less confounded by interface effects • All data from MIREX (TREC-like for Music IR) • Datasets from 2007–2012 • 3-point relevance scale: {0, 1, 2} • 4115 examples • Uniformly covering the [0,1] range of ΔDCG • 432 unique queries • 5636 unique documents • Crowdsourced with Crowdflower • Trap examples for quality control • 113 subjects 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Zipfian discount Difference in DCG Probabilitythatusersagree Gains Linear Exp(2) Exp(3) Exp(5) Bin(1) Bin(2) 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Linear discount Difference in DCG Probabilitythatusersagree 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Log(2) discount Difference in DCG Probabilitythatusersagree 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Log(3) discount Difference in DCG Probabilitythatusersagree 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Log(5) discount Difference in DCG Probabilitythatusersagree 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Constant discount Difference in DCG Probabilitythatusersagree 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Zipfian discount DCG Probabilityofusersatisfaction Gains Linear Exp(2) Exp(3) Exp(5) Bin(1) Bin(2) 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Linear discount DCG Probabilityofusersatisfaction 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Log(2) discount DCG Probabilityofusersatisfaction 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Log(3) discount DCG Probabilityofusersatisfaction 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Log(5) discount DCG Probabilityofusersatisfaction 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 Constant discount DCG Probabilityofusersatisfaction Results (1 system): DCG predicting user satisfaction Results (2 systems): ΔDCG predicting user preference Results: bias of Gain and Discount functions • Diagonal: how far is P(Sat|DCG) from the ideal diagonal? • Intuitiveness of DCG scores • Endpoint: how far is P(Sat|DCG) from the ideal 0% and 100%? • User disagreement and goodness of the DCG user model • Top: how far is P(Pref|ΔDCG) from the ideal 100%? • Discriminative power Summary and Implications • New method to map system effectiveness onto user satisfaction • Sample application to DCG for a music recommendation task • Gain functions that emphasize highly relevant documents underestimate user satisfaction. Linear gain is better than exponential • All discount functions bias the prediction of user satisfaction • This task might be too enjoyable to observe discount effect • Size (of the DCG difference) does matter • Non-parametric statistics (eg. Sign test, Wilcoxon test) and just looking at the ranking of systems (eg. Kendall τ) oversimplify the evaluation problem • Zero-point null hypothesis testing (ie. H0 : ΔDCG=0) is not reasonable • Future work will investigate this method for Text IR • Provide a common framework, based on P(Sat) and P(Pref), to evaluate with informational and navigational queries using appropriate measures Data and code available online ‫݇@ܩܥܦ‬ ൌ ∑ ݃ ‫ݎ‬௜ ⋅ ݀ ݅ ; ݇௞ ௜ୀଵ ∑ ݃ ‫ݎ‬௠௔௫ ⋅ ݀ ݅ ; ݇௞ ௜ୀଵ 0.060.100.14 Diagonal bias Bias Bin(2) Bin(1) Exp(5) Exp(3) Exp(2) Linear 0.060.100.14 Zipfian Linear Log(2) Log(3) Log(5) Constant 0.060.100.14 Gain Discount 0.140.180.22 Endpoint bias Bias Bin(2) Bin(1) Exp(5) Exp(3) Exp(2) Linear 0.140.180.22 Zipfian Linear Log(2) Log(3) Log(5) Constant 0.140.180.22 Gain Discount 0.460.500.54 Top bias Bias Bin(2) Bin(1) Exp(5) Exp(3) Exp(2) Linear 0.460.500.54 Zipfian Linear Log(2) Log(3) Log(5) Constant 0.460.500.54 Gain Discount