Graded Relevance Assessments
and Graded Relevance
Measures of NTCIR
Tetsuya Sakai
Waseda University
tetsuyasakai@acm.org
10th June, 2019@EVIA2019/NTCIR-14, Tokyo.
http://sakailab.com/ntcirbookdraft/
TALK OUTLINE
1. NTCIR and me
2. Survey of NTCIR overviews (1999-2019)
3. Q-measures etc.
4. D-measures etc.
5. Beyond graded relevance
6. Summary
NTCIR-1, -2, -3 (1999-2003)
• Sakai, T., Shibazaki, Y., Suzuki, M., Kajiura, M.,
Manabe, T. and Sumita, K.: Cross-Language
Information Retrieval for NTCIR at Toshiba,
Proceedings of NTCIR-1, 1999.
• Sakai, T., Robertson, S.E. and Walker, S.: Flexible
Pseudo-Relevance Feedback for NTCIR-2,
Proceedings of NTCIR-2, 2001.
• Sakai, T., Koyama, M., Suzuki, M. and Manabe, T.:
Toshiba KIDS at NTCIR-3: Japanese and English-
Japanese IR, Proceedings of NTCIR-3, 2003.
1 paper per NTCIR
NTCIR-4 (2004)
• Sakai, T., Koyama, M., Kumano, A. and Manabe, T.:
Toshiba BRIDJE at NTCIR-4 CLIR:
Monolingual/Bilingual IR and Flexible Feedback,
Proceedings of NTCIR-4, 2004.
• Sakai, T., Saito, Y., Ichimura, Y., Koyama, M. and
Kokubu, T.: Toshiba ASKMi at NTCIR-4 QAC2,
Proceedings of NTCIR-4, 2004.
• Sakai, T.: New Performance Metrics based on
Multigrade Relevance: Their Application to
Question Answering, Proceedings of NTCIR-4
(Open Submission Session), 2004.
Q-measure
This later evolved into EVIA
3 papers
NTCIR-5 (2005)
• Kokubu, T., Sakai, T., Saito, Y., Tsutsui, H., Manabe, T.,
Koyama, M. and Fujii, H.: The Relationship between
Answer Ranking and User Satisfaction in a Question
Answering System, Proceedings of NTCIR-5 (Open
Submission Session), 2005.
• Sakai, T.: The Effect of Topic Sampling on Sensitivity
Comparisons of Information Retrieval Metrics,
Proceedings of NTCIR-5 (Open Submission Session),
2005.
• Sakai, T., Manabe, T., Kumano, A., Koyama, M. and
Kokubu, T.: Toshiba BRIDJE at NTCIR-5: Evaluation using
Geometric Means, Proceedings of NTCIR-5, 2005.
3 papers
NTCIR-6 (2007)
• Sakai, T.: On Penalising Late Arrival of Relevant
Documents in Information Retrieval Evaluation with
Graded Relevance, Proceedings of EVIA 2007.
• Sakai, T.: User Satisfaction Task: A Proposal for
NTCIR-7, Proceedings of EVIA 2007.
• Sakai, T., Koyama, M., Izuha, T., Kumano, A.,
Manabe, T. and Kokubu, T.: Toshiba BRIDJE at
NTCIR-6 CLIR: The Head/Lead Method and Graded
Relevance Feedback, Proceedings of NTCIR-6, 2007.
3 papers
NTCIR-7 (2008)
• Sakai, T. and Robertson, S.: Modelling A User Population for
Designing Information Retrieval Metrics, Proceedings of
EVIA 2008.
• Sakai, T. and Kando, N.: Are Popular Documents More Likely
To Be Relevant? A Dive into the ACLIA IR4QA Pools,
Proceedings of EVIA 2008.
• Mitamura, T., Nyberg, E., Shima, H., Kato, T., Mori, T., Lin, C.-
Y., Song, R., Lin, C.-J., Sakai, T., Ji, D. and Kando, N.: Overview
of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual
Information Access, Proceedings of NTCIR-7, 2008.
• Sakai, T., Kando, N., Lin, C.-J., Mitamura, T., Shima, H., Ji, D.,
Chen, K.-H., and Nyberg, E.: Overview of the NTCIR-7 ACLIA
IR4QA Task, Proceedings of NTCIR-7, 2008.
NCU
Debut as a task
organiser
4 papers
NTCIR-8 (2010)
• Song, R., Qi, D., Liu, H., Sakai, T., Nie, J.-Y., Hon, H.-W. and Yu, Y.: Constructing a Test Collection
with Multi-Intent Queries, Proceedings of EVIA 2010.
• Sakai, T., Craswell, N., Song, R., Robertson, S., Dou, Z. and Lin, C.-Y.: Simple Evaluation Metrics for
Diversified Search Results, Proceedings of EVIA 2010.
• Sakai, T. and Lin, C.-Y.: Ranking Retrieval Systems without Relevance Assessments - Revisited,
Proceedings of EVIA 2010.
• Mitamura, T., Shima, H., Sakai, T., Kando, N., Mori, T., Takeda, K., Lin, C.-Y., Song, R., Lin, C.-J. and
Lee, C.-W.: Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access,
Proceedings of NTCIR-8, 2010.
• Sakai, T., Shima, H., Kando, N., Song, R., Lin, C.-J., Mitamura, T., Sugimoto, M. and Lee, C.-W.:
Overview of NTCIR-8 ACLIA IR4QA, Proceedings of NTCIR-8, 2010.
• Gey, F., Larson, R., Kando, N., Machado, J. and Sakai, T.: NTCIR-GeoTime Overview: Evaluating
Geographic and Temporal Search, Proceedings of NTCIR-8, 2010.
• Ishikawa, D., Sakai, T. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part I):
The Test Collection and the Task, Proceedings of NTCIR-8, 2010.
• Sakai, T., Ishikawa, D. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part II):
System Evaluation, Proceedings of NTCIR-8, 2010.
• Song, Y.-I., Liu, J., Sakai, T., Wang, X.-J., Feng, G., Cao, Y., Suzuki, H. and Lin, C.-Y.: Microsoft
Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task, Proceedings of NTCIR-8,
2010.
D-measures
9 papers
NTCIR-9 (2011)
• Ishikawa, D., Kando, N. and Sakai, T.: What Makes a Good Answer in Community
Question Answering? An Analysis of Assessors' Criteria, Proceedings of EVIA
2011.
• Song, R., Zhang, M., Sakai, T., Kato, M.P., Liu, Y., Sugimoto, M., Wang, Q. and Orii,
N.: Overview of the NTCIR-9 INTENT Task, Proceedings of NTCIR-9, 2011.
• Sakai, T., Kato, M.P. and Song, Y.-I.: Overview of NTCIR-9 1CLICK, Proceedings of
NTCIR-9, 2011.
• Orii, N., Song, Y.-I. and Sakai, T.: Microsoft Research Asia at the NTCIR-9 1CLICK
Task, Proceedings of NTCIR-9, 2011.
• Han, J., Wang, Q., Orii, N., Dou, Z., Sakai, T. and Song, R.: Microsoft Research
Asia at the NTCIR-9 Intent Task, Proceedings of NTCIR-9, 2011.
• Morita, H., Makino, T., Sakai, T., Takamura, H. and Okumura, M.: TTOKU
Summarization Based Systems at NTCIR-9 1CLICK Task, Proceedings of NTCIR-9,
2011.
• Joho, H. and Sakai, T.: Grid-based Interaction for NTCIR-9 VisEx Task, Proceedings
of NTCIR-9, 2011.
7 papers
NTCIR-10 (2013)
• Sakai, T.: The Unreusability of Diversified Search Test
Collections, Proceedings of EVIA 2013.
• Sakai, T., Dou, Z., Yamamoto, T., Liu, Y., Zhang, M., Song,
R., Kato, M.P. and Iwata, M.: Overview of the NTCIR-10
INTENT-2 Task, Proceedings of NTCIR-10, 2013.
• Kato, M.P., Ekstrand-Abueg, M., Pavlu, V., Sakai, T.,
Yamamoto, T. and Iwata, M.: Overview of the NTCIR-10
1CLICK-2 Task, Proceedings of NTCIR-10, 2013.
• Tsukuda, K., Dou, Z. and Sakai, T.: Microsoft Research
Asia at the NTCIR-10 Intent Task, Proceedings of NTCIR-
10, 2013.
• Narita, K., Sakai, T., Dou, Z. and Song, Y.-I.: MSRA at
NTCIR-10 1CLICK-2, Proceedings of NTCIR-10, 2013.
5 papers
NTCIR-11 (2014)
• Sakai, T.: Topic Set Size Design with Variance
Estimates from Two-Way ANOVA, Proceedings of
EVIA 2014.
• Kato, M.P., Ekstrand-Abueg, M., Pavlu, V., Sakai, T.,
Yamamoto, T. and Iwata, M.: Overview of the
NTCIR-11 MobileClick Task, Proceedings of NTCIR-
11, 2014.
Joined Waseda in September 2013
2 papers
NTCIR-12 (2016)
• Sakai, T. and Shang, L.: On Estimating Variances for Topic Set Size Design, Proceedings of EVIA
2016.
• Kato, M.P., Pavlu, V., Sakai, T., Yamamoto, T. and Morita, H.: Two-layered Summaries for Mobile
Search: Does the Evaluation Measure Reflect User Preferences?, Proceedings of EVIA 2016.
• Shang, L., Sakai, T., Lu, Z., Li, H., Higashinaka, R. and Miyao, Y.: Overview of the NTCIR-12 Short
Text Conversation Task, Proceedings of NTCIR-12, 2016.
• Kato, M.P., Sakai, T., Yamamoto, T., Pavlu, V., Morita, H. and Fujita, S.: Overview of the NTCIR-12
MobileClick Task, Proceedings of NTCIR-12, 2016.
• Nanba, H., Sakai, T., Kando, N., Keyaki, A., Eguchi, K., Hatano, K., Shimizu, T., Hirate, Y. and Fujii,
A.: NEXTI at NTCIR-12 IMine-2 Task, Proceedings of NTCIR-12, 2016.
• Higuchi, S. and Sakai, T.: SLQAL at the NTCIR-12 QALab-2 Task, Proceedings of NTCIR-12, 2016.
• Denawa, H., Sano, T., Kadotami, Y., Kato, S. and Sakai, T.: SLSTC at the NTCIR-12 STC Task,
Proceedings of NTCIR-12, 2016.
• Iijima, S. and Sakai, T.: SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask,
Proceedings of NTCIR-12, 2016.
My students’
debut at
NTCIR
8 papers
NTCIR-13 (2017)
• Shang, L., Sakai, T., Li, H., Higashinaka, R., Miyao, Y., Arase, Y., and Nomoto, M.: Overview of the NTCIR-13 Short
Text Conversation Task, Proceedings of NTCIR-13, 2017.
• Luo, C., Sakai, T., Liu, Y., Dou, Z., Xiong, C., and Xu, J.: Overview of the NTCIR-13 We Want Web Task,
Proceedings of NTCIR-13, 2017.
• Kashimura, R. and Sakai, T.: SLOLQ at the NTCIR-13 OpenLiveQ Task, Proceedings of NTCIR-13, 2017.
• Sato, K. and Sakai, T.: SLQAL at the NTCIR-13 QA Lab-3 Task, Proceedings of NTCIR-13, 2017.
• Guan, J. and Sakai, T.: SLSTC at the NTCIR-13 STC Task, Proceedings of NTCIR-13, 2017.
• Xiao, P., Li, L., Fan, Y., and Sakai, T.: SLWWW at the NTCIR-13 WWW Task, Proceedings of NTCIR-13, 2017.
• Zeng, Z., Luo, C., Shang, L., Li, H., and Sakai, T.: Test Collections and Measures for Evaluating Customer-
Helpdesk Dialogues, Proceedings of EVIA 2017.
• Sakai, T.: Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths, Proceedings of EVIA
2017.
• Sakai, T.: Towards Automatic Evaluation of Multi-Turn Dialogues: A Task Design that Leverages Inherently
Subjective Annotations, Proceedings of EVIA 2017.
• Sakai, T.: The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and
Students, Proceedings of EVIA 2017.
• Sakai, T.: Unanimity-Aware Gain for Highly Subjective Assessments, Proceedings of EVIA 2017.
11 papers
NTCIR-14 (2019)
• Sakai, T., Ferro, N., Soboroff, I., Zeng, Z., Xiao, P., and Maistro,
M.: Overview of the NTCIR-14 CENTRE Task, Proceedings of
NTCIR-14, 2019.
• Mao, J., Sakai, T., Luo, C., Xiao, P., Liu, Y., and Dou, Z.:
Overview of the NTCIR-14 We Want Web Task, Proceedings
of NTCIR-14, 2019.
• Zeng, Z., Kato, S., and Sakai, T.: Overview of the NTCIR-14
Short Text Conversation Task: Dialogue Quality and Nugget
Detection Subtasks, Proceedings of NTCIR-14, 2019.
• Kato, S., Suzuki, R., Zeng, Z., and Sakai, T.: SLSTC at the
NTCIR-14 STC-3 Dialogue Quality and Nugget Detection
Subtasks, Proceedings of NTCIR-14, 2019.
• Xiao, P. and Sakai, T.: SLWWW at the NTCIR-14 We Want
Web Task, Proceedings of NTCIR-14, 2019.
For the first time, I don’t have a paper at EVIA!
5 papers?
Or so I thought...
• Oard, D.W., Sakai, T., and Kando, N.: Celebrating 20
Years of NTCIR: The Book, Proceedings of EVIA 2019.
2. Survey of NTCIR overviews (1999-2019)
[Harman05] (The TREC book)
“Relevance was defined within the task
of the information analyst, with TREC
assessors instructed to judge a document
relevant if information from that
document would be used in some
manner for the writing of a report on the
subject of the topic. This also implies the
use of binary relevance judgments;”
NTCIR overviews (1999-2019)
survey method
• Examined all overview papers (for tasks that
involved ranked retrieval only)
• Examined how many relevance levels were used
and how they were obtained in each task (ALL
NTCIR retrieval tasks use graded relevance levels!)
• Examined whether graded relevance measures
were used to evaluate the participating systems.
IF you want (a) > (b) > (c), then you
should use graded relevance
measures.
Rank  (a)                 (b)                 (c)
1     Relevant            Partially relevant  Nonrelevant
2     Partially relevant  Partially relevant  Nonrelevant
3     Partially relevant  Relevant            Relevant
4     Nonrelevant         Nonrelevant         Nonrelevant
IF you want (a) > (b) > (c),
“relaxed relevance” doesn’t work.
With partially relevant docs treated as relevant, (a) and (b) are considered equally effective.
IF you want (a) > (b) > (c),
“rigid relevance” doesn’t work.
With partially relevant docs treated as nonrelevant, (b) and (c) are considered equally effective.
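The three slides above can be checked numerically with a small sketch. The gain values (Relevant=2, Partially relevant=1) and the use of DCG with a log2(rank+1) discount are assumptions made here for illustration; the slides do not prescribe a particular measure or gain setting.

```python
import math

# Hypothetical gain values; not taken from the slides.
GAIN = {"Relevant": 2, "Partially relevant": 1, "Nonrelevant": 0}

# The three ranked lists (a), (b), (c) from the slides, top rank first.
LISTS = {
    "a": ["Relevant", "Partially relevant", "Partially relevant", "Nonrelevant"],
    "b": ["Partially relevant", "Partially relevant", "Relevant", "Nonrelevant"],
    "c": ["Nonrelevant", "Nonrelevant", "Relevant", "Nonrelevant"],
}


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


def score(labels, mode):
    """Score a ranked list under graded, relaxed, or rigid relevance."""
    gains = [GAIN[label] for label in labels]
    if mode == "relaxed":   # partially relevant counts as relevant
        gains = [1 if g >= 1 else 0 for g in gains]
    elif mode == "rigid":   # partially relevant counts as nonrelevant
        gains = [1 if g >= 2 else 0 for g in gains]
    return dcg(gains)


scores = {
    mode: {name: score(labels, mode) for name, labels in LISTS.items()}
    for mode in ("graded", "relaxed", "rigid")
}

for mode, by_list in scores.items():
    print(mode, {name: round(s, 3) for name, s in by_list.items()})
```

Under these assumptions, only the graded setting separates all three lists; relaxed ties (a) with (b), and rigid ties (b) with (c).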
Tasks that used only binary relevance measures
Tasks that used graded relevance measures (1)
Tasks that used graded relevance measures (2)
3. Q-measures etc.
Normalised Cumulative Utility (1)
[Sakai+Robertson EVIA08]
[Figure: a ranked list (ranks r = 1, 2, 3, ...) scanned by a population of users]
Normalised Cumulative Utility (2)
[Figure: the same ranked list with a stopping probability at each rank r; some users abandon the list at r=1, others at r=3]
Normalised Cumulative Utility (3)
[Figure: for the user group that stops at r=1, measure the utility of that one doc; for the group that stops at r=3, measure the utility of the docs down to rank 3. This is the utility at r]
NCU is “expected utility”
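Written as a formula, the "expected utility" reading could be sketched as follows. The symbols P_S(r), for the stopping probability at rank r, and NU(r), for the normalised utility at rank r, are notation chosen here for illustration rather than copied from the slides.

```latex
\[
  \mathrm{NCU} \;=\; \sum_{r} P_S(r)\,\mathrm{NU}(r)
\]
```

Different choices of the stopping probability distribution and of the utility function give different members of the NCU family.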
AP is an NCU (1)
• Suppose R=3 relevant docs are known.
Rank             Document     Stopping probability
1                Nonrelevant
2                Relevant     33% of users
3                Nonrelevant
4                Nonrelevant
5                Relevant     33% of users
(not retrieved)  Relevant     33% of users
Stopping probability distribution: uniform over relevant docs
AP is an NCU (2)
Prec(2) = 1/2
Prec(5) = 2/5
AP = ( Prec(2) + Prec(5) + 0 ) / 3 = 0.300
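The AP example can be reproduced with a few lines of code. This is a minimal sketch of AP read as an NCU (uniform stopping probability over the R relevant docs, precision as the utility); the function names are made up for this illustration.

```python
def precision_at(rels, r):
    """Precision at rank r (1-indexed) for a list of 0/1 relevance flags."""
    return sum(rels[:r]) / r


def ap_as_ncu(rels, num_relevant):
    """AP as expected utility: each of the R relevant docs is the stopping
    point for 1/R of the users, and the utility at rank r is Prec(r).
    Relevant docs that were not retrieved contribute zero."""
    stop_prob = 1.0 / num_relevant
    return sum(
        stop_prob * precision_at(rels, r)
        for r, rel in enumerate(rels, start=1)
        if rel
    )


# Ranked list from the slide: relevant docs at ranks 2 and 5, R = 3.
ranked = [0, 1, 0, 0, 1]
ap = ap_as_ncu(ranked, num_relevant=3)
print(round(ap, 3))  # 0.3
```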
Q-measure is an NCU (1)
• Suppose R=3 relevant (1 highly rel, 2 partially rel)
docs are known.
Rank             Document          Stopping probability
1                Nonrelevant
2                Highly rel: 3     33% of users
3                Nonrelevant
4                Nonrelevant
5                Partially rel: 1  33% of users
(not retrieved)  Partially rel: 1  33% of users
Stopping probability distribution: uniform over relevant docs
Q-measure is an NCU (2)
BR(2) = 4/6
BR(5) = 6/10
Q = ( BR(2) + BR(5) + 0 ) / 3 = 0.422
Q generalizes AP by using the Blended Ratio (BR) instead of Prec as Utility
BR combines Prec and Normalised
Cumulative Gain (1)
• Suppose R=3 relevant (1 highly rel, 2 partially rel)
docs are known.
Rank r  Ranked list       cg(r)  Ideal list        cg*(r)
1       Nonrelevant       0      Highly rel: 3     3
2       Highly rel: 3     3      Partially rel: 1  4
3       Nonrelevant       3      Partially rel: 1  5
4       Nonrelevant       3                        5
5       Partially rel: 1  4                        5
cg(r), cg*(r): cumulative gain of the ranked list and of the ideal list
Prec(2) = 1/2
BR(2) = (1+3)/(2+4) = 4/6 with β=1
BR combines Prec and Normalised
Cumulative Gain (2)
Prec(5) = 2/5
BR(5) = (2+4)/(5+5) = 6/10 with β=1
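The BR and Q-measure arithmetic above can be sketched in code as follows. The function names and the zero-padding of the ideal list are choices made for this illustration; the gain values (3 for highly relevant, 1 for partially relevant) and β=1 are those used on the slides.

```python
from itertools import accumulate


def blended_ratio(count, cg, cg_star, r, beta=1.0):
    """BR(r) = (C(r) + beta * cg(r)) / (r + beta * cg*(r)),
    where C(r) is the number of relevant docs in the top r."""
    return (count + beta * cg) / (r + beta * cg_star)


def q_measure(gains, ideal_gains, beta=1.0):
    """Q-measure: average BR over the ranks of retrieved relevant docs,
    divided by the total number of relevant docs R."""
    n = len(gains)
    num_relevant = sum(1 for g in ideal_gains if g > 0)
    # Ideal list: known gains in decreasing order, padded with zeros.
    ideal = (sorted(ideal_gains, reverse=True) + [0] * n)[:n]
    cg = list(accumulate(gains))
    cg_star = list(accumulate(ideal))
    count = list(accumulate(1 if g > 0 else 0 for g in gains))
    total = 0.0
    for r in range(1, n + 1):
        if gains[r - 1] > 0:
            total += blended_ratio(count[r - 1], cg[r - 1], cg_star[r - 1], r, beta)
    return total / num_relevant


# Ranked list from the slide: highly rel (3) at rank 2, partially rel (1) at rank 5.
gains = [0, 3, 0, 0, 1]
ideal_gains = [3, 1, 1]
q = q_measure(gains, ideal_gains, beta=1.0)
print(round(q, 3))  # 0.422
```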
Patience parameter β of BR
(binary relevance environment)
[Figure: BR(r1) plotted against r1 = 1, ..., 20 for R=5, with curves for β=0.1, β=1 and β=10; the vertical axis runs from 0 to 1]
r1 : rank of the 1st relevant doc
r1 <= R ⇒ BR(r1) = (1+β)/(r1+βr1) = 1/r1
r1 > R ⇒ BR(r1) = (1+β)/(r1+βR)
Large β ⇒ more tolerance to relevant docs at low ranks
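The effect of β can be tabulated directly from the two formulas above. This is a small sketch for the binary-relevance case with R=5 and the three β values shown in the plot; the function name is hypothetical.

```python
def br_first_relevant(r1, num_relevant, beta):
    """BR at the rank r1 of the first relevant doc, binary relevance.
    C(r1) = cg(r1) = 1 and cg*(r1) = min(r1, R)."""
    cg_star = min(r1, num_relevant)
    return (1 + beta) / (r1 + beta * cg_star)


R = 5
for beta in (0.1, 1, 10):
    row = [round(br_first_relevant(r1, R, beta), 3) for r1 in (1, 5, 10, 20)]
    print(f"beta={beta}: BR at r1=1,5,10,20 -> {row}")
```

For r1 <= R the value is 1/r1 regardless of β; beyond R, a larger β gives a higher BR, which matches the "more tolerance" reading on the slide.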
4. D-measures etc.
Diversified search
• Given an ambiguous/underspecified query, produce a
single Search Engine Result Page that satisfies
different user intents!
• Challenge: balancing relevance and diversity
[Figure: a SERP (Search Engine Result Page) annotated with competing goals: highly relevant near the top; give more space to popular intents?; give more space to informational intents?; cover many intents]
Approaches to evaluating
diversified search
• α-nDCG [Clarke+SIGIR08]
• Intent-Aware measures [Agrawal+WSDM09,
Chapelle+IR11]
(1) Compute a measure for each intent
(2) Combine the measures using intent probabilities as
weights
• D(#)-measures [Sakai+EVIA10,Sakai+SIGIR11]
(1) Combine intentwise graded relevance with intent
probabilities to compute the gain of each document
(2) Construct an ideal list based on the gain, and then
compute a graded relevance measure based on it
D-measures (1)
R = 3 relevant documents, 2 intents
Intent i: “harry potter books”, Pr(i|q) = 0.7
Intent j: “pottermore.com”, Pr(j|q) = 0.3
Per-intent gain values
Document  gi               gj
Reldoc1   Partially rel:1  Perfect:7
Reldoc2   Highly rel:3     Nonrel:0
Reldoc3   Partially rel:1  Partially rel:1
D-measures (2)
Global gain of each document: Pr(i|q) gi + Pr(j|q) gj
Reldoc1   0.7*1+0.3*7=2.8
Reldoc2   0.7*3+0.3*0=2.1
Reldoc3   0.7*1+0.3*1=1.0
Ideal list based on global gains: Reldoc1, Reldoc2, Reldoc3
D-DCG*
= 2.8 + 2.1/log2(2+1) +1.0/log2(3+1)
= 4.62
D-measures (3)
SERP to be evaluated
Rank  Document  Global gain
1     nonrel    0
2     nonrel    0
3     Reldoc2   2.1
4     nonrel    0
D-DCG
= 2.1/log2(3+1)
= 1.05
D-DCG*
= 4.62
D-nDCG =
D-DCG/D-DCG*
= 0.23
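The D-nDCG walk-through can be reproduced with the sketch below. It assumes the per-intent gains are assigned as Reldoc1 = (1, 7), Reldoc2 = (3, 0), Reldoc3 = (1, 1), which is the reading that agrees with the global gains 2.8, 2.1 and 1.0 on the slides; the variable and function names are made up for this illustration.

```python
import math

# Intent probabilities Pr(i|q) and Pr(j|q) from the slides.
intent_prob = {"i": 0.7, "j": 0.3}

# Per-intent gain values (assumed assignment, see the lead-in).
per_intent_gain = {
    "Reldoc1": {"i": 1, "j": 7},
    "Reldoc2": {"i": 3, "j": 0},
    "Reldoc3": {"i": 1, "j": 1},
}


def global_gain(doc):
    """Global gain: sum over intents of Pr(intent|q) * per-intent gain.
    Documents without any judged gain (nonrelevant) get 0."""
    gains = per_intent_gain.get(doc, {})
    return sum(p * gains.get(intent, 0) for intent, p in intent_prob.items())


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


# SERP to be evaluated: Reldoc2 at rank 3, nonrelevant docs elsewhere.
serp = ["nonrel", "nonrel", "Reldoc2", "nonrel"]

# Ideal list: relevant documents sorted by decreasing global gain.
ideal = sorted((global_gain(d) for d in per_intent_gain), reverse=True)

d_dcg = dcg([global_gain(d) for d in serp])
d_dcg_star = dcg(ideal)
d_ndcg = d_dcg / d_dcg_star

print(round(d_dcg, 2), round(d_dcg_star, 2), round(d_ndcg, 2))
```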
Intent recall (aka
subtopic recall [Zhai03] )
SERP to be evaluated: nonrel, nonrel, Reldoc2, nonrel
Only Intent i is covered by SERP
I-rec =
#intents covered by SERP / #intents
= 1/2
D#-measure = γ I-rec + (1-γ) D-measure
[Figure: D#-nDCG contour lines; axes labelled “Pure diversity” and “Overall relevance”]
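Intent recall and the D#-measure combination can be sketched as follows. The value γ=0.5 is an assumption made here for illustration (the slide does not fix γ), the D-nDCG input of 0.23 is taken from the worked example, and the function names are hypothetical.

```python
# Per-intent gains for the one relevant doc that appears in the SERP.
per_intent_gain = {
    "Reldoc2": {"i": 3, "j": 0},
}
intents = ["i", "j"]
serp = ["nonrel", "nonrel", "Reldoc2", "nonrel"]


def intent_recall(serp, per_intent_gain, intents):
    """I-rec: fraction of intents for which the SERP contains at least
    one document with a positive per-intent gain."""
    covered = {
        intent
        for doc in serp
        for intent in intents
        if per_intent_gain.get(doc, {}).get(intent, 0) > 0
    }
    return len(covered) / len(intents)


def d_sharp(i_rec, d_measure, gamma=0.5):
    """D#-measure = gamma * I-rec + (1 - gamma) * D-measure.
    gamma=0.5 is an assumed default, not a value given on the slide."""
    return gamma * i_rec + (1 - gamma) * d_measure


i_rec = intent_recall(serp, per_intent_gain, intents)
d_sharp_ndcg = d_sharp(i_rec, d_measure=0.23)
print(i_rec, round(d_sharp_ndcg, 3))
```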
Official results from the NTCIR-10 INTENT-2 task
So which adhoc/diversity measures are “good”?
https://waseda.box.com/sigir2019preprint
5. Beyond graded relevance
Current approaches:
gold relevance labels
[Figure: two sets of assessors’ diverse ratings on a 0-1 scale, each reduced to a final relevance grade of 0.5]
New approaches:
gold distributions
[Figure: the same assessors’ diverse ratings on a 0-1 scale, kept as distributions]
Use the distributions
directly for evaluation!
The gold data preserves
the diverse views of
users.
Please see the STC-3 overview AND
https://waseda.box.com/SIGIR2018preprint
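The contrast between a single gold label and a gold distribution can be illustrated with a toy sketch. The two rating sets below are hypothetical data invented for this illustration (they are not the ratings behind the figures), and taking the mean as the "final relevance grade" is likewise an assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical assessor ratings on a 0-1 scale for two items.
ratings_x = [0.0, 0.0, 1.0, 1.0]   # assessors are split
ratings_y = [0.5, 0.5, 0.5, 0.5]   # assessors agree


def gold_label(ratings):
    """Collapse the ratings into one final grade (here: the mean)."""
    return mean(ratings)


def gold_distribution(ratings):
    """Keep the ratings as a probability distribution over grades."""
    counts = Counter(ratings)
    return {grade: n / len(ratings) for grade, n in sorted(counts.items())}


print(gold_label(ratings_x), gold_label(ratings_y))
print(gold_distribution(ratings_x))
print(gold_distribution(ratings_y))
```

In this toy case both items receive the same final grade of 0.5, while the distributions still show that the assessors disagreed on one item and agreed on the other.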
Summary
• Survey of NTCIR ranked retrieval tasks (1999-2019):
most of them utilise graded relevance measures,
but not all.
• If relevance grades are important for your task,
graded relevance measures should be used.
Converting graded relevance to binary relevance is
inadequate.
• Beyond relevance labels: utilise gold distributions
that preserve diverse views.
• THE NTCIR BOOK WILL BE OUT IN 2020 FROM
SPRINGER!
ALSO FROM SPRINGER…

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

KĂźrzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

evia2019

  • 1. Graded Relevance Assessments and Graded Relevance Measures of NTCIR Tetsuya Sakai Waseda University tetsuyasakai@acm.org 10th June, 2019@EVIA2019/NTCIR-14, Tokyo.
  • 3. TALK OUTLINE 1. NTCIR and me 2. Survey of NTCIR overviews (1999-2019) 3. Q-measures etc. 4. D-measures etc. 5. Beyond graded relevance 6. Summary
  • 4. NTCIR-1, -2, -3 (1999-2003) • Sakai, T., Shibazaki, Y., Suzuki, M., Kajiura, M., Manabe, T. and Sumita, K.: Cross-Language Information Retrieval for NTCIR at Toshiba, Proceedings of NTCIR-1, 1999. • Sakai, T., Robertson, S.E. and Walker, S.: Flexible Pseudo-Relevance Feedback for NTCIR-2, Proceedings of NTCIR-2, 2001. • Sakai, T., Koyama, M., Suzuki, M. and Manabe, T.: Toshiba KIDS at NTCIR-3: Japanese and English-Japanese IR, Proceedings of NTCIR-3, 2003. 1 paper per NTCIR
  • 5. NTCIR-4 (2004) • Sakai, T., Koyama, M., Kumano, A. and Manabe, T.: Toshiba BRIDJE at NTCIR-4 CLIR: Monolingual/Bilingual IR and Flexible Feedback, Proceedings of NTCIR-4, 2004. • Sakai, T., Saito, Y., Ichimura, Y., Koyama, M. and Kokubu, T.: Toshiba ASKMi at NTCIR-4 QAC2, Proceedings of NTCIR-4, 2004. • Sakai, T.: New Performance Metrics based on Multigrade Relevance: Their Application to Question Answering, Proceedings of NTCIR-4 (Open Submission Session), 2004. Q-measure This later evolved into EVIA 3 papers
  • 6. NTCIR-5 (2005) • Kokubu, T., Sakai, T., Saito, Y., Tsutsui, H., Manabe, T., Koyama, M. and Fujii, H.: The Relationship between Answer Ranking and User Satisfaction in a Question Answering System, Proceedings of NTCIR-5 (Open Submission Session), 2005. • Sakai, T.: The Effect of Topic Sampling on Sensitivity Comparisons of Information Retrieval Metrics, Proceedings of NTCIR-5 (Open Submission Session), 2005. • Sakai, T., Manabe, T., Kumano, A., Koyama, M. and Kokubu, T.: Toshiba BRIDJE at NTCIR-5: Evaluation using Geometric Means, Proceedings of NTCIR-5, 2005. 3 papers
  • 7. NTCIR-6 (2007) • Sakai, T.: On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance, Proceedings of EVIA 2007. • Sakai, T.: User Satisfaction Task: A Proposal for NTCIR-7, Proceedings of EVIA 2007. • Sakai, T., Koyama, M., Izuha, T., Kumano, A., Manabe, T. and Kokubu, T.: Toshiba BRIDJE at NTCIR-6 CLIR: The Head/Lead Method and Graded Relevance Feedback, Proceedings of NTCIR-6, 2007. 3 papers
  • 8. NTCIR-7 (2008) • Sakai, T. and Robertson, S.: Modelling A User Population for Designing Information Retrieval Metrics, Proceedings of EVIA 2008. • Sakai, T. and Kando, N.: Are Popular Documents More Likely To Be Relevant? A Dive into the ACLIA IR4QA Pools, Proceedings of EVIA 2008. • Mitamura, T., Nyberg, E., Shima, H., Kato, T., Mori, T., Lin, C.-Y., Song, R., Lin, C.-J., Sakai, T., Ji, D. and Kando, N.: Overview of the NTCIR-7 ACLIA Tasks: Advanced Cross-Lingual Information Access, Proceedings of NTCIR-7, 2008. • Sakai, T., Kando, N., Lin, C.-J., Mitamura, T., Shima, H., Ji, D., Chen, K.-H., and Nyberg, E.: Overview of the NTCIR-7 ACLIA IR4QA Task, Proceedings of NTCIR-7, 2008. NCU Debut as a task organiser 4 papers
  • 9. NTCIR-8 (2010) • Song, R., Qi, D., Liu, H., Sakai, T., Nie, J.-Y., Hon, H.-W. and Yu, Y.: Constructing a Test Collection with Multi-Intent Queries, Proceedings of EVIA 2010. • Sakai, T., Craswell, N., Song, R., Robertson, S., Dou, Z. and Lin, C.-Y.: Simple Evaluation Metrics for Diversified Search Results, Proceedings of EVIA 2010. • Sakai, T. and Lin, C.-Y.: Ranking Retrieval Systems without Relevance Assessments - Revisited, Proceedings of EVIA 2010. • Mitamura, T., Shima, H., Sakai, T., Kando, N., Mori, T., Takeda, K., Lin, C.-Y., Song, R., Lin, C.-J. and Lee, C.-W.: Overview of the NTCIR-8 ACLIA Tasks: Advanced Cross-Lingual Information Access, Proceedings of NTCIR-8, 2010. • Sakai, T., Shima, H., Kando, N., Song, R., Lin, C.-J., Mitamura, T., Sugimoto, M. and Lee, C.-W.: Overview of NTCIR-8 ACLIA IR4QA, Proceedings of NTCIR-8, 2010. • Gey, F., Larson, R., Kando, N., Machado, J. and Sakai, T.: NTCIR-GeoTime Overview: Evaluating Geographic and Temporal Search, Proceedings of NTCIR-8, 2010. • Ishikawa, D., Sakai, T. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part I): The Test Collection and the Task, Proceedings of NTCIR-8, 2010. • Sakai, T., Ishikawa, D. and Kando, N.: Overview of the NTCIR-8 Community QA Pilot Task (Part II): System Evaluation, Proceedings of NTCIR-8, 2010. • Song, Y.-I., Liu, J., Sakai, T., Wang, X.-J., Feng, G., Cao, Y., Suzuki, H. and Lin, C.-Y.: Microsoft Research Asia with Redmond at the NTCIR-8 Community QA Pilot Task, Proceedings of NTCIR-8, 2010. D-measures 9 papers
  • 10. NTCIR-9 (2011) • Ishikawa, D., Kando, N. and Sakai, T.: What Makes a Good Answer in Community Question Answering? An Analysis of Assessors' Criteria, Proceedings of EVIA 2011. • Song, R., Zhang, M., Sakai, T., Kato, M.P., Liu, Y., Sugimoto, M., Wang, Q. and Orii, N.: Overview of the NTCIR-9 INTENT Task, Proceedings of NTCIR-9, 2011. • Sakai, T., Kato, M.P. and Song, Y.-I.: Overview of NTCIR-9 1CLICK, Proceedings of NTCIR-9, 2011. • Orii, N., Song, Y.-I. and Sakai, T.: Microsoft Research Asia at the NTCIR-9 1CLICK Task, Proceedings of NTCIR-9, 2011. • Han, J., Wang, Q., Orii, N., Dou, Z., Sakai, T. and Song, R.: Microsoft Research Asia at the NTCIR-9 Intent Task, Proceedings of NTCIR-9, 2011. • Morita, H., Makino, T., Sakai, T., Takamura, H. and Okumura, M.: TTOKU Summarization Based Systems at NTCIR-9 1CLICK Task, Proceedings of NTCIR-9, 2011. • Joho, H. and Sakai, T.: Grid-based Interaction for NTCIR-9 VisEx Task, Proceedings of NTCIR-9, 2011. 7 papers
  • 11. NTCIR-10 (2013) • Sakai, T.: The Unreusability of Diversified Search Test Collections, Proceedings of EVIA 2013. • Sakai, T., Dou, Z., Yamamoto, T., Liu, Y., Zhang, M., Song, R., Kato, M.P. and Iwata, M.: Overview of the NTCIR-10 INTENT-2 Task, Proceedings of NTCIR-10, 2013. • Kato, M.P., Ekstrand-Abueg, M., Pavlu, V., Sakai, T., Yamamoto, T. and Iwata, M.: Overview of the NTCIR-10 1CLICK-2 Task, Proceedings of NTCIR-10, 2013. • Tsukuda, K., Dou, Z. and Sakai, T.: Microsoft Research Asia at the NTCIR-10 Intent Task, Proceedings of NTCIR-10, 2013. • Narita, K., Sakai, T., Dou, Z. and Song, Y.-I.: MSRA at NTCIR-10 1CLICK-2, Proceedings of NTCIR-10, 2013. 5 papers
  • 12. NTCIR-11 (2014) • Sakai, T.: Topic Set Size Design with Variance Estimates from Two-Way ANOVA, Proceedings of EVIA 2014. • Kato, M.P., Ekstrand-Abueg, M., Pavlu, V., Sakai, T., Yamamoto, T. and Iwata, M.: Overview of the NTCIR-11 MobileClick Task, Proceedings of NTCIR-11, 2014. Joined Waseda in September 2013 2 papers
  • 13. NTCIR-12 (2016) • Sakai, T. and Shang, L.: On Estimating Variances for Topic Set Size Design, Proceedings of EVIA 2016. • Kato, M.P., Pavlu, V., Sakai, T., Yamamoto, T. and Morita, H.: Two-layered Summaries for Mobile Search: Does the Evaluation Measure Reflect User Preferences?, Proceedings of EVIA 2016. • Shang, L., Sakai, T., Lu, Z., Li, H., Higashinaka, R. and Miyao, Y.: Overview of the NTCIR-12 Short Text Conversation Task, Proceedings of NTCIR-12, 2016. • Kato, M.P., Sakai, T., Yamamoto, T., Pavlu, V., Morita, H. and Fujita, S.: Overview of the NTCIR-12 MobileClick Task, Proceedings of NTCIR-12, 2016. • Nanba, H., Sakai, T., Kando, N., Keyaki, A., Eguchi, K., Hatano, K., Shimizu, T., Hirate, Y. and Fujii, A.: NEXTI at NTCIR-12 IMine-2 Task, Proceedings of NTCIR-12, 2016. • Higuchi, S. and Sakai, T.: SLQAL at the NTCIR-12 QALab-2 Task, Proceedings of NTCIR-12, 2016. • Denawa, H., Sano, T., Kadotami, Y., Kato, S. and Sakai, T.: SLSTC at the NTCIR-12 STC Task, Proceedings of NTCIR-12, 2016. • Iijima, S. and Sakai, T.: SLLL at the NTCIR-12 Lifelog Task: Sleepflower and the LIT Subtask, Proceedings of NTCIR-12, 2016. My students’ debut at NTCIR 8 papers
  • 14. NTCIR-13 (2017) • Shang, L., Sakai, T., Li, H., Higashinaka, R., Miyao, Y., Arase, Y., and Nomoto, M.: Overview of the NTCIR-13 Short Text Conversation Task, Proceedings of NTCIR-13, 2017. • Luo, C., Sakai, T., Liu, Y., Dou, Z., Xiong, C., and Xu, J.: Overview of the NTCIR-13 We Want Web Task, Proceedings of NTCIR-13, 2017. • Kashimura, R. and Sakai, T.: SLOLQ at the NTCIR-13 OpenLiveQ Task, Proceedings of NTCIR-13, 2017. • Sato, K. and Sakai, T.: SLQAL at the NTCIR-13 QA Lab-3 Task, Proceedings of NTCIR-13, 2017. • Guan, J. and Sakai, T.: SLSTC at the NTCIR-13 STC Task, Proceedings of NTCIR-13, 2017. • Xiao, P., Li, L., Fan, Y., and Sakai, T.: SLWWW at the NTCIR-13 WWW Task, Proceedings of NTCIR-13, 2017. • Zeng, Z., Luo, C., Shang, L., Li, H., and Sakai, T.: Test Collections and Measures for Evaluating Customer-Helpdesk Dialogues, Proceedings of EVIA 2017. • Sakai, T.: Evaluating Evaluation Measures with Worst-Case Confidence Interval Widths, Proceedings of EVIA 2017. • Sakai, T.: Towards Automatic Evaluation of Multi-Turn Dialogues: A Task Design that Leverages Inherently Subjective Annotations, Proceedings of EVIA 2017. • Sakai, T.: The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and Students, Proceedings of EVIA 2017. • Sakai, T.: Unanimity-Aware Gain for Highly Subjective Assessments, Proceedings of EVIA 2017. 11 papers
  • 15. NTCIR-14 (2019) • Sakai, T., Ferro, N., Soboroff, I., Zeng, Z., Xiao, P., and Maistro, M.: Overview of the NTCIR-14 CENTRE Task, Proceedings of NTCIR-14, 2019. • Mao, J., Sakai, T., Luo, C., Xiao, P., Liu, Y., and Dou, Z.: Overview of the NTCIR-14 We Want Web Task, Proceedings of NTCIR-14, 2019. • Zeng, Z., Kato, S., and Sakai, T.: Overview of the NTCIR-14 Short Text Conversation Task: Dialogue Quality and Nugget Detection Subtasks, Proceedings of NTCIR-14, 2019. • Kato, S., Suzuki, R., Zeng, Z., and Sakai, T.: SLSTC at the NTCIR-14 STC-3 Dialogue Quality and Nugget Detection Subtasks, Proceedings of NTCIR-14, 2019. • Xiao, P. and Sakai, T.: SLWWW at the NTCIR-14 We Want Web Task, Proceedings of NTCIR-14, 2019. For the first time, I don’t have a paper at EVIA! 5 papers?
  • 16. Or so I thought... • Oard, D.W., Sakai, T., and Kando, N.: Celebrating 20 Years of NTCIR: The Book, Proceedings of EVIA 2019.
  • 17. TALK OUTLINE 1. NTCIR and me 2. Survey of NTCIR overviews (1999-2019) 3. Q-measures etc. 4. D-measures etc. 5. Beyond graded relevance 6. Summary
  • 18. [Harman05] (The TREC book) “Relevance was defined within the task of the information analyst, with TREC assessors instructed to judge a document relevant if information from that document would be used in some manner for the writing of a report on the subject of the topic. This also implies the use of binary relevance judgments;”
  • 19. NTCIR overviews (1999-2019) survey method • Examined all overview papers (for tasks that involved ranked retrieval only) • Examined how many relevance levels were used and how they were obtained in each task (ALL NTCIR retrieval tasks use graded relevance levels!) • Examined whether graded relevance measures were used to evaluate the participating systems.
  • 20. IF you want (a) > (b) > (c), then you should use graded relevance measures. Relevant Partially relevant Partially relevant Nonrelevant (a) Partially relevant Partially relevant Relevant Nonrelevant Nonrelevant Nonrelevant Relevant Nonrelevant (b) (c)
  • 21. IF you want (a) > (b) > (c), “relaxed relevance” doesn’t work. Relevant Partially relevant Partially relevant Nonrelevant (a) Partially relevant Partially relevant Relevant Nonrelevant Nonrelevant Nonrelevant Relevant Nonrelevant (b) (c) Considered equally effective
  • 22. IF you want (a) > (b) > (c), “rigid relevance” doesn’t work. Relevant Partially relevant Partially relevant Nonrelevant (a) Partially relevant Partially relevant Relevant Nonrelevant Nonrelevant Nonrelevant Relevant Nonrelevant (b) (c) Considered equally effective
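The point of the three slides above can be checked numerically. The following is a minimal sketch, not the talk's own code. It assumes the three ranked lists are laid out as read from the flattened slide text, that the only known relevant documents are one Relevant and two Partially relevant ones, and that the gain values are 3 and 1 (borrowed from the later Q-measure slides). Average precision stands in for a binary measure and nDCG for a graded one.

```python
import math

# Assumed gain values (not stated on these slides): Relevant=3, Partially relevant=1.
GAIN = {"R": 3, "P": 1, "N": 0}

# Ranked lists as read from the flattened slide text (an assumption about the layout).
LISTS = {
    "a": ["R", "P", "P", "N"],
    "b": ["P", "P", "R", "N"],
    "c": ["N", "N", "R", "N"],
}


def average_precision(binary, num_rel):
    """Binary AP: sum of precision at each relevant rank, divided by num_rel."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(binary, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_rel


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


def evaluate(labels):
    relaxed = [lab in ("R", "P") for lab in labels]  # partially relevant counts as relevant
    rigid = [lab == "R" for lab in labels]           # only fully relevant counts
    graded = [GAIN[lab] for lab in labels]
    return {
        "relaxed_ap": average_precision(relaxed, num_rel=3),
        "rigid_ap": average_precision(rigid, num_rel=1),
        "ndcg": dcg(graded) / dcg([3, 1, 1]),
    }


scores = {name: evaluate(labels) for name, labels in LISTS.items()}
for name, s in scores.items():
    print(name, {k: round(v, 3) for k, v in s.items()})
```

Under these assumptions, relaxed relevance cannot separate (a) from (b), rigid relevance cannot separate (b) from (c), and only the graded measure orders them as (a) > (b) > (c).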
  • 23. Tasks that used only binary relevance measures
  • 24. Tasks that used graded relevance measures (1)
  • 25. Tasks that used graded relevance measures (2)
  • 26. TALK OUTLINE 1. NTCIR and me 2. Survey of NTCIR overviews (1999-2019) 3. Q-measures etc. 4. D-measures etc. 5. Beyond graded relevance 6. Summary
  • 27. Normalised Cumulative Utility (1) [Sakai+Robertson EVIA08] : r 1 2 3 : Population of users who scan the ranked list
  • 28. Normalised Cumulative Utility (2) : r 1 2 3 : Stopping probability at r Users who abandon the list at r=1 Users who abandon the list at r=3
  • 29. Normalised Cumulative Utility (3) : r 1 2 3 : Measure utility of this doc for this user group Measure utility of these docs for this user group Utility at r NCU is “expected utility”
  • 30. AP is an NCU (1) • Suppose R=3 relevant docs are known. Nonrelevant Relevant Nonrelevant Relevant 33% of users 33% of users Nonrelevant Stopping probability distribution: uniform over relevant docs 33% of users Retrieved Not retrieved Relevant
  • 31. AP is an NCU (2) • Suppose R=3 relevant docs are known. Nonrelevant Relevant Nonrelevant Relevant 33% of users 33% of users Nonrelevant Prec(2) = 1/2 Prec(5) = 2/5 AP = ( Prec(2) + Prec(5) + 0 ) / 3 = 0.300
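The NCU view of AP can be sketched as an expected utility: a stopping probability at each rank multiplied by the utility at that rank. This is an illustrative reading of the slides, with the function name `ncu` chosen here. It assumes the slide's list has relevant documents at ranks 2 and 5 and a third relevant document that is not retrieved.

```python
def ncu(stop_probs, utilities):
    """Expected utility: sum over ranks of P(stop at r) * utility at r."""
    return sum(p * u for p, u in zip(stop_probs, utilities))


ranked = [0, 1, 0, 0, 1]  # 1 = relevant; relevant docs at ranks 2 and 5
R = 3                      # known relevant docs; one of them is not retrieved

# Utility at rank r is precision at r.
prec = []
hits = 0
for r, rel in enumerate(ranked, start=1):
    hits += rel
    prec.append(hits / r)

# Uniform stopping distribution over the R relevant docs: each retrieved
# relevant rank gets probability 1/R; the unretrieved doc contributes 0.
stop = [rel / R for rel in ranked]

ap = ncu(stop, prec)
print(round(ap, 3))  # 0.3
```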
  • 32. Q-measure is an NCU (1) • Suppose R=3 relevant (1 highly rel, 2 partially rel) docs are known. Nonrelevant Highly rel: 3 Nonrelevant Partially rel: 1 33% of users 33% of users Nonrelevant Stopping probability distribution: uniform over relevant docs 33% of users Retrieved Not retrieved Partially rel: 1
  • 33. Q-measure is an NCU (2) • Suppose R=3 relevant (1 highly rel, 2 partially rel) docs are known. Nonrelevant Highly rel: 3 Nonrelevant Partially rel: 1 33% of users 33% of users Nonrelevant BR(2) = 4/6 BR(5) = 6/10 Q = ( BR(2) + BR(5) + 0 ) / 3 = 0.422 Q generalizes AP by using the Blended Ratio instead of Prec as Utility
  • 34. BR combines Prec and Normalised Cumulative Gain (1) • Suppose R=3 relevant (1 highly rel, 2 partially rel) docs are known. Ranked list: Nonrelevant, Highly rel: 3, Nonrelevant, Nonrelevant, Partially rel: 1. Ideal list: Highly rel: 3, Partially rel: 1, Partially rel: 1. Cumulative gain cg(r) = 0, 3, 3, 3, 4; ideal cumulative gain cg*(r) = 3, 4, 5, 5, 5. Prec(2) = 1/2. BR(2) = (1+3)/(2+4) = 4/6 with β=1
  • 35. BR combines Prec and Normalised Cumulative Gain (2) • Suppose R=3 relevant (1 highly rel, 2 partially rel) docs are known. Ranked list: Nonrelevant, Highly rel: 3, Nonrelevant, Nonrelevant, Partially rel: 1. Ideal list: Highly rel: 3, Partially rel: 1, Partially rel: 1. Cumulative gain cg(r) = 0, 3, 3, 3, 4; ideal cumulative gain cg*(r) = 3, 4, 5, 5, 5. Prec(5) = 2/5. BR(5) = (2+4)/(5+5) = 6/10 with β=1
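The Blended Ratio and Q-measure arithmetic on these slides can be reproduced with a short sketch. The function names are illustrative, and the formula BR(r) = (C(r) + β·cg(r)) / (r + β·cg*(r)), with C(r) the number of relevant documents in the top r, is inferred from the worked values BR(2) = 4/6 and BR(5) = 6/10.

```python
from itertools import accumulate


def blended_ratio(gains, ideal_gains, beta=1.0):
    """Return BR(r) for every rank r of the ranked list."""
    n = len(gains)
    ideal = sorted(ideal_gains, reverse=True)
    ideal += [0] * (n - len(ideal))  # pad the ideal list to the same depth
    count = list(accumulate(1 if g > 0 else 0 for g in gains))  # C(r)
    cg = list(accumulate(gains))                                # cg(r)
    cg_star = list(accumulate(ideal[:n]))                       # cg*(r)
    return [
        (count[i] + beta * cg[i]) / ((i + 1) + beta * cg_star[i])
        for i in range(n)
    ]


def q_measure(gains, ideal_gains, beta=1.0):
    """Average BR over the ranks of retrieved relevant docs, divided by R."""
    br = blended_ratio(gains, ideal_gains, beta)
    num_rel = sum(1 for g in ideal_gains if g > 0)
    return sum(b for b, g in zip(br, gains) if g > 0) / num_rel


gains = [0, 3, 0, 0, 1]  # slide example: highly relevant at rank 2, partially at rank 5
ideal = [3, 1, 1]        # one highly relevant, two partially relevant docs

br = blended_ratio(gains, ideal)
q = q_measure(gains, ideal)
print(br[1], br[4])  # BR(2) = 4/6 and BR(5) = 6/10
print(round(q, 3))   # 0.422
```

With binary gains (every relevant document has gain 1) and β=0, BR(r) reduces to Prec(r), so the same function returns AP.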
  • 36. Patience parameter β of BR (binary relevance environment) [Plot: BR(r1) against r1 = 1 to 20 for β=0.1, β=1 and β=10, with R=5] r1 <= R ⇒ BR(r1)=(1+β)/(r1+βr1)=1/r1 r1 > R ⇒ BR(r1)=(1+β)/(r1+βR) r1 : rank of the 1st relevant doc Large β ⇒ more tolerance to relevant docs at low ranks
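The two cases on this slide can be tabulated directly. This is a small sketch with an illustrative function name. It assumes binary relevance with gain 1 per relevant document, so that cg*(r1) = min(r1, R).

```python
def br_first_relevant(r1, R, beta):
    """BR at the rank r1 of the first relevant document (binary relevance)."""
    ideal_cg = min(r1, R)  # cg*(r1): the ideal list has relevant docs at ranks 1..R
    return (1 + beta) / (r1 + beta * ideal_cg)


R = 5
for beta in (0.1, 1, 10):
    row = [round(br_first_relevant(r1, R, beta), 3) for r1 in (1, 5, 10, 20)]
    print(f"beta={beta}: {row}")
```

For r1 <= R the value is 1/r1 regardless of β. Beyond rank R, a larger β keeps BR(r1) higher, which is the tolerance to low-ranked relevant documents noted on the slide.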
  • 37. TALK OUTLINE 1. NTCIR and me 2. Survey of NTCIR overviews (1999-2019) 3. Q-measures etc. 4. D-measures etc. 5. Beyond graded relevance 6. Summary
  • 38. Diversified search • Given an ambiguous/underspecified query, produce a single Search Engine Result Page that satisfies different user intents! • Challenge: balancing relevance and diversity SERP(SearchEngineResultPage) Highly relevant near the top Give more space to popular intents? Give more space to informational intents? Cover many intents
  • 39. Approaches to evaluating diversified search • α-nDCG [Clarke+SIGIR08] • Intent-Aware measures [Agrawal+WSDM09, Chapelle+IR11] (1) Compute a measure for each intent (2) Combine the measures using intent probabilities as weights • D(#)-measures [Sakai+EVIA10,Sakai+SIGIR11] (1) Combine intentwise graded relevance with intent probabilities to compute the gain of each document (2) Construct an ideal list based on the gain, and then compute a graded relevance measure based on it
  • 40. D-measures (1) Intent i: “harry potter books” Pr(i|q) = 0.7 Partially rel:1 Highly rel:3 Perfect:7 Nonrel:0 Partially rel:1 Partially rel:1 Reldoc1 Reldoc2 Reldoc3 Per-intent gain values gi gj Intent j: “pottermore.com” Pr(j|q) = 0.3 R = 3 relevant documents 2 intents
  • 41. D-measures (2) Reldoc1 Reldoc2 Reldoc3 0.7*1+0.3*7=2.8 0.7*1+0.3*1=1.0 0.7*3+0.3*0=2.1 D-DCG* = 2.8 + 2.1/log2(2+1) +1.0/log2(3+1) = 4.62 Per-intent gain values gi gj R = 3 relevant documents 2 intents Intent i: “harry potter books” Pr(i|q) = 0.7 Intent j: “pottermore.com” Pr(j|q) = 0.3 Ideal list based on global gains Pr(i|q) gi + Pr(j|q) gj Partially rel:1 Highly rel:3 Perfect:7 Nonrel:0 Partially rel:1 Partially rel:1
  • 42. D-measures (3) nonrel nonrel 2.1 nonrel Reldoc1 Reldoc2 Reldoc3Reldoc2 Ideal list based on global gains Pr(i|q) gi + Pr(j|q) gj D-DCG = 2.1/log2(3+1) = 1.05 D-DCG* = 4.62 D-nDCG = D-DCG/D-DCG* = 0.23 Per-intent gain values gi gj SERP to be evaluated R = 3 relevant documents 2 intents Intent i: “harry potter books” Pr(i|q) = 0.7 Intent j: “pottermore.com” Pr(j|q) = 0.3 Partially rel:1 Highly rel:3 Perfect:7 Nonrel:0 Partially rel:1 Partially rel:1 0.7*1+0.3*7=2.8 0.7*1+0.3*1=1.0 0.7*3+0.3*0=2.1
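The D-nDCG example across the three slides above can be reproduced as follows. This is a sketch under stated assumptions. The mapping of per-intent gains to Reldoc1, Reldoc2 and Reldoc3 is reconstructed from the global-gain arithmetic on the slides, and the nonrelevant document names are placeholders.

```python
import math

intent_probs = {"i": 0.7, "j": 0.3}  # Pr(i|q), Pr(j|q)

# Per-intent gain values, reconstructed from the slides' global-gain sums.
per_intent_gain = {
    "Reldoc1": {"i": 1, "j": 7},
    "Reldoc2": {"i": 3, "j": 0},
    "Reldoc3": {"i": 1, "j": 1},
}


def global_gain(doc):
    """Sum over intents of Pr(intent|q) * per-intent gain; 0 for unjudged docs."""
    gains = per_intent_gain.get(doc, {})
    return sum(p * gains.get(intent, 0) for intent, p in intent_probs.items())


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


# Ideal list: relevant documents sorted by global gain.
ideal = sorted((global_gain(d) for d in per_intent_gain), reverse=True)

# SERP from the slide: only Reldoc2 is retrieved, at rank 3 (other names are placeholders).
serp = ["nonrel_a", "nonrel_b", "Reldoc2", "nonrel_c"]

d_dcg = dcg([global_gain(d) for d in serp])
d_dcg_star = dcg(ideal)
d_ndcg = d_dcg / d_dcg_star
print(round(d_dcg, 2), round(d_dcg_star, 2), round(d_ndcg, 2))  # 1.05 4.62 0.23
```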
  • 43. Intent recall (aka subtopic recall [Zhai03] ) I-rec = #intents covered by SERP / #intents = 1/2 nonrel nonrel nonrel Reldoc2 Per-intent gain values gi gj R = 3 relevant documents 2 intents Reldoc1 Reldoc2 Reldoc3Only Intent i is covered by SERP Intent i: “harry potter books” Pr(i|q) = 0.7 Intent j: “pottermore.com” Pr(j|q) = 0.3 SERP to be evaluated Partially rel:1 Highly rel:3 Perfect:7 Nonrel:0 Partially rel:1 Partially rel:1
  • 44. D#-measure = γ I-rec + (1-γ) D-measure D#-nDCG contour lines Pure diversity Overall relevance Official results from the NTCIR-10 INTENT-2 task
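Intent recall and the D#-measure combination can be sketched for the same running example. The function names are illustrative, the per-intent gains are the reconstructed ones from the D-measures slides, and γ = 0.5 is an assumption made here because the slide does not state a value.

```python
# Per-intent gain values reconstructed from the D-measures slides.
per_intent_gain = {
    "Reldoc1": {"i": 1, "j": 7},
    "Reldoc2": {"i": 3, "j": 0},
    "Reldoc3": {"i": 1, "j": 1},
}
intents = ["i", "j"]
serp = ["nonrel_a", "nonrel_b", "Reldoc2", "nonrel_c"]  # placeholder names for nonrelevant docs


def intent_recall(serp, per_intent_gain, intents):
    """Fraction of intents covered by at least one retrieved doc with positive gain."""
    covered = {
        intent
        for doc in serp
        for intent, gain in per_intent_gain.get(doc, {}).items()
        if gain > 0
    }
    return len(covered) / len(intents)


def d_sharp(i_rec, d_measure, gamma=0.5):
    """D#-measure = gamma * I-rec + (1 - gamma) * D-measure (gamma=0.5 is assumed here)."""
    return gamma * i_rec + (1 - gamma) * d_measure


i_rec = intent_recall(serp, per_intent_gain, intents)
print(i_rec)  # 0.5: only intent i is covered
print(d_sharp(i_rec, 0.23))  # combines I-rec with the slide's D-nDCG of 0.23
```

Setting γ to 1 gives pure diversity (I-rec alone), and setting it to 0 gives the D-measure alone.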
  • 45. So which adhoc/diversity measures are “good”? https://waseda.box.com/sigir2019preprint
  • 46. TALK OUTLINE 1. NTCIR and me 2. Survey of NTCIR overviews (1999-2019) 3. Q-measures etc. 4. D-measures etc. 5. Beyond graded relevance 6. Summary
  • 47. Current approaches: gold relevance labels 0 1 0 1 Assessors’ diverse ratings 0 1 0 1 Final relevance grade: 0.5 Final relevance grade: 0.5
  • 48. New approaches: gold distributions 0 1 0 1 Assessors’ diverse ratings 0 1 0 1 Use the distributions directly for evaluation! The gold data preserves the diverse views of users.
  • 49. Please see the STC-3 overview AND https://waseda.box.com/SIGIR2018preprint
  • 50. TALK OUTLINE 1. NTCIR and me 2. Survey of NTCIR overviews (1999-2019) 3. Q-measures etc. 4. D-measures etc. 5. Beyond graded relevance 6. Summary
  • 51. Summary • Survey of NTCIR ranked retrieval tasks (1999-2019): most of them utilise graded relevance measures, but not all. • If relevance grades are important for your task, graded relevance measures should be used. Converting graded relevance to binary relevance is inadequate. • Beyond relevance labels: utilise gold distributions that preserve diverse views. • THE NTCIR BOOK WILL BE OUT IN 2020 FROM SPRINGER!