Evaluation and Relevance

                                   Lecture 8
                                  CS 410/510
                      Information Retrieval on the Internet
                      (c) Susan Price and David Maier


                  What should we evaluate?
          •   Time
          •   Space
          •   Cost
          •   Usability
          •   Retrieval performance




Evaluating Retrieval Performance
          • Perspectives
               – System perspective
               – User perspective
          • Modes
               – Batch mode
                    • Repeatable, scalable
                    • May not reflect user experience
               – Interactive mode
                     • User interface (UI)
                    • Separate user and system performance?





                           System evaluation*
          • “An abstraction of the retrieval process
            that equates good performance with good
            document rankings”1
          • Advantages
               – Can control some of the variables
                    • Comparative experiments more powerful
               – Less expensive than user evaluations
               – More diagnostic information about system
        *Based on: 1 Voorhees, E.M. The Philosophy of Information Retrieval Evaluation. In: C.A.
        Peters et al. (Eds.): CLEF 2001, LNCS 2406, pp. 355-370, 2002.




Test collections
          • Cranfield paradigm
          • Components
               – Documents
               – Requests
               – Relevance judgments
          • Advantages
               – Allow comparing performance of retrieval
                 algorithms while controlling other variables
               – Less expensive than user evaluations




               Properties of test collections
          • Number of documents
          • Kinds of documents
               – Domain
               – Format/purpose/language
               – Full text or not? Indexed?
          • Number of requests
               – Representative of real requests?
          • Relevance judgments
                – Complete? By whom? Binary? Using what standard?

Evaluation using test collections
          • A score calculated for an evaluation
            measure depends on the characteristics of
            the test collection
               – Meaningless by itself
                – Only useful for comparison with the score from
                  another system on the exact same collection
          • A larger number of requests increases
            confidence in conclusions
               – Typically 25 to 50 in TREC





           Text Retrieval Conference (TREC)
          • Annual conferences since 1992
               – Co-sponsored by NIST and DARPA
          • Promote IR research by providing infrastructure
            to work on large collections
               – Standardized document collections and information
                 need statements
               – Provide relevance judgments
          • Annual cycle of tasks, topics
               – Submit results in late summer/early fall
               – Workshop in November to present, discuss results

Some IR tasks studied by TREC
          • Text retrieval
               – Ad hoc
               – Filtering
               – High accuracy
               – Interactive
               – Novelty
               – Question answering
          • Other languages
               – Cross-language
          • Other collections
               – Video
               – Web
               – Terabyte
               – Blog
               – Genomics
               – Legal
               – Enterprise
               – Spam






            Relevance judgments at TREC
          • If any part of the document relates to the topic
            of the query, the document is relevant
          • Binary judgments
          • Judgments based on pooled sample
               – Too expensive to judge all documents
                 (hundreds of thousands)
               – Pool the top n-ranked documents from each
                 submitted run and judge those
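
A minimal sketch of that pooling procedure, assuming each submitted run is simply a ranked list of document IDs (the function name and the pool depth of 100 are illustrative, not from the slides):

```python
# Hypothetical sketch of TREC-style pooling: merge the top n documents from
# each submitted run; assessors judge only the documents in this pool.
def build_pool(runs, n=100):
    # runs: list of ranked lists of document IDs, one per submitted run
    pool = set()
    for ranked in runs:
        pool.update(ranked[:n])
    return pool
```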

Evaluation: Metrics

          • Two basics:

            [Figure: the set of retrieved documents overlaps the set of relevant
             documents; the intersection is the documents both retrieved and relevant]

                 Recall = (# documents retrieved and relevant) / (# documents relevant)

                 Precision = (# documents retrieved and relevant) / (# documents retrieved)
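
A minimal sketch of these two measures in Python (not from the slides; the function name and data structures are illustrative):

```python
# Recall and precision for an unranked (Boolean) result set,
# using Python sets of document IDs.
def recall_precision(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                       # retrieved and relevant
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision
```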





                           Evaluation: Metrics
          • What about ranked results?
               – Recall and precision fit the Boolean model
               – A relevant document first on a list is more
                 useful than 89th on a list
          • Two main approaches
               – Consider precision at various levels of recall
                    • Plot precision as a function of recall
               – Summarize performance with a single statistic

Plotting recall and precision
          • Typically reported at 11 standard levels of recall
               – 0, 10, 20 ... 100 percent
               – Allows averaging over multiple topics with different
                 numbers of relevant documents
          • Interpolate based on actual values
               – For any standard recall level i, take the maximum
                 precision at any actual recall level >= i
               – This defines a precision at the standard recall level of 0
                 even though precision at actual recall 0 is undefined
                 (a code sketch of this interpolation follows below)
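
A sketch of the 11-point interpolation for a single query, assuming `ranked` is the ranked list of document IDs and `relevant` is the set of judged-relevant documents (the function name is ours, not from the slides):

```python
# 11-point interpolated precision for one query.
def eleven_point(ranked, relevant):
    relevant = set(relevant)
    points = []                                   # (recall, precision) at each relevant doc seen
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    levels = [l / 10 for l in range(11)]          # 0.0, 0.1, ..., 1.0
    # interpolated precision at level l = max precision at any actual recall >= l, else 0
    return [max((p for r, p in points if r >= l), default=0.0) for l in levels]
```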






                Plotting recall and precision
          Relevant docs (9): 0123, 0132, 0241, 0256, 0299, 0311, 0324, 0357, 0399

          Rank   DocID   Recall   Precision at this recall
            1    0234    0
            2    0132    0.111    0.5
            3    0115    0.111
            4    0193    0.111
            5    0123    0.222    0.4
            6    0345    0.222
            7    0387    0.222
            8    0256    0.333    0.375
            9    0078    0.333
           10    0311    0.444    0.4
           11    0231    0.444
           12    0177    0.444

          Recall level (%):        0    10   20   30   40   50   60   70   80   90   100
          Interpolated precision:  0.5  0.5  0.4  0.4  0.4  0    0    0    0    0    0
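
Running the `eleven_point` sketch from the previous slide on this example reproduces the interpolated column (document IDs taken from the table above):

```python
relevant = {"0123", "0132", "0241", "0256", "0299", "0311", "0324", "0357", "0399"}
ranked = ["0234", "0132", "0115", "0193", "0123", "0345",
          "0387", "0256", "0078", "0311", "0231", "0177"]
print(eleven_point(ranked, relevant))
# -> [0.5, 0.5, 0.4, 0.4, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```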




Plotting recall and precision
          Recall and precision for a single query

          Recall level (%):        0    10   20   30   40   50   60   70   80   90   100
          Interpolated precision:  0.5  0.5  0.4  0.4  0.4  0    0    0    0    0    0

          [Figure: 11-point interpolated recall-precision curve for this query;
           interpolated precision (y-axis, 0 to 0.6) plotted against recall level (x-axis, 0 to 100)]





                 Plotting recall and precision
          • Single query performance not necessarily
            representative of system
               – Compute recall and precision for multiple
                 queries
               – Average the interpolated values at each recall
                 level
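
One way to compute that average, as a small sketch (assuming each query's curve is the 11-element list produced by the `eleven_point` sketch above):

```python
# Average the interpolated precision across queries, level by level.
def average_curves(curves):
    # curves: list of 11-element interpolated-precision lists, one per query
    return [sum(c[i] for c in curves) / len(curves) for i in range(11)]
```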




Which system is better?
          Average interpolated precision at each recall level:

          Recall level (%):   0     10    20    30     40    50    60    70     80    90     100
          System 1:           0.6   0.5   0.4   0.375  0.33  0.33  0.33  0.2    0.2   0.125  0.0
          System 2:           1.0   0.8   0.75  0.6    0.6   0.5   0.4   0.375  0.33  0.2    0.0

          [Figure: 11-point interpolated recall-precision curves for System 1 and System 2]




                   Which system is better?
          Average interpolated precision at each recall level:

          Recall level (%):   0     10    20    30    40    50    60    70     80    90    100
          System 1:           1.0   0.8   0.75  0.6   0.6   0.33  0.2   0.125  0.1   0.0   0.0
          System 2:           0.6   0.5   0.4   0.4   0.4   0.4   0.4   0.375  0.33  0.2   0.0

          [Figure: 11-point interpolated recall-precision curves for System 1 and System 2]




Mean average precision (MAP)
          • Calculate average precision (AP) for each query
               – Calculate precision at each “seen” relevant doc
                    • Not interpolated
                    • For each relevant doc not returned, precision = 0
               – Calculate the average of the precisions at the relevant docs:

                       AP = ( Σ_{i=1..R} i / rank_i ) / R

                 where R = number of relevant docs for that query, rank_i is the
                 rank at which the i-th relevant document was retrieved, and
                 i/rank_i = 0 if relevant document i was not retrieved
          • Calculate the mean of the APs for all the queries
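
A sketch of AP and MAP under these definitions (binary judgments assumed; function names are ours, not from the slides):

```python
# Non-interpolated average precision for one query; relevant documents
# that are never retrieved contribute 0 to the sum.
def average_precision(ranked, relevant):
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank          # precision at this "seen" relevant doc
    return total / len(relevant)

def mean_average_precision(queries):
    # queries: list of (ranked_list, relevant_set) pairs, one per query
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)
```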





                          Mean Average Precision
            Average precision (AP) example: 10 documents retrieved, 9 relevant docs in the collection

            Rank   Judgment       Precision
              1    Not relevant
              2    Relevant       1/2 = 0.5
              3    Not relevant
              4    Not relevant
              5    Relevant       2/5 = 0.4
              6    Not relevant
              7    Not relevant
              8    Relevant       3/8 = 0.375
              9    Not relevant
             10    Relevant       4/10 = 0.4
            Relevant docs not retrieved: precision = 0

            AP = (0.5 + 0.4 + 0.375 + 0.4 + 0 + 0 + 0 + 0 + 0) / 9 = 0.1861

            Mean average precision (MAP)
            • calculated for a batch of queries
            • MAP = ( Σ_{i=1..Q} AP_i ) / Q, where Q = number of queries in the batch
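
Checking the worked example with the `average_precision` sketch from the previous slide (document IDs here are hypothetical; only the ranks of the relevant documents match the table above):

```python
ranked = list(range(1, 11))                     # ten retrieved docs, IDs 1..10
relevant = {2, 5, 8, 10, 11, 12, 13, 14, 15}    # nine relevant docs; five never retrieved
print(round(average_precision(ranked, relevant), 4))   # 0.1861
```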




bpref
          • Based on idea of preference relation
               – Prefer doc A to doc B (RelA > RelB)
          • bpref assumes binary relevance judgments
               – Is a function of the number of times judged non-relevant docs
                 are retrieved before relevant docs
               – Does not assume complete judgments
               – Is more stable than other measures to incomplete relevance
                 judgments (e.g., a very large test collection) and imperfect
                 relevance judgments (e.g., web pages that disappear from the
                 collection)





                                        bpref
             bpref = (1/R) Σ_r ( 1 − |n ranked higher than r| / min(R, N) )

          where
               – R = number of judged relevant documents
               – N = number of judged non-relevant documents
               – r is a relevant retrieved document
               – n is a member of the first R judged non-relevant
                 retrieved documents
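
A hypothetical sketch of this formula; unjudged documents in the ranking are simply skipped here, and relevant documents that are not retrieved contribute nothing to the sum:

```python
def bpref(ranked, judged_rel, judged_nonrel):
    R, N = len(judged_rel), len(judged_nonrel)
    nonrel_seen = 0                        # judged non-relevant docs ranked so far
    score = 0.0
    for doc in ranked:
        if doc in judged_nonrel:
            nonrel_seen += 1
        elif doc in judged_rel:
            n = min(nonrel_seen, R)        # only the first R non-relevant docs count
            score += 1.0 - n / min(R, N)
    return score / R if R else 0.0
```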





Other metrics
          • Calculate precision over the top N documents
               – Precision@10, precision@20, etc.
               – Easy to calculate; interpretation is intuitive
               – Doesn't average well – fails to account for different recall levels
                 (different queries have different numbers of relevant docs)
          • R-precision
               – R is the total number of relevant docs for a query
               – Calculate precision@R for each query and average
                 (see the code sketch after this list)
          • Query histograms
               – Plot performance difference for each query
                    [Figure: histogram of per-query performance difference
                     (y-axis, -0.6 to 1.0) for queries 1 through 8]
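
The precision@N and R-precision measures mentioned above, as a brief sketch (function names are ours, not from the slides):

```python
def precision_at(n, ranked, relevant):
    # fraction of the top n ranked documents that are relevant
    relevant = set(relevant)
    return sum(1 for doc in ranked[:n] if doc in relevant) / n

def r_precision(ranked, relevant):
    # precision at rank R, where R is the number of relevant docs for the query
    return precision_at(len(relevant), ranked, relevant)
```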


