Human Machine Cooperation:
 User Corrections for AKBC
 Michael Wick, Karl Schultz, Andrew McCallum
     University of Massachusetts, Amherst.
Motivation
• KBs for real-world decision making
• Problem: data needs integration
 • AKBC/IE is scalable, but inaccurate
 • Humans are more accurate, but lack coverage
• Question: how do we combine human
  and machine KBC?
Goal: build a database of every scientist in the world.
Knowledge Base Construction
 [Pipeline diagram: text documents (.pdf, .bib, .html) → Entity Extraction → entity mentions → Relation Extraction → relation mentions → Resolution (Coref) → entities and relations → KB (structured data, the “truth”); a query against the KB returns an answer.]
 Example:
  Entity mentions: Wei Li, W. Li, Xinghua U.
  Relation mention: Attends(Wei Li, Xinghua U.)
  Entities, relations: Wei Li, W. Li, Xinghua U.
 Problem:
 (1) errors snowball in IE pipeline
 (2) errors persist in DB - forever
KB Coreference Errors
 First: Fernando                  First: Fernando
 Last: Pereira                    Last: Pereira
 Institution: Google, UPenn       Institution: U. Edinburgh, SRI
 Topics: CRF, IE, NLP             Topics: logic programming, AI, urban traffic modeling, NLP
 Venues: ICML, NIPS, EMNLP        Venues: Logic programming
 id=5                             id=3

 Coref? NO
 Features:
 1. Institution overlap... NO
 2. Venue overlap... NO
 3. Topic overlap... LOW

 id=1
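The overlap features on this slide can be sketched in a few lines. This is only an illustration: the slides do not say how overlap is measured, so the Jaccard measure, the `AuthorRecord` class, and the names `record_a` and `record_b` are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """One KB record with the fields shown on the slide (hypothetical class)."""
    first: str
    last: str
    institutions: set = field(default_factory=set)
    topics: set = field(default_factory=set)
    venues: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_features(x, y):
    """Compute the three overlap features listed on the slide."""
    return {
        "institution": jaccard(x.institutions, y.institutions),
        "venue": jaccard(x.venues, y.venues),
        "topic": jaccard(x.topics, y.topics),
    }


# The two records from the slide.
record_a = AuthorRecord(
    "Fernando", "Pereira",
    institutions={"Google", "UPenn"},
    topics={"CRF", "IE", "NLP"},
    venues={"ICML", "NIPS", "EMNLP"},
)
record_b = AuthorRecord(
    "Fernando", "Pereira",
    institutions={"U. Edinburgh", "SRI"},
    topics={"logic programming", "AI", "urban traffic modeling", "NLP"},
    venues={"Logic programming"},
)

features = overlap_features(record_a, record_b)
print(features)
```

Under this assumed measure, institution and venue overlap are zero and topic overlap is 1/6 (only NLP is shared), which matches the slide's NO / NO / LOW.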
Human Edits to Coreference
 “Fernando Pereira with id=5 is Fernando Pereira with id=3”
 “Fernando Pereira with id=2 is Fernando Pereira with id=1”
 “Fernando Pereira with id=5 is Fernando Pereira with id=4”
How should these edits be managed?
Edits to Coreference
KB with coref errors
Stream of user edits
 good edit (must-link)      bad edit (must-link)
Incorporate edits: how do we resolve conflicts?
Strategy 1: Most recent edit gets priority
 Edit 1: good edit (must-link)      Edit 2: bad edit (must-link)
 Edit order: 2 then 1
 Edit order: 1 then 2
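A minimal sketch of why edit order matters under this strategy. The slide figures are not recoverable, so the conflict model here is an assumption: each edit assigns the same hypothetical mention to an entity, and a later edit overwrites an earlier one. The names `apply_most_recent`, `mention_m`, `correct_entity`, and `wrong_entity` are invented for illustration.

```python
def apply_most_recent(edits):
    """Apply edits in order; a later edit to the same mention overwrites an earlier one."""
    assignment = {}
    for mention, entity in edits:
        assignment[mention] = entity
    return assignment


# Hypothetical conflicting edits about the same mention.
good_edit = ("mention_m", "correct_entity")  # Edit 1
bad_edit = ("mention_m", "wrong_entity")     # Edit 2

order_2_then_1 = apply_most_recent([bad_edit, good_edit])
order_1_then_2 = apply_most_recent([good_edit, bad_edit])

print(order_2_then_1)
print(order_1_then_2)
```

In this sketch the final state depends only on which edit arrived last, not on which edit is correct.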
Strategy 2: Deterministic integration of edits
 Edit 1: good edit (must-link)      Edit 2: bad edit (must-link)
 Enforce transitivity
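Enforcing transitivity over must-links can be sketched with a union-find structure. This is an illustrative assumption, not the deck's implementation; the mention names `a1`, `a2`, and `b1` are hypothetical, with `a1` and `a2` standing for the same real person and `b1` for a different one.

```python
class UnionFind:
    """Disjoint sets with path halving (illustrative helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def clusters_after(must_links, nodes):
    """Treat every must-link as a hard constraint and take the transitive closure."""
    uf = UnionFind()
    for node in nodes:
        uf.find(node)
    for a, b in must_links:
        uf.union(a, b)
    groups = {}
    for node in nodes:
        groups.setdefault(uf.find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


nodes = ["a1", "a2", "b1"]
good_edit = ("a1", "a2")  # correct must-link
bad_edit = ("a2", "b1")   # incorrect must-link

only_good = clusters_after([good_edit], nodes)
good_and_bad = clusters_after([good_edit, bad_edit], nodes)
print(only_good)
print(good_and_bad)
```

With both edits applied as hard constraints, the single bad must-link pulls `b1` into the same cluster as `a1` and `a2`, regardless of edit order.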
How should edits be managed?
• User modification of “the truth” is risky
 • Humans disagree
 • Humans make mistakes
 • “Truth” changes over time
• Our approach:
 • edits as statistical evidence
 • “truth” inferred from evidence
What is the truth?
 • The truth: inferred by MCMC
 • IE models (e.g., CRFs)
 • Evidence:
  • Structured data (e.g., ACM, DBLP)
  • User edits
  • Unstructured data (e.g., PDFs)
Human Edits as Evidence
 “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”

  Name: Fernando Pereira                     Name: Fernando Pereira
  Institution: Google          must-link     Institution: U. Edinburgh



 “The CRF Fernando Pereira is the Prolog Fernando Pereira”
 Name: Fernando Pereira                      Name: Fernando Pereira
 Topics: CRF                  must-link      Topics: Prolog



 “The NLP Fernando Pereira is the MPEG Fernando Pereira”
 Name: Fernando Pereira                      Name: Fernando Pereira
 Topics: NLP                 must-link       Topics: MPEG
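One way to picture an edit as evidence is as a pair of partial mentions joined by a must-link. The sketch below is an assumption about representation only: `EditMention`, `edit_to_mentions`, and the `link_id` field are hypothetical names, not taken from the talk.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class EditMention:
    """A partial mention created from a user edit (hypothetical structure)."""
    name: str
    attribute: str  # e.g. "Institution" or "Topics"
    value: str
    link_id: int    # mentions sharing a link_id are joined by a must-link


_link_ids = count(1)


def edit_to_mentions(name, attribute, value_a, value_b):
    """Turn one edit into two mentions that carry the same must-link id."""
    link_id = next(_link_ids)
    return (
        EditMention(name, attribute, value_a, link_id),
        EditMention(name, attribute, value_b, link_id),
    )


# The three edits from the slide.
edits = [
    edit_to_mentions("Fernando Pereira", "Institution", "Google", "U. Edinburgh"),
    edit_to_mentions("Fernando Pereira", "Topics", "CRF", "Prolog"),
    edit_to_mentions("Fernando Pereira", "Topics", "NLP", "MPEG"),
]

for left, right in edits:
    print(left.value, "must-link", right.value)
```

Because each edit is stored as ordinary mentions rather than applied directly to the KB, it can later be weighed against the other evidence.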
Human Edits: Mentions Added to DB
 First: Fernando                  First: Fernando
 Last: Pereira                    Last: Pereira
 Institution: Google, UPenn       Institution: U. Edinburgh, SRI
 Topics: CRF, IE, NLP             Topics: logic programming, AI, urban traffic modeling, NLP
 Venues: ICML, NIPS, EMNLP        Venues: Logic programming

 Name: Fernando Pereira           Name: Fernando Pereira
 Institution: Google              Institution: U. Edinburgh

 Name: Fernando Pereira           Name: Fernando Pereira
 Topics: CRF                      Topics: Prolog

 Name: Fernando Pereira           Name: Fernando Pereira
 Topics: NLP                      Topics: MPEG
Human Edits: Perform Coreference
 First: Fernando                  First: Fernando
 Last: Pereira                    Last: Pereira
 Institution: Google, UPenn       Institution: U. Edinburgh, SRI
 Topics: CRF, IE, NLP             Topics: logic programming, AI, urban traffic modeling, NLP
 Venues: ICML, NIPS, EMNLP        Venues: Logic programming

 Name: Fernando Pereira           Name: Fernando Pereira
 Institution: Google              Institution: U. Edinburgh
 Name: Fernando Pereira           Name: Fernando Pereira
 Topics: CRF                      Topics: Prolog
 Name: Fernando Pereira
 Topics: NLP
                                  Name: Fernando Pereira
                                  Topics: MPEG
Human Edits: Perform Coreference
 First: Fernando                  First: Fernando
 Last: Pereira                    Last: Pereira
 Institution: Google, UPenn       Institution: U. Edinburgh, SRI
 Topics: CRF, IE, NLP             Topics: logic programming, AI, urban traffic modeling, NLP
 Venues: ICML, NIPS, EMNLP        Venues: Logic programming

 Coref? YES
 Features:
 1. Institution overlap... NO
 2. Venue overlap... NO
 3. Topic overlap... LOW
 4. Should-link... YES
Incorrect edit
 First: Fernando                  First: Fernando
 Last: Pereira                    Last: Pereira
 Institution: Google, UPenn       Institution: Superior Tecnic
 Topics: CRF, IE, NLP             Topics: MPEG
 Venues: ICML, NIPS, EMNLP        Venues: ICIP

 Coref? NO
 Features:
 1. Institution overlap... NO
 2. Venue overlap... NO
 3. Topic overlap... NO
 4. Should-link... YES
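The effect of treating should-link as one feature among several can be sketched with a simple linear score. This is a deliberate simplification: the talk's model is a hierarchical CRF with MCMC inference, whereas the pairwise scorer, the Jaccard overlap, and the hand-picked weights and bias below are all assumptions chosen only to illustrate the idea.

```python
def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical, hand-picked weights (not learned, not from the talk).
WEIGHTS = {"institution": 4.0, "venue": 4.0, "topic": 6.0, "should_link": 1.5}
BIAS = -2.0


def coref_score(x, y, should_link):
    """Linear score over the four slide features; positive means coreferent."""
    features = {
        "institution": jaccard(x["institutions"], y["institutions"]),
        "venue": jaccard(x["venues"], y["venues"]),
        "topic": jaccard(x["topics"], y["topics"]),
        "should_link": 1.0 if should_link else 0.0,
    }
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def is_coref(x, y, should_link):
    return coref_score(x, y, should_link) > 0.0


# The three records from the slides.
nlp_pereira = {
    "institutions": {"Google", "UPenn"},
    "topics": {"CRF", "IE", "NLP"},
    "venues": {"ICML", "NIPS", "EMNLP"},
}
logic_pereira = {
    "institutions": {"U. Edinburgh", "SRI"},
    "topics": {"logic programming", "AI", "urban traffic modeling", "NLP"},
    "venues": {"Logic programming"},
}
mpeg_pereira = {
    "institutions": {"Superior Tecnic"},
    "topics": {"MPEG"},
    "venues": {"ICIP"},
}

print(is_coref(nlp_pereira, logic_pereira, should_link=False))
print(is_coref(nlp_pereira, logic_pereira, should_link=True))
print(is_coref(nlp_pereira, mpeg_pereira, should_link=True))
```

With these assumed weights, the pair with low topic overlap is rejected without an edit and accepted with one, while the pair with no overlap at all stays rejected even though a user asked for the link. The edit shifts the decision but does not dictate it.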
Experiments
1. Build initial KB with automatic coreference
2. Simulate user edits: good edits (must-link) and bad edits (must-link)
3. Apply edits: our probabilistic approach vs. two deterministic approaches
Hierarchical + Human Edits
 Better incorporation of correct human edits
 [Figure: Database quality versus the number of correct human edits. x-axis: No. of human edits (0 to 30); y-axis: F1 accuracy (0.55 to 0.80). Legend (edit incorporation strategy): Epistemological (probabilistic), Overwrite, Maximally satisfy. Curve annotations: “Our probabilistic reasoning”, “Local satisfaction”, “Traditional Overwrite”.]
Hierarchical + Human Edits
 More robust to incorrect human edits
 [Figure: Database quality versus the number of errorful human edits. x-axis ticks: 0 to 60; y-axis: Precision (0.4 to 0.8). Legend (edit incorporation strategy): Epistemological (probabilistic), Complete trust in users. Curve annotations: “Our probabilistic reasoning”, “Complete trust in humans”.]
Come see our poster!
• Technical details including
 - Hierarchical CRF for coreference
 - MCMC for inference
• Probabilistic incorporation of human edits
• Epistemological Databases

                 THANK YOU

Michael Wick - Human Machine Cooperation: User Corrections for AKBC

  • 1. Human Machine Cooperation: User Corrections for AKBC Michael Wick, Karl Schultz, Andrew McCallum University of Massachusetts, Amherst.
  • 2. Motivation • KBs for real-world decision making • Problem: data needs integration • AKBC/IE is scalable, but inaccurate • Humans are more accurate, lack coverage • Question: how do we combine human and machine KBC?
  • 3. Goal: build a database of every scientist in the world.
  • 4. Knowledge Base Construction. Pipeline: text docs (.pdf, .bib, .html) → Entity Extraction → entity mentions (Wei Li, W. Li, Xinghua U.) → Relation Extraction → relation mentions (Attends(Wei Li, Xinghua U.)) → Resolution (Coref) → entities and relations (Wei Li, W. Li, Xinghua U.) → KB of structured data (the “truth”); a query to the KB returns an answer. Problem: (1) errors snowball in IE pipeline (2) errors persist in DB - forever
  • 5. KB Coreference Errors. Record id=5: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP]. Record id=3: [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. id=1
  • 6. KB Coreference Errors. Record id=5: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP]. Record id=3: [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Coref? id=1
  • 7. KB Coreference Errors. Record id=5: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP]. Record id=3: [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Coref? Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW. id=1
  • 8. KB Coreference Errors. Record id=5: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP]. Record id=3: [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Coref? NO. Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW. id=1
  • 9. Human Edits to Coreference “Fernando Pereira with id=5 is Fernando Pereira with id=3”
  • 10. Human Edits to Coreference “Fernando Pereira with id=5 is Fernando Pereira with id=3” “Fernando Pereira with id=2 is Fernando Pereira with id=1”
  • 11. Human Edits to Coreference “Fernando Pereira with id=5 is Fernando Pereira with id=3” “Fernando Pereira with id=2 is Fernando Pereira with id=1” “Fernando Pereira with id=5 is Fernando Pereira with id=4”
  • 12. How should these edits be managed?
  • 13. Edits to Coreference KB with coref errors
  • 14. Edits to Coreference KB with coref errors
  • 15. Edits to Coreference KB with coref errors Stream of user edits good edit bad edit must-link must-link
  • 16. Edits to Coreference KB with coref errors Stream of user edits good edit bad edit must-link must-link Incorporate edits: how do we resolve conflicts?
  • 17. Strategy 1: Most recent edit gets priority Edit 1: good edit Edit 2: bad edit must-link must-link Edit order: 2 then 1
  • 18. Strategy 1: Most recent edit gets priority Edit 1: good edit Edit 2: bad edit must-link must-link Edit order: 2 then 1
  • 19. Strategy 1: Most recent edit gets priority Edit 1: good edit Edit 2: bad edit must-link must-link Edit order: 2 then 1
  • 20. Strategy 1: Most recent edit gets priority Edit 1: good edit Edit 2: bad edit must-link must-link Edit order: 1 then 2
  • 21. Strategy 1: Most recent edit gets priority Edit 1: good edit Edit 2: bad edit must-link must-link Edit order: 1 then 2
  • 22. Strategy 1: Most recent edit gets priority Edit 1: good edit Edit 2: bad edit must-link must-link Edit order: 1 then 2
  • 23. Strategy 2: Deterministic integration of edits Edit 1: good edit Edit 2: bad edit must-link must-link Enforce transitivity
  • 24. Strategy 2: Deterministic integration of edits Edit 1: good edit Edit 2: bad edit must-link must-link Enforce transitivity
  • 25. How should edits be managed?
  • 26. How should edits be managed? • User modification of “the truth” is risky • Humans disagree • Humans make mistakes • “Truth” changes over time
  • 27. How should edits be managed? • User modification of “the truth” is risky • Humans disagree • Humans make mistakes • “Truth” changes over time • Our approach: • edits as statistical evidence • “truth” inferred from evidence
  • 28. What is the truth?
  • 29. What is the truth? The truth. Evidence.
  • 30. What is the truth? The truth. Evidence: unstructured data (e.g., PDFs)
  • 31. What is the truth? The truth. Evidence: structured data (e.g., ACM, DBLP); unstructured data (e.g., PDFs)
  • 32. What is the truth? The truth. Evidence: structured data (e.g., ACM, DBLP); user edits; unstructured data (e.g., PDFs)
  • 33. What is the truth? The truth: inferred by MCMC. IE models (e.g., CRFs). Evidence: structured data (e.g., ACM, DBLP); user edits; unstructured data (e.g., PDFs)
  • 34. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”
  • 35. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” “The CRF Fernando Pereira is the Prolog Fernando Pereira”
  • 36. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh” “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  • 37. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”: [Name: Fernando Pereira; Institution: Google]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  • 38. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”: [Name: Fernando Pereira; Institution: Google], [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  • 39. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”: [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira” “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  • 40. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”: [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira”: [Name: Fernando Pereira; Topics: CRF] must-link [Name: Fernando Pereira; Topics: Prolog]. “The NLP Fernando Pereira is the MPEG Fernando Pereira”
  • 41. Human Edits as Evidence “The Fernando Pereira at Google is the Fernando Pereira at U. Edinburgh”: [Name: Fernando Pereira; Institution: Google] must-link [Name: Fernando Pereira; Institution: U. Edinburgh]. “The CRF Fernando Pereira is the Prolog Fernando Pereira”: [Name: Fernando Pereira; Topics: CRF] must-link [Name: Fernando Pereira; Topics: Prolog]. “The NLP Fernando Pereira is the MPEG Fernando Pereira”: [Name: Fernando Pereira; Topics: NLP] must-link [Name: Fernando Pereira; Topics: MPEG]
  • 42. Human Edits: Mentions Added to DB. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Edit mentions: [Name: Fernando Pereira; Institution: Google], [Name: Fernando Pereira; Institution: U. Edinburgh], [Name: Fernando Pereira; Topics: CRF], [Name: Fernando Pereira; Topics: Prolog], [Name: Fernando Pereira; Topics: NLP], [Name: Fernando Pereira; Topics: MPEG]
  • 43. Human Edits: Perform Coreference. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Edit mentions: [Name: Fernando Pereira; Institution: Google], [Name: Fernando Pereira; Institution: U. Edinburgh], [Name: Fernando Pereira; Topics: CRF], [Name: Fernando Pereira; Topics: Prolog], [Name: Fernando Pereira; Topics: NLP], [Name: Fernando Pereira; Topics: MPEG]
  • 44. Human Edits: Perform Coreference. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]
  • 45. Human Edits: Perform Coreference. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Coref? Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW 4. Should-link... YES
  • 46. Human Edits: Perform Coreference. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: U. Edinburgh, SRI; Topics: logic programming, AI, urban traffic modeling, NLP; Venues: Logic programming]. Coref? YES. Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... LOW 4. Should-link... YES
  • 47. Incorrect edit. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP]
  • 48. Incorrect edit. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP]. Coref?
  • 49. Incorrect edit. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP]. Coref? Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... NO 4. Should-link... YES
  • 50. Incorrect edit. Records: [First: Fernando; Last: Pereira; Institution: Google, UPenn; Topics: CRF, IE, NLP; Venues: ICML, NIPS, EMNLP] and [First: Fernando; Last: Pereira; Institution: Superior Tecnic; Topics: MPEG; Venues: ICIP]. Coref? NO. Features: 1. Institution overlap... NO 2. Venue overlap... NO 3. Topic overlap... NO 4. Should-link... YES
  • 51. Experiments 1. Build initial KB with automatic coreference
  • 52. Experiments 1. Build initial KB with automatic coreference
  • 53. Experiments 1. Build initial KB with automatic coreference 2. Simulate user edits good edit bad edit must-link must-link
  • 54. Experiments 1. Build initial KB with automatic coreference 2. Simulate user edits good edit bad edit must-link must-link 3. Apply edits: our probabilistic vs two deterministic approaches
  • 55. Hierarchical + Human Edits: better incorporation of correct human edits. [Plot: database quality (F1 accuracy, axis 0.55 to 0.80) versus the number of correct human edits (0 to 30), by edit incorporation strategy. Labels: our probabilistic, epistemological (probabilistic) reasoning; maximally satisfy; local satisfaction; traditional overwrite.]
  • 56. Hierarchical + Human Edits: more robust to incorrect human edits. [Plot: database quality (precision, axis 0.4 to 0.8) versus the number of errorful human edits (0 to 60), by edit incorporation strategy. Labels: our probabilistic, epistemological (probabilistic) reasoning; complete trust in users/humans.]
  • 58. Come see our poster! • Technical details including - Hierarchical CRF for coreference - MCMC for inference • Probabilistic incorporation of human edits • Epistemological Databases THANK YOU
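
The "Coref?" slides treat a user's must-link edit as one more feature (should-link) next to institution, venue, and topic overlap, so an edit can tip a borderline decision without overriding strong contrary evidence. A minimal sketch of that idea follows. It is a plain pairwise log-linear scorer, not the hierarchical CRF with MCMC inference named in the talk, and the `Record` type, the Jaccard overlap measure, the weights, and the bias are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    institutions: frozenset
    topics: frozenset
    venues: frozenset


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights: a should-link edit counts as evidence,
# but is deliberately too weak to win on its own.
WEIGHTS = {"institution": 3.0, "venue": 3.0, "topic": 6.0, "should_link": 1.5}
BIAS = -2.0


def coref_score(r1, r2, should_link):
    """Linear score over overlap features plus the user-edit feature."""
    features = {
        "institution": jaccard(r1.institutions, r2.institutions),
        "venue": jaccard(r1.venues, r2.venues),
        "topic": jaccard(r1.topics, r2.topics),
        "should_link": 1.0 if should_link else 0.0,
    }
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def is_coref(r1, r2, should_link=False):
    return coref_score(r1, r2, should_link) > 0.0


google = Record(
    frozenset({"Google", "UPenn"}),
    frozenset({"CRF", "IE", "NLP"}),
    frozenset({"ICML", "NIPS", "EMNLP"}),
)
edinburgh = Record(
    frozenset({"U. Edinburgh", "SRI"}),
    frozenset({"logic programming", "AI", "urban traffic modeling", "NLP"}),
    frozenset({"Logic programming"}),
)
mpeg = Record(
    frozenset({"Superior Tecnic"}),
    frozenset({"MPEG"}),
    frozenset({"ICIP"}),
)

print(is_coref(google, edinburgh))                  # low topic overlap only
print(is_coref(google, edinburgh, should_link=True))  # edit tips the decision
print(is_coref(google, mpeg, should_link=True))       # edit alone is not enough
```

With these illustrative weights the Google and U. Edinburgh records merge only once the should-link feature fires, while the MPEG record stays separate even with a should-link edit, matching the YES and NO outcomes on slides 46 and 50.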

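For contrast, Strategy 2 (deterministic integration with enforced transitivity) can be sketched with a union-find structure. This is one possible reading of the slide, not the authors' implementation, and the record identifiers are hypothetical. It shows how a single bad must-link is propagated by transitivity into an unrelated entity.

```python
def find(parent, x):
    """Return the cluster representative of x, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def apply_must_links(records, must_links):
    """Merge records under hard must-link constraints (transitive closure)."""
    parent = {r: r for r in records}
    for a, b in must_links:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    clusters = {}
    for r in records:
        clusters.setdefault(find(parent, r), set()).add(r)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical record identifiers for three "Fernando Pereira" records.
records = ["pereira_edinburgh", "pereira_google", "pereira_mpeg"]
good_edit = ("pereira_google", "pereira_edinburgh")
bad_edit = ("pereira_google", "pereira_mpeg")

merged = apply_must_links(records, [good_edit, bad_edit])
print(merged)
```

Because every must-link is treated as a hard constraint, the bad edit places all three records in one cluster regardless of edit order, which is the failure mode that motivates treating edits as evidence instead.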
Editor's notes

  106. Reminder of an epistemological database: streaming evidence is stored, truth is inferred. “Coref is the foundation for everything.” “Coref everywhere.” I will speak today about our work on scaling coreference to large scales.