Relationship Extraction from Text

   Extending the Espresso Method for Greater Recall


                  Derek Springer
         UCLA Computer Science Department
                November 19, 2009
Related Works

• Ganapathi, Swathi. “Relationship Extraction from Text:
  Comparison and Experimental Evaluation of the State-of-
  the-Art.” UCLA comp exam. March 2009.
• Chu, A., Sakurai, S., Cárdenas, A. F., "Automatic Detection
  of Treatment Relationships in Patent Retrieval." 2008 CIKM
  Patent Information Retrieval Workshop. October 2008.
Related Works, cont'd

• Girju, R. "Automatic Detection of Causal Relations for
  Question Answering." In the proceedings of the 41st Annual
  Meeting of the Association for Computational Linguistics
  (ACL 2003). Workshop on "Multilingual Summarization and
  Question Answering - Machine Learning and Beyond".
  2003.
• Pantel, Patrick and Pennacchiotti, Marco. "Espresso:
  Leveraging Generic Patterns for Automatically Harvesting
  Semantic Relations." In Proceedings of Conference on
  Computational Linguistics / Association for Computational
  Linguistics (COLING/ACL- 06). pp. 113-120.
  Sydney, Australia. 2006.
Relationship Extraction

• The task of recognizing the assertion of a
  particular relationship between two or more
  entities in text.
• Can aid in the development of
  standalone, intelligent, automated and adaptable
  user-specific content retrieval systems.
• We focus on extracting treatment relationships
  → A (subject) used to treat B (object).
Goals and Contributions

• Extended state-of-the-art Espresso relationship
  extraction system originally implemented by
  Ganapathi.
• Did an in-depth experimental evaluation of the
  developed system while comparing it to prior
  work (Chu, Ganapathi).
• Future goal is to use the system developed here
  as a plug-in relationship feature extractor for
  iScore.
Integration Into iScore

• iScore presents additional articles based on an
  aggregate score of “interestingness.”
• We believe filtering articles based on
  relationships can improve the results of iScore.
• We hypothesize that extending the Espresso
  system implemented by Swathi Ganapathi will
  improve the ability of a system such as iScore to
  utilize relationship extraction as a feature.
Comparison Criteria

• Performance: Want system to have high
  precision and recall
• Minimal Supervision: Want system to require
  little to no human supervision
• Breadth: Want system to extract relations from
  varying corpus sizes, domains and formats.
• Generality: Want system to extract wide variety
  of relation types without losing its edge in any of
  the above criteria.
The Espresso Algorithm

• General purpose algorithm which can be used to
  extract a wide variety of binary relations.
• Requires minimal supervision. Only input is a
  small seed set of known relations.
• By looking at individual sentences in detecting
  relationships, works well on all kinds of corpora.
• On tests conducted by the creators of the
  algorithm, Espresso generated balanced
  precision and recall.
The Espresso Method
Extending Espresso

Ganapathi's Implementation            37.8%

Extension                             91.2%
Ganapathi's Implementation

• Ganapathi's approach uses lexico-syntactic
  patterns of the form NP1 VP NP2 (Verb category
  in Table 1).
• VP contains treatment verb or pattern and the
  two NPs would contain the subject and object.
• This structure is very common, accounting for
  37.8% of all relationships.
Extension

• A large number of relationships remain outside
  this pattern and may provide fruitful results.
• The extension expands the implementation to include:
  - Noun+Prep, e.g. "X settlement with Y"
  - Verb+Prep, e.g. "X moved to Y"
  - Infinitive, e.g. "X plans to acquire Y"
  - Modifier, e.g. "X is Y winner"
• The extended pattern set retrieves 91.2% of
  common relationships.
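The pattern categories above can be sketched as templates over part-of-speech tags. This is a minimal illustration, not the system's actual matcher: the coarse tag set (X, Y, V, N, P, TO), the lookup table, the function name and the "X treats Y" example for the Verb category are all assumptions made for this sketch.

```python
# Coarse, hypothetical tag set: X/Y = entity slots, V = verb, N = noun,
# P = preposition, TO = infinitival "to". Not the tagger used in the talk.
CATEGORY_BY_TAGS = {
    ("X", "V", "Y"): "Verb",
    ("X", "N", "P", "Y"): "Noun+Prep",
    ("X", "V", "P", "Y"): "Verb+Prep",
    ("X", "V", "TO", "V", "Y"): "Infinitive",
    ("X", "V", "Y", "N"): "Modifier",
}

# Per the slides, the baseline only handles the NP1 VP NP2 (Verb) form.
BASELINE_CATEGORIES = {"Verb"}


def categorize(tagged_tokens):
    """Return the pattern category for a list of (token, tag) pairs, or None."""
    tags = tuple(tag for _, tag in tagged_tokens)
    return CATEGORY_BY_TAGS.get(tags)


# Examples from the slide, hand-tagged for illustration
# ("X treats Y" is an added, hypothetical Verb example).
examples = {
    "X treats Y": [("X", "X"), ("treats", "V"), ("Y", "Y")],
    "X settlement with Y": [("X", "X"), ("settlement", "N"), ("with", "P"), ("Y", "Y")],
    "X moved to Y": [("X", "X"), ("moved", "V"), ("to", "P"), ("Y", "Y")],
    "X plans to acquire Y": [("X", "X"), ("plans", "V"), ("to", "TO"), ("acquire", "V"), ("Y", "Y")],
    "X is Y winner": [("X", "X"), ("is", "V"), ("Y", "Y"), ("winner", "N")],
}

for text, tagged in examples.items():
    category = categorize(tagged)
    scope = "baseline" if category in BASELINE_CATEGORIES else "extension only"
    print(f"{text!r}: {category} ({scope})")
```

A real implementation would match these templates inside longer tagged sentences rather than against whole token lists.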
Test Corpora

• Patent Corpus: Developed by Shige
   o 50,000 drug patent documents from 2008 from Class 424 & 514 of
     the U.S. Patents Classification: “drug, bio-affecting and body
     treating compositions” and their subclasses.
   o Patents were pre-filtered to those containing the keywords
     “diabetes”, “metastatic”, “cancer”, “tuberculosis”, “lung”, “bronchitis”,
     “coronary artery”
   o All sentences from each document added to a sentence table in the
     schema
• PubMed Corpus: Developed by Gustavo
   o Comprised of medical abstracts from PubMed
   o Each abstract was parsed and all sentences from each abstract
     were stored as individual tuples in the sentence table
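Loading documents into a sentence table might look like the sketch below. The slides do not give the schema, database engine or sentence splitter, so the SQLite table layout, column names, the naive regex splitter and the sample abstract text are all hypothetical.

```python
import re
import sqlite3


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def load_documents(conn, documents):
    """Store every sentence of every document as its own row."""
    # Hypothetical schema: one tuple per sentence, keyed by document and position.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sentence "
        "(doc_id TEXT, position INTEGER, text TEXT)"
    )
    for doc_id, text in documents.items():
        rows = [(doc_id, i, s) for i, s in enumerate(split_sentences(text))]
        conn.executemany("INSERT INTO sentence VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
# Made-up abstract text, for illustration only.
load_documents(
    conn,
    {"abstract-1": "Ibuprofen is used to treat headache. It also reduces fever."},
)
sentence_count = conn.execute("SELECT COUNT(*) FROM sentence").fetchone()[0]
print(sentence_count)
```

Biomedical text has many abbreviations and decimal numbers, so a production loader would likely need a more careful sentence splitter than this regex.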
Performance Measures
Seed Treatment Relationships

•   (Xanax, Anxiety)           •   (Glycoside, Depression)
•   (Ambien, Insomnia)         •   (Ibuprofen, Arthritis)
•   (Effexor, Depression)      •   (Ibuprofen, Headache)
•   (Paxil, Depression)        •   (Tylenol, Fever)
•   (Lexapro, Depression)      •   (Tylenol, Headache)
•   (Caffeine, Depression)     •   (Antibody, Inflammation)
•   (Zoloft, Depression)       •   (Ibuprofen, Inflammation)
•   (Imipramine, Depression)   •   (Surgery, Glaucoma)
Procedure

1. Re-tag the original data set to incorporate the
   extended relationship types.
2. Re-run Ganapathi's baseline Espresso
   implementation and compare against the updated
   data set.
3. Run the extended Espresso implementation and
   compare against the updated data set.
Experiment #1: Extraction on Drug Patent Corpus
• Drug Patent corpus used.
• Algorithm was run with seed relations and 12 verbs were extracted as
  being relevant (verbs with rπ greater than 0.2).
• These treatment verbs were used to create a test sentence set of 120
  sentences, i.e. 10 sentences for each relevant treatment verb.
• 358 candidate relations were extracted, and we calculated the ri score
  for each.
• 208 relations had an ri score greater than the threshold, of which 126
  were actually correct (through manual tagging).
• Of the original 358 relations, manual tagging determined that 213
  were correct treatment relations.
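The counts above are enough to work out precision, recall and F-score by hand. The sketch below uses the standard definitions and assumes the 213 manually confirmed relations form the recall denominator. The resulting figures are derived here from the slide's counts, not copied from the results slide.

```python
def precision_recall_f1(extracted, correct_extracted, total_correct):
    """Standard precision, recall and balanced F-score from raw counts."""
    precision = correct_extracted / extracted
    recall = correct_extracted / total_correct
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from Experiment #1: 208 relations above the threshold, 126 of them
# correct, and 213 correct relations overall (assumed recall denominator).
p, r, f = precision_recall_f1(extracted=208, correct_extracted=126, total_correct=213)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
# precision=0.606 recall=0.592 F1=0.599
```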
Experiment #1 Results
Experiment #2: Number of Relationships and Performance
• Drug Patent corpus used.
• Test the performance of the system under
  smaller and larger data loads.
• Started with an initial set of 120 sentences obtained
  from the Drug Patent corpus (10 sentences for each
  verb, 12 verbs as in Experiment #1).
• Increased the number of sentences for each
  verb by 10 in each case, giving sentence sets
  of 240 and 360 sentences.
Experiment #2 Results
Experiment #2 Analysis

• Performance of the system and the number of
  relationships are inversely related.
• ri scores are inversely affected by the max pmi across
  all relationship instances, so having more relationship
  instances in a set may lower the ri for all of those
  relationships.
• More relationships => chance of a greater max pmi =>
  lowered ri for all relationship instances.
• Not a concern → articles likely won't have 200 relations of
  the same type.
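The effect described above can be reproduced with a small sketch of the instance reliability score as defined in the Espresso paper, where each pmi value is divided by the maximum pmi before being weighted by pattern reliability. The pattern strings, pattern reliabilities and pmi values below are made up for illustration; only the relation pairs come from the seed list.

```python
def instance_reliability(pmi_by_pattern, pattern_reliability, max_pmi):
    """ri for one instance: mean over all patterns of (pmi / max_pmi) * r_pi."""
    total = sum(
        (pmi / max_pmi) * pattern_reliability[pattern]
        for pattern, pmi in pmi_by_pattern.items()
    )
    # Patterns that never co-occur with the instance contribute zero.
    return total / len(pattern_reliability)


def score_all(instances, pattern_reliability):
    """Score every instance, normalizing by the max pmi over the whole set."""
    max_pmi = max(pmi for pmis in instances.values() for pmi in pmis.values())
    return {
        instance: instance_reliability(pmis, pattern_reliability, max_pmi)
        for instance, pmis in instances.items()
    }


# Hypothetical pattern reliabilities and pmi values.
pattern_reliability = {"treats": 0.8, "used to treat": 0.6}
small_set = {
    ("Xanax", "Anxiety"): {"treats": 2.0, "used to treat": 1.0},
    ("Ambien", "Insomnia"): {"treats": 1.5},
}
# Same instances plus one more whose pmi raises the maximum.
large_set = dict(small_set)
large_set[("Tylenol", "Fever")] = {"treats": 4.0}

small_scores = score_all(small_set, pattern_reliability)
large_scores = score_all(large_set, pattern_reliability)
print(small_scores[("Xanax", "Anxiety")], large_scores[("Xanax", "Anxiety")])
```

With these numbers the (Xanax, Anxiety) score halves once the extra instance doubles the maximum pmi, even though nothing about that relation changed.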
Experiment #3: Extraction on PubMed Corpus
• PubMed corpus used.
• Want to test the performance of the system on a corpus
  of a different type and size.
• Algorithm was run with input seed relations on this corpus
  and the 10 verbs with the topmost rπ values were extracted.
• We constructed a test sentence set of 80 sentences (8
  sentences for every relevant verb).
• We then extracted a total of 162 relations from this test set
  and calculated their ri scores.
• The average ri score was used as the threshold value.
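Thresholding on the average score can be sketched as below. The function name and the example scores are hypothetical. The strict "greater than" comparison follows the wording used for Experiment #1 and is assumed to apply here as well.

```python
from statistics import mean


def filter_by_mean_threshold(scores):
    """Keep relations whose ri score is strictly above the mean ri score."""
    threshold = mean(scores.values())
    kept = {relation: s for relation, s in scores.items() if s > threshold}
    return threshold, kept


# Made-up ri scores; the relation pairs are borrowed from the seed list.
example_scores = {
    ("Tylenol", "Fever"): 1.0,
    ("Ibuprofen", "Headache"): 0.75,
    ("Caffeine", "Depression"): 0.5,
    ("Surgery", "Glaucoma"): 0.25,
}
threshold, kept = filter_by_mean_threshold(example_scores)
print(threshold, sorted(kept))
```

Because the threshold is relative to the extracted set, a few very high scores raise the bar for every other candidate.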
Experiment #3 Results
Comparison Over Both Corpora
Experiment #3 Analysis

• Performance is worse on PubMed corpus.
• Patent corpus dealt with drugs and cures for diseases.
• Therefore, there was an abundance of treatment type
  relations in patent corpus.
• PubMed had more general medical data and only
  contained abstracts => less info.
• Therefore, there were fewer treatment relations in
  PubMed which affected performance.
Comparison with Previous Work




               * signifies our contribution
Analysis

• F-score of Ganapathi's version of Espresso fell
  nearly 10% on the re-tagged data → due to lower
  recall, as predicted.
• Results of the extension over the re-tagged data are
  on par with Ganapathi's original results.
• Given that Ganapathi's system dropped nearly 10%
  on the same data, this suggests the extension is
  more general-purpose than the original version.
Success

• Recall of system is more important than
  precision, especially when it comes to using
  relationships as a feature in iScore.
• Method is almost completely automated.
• Easily expanded to extract other relationship types by
  changing the input seed relations.
• Initial results seem insignificant, but analysis indicates
  that extended system has the potential to be a general-
  purpose relationship extraction feature.
Future Work

• Development of a relationship feature extractor
  for iScore.
• Relations will have to be syntactically and
  semantically compared with relations present in
  other articles and the best article matches will be
  returned as “interesting” choices for a user.
• Optimizations: algorithm design
  improvements, database connection
  optimizations and parallelization.

Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Extending the Espresso Method for Greater Recall

  • 1. Relationship Extraction from Text Extending the Espresso Method for Greater Recall Derek Springer UCLA Computer Science Department November 19, 2009
  • 2. Related Works • Ganapathi, Swathi. “Relationship Extraction from Text: Comparison and Experimental Evaluation of the State-of-the-Art.” UCLA comp exam. March 2009. • Chu, A., Sakurai, S., Cárdenas, A. F., "Automatic Detection of Treatment Relationships in Patent Retrieval." 2008 CIKM Patent Information Retrieval Workshop. October 2008.
  • 3. Related Works, cont'd • Girju, R. "Automatic Detection of Causal Relations for Question Answering." In the proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL 2003). Workshop on "Multilingual Summarization and Question Answering - Machine Learning and Beyond". 2003. • Pantel, Patrick and Pennacchiotti, Marco. "Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations." In Proceedings of Conference on Computational Linguistics / Association for Computational Linguistics (COLING/ACL-06). pp. 113-120. Sydney, Australia. 2006.
  • 4. Relationship Extraction • The task of recognizing the assertion of a particular relationship between two or more entities in text. • Can aid in the development of standalone, intelligent, automated and adaptable user-specific content retrieval systems. • We focus on extracting treatment relationships → A (subject) used to treat B (object).
  • 5. Goals and Contributions • Extended state-of-the-art Espresso relationship extraction system originally implemented by Ganapathi. • Did an in-depth experimental evaluation of the developed system while comparing it to prior work (Chu, Ganapathi). • Future goal is to use the system developed here as a plug for relationship feature extractor in iScore.
  • 6. Integration Into iScore • iScore presents additional articles based on an aggregate score of “interestingness.” • We believe filtering articles based on relationships can improve the results of iScore. • We hypothesize that extending the Espresso system implemented by Swathi Ganapathi will improve the ability of a system such as iScore to utilize relationship extraction as a feature.
  • 7. Comparison Criteria • Performance: Want system to have high precision and recall • Minimal Supervision: Want system to require little to no human supervision • Breadth: Want system to extract relations from varying corpus sizes, domains and formats. • Generality: Want system to extract wide variety of relation types without losing its edge in any of the above criteria.
  • 8. The Espresso Algorithm • General purpose algorithm which can be used to extract a wide variety of binary relations. • Requires minimal supervision. Only input is a small seed set of known relations. • By looking at individual sentences in detecting relationships, works well on all kinds of corpora. • On tests conducted by the creators of the algorithm, Espresso generated balanced precision and recall.
  • 10. Extending Espresso • [Figure: share of common relationships covered. Ganapathi's implementation: 37.8%. Extension: 91.2%.]
  • 11. Ganapathi's Implementation • Ganapathi's approach uses lexico-syntactic patterns of the form NP1 VP NP2 (Verb category in Table 1). • VP contains treatment verb or pattern and the two NPs would contain the subject and object. • This structure is a very common relationship, accounting for 37.8% of all relationships.
  • 12. Extension • There still remains a large number of relationships that may provide fruitful results. • Expanding the implementation to include: - Noun+Prep e.g. "X settlement with Y" - Verb+Prep e.g. "X moved to Y" - Infinitive e.g. "X plans to acquire Y" and - Modifier e.g. "X is Y winner" relationship • Retrieves 91.2% of common relationships.
  • 13. Test Corpora • Patent Corpus: Developed by Shige o 50,000 drug patent documents from 2008 from Class 424 & 514 of the U.S. Patents Classification: “drug, bio-affecting and body treating compositions” and their subclasses. o Patents were pre-filtered to only contain keywords “diabetes”, “metastatic”, “cancer”, “tuberculosis”, “lung”, “bronchitis”, “coronary artery” o All sentences from each document added to a sentence table in the schema • PubMed Corpus: Developed by Gustavo o Comprised of medical abstracts from PubMed o Each abstract was parsed and all sentences from each abstract were stored as individual tuples in the sentence table
  • 15. Seed Treatment Relationships • (Xanax, Anxiety) • (Glycoside, Depression) • (Ambien, Insomnia) • (Ibuprofen, Arthritis) • (Effexor, Depression) • (Ibuprofen, Headache) • (Paxil, Depression) • (Tylenol, Fever) • (Lexapro, Depression) • (Tylenol, Headache) • (Caffeine, Depression) • (Antibody, Inflammation) • (Zoloft, Depression) • (Ibuprofen, Inflammation) • (Imipramine, Depression) • (Surgery, Glaucoma)
  • 16. Procedure 1. Re-tag original data set to incorporate extended relationship types. 2. Re-run Ganapathi's baseline Espresso implementation to compare against updated data set. 3. Run extended Espresso implementation to compare against updated data set.
  • 17. Experiment #1: Extraction on Drug Patent Corpus • Drug Patent corpus used. • Algorithm was run with seed relations and 12 verbs were extracted as being relevant (verbs with rπ greater than 0.2). • These treatment verbs were used to create a test sentence set of 120 sentences i.e. 10 sentences containing a treatment verb for every relevant treatment verb. • 358 possible relations were extracted for each of which we calculated the ri score. • 208 relations were obtained with ri score greater than the threshold out of which 126 were actually correct (through manual tagging). • Of the original 358 relations, manual tagging determined that 213 of them were correct treatment relations.
  • 19. Experiment #2: Number of Relationships and Performance • Drug Patent corpus used. • Test the performance of the system under smaller and larger data loads. • Started with initial set of 120 sentences obtained from Drug Patent corpus (10 sentences for each verb, 12 verbs as in test #1) • Increased the number of sentences for each verb by 10 in each case, so that we had sentence sets of 240 and 360 sentences each
  • 21. Experiment #2 Analysis • Performance of the system and the number of relationships are inversely related. • Because ri scores are inversely affected by the max pmi across all relationship instances, having more relationship instances in a set may lower the ri for all of those relationships. • More relationships => chance of a greater max pmi => lowered ri for all relationship instances. • Not worried → articles likely won't have 200 relations of the same type.
  • 22. Experiment #3: Extraction on PubMed Corpus • PubMed corpus used. • Want to test the performance of the system on a corpus of a different type and size. • Algorithm was run with input seed relations on this corpus and the 10 verbs with the topmost rπ values were extracted. • We constructed a test sentence set of 80 sentences (8 sentences for every relevant verb). • We then extracted a total of 162 relations from this test set and calculated their ri scores. • The average ri score was used as the threshold value.
  • 25. Experiment #3 Analysis • Performance is worse on PubMed corpus. • Patent corpus dealt with drugs and cures for diseases. • Therefore, there was an abundance of treatment type relations in patent corpus. • PubMed had more general medical data and only contained abstracts => less info. • Therefore, there were fewer treatment relations in PubMed which affected performance.
  • 26. Comparison with Previous Work * signifies our contribution
  • 27. Analysis • F-score of Ganapathi's version of Espresso fell nearly 10% → due to lower recall, as predicted. • Results of extension over the re-tagged data are on par with Ganapathi's original results. • When you consider that Ganapathi's system dropped nearly 10%, it seems to indicate the increased general purpose nature of the extension over the original version.
  • 28. Success • Recall of system is more important than precision, especially when it comes to using relationships as a feature in iScore. • Method is almost completely automated. • Easily expanded to extract other relationship types by changing the input seed relations. • Initial results seem insignificant, but analysis indicates that extended system has the potential to be a general-purpose relationship extraction feature.
  • 29. Future Work • Development of a relationship feature extractor for iScore. • Relations will have to be syntactically and semantically compared with relations present in other articles and the best article matches will be returned as “interesting” choices for a user. • Optimizations: algorithm design improvements, database connection optimizations and parallelization.
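
The counts on the Experiment #1 slide (208 relations above the ri threshold, 126 of them correct, 213 correct relations among all 358 candidates) are enough to recompute precision, recall and F-score. The sketch below assumes the standard definitions and assumes that the 126 correct extractions are a subset of the 213 manually tagged correct relations. The function and variable names are illustrative, and the printed figures are a recomputation from those counts, not values quoted from the results table, which did not survive in this transcript.

```python
def precision_recall_f1(extracted, correct_extracted, total_correct):
    """Standard precision, recall and balanced F-score from raw counts.

    extracted         -- relations the system returned (ri above threshold)
    correct_extracted -- how many of those were judged correct
    total_correct     -- all correct relations present in the test set
    """
    precision = correct_extracted / extracted
    recall = correct_extracted / total_correct
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts taken from the Experiment #1 slide (Drug Patent corpus).
p, r, f = precision_recall_f1(extracted=208, correct_extracted=126, total_correct=213)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
# prints: precision=0.606 recall=0.592 F1=0.599
```

The same function can be applied to any of the other experiments once the corresponding extracted and correct counts are known.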