The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Appositions, and Adjectives
Iman Mirrezaei (1), Bruno Martins (2), and Isabel F. Cruz (1)
(1) ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, USA
(2) Instituto Superior Tecnico, Universidade de Lisboa, Portugal
Motivation
• How to extract useful knowledge from textual resources?
• How to identify relations between entities?
Example sentences:
◦ “Microsoft is an American corporation headquartered in Redmond, Washington”
◦ “Michelle Obama (born January 17, 1964), an American lawyer and writer, is the wife of the ...”
Triples
• Each triple represents an atomic fact by stating a subject, a predicate (property), and an object (value)
◦ e.g., “The sky has the color blue.” <the sky; has; the color blue>
• Triples can be expressed in text by verbs or by particular noun phrases
◦ Verb-mediated formats
◦ Noun-mediated formats
• An information extractor converts an input text into a set of triples
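As a minimal sketch of the triple structure described above, the following Python snippet models a triple as a three-field record and prints it in the <subject; predicate; object> notation used on the slides. The class name `Triple` and the field name `obj` are illustrative assumptions, not part of Triplex.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic fact: subject, predicate (property), object (value).

    Hypothetical helper type for illustration only.
    """
    subject: str
    predicate: str
    obj: str

    def __str__(self) -> str:
        # Render in the <subject; predicate; object> notation of the slides.
        return f"<{self.subject}; {self.predicate}; {self.obj}>"


sky = Triple("the sky", "has", "the color blue")
print(sky)
```

An information extractor, in this view, is any function that maps an input text to a collection of such records.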
Information extractors
• Verb-mediated triple extractors
◦ TextRunner [Banko et al. 2007], WOE [Wu and Weld 2010], ReVerb [Fader et al. 2011], and OLLIE [Mausam et al. 2012]
◦ e.g., “Obama will be elected President of the United States” <Obama; will be elected; President of the United States>
• Noun-mediated triple extractors
◦ OLLIE: the first noun-mediated triple extractor
◦ OLLIE has patterns to extract noun-mediated triples if they can also be expressed in a verb-mediated format
◦ e.g., “Microsoft co-founder Bill Gates spoke at the conference” <Bill Gates; be co-founder of; Microsoft>
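Information extractors: feature comparison
[Table: systems compared by the kinds of features they use; the per-cell marks are not recoverable.]
Columns: deep syntactic features; shallow syntactic features; lexical constraints; type constraints (e.g., person, location, …)
Rows: TextRunner, WOE-pos, WOE-parse, ReVerb, OLLIE, Triplex (suggested)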
Noun-mediated triples
• Noun-mediated triples can be expressed through noun phrases with adjectives, compound nouns, and appositions
• How to extract noun-mediated triples that are not expressed via verb-mediated formats?
• How to extract templates automatically from text to generate noun-mediated triples?
Architecture
[Figure: Triplex architecture. Components shown: Text, Sentence Extraction, Sentences, Template Extraction, Stanford NLP Toolkit, WordNet, synonym sets of Wikipedia pages, and infoboxes.]
The bootstrapping process
• A sentence of a Wikipedia page is extracted if it contains an infobox value (object) and a synset member (subject)
◦ The sentence is checked for a dependency path between object and subject (noun, adjective, or apposition dependencies)
◦ Tokens on the dependency path between subject and object are annotated with POS tags, lexical constraints, WordNet synsets, and named entity tags
• Annotated paths are used as extraction templates
• A constraint is placed on the length of the dependency path
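The first bootstrapping step can be sketched as follows. This is a simplified illustration, not the Triplex implementation: it assumes case-insensitive substring matching stands in for the real subject and object detection, and it omits the dependency-path check and the token annotation, which would need a parser such as the Stanford toolkit. The function name `select_candidate_sentences` and the sample data are hypothetical.

```python
def select_candidate_sentences(sentences, synset_members, infobox):
    """Return (sentence, subject, property, object) tuples for bootstrapping.

    A sentence qualifies when it mentions some synset member (subject)
    and some infobox value (object). Substring matching is an assumed
    simplification; the dependency-path check is not performed here.
    """
    candidates = []
    for sentence in sentences:
        lowered = sentence.lower()
        for subject in synset_members:
            if subject.lower() not in lowered:
                continue
            for prop, value in infobox.items():
                if value.lower() in lowered:
                    candidates.append((sentence, subject, prop, value))
    return candidates


# Illustrative inputs modeled on the Microsoft example from the slides.
sentences = [
    "Microsoft is an American corporation headquartered in Redmond, Washington.",
    "The company was founded in 1975.",
]
synset_members = ["Microsoft", "corporation"]
infobox = {"headquarters": "Redmond, Washington"}

candidates = select_candidate_sentences(sentences, synset_members, infobox)
for sentence, subject, prop, value in candidates:
    print(f"{subject} --{prop}--> {value}")
```

Only the first sentence survives, once for each synset member it mentions; the second sentence contains neither a synset member nor an infobox value.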
Example
• Microsoft Corporation is an American multinational software corporation headquartered in Redmond, Washington that develops….
◦ vmod(corporation-8, headquartered-9)
◦ prep(headquartered-9, in-10)
◦ nn(Washington-13, Redmond-11)
Example
Token            Microsoft  is   an  American  corporation  headquartered  in  Redmond  ,  Washington
POS tags         NNP        VBZ  DT  JJ        NN           VBN            IN  NNP      ,  NNP
Named entities   ORG        O    O   MISC      O            O              O   LOC      O  LOC
WordNet synsets  O          O    O   O         ORG          O              O   O        O  O
Occurrences of subject and object: Subject = corporation; Object = Redmond , Washington; all other tokens O
Legend: O: no label; PER: person; NUM: number; ORG: organization
Infobox name: Headquarters
Infobox value: Redmond, Washington
Range of headquarters: Location
Synset member: Corporation
Synset member type: Organization
Lexical constraint: Headquarter in
Graph labels in the figure: Microsoft, corporation (nodes); Coreference, nn, vmod, prep-in (edges)
Templates
• Templates describe how a class of triples is expressed in a sentence
◦ Deep syntactic features: dependencies
◦ Shallow syntactic features: POS tags, noun phrases
◦ Lexical features
◦ Named entity types: WordNet synsets
◦ Property ranges (Person, Organization, Location, or unknown)
Triplex
• Confidence score for triples
◦ A logistic regression classifier
◦ Features: frequency of the extraction templates, existence of lexical words, range of properties, semantic object type
• Template matching
◦ Candidate subjects are recognized by NER types and WordNet synsets
◦ The dependency paths between the subject and all potential objects are annotated
◦ The annotated paths are matched against the templates
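To make the confidence score concrete, here is a minimal sketch of how a logistic regression model turns the four feature groups listed above into a score between 0 and 1. The feature names, weights, and bias are hypothetical placeholders; the slides do not give the learned parameters or the exact feature encoding.

```python
import math

# Hypothetical weights; a real classifier would learn these from labeled triples.
WEIGHTS = {
    "template_frequency": 1.5,   # how often the matching template was seen
    "has_lexical_words": 0.8,    # template carries a lexical constraint
    "range_matches": 1.2,        # object type agrees with the property range
    "object_type_known": 0.6,    # object has a recognized semantic type
}
BIAS = -2.0


def confidence(features):
    """Apply the logistic function to a weighted sum of feature values in [0, 1]."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


score = confidence({
    "template_frequency": 0.9,
    "has_lexical_words": 1.0,
    "range_matches": 1.0,
    "object_type_known": 1.0,
})
print(f"confidence = {score:.3f}")
```

A triple with no supporting evidence falls back to the bias term alone, so it scores low, while each satisfied feature pushes the score toward 1.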
Evaluation
• Automatic evaluation according to the procedure suggested by Bronzi et al. [2012]
◦ 1000 random sentences from Wikipedia
◦ A gold standard is created using PMI, DBpedia, and Freebase
• Manual evaluation
◦ 50 random sentences from Wikipedia
◦ The agreement between the automatic and manual evaluation is about 0.71
The gold standard
• A fact is a triple <subject, property, object>
• All possible entities are recognized by NER types and WordNet synsets
• All verbs (predicates) are detected by Stanford CoreNLP, and predicates are expanded by adding DBpedia and Freebase properties
• All facts extracted from the sentences are verified against
◦ DBpedia
◦ Freebase
Evaluation results
System            Precision (auto)  Recall (auto)  F-measure (auto)  Precision (manual)  Recall (manual)  F-measure (manual)
REVERB            0.61              0.15           0.24              0.55                0.11             0.18
OLLIE             0.64              0.30           0.40              0.65                0.32             0.42
OLLIE*            0.62              0.10           0.17              0.63                0.11             0.18
Triplex           0.55              0.17           0.25              0.62                0.22             0.32
Triplex + OLLIE   0.57              0.40           0.47              0.63                0.44             0.51
Triplex + REVERB  0.58              0.32           0.41              0.55                0.35             0.42
OLLIE* only generates triples according to noun-mediated formats
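The F-measure column can be cross-checked with a short sketch, assuming it is the balanced F1 score (the harmonic mean of precision and recall); the slides do not state which F-measure variant was used. The numbers are the automatic-evaluation figures from the table above.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F1, an assumption here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) from the automatic evaluation.
reported = {
    "REVERB": (0.61, 0.15, 0.24),
    "OLLIE": (0.64, 0.30, 0.40),
    "OLLIE*": (0.62, 0.10, 0.17),
    "Triplex": (0.55, 0.17, 0.25),
    "Triplex + OLLIE": (0.57, 0.40, 0.47),
    "Triplex + REVERB": (0.58, 0.32, 0.41),
}

for system, (p, r, f_reported) in reported.items():
    print(f"{system}: recomputed {f_measure(p, r):.2f}, reported {f_reported:.2f}")
```

Because the table lists precision and recall rounded to two decimals, a recomputed value can differ from the reported one by about 0.01.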
Error analysis: missed extractions
Share  Cause
10%    No semantic types
12%    Dependency parser problems
7%     Coreference errors
6%     Over-generalized templates
65%    Verb-mediated triples (outside the scope of Triplex)
Correctly extracted triples
Type           Distribution  Triple category
Noun-mediated  12%           Conjunctions, adjectives, and noun phrases
Noun-mediated  9%            Apposition and parenthetical phrases
Noun-mediated  6%            Titles or professions
Noun-mediated  8%            Templates with lexicon
Verb-mediated  65%           Verb-mediated triples
Conclusion
• Triplex generates noun-mediated triples from compound nouns, adjectives, and appositions
• Triplex complements the output of verb-mediated triple extractors
• IE systems like Triplex can help authors annotate Wikipedia pages (by recognizing missing infobox values)
Future work
• Improve results for triples involving numerical values with different units (e.g., square meter, meter)
• Enrich the bootstrapping process by using a probabilistic knowledge base (e.g., Probase [Wu et al. 2012])
References
• M. Banko, M.J. Cafarella, S. Soderland, M. Broadhead, O. Etzioni: Open Information Extraction for the Web. In: International Joint Conferences on Artificial Intelligence (IJCAI). pp. 2670–2676 (2007)
• A. Fader, S. Soderland, O. Etzioni: Identifying Relations for Open Information Extraction. In: Conference on Empirical Methods in Natural Language Processing. pp. 1535–1545 (2011)
• Mausam, M. Schmitz, R. Bart, S. Soderland, O. Etzioni: Open Language Learning for Information Extraction. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. pp. 523–534 (2012)
• F. Wu, D.S. Weld: Open Information Extraction Using Wikipedia. In: Annual Meeting of the Association for Computational Linguistics. pp. 118–127 (2010)
• M. Bronzi, Z. Guo, F. Mesquita, D. Barbosa, P. Merialdo: Automatic Evaluation of Relation Extraction Systems on Large-scale. In: Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. pp. 19–24 (2012)
• W. Wu, H. Li, H. Wang, K.Q. Zhu: Probase: A Probabilistic Taxonomy for Text Understanding. In: ACM SIGMOD International Conference on Management of Data. pp. 481–492 (2012)

