The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Appositions, and Adjectives
1. The Triplex Approach for Recognizing Semantic Relations from Noun Phrases, Appositions, and Adjectives
Iman Mirrezaei (1), Bruno Martins (2), and Isabel F. Cruz (1)
(1) ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, USA
(2) Instituto Superior Tecnico, Universidade de Lisboa, Portugal
2. Motivation
How to extract useful knowledge from textual resources?
How to identify relations between entities?
Microsoft is an American corporation headquartered in Redmond, Washington
Michelle Obama (born January 17, 1964), an American lawyer and writer, is the wife of the ...
3. Triples
Each triple represents an atomic fact by stating a subject, a predicate (property), and an object (value)
◦ e.g., “The sky has the color blue.” → <the sky; has; the color blue>
Triples can be expressed by verbs, or by particular noun phrases, in textual resources
◦ Verb-mediated formats
◦ Noun-mediated formats
An information extractor converts an input text into a set of triples
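The triple format above can be made concrete with a minimal data structure. The `Triple` class below is an illustrative sketch, not part of any of the systems discussed:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    """An atomic fact: <subject; predicate; object>."""
    subject: str
    predicate: str
    object: str

    def __str__(self) -> str:
        return f"<{self.subject}; {self.predicate}; {self.object}>"

# The slide's example sentence expressed as a triple.
fact = Triple("the sky", "has", "the color blue")
print(fact)  # <the sky; has; the color blue>
```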
4. Information extractors
Verb-mediated triple extractors
◦ TextRunner [Banko et al. 2007], WOE [Wu and Weld 2010], ReVerb [Fader et al. 2011], and OLLIE [Mausam et al. 2012]
◦ e.g., “Obama will be elected President of the United States” → <Obama; will be elected; President of the United States>
Noun-mediated triple extractors
◦ OLLIE: the first noun-mediated triple extractor
◦ OLLIE has patterns to extract noun-mediated triples if they can also be expressed in a verb-mediated format
◦ e.g., “Microsoft co-founder Bill Gates spoke at the conference” → <Bill Gates; be co-founder of; Microsoft>
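The verb-mediated idea can be illustrated with a toy extractor. This is not the actual TextRunner/ReVerb/OLLIE machinery; it is a minimal sketch, assuming coarsely pre-tagged input, that reads a leading noun phrase as subject, the following verb chain as predicate, and the next noun phrase as object. Real systems also attach prepositional material (e.g., “President of the United States”), which this sketch truncates:

```python
# Toy verb-mediated extraction over (token, coarse tag) pairs,
# with tags in {'N': noun phrase, 'V': verb chain, 'O': other}.
# The function name and tag scheme are illustrative assumptions.
from typing import List, Tuple

def extract_verb_mediated(tagged: List[Tuple[str, str]]) -> Tuple[str, str, str]:
    def span_end(kind: str, start: int) -> int:
        # Advance while tokens keep the requested coarse tag.
        i = start
        while i < len(tagged) and tagged[i][1] == kind:
            i += 1
        return i

    subj_end = span_end('N', 0)          # leading noun phrase = subject
    pred_end = span_end('V', subj_end)   # verb chain = predicate
    obj_end = span_end('N', pred_end)    # trailing noun phrase = object
    join = lambda a, b: " ".join(tok for tok, _ in tagged[a:b])
    return (join(0, subj_end), join(subj_end, pred_end), join(pred_end, obj_end))

tagged = [("Obama", "N"), ("will", "V"), ("be", "V"), ("elected", "V"),
          ("President", "N")]
print(extract_verb_mediated(tagged))  # ('Obama', 'will be elected', 'President')
```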
6. Noun-mediated triples
Noun-mediated triples can be expressed through noun phrases with adjectives, compound nouns, and appositions
How to extract noun-mediated triples that are not expressed via verb-mediated formats?
How to extract templates automatically from text to generate noun-mediated triples?
8. The bootstrapping process
A sentence of a wiki page is extracted if it contains an infobox value (object) and a synset member (subject)
◦ The sentence is kept if there is a dependency path between the object and the subject (noun, adjective, or apposition dependencies)
◦ Tokens in the dependency path between the subject and the object are annotated with POS tags, lexical constraints, WordNet synsets, and named entity tags
Annotated paths are used as extraction templates
A constraint is imposed on the length of the dependency path
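The path-finding step above can be sketched as a breadth-first search over the dependency graph, assuming the sentence has already been parsed. The edge list below hand-codes part of the parse from the Microsoft example on the next slide (Stanford-style labels); the function name and `max_len` cutoff are illustrative assumptions:

```python
# Find the dependency path between the synset member (subject) and the
# infobox value (object); discard it if it exceeds the length constraint.
from collections import deque

edges = [("corporation", "vmod", "headquartered"),
         ("headquartered", "prep_in", "Washington"),
         ("Washington", "nn", "Redmond")]

def dependency_path(start, goal, edges, max_len=4):
    """BFS over the (undirected) dependency graph; returns the label
    sequence from start to goal, or None if none short enough exists."""
    adj = {}
    for head, label, dep in edges:
        adj.setdefault(head, []).append((label, dep))
        adj.setdefault(dep, []).append((label, head))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path if len(path) <= max_len else None
        for label, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label]))
    return None

print(dependency_path("corporation", "Redmond", edges))
```

With the default cutoff this finds the three-edge path `vmod, prep_in, nn`; tightening `max_len` to 2 rejects it, mirroring the length constraint on templates.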
9. Example
Microsoft Corporation is an American multinational software corporation headquartered in Redmond, Washington that develops….
◦ vmod(corporation-8, headquartered-9)
◦ prep(headquartered-9, in-10)
◦ nn(Washington-13, Redmond-11)
10. Example
“Microsoft is an American corporation headquartered in Redmond, Washington”

Token:           Microsoft  is   an  American  corporation  headquartered  in  Redmond  ,  Washington
POS tags:        NNP        VBZ  DT  JJ        NN           VBN            IN  NNP      ,  NNP
Named entities:  ORG        O    O   MISC      O            O              O   LOC      O  LOC
WordNet synsets: O          O    O   O         ORG          O              O   O        O  O

Occurrences of subject and object: subject = “corporation”, object = “Redmond, Washington”
Dependencies: vmod(corporation, headquartered), prep-in, nn(Washington, Redmond); “Microsoft” corefers with “corporation”
Infobox name: Headquarters; Infobox value: Redmond, Washington
Range of headquarters: Location
Synset member: Corporation; Synset member type: Organization
Lexical constraint: Headquarter in
(O: no label; PER: person; NUM: number; ORG: organization; LOC: location)
11. Templates
Templates capture how a class of triples is expressed in a sentence.
◦ Deep syntactic features: dependencies
◦ Shallow syntactic features: POS tags, noun phrases
◦ Lexical features
◦ Named entity types and WordNet synsets
◦ Property ranges (Person, Organization, Location, or unknown)
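One way to picture a learned template is as a bundle of the features listed above. The `Template` class and `matches` function below are illustrative assumptions about the representation, not Triplex's actual internals; the example values come from the Microsoft sentence:

```python
# Sketch of a template as a feature bundle, and of matching a candidate
# extraction against it. Field names are hypothetical.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Template:
    dep_path: Tuple[str, ...]     # deep syntactic feature (dependency labels)
    subject_type: str             # named entity type / WordNet synset
    property_range: str           # Person, Organization, Location, or unknown
    lexical_constraint: str = ""  # e.g. "headquartered in"

def matches(template, dep_path, subject_type, object_type, words):
    return (template.dep_path == tuple(dep_path)
            and template.subject_type == subject_type
            and template.property_range in (object_type, "unknown")
            and (not template.lexical_constraint
                 or template.lexical_constraint in words))

hq = Template(("vmod", "prep_in", "nn"), "Organization", "Location",
              "headquartered in")
candidate = dict(
    dep_path=["vmod", "prep_in", "nn"],
    subject_type="Organization",
    object_type="Location",
    words="Microsoft is an American corporation headquartered in Redmond, Washington")
print(matches(hq, **candidate))  # True
```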
12. Triplex
Confidence score for triples
◦ A logistic regression classifier
◦ Features: frequency of the extraction template, existence of lexical words, range of properties, semantic object type
Template matching
◦ Candidate subjects are recognized by NER types and WordNet synsets
◦ The dependency paths between the subject and all potential objects are annotated
◦ Annotated paths are matched against the templates
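The confidence score can be sketched as a hand-rolled logistic regression over the features listed above. The weights and bias here are made-up numbers for illustration; in Triplex they would be learned from training data:

```python
# Logistic regression sketch: map triple features to a confidence in (0, 1).
import math

def confidence(template_frequency, has_lexical_words, range_matches,
               object_type_matches):
    features = [math.log1p(template_frequency),   # damp raw frequency counts
                float(has_lexical_words),
                float(range_matches),
                float(object_type_matches)]
    weights = [0.6, 0.8, 1.2, 0.9]  # illustrative values, not learned
    bias = -2.0
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# A frequent template with all supporting evidence scores high...
high = confidence(50, True, True, True)
# ...while a rare template with no supporting evidence scores low.
low = confidence(1, False, False, False)
print(round(high, 2), round(low, 2))
```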
13. Evaluation
Automatic evaluation according to the procedure suggested by Bronzi et al. [2012]
◦ 1000 random sentences from Wikipedia
◦ A gold standard is created using PMI, DBpedia, and Freebase
Manual evaluation
◦ 50 random sentences from Wikipedia
◦ The agreement between the automatic and manual evaluation is about 0.71
14. The gold standard
A fact is a triple <subject, property, object>
All possible entities are recognized by NER types and WordNet synsets
All verbs (predicates) are detected by Stanford CoreNLP, and the set of predicates is expanded with DBpedia and Freebase properties
All facts extracted from the sentences are verified against
◦ DBpedia
◦ Freebase
16. Error analysis
Missed extractions
◦ 10% No semantic types
◦ 12% Dependency parser problems
◦ 7% Coreference errors
◦ 6% Over-generalized templates
◦ 65% Verb-mediated triples (outside the scope of Triplex)
17. Correctly extracted triples
Distribution by triple category:
Noun-mediated
◦ 12% Conjunctions, adjectives, and noun phrases
◦ 9% Appositions and parenthetical phrases
◦ 6% Titles or professions
◦ 8% Templates with lexicon
Verb-mediated
◦ 65% Verb-mediated triples
18. Conclusion
Triplex generates noun-mediated triples from compound nouns, adjectives, and appositions
Triplex complements the output of verb-mediated triple extractors
IE systems like Triplex can assist authors in annotating Wikipedia pages (recognizing missing infobox values)
19. Future work
Improve results for triples involving numerical values with different units (e.g., square meters, meters)
Enrich the bootstrapping process by using a probabilistic knowledge base (e.g., Probase [Wu et al. 2012])
20. References
M. Banko, M.J. Cafarella, S. Soderland, M. Broadhead, O. Etzioni: Open Information Extraction from the Web. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 2670–2676 (2007)
A. Fader, S. Soderland, O. Etzioni: Identifying Relations for Open Information Extraction. In: Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1535–1545 (2011)
Mausam, M. Schmitz, R. Bart, S. Soderland, O. Etzioni: Open Language Learning for Information Extraction. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 523–534 (2012)
F. Wu, D.S. Weld: Open Information Extraction Using Wikipedia. In: Annual Meeting of the Association for Computational Linguistics (ACL), pp. 118–127 (2010)
M. Bronzi, Z. Guo, F. Mesquita, D. Barbosa, P. Merialdo: Automatic Evaluation of Relation Extraction Systems on Large-scale. In: Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX), pp. 19–24 (2012)
W. Wu, H. Li, H. Wang, K.Q. Zhu: Probase: A Probabilistic Taxonomy for Text Understanding. In: ACM SIGMOD International Conference on Management of Data, pp. 481–492 (2012)