1. Relation Extraction from Biological Text
Dialekti Valsamou, Claire Nedellec and the Bibliome Team @ MIG, INRA
dialekti.valsamou@jouy.inra.fr, claire.nedellec@jouy.inra.fr
Introduction
Information Extraction (IE) is the extraction of meaningful structured information from text. It can be divided into three tasks: a) named entity recognition (NER), b) anaphora resolution and c) relation (or event) extraction (RE).
Relation Extraction is the problem of detecting whether a relation holds between entities mentioned in text and classifying that relation. Approaches range from simple pattern matching [3][1] to more sophisticated learning-based methods.
Machine learning has become central to RE: existing methods employ kernel-based algorithms [12][6], (logistic) regression [7][9] or neural networks [2]. The features vary as well; important examples include sequences or subsequences [5][4], syntactic parse trees [8], dependency graphs [6], convolution tree kernels [13] and shallow parsing [12].
An Example: Genic Interactions
In the example above, we are trying to detect a relation of type 'Interaction'. The simplest approach would be a bag of words; a slightly more sophisticated one would look for k-subsequences.
We have adapted two more sophisticated approaches that use syntactic and semantic information.
[Figure panels: Bag-of-words; Subsequences]
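The difference between the two simple representations can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names (`bag_of_words`, `k_subsequences`, `linear_kernel`) are hypothetical, subsequences may contain gaps, and the kernel is a plain dot product of feature counts without the gap-decay weighting that string kernels usually apply.

```python
from collections import Counter
from itertools import combinations

def bag_of_words(tokens):
    """Unordered word counts: word order is discarded."""
    return Counter(tokens)

def k_subsequences(tokens, k):
    """Counts of all length-k subsequences (gaps allowed, order kept)."""
    return Counter(combinations(tokens, k))

def linear_kernel(f1, f2):
    """Dot product of two sparse feature vectors stored as Counters."""
    return sum(count * f2[feat] for feat, count in f1.items())

# Hypothetical toy sentences with the two arguments swapped.
s1 = ["a", "activates", "b"]
s2 = ["b", "activates", "a"]
print(linear_kernel(bag_of_words(s1), bag_of_words(s2)))            # 3
print(linear_kernel(k_subsequences(s1, 2), k_subsequences(s2, 2)))  # 0
```

Swapping the arguments leaves the bag-of-words kernel unchanged (3) but drops the 2-subsequence kernel to 0, which illustrates why order-aware features help for directed relations such as 'Interaction'.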
Performance on the LLL Corpus
The LLL corpus [10] provides a good benchmark for relation extraction methods. Its topic is genic interaction, the same relation type as in the example. We evaluated the two approaches presented here and obtained encouraging results.
The tables below give the F-measure (%) obtained with 10-fold cross-validation.
String Kernel (F-measure, %)
                   Linguistic annotation
Sem. classes       none          auto          manual
none               52.2 ± 3.1    64.4 ± 1.8    69.0 ± 2.3
manual             52.4 ± 3.7    68.4 ± 2.3    75.4 ± 2.6
Global Alignment Kernel (F-measure, %)
                   Linguistic annotation
Sem. classes       auto          manual
none               61.0 ± 4.1    77.0 ± 2.4
manual             59.4 ± 5.4    79.1 ± 2.8
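For reference, the F-measure reported above is the harmonic mean of precision P and recall R, F = 2PR / (P + R). The sketch below computes it over sets of gold and predicted relations and splits n examples into 10 folds. The tuple format and the contiguous fold assignment are assumptions; the exact LLL evaluation protocol may differ.

```python
def precision_recall_f1(gold, predicted):
    """gold, predicted: sets of (arg1, arg2, relation) tuples."""
    tp = len(gold & predicted)                      # correctly predicted relations
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0      # harmonic mean of P and R
    return p, r, f

def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

gold = {("geneA", "geneB", "Interaction"), ("geneC", "geneD", "Interaction")}
pred = {("geneA", "geneB", "Interaction"), ("geneE", "geneF", "Interaction")}
print(precision_recall_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Each fold serves once as the test set while the other nine are used for training; the per-fold F-measures are then aggregated.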
Dependency Graphs
Using the parsing information, we can build a dependency graph of each sentence that contains candidate arguments.
The dependency graph for our example:
Goal: discover a connection between the two
arguments: a path in this graph that connects the
corresponding nodes.
⇒ take the shortest path in the dependency graph and use it as we would use a word sequence.
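A minimal sketch of extracting that path, assuming the parse is given as hypothetical (head, label, dependent) triples. Edges are treated as undirected so the path can climb to a common governor and come back down. Words serve as node identifiers for brevity; a real implementation would use token positions so that repeated words stay distinct. The labels in the example are illustrative, not those of the AlvisNLP parser.

```python
from collections import deque

def shortest_dependency_path(edges, source, target):
    """Breadth-first search over an undirected view of the dependency graph.
    edges: list of (head, label, dependent) triples.
    Returns [token, label, token, ...] or None if target is unreachable."""
    adj = {}
    for head, label, dep in edges:
        adj.setdefault(head, []).append((dep, label))
        adj.setdefault(dep, []).append((head, label))
    prev = {source: None}            # node -> (parent, edge label)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt, label in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, label)
                queue.append(nxt)
    if target not in prev:
        return None
    path, node = [target], target
    while prev[node] is not None:    # walk back from target to source
        parent, label = prev[node]
        path = [parent, label] + path
        node = parent
    return path

# Hypothetical parse of "geneA strongly activates the transcription of geneB".
edges = [("activates", "subj", "geneA"), ("activates", "adv", "strongly"),
         ("activates", "obj", "transcription"), ("transcription", "det", "the"),
         ("transcription", "of", "geneB")]
print(shortest_dependency_path(edges, "geneA", "geneB"))
```

The resulting path skips the modifiers "strongly" and "the", which is what makes it a compact input for a sequence kernel.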
Dependency Graph Kernel Learning
Features: the shortest path between the arguments in the dependency graph of each sentence.
Algorithm: a Support Vector Machine or any other
kernel method
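The sketch below plugs dependency paths into a kernel method. To keep it dependency-free, a kernel perceptron stands in for the SVM (both need only the kernel function). The kernel simply counts shared length-2 subsequences of two paths; the ARG1/ARG2 placeholders and the toy training paths are hypothetical.

```python
from collections import Counter
from itertools import combinations

def subseq_kernel(p, q, k=2):
    """Number of shared length-k subsequences (with multiplicity) of two paths."""
    fp, fq = Counter(combinations(p, k)), Counter(combinations(q, k))
    return sum(c * fq[s] for s, c in fp.items())

class KernelPerceptron:
    """Dual-form perceptron: like an SVM, it only accesses data through the kernel."""
    def __init__(self, kernel, epochs=10):
        self.kernel, self.epochs = kernel, epochs

    def fit(self, X, y):  # y in {-1, +1}
        self.X, self.y = X, y
        self.alpha = [0] * len(X)
        gram = [[self.kernel(a, b) for b in X] for a in X]
        for _ in range(self.epochs):
            for i in range(len(X)):
                score = sum(self.alpha[j] * y[j] * gram[j][i] for j in range(len(X)))
                if y[i] * score <= 0:      # mistake: strengthen this example
                    self.alpha[i] += 1
        return self

    def predict(self, x):
        score = sum(a * yj * self.kernel(xj, x)
                    for a, yj, xj in zip(self.alpha, self.y, self.X))
        return 1 if score > 0 else -1

# Hypothetical shortest paths with arguments replaced by placeholders.
X = [["ARG1", "subj", "activates", "obj", "ARG2"],
     ["ARG1", "subj", "inhibits", "obj", "ARG2"],
     ["ARG1", "conj", "ARG2"],
     ["ARG1", "appos", "protein", "conj", "ARG2"]]
y = [1, 1, -1, -1]
model = KernelPerceptron(subseq_kernel).fit(X, y)
print(model.predict(["ARG1", "subj", "regulates", "obj", "ARG2"]))  # 1
```

Swapping in an SVM would change only the learner; the path representation and the kernel function stay the same.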
Global Alignment Kernel
Idea: use the “edit distance” between two sentences as a kernel function. How?
⇒ Find the global alignment between them.
Similarity score: the optimal alignment score, given a substitution function and a gap penalty.
Substitution cost: minimum (zero) if the elements belong to the same semantic class (e.g. activate/control), medium if they share the same POS tag, and high otherwise.
Gap penalty: lower values were empirically shown to produce better results.
Algorithm: a Support Vector Machine or any other
kernel method
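A sketch of the alignment score as a Needleman–Wunsch-style dynamic program that minimises total cost. The poster only specifies zero, medium and high substitution costs; the values 0, 1 and 2, the gap penalty of 0.5, the rule that identical words cost zero, and the `sem_class` and `pos` dictionaries are all assumptions. Turning the cost into a kernel value (for instance via exp(-cost)) is a further design choice not shown here, and such a transformation is not guaranteed to yield a positive semi-definite kernel.

```python
# Assumed cost levels; the poster only says "zero / medium / high".
SAME_CLASS, SAME_POS, OTHER = 0.0, 1.0, 2.0

def substitution_cost(a, b, sem_class, pos):
    if a == b:                                       # assumption: identical words are free
        return SAME_CLASS
    if sem_class.get(a) is not None and sem_class.get(a) == sem_class.get(b):
        return SAME_CLASS
    if pos.get(a) is not None and pos.get(a) == pos.get(b):
        return SAME_POS
    return OTHER

def global_alignment_cost(s, t, sem_class, pos, gap=0.5):
    """Minimum total cost of a global alignment of token sequences s and t."""
    n, m = len(s), len(t)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap                            # s prefix aligned to gaps
    for j in range(1, m + 1):
        D[0][j] = j * gap                            # t prefix aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1], sem_class, pos),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    return D[n][m]

# Hypothetical lexicons.
sem_class = {"activates": "positive", "controls": "positive"}
pos = {"geneA": "NN", "geneB": "NN", "activates": "VBZ", "controls": "VBZ", "inhibits": "VBZ"}
s = ["geneA", "activates", "geneB"]
print(global_alignment_cost(s, ["geneA", "controls", "geneB"], sem_class, pos))  # 0.0
```

With these assumed values, `activates` aligns with `controls` at no cost because they share a semantic class, whereas `inhibits` only shares their POS tag.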
What Information to Use?
In recent years, the linguistic analysis tools at our disposal have become increasingly efficient, so deeper analyses can be used to obtain better results. We call the information added to the original text an annotation; it can be obtained either manually or, ideally, automatically.
The levels are:
• Lexical (with possible lemmatisation), e.g. bag-of-words, word n-grams, etc.
• Morpho-syntactic, e.g. part-of-speech (POS) tagging
• Parsing, e.g. dependency or constituency graphs (paths, trees, etc.)
• Semantic, e.g. the use of semantic classes
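One way to picture these levels is a token record carrying one field per level, from which a kernel can read the sequence it needs. The sketch below is an illustration under our own assumptions: the `Token` fields, the `view` function, the example tags and the class name `positive_regulation` are hypothetical and do not reflect the AlvisNLP data model.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Token:
    form: str                        # lexical: surface word
    lemma: str                       # lexical: lemmatised form
    pos: str                         # morpho-syntactic: POS tag
    head: Optional[int]              # parsing: index of the governor (None = root)
    deprel: Optional[str]            # parsing: dependency label
    sem_class: Optional[str] = None  # semantic: class label, if any

def view(sentence: List[Token], level: str) -> List[str]:
    """Project a sentence onto one annotation level, giving a token sequence."""
    if level == "form":
        return [t.form for t in sentence]
    if level == "lemma":
        return [t.lemma for t in sentence]
    if level == "pos":
        return [t.pos for t in sentence]
    if level == "semantic":
        # Fall back to the lemma for words without a semantic class.
        return [t.sem_class or t.lemma for t in sentence]
    raise ValueError(f"unknown annotation level: {level}")

sentence = [
    Token("GeneA", "genea", "NN", 1, "subj"),
    Token("activates", "activate", "VBZ", None, "root", "positive_regulation"),
    Token("geneB", "geneb", "NN", 1, "obj"),
]
print(view(sentence, "semantic"))  # ['genea', 'positive_regulation', 'geneb']
```

The same kernel can then be run on different views of the same sentence, which mirrors how the tables above vary the annotation levels while keeping the learning algorithm fixed.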
For both algorithms presented in this poster, performance improved considerably when syntactic and/or semantic information was used (e.g. from 52.2 to 75.4 F-measure for the string kernel). This was made possible by the AlvisNLP pipeline developed by our lab.
Distributional semantics: unsupervised learning of semantically close words from entire document collections in order to form semantic classes.
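One simple way to realise this idea is sketched below under our own assumptions. Each word is represented by counts of the words within a ±2-token window; words are compared with cosine similarity; and a word is greedily added to the first class whose first member is similar enough. The window size, the threshold and the clustering strategy are illustrative choices, not taken from the poster.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- window positions."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(c * v[k] for k, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def greedy_classes(vecs, words, threshold=0.5):
    """Join the first class whose seed word is similar enough, else start a new class."""
    classes = []
    for w in words:
        for cls in classes:
            if cosine(vecs[w], vecs[cls[0]]) >= threshold:
                cls.append(w)
                break
        else:
            classes.append([w])
    return classes

# Hypothetical toy corpus.
corpus = [s.split() for s in ["geneA activates geneB", "geneA controls geneB",
                              "geneC activates geneD", "geneC controls geneD"]]
vecs = context_vectors(corpus)
print(greedy_classes(vecs, ["activates", "controls", "geneA"]))
# [['activates', 'controls'], ['geneA']]
```

Words that occur in identical contexts, like `activates` and `controls` here, end up in the same class without any manual labelling.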
String Kernel Comparison
[Figure: precision/recall graph for a simple string kernel (bag of words) and for its shortest-path-on-dependency-graph version.]
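A precision/recall graph like this one is typically obtained by ranking candidate pairs by classifier score and lowering the decision threshold one example at a time. The sketch below assumes scores and 0/1 gold labels are available; the function name `pr_curve` and the toy values are hypothetical.

```python
def pr_curve(scores, labels):
    """Precision/recall after each example, ranking by decreasing score.
    scores: classifier decision values; labels: 1 = true relation, 0 = not.
    Ties keep input order (sorted() is stable)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp = [], 0
    for rank, (_, label) in enumerate(ranked, start=1):
        tp += label
        points.append((tp / rank, tp / total_pos))  # (precision, recall)
    return points

print(pr_curve([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Plotting these points for two kernels on the same data gives a comparison of the kind shown in the figure.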
References
[1] E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. 2000.
[2] T. Barnickel, J. Weston, R. Collobert, H. Mewes, and V. Stümpflen. Large scale application of neural network based semantic role labeling for automated relation extraction from biomedical texts. 2009.
[3] S. Brin. Extracting patterns and relations from the world wide web. 1999.
[4] R. Bunescu and R. Mooney. Subsequence kernels for relation extraction. 2006.
[5] A. Culotta, A. McCallum, and J. Betz. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. 2006.
[6] A. Culotta and J. Sorensen. Dependency tree kernels for relation extraction. 2004.
[7] N. Kambhatla. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. 2004.
[8] Y. Liu, Z. Shi, and A. Sarkar. Exploiting rich syntactic information for relation extraction from biomedical articles. 2007.
[9] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. 2009.
[10] C. Nédellec. Learning Language in Logic - Genic Interaction Extraction Challenge. 2005.
[11] S. Riedel, L. Yao, and A. McCallum. Collective cross-document relation extraction without labelled data. 2010.
[12] D. Zelenko, C. Aone, and A. Richardella. Kernel methods for relation extraction. 2003.
[13] M. Zhang, J. Zhang, and J. Su. Exploring syntactic features for relation extraction using a convolution tree kernel. 2006.
Future Work
This work is being continuously improved in all respects (linguistic parsing, semantic classes, etc.). We are also addressing the fact that, for biological data,
• it is hard to engage experts in the tedious and time-consuming task of manual annotation,
• but structured databases are abundant.
Distant supervision: project structured relation data onto text documents in order to produce positive and negative examples [11].
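The basic labelling step of distant supervision can be sketched as follows. This is a simplified illustration under our own assumptions, not the method of [11]: sentences come with pre-recognised entity lists, the database is a hypothetical set of directed (agent, target) pairs, and every ordered entity pair in a sentence is labelled positive if and only if the database contains it. The resulting labels are noisy, since a sentence may mention a known pair without expressing the relation, and a true interaction missing from the database becomes a false negative.

```python
from itertools import permutations

def distant_label(sentences, kb_pairs):
    """sentences: list of (text, entities) with entities already recognised.
    kb_pairs: set of (agent, target) pairs known to interact.
    Returns (text, agent, target, label) for every ordered co-occurring pair."""
    examples = []
    for text, entities in sentences:
        for a, b in permutations(entities, 2):   # direction matters: (a, b) != (b, a)
            label = 1 if (a, b) in kb_pairs else 0
            examples.append((text, a, b, label))
    return examples

# Hypothetical database entry and sentences.
kb = {("geneA", "geneB")}
sents = [("geneA activates geneB", ["geneA", "geneB"]),
         ("geneC binds geneA", ["geneC", "geneA"])]
for ex in distant_label(sents, kb):
    print(ex)
```

Examples labelled this way are the kind of pre-annotations that experts could then confirm or reject.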
⇒ Pre-annotate examples for the experts to confirm, creating larger datasets that allow for generalization.