Relation Extraction from Biological Text
Dialekti Valsamou, Claire Nedellec and the Bibliome Team @ MIG, INRA
dialekti.valsamou@jouy.inra.fr, claire.nedellec@jouy.inra.fr
Introduction
Information Extraction is the extraction of
meaningful structured information from text. It
can be divided into three tasks: a) named entity
recognition (NER), b) anaphora resolution and c)
relation (or event) extraction (RE).
Relation Extraction is the problem of detecting
whether a relation holds between entities in text
and classifying its type. Approaches vary from simple
pattern matching [3][1] to more sophisticated ones.
Machine Learning seems to be indispensable
for the task of RE, and there exist methods that
employ kernel-based algorithms [12][6], (logistic)
regression [7][9] or even neural networks [2].
The features used vary as well: sequences or
subsequences [5][4], syntactic parse trees [8], dependency
graphs [6], convolution trees [13] and shallow
parsing [12] are some important examples.
An Example of Genic Interactions
In the example figure, we are trying to detect a relation of type
'Interaction'. The simplest approach would be to use a bag of words; a
slightly more sophisticated solution would look for k-subsequences.
We have adapted two sophisticated approaches using syntactic and
semantic information.
[Figure not preserved; panel titles: Bag-of-words, Subsequences]
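The two baseline representations mentioned above can be sketched as follows. This is a minimal illustration, not the poster's implementation: the sentences, the GENE_A/GENE_B placeholders and the function names are hypothetical, and the subsequences are counted without the gap weighting that subsequence kernels such as [4] typically apply.

```python
from collections import Counter
from itertools import combinations


def bag_of_words(tokens):
    """Count each token, ignoring word order."""
    return Counter(tokens)


def k_subsequences(tokens, k):
    """Count all ordered, possibly non-contiguous, subsequences of length k."""
    return Counter(
        tuple(tokens[i] for i in idx)
        for idx in combinations(range(len(tokens)), k)
    )


def dot(a, b):
    """Inner product of two sparse count vectors, usable as a simple kernel value."""
    return sum(count * b[key] for key, count in a.items() if key in b)


# Hypothetical tokenized sentences; candidate arguments are replaced by placeholders.
s1 = ["GENE_A", "activates", "transcription", "of", "GENE_B"]
s2 = ["GENE_A", "represses", "transcription", "of", "GENE_B"]

print(dot(bag_of_words(s1), bag_of_words(s2)))          # shared words
print(dot(k_subsequences(s1, 2), k_subsequences(s2, 2)))  # shared ordered pairs
```

Unlike the bag of words, the k-subsequence features keep word order, so "GENE_A ... GENE_B" and "GENE_B ... GENE_A" produce different features.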
Performance on the LLL Corpus
The LLL corpus [10] provides a good benchmark for relation extraction methods. Its topic is genic
interaction, just like the examples. We tried the two approaches presented here and got encouraging results.
The tables below report the F-measure (10-fold cross-validation) for each kernel.
String Kernel
                 Linguistic Annotation
Sem. Classes     none          auto          manual
none             52.2 ± 3.1    64.4 ± 1.8    69.0 ± 2.3
manual           52.4 ± 3.7    68.4 ± 2.3    75.4 ± 2.6

Global Alignment Kernel
                 Linguistic Annotation
Sem. Classes     auto          manual
none             61.0 ± 4.1    77.0 ± 2.4
manual           59.4 ± 5.4    79.1 ± 2.8
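For readers unfamiliar with the metric, the sketch below shows one way a "mean ± spread" F-measure over cross-validation folds could be computed. It assumes the ± values are standard deviations across folds, which the poster does not state, and the per-fold counts are hypothetical and unrelated to the LLL results.

```python
from statistics import mean, stdev


def f_measure(tp, fp, fn):
    """F1: harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(fold_scores):
    """Mean and sample standard deviation over the folds (assumed meaning of ±)."""
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical (true positive, false positive, false negative) counts for three folds.
folds = [(8, 2, 2), (6, 4, 2), (9, 1, 3)]
scores = [100 * f_measure(*counts) for counts in folds]
average, spread = summarize(scores)
print(f"{average:.1f} ± {spread:.1f}")
```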
Dependency Graphs
Using the parsing information, we can build
a dependency graph for the sentence that contains
the candidate arguments.
[Figure not preserved: the dependency graph for our example]
Goal: discover a connection between the two
arguments, i.e. a path in this graph that connects the
corresponding nodes.
⇒ the shortest path in the dependency graph,
used as we would use a sequence
Dependency Graphs Kernel Learning
Features: The shortest path between the arguments
in the dependency graph of each sentence.
Algorithm: a Support Vector Machine or any other
kernel method
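The shortest-path feature described above can be sketched with a breadth-first search. The dependency edges below are a hypothetical parse written by hand, not output of the AlvisNLP pipeline, and nodes are keyed by word for brevity; a real implementation would key them by token position so that repeated words stay distinct.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected view of the dependency graph."""
    neighbours = {}
    for head, dependent in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the two arguments are not connected


# Hypothetical (head, dependent) edges for
# "GENE_A activates the transcription of GENE_B".
edges = [
    ("activates", "GENE_A"),
    ("activates", "transcription"),
    ("transcription", "the"),
    ("transcription", "of"),
    ("of", "GENE_B"),
]

print(shortest_path(edges, "GENE_A", "GENE_B"))
```

The returned node sequence drops words that are off the path (here "the"), and can then be fed to a sequence kernel in place of the full sentence.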
Global Alignment Kernel
Idea: use the "edit distance" of two sentences as a
kernel function. How?
⇒ Find the global alignment between them.
Similarity score: the optimal alignment score
given a substitution function and a gap penalty.
Substitution cost: minimum (zero) if the elements
belong to the same semantic class (e.g. activate
and control), medium if they share the same POS tag
and high otherwise.
Gap penalty: lower values were empirically shown
to produce better results.
Algorithm: a Support Vector Machine or any other
kernel method
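The alignment idea above can be sketched with a Needleman-Wunsch style dynamic program. Everything numeric here is an assumption: the poster gives only the ordering of the substitution costs, so the cost values, the gap penalty, the semantic class table and the exponential conversion from cost to similarity are all hypothetical. A score built this way is also not guaranteed to be positive semi-definite, so it may need checking before use in an SVM.

```python
import math

# Hypothetical semantic classes and costs; only their ordering comes from the poster.
SEMANTIC_CLASS = {"activate": "REGULATION", "control": "REGULATION"}
COST_SAME_CLASS, COST_SAME_POS, COST_OTHER = 0.0, 0.5, 1.0


def substitution_cost(a, b):
    """Cost of aligning two (lemma, pos) pairs."""
    lemma_a, pos_a = a
    lemma_b, pos_b = b
    # A lemma without a listed class is treated as its own class.
    if SEMANTIC_CLASS.get(lemma_a, lemma_a) == SEMANTIC_CLASS.get(lemma_b, lemma_b):
        return COST_SAME_CLASS
    if pos_a == pos_b:
        return COST_SAME_POS
    return COST_OTHER


def alignment_cost(x, y, gap=0.6):
    """Minimum total cost of a global alignment between sequences x and y."""
    rows, cols = len(x) + 1, len(y) + 1
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(
                table[i - 1][j - 1] + substitution_cost(x[i - 1], y[j - 1]),
                table[i - 1][j] + gap,   # gap in y
                table[i][j - 1] + gap,   # gap in x
            )
    return table[-1][-1]


def similarity(x, y, gamma=1.0):
    """Assumed conversion from alignment cost to a similarity in (0, 1]."""
    return math.exp(-gamma * alignment_cost(x, y))


# Hypothetical lemmatized, POS-tagged sequences.
x = [("GENE_A", "NN"), ("activate", "VB"), ("GENE_B", "NN")]
y = [("GENE_A", "NN"), ("control", "VB"), ("GENE_B", "NN")]
z = [("GENE_A", "NN"), ("bind", "VB"), ("GENE_B", "NN")]

print(similarity(x, y))  # same semantic class in the middle position
print(similarity(x, z))  # only the POS tag matches in the middle position
```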
What Information to Use?
In recent years the linguistic analysis tools at our
disposal have become more and more efficient,
allowing us to obtain better results by using deeper
analysis. We call the information that we add to
the original text data an annotation; it can be
obtained either manually or, ideally, automatically.
The levels are:
• Lexical (with possible lemmatisation), e.g.
bag-of-words, word n-grams, etc.
• Morpho-syntactic, e.g. Part-of-Speech (POS)
tagging
• Parsing, e.g. dependency or constituency
graphs (paths, trees, etc.)
• Semantic, e.g. the use of semantic classes
In both of the algorithms presented in this poster,
performance improved considerably when using
syntactic and/or semantic information. This was made
possible by the AlvisNLP pipeline developed by
our lab.
Distributed Semantics: Unsupervised learning
of semantically close words from entire document
collections in order to form classes.
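One common way to realise this idea is to compare words by the contexts they occur in. The sketch below is an assumption about how such classes might be formed, not a description of the authors' method; the toy corpus, the window size and the function names are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=1):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus; a real one would be an entire document collection.
corpus = [
    ["GENE_A", "activates", "GENE_B"],
    ["GENE_C", "controls", "GENE_B"],
    ["GENE_A", "binds", "to", "GENE_D"],
]
vectors = context_vectors(corpus)
print(cosine(vectors["activates"], vectors["controls"]))  # share the context GENE_B
print(cosine(vectors["activates"], vectors["to"]))        # share no context
```

Words whose similarity exceeds a chosen threshold could then be grouped into one class, for example with any standard clustering algorithm.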
String Kernel Comparison
[Figure not preserved: precision/recall graph for a simple string kernel (bag of words) and the shortest-path-on-dependency-graphs version]
References
[1] E. Agichtein and L. Gravano. Snowball: Extracting relations from large plain-text collections. 2000.
[2] T. Barnickel, J. Weston, R. Collobert, H. Mewes, and V. Stümpflen. Large scale application of neural network based semantic role labeling for automated relation extraction from biomedical texts. 2009.
[3] S. Brin. Extracting patterns and relations from the world wide web. 1999.
[4] R. Bunescu and R. Mooney. Subsequence kernels for relation extraction. 2006.
[5] A. Culotta, A. McCallum, and J. Betz. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. 2006.
[6] A. Culotta and J. Sorensen. Dependency tree kernels for relation extraction. 2004.
[7] N. Kambhatla. Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations. 2004.
[8] Y. Liu, Z. Shi, and A. Sarkar. Exploiting rich syntactic information for relation extraction from biomedical articles. 2007.
[9] M. Mintz, S. Bills, R. Snow, and D. Jurafsky. Distant supervision for relation extraction without labeled data. 2009.
[10] C. Nédellec. Learning language in logic - genic interaction extraction challenge. 2005.
[11] S. Riedel, L. Yao, and A. McCallum. Collective cross-document relation extraction without labelled data. 2010.
[12] D. Zelenko, C. Aone, and A. Richardella. Kernel methods for relation extraction. 2003.
[13] M. Zhang, J. Zhang, and J. Su. Exploring syntactic features for relation extraction using a convolution tree kernel. 2006.
Future Work
This work is continuously being improved in all
aspects (linguistic parsing, semantic classes, etc.). We
are also focusing on the fact that, when dealing with
data of a biological nature,
• it is hard to engage experts in the tedious and
time-consuming task of manual annotation
• but there exists an abundance of databases
Distant supervision: Project structured relation
data onto text documents in order to produce
positive and negative examples [11].
⇒ Pre-annotate examples for the experts to
confirm, creating larger datasets that allow for
generalization.
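The distant supervision step can be sketched as a projection of database pairs onto sentences. This is an illustration under simplifying assumptions: entity mentions are taken as already recognised, every co-occurring ordered pair becomes a candidate, and the database contents and sentences are hypothetical.

```python
def distant_labels(sentences, known_pairs):
    """Label each ordered entity pair in a sentence using a database of known relations.

    sentences: list of (text, entities), where entities are the names found in the text.
    known_pairs: set of (agent, target) pairs taken from a structured database.
    """
    examples = []
    for text, entities in sentences:
        for agent in entities:
            for target in entities:
                if agent != target:
                    # Positive if the database lists the pair, negative otherwise.
                    label = (agent, target) in known_pairs
                    examples.append((text, agent, target, label))
    return examples


# Hypothetical database entries and sentences.
known_pairs = {("GENE_A", "GENE_B")}
sentences = [
    ("GENE_A activates GENE_B.", ["GENE_A", "GENE_B"]),
    ("GENE_C and GENE_B were both sequenced.", ["GENE_C", "GENE_B"]),
]

for example in distant_labels(sentences, known_pairs):
    print(example)
```

Labels produced this way are noisy, since a sentence may mention both entities without expressing the relation, which is why the poster proposes having experts confirm the pre-annotated examples.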
