Università degli studi di Bari “Aldo Moro”
                                 Dipartimento di Informatica




                      Cooperating Techniques for
             Extracting Conceptual Taxonomies from Text
                                   S. Ferilli, F. Leuzzi, F. Rotella
L.A.C.A.M.
http://lacam.di.uniba.it:8000

                AI*IA 2011 XIIth Conference of the Italian Association for Artificial Intelligence
                             Workshop on Mining Complex Patterns (MCP 2011)
                                     Palermo, Italy, September 17, 2011
Overview
          1. Introduction & Objectives
          2. Extraction of knowledge from text
          3. Knowledge representation formalism
          4. Identification of relevant concepts
          5. Generalization of similar concepts
          6. Reasoning ‘by association’
          7. Conclusions & Future works




Cooperating Techniques for Extracting Conceptual Taxonomies from Text - S. Ferilli, F. Leuzzi, F. Rotella   2
Introduction
          The spread of electronic documents and document
          repositories has generated the need for automatic techniques
          to understand and handle the documents content in order to
          help users in satisfying their information needs.


          Full Text Understanding is not trivial, due to:
          1. intrinsic ambiguity of natural language;
          2. huge amount of common sense and conceptual background
             knowledge.


          Lexical and/or conceptual taxonomies help to face these
          problems, but building them manually is very costly and
          error prone.
Introduction
          This lack is a strong motivation towards
          automatic construction of conceptual
          networks by mining large amounts of
          documents in natural language.




          However, even assuming a correct knowledge representation,
          we are still far from simulating human abilities.

Objectives

          1. Definition of a representation formalism for knowledge
             extracted from natural language texts

          2. Extraction of concepts and relevance assessment

          3. Generalization of concepts having similar descriptions

          4. Definition of a kind of reasoning by concept association that
             looks for possible indirect connections between two
             identified concepts




Extraction of knowledge from text
          Knowledge is extracted by processing each sentence separately.




          [Figure: Stanford Parser [1], followed by Stanford Dependencies [2]]




          The final output of the Stanford Dependencies is a typed
          syntactic structure of each sentence.



Knowledge representation formalism
          Among all grammatical roles played by words in a sentence,
          only subject, verb and complement have been considered.
          In the final conceptual graph subjects and complements will
          represent concepts, while verbs will express relations between
          them.




          [Figure: subject, verb, complement -> subject, complement]
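
As an illustration of this formalism, the following minimal Python sketch stores triples as a graph in which subjects and complements are nodes and verbs label the connections between them. The sample triples and the name `build_graph` are hypothetical and not taken from the slides; the slides do not say how the graph is actually stored.

```python
from collections import defaultdict

# Hypothetical triples, as if produced by the sentence-parsing step.
triples = [
    ("user", "access", "network"),
    ("network", "connect", "user"),
    ("user", "share", "content"),
]

def build_graph(triples):
    """Map each (subject, complement) pair to the set of verbs linking them."""
    graph = defaultdict(set)
    for subject, verb, complement in triples:
        graph[(subject, complement)].add(verb)
    return dict(graph)

graph = build_graph(triples)

# Concepts are all the words that appear as subject or complement.
concepts = sorted({concept for pair in graph for concept in pair})
```

Keeping the verbs as edge labels, rather than as nodes, mirrors the statement that verbs express relations between concepts.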




Identification of relevant concepts
       Several techniques cooperate to identify relevant concepts:

       ●   Hub Words [3]: words having high frequency, whose relevance is
           computed as:

               W(t) = \alpha w_0 + \beta n + \gamma \sum_{i=1}^{n} w(t_i)

           where: w_0 is the initial weight; n is the number of relationships;
                  w(t_i) is the tf*idf weight of the i-th word related to t.

       ●   Keyword extraction techniques from single documents.
       ●   EM Clustering provided by Weka [4] based on Euclidean
           distance.
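
The Hub Words weight in the first bullet can be sketched in a few lines of Python. This is only an illustration: it assumes that the sum runs over all n words related to t, and the function name, argument names and sample numbers are hypothetical.

```python
def hub_word_weight(w0, related_weights, alpha, beta, gamma):
    """W(t) = alpha*w0 + beta*n + gamma*sum of the weights of related words.

    Assumption: n is the number of related words, i.e. len(related_weights).
    """
    n = len(related_weights)
    return alpha * w0 + beta * n + gamma * sum(related_weights)

# Hypothetical values: initial weight 0.5, two related words.
example = hub_word_weight(0.5, [0.2, 0.3], alpha=1.0, beta=0.1, gamma=2.0)
```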


Identification of relevant concepts
          Inspired by the Hub Words approach, we have defined a
          Relevance Weight:

              W(\bar{c}) = \alpha \frac{w(\bar{c})}{\max_c w(c)}                   (A)
                         + \beta \frac{e(\bar{c})}{\max_c e(c)}                    (B)
                         + \gamma \frac{\sum_{(\bar{c}, c)} w(c)}{e(\bar{c})}      (C)
                         + \delta \frac{d_M - d(\bar{c})}{d_M}                     (D)
                         + \epsilon \frac{k(\bar{c})}{\max_c k(c)}                 (E)

          where: \alpha + \beta + \gamma + \delta + \epsilon = 1
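
A minimal Python sketch of the Relevance Weight follows, assuming that the per-node quantities w, e, the sum of neighbor weights, d and k have already been computed. The `NodeStats` container, the function name and the sample values are hypothetical; d_M is assumed here to be the maximum distance from the cluster center, and every node is assumed to have at least one edge.

```python
from dataclasses import dataclass

@dataclass
class NodeStats:
    w: float           # initial weight w(c)
    e: int             # number of edges e(c), assumed > 0
    neighbor_w: float  # sum of the initial weights of the neighbors of c
    d: float           # distance d(c) from the cluster center
    k: float           # keyword-extraction score k(c)

def relevance_weights(stats, alpha, beta, gamma, delta, epsilon, d_max):
    """Return W(c) for every node, as the sum of components A..E."""
    if abs(alpha + beta + gamma + delta + epsilon - 1.0) > 1e-9:
        raise ValueError("the five coefficients must sum to 1")
    max_w = max(s.w for s in stats.values())
    max_e = max(s.e for s in stats.values())
    max_k = max(s.k for s in stats.values())
    result = {}
    for concept, s in stats.items():
        comp_a = alpha * s.w / max_w
        comp_b = beta * s.e / max_e
        comp_c = gamma * s.neighbor_w / s.e
        comp_d = delta * (d_max - s.d) / d_max
        comp_e = epsilon * s.k / max_k
        result[concept] = comp_a + comp_b + comp_c + comp_d + comp_e
    return result

# Hypothetical two-node example with equal coefficients.
stats = {
    "network": NodeStats(w=2.0, e=4, neighbor_w=1.0, d=0.0, k=3.0),
    "user": NodeStats(w=1.0, e=2, neighbor_w=2.0, d=1.0, k=3.0),
}
weights = relevance_weights(stats, 0.2, 0.2, 0.2, 0.2, 0.2, d_max=2.0)
```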

          Nodes in the network are ranked by decreasing Relevance
          Weight.
          A suitable cut-point in the ranking is determined by choosing
          the first item such that:
              W(c_k) - W(c_{k+1}) \geq p \cdot \max_{i = 0, \ldots, n-1} ( W(c_i) - W(c_{i+1}) )

          where: p \in [0, 1]
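
The cut-point rule can be sketched as follows, assuming the weights are already sorted in decreasing order and using 0-based positions. The function name and the sample ranking are hypothetical; whether the item at the returned position is itself kept is not stated on the slide, and this sketch only returns the position k.

```python
def cut_point(ranked_weights, p):
    """Return the first k with W(c_k) - W(c_{k+1}) >= p * (largest gap)."""
    if len(ranked_weights) < 2:
        raise ValueError("at least two ranked items are needed")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Gaps between consecutive items of the ranking.
    gaps = [a - b for a, b in zip(ranked_weights, ranked_weights[1:])]
    threshold = p * max(gaps)
    for k, gap in enumerate(gaps):
        if gap >= threshold:
            return k

# Hypothetical ranking: the largest gap lies between positions 1 and 2.
ranking = [0.9, 0.8, 0.4, 0.35, 0.1]
strict = cut_point(ranking, p=1.0)   # only the largest gap qualifies
loose = cut_point(ranking, p=0.2)    # a smaller gap is already enough
```

With p = 1 the cut falls at the largest gap in the ranking; lower values of p let an earlier, smaller gap trigger the cut.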
Identification of relevant concepts
               Relevance Weight in details
                          Definition of the Initial Weight

          The whole set of triples <subject,verb,complement> is
          represented in a Concepts x Attributes matrix V recalling the
          classical Terms x Documents Vector Space Model.

          Resembling tf*idf:

              \frac{f_{i,j}}{\sum_k f_{k,j}} \cdot \log \frac{|A|}{|\{ j : c_i \in a_j \}|}

          Therefore component A is:

              \alpha \frac{w(\bar{c})}{\max_c w(c)}
          where w(c) is the initial weight assigned to node c computed
          according to the above tf*idf schema.
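
The tf*idf-like schema can be illustrated with a small Python sketch over a Concepts x Attributes frequency matrix. The matrix values and the function name are hypothetical. The sketch only computes the per-cell value of the formula; how the cells are aggregated into a single w(c) per node is not stated on the slide and is left out.

```python
import math

def tfidf_matrix(freq):
    """freq[i][j]: how often concept i occurs with attribute j.

    Returns a matrix of the same shape holding
    f_ij / sum_k f_kj * log(|A| / number of attributes containing concept i).
    """
    n_attrs = len(freq[0])
    col_totals = [sum(row[j] for row in freq) for j in range(n_attrs)]
    weights = []
    for row in freq:
        attrs_with_concept = sum(1 for f in row if f > 0)
        # A concept that appears in no attribute gets weight 0 (assumption).
        idf = math.log(n_attrs / attrs_with_concept) if attrs_with_concept else 0.0
        weights.append([
            (f / col_totals[j] if col_totals[j] else 0.0) * idf
            for j, f in enumerate(row)
        ])
    return weights

# Hypothetical 2 concepts x 3 attributes matrix.
V = [
    [2, 0, 0],   # concept 0 occurs with one attribute only
    [2, 1, 1],   # concept 1 occurs with every attribute
]
W0 = tfidf_matrix(V)
```

As with classical tf*idf, a concept that occurs with every attribute gets a logarithm of 1, hence a zero weight.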

Identification of relevant concepts
               Relevance Weight in details
                                   Connections Number
          Component B considers the number of connections (edges) in
          which c is involved

              \beta \frac{e(\bar{c})}{\max_c e(c)}



                          Neighborhood Weight Summary
          Component C takes into account the average
          initial weight of all neighbors of c

              \gamma \frac{\sum_{(\bar{c}, c)} w(c)}{e(\bar{c})}
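
The raw quantities behind components B and C can be derived from an edge list, as in the following Python sketch. It assumes an undirected graph in which e(c) counts the distinct neighbors of c; the function name, the edges and the weights are hypothetical.

```python
from collections import defaultdict

def degree_and_neighbor_average(edges, w):
    """Return (e, avg): edge count and average neighbor weight per node."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degree = {c: len(ns) for c, ns in neighbors.items()}
    average = {c: sum(w[n] for n in ns) / len(ns) for c, ns in neighbors.items()}
    return degree, average

# Hypothetical graph and initial weights.
edges = [("user", "network"), ("user", "content"),
         ("network", "content"), ("network", "access")]
w = {"user": 0.4, "network": 1.0, "content": 0.2, "access": 0.6}
degree, average = degree_and_neighbor_average(edges, w)
```

Component B is then \beta times the edge count divided by the largest edge count, and component C is \gamma times the average neighbor weight.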

Identification of relevant concepts
               Relevance Weight in details
                            Inverse Distance from Center
          Component D represents the closeness to the center of the cluster:

              \delta \frac{d_M - d(\bar{c})}{d_M}


                                           KE Influence
          Component E takes into account the outcome of three keyword
          extraction (KE) techniques, suitably weighted:

              \epsilon \frac{k(\bar{c})}{\max_c k(c)}

          where:

              k(\bar{c}) = \varsigma k_{co-occurrences}(\bar{c}) + \eta k_{synset}(\bar{c}) + \theta k_{mvn}(\bar{c})

Identification of relevant concepts
               Relevance Weight in details
          ●   KE based on co-occurrences:

                  k_{co-occurrences} = \varsigma \frac{\chi^2}{\max_{cluster} \chi^2}

          ●   KE based on WordNet Synsets:

                  k_{synset} = \eta \frac{kw_{synset}}{\max(kw_{synset})}

          ●   KE by means of Multivariate Normal Distribution (MVN):

                  k_{mvn} = \theta \frac{kw_{mvn}}{\max(kw_{mvn})}


Identification of relevant concepts
                                         Evaluations
           Test #      α       β       γ       δ       ε       p
              1      0.10    0.10    0.30    0.25    0.25    1.0
              2      0.20    0.15    0.15    0.25    0.25    0.7
              3      0.15    0.25    0.30    0.15    0.15    1.0


           Test #   Concept        A         B        C        D        E        W
              1     network      0.100     0.100    0.021    0.178    0.250    0.649
                    access       0.001     0.001    0.154    0.239    0.250    0.646
                    subset       6.32E-4   0.001    0.150    0.239    0.250    0.641
              2     network      0.200     0.150    0.0105   0.178    0.250    0.789
              3     network      0.150     0.250    0.021    0.146    0.150    0.717
                    user         0.127     0.195    0.022    0.146    0.150    0.641
                    number       0.113     0.187    0.022    0.146    0.150    0.619
                    individual   0.103     0.174    0.020    0.146    0.150    0.594
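
As a quick consistency check on the table above, the short Python sketch below sums components A to E for each reported row and compares the result with the reported W. The values are copied from the table; the tolerance of 0.003 is an assumption chosen to absorb the rounding of the published figures.

```python
# (test, concept, A, B, C, D, E, W) copied from the table above.
rows = [
    (1, "network",    0.100,   0.100, 0.021,  0.178, 0.250, 0.649),
    (1, "access",     0.001,   0.001, 0.154,  0.239, 0.250, 0.646),
    (1, "subset",     6.32e-4, 0.001, 0.150,  0.239, 0.250, 0.641),
    (2, "network",    0.200,   0.150, 0.0105, 0.178, 0.250, 0.789),
    (3, "network",    0.150,   0.250, 0.021,  0.146, 0.150, 0.717),
    (3, "user",       0.127,   0.195, 0.022,  0.146, 0.150, 0.641),
    (3, "number",     0.113,   0.187, 0.022,  0.146, 0.150, 0.619),
    (3, "individual", 0.103,   0.174, 0.020,  0.146, 0.150, 0.594),
]

def max_rounding_gap(rows):
    """Largest absolute difference between A+B+C+D+E and the reported W."""
    return max(abs(sum(row[2:7]) - row[7]) for row in rows)

gap = max_rounding_gap(rows)
```

Every row agrees with W = A + B + C + D + E to within that tolerance.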


Generalization of similar concepts
                         Pairwise clustering
          Each concept is described by a binary vector that represents
          the presence or absence (1 or 0, respectively) of a
          <subject,complement> relation between that concept and each
          of the other concepts. The Hamming distance between two such
          vectors provides a similarity evaluation between the concepts.
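
The Hamming distance between two binary concept descriptions can be sketched in Python as below. The vectors are hypothetical. The normalized variant, which divides by the vector length, is an assumption added for illustration; the slide does not say whether the distance is normalized.

```python
def hamming_distance(u, v):
    """Number of positions in which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))

def normalized_hamming(u, v):
    """Hamming distance divided by the vector length (assumed variant)."""
    return hamming_distance(u, v) / len(u)

# Hypothetical descriptions: one entry per other concept in the graph.
user = [1, 0, 1, 1]
member = [1, 0, 0, 1]
distance = hamming_distance(user, member)
ratio = normalized_hamming(user, member)
```

A smaller distance means that the two concepts are related to almost the same set of other concepts, which makes them candidates for generalization.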




Generalization of similar concepts
                                         WordNet
            WordNet (http://wordnet.princeton.edu/) is an external
            resource that has some useful properties:
            1. it is a lexical taxonomy;
            2. each concept is described as a set of synonyms (synset);
            3. synsets are interlinked by means of conceptual-semantic
               and lexical relations.


            We focus on hypernymy, a relation that links the
            current synset to more general ones.




Generalization of similar concepts
            Taxonomical similarity function
    ●   More general: provides a similarity value on the basis of
        common relations, without focusing on the specific path.
    ●   More specific: provides a similarity value on the basis of
        common relations, relying on the specific path.




Generalization of similar concepts
                       WSD Domain Driven
          One Domain per Discourse assumption: many uses of a word
          in a coherent portion of text tend to share the same domain.
          1. Prevalent domain identification
          2. Extraction of all synsets for each term
          3. Extraction of all domains for each synset
          4. Choice of the prevalent-domain synset
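
The domain-driven disambiguation steps can be sketched in Python on toy data. Everything in the data below (terms, synset identifiers, domain labels) is hypothetical. Two choices are assumptions of this sketch rather than statements from the slides: the prevalent domain is taken to be the label that occurs most often across all synsets of all terms, and a term with no synset in that domain falls back to its first synset.

```python
from collections import Counter

# Hypothetical toy data: term -> {synset id -> domain labels}.
SYNSET_DOMAINS = {
    "network": {
        "network.n.01": ["computer_science"],
        "network.n.02": ["telecommunication"],
    },
    "platform": {
        "platform.n.01": ["architecture"],
        "platform.n.02": ["computer_science"],
    },
    "user": {
        "user.n.01": ["computer_science", "sociology"],
    },
}

def prevalent_domain(terms, synset_domains):
    """Most frequent domain label over all synsets of all terms."""
    counts = Counter(
        domain
        for term in terms
        for domains in synset_domains[term].values()
        for domain in domains
    )
    return counts.most_common(1)[0][0]

def disambiguate(terms, synset_domains):
    """Pick, for each term, a synset that belongs to the prevalent domain."""
    domain = prevalent_domain(terms, synset_domains)
    chosen = {}
    for term in terms:
        synsets = synset_domains[term]
        matches = [s for s, domains in synsets.items() if domain in domains]
        # Fallback (assumption): keep the first listed synset.
        chosen[term] = matches[0] if matches else next(iter(synsets))
    return domain, chosen

domain, chosen = disambiguate(list(SYNSET_DOMAINS), SYNSET_DOMAINS)
```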


Generalization of similar concepts
                                     Evaluations
          Two toy experiments have been performed, with the Hamming
          distance threshold set to 0.001 and 0.0001 respectively,
          while the taxonomical similarity function threshold has been
          kept equal to 0.4.




Reasoning ‘by association’
                      Breadth-First Search
          Given two nodes (concepts), a Breadth-First Search starts
          from both nodes: each search looks for the other's frontier,
          until the two frontiers meet at common nodes. Then the path
          is reconstructed by going backward to the roots in both
          directions.
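
A minimal Python sketch of such a two-sided Breadth-First Search follows. It assumes an undirected graph given as an adjacency dictionary and ignores the verbs that label the edges; the function name and the sample graph are hypothetical. The search alternates one level from each side and returns the first connecting path it finds.

```python
from collections import deque

def bidirectional_bfs(graph, start, goal):
    """Return a path of nodes from start to goal, or None if none exists."""
    if start == goal:
        return [start]
    parents_s, parents_g = {start: None}, {goal: None}
    frontier_s, frontier_g = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one full level; return a node seen by both searches, if any.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor not in parents:
                    parents[neighbor] = node
                    if neighbor in other_parents:
                        return neighbor
                    frontier.append(neighbor)
        return None

    while frontier_s and frontier_g:
        meet = expand(frontier_s, parents_s, parents_g)
        if meet is None:
            meet = expand(frontier_g, parents_g, parents_s)
        if meet is not None:
            # Walk back to the start, then forward to the goal.
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents_s[node]
            path.reverse()
            node = parents_g[meet]
            while node is not None:
                path.append(node)
                node = parents_g[node]
            return path
    return None

# Hypothetical concept graph.
graph = {
    "adult": ["freedom", "platform"],
    "freedom": ["adult"],
    "platform": ["adult", "technology"],
    "technology": ["platform", "internet"],
    "internet": ["technology"],
}
path = bidirectional_bfs(graph, "adult", "internet")
```

Searching from both ends keeps each frontier small, since each side only needs to explore roughly half of the connecting path.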




Reasoning ‘by association’
                                     Evaluations
          The table below shows a sample of possible outcomes.
          E.g., an interpretation of case 5 can be:
          “the adults write about freedom and use platform, that is
          recognized as a technology, as well as the internet”.




Conclusions
    This work proposes an approach to automatically extract a
    conceptual taxonomy from natural language texts.


    It works by mixing different techniques in order to:
    ●   identify relevant terms/concepts in text;
    ●   generalize similar concepts;
    ●   perform some kind of reasoning “by association”.


    Preliminary experiments show that this approach can be viable
    although extensions and refinements are needed.
    A reliable outcome might help users in understanding the text
    content and machines to automatically perform some kind of
    reasoning on the taxonomy.
Future works
          1. Extending the knowledge representation formalism to
             express negation.

          2. Defining a strategy to make a better choice of weights in
             Relevance Weight computation.

          3. Enriching the adjacency matrix to improve concept
             descriptions.

          4. Exploring alternatives to the One Domain per Discourse
             (ODD) assumption, to overcome its limits.

          5. Extending the taxonomical similarity measures, which
             currently take into account only the hypernym relation,
             with other relations to obtain a more accurate similarity.

          6. Defining a strategy to prefer one verb rather than keeping
             all of them in the reasoning ‘by association’ phase.

References
          [1] Dan Klein and Christopher D. Manning. Fast exact
          inference with a factored model for natural language parsing.
          In Advances in Neural Information Processing Systems,
          volume 15. MIT Press, 2003.
          [2] Marie-Catherine de Marneffe, Bill MacCartney, and
          Christopher D. Manning. Generating typed dependency parses
          from phrase structure trees. In LREC, 2006.
          [3] Sang Ok Koo, Soo Yeon Lim, and Sang-Jo Lee. Constructing
          an ontology based on hub words. In ISMIS’03, pages 93–97,
          2003.
          [4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann,
          and I.H. Witten. The WEKA data mining software: an update.
          SIGKDD Explorations, 11(1):10–18, 2009.




Identifying the semantic relations on
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 

Cooperating Techniques for Extracting Conceptual Taxonomies from Text

  • 1. Università degli studi di Bari “Aldo Moro” Dipartimento di Informatica Cooperating Techniques for Extracting Conceptual Taxonomies from Text S. Ferilli, F. Leuzzi, F. Rotella L.A.C.A.M. http://lacam.di.uniba.it:8000 AI*IA 2011 XIIth Conference of the Italian Association for Artificial Intelligence Workshop on Mining Complex Patterns (MCP 2011) Palermo, Italy, September 17, 2011
  • 2. Overview
    1. Introduction & Objectives
    2. Extraction of knowledge from text
    3. Knowledge representation formalism
    4. Identification of relevant concepts
    5. Generalization of similar concepts
    6. Reasoning ‘by association’
    7. Conclusions & Future work
  • 3. Introduction
    The spread of electronic documents and document repositories has generated the need for automatic techniques that understand and handle document content, in order to help users satisfy their information needs.
    Full Text Understanding is not trivial, due to:
    1. the intrinsic ambiguity of natural language;
    2. the huge amount of common sense and conceptual background knowledge required.
    Lexical and/or conceptual taxonomies help to address these problems, but building them manually is very costly and error-prone.
  • 4. Introduction
    The lack of such taxonomies is a strong motivation towards the automatic construction of conceptual networks by mining large amounts of documents in natural language.
    However, even assuming a correct knowledge representation, we are still far from simulating human abilities.
  • 5. Objectives
    1. Definition of a representation formalism for knowledge extracted from natural language texts
    2. Extraction of concepts and relevance assessment
    3. Generalization of concepts having similar descriptions
    4. Definition of a kind of reasoning by concept association that looks for possible indirect connections between two identified concepts
  • 6. Extraction of knowledge from text
    Knowledge is extracted by processing each sentence separately.
    Tools: Stanford Parser [1], Stanford Dependencies [2].
    The final output of the Stanford Dependencies is a typed syntactic structure of each sentence.
  • 7. Knowledge representation formalism
    Among all grammatical roles played by words in a sentence, only subject, verb and complement have been considered.
    In the final conceptual graph, subjects and complements represent concepts, while verbs express relations between them.
    [Diagram: two concept nodes (each a subject or complement) linked by a verb]
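The formalism of slide 7 can be illustrated with a minimal sketch. It assumes the triples have already been extracted, stores the graph as nested dictionaries, and treats edges as directed from subject to complement. The function name, the data layout and the sample triples are hypothetical, not taken from the slides.

```python
def build_concept_graph(triples):
    """Build a graph whose nodes are concepts and whose edges carry verbs.

    triples: iterable of (subject, verb, complement) tuples.
    Returns a dict: concept -> {related concept -> set of verbs}.
    """
    graph = {}
    for subject, verb, complement in triples:
        # Subjects and complements become nodes; the verb labels the edge.
        graph.setdefault(subject, {}).setdefault(complement, set()).add(verb)
        # Make sure the complement exists as a node even with no outgoing edge.
        graph.setdefault(complement, {})
    return graph


# Toy triples, invented for illustration only.
triples = [
    ("user", "access", "network"),
    ("user", "join", "network"),
    ("network", "connect", "individual"),
]
graph = build_concept_graph(triples)
```

Keeping the full set of verbs on each edge mirrors the idea that several relations may hold between the same two concepts.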
  • 8. Identification of relevant concepts
    A mix of several techniques is brought into cooperation to identify relevant concepts:
    ● Hub Words [3]: words having high frequency, whose relevance is computed as:
      $W(t) = \alpha w_0 + \beta n + \gamma \sum_{i=1}^{n} w(t_i)$
      where $w_0$ is the initial weight, $n$ is the number of relationships, and $w(t_i)$ is the tf*idf weight of the i-th word related to $t$.
    ● Keyword extraction techniques from single documents.
    ● EM clustering provided by Weka [4], based on Euclidean distance.
  • 9. Identification of relevant concepts
    Inspired by the Hub Words approach, we have defined a Relevance Weight:
      $W(\bar{c}) = \underbrace{\alpha \frac{w(\bar{c})}{\max_c w(c)}}_{A} + \underbrace{\beta \frac{e(\bar{c})}{\max_c e(c)}}_{B} + \underbrace{\gamma \frac{\sum_{(\bar{c}, c)} w(c)}{e(\bar{c})}}_{C} + \underbrace{\delta \frac{d_M - d(\bar{c})}{d_M}}_{D} + \underbrace{\varepsilon \frac{k(\bar{c})}{\max_c k(c)}}_{E}$
      where $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.
    Nodes in the network are ranked by decreasing Relevance Weight. A suitable cut-point in the ranking is determined by choosing the first item $c_k$ such that:
      $W(c_k) - W(c_{k+1}) \geq p \cdot \max_{i=0,\dots,n-1} \left( W(c_i) - W(c_{i+1}) \right)$
      where $p \in [0, 1]$.
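The cut-point rule of slide 9 can be sketched as follows. The sketch assumes the Relevance Weights are already sorted in decreasing order and returns the index k of the first item whose gap to its successor reaches the fraction p of the largest gap. The function name and the sample weights are hypothetical.

```python
def cut_point(weights, p):
    """Return the index k of the first item satisfying
    W(c_k) - W(c_{k+1}) >= p * max_i (W(c_i) - W(c_{i+1})).

    weights: Relevance Weights sorted in decreasing order.
    p: value in [0, 1]; p = 1 selects the first largest gap.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    gaps = [weights[i] - weights[i + 1] for i in range(len(weights) - 1)]
    if not gaps:
        return None  # fewer than two items: no gap to inspect
    threshold = p * max(gaps)
    for k, gap in enumerate(gaps):
        if gap >= threshold:
            return k


# Invented weights, for illustration only.
ranked = [0.9, 0.8, 0.4, 0.35, 0.3]
strict = cut_point(ranked, 1.0)   # cut at the largest gap
loose = cut_point(ranked, 0.2)    # a smaller gap already qualifies
```

Lower values of p move the cut earlier in the ranking, since smaller gaps already satisfy the condition.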
  • 10. Identification of relevant concepts
    Relevance Weight in detail: definition of the Initial Weight
    The whole set of triples <subject,verb,complement> is represented in a Concepts × Attributes matrix V, recalling the classical Terms × Documents Vector Space Model.
    Resembling tf*idf:
      $\frac{f_{i,j}}{\sum_k f_{k,j}} \cdot \log \frac{|A|}{|\{ j : c_i \in a_j \}|}$
    Therefore component A is:
      $\alpha \frac{w(\bar{c})}{\max_c w(c)}$
    where $w(c)$ is the initial weight assigned to node $c$, computed according to the above tf*idf schema.
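A minimal sketch of the tf*idf-like weighting on slide 10, assuming the Concepts × Attributes matrix is given as a list of rows of raw frequencies. It only computes the per-cell value; how these values are combined into a single initial weight per concept is not stated on the slide and is left out. The function name and the sample matrix are hypothetical.

```python
import math


def tfidf_like(freq):
    """Compute (f_ij / sum_k f_kj) * log(|A| / |{j : c_i in a_j}|).

    freq[i][j]: frequency of concept i with attribute j.
    Returns a matrix of the same shape.
    """
    n_concepts = len(freq)
    n_attrs = len(freq[0]) if freq else 0
    # Column totals: sum over all concepts k of f_kj.
    col_totals = [sum(freq[i][j] for i in range(n_concepts)) for j in range(n_attrs)]
    weights = []
    for i in range(n_concepts):
        # Number of attributes in which concept i occurs.
        occurs_in = sum(1 for j in range(n_attrs) if freq[i][j] > 0)
        idf = math.log(n_attrs / occurs_in) if occurs_in else 0.0
        row = []
        for j in range(n_attrs):
            tf = freq[i][j] / col_totals[j] if col_totals[j] else 0.0
            row.append(tf * idf)
        weights.append(row)
    return weights


# Invented frequencies: 2 concepts, 2 attributes.
weights = tfidf_like([[2, 0], [2, 4]])
```

As in classical tf*idf, a concept that occurs with every attribute gets a zero weight, because the logarithm of 1 is 0.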
  • 11. Identification of relevant concepts
    Relevance Weight in detail
    Connections Number: component B considers the number of connections (edges) in which $\bar{c}$ is involved:
      $\beta \frac{e(\bar{c})}{\max_c e(c)}$
    Neighborhood Weight Summary: component C takes into account the average initial weight of all neighbors of $\bar{c}$:
      $\gamma \frac{\sum_{(\bar{c}, c)} w(c)}{e(\bar{c})}$
  • 12. Identification of relevant concepts
    Relevance Weight in detail
    Inverse Distance from Center: component D represents the closeness to the center of the cluster:
      $\delta \frac{d_M - d(\bar{c})}{d_M}$
    KE Influence: component E takes into account the outcome of three KE techniques, suitably weighted:
      $\varepsilon \frac{k(\bar{c})}{\max_c k(c)}$
      where $k(\bar{c}) = \varsigma k_{co\text{-}occurrences}(\bar{c}) + \eta k_{synset}(\bar{c}) + \theta k_{mvn}(\bar{c})$
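The five components A to E from slides 9 to 12 can be combined as in the sketch below. It assumes that all raw quantities for the node are already available, that the number of edges equals the number of neighbors, and that `d_max` plays the role of $d_M$, whose exact definition the slides do not give. All names and sample values are hypothetical.

```python
def relevance_weight(w, max_w, neighbor_weights, max_e, d, d_max, k, max_k, coeffs):
    """Sum the components A..E of the Relevance Weight for one node.

    w, max_w: initial weight of the node and maximum over all nodes.
    neighbor_weights: initial weights of the node's neighbors.
    max_e: maximum number of edges over all nodes.
    d, d_max: distance of the node from its cluster center and d_M.
    k, max_k: KE score of the node and maximum over all nodes.
    coeffs: (alpha, beta, gamma, delta, epsilon), which must sum to 1.
    """
    alpha, beta, gamma, delta, epsilon = coeffs
    if abs(sum(coeffs) - 1.0) > 1e-9:
        raise ValueError("coefficients must sum to 1")
    e = len(neighbor_weights)  # assumption: one edge per neighbor
    a = alpha * w / max_w
    b = beta * e / max_e
    c = gamma * (sum(neighbor_weights) / e if e else 0.0)
    dd = delta * (d_max - d) / d_max
    ee = epsilon * k / max_k
    return a + b + c + dd + ee


# Invented values: a node that is maximal in A, B, D and E.
score = relevance_weight(
    w=2.0, max_w=2.0, neighbor_weights=[0.5, 0.5], max_e=2,
    d=0.0, d_max=4.0, k=3.0, max_k=3.0,
    coeffs=(0.10, 0.10, 0.30, 0.25, 0.25),
)
```

Every component except C is normalized by a maximum, while C is an average of initial weights, so its scale depends on how those weights are distributed.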
  • 13. Identification of relevant concepts
    Relevance Weight in detail
    ● KE based on $\chi^2$ co-occurrences: $k_{co\text{-}occurrences} = \varsigma \frac{\chi^2}{\max_{cluster} \chi^2}$
    ● KE based on WordNet Synsets: $k_{synset} = \eta \frac{kw_{synset}}{\max(kw_{synset})}$
    ● KE by means of Multivariate Normal Distribution (MVN): $k_{mvn} = \theta \frac{kw_{mvn}}{\max(kw_{mvn})}$
  • 14. Identification of relevant concepts
    Evaluations

    Test #   α      β      γ      δ      ε      p
    1        0.10   0.10   0.30   0.25   0.25   1.0
    2        0.20   0.15   0.15   0.25   0.25   0.7
    3        0.15   0.25   0.30   0.15   0.15   1.0

    Test #   Concept      A         B       C        D       E       W
    1        network      0.100     0.100   0.021    0.178   0.250   0.649
    1        access       0.001     0.001   0.154    0.239   0.250   0.646
    1        subset       6.32E-4   0.001   0.150    0.239   0.250   0.641
    2        network      0.200     0.150   0.0105   0.178   0.250   0.789
    3        network      0.150     0.250   0.021    0.146   0.150   0.717
    3        user         0.127     0.195   0.022    0.146   0.150   0.641
    3        number       0.113     0.187   0.022    0.146   0.150   0.619
    3        individual   0.103     0.174   0.020    0.146   0.150   0.594
  • 15. Generalization of similar concepts
    Pairwise clustering
    It takes into account the description of each concept, consisting of a binary vector that represents the presence or absence (1 or 0 respectively) of a <subject,complement> relation between the involved concepts.
    The Hamming distance provides a similarity evaluation between them.
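A minimal sketch of the comparison described on slide 15. Dividing by the vector length is an assumption made here; the slides do not say whether the distance is normalized. The function name and the sample vectors are hypothetical.

```python
def hamming_distance(u, v, normalized=True):
    """Count the positions in which two binary vectors differ.

    With normalized=True the count is divided by the vector length
    (an assumption of this sketch, not stated on the slide).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    diff = sum(a != b for a, b in zip(u, v))
    return diff / len(u) if normalized and u else diff


# Invented concept descriptions: 1 = relation present, 0 = absent.
concept_a = [1, 0, 1, 1]
concept_b = [1, 1, 1, 0]
distance = hamming_distance(concept_a, concept_b)
```

Two concepts would then be candidates for generalization when their distance falls below a chosen threshold.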
  • 16. Generalization of similar concepts
    WordNet
    WordNet (http://wordnet.princeton.edu/) is an external resource that has some useful properties:
    1. it is a lexical taxonomy;
    2. each concept is described as a set of synonyms (synset);
    3. synsets are interlinked by means of conceptual-semantic and lexical relations.
    We focus on hypernymy, a relation that links the current synset to more general ones.
  • 17. Generalization of similar concepts
    Taxonomical similarity function
    ● More general: provides a similarity value on the basis of common relations, without focusing on the specific path.
    ● More specific: provides a similarity value on the basis of common relations, relying on the specific path.
  • 18. Generalization of similar concepts
    Domain-driven WSD
    One Domain per Discourse assumption: many uses of a word in a coherent portion of text tend to share the same domain.
    ● Prevalent domain identification
    ● Extraction of all synsets for each term
    ● Extraction of all domains for each synset
    ● Choice of the prevalent-domain synset
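One plausible reading of the domain-driven disambiguation on slide 18 is sketched below. It assumes the synsets of each term and the domains of each synset are available as plain dictionaries, takes the most frequent domain as the prevalent one, and falls back to the first listed synset when no synset of a term belongs to that domain. The counting scheme, the fallback, the function names and all identifiers in the toy data are assumptions.

```python
from collections import Counter


def prevalent_domain(term_synsets, synset_domains):
    """Return the domain that occurs most often across all candidate synsets."""
    counts = Counter(
        domain
        for synsets in term_synsets.values()
        for synset in synsets
        for domain in synset_domains.get(synset, ())
    )
    return counts.most_common(1)[0][0] if counts else None


def disambiguate(term_synsets, synset_domains):
    """Pick, for each term, a synset belonging to the prevalent domain."""
    domain = prevalent_domain(term_synsets, synset_domains)
    chosen = {}
    for term, synsets in term_synsets.items():
        matching = [s for s in synsets if domain in synset_domains.get(s, ())]
        # Fallback (assumption): keep the first synset if none matches.
        chosen[term] = matching[0] if matching else (synsets[0] if synsets else None)
    return domain, chosen


# Toy data with made-up synset identifiers and domain labels.
term_synsets = {
    "network": ["network#1", "network#2"],
    "user": ["user#1"],
    "platform": ["platform#1", "platform#2"],
}
synset_domains = {
    "network#1": {"computer_science"},
    "network#2": {"electronics"},
    "user#1": {"computer_science"},
    "platform#1": {"architecture"},
    "platform#2": {"computer_science"},
}
domain, chosen = disambiguate(term_synsets, synset_domains)
```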
  • 19. Generalization of similar concepts
    Evaluations
    Two toy experiments have been performed with the Hamming distance threshold set to 0.001 and 0.0001 respectively, while the taxonomical similarity function threshold has been kept equal to 0.4.
  • 20. Reasoning ‘by association’
    Breadth-First Search
    Given two nodes (concepts), a Breadth-First Search starts from both nodes: each search looks for the other's frontier, until the two frontiers meet in a common node. The path is then restored by going backward to the roots in both directions.
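The search described on slide 20 can be sketched as a bidirectional Breadth-First Search. The sketch assumes an undirected graph stored as a symmetric adjacency dictionary, ignores the verbs on the edges, and stops at the first node where the two frontiers meet. The function name and the toy graph are hypothetical; the graph only borrows words from the example on slide 21 and is not the authors' data.

```python
from collections import deque


def bidirectional_bfs(graph, start, goal):
    """Return a path from start to goal as a list of nodes, or None.

    graph: dict mapping each node to an iterable of its neighbors;
    it must be symmetric (undirected).
    """
    if start == goal:
        return [start]
    parents_s, parents_g = {start: None}, {goal: None}
    frontier_s, frontier_g = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        # Expand one full level; return the meeting node if the frontiers touch.
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor not in parents:
                    parents[neighbor] = node
                    if neighbor in other_parents:
                        return neighbor
                    frontier.append(neighbor)
        return None

    while frontier_s and frontier_g:
        meet = expand(frontier_s, parents_s, parents_g)
        if meet is None:
            meet = expand(frontier_g, parents_g, parents_s)
        if meet is not None:
            # Walk back to the start root, then forward to the goal root.
            path, node = [], meet
            while node is not None:
                path.append(node)
                node = parents_s[node]
            path.reverse()
            node = parents_g[meet]
            while node is not None:
                path.append(node)
                node = parents_g[node]
            return path
    return None


# Toy undirected graph, invented for illustration.
graph = {
    "adult": ["freedom", "platform"],
    "freedom": ["adult"],
    "platform": ["adult", "technology"],
    "technology": ["platform", "internet"],
    "internet": ["technology"],
    "island": [],
}
path = bidirectional_bfs(graph, "adult", "internet")
```

Searching from both ends keeps each frontier smaller than a single search from one node would, which matters on large concept networks.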
  • 21. Reasoning ‘by association’
    Evaluations
    The table below shows a sample of possible outcomes. E.g., an interpretation of case 5 can be: “the adults write about freedom and use platform, which is recognized as a technology, as is the internet”.
  • 22. Conclusions
    This work proposes an approach to automatically extract conceptual taxonomies from natural language texts. It mixes different techniques in order to:
    ● identify relevant terms/concepts in text;
    ● generalize similar concepts;
    ● perform some kind of reasoning “by association”.
    Preliminary experiments show that this approach can be viable, although extensions and refinements are needed.
    A reliable outcome might help users to understand the text content and machines to automatically perform some kind of reasoning on the taxonomy.
  • 23. Future work
    1. Extending the knowledge representation formalism to express negation.
    2. Defining a strategy to make a better choice of weights in the Relevance Weight computation.
    3. Enriching the adjacency matrix to improve concept descriptions.
    4. Exploring alternatives to the One Domain per Discourse assumption, to overcome its limits.
    5. Extending the taxonomical similarity measures, which take into account only the hypernym relation, with other relations to obtain a more accurate similarity.
    6. Defining a strategy to prefer one verb rather than keeping all of them in the reasoning ‘by association’ phase.
  • 24. References
    [1] Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003.
    [2] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure trees. In LREC, 2006.
    [3] Sang Ok Koo, Soo Yeon Lim, and Sang-Jo Lee. Constructing an ontology based on hub words. In ISMIS’03, pages 93–97, 2003.
    [4] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.