Use of Ontologies in Natural
Language Processing
Athman Hajhamou
Computer and Modeling Laboratory –
USMBA- FSDM – Fès




                                     1
Summary
 Limitations of classical approaches
 Use of Ontology
 State of the Art.




                                        2
Limitations of classical
approaches
   The huge number of documents available
    on the Web makes finding relevant ones
    a challenging task. Full-text search, still
    the most popular form of search and the
    one offered by the most widely used
    services such as Google, is very useful
    for retrieving documents, but it is
    normally not suited to finding relevant
    documents on a specific topic that the
    user has not yet seen.

                                          3
Limitations of classical
approaches
   The major reasons why purely text-based search fails to
    find some of the relevant documents are the following:

   Vagueness of natural language: synonyms, homographs
    and inflected word forms can all fool algorithms that
    see search terms only as sequences of characters.

   High-level, vague concepts: vaguely defined abstract
    concepts such as the Kosovo conflict, the Industrial
    Revolution or the Iraq War are often not mentioned
    explicitly in relevant documents, so present search
    engines cannot find those documents.




                                                          4
Limitations of classical
approaches
 Semantic relations, such as the partOf relation,
  cannot be exploited. For example, users who
  search for the Great Maghreb will not find
  relevant documents that mention only Rabat
  or Morocco.
 Time dimension: keyword matching is not
  adequate for handling time specifications. If
  we search for documents about the
  “XX century” using exactly this phrase,
  relevant resources that contain only character
  sequences such as 1945 or 1956 will not be
  found by simple keyword matching.
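
As a rough illustration of the partOf problem, the following Python sketch contrasts plain keyword matching with a query expanded through a small hand-written partOf table. The table, the sample documents and the function names are all hypothetical; only the Great Maghreb, Morocco and Rabat example comes from the slide.

```python
# Hypothetical partOf facts, stored as child -> parent.
PART_OF = {
    "Rabat": "Morocco",
    "Morocco": "Great Maghreb",
}

def parts_of(region):
    """Return the region plus everything that is transitively part of it."""
    found = {region}
    changed = True
    while changed:
        changed = False
        for child, parent in PART_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def keyword_search(query, docs):
    """Plain case-insensitive substring matching on the query phrase."""
    return [d for d in docs if query.lower() in d.lower()]

def expanded_search(query, docs):
    """Match any document mentioning the region or one of its parts."""
    terms = parts_of(query)
    return [d for d in docs if any(t.lower() in d.lower() for t in terms)]

DOCS = [
    "Rabat hosts a conference on language technology.",
    "Trade agreements signed by Morocco.",
    "Weather report for Oslo.",
]
```

With these sample documents, keyword_search("Great Maghreb", DOCS) finds nothing, while expanded_search("Great Maghreb", DOCS) returns the first two documents.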

                                                 5
Limitations of classical
approaches
   Although most present systems can successfully
    handle the various inflected forms of words using
    stemming algorithms, the many heuristics and
    ranking formulas based on text statistics that
    were developed during classical IR research in
    the last decades do not seem able to master the
    other issues mentioned. One of the reasons is
    that term co-occurrence, which most statistical
    methods use to measure the strength of the
    semantic relation between words, is not a valid
    measure from a linguistic-semantic point of view.

                                                 6
Limitations of classical
approaches
   Besides statistics based on term co-occurrence,
    another way to improve search effectiveness
    is to incorporate background knowledge into
    the search process. So far the IR community
    has concentrated on background knowledge
    expressed in the form of thesauri. Thesauri
    define a set of standard terms that can be
    used to index and search a document
    collection (a controlled vocabulary) and a set
    of linguistic relations between those terms.
    They thus promise a solution to the vagueness
    of natural language, and a partial one to the
    problem of high-level concepts.

                                                  8
Limitations of classical
approaches
   While intuitively one would expect significant gains
    in retrieval effectiveness from the use of thesauri,
    experience shows that this is usually not the case.

 One of the major causes is the “noise” in the relations
  between thesaurus terms. Linguistic relations such as
  synonymy normally hold only between specific
  meanings of two words, but thesauri represent those
  relations on a syntactic level.
 Another big problem is that creating thesauri manually
  and annotating documents with thesaurus terms is
  very expensive. As a result, annotations are often
  incomplete or erroneous, which degrades search
  performance.



                                                             9
Use of Ontology
 Ontologies form the basic infrastructure
  of the Semantic Web.
 By ontology we mean any formalism
  with a well-defined mathematical
  interpretation that can represent at
  least a subconcept taxonomy, concept
  instances and user-defined relations
  between concepts.
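
The three ingredients named above (a subconcept taxonomy, concept instances and user-defined relations) can be sketched as a tiny Python data structure. This is only an illustration of the minimum a formalism must represent, not a real ontology language; the class name, its fields and the sample facts are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    # Subconcept taxonomy: concept -> parent concept.
    subconcept_of: dict = field(default_factory=dict)
    # Concept instances: instance -> concept it belongs to.
    instance_of: dict = field(default_factory=dict)
    # User-defined relations as (subject, relation, object) triples.
    relations: set = field(default_factory=set)

    def ancestors(self, concept):
        """Walk the taxonomy upwards from a concept."""
        result = []
        while concept in self.subconcept_of:
            concept = self.subconcept_of[concept]
            result.append(concept)
        return result

    def is_a(self, instance, concept):
        """True if the instance belongs to the concept or to one of its subconcepts."""
        direct = self.instance_of.get(instance)
        return direct == concept or concept in self.ancestors(direct)

onto = Ontology()
onto.subconcept_of["City"] = "Place"
onto.subconcept_of["Country"] = "Place"
onto.instance_of["Rabat"] = "City"
onto.instance_of["Morocco"] = "Country"
onto.relations.add(("Rabat", "partOf", "Morocco"))
```

Here onto.is_a("Rabat", "Place") holds because City is a subconcept of Place, which is the kind of inference a flat keyword list cannot make.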

                                         10
Use of Ontology
 Such formalisms allow a much more
  sophisticated representation of background
  knowledge than classical thesauri. They
  represent knowledge on the semantic level,
  i.e., they contain semantic entities (concepts,
  relations and instances) instead of simple
  words, which eliminates the noise mentioned
  above from the relations.
 They allow custom semantic relations between
  entities to be specified, and well-known facts
  and axioms about a knowledge domain
  (including temporal information) to be stored.

                                               11
Use of Ontology
   On that basis, ontologies theoretically
    solve all of the problems of full-text
    search mentioned above. Unfortunately,
    ontologies and the semantic annotations
    that use them are hardly ever perfect,
    for the same reasons that were given
    for thesauri. Indeed, good-quality
    ontologies and semantic annotations
    are at present a very scarce resource.

                                          12
State Of the Art


 Ontologies as Background Knowledge
    to Explore Document Collections

   Nathalie Aussenac-Gilles & Josiane Mothe


   Institut de Recherche en Informatique de Toulouse




                                                       13
Ontologies as Background Knowledge to
Explore Document Collections

   An alternative way to go beyond bags of words could be
    to organise indexing terms into a more complex
    structure than "bags", such as a hierarchy or an
    ontology. Texts would be indexed by concepts that
    reflect their meaning rather than by words treated as
    character strings, with all the ambiguity that they convey.

   Nathalie A. & Josiane M. promote an approach where
    information search and exploration take place in a
    domain-dependent semantic context. This context is
    described through its controlled vocabulary, organized
    along hierarchies that are all extracted from a single,
    unifying domain ontology. Each hierarchy reveals a
    given point of view on the domain, that is to say a
    dimension.


                                                         14
Ontologies as Background Knowledge to
Explore Document Collections

   In this approach, the ontology and the derived hierarchies
    provide the query language for users. Not only can users
    browse the concept hierarchies and select the terms they
    want to add to their query, but the hierarchies also let
    them explore the information space from different points
    of view, through the domain vocabulary and its structure.

   Given a domain, a user defines his or her own information
    space. It is composed of a selection of hierarchies, or
    dimensions, among the set of possible ones. This
    selection reflects the user's focus of interest and leads
    to the identification of the associated documents.




                                                            15
Ontologies as Background Knowledge to
Explore Document Collections

   Dimensions and their visualization
    define a novel way to provide users
    with global views and knowledge of
    the document collection. A key
    component of this approach is that
    the domain ontology makes it possible
    to define a visual presentation of the
    entire collection or of a sub-collection
    based on multi-dimensional analysis,
    as is done in OLAP systems.

                                           16
Ontologies as Background Knowledge to
Explore Document Collections

   Strengths :
    With the help of the ontology, users should be able
     to express their needs more easily.
    Documents can be seen along many dimensions (or
     points of view), which could be used to extract
     some knowledge from their content.
    For the document categorization task, a concept from
     an ontology can be viewed as a category.
   Weaknesses :
    Building an ontology is a complex and time-consuming
     task: experts (domain and ontology experts) often
     do it manually.
    The evolution of domain knowledge is problematic: for
     example, new terms appear and other terms fall out
     of use.

                                                          19
State Of the Art


   Ontological Profiles as Semantic
       Domain Representations

      Geir Solskinnsbakk & Jon Atle Gulla

    Norwegian University of Science and Technology




                                                     20
Ontological Profiles as Semantic Domain
Representations

 Ontologies for query disambiguation or reformulation
  seem more promising, though there is a fundamental
  problem in comparing ontology concepts with query
  or document terms. Concepts are abstract notions that
  are not necessarily linked to a particular term.
  Sometimes a number of terms refer to the same
  concept, and sometimes a specific term is a
  realization of different concepts depending on the
  context.
 Using conceptual structures to index or retrieve
  document text requires something that bridges
  the conceptual and the real world.
 Research indicates that ontologies are of little use if
  they are not aligned with the documents indexed by the
  search application.


                                                        21
Ontological Profiles as Semantic Domain
Representations

   Geir S. & Jon A. G. present an ontology
    enrichment approach that both bridges
    the conceptual and real world and
    ensures that the ontology is well adapted
    to the documents at hand.

   The idea is to provide contextual concept
    characterizations that reveal how the
    concepts are referred to semantically in
    the document collection.

                                            22
Ontological Profiles as Semantic Domain
Representations

   An ontological profile is an extension of a domain
    ontology. The ontology is extended with semantically
    related terms. These terms are added as vectors for
    each of the concepts of the ontology.

   This means that in the ontological profile each concept
    is associated with a vector of semantically related terms
    (concept vector). The terms are given weights to reflect
    the importance of the semantic relation between the
    concept and the terms.

   The concept vectors typically contain terms that are
    synonyms of the concept.
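
A concept vector of this kind can be pictured as a mapping from terms to weights. The Python sketch below is only an illustration: the profile contents, the weights and the helper top_terms are hypothetical and are not taken from the authors' system.

```python
# Hypothetical ontological profile: concept name -> {term: weight}.
PROFILE = {
    "Car": {"automobile": 0.9, "vehicle": 0.6, "engine": 0.3},
    "Road": {"highway": 0.8, "street": 0.7},
}

def top_terms(profile, concept, k=2):
    """Return the k highest-weighted terms of a concept vector."""
    vector = profile.get(concept, {})
    ranked = sorted(vector.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ranked[:k]]
```

For the sample profile, top_terms(PROFILE, "Car") picks "automobile" and "vehicle", the two terms most strongly related to the concept.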




                                                            23
Ontological Profiles as Semantic Domain
Representations

   The construction of these ontological profiles is
    based on three different aspects of the content of
    the documents used.
    The first is statistical: the frequency of the terms in
     the documents is counted. Terms that co-occur with a
     concept more frequently are hypothesized to be more
     relevant to that concept than terms that co-occur
     with it less frequently.
    The second is linguistic: stemming is applied to
     collapse certain terms into a single form.
    The third is a proximity analysis of the text. The
     underlying assumption is that the closer together
     terms are found in the text, the more semantically
     related they are.


                                                            25
Ontological Profiles as Semantic Domain
Representations

   The highest weight is given to terms found in the same
    sentence as the concept name phrase (the highest
    semantic coherence). Terms found in the same paragraph
    as the concept are given a lower weight than sentence
    terms, but a higher weight than document terms.

   The basis for the weight calculation is the term
    frequency of each term found in the relevant
    documents.

   Applying the familiar tf*idf score to the frequencies
    brings us closer to the final representation of the
    vectors. The idf factor gives more importance to terms
    that are found in few documents across the document
    collection.



                                                             27
Ontological Profiles as Semantic Domain
Representations

    $vf_{i,j} = a \sum_{k \in D} f_{i,k} + b \sum_{k \in P} f_{i,k} + c \sum_{k \in S} f_{i,k}$

    $vf_{i,j}$ is the term frequency for term i in
    concept vector j, $f_{i,k}$ is the term frequency
    for term i in document vector k, D, P, and
    S are the possibly empty sets of relevant
    documents, paragraph documents and
    sentence documents assigned to j, and
    a = 0.1, b = 1.0, and c = 10.0 are the constant
    modifiers for documents, paragraph
    documents, and sentence documents,
    respectively.
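
The weighting described above can be sketched in Python. This is a minimal sketch, assuming the concept-vector frequency of a term is the sum of its frequencies over the sets D, P and S, each scaled by its constant modifier. Reading the modifiers as 0.1, 1.0 and 10.0 is an assumption about the intended decimals, and the function name and token lists are hypothetical.

```python
from collections import Counter

# Assumed modifier values for document-, paragraph- and sentence-level
# evidence (an assumption about the intended decimals).
A, B, C = 0.1, 1.0, 10.0

def concept_term_frequencies(docs, paragraphs, sentences):
    """Weighted term frequencies for one concept vector.

    Each argument is a list of token lists assigned to the concept
    (the sets D, P and S); any of them may be empty.
    """
    weighted = Counter()
    for modifier, units in ((A, docs), (B, paragraphs), (C, sentences)):
        for tokens in units:
            for term, freq in Counter(tokens).items():
                weighted[term] += modifier * freq
    return weighted

vf = concept_term_frequencies(
    docs=[["well", "drilling", "well", "rig"]],
    paragraphs=[["well", "drilling"]],
    sentences=[["well"]],
)
```

In this toy input, "well" appears at all three levels and so receives by far the largest weight, while "rig", seen only at document level, receives the smallest.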
                                             28
Ontological Profiles as Semantic Domain
Representations

    $tfidf_{i,j} = \frac{vf_{i,j}}{\max_l vf_{l,j}} \cdot \log \frac{N}{n_i}$

    $tfidf_{i,j}$ is the tfidf score for term i in
    concept vector j, $vf_{i,j}$ is the term
    frequency for term i in concept vector
    j, $\max_l vf_{l,j}$ is the frequency of the
    most frequently occurring term in concept
    vector j, N is the number of concept
    vectors, and $n_i$ is the number of
    concept vectors containing term i.
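
The scoring described above can be sketched as follows. This is a minimal sketch, assuming the score is the term frequency normalised by the largest frequency in the same concept vector, multiplied by log(N / n). The natural logarithm, the function name and the sample vectors are assumptions made for illustration.

```python
import math

def tfidf_vectors(concept_vectors):
    """Turn {concept: {term: frequency}} into {concept: {term: tfidf}}.

    Frequencies are assumed to be positive.
    """
    n_vectors = len(concept_vectors)
    # Number of concept vectors that contain each term.
    containing = {}
    for vector in concept_vectors.values():
        for term in vector:
            containing[term] = containing.get(term, 0) + 1

    scored = {}
    for concept, vector in concept_vectors.items():
        max_freq = max(vector.values(), default=0)
        scored[concept] = {
            term: (freq / max_freq) * math.log(n_vectors / containing[term])
            for term, freq in vector.items()
        }
    return scored

profiles = tfidf_vectors({
    "Well": {"drilling": 4.0, "oil": 2.0},
    "Pipeline": {"oil": 3.0, "valve": 3.0},
})
```

In the sample, "oil" occurs in both concept vectors, so its idf factor is log(2/2) = 0 and it drops out of both profiles, whereas "drilling" and "valve" keep a positive score.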

                                          29
Ontological Profiles as Semantic Domain
Representations

   Strengths :
     The ontological profile is used as a tool for the
      semantic reformulation of queries on top of a standard
      vector-space search engine (Apache Lucene): the
      reformulated query is submitted to the index. This lets
      the system hide from the user the fact that an ontology
      is used; the user only has to enter familiar keyword
      queries.
   Weaknesses :
     In this approach the concept name is used as a phrase
      query into the three indexes, and all documents
      containing the phrase are assigned to the concept as
      relevant. Using the concept name as a phrase query
      poses a challenge: some concept names are artificial in
      their construction or are not used in documents in the
      form given in the ontology. As a result, many of the
      concepts are not found when documents are assigned to
      concepts.

                                                                  30
State Of the Art


    An Ontology-Based Information
            Retrieval Model

 David Vallet, Miriam Fernández & Pablo Castells

           Universidad Autónoma de Madrid




                                                   31
An Ontology-Based Information Retrieval
Model

   David V., Miriam F. & Pablo C. propose an ontology-based
    retrieval model meant for the exploitation of full-fledged
    domain ontologies and knowledge bases (KBs), to support
    semantic search in document repositories. In contrast to
    Boolean semantic search systems, this approach returns full
    documents, rather than specific ontology values from a KB,
    in response to user information needs. The search system
    takes advantage of both the detailed instance-level knowledge
    available in the KB and topic taxonomies for classification.

   The approach includes an ontology-based scheme for the
    semi-automatic annotation of documents, and a retrieval
    system. The retrieval model is based on an adaptation of the
    classic vector-space model, including an annotation weighting
    algorithm and a ranking algorithm.




                                                                32
An Ontology-Based Information Retrieval
Model

   The system requires that the knowledge base be
    constructed from three main base classes:
    DomainConcept, Taxonomy, and Document.
    DomainConcept should be the root of all domain
     classes that can be used (directly or after
     subclassing) to create instances that describe specific
     entities referred to in the documents.
    Document is used to create instances that act as
     proxies of documents from the information source to
     be searched.
    Taxonomy is the root for class hierarchies that are
     merely used as classification schemes, and are never
     instantiated. These taxonomies are expected to be
     used as a terminology to annotate documents and
     concept classes, using them as values of dedicated
     properties.

                                                           34
An Ontology-Based Information Retrieval
Model

   The predefined base ontology classes described above are
    complemented with an annotation ontology that provides the
    basis for the semantic indexing of documents with
    non-embedded annotations.

   Documents are annotated with concept instances from the
    KB by creating instances of the Annotation class, provided for
    this purpose. Annotation has two relational properties,
    instance and document, by which concepts and documents
    are related to each other. Reciprocally, DomainConcept
    and Document have a multivalued annotation property.

   Annotations can be created manually by a domain expert, or
    semi-automatically. The subclasses ManualAnnotation and
    AutomaticAnnotation are used for the two cases respectively.


                                                                  35
An Ontology-Based Information Retrieval
Model

   DomainConcept instances use a label property to
    store the most usual text form of the concept
    class or instance. This property is multivalued,
    since instances may have several textual lexical
    variants.

   Whenever the label of an instance is found, an
    annotation is created between the instance and
    the document. In the system, documents can be
    annotated with classes as well, by assigning
    labels to concept classes.

   The annotations are used by the retrieval and
    ranking module.
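
The label-driven annotation step described above can be sketched in Python. This is a simplified illustration, not the authors' implementation: the Annotation dataclass only mirrors the two properties named on the slides (instance and document), matching is naive case-insensitive substring search, and the identifiers and labels are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    instance: str   # identifier of the DomainConcept instance
    document: str   # identifier of the Document proxy

def annotate(doc_id, text, labels):
    """Create one Annotation per instance whose label occurs in the text.

    `labels` maps an instance identifier to its list of label variants
    (the label property is multivalued).
    """
    lowered = text.lower()
    return [
        Annotation(instance, doc_id)
        for instance, variants in labels.items()
        if any(v.lower() in lowered for v in variants)
    ]

LABELS = {
    "inst:UAM": ["Universidad Autónoma de Madrid", "UAM"],
    "inst:Madrid": ["Madrid"],
    "inst:Toulouse": ["Toulouse"],
}
found = annotate("doc1", "A talk given at UAM in Madrid.", LABELS)
```

Substring matching is deliberately crude here; a real annotator would at least tokenize the text and handle ambiguous labels, which is where the quality of the concept labels starts to matter.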

                                                   36
An Ontology-Based Information Retrieval
Model

   In the classic vector-space model, keywords
    appearing in a document are assigned weights
    reflecting that some words are better at
    discriminating between documents than others.

   Similarly, in this approach annotations are
    assigned a weight that reflects how relevant the
    instance is considered to be to the meaning of
    the document.

   Weights are computed automatically by an
    adaptation of the TF-IDF algorithm, based on the
    frequency of occurrence of the instances in each
    document.

                                                   37
An Ontology-Based Information Retrieval
Model

    $w_{ij} = \frac{freq_{ij}}{\max_k freq_{kj}} \cdot \log \frac{N}{n_i}$

    $w_{ij}$ is the weight of instance $I_i$ for
    document $D_j$, $freq_{ij}$ is the number of
    occurrences of $I_i$ in $D_j$, $\max_k freq_{kj}$ is the
    frequency of the most repeated
    instance in $D_j$, $n_i$ is the number of
    documents annotated with $I_i$, and N is
    the total number of documents in the
    search space.
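
As a worked example, assume the weight takes the usual TF-IDF form suggested by the definitions above, $w_{ij} = \frac{freq_{ij}}{\max_k freq_{kj}} \cdot \log \frac{N}{n_i}$, with a natural logarithm (the base is an assumption). The figures are hypothetical: instance $I_i$ occurs 3 times in $D_j$, the most repeated instance in $D_j$ occurs 6 times, 10 documents are annotated with $I_i$, and the search space holds 1000 documents.

```latex
w_{ij} = \frac{3}{6} \cdot \ln \frac{1000}{10} = 0.5 \cdot \ln 100 \approx 0.5 \cdot 4.605 \approx 2.30
```

An instance that annotated every document would get $\ln(1000/1000) = 0$, so it would contribute nothing to the ranking however often it occurred.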

                                          38
An Ontology-Based Information Retrieval
Model

   The system takes as input a formal RDQL query.
    This query could be generated from a keyword
    query, a natural language query, a form-based
    interface where the user can explicitly select
    ontology classes and enter property values, or
    more sophisticated search interfaces.

   The RDQL query is executed against the
    knowledge base, which returns a list of instance
    tuples that satisfy the query. The documents
    that are annotated with these instances are
    then retrieved, ranked, and presented to the user.



                                                   39
An Ontology-Based Information Retrieval
Model

   Strengths :
    Better recall when querying for class
     instances and using class hierarchies and
     rules.
    Better precision by using query weights and
     structured semantic queries.
   Weaknesses :
    The degree of improvement of this semantic
     retrieval model depends on the completeness
     and quality of the ontology, the KB, and the
     concept labels.

                                                41
State Of the Art


    Improving information retrieval
     effectiveness by using domain
     knowledge stored in ontologies

               Gabor Nagypal

        University of Karlsruhe, Germany




                                           42
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   The quality of results that traditional full-text search engines
    provide is still not optimal for many types of user queries.
    In particular, the vagueness of natural language, abstract
    concepts, semantic relations and temporal issues are
    handled inadequately by full-text search. Ontologies and
    semantic metadata can provide a solution to these problems.

   The goal of this thesis is to examine and validate whether and
    how ontologies can help improve retrieval effectiveness in
    information systems, considering the inherent imperfection of
    ontology-based domain models and annotations.

   This work examines how ontologies can be optimally
    exploited during the information retrieval process, and
    proposes a general framework based on ontology-supported
    semantic metadata generation and ontology-based
    query expansion.


                                                                   43
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   This research evaluates the following hypotheses :

    Ontologies allow domain knowledge to be stored in a much
     more sophisticated form than thesauri. It is therefore
     assumed that a significant gain in retrieval effectiveness
     can be measured when ontologies are used in IR systems.
    The better (more precisely) an ontology models the
     application domain, the greater the gain in retrieval
     effectiveness.
    It is possible to diminish the negative effect of ontology
     imperfection on search results by combining different
     ontology-based heuristics during the search process.
    It is well known that there is a trade-off between
     algorithm complexity and performance. This also holds
     for ontologies. Still, the assumption of this approach is
     that by combining ontologies with traditional IR methods,
     it is possible to provide results with acceptable performance.

                                                                   44
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   Background knowledge stored in the form of
    ontologies can be used at practically every step
    of the IR process.

   In this work, solutions are therefore provided for
    the issues of ontology-based query extension,
    ontology-supported query formulation and
    ontology-supported metadata generation
    (indexing).

   This leads to a conceptual system architecture
    where the Ontology Manager component has a
    central role and is used extensively by the
    Indexer, Search Engine and GUI components.

                                                         45
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   The information model defines how documents and the
    user query are represented in the system. The model
    used in this work represents the content of a resource
    as a weighted set of instances (bag of ontology
    instances) from a suitable domain ontology (the
    conceptual part) together with a weighted set of temporal
    intervals (the temporal part).

   The representation of the conceptual part is practically
    identical to the information model used by classical IR
    engines built on the vector space model, with the
    difference that vector terms are ontology instances
    instead of words in a natural language.




                                                            47
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   Time, as a continuous phenomenon, has different
    characteristics from the discrete conceptual part
    of the information model. The first question
    regarding time is how to define similarity among
    weighted sets of time intervals.

   A possible solution, which is being considered, is
    to use the temporal vector space model. The
    main idea of the model is that if we choose a
    discrete time representation, the lowest level of
    granules can be viewed as terms, and the vector
    space model is then applicable to the time
    dimension as well.
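
The idea of treating time granules as vector terms can be sketched in Python. This is a minimal sketch under stated assumptions: days are chosen as the lowest-level granule, every granule in an interval gets the same weight, and cosine similarity is used as the vector-space measure. None of these choices is confirmed by the slides, and the function names are hypothetical.

```python
import math
from datetime import date, timedelta

def day_granules(start, end, weight=1.0):
    """Expand a closed interval of dates into {day: weight} terms."""
    days = (end - start).days + 1
    return {start + timedelta(days=i): weight for i in range(days)}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

query = day_granules(date(1945, 1, 1), date(1945, 12, 31))
doc_a = day_granules(date(1945, 5, 1), date(1945, 5, 31))
doc_b = day_granules(date(1956, 1, 1), date(1956, 12, 31))
```

A query for the year 1945 then overlaps with a document about May 1945 (doc_a) but has zero similarity with a document about 1956 (doc_b), even though no date string is shared literally.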


                                                         48
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   During query formulation the ontology is used only to
    disambiguate queries specified in textual form. A
    classical full-text search is run on the ontology labels,
    so users only have to choose the proper term
    interpretation.

   The query process applies various ontology-based
    heuristics one by one to create separate queries, which
    are executed independently using a traditional full-text
    engine. The ranked results are then combined
    to form the final ranked result list. The combination of
    results is based on the belief network model, which
    allows various pieces of evidence to be combined using
    Bayesian inference.



                                                           49
Improving information retrieval effectiveness by using
domain knowledge stored in ontologies

   Strengths :
     This work validates that the proposed solution significantly
      improves the retrieval effectiveness of information systems
      and thus provides a strong motivation for developing
      ontologies and semantic metadata.
     The gradual approach described allows a smooth transition
      from classical text-based systems to ontology-based ones.

   Weaknesses :
     A problem with the temporal vector space approach is the
      potentially huge number of time granules that are
      generated for long time intervals. For example, to represent
      the existence time of concepts such as the Middle Ages,
      potentially many tens of thousands of terms are needed if
      days are used as granules.



                                                                51

Information storage and retrievalInformation storage and retrieval
Information storage and retrieval
 
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
Talent Sourcing and Matching - Artificial Intelligence and Black Box Semantic...
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...
 
Framester and WFD
Framester and WFD Framester and WFD
Framester and WFD
 
Boost Your Text Analytics Accuracy - MeaningCloud Webinar
Boost Your Text Analytics Accuracy - MeaningCloud WebinarBoost Your Text Analytics Accuracy - MeaningCloud Webinar
Boost Your Text Analytics Accuracy - MeaningCloud Webinar
 
Semantic Matching of Components at Run-Time in Distributed Environments
Semantic Matching of Components at Run-Time in Distributed EnvironmentsSemantic Matching of Components at Run-Time in Distributed Environments
Semantic Matching of Components at Run-Time in Distributed Environments
 
Information Retrieval Using an Ontological Web-Trading Model
Information Retrieval Using an Ontological Web-Trading ModelInformation Retrieval Using an Ontological Web-Trading Model
Information Retrieval Using an Ontological Web-Trading Model
 
Ontological Analysis and Conceptual Modelling: Achievements and Perspectives
Ontological Analysis and Conceptual Modelling: Achievements and PerspectivesOntological Analysis and Conceptual Modelling: Achievements and Perspectives
Ontological Analysis and Conceptual Modelling: Achievements and Perspectives
 
KR Workshop 1 - Ontologies
KR Workshop 1 - OntologiesKR Workshop 1 - Ontologies
KR Workshop 1 - Ontologies
 
NLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology ConstraintsNLP Data Cleansing Based on Linguistic Ontology Constraints
NLP Data Cleansing Based on Linguistic Ontology Constraints
 
Semantic Search Engines
Semantic Search EnginesSemantic Search Engines
Semantic Search Engines
 
Intriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platformIntriduction to Ontotext's KIM platform
Intriduction to Ontotext's KIM platform
 
Adding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to DeliveryAdding Semantic Edge to Your Content – From Authoring to Delivery
Adding Semantic Edge to Your Content – From Authoring to Delivery
 
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog TextTwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
TwitIE: An Open-Source Information Extraction Pipeline for Microblog Text
 
Ontological approach for improving semantic web search results
Ontological approach for improving semantic web search resultsOntological approach for improving semantic web search results
Ontological approach for improving semantic web search results
 
A Taxonomy of Semantic Web data Retrieval Techniques
A Taxonomy of Semantic Web data Retrieval TechniquesA Taxonomy of Semantic Web data Retrieval Techniques
A Taxonomy of Semantic Web data Retrieval Techniques
 
In Search of a Semantic Book Search Engine: Are We There Yet?
In Search of a Semantic Book Search Engine: Are We There Yet?In Search of a Semantic Book Search Engine: Are We There Yet?
In Search of a Semantic Book Search Engine: Are We There Yet?
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 

Ähnlich wie Use of ontologies in natural language processing

Representation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelRepresentation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object model
Mihika Shah
 
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
IJwest
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
cscpconf
 
Nguyen
NguyenNguyen
Nguyen
anesah
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
eswcsummerschool
 

Ähnlich wie Use of ontologies in natural language processing (20)

SWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professionalSWSN UNIT-3.pptx we can information about swsn professional
SWSN UNIT-3.pptx we can information about swsn professional
 
The basics of ontologies
The basics of ontologiesThe basics of ontologies
The basics of ontologies
 
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONSONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS
ONTOLOGICAL MODEL FOR CHARACTER RECOGNITION BASED ON SPATIAL RELATIONS
 
Swoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic SearchSwoogle: Showcasing the Significance of Semantic Search
Swoogle: Showcasing the Significance of Semantic Search
 
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
A Corpus-based Analysis of the Terminology of the Social Sciences and Humanit...
 
Ontology
OntologyOntology
Ontology
 
Building an Ontology in Educational Domain Case Study for the University of P...
Building an Ontology in Educational Domain Case Study for the University of P...Building an Ontology in Educational Domain Case Study for the University of P...
Building an Ontology in Educational Domain Case Study for the University of P...
 
Representation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object modelRepresentation of ontology by Classified Interrelated object model
Representation of ontology by Classified Interrelated object model
 
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
Association Rule Mining Based Extraction of Semantic Relations Using Markov L...
 
A Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital LibrariesA Domain Based Approach to Information Retrieval in Digital Libraries
A Domain Based Approach to Information Retrieval in Digital Libraries
 
Association Rule Mining Based Extraction of Semantic Relations Using Markov ...
Association Rule Mining Based Extraction of  Semantic Relations Using Markov ...Association Rule Mining Based Extraction of  Semantic Relations Using Markov ...
Association Rule Mining Based Extraction of Semantic Relations Using Markov ...
 
SMalL - Semantic Malware Log Based Reporter
SMalL  - Semantic Malware Log Based ReporterSMalL  - Semantic Malware Log Based Reporter
SMalL - Semantic Malware Log Based Reporter
 
The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...The impact of standardized terminologies and domain-ontologies in multilingua...
The impact of standardized terminologies and domain-ontologies in multilingua...
 
What is What, When?
What is What, When?What is What, When?
What is What, When?
 
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING
 
Nguyen
NguyenNguyen
Nguyen
 
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using OntologiesESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
ESWC SS 2012 - Tuesday Tutorial Elena Simperl: Creating and Using Ontologies
 
A Natural Logic for Artificial Intelligence, and its Risks and Benefits
A Natural Logic for Artificial Intelligence, and its Risks and Benefits A Natural Logic for Artificial Intelligence, and its Risks and Benefits
A Natural Logic for Artificial Intelligence, and its Risks and Benefits
 
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITSA NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
 
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITSA NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
A NATURAL LOGIC FOR ARTIFICIAL INTELLIGENCE, AND ITS RISKS AND BENEFITS
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Use of ontologies in natural language processing

  • 1. Use of Ontologies in Natural Language Processing
      - Athman Hajhamou, Computer and Modeling Laboratory, USMBA-FSDM, Fès
  • 2. Summary
      - Limitations of classical approaches
      - Use of Ontology
      - State of the Art
  • 3. Limitations of classical approaches
      - The huge number of documents available on the Web makes finding the relevant ones a challenging task. Full-text search, still the most popular form of search and the one offered by the most widely used services such as Google, is very useful for retrieving documents, but it is normally not suited to finding relevant documents on a specific topic that the user has not yet seen.
  • 4. Limitations of classical approaches
      - The major reasons why purely text-based search fails to find some of the relevant documents are the following:
      - Vagueness of natural language: synonyms, homographs and inflection of words can all fool algorithms that see search terms only as a sequence of characters.
      - High-level, vague concepts: vaguely defined abstract concepts such as the Kosovo conflict, the Industrial Revolution or the Iraq War are often not mentioned explicitly in relevant documents, so present search engines cannot find those documents.
  • 5. Limitations of classical approaches
      - Semantic relations, such as the partOf relation, cannot be exploited. For example, users who search for the Great Maghreb will not find relevant documents that mention only Rabat or Morocco.
      - Time dimension: keyword matching is not adequate for handling time specifications. If we search for documents about the “XX century” using exactly this phrase, relevant resources that contain only character sequences such as 1945 or 1956 will not be found by simple keyword matching.
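
The partOf limitation can be illustrated with a minimal sketch. The place names come from the slide; the `PART_OF` table, the helper names and the two sample documents are hypothetical, and substring matching merely stands in for a real full-text engine.

```python
# Hypothetical partOf facts; the place names are taken from the slide's example.
PART_OF = {
    "Rabat": "Morocco",
    "Morocco": "Great Maghreb",
}

def parts_of(region, part_of=PART_OF):
    """Return every entity that is transitively part of `region`."""
    found = set()
    frontier = [region]
    while frontier:
        whole = frontier.pop()
        for part, parent in part_of.items():
            if parent == whole and part not in found:
                found.add(part)
                frontier.append(part)
    return found

def keyword_search(query, docs):
    """Plain substring matching, standing in for full-text search."""
    return [d for d in docs if query.lower() in d.lower()]

def expanded_search(query, docs):
    """Match the query term or any entity that is transitively part of it."""
    terms = {query} | parts_of(query)
    return [d for d in docs if any(t.lower() in d.lower() for t in terms)]

docs = ["Rabat hosts a new tramway line.", "Weather report for Oslo."]
print(keyword_search("Great Maghreb", docs))   # keyword matching finds nothing
print(expanded_search("Great Maghreb", docs))  # partOf expansion reaches the Rabat document
```

The same idea would apply to the time dimension: a mapping from years to centuries plays the role that `PART_OF` plays here.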
  • 6. Limitations of classical approaches
      - Although most present systems can successfully handle the various inflected forms of words by using stemming algorithms, the many heuristics and ranking formulas based on text statistics that classical IR research developed over the last decades seem unable to master the other issues mentioned. One reason is that term co-occurrence, which most statistical methods use to measure the strength of the semantic relation between words, is not valid from a linguistic-semantic point of view.
  • 8. Limitations of classical approaches
      - Besides statistics based on term co-occurrence, another way to improve search effectiveness is to incorporate background knowledge into the search process. So far the IR community has concentrated on background knowledge expressed in the form of thesauri. A thesaurus defines a set of standard terms that can be used to index and search a document collection (a controlled vocabulary) and a set of linguistic relations between those terms. It thus promises a solution to the vagueness of natural language and, partially, to the problem of high-level concepts.
  • 9. Limitations of classical approaches
      - While intuitively one would expect significant gains in retrieval effectiveness from the use of thesauri, experience shows that this is usually not the case.
      - One of the major causes is the “noise” in the relations between thesaurus terms. Linguistic relations such as synonymy are normally valid only between specific meanings of two words, but thesauri represent those relations at the syntactic level.
      - Another big problem is that creating thesauri manually and annotating documents with thesaurus terms is very expensive. As a result, annotations are often incomplete or erroneous, which decreases search performance.
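
A small sketch of the noise problem, assuming a hypothetical word-level thesaurus in which "bank" lists synonyms for two different senses. Because the relation is stored per word rather than per meaning, expansion pulls in documents about the wrong sense.

```python
# Hypothetical word-level thesaurus: both senses of "bank" share one entry.
THESAURUS = {"bank": {"shore", "lender"}}

def expand(query_terms, thesaurus=THESAURUS):
    """Add every listed synonym of every query term, ignoring word sense."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded |= thesaurus.get(term, set())
    return expanded

def matches(terms, doc):
    """True if the document shares at least one word with the term set."""
    return bool(terms & set(doc.lower().split()))

docs = [
    "the lender raised interest rates",  # financial sense, relevant
    "waves eroded the shore",            # river sense, noise for a finance query
]
query = expand({"bank"})
print([matches(query, d) for d in docs])  # both documents match
```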
  • 10. Use of Ontology
      - Ontologies form the basic infrastructure of the Semantic Web.
      - By ontology we mean any formalism with a well-defined mathematical interpretation that is capable of representing at least a subconcept taxonomy, concept instances and user-defined relations between concepts.
  • 11. Use of Ontology
      - Such formalisms allow a much more sophisticated representation of background knowledge than classical thesauri. They represent knowledge at the semantic level, i.e., they contain semantic entities (concepts, relations and instances) instead of simple words, which eliminates the noise in the relations mentioned earlier.
      - They allow custom semantic relations between entities to be specified, and they can also store well-known facts and axioms about a knowledge domain (including temporal information).
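
The three minimum ingredients named in the definition (subconcept taxonomy, concept instances, user-defined relations) can be held in a very small data structure. This is only a sketch: the class name `Ontology`, its fields and the sample facts are assumptions made for illustration, and it has none of the formal semantics a real ontology language provides.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    subconcept_of: dict = field(default_factory=dict)  # concept -> parent concept
    instance_of: dict = field(default_factory=dict)    # instance -> concept
    relations: set = field(default_factory=set)        # (subject, relation, object)

    def ancestors(self, concept):
        """Walk up the subconcept taxonomy from `concept` to the root."""
        chain = []
        while concept in self.subconcept_of:
            concept = self.subconcept_of[concept]
            chain.append(concept)
        return chain

    def is_a(self, instance, concept):
        """True if the instance belongs to `concept` directly or by inheritance."""
        direct = self.instance_of.get(instance)
        return direct == concept or concept in self.ancestors(direct)

onto = Ontology(
    subconcept_of={"City": "Place", "Country": "Place"},
    instance_of={"Rabat": "City", "Morocco": "Country"},
    relations={("Rabat", "capitalOf", "Morocco")},
)
print(onto.is_a("Rabat", "Place"))
```

Because relations connect entities rather than words, a relation such as `capitalOf` holds for one specific meaning only, which is the property the slide contrasts with thesauri.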
  • 12. Use of Ontology
      - On this basis, ontologies theoretically solve all of the problems of full-text search mentioned above. Unfortunately, ontologies and the semantic annotations that use them are hardly ever perfect, for the same reasons that were described for thesauri. Indeed, good-quality ontologies and semantic annotations are at present a very scarce resource.
  • 13. State of the Art: Ontologies as Background Knowledge to Explore Document Collections
      - Nathalie Aussenac-Gilles & Josiane Mothe, Institut de Recherche en Informatique de Toulouse
  • 14. Ontologies as Background Knowledge to Explore Document Collections
      - An alternative way to go beyond bags of words is to organise indexing terms into a structure more complex than a "bag", such as a hierarchy or an ontology. Texts would then be indexed by concepts that reflect their meaning rather than by words treated as character strings, with all the ambiguity they convey.
      - Aussenac-Gilles and Mothe promote an approach in which information search and exploration take place in a domain-dependent semantic context. This context is described through its controlled vocabulary, organized along hierarchies that are all extracted from a single, unifying domain ontology. Each hierarchy reveals a given point of view on the domain, that is to say a dimension.
  • 15. Ontologies as Background Knowledge to Explore Document Collections
      - In this approach, the ontology and the hierarchies derived from it provide the query language for users. Users can browse the concept hierarchies and select the terms they want to add to a query, and they can also explore the information space from different points of view through the domain vocabulary and its structure.
      - Given a domain, each user defines his or her own information space. It is composed of a selection of hierarchies, or dimensions, among the set of possible ones. This selection depicts the user's focus of interest and leads to the identification of the associated documents.
  • 16. Ontologies as Background Knowledge to Explore Document Collections
      - Dimensions and their visualization define a novel way to give users global views and knowledge of the document collection. A key component of this approach is that the domain ontology makes it possible to define a visual presentation of the entire collection, or of a sub-collection, based on multi-dimensional analysis, as is done in OLAP systems.
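
The multi-dimensional view can be sketched as a cross-tabulation of documents over two concept hierarchies, with a one-level roll-up as in OLAP. The two hierarchies (`TOPIC`, `REGION`) and the four indexed documents are invented for illustration and are not taken from the authors' system.

```python
from collections import Counter

# Hypothetical dimensions, each a hierarchy given as concept -> parent concept.
TOPIC = {"stemming": "NLP", "parsing": "NLP", "indexing": "IR"}
REGION = {"Fès": "Morocco", "Rabat": "Morocco", "Toulouse": "France"}

# Hypothetical documents, each indexed by one concept per dimension.
DOCS = [
    {"id": 1, "topic": "stemming", "region": "Fès"},
    {"id": 2, "topic": "parsing", "region": "Rabat"},
    {"id": 3, "topic": "indexing", "region": "Toulouse"},
    {"id": 4, "topic": "indexing", "region": "Rabat"},
]

def roll_up(value, hierarchy):
    """Replace a concept by its parent; top-level concepts stay unchanged."""
    return hierarchy.get(value, value)

def cross_tab(docs):
    """Count documents in each (topic, region) cell after rolling up one level."""
    cells = Counter()
    for doc in docs:
        cell = (roll_up(doc["topic"], TOPIC), roll_up(doc["region"], REGION))
        cells[cell] += 1
    return cells

for cell, count in sorted(cross_tab(DOCS).items()):
    print(cell, count)
```

Each cell count gives a global view of a sub-collection; drilling down would simply skip the `roll_up` step on one of the dimensions.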
  • 17. Ontologies as Background Knowledge to Explore Document Collections
  • 18. Ontologies as Background Knowledge to Explore Document Collections
  • 19. Ontologies as Background Knowledge to Explore Document Collections
      - Strengths:
          - With the help of the ontology, users should be able to express their needs more easily.
          - Documents can be seen along many dimensions (or points of view), which can be used to extract knowledge from their content. For the document categorization task, a concept from an ontology can be viewed as a category.
      - Weaknesses:
          - Building an ontology is a complex and time-consuming task that experts (domain and ontology experts) often carry out manually.
          - The evolution of domain knowledge is problematic: for example, new terms appear and other terms fall out of use.
  • 20. State of the Art: Ontological Profiles as Semantic Domain Representations
      - Geir Solskinnsbakk & Jon Atle Gulla, Norwegian University of Science and Technology
  • 21. Ontological Profiles as Semantic Domain Representations
      - Using ontologies for query disambiguation or reformulation seems more promising, though there is a fundamental problem in comparing ontology concepts with query or document terms. Concepts are abstract notions that are not necessarily linked to a particular term. Sometimes several terms refer to the same concept, and sometimes a specific term is a realization of different concepts depending on the context.
      - Using conceptual structures to index or retrieve document text requires something that bridges the conceptual world and the real world.
      - Research indicates that ontologies are of little use if they are not aligned with the documents indexed by the search application.
  • 22. Ontological Profiles as Semantic Domain Representations
      - Solskinnsbakk and Gulla present an ontology enrichment approach that both bridges the conceptual world and the real world and ensures that the ontology is well adapted to the documents at hand.
      - The idea is to provide contextual concept characterizations that reveal how the concepts are referred to semantically in the document collection.
  • 23. Ontological Profiles as Semantic Domain Representations
      - An ontological profile is an extension of a domain ontology. The ontology is extended with semantically related terms, which are added as a vector for each concept of the ontology.
      - In the ontological profile, each concept is therefore associated with a vector of semantically related terms (the concept vector). The terms are given weights that reflect the importance of the semantic relation between the concept and each term.
      - The concept vectors typically contain terms that are synonyms of the concept.
  • 24. Ontological Profiles as Semantic Domain Representations
  • 25. Ontological Profiles as Semantic Domain Representations
      - The construction of these ontological profiles is based on three different aspects of the content of the documents used.
      - First, statistical techniques are applied by counting the frequency of the terms in the documents. Terms that co-occur with a concept more frequently are hypothesized to be more relevant to that concept than terms that co-occur less frequently.
      - Second, linguistic techniques, i.e. stemming, are applied to collapse certain terms into a single form.
      - Third, a proximity analysis of the text is used. The assumption behind the proximity analysis is that the closer together terms are found in the text, the more semantically related they are.
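
The proximity analysis presupposes that a text can be viewed at three granularities: whole document, paragraph and sentence. A rough sketch of that step follows, assuming blank lines separate paragraphs and that `.`, `!` or `?` followed by whitespace ends a sentence. The function names are hypothetical and the splitting rules are a simplification, not the authors' procedure.

```python
import re

def split_units(text):
    """Split a text into paragraphs (blank-line separated) and sentences."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [
        s.strip()
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    return paragraphs, sentences

def units_mentioning(concept, text):
    """Collect the document, paragraphs and sentences that contain the concept name."""
    paragraphs, sentences = split_units(text)

    def mentions(unit):
        return concept.lower() in unit.lower()

    return {
        "document": [text] if mentions(text) else [],
        "paragraph": [p for p in paragraphs if mentions(p)],
        "sentence": [s for s in sentences if mentions(s)],
    }

text = "Ontologies structure knowledge. Search engines index text.\n\nA thesaurus lists terms."
units = units_mentioning("ontologies", text)
print(len(units["document"]), len(units["paragraph"]), len(units["sentence"]))
```

Terms taken from the sentence-level units are the closest to the concept name, those from paragraph-level units less so, and those from the rest of the document least.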
  • 26. Ontological Profiles as Semantic Domain Representations
  • 27. Ontological Profiles as Semantic Domain Representations
      - The highest weight is given to terms found in the same sentence as the concept name phrase (the highest semantic coherence). Terms found in the same paragraph as the concept receive a lower weight than sentence terms but a higher weight than document terms.
      - The basis for the weight calculation is the term frequency of each term found in the relevant documents.
      - Applying the familiar tf*idf score to these frequencies brings us closer to the final representation of the vectors. The idf factor gives more importance to terms that are found in few documents across the document collection.
  • 28. Ontological Profiles as Semantic Domain Representations
      - $vf_{i,j} = a \sum_{k \in D} f_{i,k} + b \sum_{k \in P} f_{i,k} + c \sum_{k \in S} f_{i,k}$ (formula and symbol names reconstructed from the definitions that follow), where $vf_{i,j}$ is the term frequency for term i in concept vector j, $f_{i,k}$ is the term frequency for term i in document vector k, D, P, and S are the possibly empty sets of relevant documents, paragraph documents and sentence documents assigned to j, and a = 0.1, b = 1.0, and c = 10.0 are the constant modifiers for documents, paragraph documents, and sentence documents, respectively.
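
The weighting described on this slide can be sketched as follows. The constants are read here as 0.1, 1.0 and 10.0, which is an assumption about values that are garbled in the transcript; only their 1:10:100 ratio matters for the ranking of terms. Whitespace tokenization and the function name are also simplifications introduced for the example.

```python
from collections import Counter

# Assumed reading of the slide's constants (documents, paragraphs, sentences).
A, B, C = 0.1, 1.0, 10.0

def weighted_frequencies(docs, paragraphs, sentences, a=A, b=B, c=C):
    """Sum term counts over the three sets assigned to one concept,
    scaling each count by the modifier of the set it comes from."""
    vf = Counter()
    for weight, texts in ((a, docs), (b, paragraphs), (c, sentences)):
        for text in texts:
            for term, count in Counter(text.lower().split()).items():
                vf[term] += weight * count
    return dict(vf)

vf = weighted_frequencies(
    docs=["ontology search engine search"],
    paragraphs=["ontology search"],
    sentences=["ontology"],
)
print(sorted(vf, key=vf.get, reverse=True))  # terms ordered by weighted frequency
```

A term that appears once in a sentence with the concept name outweighs the same term appearing many times elsewhere in the document, which is the intent of the proximity assumption.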
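  • 29. Ontological Profiles as Semantic Domain Representations
      - $tfidf_{i,j} = \frac{vf_{i,j}}{\max_{l} vf_{l,j}} \cdot \log \frac{N}{n_i}$ (formula and symbol names reconstructed from the definitions that follow), where $tfidf_{i,j}$ is the tfidf score for term i in concept vector j, $vf_{i,j}$ is the term frequency for term i in concept vector j, $\max_{l} vf_{l,j}$ is the frequency of the most frequently occurring term in concept vector j, N is the number of concept vectors, and $n_i$ is the number of concept vectors containing term i.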
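
A minimal sketch of this scoring step, assuming the standard max-normalized tf-idf form (frequency divided by the largest frequency in the same concept vector, times the logarithm of N over n). The slide's own formula is not legible in the transcript, so this form is an assumption, and the sample vectors are invented.

```python
import math

def tfidf(concept_vectors):
    """concept_vectors maps concept -> {term: weighted frequency}.
    Returns the same structure with tf-idf scores."""
    n_vectors = len(concept_vectors)
    # n: in how many concept vectors each term occurs.
    containing = {}
    for vector in concept_vectors.values():
        for term in vector:
            containing[term] = containing.get(term, 0) + 1
    scores = {}
    for concept, vector in concept_vectors.items():
        top = max(vector.values(), default=0.0)
        if not top:
            scores[concept] = {}
            continue
        scores[concept] = {
            term: (vf / top) * math.log(n_vectors / containing[term])
            for term, vf in vector.items()
        }
    return scores

vectors = {
    "Ontology": {"ontology": 10.0, "taxonomy": 5.0, "search": 2.0},
    "Search": {"search": 8.0, "query": 4.0},
}
scores = tfidf(vectors)
print(scores["Ontology"])
```

A term such as "search" that occurs in every concept vector receives a score of zero, so it no longer helps to distinguish one concept from another.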
  • 30. Ontological Profiles as Semantic Domain Representations
      - Strengths:
          - The ontological profile is used as a tool for the semantic reformulation of queries on top of a standard vector-space search engine (Apache Lucene): the reformulated query is submitted to the index. The system can thus hide from users the fact that an ontology is used, and users only have to enter familiar keyword queries.
      - Weaknesses:
          - In this approach the concept name is treated as a phrase query into the three indexes, and all documents containing the phrase are assigned to the concept as relevant. This poses a challenge: some concept names are artificial in their construction or are not used in text in the form given in the ontology. As a result, many of the concepts are not found when documents are assigned to concepts.
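
Query reformulation with a profile can be sketched as plain string construction. The output uses the `term^boost` notation of Lucene's classic query syntax; no Lucene call is made here. The profile contents, the `top_k` cut-off and the rule that a query word must equal a concept name are all assumptions made for illustration.

```python
# Hypothetical ontological profile: concept name -> {related term: weight}.
PROFILE = {
    "ontology": {"ontology": 1.0, "taxonomy": 0.62, "concept": 0.4, "owl": 0.1},
}

def reformulate(query, profile=PROFILE, top_k=3):
    """Replace each query word that names a concept by its top-weighted
    related terms, written as boosted terms; other words pass through."""
    clauses = []
    for word in query.lower().split():
        vector = profile.get(word)
        if vector is None:
            clauses.append(word)
            continue
        best = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        clauses.extend(f"{term}^{weight:.2f}" for term, weight in best)
    return " ".join(clauses)

print(reformulate("ontology tools"))
```

The user still types an ordinary keyword query; the expansion happens behind the scenes, which matches the strength listed on the slide. The sketch also makes the weakness visible: a concept whose name never appears as a query word is never triggered.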
  • 31. State of the Art: An Ontology-Based Information Retrieval Model
      - David Vallet, Miriam Fernández & Pablo Castells, Universidad Autónoma de Madrid
  • 32. An Ontology-Based Information Retrieval Model
      - Vallet, Fernández and Castells propose an ontology-based retrieval model meant for the exploitation of full-fledged domain ontologies and knowledge bases (KBs) to support semantic search in document repositories. In contrast to boolean semantic search systems, this model returns full documents, rather than specific ontology values from a KB, in response to user information needs. The search system takes advantage of both the detailed instance-level knowledge available in the KB and topic taxonomies for classification.
      - The approach includes an ontology-based scheme for the semi-automatic annotation of documents, and a retrieval system. The retrieval model is based on an adaptation of the classic vector-space model and includes an annotation weighting algorithm and a ranking algorithm.
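
The slide does not give the ranking formula, so the following only illustrates what a vector-space ranking over concept annotations might look like: documents and the query are vectors indexed by KB instances instead of words, and they are compared with the standard cosine measure. The weights and identifiers are invented, and the authors' actual weighting and ranking algorithms may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query_vec, doc_vecs):
    """Return ids of documents with non-zero similarity, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored if score > 0]

# Hypothetical annotation weights: document -> {KB instance: weight}.
doc_vecs = {
    "d1": {"Rabat": 0.9, "Morocco": 0.4},
    "d2": {"Oslo": 1.0},
    "d3": {"Morocco": 1.0},
}
print(rank({"Morocco": 1.0}, doc_vecs))
```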
  • 34. An Ontology-Based Information Retrieval Model  The system requires that the knowledge base be constructed from three main base classes: DomainConcept, Taxonomy, and Document. DomainConcept should be the root of all domain classes that can be used (directly or after subclassing) to create instances describing specific entities referred to in the documents. Document is used to create instances that act as proxies for the documents of the information source to be searched. Taxonomy is the root for class hierarchies that are used purely as classification schemes and are never instantiated. These taxonomies are expected to serve as a terminology for annotating documents and concept classes, by using them as values of dedicated properties. 34
  • 35. An Ontology-Based Information Retrieval Model  The predefined base ontology classes described above are complemented with an annotation ontology that provides the basis for the semantic indexing of documents with non-embedded annotations.  Documents are annotated with concept instances from the KB by creating instances of the Annotation class, provided for this purpose. Annotation has two relational properties, instance and document, which relate concepts and documents to each other. Reciprocally, DomainConcept and Document each have a multivalued annotation property.  Annotations can be created manually by a domain expert, or semi-automatically. The subclasses ManualAnnotation and AutomaticAnnotation are used for these two cases, respectively. 35
  • 36. An Ontology-Based Information Retrieval Model  DomainConcept instances use a label property to store the most common textual form of the concept class or instance. This property is multivalued, since instances may have several lexical variants.  Whenever the label of an instance is found in a document, an annotation is created between the instance and the document. Documents can be annotated with classes as well, by assigning labels to concept classes.  The annotations are used by the retrieval and ranking module. 36
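The label-driven annotation step can be illustrated with a minimal sketch. The `Instance` and `Annotation` structures and the `annotate` function are hypothetical names introduced here, and plain case-insensitive substring matching is an assumption standing in for whatever text matching the original system uses.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """A DomainConcept instance with its multivalued label property (hypothetical)."""
    name: str
    labels: list[str]


@dataclass(frozen=True)
class Annotation:
    """Links one instance to one document, mirroring the two relational properties."""
    instance: str
    document: str


def annotate(doc_id: str, text: str, instances: list[Instance]) -> list[Annotation]:
    """Create an annotation for every instance with at least one label in the text.

    Assumption: naive case-insensitive substring matching, no tokenization.
    """
    lowered = text.lower()
    found = []
    for inst in instances:
        if any(label.lower() in lowered for label in inst.labels):
            found.append(Annotation(instance=inst.name, document=doc_id))
    return found
```

A real annotator would need tokenization and disambiguation, since substring matching also fires on labels embedded in longer words.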
  • 37. An Ontology-Based Information Retrieval Model  In the classic vector-space model, keywords appearing in a document are assigned weights reflecting the fact that some words discriminate between documents better than others.  Similarly, in this approach each annotation is assigned a weight that reflects how relevant the instance is considered to be for the meaning of the document.  Weights are computed automatically by an adaptation of the TF-IDF algorithm, based on the frequency of occurrence of the instances in each document. 37
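  • 38. An Ontology-Based Information Retrieval Model  The weight of an annotation is computed as $w_{ij} = \frac{\mathit{freq}_{ij}}{\max_k \mathit{freq}_{kj}} \cdot \log\frac{N}{n_i}$, where $w_{ij}$ is the weight of instance $I_i$ for document $D_j$, $\mathit{freq}_{ij}$ is the number of occurrences of $I_i$ in $D_j$, $\max_k \mathit{freq}_{kj}$ is the frequency of the most repeated instance in $D_j$, $n_i$ is the number of documents annotated with $I_i$, and $N$ is the total number of documents in the search space. 38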
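As a rough illustration of the weighting, the sketch below assumes the usual TF-IDF form, w_ij = (freq_ij / max_k freq_kj) · log(N / n_i), which is consistent with the symbol definitions given on the slide. The function name `annotation_weights` and the input layout (document id mapped to the list of instance occurrences found in it) are assumptions made for this example.

```python
import math
from collections import Counter


def annotation_weights(doc_instances: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Return {(instance, document): weight} using a TF-IDF-style formula."""
    n_docs = len(doc_instances)  # N: total number of documents

    # n_i: number of documents annotated with each instance
    doc_freq = Counter()
    for occurrences in doc_instances.values():
        doc_freq.update(set(occurrences))

    weights = {}
    for doc_id, occurrences in doc_instances.items():
        counts = Counter(occurrences)  # freq_ij for this document
        if not counts:
            continue
        max_freq = max(counts.values())  # most repeated instance in the document
        for inst, freq in counts.items():
            idf = math.log(n_docs / doc_freq[inst])
            weights[(inst, doc_id)] = (freq / max_freq) * idf
    return weights
```

Note that an instance annotating every document gets weight zero under this form, just as a word occurring in every document does in keyword TF-IDF.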
  • 39. An Ontology-Based Information Retrieval Model  The system takes a formal RDQL query as input. This query could be generated from a keyword query, a natural language query, a form-based interface where the user explicitly selects ontology classes and enters property values, or a more sophisticated search interface.  The RDQL query is executed against the knowledge base, which returns a list of instance tuples that satisfy the query. The documents annotated with these instances are then retrieved, ranked, and presented to the user. 39
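The retrieval step after query execution might look roughly like the sketch below. It does not execute RDQL; it starts from the set of instances the query returned. Ranking by cosine similarity with a query weight of 1.0 per instance is an assumption borrowed from the classic vector-space model, and `rank_documents` is a hypothetical name.

```python
import math


def rank_documents(query_instances: set[str],
                   doc_vectors: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank documents annotated with at least one instance returned by the query.

    doc_vectors maps a document id to {instance: annotation weight}.
    """
    query = {inst: 1.0 for inst in query_instances}  # assumed uniform query weights
    q_norm = math.sqrt(sum(v * v for v in query.values()))

    ranked = []
    for doc_id, vec in doc_vectors.items():
        dot = sum(query[i] * w for i, w in vec.items() if i in query)
        d_norm = math.sqrt(sum(w * w for w in vec.values()))
        # Skip documents that share no weighted annotation with the query.
        if dot > 0 and d_norm > 0 and q_norm > 0:
            ranked.append((doc_id, dot / (q_norm * d_norm)))

    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Documents that share no annotation with the query result are dropped rather than ranked at zero, which mirrors the slide's description that only annotated documents are retrieved.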
  • 40. An Ontology-Based Information Retrieval Model 40
  • 41. An Ontology-Based Information Retrieval Model  Strengths : Better recall when querying for class instances and using class hierarchies and rules. Better precision by using query weights and structured semantic queries.  Weaknesses : The degree of improvement of this semantic retrieval model depends on the completeness and quality of the ontology, the KB, and the concept labels. 41
  • 42. State Of the Art Improving information retrieval effectiveness by using domain knowledge stored in ontologies Gabor Nagypal University of Karlsruhe, Germany 42
  • 43. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  The quality of the results that traditional full-text search engines provide is still not optimal for many types of user queries. In particular, the vagueness of natural language, abstract concepts, semantic relations, and temporal issues are handled inadequately by full-text search. Ontologies and semantic metadata can provide a solution to these problems.  The goal of this thesis is to examine and validate whether and how ontologies can help improve retrieval effectiveness in information systems, taking into account the inherent imperfection of ontology-based domain models and annotations.  The work examines how ontologies can be optimally exploited during the information retrieval process, and proposes a general framework based on ontology-supported semantic metadata generation and ontology-based query expansion. 43
  • 44. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  This research evaluates the following hypotheses:  Ontologies allow domain knowledge to be stored in a much more sophisticated form than thesauri, so using ontologies in IR systems should yield a measurable, significant gain in retrieval effectiveness.  The more precisely an ontology models the application domain, the greater the gain in retrieval effectiveness.  The negative effect of ontology imperfection on search results can be diminished by combining different ontology-based heuristics during the search process.  There is a well-known trade-off between algorithm complexity and performance, and this also holds for ontologies. Still, the approach assumes that combining ontologies with traditional IR methods can provide results with acceptable performance. 44
  • 45. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  Background knowledge stored in the form of ontologies can be used at practically every step of the IR process.  This work therefore provides solutions for ontology-based query extension, ontology-supported query formulation, and ontology-supported metadata generation (indexing).  This leads to a conceptual system architecture in which the Ontology Manager component plays a central role and is used extensively by the Indexer, Search Engine, and GUI components. 45
  • 46. Improving information retrieval effectiveness by using domain knowledge stored in ontologies 46
  • 47. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  The information model defines how documents and the user query are represented in the system. The model used in this work represents the content of a resource as a weighted set of instances (a bag of ontology instances) from a suitable domain ontology (the conceptual part), together with a weighted set of temporal intervals (the temporal part).  The representation of the conceptual part is practically identical to the information model used by classical IR engines built on the vector space model, except that the vector terms are ontology instances instead of natural language words. 47
  • 48. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  Time, as a continuous phenomenon, has different characteristics from the discrete conceptual part of the information model. The first question regarding time is how to define similarity between weighted sets of time intervals.  One possible solution under consideration is the temporal vector space model. Its main idea is that, if a discrete time representation is chosen, the granules at the lowest level can be viewed as terms, so the vector space model is also applicable to the time dimension. 48
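The temporal vector space idea can be sketched as follows. The granule is an arbitrary integer unit (a year in the usage below), overlapping interval weights are summed, and similarity is measured with the cosine; all three choices, along with the names `granule_vector` and `cosine`, are assumptions made for illustration.

```python
import math


def granule_vector(intervals: list[tuple[int, int, float]]) -> dict[int, float]:
    """Expand weighted intervals (start, end, weight), end inclusive, into granule terms."""
    vec: dict[int, float] = {}
    for start, end, weight in intervals:
        for granule in range(start, end + 1):
            # Assumption: weights of overlapping intervals add up.
            vec[granule] = vec.get(granule, 0.0) + weight
    return vec


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Cosine similarity between two sparse granule vectors."""
    dot = sum(w * b[g] for g, w in a.items() if g in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, `cosine(granule_vector([(1945, 1956, 1.0)]), granule_vector([(1950, 1960, 1.0)]))` compares a query period with a document period through their seven shared year granules. The vector size grows with the interval length divided by the granule size, which is why fine granules become expensive for long periods.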
  • 49. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  During query formulation, the ontology is used only to disambiguate queries specified in textual form. Classical full-text search is run on the ontology labels, so users only have to choose the proper interpretation of each term.  The query process applies various ontology-based heuristics one by one to create separate queries, which are executed independently using a traditional full-text engine. The ranked results are then combined to form the final ranked result list. The combination is based on the belief network model, which allows different pieces of evidence to be combined using Bayesian inference. 49
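The slides do not give the formula used to merge the result lists, so the sketch below swaps in a simple noisy-OR combination as a stand-in for the belief network inference. It assumes each heuristic's query returns scores in [0, 1] that can be read as independent relevance probabilities; `combine_noisy_or` is a hypothetical name.

```python
def combine_noisy_or(result_lists: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Merge per-heuristic scores: a document is relevant if any evidence says so.

    Each dict maps a document id to a score in [0, 1]. This is a simplification,
    not the belief network model itself.
    """
    combined: dict[str, float] = {}
    for results in result_lists:
        for doc_id, score in results.items():
            # Probability that all evidence seen so far missed the document.
            miss = 1.0 - combined.get(doc_id, 0.0)
            combined[doc_id] = 1.0 - miss * (1.0 - score)
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)
```

With this rule, a document found by several heuristics ranks above one found by a single heuristic with the same individual score.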
  • 50. Improving information retrieval effectiveness by using domain knowledge stored in ontologies 50
  • 51. Improving information retrieval effectiveness by using domain knowledge stored in ontologies  Strengths:  This work validates that the proposed solution significantly improves the retrieval effectiveness of information systems, and thus provides a strong motivation for developing ontologies and semantic metadata.  The gradual approach described allows a smooth transition from classical text-based systems to ontology-based ones.  Weaknesses:  A problem with the temporal vector space approach is the potentially huge number of time granules generated for long time intervals. For example, representing the existence time of a concept such as the Middle Ages may require many tens of thousands of terms if days are used as granules. 51