SlideShare ist ein Scribd-Unternehmen logo
1 von 10
Downloaden Sie, um offline zu lesen
International Journal of Engineering Research and Development
ISSN: 2278-067X, Volume 1, Issue 11 (July 2012), PP.01-10
www.ijerd.com

  Design and Development of Integrated Biomedical Ontology for
         Information Extraction from Medline Abstracts
                                      Dr.B.LShivakumar1, R.Porkodi2
             1
                Professor and Head, Department of Computer Applications, SNR Sons College,Coimbatore -6
            2
                Assistant Professor, Department of Computer Science, Bharathair University, Coimbatore – 46


Abstract––Due to the ever-increasing amount of scientific articles in the bio-medical domain, Information Extraction in Text
Mining has been recognized as one of the key technologies for future bio-medical research. The Information extraction from
these biomedical domains plays a vital role in bioinformatics field. Thus, bioinformatics researchers extend their work in
both information extraction and construction of biomedical knowledge sources. In knowledge extraction, researchers are
involved to develop efficient and effective technique by combining Natural Language Processing (NLP) and text mining
techniques to find out and extract information and significant associations among the extracted information. In the other
side bioinformatics researchers are busy with the construction of knowledge sources or repositories related to biomedical
domain which simplify the work of the researchers in knowledge extraction process. This paper presents a semiautomatic
framework that integrates the well-known two ontologies Gene Ontology (GO) and Medical Subject Heading (MeSH)
ontology by adding of semantic mappings or relations between GO terms, Gene names and MESH keywords related to a
particular disease (Alzheimer disease). The integrated ontology has validated in all three aspects such as structural,
syntactic and semantic validation measures. This framework is used to discover significant associations or relationships
between proteins and genes related to Alzheimer disease that are extracted from Medline abstracts.

Keywords––Ontology, Alzheimer Disease, GO, MeSH, Stemming, Tagging.

                                            I.          INTRODUCTION
          NCBI is a fast growing knowledge source for bioinformatics community, which has Medline database [1] that
currently contains over 15 million citations of biological abstracts and it is growing by more than 40,000 abstracts per
month. To extract desired information directly from biological literature is a challenging problem in text mining and Natural
Language Processing (NLP). Many biomedical information sources have been developed and used in extraction process.

         Most text mining methods use vector space model to represent a document. The vector space model represents a
document as a feature vector of terms contained in it. Each feature vector contains term weights and similarity between
documents is computed using various similarity measures. This approach not considered the semantic relations of terms in
documents. The ontology approach represents an effective knowledge representation within controlled vocabulary. The
Wordnet ontology [14] is a lexical database for general English covering most of the general English concepts. In biomedical
domain, the Unified Medical Language System (UMLS) framework [13] includes much biomedical ontology.

          This paper integrates the Gene ontology (GO), Medical Subject Headings (MeSH) and all human genes which
include genes that cause Alzheimer disease in human. Alzheimer‟s disease (AD) is the most common cause of progressive
decline of cognitive function in aged humans, and it is characterized by the presence of numerous senile plaques and
neurofibrillary tangles accompanied by neuronal loss. The integrated ontology is developed in protégé tool which is the
famous tool for designing ontology. The protégé tool provides facilities such as visualization of concepts which clearly show
the semantic relations of a concept, query the some results based on the concepts, object properties and data properties, etc.

           The paper is organized as follows: Review of literature related to this work is presented in section 2. In section 3,
brief introduction on biomedical ontologies are presented and section 4 the ontology based framework is elaborately
discussed. The experimental design and results discussion is presented in section 5. Finally, this paper is concluded in
section 6.

                                          II.           RELATED WORK
          A lot of NLP based works have been reported for the past decades related to concept extraction [2], association
rule discovery [3, 4] and extracting relationships among various concepts [5, 6]. Many approaches have been developed for
extracting significant associations and interactions among various biological entities [5, 6, and 7] and discovering protein-
disease associations. However, these approaches have not been produced promising results, due to inconsistencies prevailed
in gene names. Related to gene names extractions, paper [8] has presented the extraction of gene names from articles‟ titles
and abstracts and identified genes related to colon cancer disease. The paper [6] has presented a statistical approach for
discovering group of genes related to breast cancer disease. In paper [5], author constructed a relationships network among
biomedical entities which are extracted from Medline abstracts.

          In paper [9] the authors proposed new text mining approach which utilizes the concept of expectation, evidence a
Z-score in determining significant associations between genes and Alzheimer disease. In paper [10], researchers expressed

                                                               1
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

the method    using association and functional relationship discovery algorithm in extracting gene relations from Medline
abstracts.

          Recent works have been reported that ontology is a useful tool to improve the performance of any text mini ng
tasks such as text clustering and association rule mining. In text clustering the paper [11] uses conceptual features that are
extracted from text using ontology and prove that ontology could improve the performance of text clustering. The paper
[12] shows the case study on the integration of biomedical information in to ontology. In paper [21] author proposed bio
ontology methodology and compared this with other bio-ontologies. The limitations and benefits of GO ontology are
expressed in paper [22]. The author had studied the strength and limitation of biomedicine ontologies based on its text and
concept representation [23]

          This paper presents a new ontology that integrates the famous two ontologies such as Gene Ontology (GO) and
MeSH by adding of semantic mappings or relations between GO terms, Gene names and MESH keywords related to a
particular disease (Alzheimer disease). Finally the integrated ontology has validated based on syntactic, structural and
semantic validation measures in order to prove its correctness and validity.

                                III.          BIOMEDICAL ONTOLOGIES
          The Molecular Biology Ontology (MBO) [16] was the first attempt to begin to define the entities in the domain to
promote consistent interpretation across resources. A second phase saw the adoption of ontology by the biological
community itself. Pre-eminent among these is the Gene Ontology (GO) [15]. The Microarray Gene Expression Data
(MGED) ontology [18] provides a vocabulary for describing a biological sample used in an experiment, the treatment that
the sample receives in the experiment and the microarray chip technology used in the experiment. The Functional Genomics
Ontology (FUGO) [17] is another type of ontology in the field of bioinformatics. The next popular ontology MeSH thesaurus
is the NLM's controlled vocabulary for subject indexing in MEDLINE. It is structured in a hierarchy of descriptors, with
each descriptor including a set of concepts, and each concept itself containing a set of terms, which are synonyms and lexical
variants.

         The next coming sections give a brief explanation on Gene Ontology (GO) and MeSH ontology that are integrated
in our work.


3.1 GO
          Biological knowledge is most often represented in „bio-ontologies‟ that are formal representations of knowledge
areas in which the essential concepts are combined with properties that describe relationships between concepts. Bio-
ontologies are constructed according to textual descriptions of biological activities. One of the most popular bio-ontology is
Gene Ontology (GO) [15] that contains more than 18 thousands terms. The GO ontology is a controlled vocabulary of gene
and protein roles in cells, addressing the need for consistent description of gene products. This is mainly used in almost all
biological researches and to predict the gene functions based on patterns of annotation. The GO describes the molecular
function of a gene product, the biological process in which the gene product participates, and the cellular component where
the gene product can be found.

3.2 MeSH
          Medical Subject Headings (MeSH) [24] is another popular ontology designed by the National Library of Medicine
which mainly consists of the controlled vocabulary and a MeSH Tree. The controlled vocabulary contains several different
types of terms, such as Descriptor, Qualifiers, Scope note, Tree number and Entry terms. Descriptor terms are main
concepts or main headings. Entry terms are the synonyms or the related terms to descriptors. For example, “Amyloid beta-
Protein Precursor” as a descriptor has the following entry terms “Amyloid A4 Protein Precursor”, “Amyloid beta Precursor
Protein”, “Amyloid Protein Precursor”, etc. MeSH descriptors are organized in a MeSH Tree, which can be seen as a MeSH
Concept Hierarchy. In the MeSH Tree there are 15 categories (e.g. category A for anatomic terms) and each category is
further divided into subcategories.

         For example, the MeSH tree structure of Alzheimer Disease is shown in Fig. 1. For each subcategory,
corresponding descriptors are hierarchically arranged from most general to most specific. In addition to its ontology role,
MeSH descriptors are originally used to index MEDLINE articles.




                                                              2
Design and Development of Integrated Biomedical Ontology for Information Extraction from …




                                  Fig. 1 MeSH Tree Structure for Alzheimer Disease

                                 IV.           PROPOSED FRAMEWORK
          The proposed framework shown in Fig. 2 integrates the two popular ontologies GO and MeSH by mapping with
Gene details used to extract significant associations among concepts from Medline abstracts related to Alzheimer disease.
The main objective of the integration of two ontologies is making use of semantic relations among the concepts in Medline
abstracts using MeSH ontology terms. The gene products of genes are referred from GO in order to find out the associations
of gene products related to Alzheimer disease genes. The integrated ontology consists of MeSH concepts related to
Alzheimer disease, linking of Alzheimer disease MeSH concepts to proteins that cause this Alzheimer disease, linking of
Alzheimer disease proteins to genes that inhibit Alzheimer disease and finally linking of Alzheimer disease genes to
respective gene products in which we identify the exact molecular functions that result in Alzheimer disease, biological
processes in which the gene product participates to result in Alzheimer disease and the cellular component where the gene
product of Alzheimer disease can be found. The components of this framework are explained below.




                                       Fig. 2 The Proposed Ontology Framework

          The first step in this work is to do preprocessing to transform Medline abstracts, which typically are strings of
characters into a suitable representation.

    a. Removal of stop-words: The stop-words are high frequent words that carry no information (i.e. pronouns,
         prepositions, conjunctions etc.). Removal of stop-words improves clustering results [19].
    b. Stemming: By word stemming it means the process of suffix removal to generate word stems. The Porter stemmer
         [20] which is a well-known algorithm is used for this task.
    c. Filtering: Domain vocabulary V in ontology is used for filtering. By filtering, document is considered with related
         domain words (term). It can reduce the documents dimensions. The filtering task used in our work filters the
         documents related to Alzheimer disease.
    d. Tagging: The concepts in Medline abstracts are identified using Genia tagger [26] and the identified concepts are
         mapped with concepts related to the categories specified in the proposed ontology. The categories used in the
         proposed ontology are gene, MeSH and GO.



                                                            3
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

    e. Semantic analysis & Concept mapping:
     Class/Concept                                                 Description
  GO_0003674             This is the base class for molecular function. All molecular functions are the subclass of
                         GO_0003674 and the respective molecular function class for a gene is mapped with a
                         particular gene.
  GO_0005575             This is the base class for cellular components. All cellular components are the subclass of
                         GO_0005575 and the respective cellular component for a gene is mapped with a particular
                         gene.
  GO_0008150             This is the base class for biological process. All biological process are the subclass of
                         GO_0008150 and the respective biological process for a gene is mapped with a particular
                         gene.
  GO_Functionality       This class specifies the three GO functionalities as class such as cellular component, molecular
                         function and biological process and these classes are mapped with the above three classes.
  Gene                   This specifies all human genes.
  Gene_Type              This specifies the gene type for a gene such as protein coding, pseudo coding and unknown.
  Mesh                   This specifies the MeSH keywords that include proteins, disease level, etc. for Alzheimer
                         disease.
                                     Table 1. Main Concepts in Proposed Ontology

           After preprocessing, the extracted concepts in Medline abstracts are analyzed in terms of semantic meaning and
added to ontology, if it is not available. The concepts or classes used in this integrated ontology are shown in Table 1. The
first step for adding concept is to find out the equivalent concept from the ontology and add the concept and possible
semantic relations which includes object properties and data properties into ontology, if it is not found. Some of the
important object properties and data properties created in the integrated ontology shown in Table 2 and 3.

           For example, the gene name “A2MP1” is extracted from Medline abstract and this term or concept is to be added
to the ontology. If there is an equivalent gene “A2M” is already in the ontology, add “A2MP1” to ontology and assign is-a
relationship along with other possible properties such as has_go, has_synonym, has_genetype_as, has_inducing_protien,
inhibits, etc. between “A2M” and “A2MP1”.

                Object properties                                         Description
            belongs_to                     This property is used to map GO classes.
            curated_GO_References          This property is used to map genes referred in Pub Med literature
            has_gene                       This is used to relate gene with GO
            has_genetype_as                The gene types protein coding, pseudo coding and unknown is
                                           mapped with genes.
            has_go                         This is used to all applicable GO concepts are mapped with gene at all
                                           functional level.

            has_inducing_protien           This is used to map disease with respective proteins.
            has _synonym                   This is used to specify all possible synonyms for a gene.
            Inhibits                       This property is used to map gene with disease.
            is_found_in                    This is used to map disease with gene and it has transitive relationship
                                           with inhibits property.


                                   Table 2. List of Object Properties in Proposed Ontology

              Data Properties                                         Description
            gene_annotations        This property is used to specify different data base reference for a particular
                                    gene such as ENSENBL, HNGC, HPRD, MIM and UNIPROT.
            gene_descriptions       This is used to specify the alternate name and full name for gene.
            has_gene_id             This is used to specify the gene identifier for gene
            is_in_chromosome        This property is used to specify the chromosome map location and
                                    chromosome number.
                                    Table 3. List of Data Properties in Proposed Ontology

                           V.           EXPERIMENTAL DESIGN & RESULTS
          The integrated ontology consists of three different main concepts or classes which are GO term functionalities, all
human genes with or without related to a particular disease (in this ontology all human genes with or without related to
Alzheimer disease are considered) and MeSH terms related to a particular disease. The sub classes for GO term
functionalities class are all possible GO functionalities for the genes that are added into the ontology and the Gene main
class consists of all human genes with or without related to a particular disease. The subclasses created for MeSH class
includes all disease branches in which Alzheimer disease is derived, amino acids, peptides and proteins related to a particular
disease. The integrated ontology provides all types of information related to GO terms, genes and a particular disease details.
                                                              4
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

The ontology can be manipulated in different ways in which the most important manipulation techniques are using OntGraf
and DL query. The structural evaluation is necessary for ontology to verify the consistency, if it is not structurally evaluated,
it may produce some wrong results or inconsistent results when we manipulate information from the ontology. The
visualization of concepts with its semantic relations is experimented using OntGraf tool. The important subclasses for MeSH
main concepts or classes are shown in Fig. 3. The visualization of human genes related to Alzheimer disease is shown in Fig.
4. The visualization of proteins that are inducing Alzheimer disease is shown in Fig. 5.




                                           Fig. 3 The Overview of MeSH Concept

         Another way to manipulate the ontology is using DL query tool. This is an effective tool to retrieve any kind of
semantic related information from the given ontology. Some of the information retrieval queries and results are shown in
below figures. The Fig. 6 shows the extraction of Alzheimer Disease




                                         Fig. 4 Genes Related to Alzheimer Disease


                                                               5
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

related human genes with the respective DL query and the Fig. 7 shows the extraction of protein names that inducing
Alzheimer disease. Finally the Fig. 8 shows the extraction of human genes that inhibits Alzheimer disease in particular
chromosome level, in our data set there are 3 genes (gene identifiers mapped with genes are shown in Fig. 8) that inhibits
Alzheimer disease in chromosome level “10”.




                                     Fig. 5 Proteins Related to Alzheimer Disease




                  Fig. 6 Genes Extracted by DL query “Gene and inhibits some Alzheimer_Disease”




                                                            6
Design and Development of Integrated Biomedical Ontology for Information Extraction from …




         Fig. 7 Proteins Extracted by DL query “Proteins and has_inducing_disease some Alzheimer_Disease




                               Fig. 8 Mapping Identifiers for Genes Extracted by DL query

5.1 Validating the Ontology
           The integrated ontology has to be validated to check the correctness. This section explores the evaluation methods
used for validating our proposed ontology framework. The proposed ontology is syntactically verified for its consistency
using FACT ++ reasoner available in protégé tool. The next validation method is semantic validation and the semantic
validation of the ontology is verified by the domain experts. This ontology is validated by domain experts in biological field.
Another evaluation method to validate the ontology is structural validation. The structural validation is performed by the
different metrics defined in paper [25] that are class match measure, density measure, betweeness measure and semantic
similarity measure. The ontology concept is ranked based on the total score of all the four metrics. The weights are assigned
based on the concept representation and the weights are assigned in such a way the overall score lies between 0 and 1.

Class Match Measure (CMM) – This measure evaluates the ontology for the specified concepts. The specified concepts are
searched in the ontology to determine the occurrence of it. If it occurs directly as a concept, the maximum weight will be
given for the specified concept. If it partially occurs as instances of any class, then the 50% of maximum weight may be
assigned. The CMM evaluates the concepts either as exact match or partial match found in the ontology.

Density Measure (DEM) - The DEM evaluates the ontology based on the degree of richness of attributes of a specified
concept and includes the details of subclasses, inner attributes, siblings and relations with other classes in the ontology. The
weight may be assigned based on the degree of richness of attributes of a concept.

Betweenness Measure (BEM) – This measure evaluates the ontology based on centrality of a specified concept in the
ontology. The centrality of a concept is computed using the count of shortest path between the specified concept and other
concepts in the ontology. Based on the shortest path, weight may be assigned.


                                                               7
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

Semantic Similarity Measure (SSM) – The SSM evaluates the ontology based on the proximity of classes in the ontology
the specified concept matches, that is the count of links the specified concept has to map with the existing concepts in the
ontology.

5.2 Dataset
           We framed 4 corpuses from our integrated ontology to validate it and each corpus consists of concepts of ontology
and its important properties. Each corpus is the superset of the previous one. The corpus C1 consists of main concepts that
include more subclasses and have rich relations or links with other sub concepts. The corpus C2 consists of sub concepts in
C1 and other concepts in the ontology. The corpus C3 is the subset of C1 and has concepts in C1 and two important
properties related to those concepts. The corpus C4 contains concepts in C1 and three important properties related to those
concepts. All 4 corpuses framed from the ontology shown in Table 4 and the overall score is computed as follows from the
above mentioned measures. Let O be the set of corpuses framed from the proposed ontology; Let wi be a weight factor and M
be the different similarity metrics such as CMM, DEM, SSM and BEM.




          From the overall score it is found that the corpus C1 has the maximum score as it considered concepts as direct
match. The score may be lesser when the concept with partial match is found. The corpus C2 is found to have less score and
ranked as 4 due to the DEM and BEM measure score values. The DEM and BEM measure gives lowest score, because
concepts in C2 have no related inner attributes and links with other concepts in C2. The corpus C3 is found to have second
highest score due to the CMM and SSM score values, since the concepts in C2 are direct concepts and have good number of
links among concepts in C3. The corpus C4 is found to have third highest score due to the CMM and SSM score values and
also C4 is the sub set of C3.

          All the four metrics are provided with equal weights and we found that some of the corpuses may produce low
score due to the DEM and BEM measures. The DEM and BEM score values may be increased when we use different
weights. In our proposed ontology, we found that the concepts and its relations are linked correctly and further some of the
missing relations may be added in future as to produce more promising results.

                                  Corpus(constructed from        Score        Rank
                                        Ontology)
                                             C1                       0.79         1
                                             C2                       0.39         4
                                             C3                       0.46         2
                                             C4                       0.41         3
                                    Table 4. Overall Scores and Ranks for Corpuses

           Finally the class match measure produces high score when there is an exact match found in the ontology. This
score may decrease when there is a partial match found in the ontology. The density measure score found to be good when
more relations exists among concepts. The betweenness measure found to be good when the concepts related with more
other concepts in the ontology. The semantic similarity measure is found to be good when the concept have more synonyms
and its relations. The corpuses Vs metrics score is represented in a bar chart is shown in Fig. 9.




                               Fig. 9 The Bar chart of Corpus Vs Similarity Metric Score




                                                             8
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

                                            VI.           CONCLUSION
           We studied almost all biomedical ontologies and identified all their merits and demerits. In consideration with this
in mind the integrated ontology has proposed by accumulating the essential features represented in all specified ontologies.
The integrated ontology is implemented in protégé tool that consists of three main concepts namely GO term functionalities,
all human genes with or without related to Alzheimer Disease and MeSH terms related to the same disease. This frame work
also addresses the problems of GO ontology, in which all information are given as annotations and that are not directly
accessible by the user, because information of these kind are given as http links. The MeSH ontology represents the entry
terms for the particular term and associated links in various repositories such as Pub Med, Medline, MIM, etc. In our work,
all associated information of GO functionalities, genes are specified directly in our ontology, not as links. This ontology
gives all possible semantic relations applicable for all concepts defined in the ontology. The ontology is also evaluated for its
correctness and validity using various metrics. In the results of the experiments, we found that the ontology is modeled
correctly by providing necessary concepts and relations. The ontology may be further improved by adding more relations to
the existing concepts of gene and MeSH to get a higher score.

          Further the integrated ontology to be used in association rule mining to extract the significant associations among
the proteins, genes that are related to Alzheimer Disease. Instead of using simple vector space model to calculate the term
frequency and inverse document frequency from the Medline abstracts, we decide to use ontology approach to consider the
semantic relationships of terms that appear in the Medline abstracts may give better results.

                                     VII.           ACKNOWLEDGMENT
         This work was performed as part of the Minor Research Project, which is supported and funded by University
Grants Commission, New Delhi, India.

                                                     REFERENCES
  [1].     NCBI PubMed, http://www.ncbi.nlm.nih. Gov /entrez/query.fcgi
  [2].     Uramoto, N., H. Matsuzawa, T. Nagano, A. Murami and H. Takeuchi, 2004. A text-mining system for knowledge
           discovery from biomedical documents.
  [3].     Hristovski, D., J. Stare, B. Peterlin and S. Dzeroski, 2001. Supporting discovery in medicine by association rule
           mining in Medline and UMLS. Proc. MedInfo Conf., London, England, Sep. 2-5, 10: 1344-1348.
  [4].     Creighton, C. and S. Hanash, 2003. Mining gene expression databases for association rules. Bioinformatics, 19-1:
           79-86.
  [5].     Wren, J.D., R. Bekeredjian, J.A. Stewart, R.V. Shohet and H.R. Garner, 2004. Knowledge discovery by automated
           identification and ranking of implicit relationships. Bioinformatics, 20: 3.
  [6].     Adamic, L.A., D. Wilkinson, B.A. Huberman and E. Adar, 2002. A literature based method for identifying gene-
           disease connections. IEEE Computer Soc. Bioinformatics Conf.
  [7].     Palakal, M., M. Stephens, S. Mukhopadhay, R. Raje and S. Rhodes, 2002. A Multi-level Text Mining Method to
           Extract Biological Relationships. Proc. IEEE Computer Soc. Bioinformatics (CSB) Conf., pp: 97-108.
  [8].     Wilkinson, D.M. and B.A. Huberman, 2004. A method for finding communities of related genes. Proc. Natl. Acad.
           Sci. U.S.A., 101 Suppl. 1: 5241- 5248.
  [9].     Hisham Al-Mubaid and Rajit K Singh, 2005. A New Text Mining Approach for Finding Protein-to-Disease
           Associations, American Journal of Biochemistry and Biotechnology 1 (3): 145-152, ISSN 1553-3668.
  [10].    M. Stephens, M. Palakal, S. Mukhopadhyay, R. Raje, 2001. Detecting Gene Relations From Medline Abstracts,
           Pacific Symposium on Biocomputing 6:483-496.
  [11].    A. Hotho,A.Maedche and S.Staab, “Ontology-based text document clustering”[A], Proc. of the Conf. on Intelligent
           Information Systems[C],2003.
  [12].    Paulo Gottgtroy1, Prof. Nik Kasabov1, Stephen MacDonell1, 2004. An ontology driven approach for
           knowledge discovery in Biomedicine.
  [13].    R. Kleinsorge, C. Tilley, and J.Willis. (2000). Unified Medical Language System (UMLS) Basics [Online].
           Available: http://www.nlm.nih.gov/ research/umls/pdf/UMLS_Basics.pdf.
  [14].    G. A. Miller, “WordNet: A lexical database for English,” Commun. ACM, vol. 38, pp. 39–41, 1995.
  [15].    http://www.geneontology.org/
  [16].    Schulze-Kremer S. Adding semantics to genome databases: Towards ontology for molecular biology. In:
           Proceedings of the Fifth International Conference for Intelligent Systems for Molecular Biology Conference
           (ISMB), 1997; pp. 272–5.
  [17].    http://www.fugo.org
  [18].    Whetzel PL, Parkinson H, Causton HC, et al. The MGED Ontology: a resource for semantics-based description of
           microarray experiments. Bioinformatics 2006;22:866–73.
  [19].    Mark Sinka and David Corne, “A Large Benchmark Dataset for Web Document Clustering”, In Soft Computering
           Sytems:Design,Management And Application, Vol.87 of Frontiers in Artifical Intelligence and Applications, pages
           881-890,2002.
  [20].    M.F.Porter, “An Algorithm for Suffix Stripping”, Program 14(3), July 1980, pp.130-137.
  [21].    Robert Stevebs et.al., “Ontology based knowledge representation for bioinformatics”, published in briefings in
           Bioinformatics, 2000.
  [22].    Barry Smith, et. Al., “The Ontology of the Gene Ontology”, Proceedings of AMIA Symposium 2003.


                                                               9
Design and Development of Integrated Biomedical Ontology for Information Extraction from …

[23].    Olivier Corby et.al., “Searching the Semantic Web: Approximate Query processing based on bio ontologies”.
         Published in IEEE Computer Society, 2006.
[24].    http://www.ncbi.nlm.nih.gov/mesh/ meshhome.html
[25].    Amal Zouaq, Roger Nkambou, “Building Domain Ontologies from Text for Educational purposes”. IEEE
         Transactions on Learning Technologies, Vol.1, No.1, Jan-Mar 2008.
[26].    http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/




                                                       10

Weitere ähnliche Inhalte

Was ist angesagt?

Sample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap IdentificationSample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap IdentificationPhD Assistance
 
International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...CSCJournals
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourseijitcs
 
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...ijitcs
 
Ontologies for big data
Ontologies for big dataOntologies for big data
Ontologies for big dataYu Lin
 
Construction of phylogenetic tree from multiple gene trees using principal co...
Construction of phylogenetic tree from multiple gene trees using principal co...Construction of phylogenetic tree from multiple gene trees using principal co...
Construction of phylogenetic tree from multiple gene trees using principal co...IAEME Publication
 
Computational of Bioinformatics
Computational of BioinformaticsComputational of Bioinformatics
Computational of Bioinformaticsijtsrd
 
Knowledge Driven User Interfaces for Complex Biological Queries
Knowledge Driven User Interfaces for Complex Biological QueriesKnowledge Driven User Interfaces for Complex Biological Queries
Knowledge Driven User Interfaces for Complex Biological Queriesalexander garcia
 
Biomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachBiomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachijseajournal
 
Inference Networks for Molecular Database Similarity Searching
Inference Networks for Molecular Database Similarity SearchingInference Networks for Molecular Database Similarity Searching
Inference Networks for Molecular Database Similarity SearchingCSCJournals
 
Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsPragya Pai
 
Role of Bioinformatics in Cancer Research
Role of Bioinformatics in Cancer Research Role of Bioinformatics in Cancer Research
Role of Bioinformatics in Cancer Research Akash Arora
 
The physics behind systems biology
The physics behind systems biologyThe physics behind systems biology
The physics behind systems biologyImam Rosadi
 
Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...ijcsit
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformaticsAbhishek Vatsa
 
Utilizing literature for biological discovery
Utilizing literature for biological discoveryUtilizing literature for biological discovery
Utilizing literature for biological discoveryLars Juhl Jensen
 

Was ist angesagt? (18)

Sample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap IdentificationSample Work For Engineering Literature Review and Gap Identification
Sample Work For Engineering Literature Review and Gap Identification
 
International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...
International Journal of Biometrics and Bioinformatics(IJBB) Volume (2) Issue...
 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourse
 
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
A Review of Various Methods Used in the Analysis of Functional Gene Expressio...
 
Ontologies for big data
Ontologies for big dataOntologies for big data
Ontologies for big data
 
Construction of phylogenetic tree from multiple gene trees using principal co...
Construction of phylogenetic tree from multiple gene trees using principal co...Construction of phylogenetic tree from multiple gene trees using principal co...
Construction of phylogenetic tree from multiple gene trees using principal co...
 
Computational of Bioinformatics
Computational of BioinformaticsComputational of Bioinformatics
Computational of Bioinformatics
 
AI for drug discovery
AI for drug discoveryAI for drug discovery
AI for drug discovery
 
Knowledge Driven User Interfaces for Complex Biological Queries
Knowledge Driven User Interfaces for Complex Biological QueriesKnowledge Driven User Interfaces for Complex Biological Queries
Knowledge Driven User Interfaces for Complex Biological Queries
 
Biomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approachBiomedical indexing and retrieval system based on language modeling approach
Biomedical indexing and retrieval system based on language modeling approach
 
Inference Networks for Molecular Database Similarity Searching
Inference Networks for Molecular Database Similarity SearchingInference Networks for Molecular Database Similarity Searching
Inference Networks for Molecular Database Similarity Searching
 
Uses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in BioinformaticsUses of Artificial Intelligence in Bioinformatics
Uses of Artificial Intelligence in Bioinformatics
 
Role of Bioinformatics in Cancer Research
Role of Bioinformatics in Cancer Research Role of Bioinformatics in Cancer Research
Role of Bioinformatics in Cancer Research
 
The physics behind systems biology
The physics behind systems biologyThe physics behind systems biology
The physics behind systems biology
 
Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...Improving the effectiveness of information retrieval system using adaptive ge...
Improving the effectiveness of information retrieval system using adaptive ge...
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Ai and biology
Ai and biologyAi and biology
Ai and biology
 
Utilizing literature for biological discovery
Utilizing literature for biological discoveryUtilizing literature for biological discovery
Utilizing literature for biological discovery
 

Andere mochten auch

Andere mochten auch (7)

335 340
335 340335 340
335 340
 
22 27
22 2722 27
22 27
 
Seminar mol biol_1_spring_2013
Seminar mol biol_1_spring_2013Seminar mol biol_1_spring_2013
Seminar mol biol_1_spring_2013
 
521 524
521 524521 524
521 524
 
119 128
119 128119 128
119 128
 
3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree3.[18 22]hybrid association rule mining using ac tree
3.[18 22]hybrid association rule mining using ac tree
 
Historia De Java
Historia De JavaHistoria De Java
Historia De Java
 

Ähnlich wie www.ijerd.com

Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseaseseSAT Journals
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introductionbutest
 
Ontologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataOntologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataYannick Pouliot
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSMSCW Mysore
 
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...Chimezie Ogbuji
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...ijcsa
 
An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...
An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...
An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...TELKOMNIKA JOURNAL
 
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASESDOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASEScscpconf
 
Domain ontology development for communicable diseases
Domain ontology development for communicable diseasesDomain ontology development for communicable diseases
Domain ontology development for communicable diseasescsandit
 
4.1 introduction to bioinformatics
4.1 introduction to bioinformatics4.1 introduction to bioinformatics
4.1 introduction to bioinformaticsPrabhakar Pawar
 
Research proposal sjtu
Research proposal sjtuResearch proposal sjtu
Research proposal sjtuAqsa Qambrani
 
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 UpdateBioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Updatedolleyj
 

Ähnlich wie www.ijerd.com (20)

Bio ontology drtc-seminar_anwesha
Bio ontology drtc-seminar_anweshaBio ontology drtc-seminar_anwesha
Bio ontology drtc-seminar_anwesha
 
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseases
 
Bio informatics
Bio informaticsBio informatics
Bio informatics
 
Bio informatics
Bio informaticsBio informatics
Bio informatics
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
Ontologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataOntologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological Data
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICS
 
Prosdocimi ucb cdao
Prosdocimi ucb cdaoProsdocimi ucb cdao
Prosdocimi ucb cdao
 
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Developmen...
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...
Statistical Analysis based Hypothesis Testing Method in Biological Knowledge ...
 
An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...
An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...
An Automatic Approach for Bilingual Tuberculosis Ontology Based on Ontology D...
 
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASESDOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES
DOMAIN ONTOLOGY DEVELOPMENT FOR COMMUNICABLE DISEASES
 
Domain ontology development for communicable diseases
Domain ontology development for communicable diseasesDomain ontology development for communicable diseases
Domain ontology development for communicable diseases
 
Ibn Sina
Ibn SinaIbn Sina
Ibn Sina
 
4.1 introduction to bioinformatics
4.1 introduction to bioinformatics4.1 introduction to bioinformatics
4.1 introduction to bioinformatics
 
Research proposal sjtu
Research proposal sjtuResearch proposal sjtu
Research proposal sjtu
 
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 UpdateBioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
BioCuration 2019 - Evidence and Conclusion Ontology 2019 Update
 

Mehr von IJERD Editor

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksIJERD Editor
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEIJERD Editor
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...IJERD Editor
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’IJERD Editor
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignIJERD Editor
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationIJERD Editor
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...IJERD Editor
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRIJERD Editor
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingIJERD Editor
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string valueIJERD Editor
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...IJERD Editor
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...IJERD Editor
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...IJERD Editor
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingIJERD Editor
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...IJERD Editor
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart GridIJERD Editor
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderIJERD Editor
 

Mehr von IJERD Editor (20)

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACE
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding Design
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and Verification
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive Manufacturing
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string value
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF Dxing
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart Grid
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powder
 

Kürzlich hochgeladen

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 

Kürzlich hochgeladen (20)

INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 

www.ijerd.com

  • 1. International Journal of Engineering Research and Development ISSN: 2278-067X, Volume 1, Issue 11 (July 2012), PP.01-10 www.ijerd.com Design and Development of Integrated Biomedical Ontology for Information Extraction from Medline Abstracts Dr.B.LShivakumar1, R.Porkodi2 1 Professor and Head, Department of Computer Applications, SNR Sons College,Coimbatore -6 2 Assistant Professor, Department of Computer Science, Bharathair University, Coimbatore – 46 Abstract––Due to the ever-increasing amount of scientific articles in the bio-medical domain, Information Extraction in Text Mining has been recognized as one of the key technologies for future bio-medical research. The Information extraction from these biomedical domains plays a vital role in bioinformatics field. Thus, bioinformatics researchers extend their work in both information extraction and construction of biomedical knowledge sources. In knowledge extraction, researchers are involved to develop efficient and effective technique by combining Natural Language Processing (NLP) and text mining techniques to find out and extract information and significant associations among the extracted information. In the other side bioinformatics researchers are busy with the construction of knowledge sources or repositories related to biomedical domain which simplify the work of the researchers in knowledge extraction process. This paper presents a semiautomatic framework that integrates the well-known two ontologies Gene Ontology (GO) and Medical Subject Heading (MeSH) ontology by adding of semantic mappings or relations between GO terms, Gene names and MESH keywords related to a particular disease (Alzheimer disease). The integrated ontology has validated in all three aspects such as structural, syntactic and semantic validation measures. This framework is used to discover significant associations or relationships between proteins and genes related to Alzheimer disease that are extracted from Medline abstracts. Keywords––Ontology, Alzheimer Disease, GO, MeSH, Stemming, Tagging. I. INTRODUCTION NCBI is a fast growing knowledge source for bioinformatics community, which has Medline database [1] that currently contains over 15 million citations of biological abstracts and it is growing by more than 40,000 abstracts per month. To extract desired information directly from biological literature is a challenging problem in text mining and Natural Language Processing (NLP). Many biomedical information sources have been developed and used in extraction process. Most text mining methods use vector space model to represent a document. The vector space model represents a document as a feature vector of terms contained in it. Each feature vector contains term weights and similarity between documents is computed using various similarity measures. This approach not considered the semantic relations of terms in documents. The ontology approach represents an effective knowledge representation within controlled vocabulary. The Wordnet ontology [14] is a lexical database for general English covering most of the general English concepts. In biomedical domain, the Unified Medical Language System (UMLS) framework [13] includes much biomedical ontology. This paper integrates the Gene ontology (GO), Medical Subject Headings (MeSH) and all human genes which include genes that cause Alzheimer disease in human. Alzheimer‟s disease (AD) is the most common cause of progressive decline of cognitive function in aged humans, and it is characterized by the presence of numerous senile plaques and neurofibrillary tangles accompanied by neuronal loss. The integrated ontology is developed in protégé tool which is the famous tool for designing ontology. The protégé tool provides facilities such as visualization of concepts which clearly show the semantic relations of a concept, query the some results based on the concepts, object properties and data properties, etc. The paper is organized as follows: Review of literature related to this work is presented in section 2. In section 3, brief introduction on biomedical ontologies are presented and section 4 the ontology based framework is elaborately discussed. The experimental design and results discussion is presented in section 5. Finally, this paper is concluded in section 6. II. RELATED WORK A lot of NLP based works have been reported for the past decades related to concept extraction [2], association rule discovery [3, 4] and extracting relationships among various concepts [5, 6]. Many approaches have been developed for extracting significant associations and interactions among various biological entities [5, 6, and 7] and discovering protein- disease associations. However, these approaches have not been produced promising results, due to inconsistencies prevailed in gene names. Related to gene names extractions, paper [8] has presented the extraction of gene names from articles‟ titles and abstracts and identified genes related to colon cancer disease. The paper [6] has presented a statistical approach for discovering group of genes related to breast cancer disease. In paper [5], author constructed a relationships network among biomedical entities which are extracted from Medline abstracts. In paper [9] the authors proposed new text mining approach which utilizes the concept of expectation, evidence a Z-score in determining significant associations between genes and Alzheimer disease. In paper [10], researchers expressed 1
  • 2. Design and Development of Integrated Biomedical Ontology for Information Extraction from … the method using association and functional relationship discovery algorithm in extracting gene relations from Medline abstracts. Recent works have been reported that ontology is a useful tool to improve the performance of any text mini ng tasks such as text clustering and association rule mining. In text clustering the paper [11] uses conceptual features that are extracted from text using ontology and prove that ontology could improve the performance of text clustering. The paper [12] shows the case study on the integration of biomedical information in to ontology. In paper [21] author proposed bio ontology methodology and compared this with other bio-ontologies. The limitations and benefits of GO ontology are expressed in paper [22]. The author had studied the strength and limitation of biomedicine ontologies based on its text and concept representation [23] This paper presents a new ontology that integrates the famous two ontologies such as Gene Ontology (GO) and MeSH by adding of semantic mappings or relations between GO terms, Gene names and MESH keywords related to a particular disease (Alzheimer disease). Finally the integrated ontology has validated based on syntactic, structural and semantic validation measures in order to prove its correctness and validity. III. BIOMEDICAL ONTOLOGIES The Molecular Biology Ontology (MBO) [16] was the first attempt to begin to define the entities in the domain to promote consistent interpretation across resources. A second phase saw the adoption of ontology by the biological community itself. Pre-eminent among these is the Gene Ontology (GO) [15]. The Microarray Gene Expression Data (MGED) ontology [18] provides a vocabulary for describing a biological sample used in an experiment, the treatment that the sample receives in the experiment and the microarray chip technology used in the experiment. The Functional Genomics Ontology (FUGO) [17] is another type of ontology in the field of bioinformatics. The next popular ontology MeSH thesaurus is the NLM's controlled vocabulary for subject indexing in MEDLINE. It is structured in a hierarchy of descriptors, with each descriptor including a set of concepts, and each concept itself containing a set of terms, which are synonyms and lexical variants. The next coming sections give a brief explanation on Gene Ontology (GO) and MeSH ontology that are integrated in our work. 3.1 GO Biological knowledge is most often represented in „bio-ontologies‟ that are formal representations of knowledge areas in which the essential concepts are combined with properties that describe relationships between concepts. Bio- ontologies are constructed according to textual descriptions of biological activities. One of the most popular bio-ontology is Gene Ontology (GO) [15] that contains more than 18 thousands terms. The GO ontology is a controlled vocabulary of gene and protein roles in cells, addressing the need for consistent description of gene products. This is mainly used in almost all biological researches and to predict the gene functions based on patterns of annotation. The GO describes the molecular function of a gene product, the biological process in which the gene product participates, and the cellular component where the gene product can be found. 3.2 MeSH Medical Subject Headings (MeSH) [24] is another popular ontology designed by the National Library of Medicine which mainly consists of the controlled vocabulary and a MeSH Tree. The controlled vocabulary contains several different types of terms, such as Descriptor, Qualifiers, Scope note, Tree number and Entry terms. Descriptor terms are main concepts or main headings. Entry terms are the synonyms or the related terms to descriptors. For example, “Amyloid beta- Protein Precursor” as a descriptor has the following entry terms “Amyloid A4 Protein Precursor”, “Amyloid beta Precursor Protein”, “Amyloid Protein Precursor”, etc. MeSH descriptors are organized in a MeSH Tree, which can be seen as a MeSH Concept Hierarchy. In the MeSH Tree there are 15 categories (e.g. category A for anatomic terms) and each category is further divided into subcategories. For example, the MeSH tree structure of Alzheimer Disease is shown in Fig. 1. For each subcategory, corresponding descriptors are hierarchically arranged from most general to most specific. In addition to its ontology role, MeSH descriptors are originally used to index MEDLINE articles. 2
  • 3. Design and Development of Integrated Biomedical Ontology for Information Extraction from … Fig. 1 MeSH Tree Structure for Alzheimer Disease IV. PROPOSED FRAMEWORK The proposed framework shown in Fig. 2 integrates the two popular ontologies GO and MeSH by mapping with Gene details used to extract significant associations among concepts from Medline abstracts related to Alzheimer disease. The main objective of the integration of two ontologies is making use of semantic relations among the concepts in Medline abstracts using MeSH ontology terms. The gene products of genes are referred from GO in order to find out the associations of gene products related to Alzheimer disease genes. The integrated ontology consists of MeSH concepts related to Alzheimer disease, linking of Alzheimer disease MeSH concepts to proteins that cause this Alzheimer disease, linking of Alzheimer disease proteins to genes that inhibit Alzheimer disease and finally linking of Alzheimer disease genes to respective gene products in which we identify the exact molecular functions that result in Alzheimer disease, biological processes in which the gene product participates to result in Alzheimer disease and the cellular component where the gene product of Alzheimer disease can be found. The components of this framework are explained below. Fig. 2 The Proposed Ontology Framework The first step in this work is to do preprocessing to transform Medline abstracts, which typically are strings of characters into a suitable representation. a. Removal of stop-words: The stop-words are high frequent words that carry no information (i.e. pronouns, prepositions, conjunctions etc.). Removal of stop-words improves clustering results [19]. b. Stemming: By word stemming it means the process of suffix removal to generate word stems. The Porter stemmer [20] which is a well-known algorithm is used for this task. c. Filtering: Domain vocabulary V in ontology is used for filtering. By filtering, document is considered with related domain words (term). It can reduce the documents dimensions. The filtering task used in our work filters the documents related to Alzheimer disease. d. Tagging: The concepts in Medline abstracts are identified using Genia tagger [26] and the identified concepts are mapped with concepts related to the categories specified in the proposed ontology. The categories used in the proposed ontology are gene, MeSH and GO. 3
  • 4. Design and Development of Integrated Biomedical Ontology for Information Extraction from … e. Semantic analysis & Concept mapping: Class/Concept Description GO_0003674 This is the base class for molecular function. All molecular functions are the subclass of GO_0003674 and the respective molecular function class for a gene is mapped with a particular gene. GO_0005575 This is the base class for cellular components. All cellular components are the subclass of GO_0005575 and the respective cellular component for a gene is mapped with a particular gene. GO_0008150 This is the base class for biological process. All biological process are the subclass of GO_0008150 and the respective biological process for a gene is mapped with a particular gene. GO_Functionality This class specifies the three GO functionalities as class such as cellular component, molecular function and biological process and these classes are mapped with the above three classes. Gene This specifies all human genes. Gene_Type This specifies the gene type for a gene such as protein coding, pseudo coding and unknown. Mesh This specifies the MeSH keywords that include proteins, disease level, etc. for Alzheimer disease. Table 1. Main Concepts in Proposed Ontology After preprocessing, the extracted concepts in Medline abstracts are analyzed in terms of semantic meaning and added to ontology, if it is not available. The concepts or classes used in this integrated ontology are shown in Table 1. The first step for adding concept is to find out the equivalent concept from the ontology and add the concept and possible semantic relations which includes object properties and data properties into ontology, if it is not found. Some of the important object properties and data properties created in the integrated ontology shown in Table 2 and 3. For example, the gene name “A2MP1” is extracted from Medline abstract and this term or concept is to be added to the ontology. If there is an equivalent gene “A2M” is already in the ontology, add “A2MP1” to ontology and assign is-a relationship along with other possible properties such as has_go, has_synonym, has_genetype_as, has_inducing_protien, inhibits, etc. between “A2M” and “A2MP1”. Object properties Description belongs_to This property is used to map GO classes. curated_GO_References This property is used to map genes referred in Pub Med literature has_gene This is used to relate gene with GO has_genetype_as The gene types protein coding, pseudo coding and unknown is mapped with genes. has_go This is used to all applicable GO concepts are mapped with gene at all functional level. has_inducing_protien This is used to map disease with respective proteins. has _synonym This is used to specify all possible synonyms for a gene. Inhibits This property is used to map gene with disease. is_found_in This is used to map disease with gene and it has transitive relationship with inhibits property. Table 2. List of Object Properties in Proposed Ontology Data Properties Description gene_annotations This property is used to specify different data base reference for a particular gene such as ENSENBL, HNGC, HPRD, MIM and UNIPROT. gene_descriptions This is used to specify the alternate name and full name for gene. has_gene_id This is used to specify the gene identifier for gene is_in_chromosome This property is used to specify the chromosome map location and chromosome number. Table 3. List of Data Properties in Proposed Ontology V. EXPERIMENTAL DESIGN & RESULTS The integrated ontology consists of three different main concepts or classes which are GO term functionalities, all human genes with or without related to a particular disease (in this ontology all human genes with or without related to Alzheimer disease are considered) and MeSH terms related to a particular disease. The sub classes for GO term functionalities class are all possible GO functionalities for the genes that are added into the ontology and the Gene main class consists of all human genes with or without related to a particular disease. The subclasses created for MeSH class includes all disease branches in which Alzheimer disease is derived, amino acids, peptides and proteins related to a particular disease. The integrated ontology provides all types of information related to GO terms, genes and a particular disease details. 4
  • 5. Design and Development of Integrated Biomedical Ontology for Information Extraction from … The ontology can be manipulated in different ways in which the most important manipulation techniques are using OntGraf and DL query. The structural evaluation is necessary for ontology to verify the consistency, if it is not structurally evaluated, it may produce some wrong results or inconsistent results when we manipulate information from the ontology. The visualization of concepts with its semantic relations is experimented using OntGraf tool. The important subclasses for MeSH main concepts or classes are shown in Fig. 3. The visualization of human genes related to Alzheimer disease is shown in Fig. 4. The visualization of proteins that are inducing Alzheimer disease is shown in Fig. 5. Fig. 3 The Overview of MeSH Concept Another way to manipulate the ontology is using DL query tool. This is an effective tool to retrieve any kind of semantic related information from the given ontology. Some of the information retrieval queries and results are shown in below figures. The Fig. 6 shows the extraction of Alzheimer Disease Fig. 4 Genes Related to Alzheimer Disease 5
  • 6. Design and Development of Integrated Biomedical Ontology for Information Extraction from … related human genes with the respective DL query and the Fig. 7 shows the extraction of protein names that inducing Alzheimer disease. Finally the Fig. 8 shows the extraction of human genes that inhibits Alzheimer disease in particular chromosome level, in our data set there are 3 genes (gene identifiers mapped with genes are shown in Fig. 8) that inhibits Alzheimer disease in chromosome level “10”. Fig. 5 Proteins Related to Alzheimer Disease Fig. 6 Genes Extracted by DL query “Gene and inhibits some Alzheimer_Disease” 6
  • 7. Design and Development of Integrated Biomedical Ontology for Information Extraction from … Fig. 7 Proteins Extracted by DL query “Proteins and has_inducing_disease some Alzheimer_Disease Fig. 8 Mapping Identifiers for Genes Extracted by DL query 5.1 Validating the Ontology The integrated ontology has to be validated to check the correctness. This section explores the evaluation methods used for validating our proposed ontology framework. The proposed ontology is syntactically verified for its consistency using FACT ++ reasoner available in protégé tool. The next validation method is semantic validation and the semantic validation of the ontology is verified by the domain experts. This ontology is validated by domain experts in biological field. Another evaluation method to validate the ontology is structural validation. The structural validation is performed by the different metrics defined in paper [25] that are class match measure, density measure, betweeness measure and semantic similarity measure. The ontology concept is ranked based on the total score of all the four metrics. The weights are assigned based on the concept representation and the weights are assigned in such a way the overall score lies between 0 and 1. Class Match Measure (CMM) – This measure evaluates the ontology for the specified concepts. The specified concepts are searched in the ontology to determine the occurrence of it. If it occurs directly as a concept, the maximum weight will be given for the specified concept. If it partially occurs as instances of any class, then the 50% of maximum weight may be assigned. The CMM evaluates the concepts either as exact match or partial match found in the ontology. Density Measure (DEM) - The DEM evaluates the ontology based on the degree of richness of attributes of a specified concept and includes the details of subclasses, inner attributes, siblings and relations with other classes in the ontology. The weight may be assigned based on the degree of richness of attributes of a concept. Betweenness Measure (BEM) – This measure evaluates the ontology based on centrality of a specified concept in the ontology. The centrality of a concept is computed using the count of shortest path between the specified concept and other concepts in the ontology. Based on the shortest path, weight may be assigned. 7
  • 8. Design and Development of Integrated Biomedical Ontology for Information Extraction from … Semantic Similarity Measure (SSM) – The SSM evaluates the ontology based on the proximity of classes in the ontology the specified concept matches, that is the count of links the specified concept has to map with the existing concepts in the ontology. 5.2 Dataset We framed 4 corpuses from our integrated ontology to validate it and each corpus consists of concepts of ontology and its important properties. Each corpus is the superset of the previous one. The corpus C1 consists of main concepts that include more subclasses and have rich relations or links with other sub concepts. The corpus C2 consists of sub concepts in C1 and other concepts in the ontology. The corpus C3 is the subset of C1 and has concepts in C1 and two important properties related to those concepts. The corpus C4 contains concepts in C1 and three important properties related to those concepts. All 4 corpuses framed from the ontology shown in Table 4 and the overall score is computed as follows from the above mentioned measures. Let O be the set of corpuses framed from the proposed ontology; Let wi be a weight factor and M be the different similarity metrics such as CMM, DEM, SSM and BEM. From the overall score it is found that the corpus C1 has the maximum score as it considered concepts as direct match. The score may be lesser when the concept with partial match is found. The corpus C2 is found to have less score and ranked as 4 due to the DEM and BEM measure score values. The DEM and BEM measure gives lowest score, because concepts in C2 have no related inner attributes and links with other concepts in C2. The corpus C3 is found to have second highest score due to the CMM and SSM score values, since the concepts in C2 are direct concepts and have good number of links among concepts in C3. The corpus C4 is found to have third highest score due to the CMM and SSM score values and also C4 is the sub set of C3. All the four metrics are provided with equal weights and we found that some of the corpuses may produce low score due to the DEM and BEM measures. The DEM and BEM score values may be increased when we use different weights. In our proposed ontology, we found that the concepts and its relations are linked correctly and further some of the missing relations may be added in future as to produce more promising results. Corpus(constructed from Score Rank Ontology) C1 0.79 1 C2 0.39 4 C3 0.46 2 C4 0.41 3 Table 4. Overall Scores and Ranks for Corpuses Finally the class match measure produces high score when there is an exact match found in the ontology. This score may decrease when there is a partial match found in the ontology. The density measure score found to be good when more relations exists among concepts. The betweenness measure found to be good when the concepts related with more other concepts in the ontology. The semantic similarity measure is found to be good when the concept have more synonyms and its relations. The corpuses Vs metrics score is represented in a bar chart is shown in Fig. 9. Fig. 9 The Bar chart of Corpus Vs Similarity Metric Score 8
  • 9. Design and Development of Integrated Biomedical Ontology for Information Extraction from … VI. CONCLUSION We studied almost all biomedical ontologies and identified all their merits and demerits. In consideration with this in mind the integrated ontology has proposed by accumulating the essential features represented in all specified ontologies. The integrated ontology is implemented in protégé tool that consists of three main concepts namely GO term functionalities, all human genes with or without related to Alzheimer Disease and MeSH terms related to the same disease. This frame work also addresses the problems of GO ontology, in which all information are given as annotations and that are not directly accessible by the user, because information of these kind are given as http links. The MeSH ontology represents the entry terms for the particular term and associated links in various repositories such as Pub Med, Medline, MIM, etc. In our work, all associated information of GO functionalities, genes are specified directly in our ontology, not as links. This ontology gives all possible semantic relations applicable for all concepts defined in the ontology. The ontology is also evaluated for its correctness and validity using various metrics. In the results of the experiments, we found that the ontology is modeled correctly by providing necessary concepts and relations. The ontology may be further improved by adding more relations to the existing concepts of gene and MeSH to get a higher score. Further the integrated ontology to be used in association rule mining to extract the significant associations among the proteins, genes that are related to Alzheimer Disease. Instead of using simple vector space model to calculate the term frequency and inverse document frequency from the Medline abstracts, we decide to use ontology approach to consider the semantic relationships of terms that appear in the Medline abstracts may give better results. VII. ACKNOWLEDGMENT This work was performed as part of the Minor Research Project, which is supported and funded by University Grants Commission, New Delhi, India. REFERENCES [1]. NCBI PubMed, http://www.ncbi.nlm.nih. Gov /entrez/query.fcgi [2]. Uramoto, N., H. Matsuzawa, T. Nagano, A. Murami and H. Takeuchi, 2004. A text-mining system for knowledge discovery from biomedical documents. [3]. Hristovski, D., J. Stare, B. Peterlin and S. Dzeroski, 2001. Supporting discovery in medicine by association rule mining in Medline and UMLS. Proc. MedInfo Conf., London, England, Sep. 2-5, 10: 1344-1348. [4]. Creighton, C. and S. Hanash, 2003. Mining gene expression databases for association rules. Bioinformatics, 19-1: 79-86. [5]. Wren, J.D., R. Bekeredjian, J.A. Stewart, R.V. Shohet and H.R. Garner, 2004. Knowledge discovery by automated identification and ranking of implicit relationships. Bioinformatics, 20: 3. [6]. Adamic, L.A., D. Wilkinson, B.A. Huberman and E. Adar, 2002. A literature based method for identifying gene- disease connections. IEEE Computer Soc. Bioinformatics Conf. [7]. Palakal, M., M. Stephens, S. Mukhopadhay, R. Raje and S. Rhodes, 2002. A Multi-level Text Mining Method to Extract Biological Relationships. Proc. IEEE Computer Soc. Bioinformatics (CSB) Conf., pp: 97-108. [8]. Wilkinson, D.M. and B.A. Huberman, 2004. A method for finding communities of related genes. Proc. Natl. Acad. Sci. U.S.A., 101 Suppl. 1: 5241- 5248. [9]. Hisham Al-Mubaid and Rajit K Singh, 2005. A New Text Mining Approach for Finding Protein-to-Disease Associations, American Journal of Biochemistry and Biotechnology 1 (3): 145-152, ISSN 1553-3668. [10]. M. Stephens, M. Palakal, S. Mukhopadhyay, R. Raje, 2001. Detecting Gene Relations From Medline Abstracts, Pacific Symposium on Biocomputing 6:483-496. [11]. A. Hotho,A.Maedche and S.Staab, “Ontology-based text document clustering”[A], Proc. of the Conf. on Intelligent Information Systems[C],2003. [12]. Paulo Gottgtroy1, Prof. Nik Kasabov1, Stephen MacDonell1, 2004. An ontology driven approach for knowledge discovery in Biomedicine. [13]. R. Kleinsorge, C. Tilley, and J.Willis. (2000). Unified Medical Language System (UMLS) Basics [Online]. Available: http://www.nlm.nih.gov/ research/umls/pdf/UMLS_Basics.pdf. [14]. G. A. Miller, “WordNet: A lexical database for English,” Commun. ACM, vol. 38, pp. 39–41, 1995. [15]. http://www.geneontology.org/ [16]. Schulze-Kremer S. Adding semantics to genome databases: Towards ontology for molecular biology. In: Proceedings of the Fifth International Conference for Intelligent Systems for Molecular Biology Conference (ISMB), 1997; pp. 272–5. [17]. http://www.fugo.org [18]. Whetzel PL, Parkinson H, Causton HC, et al. The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 2006;22:866–73. [19]. Mark Sinka and David Corne, “A Large Benchmark Dataset for Web Document Clustering”, In Soft Computering Sytems:Design,Management And Application, Vol.87 of Frontiers in Artifical Intelligence and Applications, pages 881-890,2002. [20]. M.F.Porter, “An Algorithm for Suffix Stripping”, Program 14(3), July 1980, pp.130-137. [21]. Robert Stevebs et.al., “Ontology based knowledge representation for bioinformatics”, published in briefings in Bioinformatics, 2000. [22]. Barry Smith, et. Al., “The Ontology of the Gene Ontology”, Proceedings of AMIA Symposium 2003. 9
  • 10. Design and Development of Integrated Biomedical Ontology for Information Extraction from … [23]. Olivier Corby et.al., “Searching the Semantic Web: Approximate Query processing based on bio ontologies”. Published in IEEE Computer Society, 2006. [24]. http://www.ncbi.nlm.nih.gov/mesh/ meshhome.html [25]. Amal Zouaq, Roger Nkambou, “Building Domain Ontologies from Text for Educational purposes”. IEEE Transactions on Learning Technologies, Vol.1, No.1, Jan-Mar 2008. [26]. http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/ 10