Presented by 
Sarwat Bashir 
(Bioinformatics 8th semester ) 
Shaheed Benazir Bhutto Women University, Peshawar
SECONDARY DATABASES IN BIOINFORMATICS 
Data derived from the analysis or treatment of primary
data, such as secondary structures, hydrophobicity
plots, and domains, are stored in secondary
databases.
http://www.imb-jena.de/~rake/Bioinformatics_WEB/databases_classification.html
THE BIOINFORMATICS SECONDARY 
DATABASES 
 Secondary databases are further divided into four
categories according to the information they contain:
 Sequence-related Information 
 Genome-related Information 
 Structure-related Information 
 Pathway Information 
Metabolic Pathway and Protein 
Function Databases 
 A pathway database (DB) describes biochemical
pathways, reactions, and enzymes, and supports
the modeling and simulation of biopathways.
GENOME DATABASES 
 These databases collect organism genome sequences,
annotate (add descriptions to) and analyze them, and
provide public access.
 Some add curation of the experimental literature to
improve computed annotations.
 These databases may hold the genomes of many species, or
the genome of a single model organism.
PAGED: a pathway and gene-set enrichment 
database to enable molecular phenotype 
discoveries 
 Abstract: 
 Background: Over the past decade, pathway and gene-set enrichment analysis
has evolved into the study of high-throughput functional genomics.
 Researchers have begun to combine pathway and gene-set enrichment 
analysis as well as network module-based approaches to identify crucial 
relationships between different molecular mechanisms. 
 Methods: To meet the new challenge of molecular phenotype discovery,
the authors developed an integrated online database, the Pathway And Gene
Enrichment Database (PAGED), to enable comprehensive searches for
disease-specific pathways, gene signatures, microRNA targets, and network
modules by integrating gene-set-based prior knowledge as molecular patterns
from multiple levels: the genome, transcriptome, post-transcriptome, and
proteome.
Huang et al. PAGED: a pathway and gene-set enrichment database to enable molecular
phenotype discoveries. BMC Bioinformatics 2012, 13(Suppl 15):S2
http://www.biomedcentral.com/1471-2105/13/S15/S2
Cont.… 
 Results: The online database the authors developed, PAGED
(http://bio.informatics.iupui.edu/PAGED), is described as by far
the most comprehensive public compilation of gene sets.
 In its current release, PAGED contains a total of 25,242 
gene sets, 61,413 genes, 20 organisms, and 1,275,560 
records from five major categories. 
 Beyond its size, the advantage of PAGED lies in the 
explorations of relationships between gene sets as 
gene-set association networks (GSANs). 
Introduction to PAGED 
 Biological pathways have provided natural sources of molecular 
mechanisms to develop diagnosis, treatment, and prevention strategies 
for complex diseases. 
 Gene-set enrichment methods analyze the activity of thousands of
genes together instead of one gene at a time.
 These analyses reveal associations between genotypes and
phenotypes, referred to as molecular profiling or molecular
phenotypes.
 Other biological pathway databases are heterogeneous and lack
annotations.
 Unlike candidate pathway analysis, genome-wide pathway analysis
does not require prior biological knowledge.
 PAGED can reveal interactions across the different databases.
 Gene signature data from the transcriptome level offers a 
complementary source of information to complete pathway knowledge. 
The division of pathway analysis 
 Pathway analysis is divided into three generations of
approaches:
 First generation: over-representation analysis (ORA) approach.
 Second generation: functional class scoring (FCS) approach.
 Third generation: pathway topology (PT) approach.
 Multi-level, multi-scale, knowledge-guided enrichment analysis 
can enable molecular phenotype discovery for specific human 
diseases. 
 The acquisition of prior knowledge and systems modeling poses 
a challenge for developing tools that go beyond third-generation 
pathway analysis for disease-specific molecular profiling. 
 To meet the new challenges of molecular phenotype discovery,
the Pathway And Gene Enrichment Database (PAGED) was
developed.
The benefits of integrated 
database (PAGED) 
 This new database can provide the following benefits to biological researchers. 
 First, the database consists of disease-gene association data, curated and
integrated from the Online Mendelian Inheritance in Man (OMIM) database and
the Genetic Association Database (GAD); it therefore has the potential to assist
human disease studies.
 Second, it contains all gene signatures currently compiled in the Molecular
Signatures Database (MSigDB) and the Gene Signatures Database (GeneSigDB).
 Third, it further integrates microRNA targets from the miRecords database,
signaling pathways, protein interaction networks, and transcription
factor/gene regulatory networks, partially based on data integrated from the
Human Pathway Database (HPD) and the Human Annotated and Predicted
Protein Interaction (HAPPI) database.
 It integrates the following database versions: OMIM (Feb. 2012),
GAD (Aug. 2011), GeneSigDB (v. 4.0, Sept. 2011), MSigDB (v. 3.0, Sept. 2010),
HPD (2009), HAPPI (v. 1.4), and miRecords (Nov. 2010), which were the latest
versions available at the time.
The advantages of this Research 
 The advantage of this work is the relationship between 
pathways, gene signatures, microRNA targets, and/or 
network modules. 
 These gene-set-based relationships can be visualized 
as a gene-set association network (GSAN), which 
provides a “roadmap” for molecular phenotype 
discovery for specific human diseases. 
 The paper demonstrates how to query PAGED to discover
crucial pathways, gene signatures, and gene network
modules specific to a disease.
Methods 
 Data sources: Figure 1 gives an overview of the data integration process.
 Gene-set data were collected, extracted, and integrated from five major
categories.
 The pathway data came from HPD, which has integrated 999
human biological pathways from five curated sources: KEGG, PID,
BioCarta, Reactome, and Protein Lounge.
 The genome-level disease-gene relationships came from OMIM and
GAD.
 The transcriptome-level gene signatures came from MSigDB and
GeneSigDB.
 The post-transcriptome-level microRNA data came from miRecords.
 The proteome-level data came from an integrated protein interaction
database, HAPPI, which has integrated the HPRD, BIND, MINT, STRING, and
OPHID databases.
Gene-set data integration: 
 Gene sets are treated as any groups of genes, including disease-associated
genes, pathway genes, gene signatures, microRNA-targeted genes, and
PPI sub-network modules.
 The raw files curated from these data sources come in various formats,
including plain text, XML, and tables.
 The authors wrote Perl/Java parsers to convert them into a common
tab-delimited text format to ensure syntactic-level data compatibility.
 To integrate across different databases, they mapped the gene/protein
IDs in all databases to official gene symbols. The gene-set data is
stored in a backend Oracle 11g relational database.
 All records of gene-set members are represented by official gene
symbols.
 All PAGED gene sets were assigned unique PAGED-specific identifiers.
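The integration steps above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' Perl/Java parsers: the function names, the `GS000001`-style identifier scheme, the column names, and the sample IDs are all assumptions made for the example.

```python
import csv
import io


def normalize_gene_sets(raw_sets, id_to_symbol):
    """Map source-specific IDs to official gene symbols.

    raw_sets: iterable of (source, set_name, member_ids) tuples.
    id_to_symbol: dict from a source-specific ID to an official symbol.
    IDs with no known symbol are dropped.
    """
    rows = []
    for counter, (source, name, member_ids) in enumerate(raw_sets, start=1):
        set_id = f"GS{counter:06d}"  # hypothetical identifier scheme
        symbols = sorted({id_to_symbol[m] for m in member_ids if m in id_to_symbol})
        for symbol in symbols:
            rows.append((set_id, source, name, symbol))
    return rows


def to_tsv(rows):
    """Serialize the normalized rows as tab-delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["set_id", "source", "set_name", "gene_symbol"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up IDs and mapping, for illustration only.
    raw = [("KEGG", "Example pathway", ["ID1", "ID2", "ID9"]),
           ("OMIM", "Example disease", ["ID2"])]
    mapping = {"ID1": "APC", "ID2": "CTNNB1"}
    print(to_tsv(normalize_gene_sets(raw, mapping)))
```

One row per (gene set, gene symbol) pair keeps the format flat, which makes it straightforward to bulk-load into a relational table.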
Online software design
 The PAGED platform follows a multi-tiered design 
architecture. 
 The backend was implemented as PL/SQL packages 
on an Oracle 11g database server. The PAGED 
application middleware was implemented on the 
Oracle Application Express (APEX) server, which 
bridged between the Apache webserver and the Oracle 
database server. 
Gene-set similarity measurement 
 The similarity score Si,j of two different gene sets is defined as a weighted
sum of two terms, SL and SR.
 Here, Pi and Pj denote two different gene sets, while |Pi| and |Pj| are the
numbers of genes in these two gene sets.
 Their intersection Pi∩Pj denotes the common set of genes, while the size of
their union Pi∪Pj is calculated as |Pi| + |Pj| - |Pi∩Pj|.
 α is a weight coefficient in [0, 1] that sets the relative contributions of
the overlap (left term, SL) and the cover (right term, SR).
 SL is the well-known Jaccard coefficient, which is often used to evaluate the
similarity between two sets.
 When a larger gene set covers a smaller one, their similarity score is
expected to be high enough for the pair to be identified.
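The formula itself is not reproduced in the slide text. The Python sketch below is consistent with the definitions above, taking S = α·SL + (1 − α)·SR with SL = |Pi∩Pj| / |Pi∪Pj|. The form of the cover term, SR = |Pi∩Pj| / min(|Pi|, |Pj|), is an assumption chosen so that a large set fully covering a small one scores high; the function name and the default α = 0.5 are also illustrative.

```python
def gene_set_similarity(set_i, set_j, alpha=0.5):
    """Weighted sum of an overlap (Jaccard) term and a cover term.

    alpha weights the overlap term; (1 - alpha) weights the cover term.
    The cover term's denominator, min(|Pi|, |Pj|), is an assumption.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    set_i, set_j = set(set_i), set(set_j)
    if not set_i or not set_j:
        return 0.0
    common = len(set_i & set_j)
    union = len(set_i) + len(set_j) - common   # |Pi| + |Pj| - |Pi∩Pj|
    overlap = common / union                   # S_L, the Jaccard coefficient
    cover = common / min(len(set_i), len(set_j))  # S_R (assumed form)
    return alpha * overlap + (1.0 - alpha) * cover


if __name__ == "__main__":
    small = {"A", "B"}
    large = {"A", "B", "C", "D"}
    # Jaccard alone gives 0.5; the cover term is 1.0 because
    # the larger set fully contains the smaller one.
    print(gene_set_similarity(small, large, alpha=0.5))
```

With α = 1 the score reduces to the plain Jaccard coefficient; lowering α rewards containment of a small gene set in a large one.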
Microarray data 
 For gene expression data analysis, the authors show how to
discover crucial pathways, gene signatures, and gene
network modules specific to disease functional
genomics.
 A microarray dataset was downloaded from the Gene
Expression Omnibus (GEO,
http://www.ncbi.nlm.nih.gov/geo/).
 This microarray dataset compared the transcriptomes
of prospectively collected adenomas with
those of normal tissue from the same individuals.
Differential gene-set expressions 
 ABS_FC denotes the absolute value of the fold
change for each gene. Differential gene-set
expression is then defined from it.
 NORM_ABS_FC: The p*-norm of ABS_FC over all the
available differential gene expressions in a gene set.
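As a rough illustration of NORM_ABS_FC, the sketch below computes a standard p-norm over the absolute fold changes of the genes in one gene set. The slide does not state the value of p* or any additional normalization, so the parameter `p` (default 2) and the function name are assumptions.

```python
def norm_abs_fc(fold_changes, p=2.0):
    """Return the p-norm of the absolute fold changes in one gene set.

    fold_changes: signed fold-change values, one per measured gene.
    p: the norm order (p* on the slide); its value is an assumption here.
    """
    if p < 1:
        raise ValueError("p must be at least 1 for a norm")
    abs_fc = [abs(fc) for fc in fold_changes]  # ABS_FC per gene
    if not abs_fc:
        return 0.0
    return sum(value ** p for value in abs_fc) ** (1.0 / p)


if __name__ == "__main__":
    # Two genes, one up-regulated and one down-regulated.
    print(norm_abs_fc([3.0, -4.0], p=2))
```

Taking absolute values first means that up- and down-regulated genes both raise the score instead of cancelling each other.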
Gene-set association network 
(GSAN) construction 
 To visualize the relationships between gene sets, we define 
a gene-set association network (GSAN) as a network of 
associations between different gene sets, in which the 
network element representation is as follows: 
• Node: Gene set 
• Edge: Association between two gene sets 
• Node size: Gene-set scale (Counting genes in each gene set) 
• Node color: Differential gene-set expression 
(NORM_ABS_FC) 
• Node line color: Gene-set data source 
• Edge width: Similarity score 
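The element mapping above can be sketched as a small data structure. This is an illustrative Python version rather than the PAGED implementation: the plain Jaccard coefficient stands in for the similarity score, and the `min_similarity` threshold, the attribute names, and the sample gene sets are assumptions.

```python
from itertools import combinations


def build_gsan(gene_sets, sources, expression, min_similarity=0.2):
    """Build a gene-set association network as (nodes, edges).

    gene_sets: dict of gene-set name -> set of gene symbols.
    sources: dict of gene-set name -> data source (node line color).
    expression: dict of gene-set name -> NORM_ABS_FC value (node color).
    Edges carry the similarity score, which would drive edge width.
    """
    nodes = {
        name: {
            "size": len(genes),                    # node size: gene-set scale
            "source": sources[name],               # node line color
            "expression": expression.get(name, 0.0),  # node color
        }
        for name, genes in gene_sets.items()
    }
    edges = {}
    for a, b in combinations(sorted(gene_sets), 2):
        union = len(gene_sets[a] | gene_sets[b])
        score = len(gene_sets[a] & gene_sets[b]) / union if union else 0.0
        if score >= min_similarity:  # hypothetical cutoff for drawing an edge
            edges[(a, b)] = score
    return nodes, edges


if __name__ == "__main__":
    sets = {"S1": {"A", "B", "C"}, "S2": {"B", "C", "D"}, "S3": {"X"}}
    src = {"S1": "HPD", "S2": "MSigDB", "S3": "OMIM"}
    nodes, edges = build_gsan(sets, src, {"S1": 2.5})
    print(nodes)
    print(edges)
```

The node and edge dictionaries can be handed to any graph-drawing tool that accepts per-node and per-edge attributes.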
Results 
 Database content statistics: 
 Table 1 lists the detailed statistics for each data source 
and the overlap between each pair. For example, 
Gene-set scale distributions 
 Gene-set scale distributions for PAGED molecule data. 
 A gene-set scale refers to the number of molecules 
(i.e., genes) involved in a given gene set. 
 The distributions are plotted under log scale for both 
the x-axis and y-axis. 
 The linear trend line in red represents linear 
regression of PAGED distribution. 
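A distribution like the one described can be summarized with an ordinary least-squares fit on log-transformed axes. The Python sketch below counts how many gene sets have each size and fits a line to log10(count) against log10(size). It uses toy data, and the function name is an assumption, not part of PAGED.

```python
import math
from collections import Counter


def loglog_fit(set_sizes):
    """Fit log10(frequency) = slope * log10(size) + intercept.

    set_sizes: one positive integer per gene set (its number of genes).
    Returns (slope, intercept) from ordinary least squares.
    """
    frequency = Counter(set_sizes)
    sizes = sorted(frequency)
    if len(sizes) < 2:
        raise ValueError("need at least two distinct gene-set sizes")
    xs = [math.log10(size) for size in sizes]
    ys = [math.log10(frequency[size]) for size in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy data: eight sets of size 1, four of size 2, two of size 4, one of size 8.
    toy_sizes = [1] * 8 + [2] * 4 + [4] * 2 + [8]
    print(loglog_fit(toy_sizes))
```

A roughly straight line on log-log axes indicates that many gene sets are small and few are very large.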
Cont …. 
 An overview for the core functionality of the online PAGED 
website. 
 (A) The PAGED home page providing search by either 
disease name or gene list; 
 (B) a webpage containing the list of gene sets retrieved as a 
result of a disease query; 
 (C) a webpage containing the list of gene sets retrieved as a 
result of a gene list query; 
 (D) an advanced search page in which the user can either 
search disease name or upload a gene-list to search; 
 (E) a browse page listing the gene sets, their data source 
and number of genes. 
Discussion 
 In the near future, the authors plan to improve the gene-set similarity
algorithms by using a global PPI network to calculate the distance
between gene sets, which would provide a more robust measurement.
 For web interface development, the plan is to add a disease browsing
function based on disease ontology and a network visualization
function to show gene-set associations dynamically.
 The final goal is to perform multi-scale network modeling for
molecular phenotype discoveries by integrating differential
expressions with pathway and network topologies.
 The current release of PAGED provides a solid foundation for
developing third-generation pathway analysis tools.
Conclusions 
 The authors developed PAGED, an online database that they describe as
the most comprehensive public compilation of gene sets.
 In the current release, PAGED contains a total of 25,242 gene
sets, 61,413 genes, 20 organisms, and 1,275,560 records from five
major categories:
 Pathway data from HPD, genome-level disease data from
OMIM and GAD, transcriptome-level gene signatures from MSigDB and
GeneSigDB, post-transcriptome-level microRNA data from
miRecords, and proteome-level data from HAPPI.
 The paper also reports the number of overlapping genes between each pair
of data sources, the gene-set scale distribution, and a case study in
colorectal cancer.
 The current PAGED software can help users address a wide range 
of gene-set-related questions in human disease biology studies. 
MGD: the Mouse Genome 
Database 
 ABSTRACT 
 The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is one 
component of a community database resource for the laboratory mouse, a key 
model organism for interpreting the human genome and for understanding
human biology.
 MGD strives to provide an extensively integrated information resource with 
experimental details annotated from both literature and on-line genomic data 
sources. 
 MGD presents the consensus representation of genotype (sequence) to 
phenotype information including highly detailed information about genes and 
gene products. 
 Primary foci of integration are through representations of relationships 
between genes, sequences and phenotypes. 
 MGD collaborates with other bioinformatics groups to curate a definitive set of 
information about the laboratory mouse. 
 Recent developments include a general implementation of database structures 
for controlled vocabularies and the integration of a phenotype classification 
system. 
Judith A. Blake et al. MGD: the Mouse Genome Database. Nucleic Acids Research, 2003, Vol. 31,
No. 1, 193–195. DOI: 10.1093/nar/gkg047
INTRODUCTION 
 The Mouse Genome Database (MGD) provides
integrated information on mouse genes, genetic markers
and genomic features, as well as information on molecular
segments (probes, primers, cDNA clones, BACs and YACs),
mutant phenotypes, comparative mapping data, graphical
displays of linkage, cytogenetic and physical maps,
experimental mapping data, and strain distribution
patterns for recombinant inbred strains (RIs) and cross
haplotypes.
 MGD is updated daily and provides several new data
manipulation and display tools.
 MGD is one component of the Mouse Genome Informatics 
(MGI) database resource (http://www.informatics.jax.org) 
located at The Jackson Laboratory (http://www.jax.org). 
IMPROVEMENTS DURING 2002 
 Implementation of phenotype classifications 
 A broad, high-level set of phenotype terms has been developed and employed
to classify phenotype data in MGD.
 This defined vocabulary of 105 terms can be used to search, group, compare
and analyze phenotypes.
 These phenotype classification terms appear on the Alleles and Phenotypes
Query Form (Fig. 1), and on the Genes and Markers Query Form.
 The complete list of terms and their accession IDs is also available by FTP. 
 On each form, there is a link to the phenotype classification terms, complete 
with definitions and examples. 
 Users of the MGI database can select one or more terms from the list to search 
for records associated with a particular phenotype, in combination with many 
other parameters on the forms. 
 In addition, text based searches for more specific phenotypic terms remain 
available. 
Improvements to the MGI:GO 
browser 
 The MGI GO Browser
(http://www.informatics.jax.org/searches/GO_form.shtml)
allows database users to access genes in MGI using
functional annotation terms from the GO.
 This browser was developed in conjunction with the
Gene Expression Database (GXD).
 The GO Browser can be accessed from gene detail or 
query pages as well as directly from the MGI menus. 
Availability of MGI:GO files in 
various formats 
 MGI gene-to-GO annotations are updated daily. 
 Various files for the MGI gene/markers with the GO associations 
are publicly available. 
 These files are updated each time MGI submits a new gene
association file to the GO web site (http://www.geneontology.org)
and can be accessed on the MGI FTP server
(ftp://www.informatics.jax.org/pub/informatics/reports/gene_association.mgi).
 A file of all the GO terms used by MGI in the annotation of genes
and gene products is also available. MGI also provides a file to
the GO database of MGI Gene : SWISS-PROT associations.
 This information is incorporated into the GO database and thus
enables users to recover mouse sequence data as a result of a
semantic search against the GO database
(http://www.godatabase.org/cgi-bin/go.cgi).
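For readers who want to work with such a gene association file, the Python sketch below groups GO term IDs by gene symbol. It assumes the file follows the tab-delimited GO Annotation File (GAF) layout, where comment lines start with "!", the gene symbol is in column 3, and the GO ID is in column 5. The function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def parse_gene_associations(lines):
    """Return a dict mapping each gene symbol to its set of GO term IDs.

    lines: an iterable of text lines in GAF-style tab-delimited format.
    Comment lines (starting with '!') and short rows are skipped.
    """
    gene_to_go = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 5:
            continue
        symbol, go_id = columns[2], columns[4]  # columns 3 and 5, 1-based
        gene_to_go[symbol].add(go_id)
    return dict(gene_to_go)


if __name__ == "__main__":
    # Placeholder rows, not real annotations.
    sample = [
        "!gaf-version: 2.0\n",
        "MGI\tMGI:0000001\tGeneA\t\tGO:0000001\tREF:1\tIEA\n",
        "MGI\tMGI:0000001\tGeneA\t\tGO:0000002\tREF:2\tIDA\n",
        "MGI\tMGI:0000002\tGeneB\t\tGO:0000001\tREF:3\tISS\n",
    ]
    print(parse_gene_associations(sample))
```

To use it on a downloaded file, pass an open file handle, since iterating over a text file yields its lines.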
IMPLEMENTATION 
 MGD is implemented in the Sybase relational database 
system, version 12.5. 
 A large set of CGI scripts and Java Servlets mediate the 
user’s interaction with the database. 
 For computational users, direct SQL access can be 
requested through User Support. 
 User-requested database reports and a number of 
widely used data files (generated daily) are available 
on the FTP site (ftp://ftp.informatics.jax.org). 
CITING MGD 
 The following citation format is suggested when 
referring to datasets specific to the MGD component 
of MGI : 
 Mouse Genome Database (MGD), Mouse Genome 
Informatics, The Jackson Laboratory, Bar Harbor, 
Maine (URL: http://www.informatics.jax.org). 
Data retriveal ,srg and dbget
 
Bioinformatics biological databases
Bioinformatics biological databasesBioinformatics biological databases
Bioinformatics biological databases
 
Major biological nucleotide databases
Major biological nucleotide databasesMajor biological nucleotide databases
Major biological nucleotide databases
 
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and PrimersGASCAN: A Novel Database for Gastric Cancer Genes and Primers
GASCAN: A Novel Database for Gastric Cancer Genes and Primers
 
Semantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical InformaticsSemantic Web for Health Care and Biomedical Informatics
Semantic Web for Health Care and Biomedical Informatics
 
Biological database
Biological databaseBiological database
Biological database
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
 
JEVBase: An Interactive Resource for Protein Annotationof JE Virus
JEVBase: An Interactive Resource for Protein Annotationof JE VirusJEVBase: An Interactive Resource for Protein Annotationof JE Virus
JEVBase: An Interactive Resource for Protein Annotationof JE Virus
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 

Kürzlich hochgeladen

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 

Kürzlich hochgeladen (20)

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 

Pathways and genomes databases in bioinformatics

  • 1. Presented by Sarwat Bashir (Bioinformatics, 8th semester), Shaheed Benazir Bhutto Women University, Peshawar
  • 2. SECONDARY DATABASES IN BIOINFORMATICS
      - Secondary databases store data derived from the analysis or treatment of primary data, such as secondary structures, hydrophobicity plots, and domains.
      - Source: http://www.imb-jena.de/~rake/Bioinformatics_WEB/databases_classification.html
  • 3. THE BIOINFORMATICS SECONDARY DATABASES
      - Secondary databases are further divided into four categories according to the information they contain:
          - Sequence-related information
          - Genome-related information
          - Structure-related information
          - Pathway information
  • 4. Metabolic Pathway and Protein Function Databases
      - A pathway database (DB) describes biochemical pathways, reactions, and enzymes, and supports the modeling and simulation of biopathways.
  • 5. GENOME DATABASES
      - These databases collect organism genome sequences, annotate them (add descriptions), analyze them, and provide public access.
      - Some add curation of the experimental literature to improve computed annotations.
      - They may hold the genomes of many species or of a single model organism.
  • 6. PAGED: a pathway and gene-set enrichment database to enable molecular phenotype discoveries
      - Abstract
      - Background: Over the past decade, pathway and gene-set enrichment analysis has evolved into the study of high-throughput functional genomics.
      - Researchers have begun to combine pathway and gene-set enrichment analysis with network module-based approaches to identify crucial relationships between different molecular mechanisms.
      - Methods: To meet the new challenge of molecular phenotype discovery, the authors developed an integrated online database, the Pathway And Gene Enrichment Database (PAGED). It enables comprehensive searches for disease-specific pathways, gene signatures, microRNA targets, and network modules by integrating gene-set-based prior knowledge as molecular patterns from multiple levels: the genome, transcriptome, post-transcriptome, and proteome.
      - Source: Huang et al. PAGED: a pathway and gene-set enrichment database to enable molecular phenotype discoveries. BMC Bioinformatics 2012, 13(Suppl 15):S2. http://www.biomedcentral.com/1471-2105/13/S15/S2
  • 7. PAGED abstract (cont.)
      - Results: The authors describe the online database they developed, PAGED (http://bio.informatics.iupui.edu/PAGED), as by far the most comprehensive public compilation of gene sets.
      - In its current release, PAGED contains a total of 25,242 gene sets, 61,413 genes, 20 organisms, and 1,275,560 records from five major categories.
      - Beyond its size, the advantage of PAGED lies in exploring the relationships between gene sets as gene-set association networks (GSANs).
  • 8. Introduction to PAGED
      - Biological pathways provide natural sources of molecular mechanisms for developing diagnosis, treatment, and prevention strategies for complex diseases.
      - Gene-set enrichment methods analyze the activity of thousands of genes together rather than one gene at a time.
      - The analysis reveals associations between genotypes and phenotypes, which are called molecular profiles or molecular phenotypes.
      - Other biological pathway databases are heterogeneous and lack annotations.
      - Unlike candidate pathway analysis, genome-wide pathway analysis does not require prior biological knowledge.
      - PAGED can reveal interactions across the different databases.
      - Gene signature data from the transcriptome level offers a complementary source of information to complete pathway knowledge.
  • 9. The division of pathway analysis
      - Pathway analysis approaches are divided into three generations:
          - First generation: over-representation analysis (ORA) approach.
          - Second generation: functional class scoring (FCS) approach.
          - Third generation: pathway topology (PT) approach.
      - Multi-level, multi-scale, knowledge-guided enrichment analysis can enable molecular phenotype discovery for specific human diseases.
      - The acquisition of prior knowledge and systems modeling poses a challenge for developing tools that go beyond third-generation pathway analysis for disease-specific molecular profiling.
      - To meet the new challenges of molecular phenotype discovery, the Pathway And Gene Enrichment Database (PAGED) was developed.
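The first-generation approach named above, over-representation analysis, is commonly carried out with a hypergeometric test. The slides do not say which test PAGED or its sources use, so the following is only a minimal sketch of that common choice. The function name `ora_p_value` and all numbers in the usage lines are illustrative assumptions.

```python
from math import comb


def ora_p_value(universe: int, in_set: int, selected: int, hits: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    universe: number of genes in the background
    in_set:   number of background genes that belong to the gene set
    selected: number of genes in the user's list (e.g. differentially expressed)
    hits:     number of selected genes that fall in the gene set
    """
    total = comb(universe, selected)
    upper = min(in_set, selected)
    # Sum the probabilities of seeing `hits` or more set members by chance.
    tail = sum(
        comb(in_set, i) * comb(universe - in_set, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# 500 selected genes, 10 of which are in the pathway.
p = ora_p_value(universe=20000, in_set=100, selected=500, hits=10)
print(f"ORA p-value: {p:.3g}")
```

A small p-value suggests the gene set is over-represented in the selected list. In practice the p-values for many gene sets would also need a multiple-testing correction, which this sketch leaves out.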
  • 10. The benefits of the integrated database (PAGED)
      - This new database offers the following benefits to biological researchers.
      - First, it contains disease-gene association data curated and integrated from the Online Mendelian Inheritance in Man (OMIM) database and the Genetic Association Database (GAD), so it has the potential to assist human disease studies.
      - Second, it contains all currently compiled gene signatures in the Molecular Signatures Database (MSigDB) and the Gene Signatures Database (GeneSigDB).
      - Third, it integrates microRNA targets from the miRecords database, signaling pathways, protein interaction networks, and transcription factor/gene regulatory networks, partially based on data integrated from the Human Pathway Database (HPD) and the Human Annotated and Predicted Protein Interaction (HAPPI) database.
      - It integrates the following database versions, the latest available at the time: OMIM (Feb. 2012), GAD (Aug. 2011), GeneSigDB (v. 4.0, Sept. 2011), MSigDB (v. 3.0, Sept. 2010), HPD (2009), HAPPI (v. 1.4), and miRecords (Nov. 2010).
  • 11. The advantages of this research
      - The advantage of this work lies in exploring the relationships between pathways, gene signatures, microRNA targets, and network modules.
      - These gene-set-based relationships can be visualized as a gene-set association network (GSAN), which provides a "roadmap" for molecular phenotype discovery for specific human diseases.
      - The paper demonstrates how to query PAGED to discover crucial pathways, gene signatures, and gene network modules specific to disease functional genomics.
  • 12. Methods
      - Data sources: Figure 1 gives an overview of the data integration process.
      - Gene-set data were collected, extracted, and integrated from five major categories.
      - The pathway data came from HPD, which has integrated 999 human biological pathways from five curated sources: KEGG, PID, BioCarta, Reactome, and Protein Lounge.
      - The genome-level disease-gene relationships came from OMIM and GAD.
      - The transcriptome-level gene signatures came from MSigDB and GeneSigDB.
      - The post-transcriptome-level microRNA data came from miRecords.
      - The proteome-level data came from an integrated protein interaction database, HAPPI, which has integrated the HPRD, BIND, MINT, STRING, and OPHID databases.
  • 14. Gene-set data integration
      - Gene sets are treated as any groups of genes, including disease-associated genes, pathway genes, gene signatures, microRNA-targeted genes, and PPI sub-network modules.
      - The raw files curated from those data sources come in various formats, including plain text, XML, and tables.
      - The authors wrote Perl/Java parsers to convert them into a common tab-delimited text format to ensure syntactic-level data compatibility.
      - To integrate across different databases, they mapped the gene/protein IDs in all databases to official gene symbols. The gene-set data is stored in a backend Oracle 11g relational database.
      - All records of gene-set members are represented by official gene symbols.
      - All PAGED gene sets were assigned unique PAGED-specific identifiers.
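The integration steps above (map source-specific IDs to official gene symbols, then write a common tab-delimited format) can be sketched as follows. The authors used Perl/Java parsers that the slides do not show, so this is only an illustration in Python. The lookup table `ID_TO_SYMBOL`, the identifier `PAGED_DEMO_1`, the ID prefixes, and the column layout are all assumptions made for the example.

```python
import csv
import io

# Illustrative lookup from source-specific IDs to official gene symbols.
# A real table would come from a gene nomenclature resource.
ID_TO_SYMBOL = {
    "ENTREZ:7157": "TP53",
    "UNIPROT:P04637": "TP53",
    "ENTREZ:324": "APC",
}


def normalize_gene_set(raw_ids, id_to_symbol):
    """Map raw IDs to official symbols, dropping unmapped IDs and duplicates."""
    symbols = {id_to_symbol[g] for g in raw_ids if g in id_to_symbol}
    return sorted(symbols)


def write_gene_sets_tsv(gene_sets, id_to_symbol, out):
    """Write one row per (gene set, gene symbol) in a tab-delimited layout."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_set_id", "source", "gene_symbol"])
    for set_id, (source, raw_ids) in gene_sets.items():
        for symbol in normalize_gene_set(raw_ids, id_to_symbol):
            writer.writerow([set_id, source, symbol])


# Hypothetical gene set mixing two ID systems and one unmappable ID.
demo_sets = {
    "PAGED_DEMO_1": ("KEGG", ["ENTREZ:7157", "UNIPROT:P04637", "ENTREZ:324", "ENTREZ:0"]),
}
buffer = io.StringIO()
write_gene_sets_tsv(demo_sets, ID_TO_SYMBOL, buffer)
print(buffer.getvalue())
```

Collapsing every ID to one symbol is what makes gene sets from different sources directly comparable, which the similarity measurement later in the deck relies on.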
  • 15. Online software design
      - The PAGED platform follows a multi-tiered design architecture.
      - The backend was implemented as PL/SQL packages on an Oracle 11g database server.
      - The PAGED application middleware was implemented on the Oracle Application Express (APEX) server, which bridges the Apache web server and the Oracle database server.
  • 16. Gene-set similarity measurement
      - The similarity score S_{i,j} of two different gene sets P_i and P_j combines an overlap term (S_L, the left term of the formula) and a cover term (S_R, the right term), weighted by a coefficient α.
      - |P_i| and |P_j| are the numbers of genes in the two gene sets.
      - Their intersection P_i ∩ P_j is the set of genes they share, and the size of their union is |P_i ∪ P_j| = |P_i| + |P_j| - |P_i ∩ P_j|.
      - α is a weight coefficient in [0, 1] that balances the contributions of the overlap term S_L and the cover term S_R.
      - S_L is the well-known Jaccard coefficient, often used to evaluate the similarity between two sets.
      - When a larger gene set covers a smaller one, their similarity score is expected to be high enough to identify the pair.
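The formula itself is not preserved in the slide text, so the sketch below reconstructs one plausible reading of the description. It assumes the score is α·S_L + (1 - α)·S_R, with S_L the Jaccard coefficient and S_R the shared genes divided by the size of the smaller set, so that a fully covered set scores 1 on that term. Both the form of S_R and the default α = 0.5 are assumptions, not values taken from the paper.

```python
def gene_set_similarity(p_i: set, p_j: set, alpha: float = 0.5) -> float:
    """Weighted mix of an overlap term (Jaccard) and a cover term.

    Assumed form: alpha * S_L + (1 - alpha) * S_R, where
      S_L = |Pi ∩ Pj| / |Pi ∪ Pj|            (Jaccard coefficient)
      S_R = |Pi ∩ Pj| / min(|Pi|, |Pj|)      (assumed cover term)
    """
    if not p_i or not p_j:
        return 0.0
    common = len(p_i & p_j)
    union = len(p_i) + len(p_j) - common  # |Pi ∪ Pj| as defined on the slide
    s_left = common / union
    s_right = common / min(len(p_i), len(p_j))
    return alpha * s_left + (1 - alpha) * s_right


# Hypothetical gene sets: the larger one fully covers the smaller one.
small = {"TP53", "APC"}
large = {"TP53", "APC", "KRAS", "MYC"}
print(gene_set_similarity(small, large, alpha=0.5))
print(gene_set_similarity(small, large, alpha=1.0))  # pure Jaccard
```

With α = 1 the score reduces to the Jaccard coefficient, which penalizes the size difference. Lowering α lets the cover term raise the score when one set contains the other, matching the behavior the last bullet describes.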
  • 17. Microarray data
      - For gene expression data analysis, the authors show how to discover crucial pathways, gene signatures, and gene network modules specific to disease functional genomics.
      - A microarray dataset was downloaded from the Gene Expression Omnibus (GEO), http://www.ncbi.nlm.nih.gov/geo/.
      - This microarray dataset compared the transcriptomes of collected adenomas with those of normal tissue from the same individuals.
  • 18. Differential gene-set expressions
      - ABS_FC denotes the absolute value of the fold change for each gene. The differential gene-set expression is then defined from it.
      - NORM_ABS_FC: the p*-norm of ABS_FC over all the available differential gene expressions in a gene set.
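A p-norm over absolute fold changes can be sketched as below. The slides give neither the value of p* nor how fold change is computed (for example, ratio versus log ratio), so `p` is left as a parameter with an arbitrary default of 2, and the input is simply a list of per-gene fold-change values. Both choices are assumptions.

```python
def norm_abs_fc(fold_changes, p: float = 2.0) -> float:
    """p-norm of the absolute fold changes of the genes in one gene set.

    fold_changes: per-gene fold-change values (sign is ignored)
    p:            norm order; the paper's p* is not given in the slides
    """
    abs_fc = [abs(fc) for fc in fold_changes]  # ABS_FC for each gene
    return sum(x ** p for x in abs_fc) ** (1.0 / p)


# Hypothetical fold changes for a three-gene set.
print(norm_abs_fc([1.5, -2.0, 0.5], p=2.0))
print(norm_abs_fc([1.5, -2.0, 0.5], p=1.0))  # p = 1 is the plain sum of ABS_FC
```

Larger values of p make the score depend more on the most strongly changed genes, while p = 1 weights every gene equally.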
  • 19. Gene-set association network (GSAN) construction
      - To visualize the relationships between gene sets, a gene-set association network (GSAN) is defined as a network of associations between different gene sets, with the following element representation:
          - Node: gene set
          - Edge: association between two gene sets
          - Node size: gene-set scale (the number of genes in each gene set)
          - Node color: differential gene-set expression (NORM_ABS_FC)
          - Node line color: gene-set data source
          - Edge width: similarity score
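The node and edge attributes listed above can be assembled with plain dictionaries, as in this sketch. It uses the Jaccard coefficient as a stand-in for the similarity score and an arbitrary cutoff of 0.2 for drawing an edge; both are assumptions, since the slides do not state a threshold. Node color (NORM_ABS_FC) is left out to keep the example short, and the gene-set names and sources are made up.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient, used here as a stand-in similarity score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_gsan(gene_sets, sources, threshold: float = 0.2):
    """Return (nodes, edges) for a gene-set association network.

    nodes: gene-set name -> {"size": gene count, "source": data source}
    edges: (name_a, name_b) -> similarity score (edge width)
    """
    nodes = {
        name: {"size": len(genes), "source": sources[name]}
        for name, genes in gene_sets.items()
    }
    edges = {}
    for a, b in combinations(sorted(gene_sets), 2):
        score = jaccard(gene_sets[a], gene_sets[b])
        if score >= threshold:  # keep only sufficiently similar pairs
            edges[(a, b)] = score
    return nodes, edges


# Hypothetical gene sets from two sources.
demo_sets = {
    "pathway_A": {"TP53", "APC", "KRAS"},
    "signature_B": {"APC", "KRAS", "MYC"},
    "module_C": {"BRCA1"},
}
demo_sources = {"pathway_A": "HPD", "signature_B": "MSigDB", "module_C": "HAPPI"}
nodes, edges = build_gsan(demo_sets, demo_sources)
print(nodes)
print(edges)
```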
  • 20. Results
      - Database content statistics: Table 1 lists the detailed statistics for each data source and the overlap between each pair.
  • 21. Gene-set scale distributions
      - Gene-set scale distributions for PAGED molecule data.
      - A gene-set scale refers to the number of molecules (i.e., genes) involved in a given gene set.
      - The distributions are plotted on a log scale for both the x-axis and the y-axis.
      - The linear trend line in red represents a linear regression of the PAGED distribution.
  • 23. Results (cont.)
      - An overview of the core functionality of the online PAGED website:
          - (A) the PAGED home page, providing search by either disease name or gene list;
          - (B) a webpage containing the list of gene sets retrieved as a result of a disease query;
          - (C) a webpage containing the list of gene sets retrieved as a result of a gene-list query;
          - (D) an advanced search page in which the user can either search by disease name or upload a gene list to search;
          - (E) a browse page listing the gene sets, their data source, and their number of genes.
  • 25. Discussion
      - In the near future, improved gene-set similarity algorithms will be introduced that use a global PPI network to calculate the distance between gene sets. This would provide a more robust measurement.
      - For web interface development, the plan is to add a disease browsing function based on disease ontology and a network visualization function to show gene-set associations dynamically.
      - The final goal is to perform multi-scale network modeling for molecular phenotype discoveries by integrating differential expressions with pathway and network topologies.
      - The current release of PAGED provides a solid foundation for developing third-generation pathway analysis tools.
  • 26. Conclusions
      - PAGED is an online database that, according to its authors, provides the most comprehensive public compilation of gene sets.
      - In the current release, PAGED contains a total of 25,242 gene sets, 61,413 genes, 20 organisms, and 1,275,560 records from five major categories: pathway data from HPD, genome-level disease data from OMIM and GAD, transcriptome-level gene signatures from MSigDB and GeneSigDB, post-transcriptome-level microRNA data from miRecords, and proteome-level data from HAPPI.
      - The paper reports the number of overlapping genes between each pair of data sources, the gene-set scale distribution, and a case study in colorectal cancer.
      - The current PAGED software can help users address a wide range of gene-set-related questions in human disease biology studies.
  • 27. MGD: the Mouse Genome Database
      - ABSTRACT
      - The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is one component of a community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology.
      - MGD strives to provide an extensively integrated information resource with experimental details annotated from both the literature and online genomic data sources.
      - MGD presents the consensus representation of genotype (sequence) to phenotype information, including highly detailed information about genes and gene products.
      - The primary foci of integration are representations of relationships between genes, sequences, and phenotypes.
      - MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse.
      - Recent developments include a general implementation of database structures for controlled vocabularies and the integration of a phenotype classification system.
      - Source: Judith A. Blake et al. MGD: the Mouse Genome Database. Nucleic Acids Research, 2003, Vol. 31, No. 1, 193–195. DOI: 10.1093/nar/gkg047
  • 28. INTRODUCTION
      - The Mouse Genome Database (MGD) provides integrated information on mouse genes, genetic markers, and genomic features, as well as information on molecular segments (probes, primers, cDNA clones, BACs, and YACs), mutant phenotypes, comparative mapping data, graphical displays of linkage, cytogenetic and physical maps, experimental mapping data, and strain distribution patterns for recombinant inbred strains (RIs) and cross haplotypes.
      - MGD is updated daily and provides several new data manipulation and display tools.
      - MGD is one component of the Mouse Genome Informatics (MGI) database resource (http://www.informatics.jax.org), located at The Jackson Laboratory (http://www.jax.org).
  • 29. IMPROVEMENTS DURING 2002
      - Implementation of phenotype classifications
      - A broad, high-level set of phenotype terms has been developed and employed to classify phenotype data in MGD.
      - This defined vocabulary of 105 terms can be used to search, group, compare, and analyze phenotypes.
      - These phenotype classification terms appear on the Alleles and Phenotypes Query Form (Fig. 1) and on the Genes and Marker Query Form.
      - The complete list of terms and their accession IDs is also available by FTP.
      - Each form links to the phenotype classification terms, complete with definitions and examples.
      - Users of the MGI database can select one or more terms from the list to search for records associated with a particular phenotype, in combination with many other parameters on the forms.
      - In addition, text-based searches for more specific phenotypic terms remain available.
  • 31. Improvements to the MGI:GO browser  The MGI GO Browser (http://www.informatics.jax.org/searches/GO_form.shtml) allows database users to access genes in MGI using functional annotation terms from the Gene Ontology (GO).  This browser was developed in conjunction with the Gene Expression Database (GXD).  The GO Browser can be accessed from gene detail or query pages as well as directly from the MGI menus.
  • 32. Availability of MGI:GO files in various formats  MGI gene-to-GO annotations are updated daily.  Various files of MGI genes/markers with their GO associations are publicly available.  These files are updated each time MGI submits a new gene association file to the GO web site (http://www.geneontology.org) and can be accessed on the MGI FTP server (ftp://www.informatics.jax.org/pub/informatics/reports/gene_association.mgi).  A file of all the GO terms used by MGI in the annotation of genes and gene products is also available.  MGI also provides the GO database with a file of MGI Gene : SWISS-PROT associations.  This information is incorporated into the GO database and thus enables users to recover mouse sequence data as a result of a semantic search against the GO database (http://www.godatabase.org/cgi-bin/go.cgi).
  • 33. IMPLEMENTATION  MGD is implemented in the Sybase relational database system, version 12.5.  A large set of CGI scripts and Java Servlets mediates the user’s interaction with the database.  For computational users, direct SQL access can be requested through User Support.  User-requested database reports and a number of widely used data files (generated daily) are available on the FTP site (ftp://ftp.informatics.jax.org).
  • 34. CITING MGD  The following citation format is suggested when referring to datasets specific to the MGD component of MGI:  Mouse Genome Database (MGD), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org).
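
The gene-to-GO association files mentioned on slide 32 are tab-delimited text, so they can be read with a few lines of code. The following is a minimal sketch, not MGI's own tooling. It assumes the file follows the general GO annotation file layout (comment lines start with "!", the gene identifier is in the second column, the GO ID in the fifth, the evidence code in the seventh). The function name read_go_annotations, the column constants and the sample rows are hypothetical, and the identifiers in the sample are placeholders rather than real annotations.

```python
import csv
import io
from collections import defaultdict

# Assumed 0-based column positions, based on the general GO annotation
# file layout; verify them against the header of the file you download.
COL_GENE_ID = 1    # e.g. an MGI accession ID
COL_GO_ID = 4      # GO term ID
COL_EVIDENCE = 6   # evidence code


def read_go_annotations(handle):
    """Return a dict mapping each gene ID to the set of GO IDs annotated to it."""
    annotations = defaultdict(set)
    # QUOTE_NONE: tab-delimited fields are not wrapped in quotes.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and "!" comment/header lines.
        if not row or row[0].startswith("!"):
            continue
        # Skip rows too short to contain the columns we need.
        if len(row) <= COL_EVIDENCE:
            continue
        annotations[row[COL_GENE_ID]].add(row[COL_GO_ID])
    return annotations


# Made-up sample rows for illustration only (placeholder identifiers).
SAMPLE = (
    "!illustrative header comment\n"
    "MGI\tMGI:0000001\tGeneA\t\tGO:0000001\tREF:1\tIDA\t\tP\n"
    "MGI\tMGI:0000001\tGeneA\t\tGO:0000002\tREF:2\tISS\t\tF\n"
    "MGI\tMGI:0000002\tGeneB\t\tGO:0000001\tREF:3\tIDA\t\tP\n"
)

if __name__ == "__main__":
    result = read_go_annotations(io.StringIO(SAMPLE))
    for gene_id in sorted(result):
        print(gene_id, sorted(result[gene_id]))
```

To use it on a downloaded report, open the file in text mode and pass the file object to read_go_annotations in place of the in-memory sample.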

Editor's Notes

  1. Tab-delimited: a text format that uses tab characters as separators between fields. Unlike in comma-delimited files, alphanumeric data are not surrounded by quotes.
  2. Jackson Laboratory: The Jackson Laboratory is an independent, nonprofit biomedical research institution dedicated to contributing to a future of better health care based on the unique genetic makeup of each individual.