Biological Databases:
Biological databases are collections of files containing records of biological data in
machine-readable form that can be accessed, added to, retrieved, manipulated and modified.
These databases help to store, manage, connect and distribute data on a very large scale. In
any biological database, data are arranged according to sets of rules programmed into the
software that manages the data, called a Database Management System (DBMS).
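The role of a DBMS described above can be illustrated with a minimal sketch using SQLite from the Python standard library. The table name, column names, accession "X0001" and the sequences are all hypothetical and chosen only for illustration; real biological databases use far richer schemas and often other storage systems.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The "sets of rules": every record needs a unique accession,
# an organism name and a non-empty sequence.
conn.execute(
    """CREATE TABLE sequence_record (
           accession TEXT PRIMARY KEY,
           organism  TEXT NOT NULL,
           sequence  TEXT NOT NULL CHECK (length(sequence) > 0)
       )"""
)

# Add a record (hypothetical values).
conn.execute(
    "INSERT INTO sequence_record VALUES (?, ?, ?)",
    ("X0001", "Example organism", "ATGGCC"),
)

# Modify the record.
conn.execute(
    "UPDATE sequence_record SET sequence = ? WHERE accession = ?",
    ("ATGGCCTAA", "X0001"),
)

# Retrieve the record by its accession.
row = conn.execute(
    "SELECT organism, sequence FROM sequence_record WHERE accession = ?",
    ("X0001",),
).fetchone()
print(row)

# The DBMS enforces its rules: a duplicate accession is rejected.
try:
    conn.execute(
        "INSERT INTO sequence_record VALUES (?, ?, ?)",
        ("X0001", "Another organism", "ATG"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("duplicate rejected:", rejected)
```

The point of the sketch is that adding, modifying and retrieving records all go through the DBMS, which also refuses data that break the declared rules.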
Types of Biological databases (BD):
BD can be classified into the following three types based on the type of data stored:
1. Primary Databases: They contain original data in the form of primary sequence data or
structural data as submitted by the scientific community.
2. Secondary Databases: They contain information that has been processed and derived from
the raw data available in primary databases, e.g. PROSITE, PRINTS, BLOCKS etc.
3. Composite Databases: They collect and present data after comparing and filtering them
from different primary databases and exhibit only the non-redundant sequences.
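The idea behind a composite database can be sketched in a few lines of Python. This is only a toy illustration: it assumes that "redundant" means an exact match of the sequence text, whereas real composite databases apply their own, more elaborate criteria. The source names, identifiers and sequences below are made up.

```python
def build_composite(*sources):
    """Merge records from several sources, keeping one entry per distinct sequence."""
    composite = {}
    for source_name, records in sources:
        for record_id, sequence in records:
            key = sequence.upper()  # ignore upper/lower case differences
            # Keep only the first record seen for each distinct sequence.
            composite.setdefault(key, (source_name, record_id))
    return [(src, rid, seq) for seq, (src, rid) in composite.items()]


# Hypothetical toy records from two primary sources.
source_a = ("sourceA", [("A1", "MKTAYIAK"), ("A2", "MSLLTEVE")])
source_b = ("sourceB", [("B1", "mktayiak"), ("B2", "MGDVEKGK")])

composite = build_composite(source_a, source_b)
for src, rid, seq in composite:
    print(src, rid, seq)
```

Record B1 carries the same sequence as A1, so it is filtered out and the composite set holds three entries instead of four.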
A few examples of primary databases are:
a. Nucleic acid databases: GenBank, EMBL, DDBJ
b. Protein sequence databases: PIR, Swiss-Prot, UniProt
c. Protein structure database: PDB
d. Metabolic databases: KEGG
Nucleic Acid Sequence Databases:
These are data repositories composed of nucleotide sequence entries; they accept nucleic
acid sequence data and make it freely available to the public. GenBank, EMBL and DDBJ
are the principal nucleotide databases. All three are members of the International Nucleotide
Sequence Database Collaboration (INSDC) and exchange data with one another.
A. GenBank of NCBI:
It is hosted by the National Center for Biotechnology Information (NCBI), situated on the
campus of the US National Institutes of Health (NIH), USA. GenBank offers all publicly
available nucleotide sequences, their protein translations, and their annotations. It also
facilitates direct submission of sequence data through a user-friendly process. Researchers
from anywhere can submit their data to GenBank.
An accession number is assigned to each submitted sequence, which is then released to the
public database after a quality assurance check. The record can be retrieved using the Entrez
retrieval system.
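Besides the Entrez web pages, a record can also be requested programmatically through NCBI's E-utilities. The sketch below only builds the request address for the EFetch service and does not contact the server. The helper name efetch_url is hypothetical, and the accession U49845 is used purely as an example; substitute the accession of interest.

```python
from urllib.parse import urlencode

# Base address of the EFetch service of the NCBI E-utilities.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta", retmode="text"):
    """Build an EFetch URL for one accession (hypothetical helper, no network access)."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return EUTILS_EFETCH + "?" + urlencode(params)


# Example accession only; replace with the record you need.
url = efetch_url("U49845")
print(url)
```

Opening the printed address in a browser, or with urllib.request.urlopen(url), should return the sequence in FASTA format, assuming the service is reachable. Setting rettype="gb" asks for the full GenBank flat file instead.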
GenBank can be accessed over the internet through the website
http://www.ncbi.nlm.nih.gov/genbank
BOTMT:604
Bioinformatics and Biophysics
Prepared By-
Dr. Sangeeta Das.
Assistant Professor, Department of Botany, Bahona College, Jorhat, Assam, India.
B. NCBI:
http://www.ncbi.nlm.nih.gov/
The National Center for Biotechnology Information (NCBI) is part of the United States
National Library of Medicine, a branch of the National Institutes of Health. The NCBI is
located in Bethesda, Maryland, United States and was founded in 1988 through legislation
sponsored by Senator Claude Pepper.
[Figure: Home page of NCBI]
The NCBI houses a series of databases relevant to biotechnology and biomedicine and is an
important resource for bioinformatics tools and services. Major databases include GenBank for
DNA sequences and PubMed, a bibliographic database for the biomedical literature. Other
databases include the NCBI Epigenomics database. All these databases are available online
through the Entrez search engine. NCBI was directed by David Lipman, one of the original
authors of the BLAST sequence alignment program and a widely respected figure in
bioinformatics. As a national resource for molecular biology information, NCBI's mission is to
develop new information technologies to aid in the understanding of fundamental molecular
and genetic processes that control health and diseases.
Protein sequence databases:
These contain amino acid sequence entries arranged according to their identification
numbers. Examples of protein sequence databases include Swiss-Prot, PIR and UniProt.
A. Swiss-Prot:
http://www.expasy.ch/sprot
This database was developed by the Swiss Institute of Bioinformatics (SIB) and the European
Bioinformatics Institute (EBI). It is a high-quality, manually annotated protein sequence
database created in 1986. It provides high-level annotation, including protein function and
post-translational modifications, and aims to bring together all known relevant information
about a particular protein. Swiss-Prot now forms part of the UniProt Knowledgebase
(UniProtKB), which consists of two sections: UniProtKB/Swiss-Prot, which is manually
annotated and reviewed, and UniProtKB/TrEMBL, which is automatically annotated and
not reviewed.
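In FASTA files downloaded from UniProt, the header line of each entry begins with "sp" for a Swiss-Prot (reviewed) entry and "tr" for a TrEMBL (unreviewed) entry. The following is a minimal sketch of how the section can be read from such a header. The function name uniprot_section is hypothetical, and the second header is a made-up example.

```python
SECTIONS = {
    "sp": "UniProtKB/Swiss-Prot (reviewed)",
    "tr": "UniProtKB/TrEMBL (unreviewed)",
}


def uniprot_section(fasta_header):
    """Return (accession, section) for a UniProtKB-style FASTA header line."""
    if not fasta_header.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Header layout assumed here: >db|accession|entry_name description
    fields = fasta_header[1:].split("|")
    if len(fields) < 3 or fields[0] not in SECTIONS:
        raise ValueError("not a UniProtKB-style header")
    return fields[1], SECTIONS[fields[0]]


reviewed = uniprot_section(">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens")
# Made-up header, for illustration only.
unreviewed = uniprot_section(">tr|X00000|X00000_EXAMPLE Hypothetical protein")
print(reviewed)
print(unreviewed)
```

This kind of check is useful when a downloaded sequence set mixes both sections and only the manually reviewed entries are wanted.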
B. Protein Information Resource (PIR) database:
https://proteininformationresource.org/
It was established in 1984 by the National Biomedical Research Foundation (NBRF). It is an
integrated public bioinformatics resource that supports genomic and proteomic research and
scientific studies. It assists in the interpretation of protein sequence information. PIR can be
searched by entry or by sequence similarity.
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated
protein sequence database in the public domain, the PIR-International Protein Sequence
Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS)
and the Japan International Protein Information Database (JIPID). The expanded PIR WWW site
allows sequence similarity and text searching of the Protein Sequence Database and auxiliary
databases. Several new web-based search engines combine searches of sequence similarity and
database annotation to facilitate the analysis and functional identification of proteins. New
capabilities for searching the PIR sequence databases include annotation-sorted search, domain
search, combined global and domain search, and interactive text searches. The PIR-
International databases and search tools are accessible on the PIR WWW site at
http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de .
The PIR-International Protein Sequence Database and other files are also available by FTP.