SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Bioinformatics
Dr. Shailendra Singh Khichi
What is Bioinformatics
• Bioinformatics is an interdisciplinary research
area at the interface between computer science
and biological science
• Luscombe et al. in defining bioinformatics as a
union of biology and informatics:
• Bioinformatics uses computers for storage,
retrieval, manipulation, and distribution of
information related to biological
macromolecules such as DNA, RNA, and
proteins
• Bioinformatics differs from a related field known as
computational biology.
• Bioinformatics is limited to sequence, structural, and
functional analysis of genes and genomes and their
corresponding products.
• However, computational biology encompasses all
biological areas that involve computation.
• For example, mathematical modeling of ecosystems,
population dynamics, application of the game theory in
behavioral studies, and phylogenetic construction using
fossil records all employ computational tools, but do not
necessarily involve biological macromolecules.
• The first sequence alignment algorithm was
developed by Needleman and Wunsch in 1970.
• This was a fundamental step in the development of
the field of bioinformatics, which paved the way for
the routine sequence comparisons and database
searching practiced by modern biologists.
• The first protein structure prediction algorithm was
developed by Chou and Fasman in 1974.
Goals
• The ultimate goal of bioinformatics is to better understand
a living cell and how it functions at the molecular level.
• By analyzing raw molecular sequence and structural data,
bioinformatics research can generate new insights and
provide a “global” perspective of the cell.
• Cellular functions are mainly performed by proteins whose
capabilities are ultimately determined by their sequences.
• Therefore, solving functional problems using sequence and
sometimes structural approaches has proved to be a fruitful
endeavor.
Scope
• Bioinformatics consists of two subfields:
1. The development of computational tools and
databases and
2. Application of these tools and databases in
generating biological knowledge to better
understand living systems.
• The tool development includes: writing software
for sequence, structural, and functional analysis, as
well as the construction and curating (organizing) of
biological databases.
• Tools are used in 3 areas of genomic and molecular biological
research:
1. molecular sequence analysis,
2. molecular structural analysis,
3. and molecular functional analysis.
1. Sequence analysis: sequence alignment, sequence database
searching, motif and pattern discovery, gene and promoter finding,
reconstruction of evolutionary relationships, and genome assembly
and comparison.
2. Structural analyses: protein and nucleic acid structure analysis,
comparison, classification, and prediction.
3. Functional analyses: gene expression profiling, protein–protein
interaction prediction, protein subcellular localization prediction,
metabolic pathway reconstruction, and simulation
• These 3 aspects are integrated with each other
• For eg: protein structure prediction depends on
sequence alignment data;
• clustering of gene expression profiles requires the use
of phylogenetic tree construction methods derived in
sequence analysis.
• Sequence-based promoter prediction is related to
functional analysis of coexpressed genes.
• Gene annotation include
1.Distinction between coding and noncoding sequences,
2. Identification of translated protein sequences,
3. Determination of the gene’s evolutionary relationship
with other known genes.
Applications
• knowledge-based drug design,
• forensic DNA analysis, and
• Agricultural biotechnology.
• Computational studies of protein–ligand interactions provide a rational basis
for the rapid identification of novel leads for synthetic drugs.
• Knowledge of the 3D structures of proteins allows molecules to be designed
that are capable of binding to the receptor site of a target protein with great
affinity and specificity.
• This informatics-based approach significantly reduces the tie and cost
necessary to develop drugs with higher poteny fewer side effects, and less
toxicity than using the traditional trial-and-error approach.
LIMITATIONS
• Bioinformatics depends on experimental science to
produce raw data for analysis.
• It, in turn, provides useful interpretation of
experimental data and important leads for further
experimental research.
• The quality of bioinformatics predictions depends on
the quality of data and the sophistication of the
algorithms being used
• Sequence data from high throughput analysis often
contain errors.
• If the sequences are wrong or annotations incorrect,
the results from the downstream analysis are
misleading as well.
Program Database Query Typical uses
BLASTN Nucleotide Nucleotide
Mapping oligonucleotides, cDNAs, and PCR
products to a genome; screening repetitive
elements; cross-species sequence exploration;
annotating genomic DNA; clustering sequencing
reads; vector clipping
BLASTP Protein Protein
Identifying common regions between proteins;
collecting related proteins for phylogenetic
analyses
BLASTX Protein
Nucleotide
translated into
protein
Finding protein-coding genes in genomic DNA;
determining if a cDNA corresponds to a known
protein
TBLASTN
Nucleotide
translated into
protein
Protein
Identifying transcripts, potentially from multiple
organisms, similar to a given protein; mapping a
protein to genomic DNA
TBLASTX
Nucleotide
translated into
protein
Nucleotide
translated into
protein
Cross-species gene prediction at the genome or
transcript level; searching for genes missed by
traditional methods or not yet in protein
databases
WHAT IS A DATABASE
A database is a computerized archive used to store and organize data in such a way
that information can be retrieved easily via a variety of search criteria.
Databases are composed of computer hardware and software for data management.
The objective of the development of a database is to organize data in a set of structured
records to enable easy retrieval of information.
Each record, also called an entry, should contain a number of fields that hold the actual
data items.
To retrieve a particular record from the database, a user can specify a particular piece of
information, called value, to be found in a particular field and expect the computer to
retrieve the whole data record. This process is called making a query.
Types of DATABASES Format
• Flat file
• Relational databases
• Object oriented database
1. Flat File format:
• Originally, databases all used a flat file format, which is a long text
file that contains many entries separated by a delimiter (І)
• The entire text file does not contain any hidden instructions for
computers to search for specific information or to create reports
based on certain fields from each record.
• The text file can be considered a single table
• To search a flat file for a particular piece of information, a computer
has to read through the entire file, an obviously inefficient process
• Relational Databases
• Instead of using a single table as in a flat file database,
relational databases use a set of tables to organize data.
• Each table, also called a relation, is made up of columns
and rows
• Columns represent individual fields. Rows represent values
in the fields of records.
• The columns in a table are indexed according to a common
feature called an attribute, so they can be cross-referenced
in other tables
• To execute a query in a relational database, the system
selects linked data items from different tables and
combines the information into one report.
• Therefore, specific information can be found more quickly
from a relational database than from a flat file database.
Relational Databases
• Object-Oriented Databases
• One of the problems with relational databases
is that the tables used do not describe
complex hierarchical relationships between
data items.
• To overcome the problem, object-oriented
databases have been developed that store
data as objects
• an object can be considered as a unit that
combines data and mathematical routines
that act on the data
Types of Biological Databases
• Biological databases can be roughly divided into three categories:
primary databases, secondary databases, and specialized databases.
Primary databases contain original biological data. They are archives
of raw sequence or structural data submitted by the scientific
community.
Eg: GenBank and Protein Data Bank (PDB)
Secondary databases contain computationally processed or manually
curated information, based on original information from primary
databases.
• Translated protein sequence databases containing functional
annotation belong to this category.
• Eg: SWISS-Prot and Protein Information Resources (PIR)
• Specialized databases are those that cater to a
particular research interest.
• For example, Flybase, HIV sequence database,
and Ribosomal Database Project are
databases that specialize in a particular
organism or a particular type of data.
Primary Databases
• There are 3 major public sequence databases that store raw nucleic acid
sequence data produced and submitted by researchers worldwide:
• GenBank,
• European Molecular Biology Laboratory (EMBL)
• DNA Data Bank of Japan (DDBJ)
• which are all freely available on the Internet.
• Most of the data in the databases are contributed directly by authors with
a minimal level of annotation.
• These three public databases closely collaborate and exchange new data
daily. They together constitute the International Nucleotide Sequence
Database Collaboration.
• Although the three databases all contain the same sets of raw data, each
of the individual databases has a slightly different kind of format to
represent the data.
• For the three-dimensional structures of biological macromolecules, there
is only one centralized database, the PDB.
• This database archives atomic coordinates of macromolecules (both
proteins and nucleic acids) determined by x-ray crystallography and NMR.
It uses a flat file format to represent protein name, authors, experimental
details, secondary structure, cofactors, and atomic coordinates.
• Secondary Databases
• Sequence annotation information in the primary database is often
minimal.
• To turn the raw sequence information into more sophisticated
biological knowledge, much post processing of the sequence
information is needed.
• which contain computationally processed sequence information
derived from the primary databases.
• some are simple archives of translated sequence data from
identified open reading frames in DNA, whereas others provide
additional annotation and information related to higher levels of
information regarding structure and functions.
• A prominent example of secondary databases is SWISS-PROT, which
provides detailed sequence annotation that includes structure,
function, and protein family assignment. The sequence data are
mainly derived from TrEMBL, a database of

Weitere ähnliche Inhalte

Was ist angesagt?

Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
biinoida
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
Abhishek Vatsa
 

Was ist angesagt? (20)

NCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology InformationNCBI National Center for Biotechnology Information
NCBI National Center for Biotechnology Information
 
SWISS-PROT
SWISS-PROTSWISS-PROT
SWISS-PROT
 
Protein Data Bank (PDB)
Protein Data Bank (PDB)Protein Data Bank (PDB)
Protein Data Bank (PDB)
 
Structural genomics
Structural genomicsStructural genomics
Structural genomics
 
BLAST
BLASTBLAST
BLAST
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICS
 
Ddbj
DdbjDdbj
Ddbj
 
Introduction of bioinformatics
Introduction of bioinformaticsIntroduction of bioinformatics
Introduction of bioinformatics
 
Gene prediction methods vijay
Gene prediction methods  vijayGene prediction methods  vijay
Gene prediction methods vijay
 
Tools and database of NCBI
Tools and database of NCBITools and database of NCBI
Tools and database of NCBI
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
SEQUENCE ANALYSIS
SEQUENCE ANALYSISSEQUENCE ANALYSIS
SEQUENCE ANALYSIS
 
Basics of bioinformatics
Basics of bioinformaticsBasics of bioinformatics
Basics of bioinformatics
 
Structural databases
Structural databases Structural databases
Structural databases
 
European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)European molecular biology laboratory (EMBL)
European molecular biology laboratory (EMBL)
 
Protein database
Protein databaseProtein database
Protein database
 
BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)
 
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICSSTRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
STRUCTURAL GENOMICS, FUNCTIONAL GENOMICS, COMPARATIVE GENOMICS
 
Application of Bioinformatics in different fields of sciences
Application of Bioinformatics in different fields of sciencesApplication of Bioinformatics in different fields of sciences
Application of Bioinformatics in different fields of sciences
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 

Ähnlich wie Bioinformatics

Ähnlich wie Bioinformatics (20)

Biological data base
Biological data baseBiological data base
Biological data base
 
Bioinformatics__Lecture_1.ppt
Bioinformatics__Lecture_1.pptBioinformatics__Lecture_1.ppt
Bioinformatics__Lecture_1.ppt
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics مي.pdf
Bioinformatics  مي.pdfBioinformatics  مي.pdf
Bioinformatics مي.pdf
 
Basics Of Bioinformatics .pptx
Basics Of Bioinformatics .pptxBasics Of Bioinformatics .pptx
Basics Of Bioinformatics .pptx
 
Introduction to bioinformatics
Introduction to bioinformaticsIntroduction to bioinformatics
Introduction to bioinformatics
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
Introduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASEIntroduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASE
 
Biological data bioinformatics
Biological data bioinformatics Biological data bioinformatics
Biological data bioinformatics
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Presentation.pptx
Presentation.pptxPresentation.pptx
Presentation.pptx
 
Biological databases
Biological databases Biological databases
Biological databases
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Primary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptxPrimary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptx
 
BASIC OF BIOINFORMATICS.pptx
BASIC OF BIOINFORMATICS.pptxBASIC OF BIOINFORMATICS.pptx
BASIC OF BIOINFORMATICS.pptx
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
 

Kürzlich hochgeladen

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 

Kürzlich hochgeladen (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

Bioinformatics

  • 2. What is Bioinformatics • Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science • Luscombe et al. in defining bioinformatics as a union of biology and informatics: • Bioinformatics uses computers for storage, retrieval, manipulation, and distribution of information related to biological macromolecules such as DNA, RNA, and proteins
  • 3. • Bioinformatics differs from a related field known as computational biology. • Bioinformatics is limited to sequence, structural, and functional analysis of genes and genomes and their corresponding products. • However, computational biology encompasses all biological areas that involve computation. • For example, mathematical modeling of ecosystems, population dynamics, application of the game theory in behavioral studies, and phylogenetic construction using fossil records all employ computational tools, but do not necessarily involve biological macromolecules.
  • 4. • The first sequence alignment algorithm was developed by Needleman and Wunsch in 1970. • This was a fundamental step in the development of the field of bioinformatics, which paved the way for the routine sequence comparisons and database searching practiced by modern biologists. • The first protein structure prediction algorithm was developed by Chou and Fasman in 1974.
  • 5. Goals • The ultimate goal of bioinformatics is to better understand a living cell and how it functions at the molecular level. • By analyzing raw molecular sequence and structural data, bioinformatics research can generate new insights and provide a “global” perspective of the cell. • Cellular functions are mainly performed by proteins whose capabilities are ultimately determined by their sequences. • Therefore, solving functional problems using sequence and sometimes structural approaches has proved to be a fruitful endeavor.
  • 6. Scope • Bioinformatics consists of two subfields: 1. The development of computational tools and databases and 2. Application of these tools and databases in generating biological knowledge to better understand living systems. • The tool development includes: writing software for sequence, structural, and functional analysis, as well as the construction and curating (organizing) of biological databases.
  • 7. • Tools are used in 3 areas of genomic and molecular biological research: 1. molecular sequence analysis, 2. molecular structural analysis, 3. and molecular functional analysis. 1. Sequence analysis: sequence alignment, sequence database searching, motif and pattern discovery, gene and promoter finding, reconstruction of evolutionary relationships, and genome assembly and comparison. 2. Structural analyses: protein and nucleic acid structure analysis, comparison, classification, and prediction. 3. Functional analyses: gene expression profiling, protein–protein interaction prediction, protein subcellular localization prediction, metabolic pathway reconstruction, and simulation
  • 8.
  • 9. • These 3 aspects are integrated with each other • For eg: protein structure prediction depends on sequence alignment data; • clustering of gene expression profiles requires the use of phylogenetic tree construction methods derived in sequence analysis. • Sequence-based promoter prediction is related to functional analysis of coexpressed genes. • Gene annotation include 1.Distinction between coding and noncoding sequences, 2. Identification of translated protein sequences, 3. Determination of the gene’s evolutionary relationship with other known genes.
  • 10. Applications • knowledge-based drug design, • forensic DNA analysis, and • Agricultural biotechnology. • Computational studies of protein–ligand interactions provide a rational basis for the rapid identification of novel leads for synthetic drugs. • Knowledge of the 3D structures of proteins allows molecules to be designed that are capable of binding to the receptor site of a target protein with great affinity and specificity. • This informatics-based approach significantly reduces the tie and cost necessary to develop drugs with higher poteny fewer side effects, and less toxicity than using the traditional trial-and-error approach.
  • 11. LIMITATIONS • Bioinformatics depends on experimental science to produce raw data for analysis. • It, in turn, provides useful interpretation of experimental data and important leads for further experimental research. • The quality of bioinformatics predictions depends on the quality of data and the sophistication of the algorithms being used • Sequence data from high throughput analysis often contain errors. • If the sequences are wrong or annotations incorrect, the results from the downstream analysis are misleading as well.
  • 12. Program Database Query Typical uses BLASTN Nucleotide Nucleotide Mapping oligonucleotides, cDNAs, and PCR products to a genome; screening repetitive elements; cross-species sequence exploration; annotating genomic DNA; clustering sequencing reads; vector clipping BLASTP Protein Protein Identifying common regions between proteins; collecting related proteins for phylogenetic analyses BLASTX Protein Nucleotide translated into protein Finding protein-coding genes in genomic DNA; determining if a cDNA corresponds to a known protein TBLASTN Nucleotide translated into protein Protein Identifying transcripts, potentially from multiple organisms, similar to a given protein; mapping a protein to genomic DNA TBLASTX Nucleotide translated into protein Nucleotide translated into protein Cross-species gene prediction at the genome or transcript level; searching for genes missed by traditional methods or not yet in protein databases
  • 13. WHAT IS A DATABASE A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria. Databases are composed of computer hardware and software for data management. The objective of the development of a database is to organize data in a set of structured records to enable easy retrieval of information. Each record, also called an entry, should contain a number of fields that hold the actual data items. To retrieve a particular record from the database, a user can specify a particular piece of information, called value, to be found in a particular field and expect the computer to retrieve the whole data record. This process is called making a query.
  • 14. Types of DATABASES Format • Flat file • Relational databases • Object oriented database 1. Flat File format: • Originally, databases all used a flat file format, which is a long text file that contains many entries separated by a delimiter (І) • The entire text file does not contain any hidden instructions for computers to search for specific information or to create reports based on certain fields from each record. • The text file can be considered a single table • To search a flat file for a particular piece of information, a computer has to read through the entire file, an obviously inefficient process
  • 15. • Relational Databases • Instead of using a single table as in a flat file database, relational databases use a set of tables to organize data. • Each table, also called a relation, is made up of columns and rows • Columns represent individual fields. Rows represent values in the fields of records. • The columns in a table are indexed according to a common feature called an attribute, so they can be cross-referenced in other tables • To execute a query in a relational database, the system selects linked data items from different tables and combines the information into one report. • Therefore, specific information can be found more quickly from a relational database than from a flat file database.
  • 17. • Object-Oriented Databases • One of the problems with relational databases is that the tables used do not describe complex hierarchical relationships between data items. • To overcome the problem, object-oriented databases have been developed that store data as objects • an object can be considered as a unit that combines data and mathematical routines that act on the data
  • 18.
  • 19. Types of Biological Databases • Biological databases can be roughly divided into three categories: primary databases, secondary databases, and specialized databases. Primary databases contain original biological data. They are archives of raw sequence or structural data submitted by the scientific community. Eg: GenBank and Protein Data Bank (PDB) Secondary databases contain computationally processed or manually curated information, based on original information from primary databases. • Translated protein sequence databases containing functional annotation belong to this category. • Eg: SWISS-Prot and Protein Information Resources (PIR)
  • 20. • Specialized databases are those that cater to a particular research interest. • For example, Flybase, HIV sequence database, and Ribosomal Database Project are databases that specialize in a particular organism or a particular type of data.
  • 21. Primary Databases • There are 3 major public sequence databases that store raw nucleic acid sequence data produced and submitted by researchers worldwide: • GenBank, • European Molecular Biology Laboratory (EMBL) • DNA Data Bank of Japan (DDBJ) • which are all freely available on the Internet. • Most of the data in the databases are contributed directly by authors with a minimal level of annotation. • These three public databases closely collaborate and exchange new data daily. They together constitute the International Nucleotide Sequence Database Collaboration. • Although the three databases all contain the same sets of raw data, each of the individual databases has a slightly different kind of format to represent the data. • For the three-dimensional structures of biological macromolecules, there is only one centralized database, the PDB. • This database archives atomic coordinates of macromolecules (both proteins and nucleic acids) determined by x-ray crystallography and NMR. It uses a flat file format to represent protein name, authors, experimental details, secondary structure, cofactors, and atomic coordinates.
  • 22. • Secondary Databases • Sequence annotation information in the primary database is often minimal. • To turn the raw sequence information into more sophisticated biological knowledge, much post processing of the sequence information is needed. • which contain computationally processed sequence information derived from the primary databases. • some are simple archives of translated sequence data from identified open reading frames in DNA, whereas others provide additional annotation and information related to higher levels of information regarding structure and functions. • A prominent example of secondary databases is SWISS-PROT, which provides detailed sequence annotation that includes structure, function, and protein family assignment. The sequence data are mainly derived from TrEMBL, a database of