3. Bioinformatics is the field of science in which
biology and computer science merge to solve
biological problems
Application of computational tools on molecular
data
4.
Biologists
collect molecular data:
DNA & Protein sequences,
gene expression, etc.
Computer scientists
(+ mathematicians, statisticians, etc.)
develop tools, software, and algorithms to
store and analyze the data.
Bioinformaticians
Study biological questions by analyzing
molecular data
6. Bioinformatics makes it possible to analyze large
quantities of complex biological data and can be used to:
-search biological databases
-compare sequences
-estimate molecular structures
8. Gray wolf
Jack Russell
Toy Poodle
Labradoodle
Coyote
English Shepherd
Cocker Spaniel
Red fox
Image Source: Wikipedia Commons
9. Genetic Data
Obtain samples: blood, saliva, hair follicles, feathers, scales.
Extract DNA from cells.
Sequence DNA.
…TTCACCAACAGGCCCACA…
Compare DNA sequences to one another.
TTCAACAACAGGCCCAC
TTCACCAACAGGCCCAC
TTCATCAACAGGCCCAC
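The comparison step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the three fragments shown above are already aligned and of equal length; the sample labels and the helper names `hamming` and `variable_sites` are hypothetical and do not come from the slides.

```python
from itertools import combinations

# The three aligned fragments shown on the slide (labels are hypothetical).
samples = {
    "sample_1": "TTCAACAACAGGCCCAC",
    "sample_2": "TTCACCAACAGGCCCAC",
    "sample_3": "TTCATCAACAGGCCCAC",
}

def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))

def variable_sites(seqs):
    """Return 0-based column indices where the sequences are not all identical."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

# Pairwise differences between every pair of samples.
for (name_a, seq_a), (name_b, seq_b) in combinations(samples.items(), 2):
    print(name_a, name_b, hamming(seq_a, seq_b))

print("variable sites:", variable_sites(list(samples.values())))
```

Counting mismatches like this only works for pre-aligned sequences of the same length; sequences with insertions or deletions need a proper alignment first.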
14. 4 FEB 2016 By ANSHIKA BANSAL
IBM 7090 computer
In the 1960s: the birth of bioinformatics
Margaret Oakley Dayhoff created:
The first protein database
The first program for sequence assembly
There is a need for computers and algorithms that allow:
Access, processing, storing, sharing, retrieving, visualizing, annotating…
15. [Timeline figure, axis from 1955 to 1985. Events shown: Watson and
Crick DNA model; Sanger sequences insulin protein; Sanger dideoxy DNA
sequencing; PCR (Polymerase Chain Reaction); ARPANET (early Internet);
PDB (Protein Data Bank); sequence alignment; GenBank database;
Dayhoff's Atlas.]
17. Biological data is growing exponentially.
Data (genomic sequences, 3D structures, 2D gel
analysis, microarrays…) are no longer published
in a conventional manner, but directly submitted
to databases.
Databases are essential tools for biological research: the only
way to publish massive amounts of data without
using all the paper in the world.
26. Literature Databases:
Bookshelf: A collection of searchable biomedical books linked to
PubMed.
PubMed: Allows searching by author name and journal title, and
offers a Preview/Index option. The PubMed database provides access
to over 12 million MEDLINE citations back to the mid-1960s. It
includes History and Clipboard options, which may enhance your
search session.
PubMed Central: The U.S. National Library of Medicine digital
archive of life science journal literature.
OMIM: Online Mendelian Inheritance in Man is a database of
human genes and genetic disorders (see also OMIA, Online Mendelian
Inheritance in Animals).
27. DNA (nucleotide sequences) databases
They are big databases, and searching either one should produce
similar results because they exchange information routinely.
-GenBank (NCBI): http://www.ncbi.nlm.nih.gov
-DDBJ (DNA Data Bank of Japan):
http://www.ddbj.nig.ac.jp
Specialized databases: tissues, species…
-Yeast: http://yeastgenome.org
-ESTs (Expressed Sequence Tags)
~at NCBI http://www.ncbi.nlm.nih.gov/dbEST
~at TIGR http://tigr.org/tdb/tgi
- ...many more!
28. Protein (amino acid) databases
They are big databases too:
-Swiss-Prot (very high level of annotation)
http://au.expasy.org/
-PIR (Protein Information Resource): described as the world's most
comprehensive catalog of information on proteins
http://www.pir.uniprot.org/
Translated databases:
-TrEMBL (translated EMBL): includes entries that have
not yet been annotated for inclusion in Swiss-Prot.
http://www.ebi.ac.uk/trembl/access.html
-GenPept (translation of coding regions in GenBank)
-pdb (sequences derived from the 3D structures in the
Brookhaven PDB) http://www.rcsb.org/pdb/
29. BLAST - Basic Local Alignment Search Tool
CLUSTALW - multiple sequence alignment tool
GRAIL - Gene Recognition and Analysis Internet Link
Primer3 - primer design tool
31. 1. Literature Retrieval
2. Biological Application
Nucleotide Applications
-Information Retrieval
-Sequence Analysis
-Sequence Translation
Protein Applications
-Information Retrieval
-Protein Analysis
-Structure Analysis
32. 1. Literature Retrieval
Use PubMed to search journals and other literature on any
biological or chemical item of interest. Full articles are
not provided in this database; only citations and
abstracts are available to view.
2. Biological Application
Once biological data is stored consistently and is easily
available to the scientific community, the requirement
is then to provide methods for extracting the
meaningful information from the mass of data.
33. 3. Nucleotide Applications
Information Retrieval
There are numerous databases around the world containing
information useful for computational biologists. The main
providers are the National Center for Biotechnology Information
(NCBI), the European Bioinformatics Institute (EBI), and the
DNA Data Bank of Japan (DDBJ).
34. Sequence Retrieval
Find the nucleotide sequence for a gene of interest,
e.g. the Mtb rpoB gene:
>gi|448814763:759807-763325 Mycobacterium tuberculosis H37Rv, complete genome
TTGGCAGATTCCCGCCAGAGCAAAACAGCCGCTAGTCCTAG
TCCGAGTCGCCCGCAAAGTTCCTCGAATA
ACTCCGTACCCGGAGCGCCAAACCGGGTCTCCTTCGCTAAG
CTGCGCGAACCACTTGA…………………………………..
Sequence Identification
Find the function and possible origin of a gene from its sequence.
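A retrieved record in FASTA format, like the one above, can be read with a few lines of Python. The sketch below is an illustration only: `parse_fasta` is a hypothetical helper, and the example string contains just the first two sequence lines visible on the slide, not the complete gene.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Only the first two sequence lines from the slide; the real gene is longer.
example = """>gi|448814763:759807-763325 Mycobacterium tuberculosis H37Rv, complete genome
TTGGCAGATTCCCGCCAGAGCAAAACAGCCGCTAGTCCTAG
TCCGAGTCGCCCGCAAAGTTCCTCGAATA
"""

for header, seq in parse_fasta(example):
    # Print the identifier (first word of the header) and the sequence length.
    print(header.split()[0], len(seq))
```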
2.2 Sequence Analysis
With these applications we can align two sequences, align multiple
sequences, and perform phylogenetic analyses. One reason to
do this is to determine which parts of the sequences are conserved
from one species to the next. Another is to see how
much an organism has diverged from other organisms simply by
comparing their DNA sequences.
35. Sequence Translation
Computational biologists need to analyze their nucleotide
sequences, and one of the best ways to do that is to study the
protein product. Translation programs convert a DNA
sequence into an amino acid (protein) sequence; backtranslation
programs take a protein and convert it into a possible coding
DNA sequence. These protein and DNA sequences can then be
analyzed using other applications.
Translation – Converts nucleotide sequences into protein
sequences.
Backtranslation – Converts protein sequences into nucleotide
sequences or complementary DNA (cDNA).
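Translation amounts to a codon-table lookup, which can be sketched as follows. This is a minimal illustration assuming the standard genetic code and reading frame 1 only; the `translate` helper is a hypothetical name, and the example input is the first 21 bases of the rpoB fragment quoted earlier in the slides.

```python
from itertools import product

# Standard genetic code, laid out in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon,
    'X' marks a codon containing an unrecognized base."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )

print(translate("TTGGCAGATTCCCGCCAGAGC"))
```

A plain table lookup reads the leading TTG as leucine. Backtranslation cannot be inverted this simply, because most amino acids are encoded by more than one codon, so a protein corresponds to many possible DNA sequences.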
36. Protein Applications:
Information Retrieval
The numerous information retrieval sites on the Internet can
give very valuable information concerning the sequence and
properties of a protein. Numerous databases exist and each
database is accessible through convenient search programs. This
section will introduce useful sites that provide database search
capabilities.
Protein Sequence Retrieval – Allows the user to retrieve a sequence
from a protein name, accession number, or GI identification
number.
Protein Identification – Allows the user to retrieve a protein name
or accession and GI numbers from a polypeptide sequence.
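Retrieval by accession number can also be scripted. The sketch below only builds a request URL and does not contact any server. It assumes NCBI's E-utilities `efetch` endpoint, which is not named on the slides; the accession used is just an example value, and `protein_fasta_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities "efetch" (not mentioned on the slides).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def protein_fasta_url(accession):
    """Build an efetch URL requesting one protein record in FASTA format."""
    params = {
        "db": "protein",      # search the protein database
        "id": accession,      # accession number of the record
        "rettype": "fasta",   # ask for FASTA output
        "retmode": "text",    # as plain text
    }
    return EFETCH_URL + "?" + urlencode(params)

# Example accession chosen only for illustration.
url = protein_fasta_url("NP_000509.1")
print(url)
```

The resulting URL could then be opened with `urllib.request.urlopen`, subject to the service's usage guidelines.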
37. Protein Analysis
After obtaining the identity or sequence of a protein, several
valuable tools allow further analysis of the
protein. Information about the characteristic properties
of a protein can be obtained from its
sequence. Other valuable tools are sequence alignment
applications, which establish the degree of similarity between
two or more proteins.
Determining Protein Sequence Properties – The user can find
molecular weight (MW), isoelectric point (pI), titration curves,
hydrophobicity, etc. for a particular protein.
Protein Sequence Alignment – Align a single sequence to
sequences in a database.
Pair-wise Sequence Alignment – Align two protein sequences to
each other.
Multiple Sequence Alignment – Align many sequences against a…
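As an illustration of deriving properties directly from a sequence, the sketch below computes amino acid composition and average hydrophobicity. It assumes the Kyte-Doolittle hydropathy scale, which the slides do not name; the helper names and the example peptide are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids
# (an assumed choice of scale; other scales exist).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def composition(protein):
    """Count how often each amino acid occurs in the sequence."""
    return dict(Counter(protein.upper()))

def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.
    Positive values suggest a more hydrophobic protein."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)

peptide = "MKTAYIAKQR"  # hypothetical peptide, for illustration only
print(composition(peptide))
print(round(gravy(peptide), 3))
```

Molecular weight and isoelectric point can be estimated in the same spirit, by summing per-residue masses or modelling per-residue charges, but they require additional reference tables.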
38. Structure Analysis
Several programs have been created that give scientists the
ability to look at the three-dimensional shape of proteins and
nucleic acids. Examining a protein in 3D allows for greater
understanding of protein functions, as well as providing
students with a visual understanding that cannot always be
conveyed through still photographs or descriptions.
RasMol: a 3D structure viewer originally developed by Roger Sayle.
To use this program, it must first be downloaded onto your computer.
Cn3D: another 3D structure viewer.