2. Bioinformatics is the discipline that uses computers
to store, retrieve, manipulate and distribute
information related to biological macromolecules
such as RNA, DNA and proteins.
Computational biology encompasses all areas of
biology that involve computation.
Goal
Better understand a living cell and how it functions
at a molecular level
Definition
3. Biologists
collect molecular data:
DNA & Protein sequences,
gene expression, etc.
Computer scientists
(+Mathematicians, Statisticians, etc.)
Develop tools, software and algorithms
to store and analyze the data.
Bioinformaticians
Study biological questions by
analyzing molecular data
The field of science in which biology, computer science and
information technology merge into a single discipline
4. 1. Development of computational tools and databases
•Software for sequence analysis
•Sequence alignment, sequence database searching,
motif and pattern discovery, gene and promoter finding,
reconstruction of evolutionary relationships, genome
assembly and comparison
•Software for structural analysis
•Protein and nucleic acid structural analysis,
comparison, classification and prediction
•Construction and curation of biological databases
2. Generate biological knowledge to better understand
living systems
•Knowledge-based drug design
•Reduces time and cost to develop drugs
•Agricultural biotechnology
•Plant genome databases
•New crop varieties
Two Major Fields
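As a small illustration of the sequence-analysis software listed above, the sketch below scans a DNA string for a motif with a regular expression. The sequence, the pattern `TATA[AT]A` and the helper name `find_motif` are all hypothetical and chosen only for illustration; real motif-discovery tools use statistical models rather than a single fixed pattern.

```python
import re

def find_motif(sequence, pattern):
    """Return (start_position, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, sequence)]

# Hypothetical DNA fragment and motif pattern, for illustration only.
sequence = "GGCTATAAAGCGTATATAC"
pattern = "TATA[AT]A"  # T-A-T-A, then A or T, then A

hits = find_motif(sequence, pattern)
for start, text in hits:
    print(f"motif {text} found at position {start}")
```

With this input the pattern matches twice, at 0-based positions 3 and 12.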
5. • Relying completely on the information is
dangerous if it is inaccurate
• Quality of bioinformatics predictions
depends on
quality of the data and
sophistication of the algorithms
• Data (e.g. sequence, expression) may
contain errors
Limitations of bioinformatics
6. •A database is a computerized archive used to store and organize
data so that information can be retrieved by a variety of search
criteria
•A database can be thought of as a stack of record cards, where each
record card contains defined items of information, say Name,
Address, Phone Number, Birth Date, etc.
•In a database, each such card is an entry, and each defined
item of information is a field
•Each field of each entry contains a value (can be NULL)
•Search all entries, retrieve entries that contain a specific value in a
field
•This process is called making a query
•Biological databases often have higher level requirements.
What is a database?
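The record-card picture above can be sketched in a few lines of Python. This is a minimal illustration, not a real database engine: the entries, the field names and the helper `query` are all made up for the example, and `None` stands in for a NULL value.

```python
# Each dictionary is one entry (one record card); each key is a field.
# All names and values below are invented for illustration.
entries = [
    {"name": "Ada", "address": "12 Elm St", "phone": "555-0101", "birth_date": "1990-04-12"},
    {"name": "Ben", "address": "7 Oak Ave", "phone": None, "birth_date": "1985-11-30"},
    {"name": "Cy", "address": "12 Elm St", "phone": "555-0199", "birth_date": None},
]

def query(entries, field, value):
    """Search all entries and return those holding `value` in `field`."""
    return [entry for entry in entries if entry.get(field) == value]

# Making a query: which entries have this address?
names = [entry["name"] for entry in query(entries, "address", "12 Elm St")]

# A field may be empty (NULL); here that is modeled as None.
missing_phone = [entry["name"] for entry in query(entries, "phone", None)]

print(names)          # entries sharing the address
print(missing_phone)  # entries with no phone number recorded
```

Here `names` is `["Ada", "Cy"]` and `missing_phone` is `["Ben"]`. A production database would add indexing so that a query does not have to scan every entry.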
7. Biological databases
•Primary databases
•Raw sequence or structural data
•GenBank
•PDB
•Secondary databases
•Computationally processed or curated database
•SWISS-PROT
•PIR (Protein Information Resource)
•Specialized databases
•For specific interest groups
•FlyBase
•SGD
8. Primary Databases
•Databases consisting of experimentally derived data,
such as nucleotide sequences and three-dimensional
structures, are known as primary databases.
•Three major databases
1. GenBank (http://www.ncbi.nlm.nih.gov/Genbank/)
2. EMBL (European Molecular Biology Laboratory)
3. DDBJ (DNA Data Bank of Japan)
9. •Data derived from the analysis or treatment of primary
data, such as secondary structures and plots, are
stored in secondary databases.
•Three major databases
•SWISS-PROT
•TrEMBL
•UniProt
•Carefully curated database
•High quality
•SWISS-PROT, TrEMBL and PIR combined in UniProt
•BLOCKS – motifs and patterns
Secondary databases
10. •Often focused on a specific aspect of an
organism
•Curated by experts
•Highly annotated and processed data
•Three major databases
• SGD (Saccharomyces Genome Database)
• FlyBase (Insect Database, Drosophilidae)
• WormBase (Nematoda Database)
Specialized Databases
11. • Human Genome Project: started 1986 (formally 1990), completed April 2003
• Coordinated by the U.S. Department of Energy (DoE) and the National Institutes of
Health (NIH)
Goals:
■identify all the genes in human DNA,
■determine all the sequences of chemical base pairs
that make up human DNA
■store this information in databases,
■improve tools for data analysis,
■transfer related technologies to the private sector,
and
■address the ethical, legal, and social issues (ELSI)
that may arise from the project.
12. • Basic Level
• Organization of the collected data
• Maintenance: correction and update
• Very sophisticated databases are needed
• Second Level
• Development of tools and resources
• For analysis and interpretation of data
• More challenging task
• More important and interesting to
biologists
• One important task is searching for
similarity
Bioinformatics Application Levels
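To make the idea of searching for similarity concrete, here is a toy sketch that ranks database sequences by the fraction of short subsequences (k-mers) they share with a query. The sequences, the entry names and the functions `kmers` and `similarity` are hypothetical; this is only one simple similarity measure, and production search tools rely on alignment-based scoring that is far more sensitive.

```python
def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two k-mer sets: shared / total distinct."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)

# Hypothetical query and database entries, for illustration only.
query_seq = "ATGCGTAC"
database = {
    "unrelated": "TTTTTTTT",
    "partial": "ATGCGTTT",
    "exact": "ATGCGTAC",
}

# Rank database entries from most to least similar to the query.
scores = {name: similarity(query_seq, seq) for name, seq in database.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

For this data the ranking is `exact` (1.00), `partial` (0.50), `unrelated` (0.00).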
13. • Third Level
• Modeling and simulating different biological modules
• Use system-level analysis and interpretation
• Investigate the origin of life and the rules of evolution
• Use the acquired knowledge for treating and
curing disease and addressing aging
Bioinformatics Application Levels