Bioinformatics is an interdisciplinary field that combines biology, computer science, and information technology. It enables the discovery of new biological insights and unifying principles in biology through the merging of these disciplines. There are three main sub-disciplines: developing algorithms and statistics for analyzing large datasets, analyzing various types of biological data like sequences and structures, and developing tools for accessing and managing information.
2. “The field of science in which biology, computer science, and information
technology merge into a single discipline. The ultimate goal of the field is to
enable the discovery of new biological insights as well as to create a global
perspective from which unifying principles in biology can be discerned. There
are three important sub-disciplines within bioinformatics: the development
of new algorithms and statistics with which to assess relationships among
members of large data sets; the analysis and interpretation of various types of
data including nucleotide and amino acid sequences, protein domains, and
protein structures; and the development and implementation of tools that
enable efficient access and management of different types of information.”
"Education" NCBI, 2003 http://www.ncbi.nlm.nih.gov/Education/index.html
M.Phil, Periyar University
3. Drug design, or rational drug design, is the process of discovering new medications based
on knowledge of a biological target.
The drug is most commonly an organic small molecule that activates or inhibits the function
of a biomolecule such as a protein, which in turn results in a therapeutic benefit to the patient.
Drug design mostly involves designing molecules that are complementary in shape and charge
to the biomolecular target with which they interact, and that will therefore bind to it. Drug design
frequently, but not necessarily, relies on computer modeling techniques.
This type of modeling is often referred to as computer-aided drug design (CADD).
Finally, drug design that relies on knowledge of the 3D structure of the biomolecular
target is known as structure-based drug design (SBDD).
In addition to small molecules, biopharmaceuticals, especially therapeutic antibodies, are an
increasingly important class of drugs, and computational methods for improving the affinity,
selectivity and stability of these protein-based therapeutics have also been developed.
4. TWO MAJOR TYPES:
1. Ligand-based drug design: relies on knowledge of other molecules
that bind to the target.
E.g., Ritonavir (an antiretroviral drug).
2. Structure-based drug design: relies on knowledge of the 3D structure
of the target molecule.
6. Quantitative Structure Activity Relationships (QSAR)
◦ Compute the functional groups present in each compound
◦ QSAR computes every possible numerical descriptor
◦ Extensive curve fitting relates these descriptors to drug activity
◦ The fitted model suggests chemical modifications for synthesis and testing
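The curve-fitting step can be sketched as an ordinary least-squares line relating one descriptor to activity. This is a minimal sketch, assuming a single descriptor (logP) and hypothetical training values; real QSAR models usually combine many descriptors with multiple regression or machine learning.

```python
# Minimal one-descriptor QSAR sketch: fit activity = slope * descriptor + intercept
# by ordinary least squares. The logP values and activities are HYPOTHETICAL data.

def fit_linear_qsar(descriptors, activities):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(descriptors)
    if n != len(activities) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptors) / n
    mean_y = sum(activities) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptors)
    if sxx == 0:
        raise ValueError("descriptor values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptors, activities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, descriptor):
    """Predict the activity of a new compound from its descriptor value."""
    return slope * descriptor + intercept


# Hypothetical training set: logP of each compound and its measured activity, log(1/C)
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear_qsar(log_p, activity)
print(f"activity = {slope:.2f} * logP + {intercept:.2f}")  # activity = 2.00 * logP + 1.00
print(predict(slope, intercept, 2.5))                      # 6.0
```

The fitted equation is then used to rank untested analogues, so that only the most promising chemical modifications are synthesized and tested.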
7. Identify disease
Isolate protein involved in disease (2-5 years)
Find a drug effective against disease protein (2-5 years)
Preclinical testing (1-3 years) using animal studies; formulation and scale-up
Human clinical trials (2-10 years)
FDA approval (2-3 years)
Drug.
Aim:
Diagnosis: determine the cause of a disease.
Cure: eliminate a disease and relieve its symptoms.
Mitigation: reduce the severity of a disease.
Treatment: provide medical care.
Prevention of disease.
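The time ranges on the pipeline slide can be added up to estimate the overall time from target identification to approval. This sketch assumes the stages run strictly one after another; in practice some overlap, so the totals are only rough bounds.

```python
# Sum the (min_years, max_years) ranges quoted on the pipeline slide.
# "Identify disease" has no quoted duration and is left out.

STAGES = [
    ("Isolate protein involved in disease", 2, 5),
    ("Find a drug effective against disease protein", 2, 5),
    ("Preclinical testing", 1, 3),
    ("Human clinical trials", 2, 10),
    ("FDA approval", 2, 3),
]


def total_duration(stages):
    """Return (min_total, max_total) years, assuming sequential stages."""
    min_total = sum(lo for _, lo, _ in stages)
    max_total = sum(hi for _, _, hi in stages)
    return min_total, max_total


low, high = total_duration(STAGES)
print(f"Estimated time to approval: {low}-{high} years")  # Estimated time to approval: 9-26 years
```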
8. [Flowchart of the drug discovery pipeline: Identify disease → Isolate protein involved in disease (2-5 years) → Find a drug effective against disease protein (2-5 years) → Preclinical testing (1-3 years) → Human clinical trials (2-10 years) → FDA approval (2-3 years), plus formulation and scale-up steps]
9. Identify disease → Isolate protein → Find drug → Preclinical testing
• GENOMICS, PROTEOMICS & BIOPHARM.: potentially producing many more targets and “personalized” targets
• HIGH THROUGHPUT SCREENING: screening up to 100,000 compounds a day for activity against a target protein
• MOLECULAR MODELING: computer graphics & models help improve activity
• VIRTUAL SCREENING: using a computer to predict activity
• COMBINATORIAL CHEMISTRY: rapidly producing vast numbers of compounds
• IN VITRO & IN SILICO ADME MODELS: tissue and computer models begin to replace animal testing
10. “Gene chips” allow us to look for changes in gene expression in different
people with a variety of conditions, and to see if the presence of drugs changes
that expression
Makes possible the design of drugs to target different phenotypes
[Figure: compounds administered to people with different conditions (e.g. obese, cancer, Caucasian) → expression profile (screen for 35,000 genes)]
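Whether a drug changes expression is commonly summarized as a log2 fold change between treated and untreated samples. This is a minimal sketch with hypothetical gene names and intensities; real microarray analysis also normalizes the data and tests significance across replicates.

```python
import math

# HYPOTHETICAL expression levels (arbitrary intensity units) without and with a drug.
control = {"GENE_A": 100.0, "GENE_B": 100.0, "GENE_C": 80.0}
treated = {"GENE_A": 400.0, "GENE_B": 50.0, "GENE_C": 80.0}


def log2_fold_changes(before, after):
    """Return {gene: log2(after / before)} for genes with positive values in both."""
    return {
        gene: math.log2(after[gene] / before[gene])
        for gene in before
        if gene in after and before[gene] > 0 and after[gene] > 0
    }


def changed_genes(fold_changes, threshold=1.0):
    """Genes changed at least 2**threshold-fold (up or down); threshold is an assumption."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= threshold)


lfc = log2_fold_changes(control, treated)
print(lfc)                 # {'GENE_A': 2.0, 'GENE_B': -1.0, 'GENE_C': 0.0}
print(changed_genes(lfc))  # ['GENE_A', 'GENE_B']
```

A positive value means the drug raised expression (GENE_A, 4-fold up); a negative value means it lowered it (GENE_B, 2-fold down).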
11. Screening perhaps millions of compounds in a corporate collection to
see if any show activity against a certain disease protein
12. Drug companies now have millions of samples of chemical compounds
High-throughput screening can test 100,000 compounds a day for activity
against a protein target
Maybe tens of thousands of these compounds will show some activity for
the protein
The chemist needs to intelligently select the 2-3 classes of compounds
that show the most promise as drugs to follow up
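Narrowing screening hits down to a few promising compound classes can be sketched as a threshold plus a ranking. The data, the 50% inhibition hit threshold and the ranking rule (number of hits, then mean activity) are all hypothetical; real triage also weighs drug-likeness, novelty and ease of synthesis.

```python
from collections import defaultdict

# HYPOTHETICAL screening results: (compound_id, chemical_class, percent_inhibition)
RESULTS = [
    ("C001", "sulfonamide", 85.0),
    ("C002", "sulfonamide", 78.0),
    ("C003", "quinoline", 92.0),
    ("C004", "quinoline", 88.0),
    ("C005", "quinoline", 15.0),
    ("C006", "benzimidazole", 60.0),
    ("C007", "benzimidazole", 55.0),
    ("C008", "indole", 10.0),
]


def top_classes(results, hit_threshold=50.0, n_classes=3):
    """Rank classes by number of hits, then by mean activity of those hits."""
    hits_by_class = defaultdict(list)
    for _, chem_class, activity in results:
        if activity >= hit_threshold:
            hits_by_class[chem_class].append(activity)
    ranked = sorted(
        hits_by_class.items(),
        key=lambda item: (len(item[1]), sum(item[1]) / len(item[1])),
        reverse=True,
    )
    return [chem_class for chem_class, _ in ranked[:n_classes]]


print(top_classes(RESULTS))  # ['quinoline', 'sulfonamide', 'benzimidazole']
```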
13. Machine Learning Methods
◦ E.g. Neural nets, Bayesian nets, SVMs, Kohonen nets
◦ Train with compounds of known activity
◦ Predict activity of “unknown” compounds
Scoring methods
◦ Profile compounds based on properties related to target
Fast Docking
◦ Rapidly “dock” 3D representations of molecules into 3D representations of proteins,
and score according to how well they bind
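The "train with known compounds, predict unknown ones" idea can be sketched with k-nearest neighbours over Tanimoto similarity of binary fingerprints. Note that k-NN is a simpler stand-in, not one of the methods listed on the slide, and the fingerprints (sets of "on" bit positions) and labels are hypothetical.

```python
# k-nearest-neighbour activity prediction using Tanimoto similarity.
# Fingerprints are sets of "on" bit positions; all data are HYPOTHETICAL.

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def predict_active(query_fp, training_set, k=3):
    """Majority vote of the k most similar training compounds (True = active)."""
    neighbours = sorted(
        training_set, key=lambda item: tanimoto(query_fp, item[0]), reverse=True
    )[:k]
    votes = sum(1 for _, is_active in neighbours if is_active)
    return votes * 2 > len(neighbours)


# Training compounds with known activity: (fingerprint, is_active)
TRAINING = [
    ({1, 2, 3, 4}, True),
    ({1, 2, 3, 5}, True),
    ({1, 2, 4, 6}, True),
    ({7, 8, 9}, False),
    ({7, 8, 10}, False),
    ({8, 9, 10, 11}, False),
]

print(predict_active({1, 2, 3, 6}, TRAINING))  # True
print(predict_active({7, 9, 10}, TRAINING))    # False
```

In practice fingerprints would come from a cheminformatics toolkit, and the trained model would be applied to millions of untested compounds before any are bought or synthesized.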
14. • 3D Visualization of interactions between compounds and proteins
• “Docking” compounds into proteins computationally
15. X-ray crystallography and NMR spectroscopy can reveal the 3D structure
of a protein and its bound compounds
Visualization of these “complexes” of proteins and potential drugs can
help scientists understand the drug's mechanism of action and
improve the design of a drug
Visualization uses computational “ball and stick” models of atoms and
bonds, as well as surfaces
Stereoscopic visualization is also available
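One simple computation behind such interaction views is listing protein-ligand atom pairs that lie within a distance cutoff. This is a minimal sketch, assuming hypothetical atom labels and coordinates and a 4.0 Å cutoff (a common but arbitrary choice).

```python
import math

# HYPOTHETICAL atoms of a protein-ligand complex: (label, x, y, z) in angstroms.
PROTEIN_ATOMS = [
    ("RES10 OD1", 0.0, 0.0, 0.0),
    ("RES12 N", 3.0, 4.0, 3.5),
    ("RES40 CD1", 0.0, 10.0, 0.0),
]
LIGAND_ATOMS = [
    ("LIG O1", 0.0, 0.0, 3.0),
    ("LIG C5", 3.0, 4.0, 0.0),
]


def contacts(protein, ligand, cutoff=4.0):
    """Return (protein_atom, ligand_atom, distance) for pairs within cutoff angstroms."""
    pairs = []
    for p_label, px, py, pz in protein:
        for l_label, lx, ly, lz in ligand:
            d = math.dist((px, py, pz), (lx, ly, lz))  # Euclidean distance (Python 3.8+)
            if d <= cutoff:
                pairs.append((p_label, l_label, round(d, 2)))
    return pairs


for p, l, d in contacts(PROTEIN_ATOMS, LIGAND_ATOMS):
    print(f"{p:10s} - {l:7s} {d:.2f} A")
```

A viewer would typically highlight these pairs as dashed lines on the ball-and-stick model.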
16. Traditionally, animals were used for pre-human testing. However,
animal tests are expensive, time-consuming and ethically undesirable
ADME (Absorption, Distribution, Metabolism, Excretion) techniques
help model how the drug is likely to act in the body
These methods can be experimental (in vitro), using cellular tissue, or
in silico, using computational models
17. Computational methods can predict compound properties important to
ADME, e.g.
◦ LogP, a lipophilicity measure
◦ Solubility
◦ Permeability
◦ Cytochrome P450 metabolism
This means estimates can be made for millions of compounds, helping
reduce “attrition” – the failure rate of compounds in late-stage development
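A widely used computational filter built from properties such as logP is Lipinski's rule of five. It is not named on the slide and is swapped in here as an illustration; the compound names and property values below are hypothetical, and in practice the properties themselves would be calculated by software.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float     # daltons
    log_p: float          # octanol-water partition coefficient (lipophilicity)
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c):
    """Count violations of Lipinski's rule of five."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def likely_orally_bioavailable(c, max_violations=1):
    """Common reading of the rule: more than one violation suggests poor absorption."""
    return lipinski_violations(c) <= max_violations


# HYPOTHETICAL candidates with made-up property values
candidates = [
    Compound("cmpd-1", 350.4, 2.1, 2, 5),
    Compound("cmpd-2", 720.9, 6.3, 4, 12),
]
for c in candidates:
    print(c.name, lipinski_violations(c), likely_orally_bioavailable(c))
# cmpd-1 0 True
# cmpd-2 3 False
```

Cheap filters like this can be run over millions of structures, so compounds likely to fail on absorption are dropped before costly late-stage work.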
18. Millions of entries in databases
◦ CAS: 23 million
◦ GenBank: 5 million
Total number of drugs worldwide: 60,000
Fewer than 500 characterized molecular targets
Potential targets: 5,000-10,000
19. • SWISS-PROT: Annotated Sequence Database
• TrEMBL: Database of translated EMBL nucleotide sequences
• InterPro: Integrated resource for protein families, domains
and functional sites.
• CluSTr: Offers an automatic classification of SWISS-PROT
and TrEMBL.
• IPI: A non-redundant human proteome set constructed from
SWISS-PROT, TrEMBL, Ensembl and RefSeq.
• GOA: Provides assignments of gene products to the Gene
Ontology (GO) resource.
• Proteome Analysis: Statistical and comparative analysis of
the predicted proteomes of fully sequenced organisms
• Protein Profiles: Tables of SWISS-PROT and TrEMBL entries
and alignments for the protein families of the Protein Profile.
• IntEnz: The Integrated relational Enzyme database (IntEnz) will
contain enzyme data approved by the Nomenclature Committee of the International
Union of Biochemistry and Molecular Biology (NC-IUBMB).
Reference site: www.ebi.ac.uk/Databases/protein.html
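SWISS-PROT entries are distributed as flat files in which each line starts with a two-letter code (ID, AC, DE, SQ, ...) and each entry ends with "//". The sketch below reads one hypothetical entry to show that layout; it handles only a few line types, and for real work a maintained parser such as Biopython's `Bio.SwissProt` module would be the safer choice.

```python
# HYPOTHETICAL entry in SWISS-PROT-style flat-file layout (not a real protein).
RECORD = """\
ID   HYP1_HUMAN              Reviewed;          12 AA.
AC   HYP001;
DE   RecName: Full=Hypothetical example protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QL
//
"""


def parse_entry(text):
    """Extract entry name, accessions, description and sequence from one entry."""
    entry = {"id": None, "accessions": [], "description": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end of entry
            break
        if in_sequence:                    # sequence lines follow the SQ header
            entry["sequence"] += line.replace(" ", "")
            continue
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            entry["id"] = value.split()[0]
        elif code == "AC":
            entry["accessions"].extend(a.strip() for a in value.split(";") if a.strip())
        elif code == "DE":
            entry["description"].append(value)
        elif code == "SQ":
            in_sequence = True
    entry["description"] = " ".join(entry["description"])
    return entry


entry = parse_entry(RECORD)
print(entry["id"], entry["accessions"], len(entry["sequence"]))  # HYP1_HUMAN ['HYP001'] 12
```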
20. • MSD: The Macromolecular Structure Database:
A relational database representation of a cleaned-up Protein Data Bank (PDB)
• 3DSeq: 3D sequence alignment server, providing annotation of the
alignments between sequence databases and the PDB
• FSSP: Based on exhaustive all-against-all 3D structure comparison of
protein structures currently in the Protein Data Bank (PDB)
• DALI: Fold Classification based on Structure-Structure
Assignments
• 3Dee: Database of protein domain definitions in which the domains have
been clustered on sequence and structural similarity
• NDB: Nucleic Acid Structure Database
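Structure databases such as MSD are built on PDB entries, whose coordinates are stored as fixed-column ATOM/HETATM text records. The sketch below slices those columns for a few hypothetical records; the column positions follow the PDB format description, and for real files a library such as Biopython's `Bio.PDB` would be preferable.

```python
# HYPOTHETICAL PDB-format records; coordinates are made up for illustration.
PDB_TEXT = """\
ATOM      1  N   MET A   1      10.000   5.000  -2.500  1.00 20.00           N
ATOM      2  CA  MET A   1      11.458   5.000  -2.500  1.00 20.00           C
HETATM    3  O   HOH A 101      12.000   7.000  -1.000  1.00 30.00           O
END
"""


def parse_atoms(pdb_text, records=("ATOM", "HETATM")):
    """Read fixed-column coordinate records (0-based slices of the 1-based PDB columns)."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()            # columns 1-6
        if record not in records:
            continue
        atoms.append({
            "record": record,
            "serial": int(line[6:11]),        # columns 7-11
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]),       # columns 31-38
                    float(line[38:46]),       # columns 39-46
                    float(line[46:54])),      # columns 47-54
        })
    return atoms


for atom in parse_atoms(PDB_TEXT):
    print(atom["record"], atom["name"], atom["res_name"], atom["xyz"])
```

Passing `records=("ATOM",)` keeps only the macromolecule atoms and drops bound ligands and water.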