Central University of Bihar
BIS 553: Protein Modelling and Simulation
De novo structure prediction
Submitted to: Dr. Durg Vijay Singh
Submitted by: Shweta Kumari
Roll no. 21
2nd semester
Central University of Bihar
Patna
CONTENT
Sl. No. Topic
1 Introduction
2 Need of Ab initio prediction
3 Challenges
4 Principle of ab initio method
5 De novo Structure Prediction vs. Template-Based Structure Prediction
6 Successful De Novo Modeling Requirements
7 Results from ab initio
8 Domain prediction
9 Advantages of This Method
10 Complexity of ab initio methods
11 Ab initio methods have recently received increased attention in the prediction of loops
12 Protein folding and de novo protein design for biotechnological applications
13 Limitations of De novo Prediction Methods
14 CASPs
15 Application of De novo structure prediction
16 List of de novo protein structure prediction software
17 References
Introduction:
Predicting the 3D structure without any “prior
knowledge”
•Predicting protein 3D structures from the amino acid sequence remains
an unsolved problem after five decades of effort. If the target protein has a
homologue whose structure is already solved, the task is relatively easy and
high-resolution models can be built by copying the framework of the solved structure.
•However, such a modelling procedure does not help answer the question of
how and why a protein adopts its specific structure. If structural homologues
(occasionally analogues) do not exist, or exist but cannot be identified, models
have to be constructed from scratch. This procedure is called ab initio
modelling.
•Ab initio modelling is essential for a complete solution to the protein
structure prediction problem; it can also help us understand the
physicochemical principle of how proteins fold in nature.
Thus, “in computational biology, de novo protein structure prediction refers to an
algorithmic process by which protein tertiary structure is predicted from its
amino acid primary sequence.”
Need of Ab initio prediction:
• First, in some cases, even a remotely related structural homologue may not be
available.
• Second, new structures continue to be discovered which could not have been
identified by methods that rely on comparison to known structures.
• Third, knowledge-based methods have been criticized for predicting
protein structures without having to obtain a fundamental understanding of the
mechanisms and driving forces of structure formation. Ab initio methods, in
contrast, base their predictions on physical models for these mechanisms.
Challenges:
• Energy functions that can reliably discriminate native and non-native
structures.
• Enormous amount of computations.
Principle of ab initio method:
•It is based on the ‘thermodynamic hypothesis’, which states that the native
structure of a protein is the one for which the free energy achieves the global
minimum.
•Anfinsen (1973) showed that all the information necessary for a protein to fold
to the native state resides in the protein sequence.
•In the absence of large kinetic barriers in the free energy landscape, Anfinsen's
result and those of many researchers in the intervening years suggest that
the native conformations of most proteins are the lowest free energy conformations
for their sequences.
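The thermodynamic hypothesis can be illustrated with a minimal sketch. The three conformations and their free energies below are invented for illustration and do not come from any real protein; the point is only that the lowest free energy state dominates the equilibrium population and is therefore the predicted native state.

```python
import math

# Hypothetical free energies (kcal/mol) for three toy conformations.
free_energy = {"native": -10.0, "misfolded": -8.0, "unfolded": 0.0}

GAS_CONSTANT = 0.0019872  # kcal/(mol*K)

def boltzmann_populations(energies, temperature=298.0):
    """Return the equilibrium fraction of each conformation."""
    rt = GAS_CONSTANT * temperature
    lowest = min(energies.values())
    # Shift by the lowest energy before exponentiating for numerical stability.
    weights = {name: math.exp(-(g - lowest) / rt) for name, g in energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

populations = boltzmann_populations(free_energy)
# The predicted native state is the global free energy minimum.
predicted_native = min(free_energy, key=free_energy.get)
```

With these assumed numbers, a 2 kcal/mol gap is already enough for the lowest state to hold most of the population at room temperature.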
De novo Structure Prediction vs. Template-Based
Structure Prediction:
•De novo protein structure modeling is distinguished from Template-based
modeling (TBM) by the fact that no solved homolog to the protein of interest is
known, making efforts to predict protein structure from amino acid sequence
exceedingly difficult.
Sl. No. | Homology modelling | Fold recognition | De novo prediction
1 | Template-based modelling | Template-based remote homology modelling | Template-free modelling
2 | Applicable to sequences having >= 30% homology to entries in the PDB | Applicable to sequences having < 20% homology to entries in the PDB | Applicable to any sequence that does not have a homologue in the PDB
3 | Length of sequence is not limited | - | Not applicable to sequences greater than 150 AA
4 | Limited search space | Search space greater than in homology modelling | Very large search space
5 | More accurate structure | Generates less accurate structure than homology modelling | Generates least accurate structure
6 | Model quality at atomic level | Model quality at fold level | Model quality at atomic level
7 | Computationally less expensive | Computationally more expensive than homology modelling | Computationally most expensive
8 | Applicable in drug designing, virtual screening, designing site-directed mutagenesis, characterization of active site | Applicable in prediction of protein family, functional characterization by fold assignment | Applicable in genome annotation, domain prediction and structural genomics initiatives
Results from ab initio:
•Average error 5 Å - 10 Å
•Function cannot be predicted
•Long simulations
Fig: Some protein from E. coli predicted at 7.6 Å (CASP3, H. Scheraga)
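An error quoted in Å is commonly a root-mean-square deviation (RMSD) between predicted and experimental atom positions; the slides do not say which measure was used, so RMSD is an assumption here. The sketch below also assumes the two structures are already superimposed, and the coordinates are made up.

```python
import math

def rmsd(predicted, experimental):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2
        for (px, py, pz), (ex, ey, ez) in zip(predicted, experimental)
    )
    return math.sqrt(squared / len(predicted))

# Toy coordinates in Å: every predicted atom is displaced by the vector (0, 3, 4).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 3.0, 4.0), (3.8, 3.0, 4.0), (7.6, 3.0, 4.0)]
error = rmsd(predicted, experimental)  # each atom is off by 5 Å, so RMSD is 5 Å
```

In practice the structures are optimally superimposed before the deviation is measured, which this sketch leaves out.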
Domain prediction:
•Domain prediction is a critical prerequisite to structure prediction: “As the size of
the protein increases, its conformational space also increases.”
•Current de novo methods are limited to protein domains of 150 residues for
alpha-beta proteins, 80 residues for beta folds, and 150 residues for alpha-only folds.
•To overcome this, two approaches can be applied:
1. Increase the size range of de novo structure prediction.
2. Divide the protein into domains prior to attempting structure prediction.
•“A domain is generally defined as a portion of a protein that folds independently of the
rest of the protein.”
•So dividing a query sequence into its smallest component domains prior to folding
is a straightforward way to increase the size range of prediction.
•For many proteins, domain divisions can be found easily, while several domains
remain beyond our ability to separate correctly.
•The determination of domains, their family membership, and their boundaries for
multidomain proteins is a vital step in structure annotation and prediction.
•In brief, most protein domain parsing methods rely on hierarchically searching for
domains in the query sequence with a collection of primary sequence methods, domain
library searches, and matches to structural domains in the PDB.
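The second approach (dividing the protein into domains first) can be sketched as follows. The size limits are the ones quoted in the text; the query sequence, the boundary position, and the function names are hypothetical, and the boundary is assumed to come from a separate domain prediction step.

```python
# Size limits quoted in the text (residues per domain), keyed by fold class.
SIZE_LIMIT = {"alpha-beta": 150, "beta": 80, "alpha": 150}

def split_into_domains(sequence, boundaries):
    """Cut a sequence at the given 0-based boundary positions."""
    cuts = [0] + sorted(boundaries) + [len(sequence)]
    return [sequence[start:end] for start, end in zip(cuts, cuts[1:])]

def within_limit(domain, fold_class):
    """True if the domain is short enough for de novo prediction."""
    return len(domain) <= SIZE_LIMIT[fold_class]

# Hypothetical 260-residue query with one predicted boundary at position 120.
query = "A" * 120 + "G" * 140
domains = split_into_domains(query, boundaries=[120])
feasible = [within_limit(d, "alpha-beta") for d in domains]
```

Here the full 260-residue chain exceeds the limit, while each of the two domains falls within it, which is why division is done before folding.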
Advantages of This Method:
•The method is fully automated, and the methodology is the same regardless of the
existing homology between the query protein and the proteins in the structural
database. Thus, it can be easily applied to the structural annotation on a genomic
scale.
•A large success rate, which is competitive with other methods (a large fraction of
correct and accurate predictions), could be expected for the following types of
proteins.
The most advanced ab initio method is fragment assembly:
•It consists of breaking up the sequence into small subsegments of 3 to 9 residues and
generating structures for these segments based on a large library of known fragments.
•Decoys are generated from all possible combinations of fragments.
•An energy minimization process is applied to all decoys.
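The three steps above can be sketched on a toy scale. Everything below is an illustrative assumption: the fragment library, the use of secondary-structure letters in place of 3D coordinates, and the scoring function are invented, and picking the lowest-scoring decoy stands in for a real energy minimization.

```python
import itertools

FRAGMENT_LENGTH = 3  # the text gives 3 to 9 residues; 3 keeps the toy small

# Hypothetical fragment library: candidate local conformations per 3-residue
# window, written as states (H = helix, E = strand, C = coil).
LIBRARY = {
    "ALA": ["HHH", "CCC"],
    "VIV": ["EEE", "CCC"],
    "GPG": ["CCC", "HHH"],
}

def split_sequence(sequence, length=FRAGMENT_LENGTH):
    """Break the sequence into consecutive, non-overlapping windows."""
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]

def toy_energy(decoy):
    """Invented score: reward helix/strand states, penalise state changes."""
    structured = sum(1 for state in decoy if state in "HE")
    switches = sum(1 for a, b in zip(decoy, decoy[1:]) if a != b)
    return -1.0 * structured + 0.5 * switches

def build_decoys(sequence):
    """Generate one decoy per combination of library fragments."""
    choices = [LIBRARY[window] for window in split_sequence(sequence)]
    return ["".join(combo) for combo in itertools.product(*choices)]

decoys = build_decoys("ALAVIVGPG")
best = min(decoys, key=toy_energy)  # lowest-scoring decoy is the prediction
```

Enumerating every combination is only possible for a toy this small; the number of decoys grows exponentially with the number of windows, so practical methods sample combinations instead.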
Ab initio methods have recently received increased
attention in the prediction of loops:
•Loops exhibit greater structural variability than beta sheets and alpha helices.
•Loop structure is therefore considerably more difficult to predict than the structure
of the geometrically highly regular beta sheets and alpha helices.
•Loops are often exposed to the surface of proteins and contribute to active and binding sites.
Consequently, loops are crucial for protein function.
Protein folding and de novo protein design for
biotechnological applications:
This section outlines advances and challenges in the fields of protein structure prediction and de novo
protein design, focusing on the interplay necessary for success. The figure schematically shows
the roadmap and key challenges in protein structure prediction and de novo protein
design. The past few years have shown impressive applications of computational
structure prediction and design to biotechnology, spanning peptide or antibody
therapeutics, novel biocatalysts, and self-assembling nanomaterials.
Fig: Roadmap of key challenges in understanding how to predict protein sequence to structure to
function and design. Structure prediction begins with a primary amino acid sequence
Table. Summary of recent successful computational de novo designed and
redesigned systems and their biotechnological applications
source: http://www.sciencedirect.com/science/article/pii/S0167779913002266#
Limitations of De novo Prediction Methods:
•Pure ab initio modelling is still very costly and ineffective, but hybrid
homology/ab initio methods such as fragment assembly perform better.
•A major limitation of de novo protein prediction methods is the extraordinary
amount of computer time required to successfully solve for the native conformation of
a protein.
•Distributed methods, such as Rosetta@home, have attempted to ameliorate this by
recruiting individuals who then volunteer idle home computer time in order to
process data.
•Even these methods face challenges, however. For example, a distributed method
was utilized by a team of researchers at the University of Washington and the
Howard Hughes Medical Institute to predict the tertiary structure of the protein
T0283 from its amino acid sequence. In a blind test comparing the accuracy of this
distributed technique with the experimentally confirmed structure deposited within
the Protein Data Bank (PDB), the predictor produced excellent agreement with the
deposited structure.
•However, the time and number of computers required for this feat were enormous:
almost two years and approximately 70,000 home computers, respectively.
“One method proposed to overcome such limitations involves the use of Markov
models (see Markov chain Monte Carlo). One possibility is that such models could
be constructed in order to assist with free energy computation and protein
structure prediction, perhaps by refining computational simulations”
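As a rough illustration of Markov chain Monte Carlo, the sketch below runs a Metropolis sampler, one common MCMC algorithm, on an invented one-dimensional double-well energy. The text does not say which Markov model is meant, so the algorithm, the energy function, and all parameter values are assumptions chosen only to show how random moves can cross a barrier and find the lower-energy well.

```python
import math
import random

def energy(x):
    """Toy double-well landscape; the global minimum lies near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def metropolis(steps=20000, temperature=0.3, step_size=0.5, seed=0):
    """Sample the landscape and return the lowest-energy point visited."""
    rng = random.Random(seed)
    x = 1.0  # start in the shallower (higher-energy) well
    best_x, best_e = x, energy(x)
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        delta = energy(candidate) - energy(x)
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
        if energy(x) < best_e:
            best_x, best_e = x, energy(x)
    return best_x, best_e

best_x, best_e = metropolis()
```

Accepting some uphill moves is what lets the walker escape the starting well; a purely downhill search started at x = 1.0 would stay trapped there.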
CASPs:
•Progress for all variants of computational protein structure prediction methods is
assessed in the biennial, community-wide Critical Assessment of Protein Structure
Prediction (CASP) experiments.
•To assess the current status of protein structure prediction, John Moult proposed the
CASP (Critical Assessment of Techniques for Protein Structure Prediction)
community-wide protein structure prediction experiment.
•The idea is that experimentalists who are about to determine protein structures make
the sequences of the proteins available and then the protein structure prediction
community makes predictions that are then assessed by independent reviewers.
•Attendees tested recently developed ab initio protein structure prediction methods
during the CASP3 exercises, conducted in December 1998 in Asilomar, California.
•Among the best performing ab initio methods was the Rosetta method developed by
David Baker and coworkers.
Application of De novo structure prediction:
Genome functional annotation and structural genomics initiatives are two areas of
research where ab initio protein structure prediction could make important
contributions.
1. Genome annotation:
a. The annotation of open reading frames lacking detectable sequence homology to proteins of
known function represents a promising application for ab initio modelling.
Low-resolution ab initio predicted structures can reveal structural and functional relationships
between proteins that are not apparent from sequence similarity alone.
Note: This concept is well illustrated by some examples of predictions from CASP4.
b. Ab initio structures could be probed for the presence of residues adopting conserved geometric
motifs (e.g. serine protease catalytic triads).
2. Structural genomics initiatives:
a. Ab initio structure prediction can help guide target selection by focusing experimental structure
determination on those proteins likely to adopt novel folds or to be of particular biological
importance.
b. Ab initio techniques do not face the limitation that homology modelling meets when applied on a
genomic scale (the need for at least one homologue of known structure with good coverage).
Thus, they may be a valuable adjunct to homology methods, filling in structural gaps and
producing a much more complete set of models.
List of de novo protein structure prediction software:
Name | Method | Description | Link
EVfold | Evolutionary couplings calculated from correlated mutations in a protein family, used to predict 3D structure from sequences alone and to predict functional residues from coupling strengths. Predicts both globular and transmembrane proteins. | Webserver | http://evfold.org/evfold-web/evfold.do
QUARK | Monte Carlo fragment assembly | On-line server for protein modeling (best for ab initio folding in CASP9) | http://zhanglab.ccmb.med.umich.edu/QUARK/
NovaFold | Combination of threading and ab initio folding | Commercial protein structure prediction application | http://www.dnastar.com/t-products-NovaFold.aspx
I-TASSER | Threading fragment structure reassembly | On-line server for protein modeling | http://zhanglab.ccmb.med.umich.edu/I-TASSER/
Selvita Protein Modeling Platform | Package of tools for protein modeling | Interactive webserver and standalone program including: CABS ab initio modeling | http://www.selvita.com/selvita-protein-modeling-platform.html
ROBETTA | Rosetta homology modeling and ab initio fragment assembly with Ginzu domain prediction | Webserver | http://www.robetta.org/
Rosetta@home | Distributed-computing implementation of Rosetta algorithm | Downloadable program | http://boinc.bakerlab.org/rosetta/
CABS | Reduced modeling tool | Downloadable program | -
CABS-FOLD | Server for de novo modeling, can also use alternative templates (consensus modeling). | Webserver | http://biocomp.chem.uw.edu.pl/CABSfold/
Bhageerath | A computational protocol for modeling and predicting protein structures at the atomic level. | Webserver | http://www.scfbio-iitd.res.in/bhageerath/index.jsp
Abalone | Molecular Dynamics folding | Program | -
PEP-FOLD | De novo approach, based on a HMM structural alphabet | On-line server for peptide structure prediction | http://bioserv.rpbs.univ-paris-diderot.fr/services/PEP-FOLD/