WHY PROTEIN STRUCTURE?
• The function of biological macromolecules is intricately related
to their 3-D shape and structure
• Structural knowledge is therefore an important step toward
understanding function
• Structures are better conserved than sequences
• Designing site-directed mutants to test hypotheses about
function
• Identification of active/binding site
• Modeling substrate specificity
• Protein-protein docking simulations
WHY PROTEIN STRUCTURE?
• Protein Engineering
• Drug design
• Identifying structure-function relationship of
proteins
HOW TO FIND STRUCTURE?
• Experimental Procedures
– X-ray Crystallography
– NMR Spectroscopy
– Cryo-EM
• Prediction Methods
X-Ray Crystallography
• X-ray crystallography is the science of determining the arrangement of
atoms within a crystal. The crystal acts as a 3-D diffraction grating and
produces a diffraction pattern when a beam of X-rays passes through it.
The diffraction pattern encodes the distribution of electrons around the
atoms.
• By Fourier transformation of the diffraction data (measured amplitudes
combined with phase information), we can obtain the structure of the
molecule in the crystal.
• The method also produces a 3-D picture of the electron density
within the crystal, from which the mean atomic positions, their chemical
bonds, their disorder and various other information can be derived.
• A wide variety of materials can form crystals — such as salts, metals,
minerals, semiconductors, as well as various inorganic, organic and
biological molecules — which has made X-ray crystallography
fundamental to many scientific fields.
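The Fourier relationship between diffraction data and electron density can be illustrated with a toy 1-D "crystal". This is a minimal sketch, not a crystallographic pipeline: the atom positions, electron counts and the function names structure_factors and electron_density are made up for illustration. It shows that the density comes back when both amplitudes and phases are available, whereas a detector records only intensities (squared amplitudes).

```python
import cmath
import math

def structure_factors(atoms, n_points):
    """Discrete Fourier transform of a 1-D density made of point atoms.

    atoms: list of (grid_index, electron_count) pairs (hypothetical toy data).
    Returns the complex structure factors F(h) for h = 0 .. n_points - 1.
    """
    factors = []
    for h in range(n_points):
        f_h = sum(z * cmath.exp(2j * math.pi * h * x / n_points) for x, z in atoms)
        factors.append(f_h)
    return factors

def electron_density(factors):
    """Inverse transform: rebuild the density from amplitudes AND phases."""
    n = len(factors)
    density = []
    for x in range(n):
        rho = sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors))
        density.append((rho / n).real)
    return density

# Three toy "atoms" on a 16-point unit cell.
atoms = [(3, 6.0), (7, 8.0), (12, 7.0)]
factors = structure_factors(atoms, 16)

# What a detector would record: intensities only, with the phases lost.
intensities = [abs(f) ** 2 for f in factors]

# With the phases kept, the inverse transform puts the peaks back on the atoms.
density = electron_density(factors)
peaks = sorted(range(16), key=lambda x: density[x], reverse=True)[:3]
print(sorted(peaks))
```

In a real experiment the phases have to be recovered separately, which is why the inverse transform cannot be applied directly to the measured intensities.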
X-Ray Crystallography
• Slow, resource-intensive process
• Requires pure and homogeneous protein
• Requires screening of crystallization conditions
• Protein must be able to crystallize
• Protein is studied in a crystal lattice, not in free solution
• Crystal packing may distort the structure of some proteins
NMR Spectroscopy
• NMR spectroscopy is used to obtain information about the
structure and dynamics of proteins.
• Protein nuclear magnetic resonance spectroscopy (protein NMR)
techniques are continually being used and improved in
both academia and the biotech industry.
• Structure determination by NMR spectroscopy usually consists
of several phases, each using a separate set of highly
specialized techniques:
– sample preparation,
– resonance assignment,
– restraint generation, and
– structure calculation and validation.
Principles of NMR
• Measures nuclear magnetism or changes in nuclear magnetism
in a molecule
• NMR spectroscopy measures the absorption of radio-frequency
radiation due to changes in nuclear spin orientation
• NMR only occurs when a sample is in a strong magnetic field
• Different nuclei absorb at different frequencies (energies)
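The dependence of the absorption frequency on the nucleus and on the magnetic field can be made concrete with the Larmor relation, frequency = (gamma / 2 pi) x B0. The sketch below assumes rounded literature values for the gyromagnetic ratios and a 14.1 T magnet (often called a "600 MHz" instrument); the function name is hypothetical.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded values,
# assumed here for illustration). The sign for 15N is negative.
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": -4.316}

def larmor_frequency_mhz(nucleus, field_tesla):
    """Magnitude of the resonance frequency of a nucleus in a static field."""
    return abs(GAMMA_OVER_2PI_MHZ_PER_T[nucleus]) * field_tesla

field = 14.1  # tesla, an assumed example field strength
freqs = {n: larmor_frequency_mhz(n, field) for n in GAMMA_OVER_2PI_MHZ_PER_T}
for nucleus, mhz in freqs.items():
    print(f"{nucleus}: {mhz:.1f} MHz")
```

Without the field (B0 = 0) every frequency collapses to zero, which is why NMR requires a strong magnet.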
NMR
• Crystal is not required
• Protein samples are in aqueous media
• Size of protein is limited (20-30 kDa)
• Protein must be soluble at high concentrations
(30 mg/ml)
Cryo-Electron Microscopy
• Cryo-electron microscopy (cryo-EM) is a form of electron
microscopy (EM) where the sample is studied at cryogenic
temperatures (generally liquid nitrogen temperatures).
• Cryo-EM is gaining popularity in structural biology.
• A version of cryo-EM is cryo-electron tomography (CET) where
a 3D reconstruction of a sample is created from tilted 2D images,
again at cryogenic temperatures (either liquid nitrogen or helium).
Cryo-Electron Microscopy
• Frozen-hydrated samples are used
• An electron beam is used to create an image
• Proteins are composed mainly of light atoms (C, N, H, O).
These interact only weakly with the electron beam, hence image
contrast is very low
Prediction Methods
Why Attempt?
• A good guess is better than nothing!
– Enables the design of experiments
– Does not need material
– Complementary to Crystallography/NMR/Cryo-EM
– Pretty high accuracy
• Crystallography/NMR/Cryo-EM do not always work!
– Many important proteins do not crystallize
– Size limitations with NMR
– Many important proteins have atoms other than C, N, H, O
Prediction of Protein Structure
• Sequence dictates structure
• Ideally, we should be capable of structure
determination by using computer simulation
programs that mimic the process of protein
folding…
BUT
Prediction of Protein Structure
• Protein folding problem is not solved yet
• Folding occurs very rapidly with several
intermediate states which are unstable
• Structure determination methods fail to
capture these unstable states
What determines fold?
Anfinsen’s experiments in 1957
demonstrated that proteins can
fold spontaneously into their
native conformations under
physiological conditions. This
implies that primary structure
does indeed determine folding
or 3-D structure.
Other factors
• Physical properties of protein that influence
stability & therefore, determine its fold:
– Rigidity of backbone
– Amino acid interaction with water
– Interactions among amino acids
• Electrostatic interactions
• Hydrogen bonds and disulphide bonds
Why predict secondary structure?
• Prediction of secondary structure is a step towards 3-D
structure prediction (Ab-initio method)
• Can be used in threading methods to identify distantly
related proteins
• Provides information about class, architecture and
therefore can provide clues to mine further aspects of
structure and function
Secondary Structure Prediction
Methods
• Single Sequence based Procedure
– Statistical Methods (e.g. Chou-Fasman, GOR)
• Multiple Sequence based procedure
– Neural Network Approach (e.g. PHD)
Chou-Fasman Method
Biochemistry, 13:222-245, 1974
• The Chou-Fasman method is an empirical technique to predict the
secondary structures of proteins, originally developed in the
1970s.
• The method is based on analyses of the relative frequencies of each
amino acid in α-helices, β-sheets, and turns in known
protein structures solved with X-ray crystallography
– A, E, L, and M: α-helix formers
– P and G: helix breakers
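The frequency analysis behind such a table can be sketched as a propensity calculation: how often a residue is found in a given state, relative to how often any residue is found in that state. The sequences and structure strings below are invented toy data (H = helix, C = other), and the function name is hypothetical; real tables are derived from many solved structures.

```python
from collections import Counter

def propensities(sequences, structures, state):
    """Propensity of each residue for `state`:
    (fraction of that residue found in state) / (fraction of all residues in state).
    """
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    background = sum(aa_in_state.values()) / sum(aa_total.values())
    return {aa: (aa_in_state[aa] / aa_total[aa]) / background for aa in aa_total}

# Toy training data, made up for illustration only.
sequences = ["AELMAGPG", "GPAELLMA"]
structures = ["HHHHHCCC", "CCHHHHHH"]

helix = propensities(sequences, structures, "H")
for aa in sorted(helix, key=helix.get, reverse=True):
    print(aa, round(helix[aa], 2))
```

A value above 1 marks a residue that is over-represented in the state (a "former"); a value well below 1 marks a "breaker".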
…continued
• A table of predictive values (propensities) is created for α-helices,
β-sheets, and loops
• For each segment, the structure with the greatest overall predictive
value, provided it is greater than 1, is assigned
• The method is at most about 50-60% accurate in identifying
correct secondary structures
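The "average predictive value greater than 1" rule can be sketched with a sliding window. This is a deliberate simplification: the published method uses nucleation and extension rules that are omitted here, and only helix is scored. The propensity values are approximate and cover just a few residues, so treat them as illustrative assumptions; the function name and test sequence are hypothetical.

```python
# Approximate helix propensities for a handful of residues (illustrative
# assumptions, not a complete or authoritative table).
HELIX_PROPENSITY = {
    "A": 1.42, "E": 1.51, "L": 1.21, "M": 1.45,
    "K": 1.16, "S": 0.77, "G": 0.57, "P": 0.57,
}

def predict_helix(sequence, window=4, threshold=1.0):
    """Mark every residue covered by a window whose mean propensity exceeds the threshold."""
    calls = ["-"] * len(sequence)
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(HELIX_PROPENSITY[aa] for aa in segment) / window
        if mean > threshold:
            for i in range(start, start + window):
                calls[i] = "H"
    return "".join(calls)

sequence = "GPGSAELMKAGPGS"
prediction = predict_helix(sequence)
print(sequence)
print(prediction)
```

Note that the helix call spills over onto flanking weak residues wherever a window still averages above 1, one reason simple averaging schemes have limited accuracy.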
GOR Method
• The GOR method (Garnier-Osguthorpe-Robson) is an information
theory-based method for the prediction of secondary structures in
proteins, developed in the late 1970s shortly after the Chou-Fasman
method
• Like Chou-Fasman, the GOR method is based on probability
parameters derived from empirical studies of known protein
tertiary structures solved by X-ray crystallography
• However, unlike Chou-Fasman, the GOR method takes into account
not only the tendency of individual amino acids to form particular
secondary structures, but also the conditional probability of the
amino acid to form a secondary structure given that its immediate
neighbors have already formed that structure
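The idea of combining information from neighbouring residues can be sketched as follows. This is a simplified, GOR-like scheme under stated assumptions, not the published parameterisation: it scores each state by summing log(P(state | residue at offset j) / P(state)) over a small window, uses a half-window of 2 and add-one pseudocounts, and is trained on a single invented sequence. All function names are hypothetical.

```python
import math
from collections import defaultdict

STATES = "HEC"  # helix, strand, coil

def train_gor(sequences, structures, half_window=2, pseudocount=1.0):
    """Count how often each state at position i co-occurs with each residue at i + j."""
    joint = defaultdict(float)     # (state, offset, residue) -> count
    marginal = defaultdict(float)  # (offset, residue) -> count
    state_count = defaultdict(float)
    for seq, ss in zip(sequences, structures):
        for i, state in enumerate(ss):
            state_count[state] += 1
            for j in range(-half_window, half_window + 1):
                if 0 <= i + j < len(seq):
                    joint[(state, j, seq[i + j])] += 1
                    marginal[(j, seq[i + j])] += 1
    total = sum(state_count.values())
    prior = {s: (state_count[s] + pseudocount) / (total + pseudocount * len(STATES))
             for s in STATES}

    def information(state, offset, residue):
        # log of P(state | residue at offset) over the prior P(state)
        p_cond = (joint[(state, offset, residue)] + pseudocount) / (
            marginal[(offset, residue)] + pseudocount * len(STATES))
        return math.log(p_cond / prior[state])

    return information, half_window

def predict_gor(sequence, model):
    """Assign each residue the state with the highest summed window information."""
    information, half_window = model
    calls = []
    for i in range(len(sequence)):
        scores = {
            s: sum(information(s, j, sequence[i + j])
                   for j in range(-half_window, half_window + 1)
                   if 0 <= i + j < len(sequence))
            for s in STATES
        }
        calls.append(max(STATES, key=lambda s: scores[s]))
    return "".join(calls)

# Invented training example: A-rich helix, V-rich strand, G-rich coil.
model = train_gor(["AAAAAAVVVVVVGGGGGG"], ["HHHHHHEEEEEECCCCCC"])
print(predict_gor("AAAAA", model), predict_gor("VVVVV", model), predict_gor("GGGGG", model))
```

The published GOR method uses a wider window and parameters estimated from a database of solved structures; the window size and pseudocounts here were chosen only to keep the example small.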
What are neural networks?
• An artificial neural network (ANN) is a mathematical or computational
model based on biological neural networks.
• It consists of an interconnected group of artificial neurons and processes
information using a connectionist approach to computation.
• In most cases an ANN is an adaptive system that changes its structure based on
external or internal information that flows through the network during the
learning phase.
• In more practical terms neural networks are non-linear statistical data modeling
tools. They can be used to model complex relationships between inputs and
outputs or to find patterns in data.
• Parallel, distributed information processing structures which draw their ultimate
inspiration from neurons in the brain
• Main class = feed-forward network, also called multi-layer perceptron
• Paradigm for tackling pattern classification and regression tasks
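A feed-forward network of the kind described above can be sketched as two fully connected layers. The weights below are arbitrary numbers chosen for illustration (a trained network would learn them from data), and the layer sizes, activation choices and function names are assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer: weights[k] holds the input weights of unit k."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input -> hidden layer (non-linear) -> output layer (class probabilities)."""
    hidden = [sigmoid(v) for v in dense(inputs, hidden_w, hidden_b)]
    return softmax(dense(hidden, output_w, output_b))

# Arbitrary illustrative weights: 2 inputs, 2 hidden units, 3 output classes.
hidden_w = [[0.5, -1.0], [1.5, 0.25]]
hidden_b = [0.0, -0.5]
output_w = [[1.0, -1.0], [-0.5, 0.5], [0.25, 0.25]]
output_b = [0.0, 0.1, -0.1]

probs = forward([1.0, 0.0], hidden_w, hidden_b, output_w, output_b)
print([round(p, 3) for p in probs])
```

Training adjusts the weights so that the output probabilities match known labels; only the forward pass is shown here.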
…continued
• Neural network methods use training sets of solved structures to
identify common sequence motifs associated with particular
arrangements of secondary structures.
• These methods are over 70% accurate in their predictions. β-strands
are still often underpredicted, because the sequence alone lacks the 3-D
structural information needed to assess the hydrogen-bonding patterns
that promote the extended conformation required for a complete
β-sheet.
• Support vector machines have proven particularly useful for
predicting the locations of turns, which are difficult to identify with
statistical methods
• The requirement of relatively small training sets has also been cited
as an advantage to avoid over-fitting to existing structural data
Neural Network Models
• Machine learning approach
• Training sets of structures (α-helices,
non α-helices) are provided
• Computers are trained to recognize the patterns
in known secondary structures
…continued
• One of the first successful applications of neural networks to secondary
structure prediction was PHD, by Rost and Sander (1993)
• The PHD system uses a combination of multiple sequence alignment
(MSA) and neural networks (NNs)
• When a protein is input, PHD finds its homologues, uses an MSA to
determine which residues occur at every position, and feeds that
information into a series of NNs
• The design of the system was guided by the following observations:
– MSA is useful (regular secondary structures (SSs) are mostly structurally conserved)
– In predicting what is happening at a residue, it is useful to consider a
local window around it
– Helices and sheets occur in runs (you do not see αβαβ; typically you
expect to see at least 4 α-helical residues in a row to form an α-helix)
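The first two observations suggest how the network input can be built: turn each alignment column into residue frequencies, then concatenate the columns in a window around the residue of interest. The sketch below is an assumption-laden illustration, not the PHD implementation: the alignment is invented, the default half-window of 6 (13 positions) is an assumed setting, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(alignment):
    """Per-column residue frequencies for equal-length aligned sequences; gaps are skipped."""
    profile = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa in AMINO_ACIDS]
        total = len(residues) or 1  # avoid dividing by zero for all-gap columns
        profile.append([residues.count(aa) / total for aa in AMINO_ACIDS])
    return profile

def window_features(profile, centre, half_window=6):
    """Concatenate profile columns around `centre`; positions off either end are zero-padded."""
    blank = [0.0] * len(AMINO_ACIDS)
    features = []
    for pos in range(centre - half_window, centre + half_window + 1):
        features.extend(profile[pos] if 0 <= pos < len(profile) else blank)
    return features

# Invented three-sequence alignment ("-" is a gap).
alignment = ["ACDE", "ACD-", "GCDE"]
profile = column_profile(alignment)
features = window_features(profile, centre=0, half_window=1)
print(len(features))
```

Each residue then yields one fixed-length input vector, which is the form a feed-forward network expects.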
Some interesting facts
• Accuracy 55% – 85%
• Higher accuracy for α-helices than β-strands
• Accuracy is dependent on protein families
• Predictions for engineered proteins are less accurate
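Accuracy figures like these are commonly reported as the percentage of residues assigned the correct state. The sketch below assumes that per-residue definition (often called Q3 when three states are used); the predicted and observed strings are invented, and the function names are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

def per_state_accuracy(predicted, observed, state):
    """Percentage of residues observed in `state` that were predicted as `state`."""
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None
    return 100.0 * sum(predicted[i] == state for i in positions) / len(positions)

observed = "HHHHEEEECC"   # invented reference assignment
predicted = "HHHHEECCCC"  # invented prediction

print(q3_accuracy(predicted, observed))
print(per_state_accuracy(predicted, observed, "H"))
print(per_state_accuracy(predicted, observed, "E"))
```

Reporting per-state values alongside the overall figure makes differences between helix and strand performance visible.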
Ab-Initio Prediction
The assumption:
Native structure is at global energy minimum
• Predicting the 3D structure of a protein without any “prior
knowledge”
• Used when homology modeling or fold recognition has
failed (no homologues are evident)
• Equivalent to solving the “Protein Folding Problem”
Ab-Initio Prediction
The algorithm:
1. Generate all reasonable conformations by applying
force-fields
2. Score with an appropriate scoring function to find global
energy minimum
3. Choose the one with best score
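The generate, score and choose loop can be sketched on a toy problem. The example below swaps in a much simpler model than a real force field: the 2-D HP lattice model, where each residue is hydrophobic (H) or polar (P), conformations are self-avoiding walks on a square grid, and the score is -1 for every H-H contact between residues that are not chain neighbours. The model choice, sequences and function names are assumptions for illustration only.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def conformations(length):
    """Step 1: yield every self-avoiding walk of `length` residues on a square lattice."""
    def extend(path):
        if len(path) == length:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0)])

def energy(sequence, path):
    """Step 2: -1 per pair of H residues adjacent on the lattice but not in the chain."""
    index = {pos: i for i, pos in enumerate(path)}
    score = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                score -= 1
    return score

def fold(sequence):
    """Step 3: return a conformation with the lowest energy."""
    return min(conformations(len(sequence)), key=lambda p: energy(sequence, p))

sequence = "HHPPHH"
best = fold(sequence)
print(energy(sequence, best), best)
```

Even in this toy model the number of conformations grows exponentially with chain length, which illustrates why exhaustive search is not feasible for real proteins.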
Ab-initio Method
• Not always possible
• Resource intensive
• Improved, simplified procedures are needed
• Still an ongoing research problem, but
becoming less essential as databases grow