protein Modeling Abi.pptx

  1. Secondary Structure Prediction. Abida Shehezadi, Centre of Excellence in Molecular Biology, University of the Punjab, Lahore. BIOINFORMATICS
  2. WHY PROTEIN STRUCTURE? • The function of biological macromolecules is intricately related to their 3-D shape and structure • Structural knowledge is therefore an important step towards understanding function • Structures are better conserved than sequences • Designing site-directed mutants to test hypotheses about function • Identification of active/binding sites • Modeling substrate specificity • Protein-protein docking simulations
  3. WHY PROTEIN STRUCTURE? • Protein engineering • Drug design • Identifying structure-function relationships of proteins
  4. HOW TO FIND STRUCTURE? • Experimental Procedures – X-ray Crystallography – NMR Spectroscopy – Cryo-EM • Prediction Methods
  5. X-Ray Crystallography • X-ray crystallography is the science of determining the arrangement of atoms within a crystal. The crystal acts as a 3-D diffraction grating: when a beam of X-rays passes through it, the X-rays are scattered by the electrons and produce a diffraction pattern. The pattern encodes the distribution of electrons in the crystal, but only the intensities of the diffracted waves are measured; their phases must be recovered separately (the phase problem). • Once the phases are estimated, a Fourier transformation of the diffraction data yields the electron density, from which the structure of the molecule in the crystal is built. • The method thus produces a 3-D picture of the electron density within the crystal, from which the mean atomic positions, chemical bonds, disorder and various other information can be derived. • A wide variety of materials can form crystals, such as salts, metals, minerals, semiconductors, and various inorganic, organic and biological molecules, which has made X-ray crystallography fundamental to many scientific fields.
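The Fourier step, and why the phases matter, can be illustrated with a minimal one-dimensional sketch. It assumes a hypothetical density sampled on eight grid points and uses a plain discrete Fourier transform in place of real 3-D crystallographic software: with amplitudes and phases the density is recovered, while amplitudes alone (what the detector effectively records) give the wrong answer.

```python
import cmath

def structure_factors(density):
    """Discrete Fourier transform of a 1-D electron density sampled on n grid points."""
    n = len(density)
    return [sum(density[x] * cmath.exp(2j * cmath.pi * h * x / n) for x in range(n))
            for h in range(n)]

def fourier_synthesis(factors):
    """Inverse transform: rebuild the (real) density from complex structure factors."""
    n = len(factors)
    return [(sum(factors[h] * cmath.exp(-2j * cmath.pi * h * x / n) for h in range(n)) / n).real
            for x in range(n)]

# Hypothetical 1-D "unit cell" with two atoms of different electron counts.
density = [0.0, 0.2, 1.0, 0.2, 0.0, 0.0, 0.5, 0.1]
factors = structure_factors(density)

# Amplitudes AND phases known: the density comes back (up to rounding error).
recovered = fourier_synthesis(factors)

# The detector records only intensities |F|^2, i.e. amplitudes without phases.
# Setting every phase to zero gives a different, incorrect density.
amplitudes_only = [complex(abs(f), 0.0) for f in factors]
wrong = fourier_synthesis(amplitudes_only)

print("recovered:  ", [round(v, 3) for v in recovered])
print("phases lost:", [round(v, 3) for v in wrong])
```

Real structure factors are indexed by three Miller indices (h, k, l), but the one-dimensional case already shows why phase determination is a separate, essential step.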
  6. X-Ray Crystallography • Slow, resource-intensive process • Requires pure, homogeneous protein • Crystallization conditions must be screened • The protein must be able to crystallize • The protein is studied in a crystal lattice rather than in free aqueous solution • Crystal packing may distort the structure of some proteins
  7. NMR Spectroscopy • NMR spectroscopy is used to obtain information about the structure and dynamics of proteins. • Protein nuclear magnetic resonance (NMR) techniques are continually being used and improved in both academia and the biotech industry. • Structure determination by NMR spectroscopy usually consists of several phases, each using a separate set of highly specialized techniques: – sample preparation, – resonance assignment, – restraint generation, and – structure calculation and validation.
  8. Principles of NMR • Measures nuclear magnetism, or changes in nuclear magnetism, in a molecule • NMR spectroscopy measures the absorption of radio-frequency electromagnetic radiation due to changes in nuclear spin orientation • NMR only occurs when the sample is in a strong magnetic field • Different nuclei absorb at different frequencies (energies)
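The link between field strength, nucleus and absorbed frequency can be checked with a short calculation of the resonance (Larmor) frequency, f = (γ/2π)·B0, and the photon energy E = h·f. The sketch assumes rounded literature values of γ/2π for 1H, 13C and 15N and a 14.1 T magnet, the field of a so-called 600 MHz spectrometer.

```python
# Gyromagnetic ratio divided by 2*pi (gamma-bar), in MHz per tesla (rounded values).
GAMMA_BAR_MHZ_PER_T = {
    "1H": 42.577,
    "13C": 10.708,
    "15N": -4.317,  # negative gamma: these spins precess in the opposite sense
}

PLANCK_J_S = 6.62607015e-34  # Planck constant (exact in the 2019 SI)

def larmor_frequency_mhz(nucleus, field_tesla):
    """Resonance frequency f = |gamma-bar| * B0, proportional to the magnetic field strength."""
    return abs(GAMMA_BAR_MHZ_PER_T[nucleus]) * field_tesla

def photon_energy_joule(freq_mhz):
    """Energy of the absorbed radio-frequency photon, E = h * f."""
    return PLANCK_J_S * freq_mhz * 1e6

field = 14.1  # tesla; an assumed, typical high-field magnet
for nucleus in GAMMA_BAR_MHZ_PER_T:
    f = larmor_frequency_mhz(nucleus, field)
    print(f"{nucleus}: {f:.1f} MHz, E = {photon_energy_joule(f):.2e} J")
```

Doubling the field doubles every frequency, which is why NMR is only practical in strong magnets, and the different γ values are why each kind of nucleus resonates at its own frequency.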
  9. NMR • No crystal is required • Protein samples are in aqueous media • Protein size is limited (about 20-30 kDa) • The protein must be soluble at high concentration (about 30 mg/ml)
  10. Cryo-Electron Microscopy • Cryo-electron microscopy (cryo-EM) is a form of electron microscopy (EM) in which the sample is studied at cryogenic temperatures (generally liquid-nitrogen temperatures). • Cryo-EM is growing in popularity in structural biology. • One variant is cryo-electron tomography (CET), in which a 3-D reconstruction of a sample is created from tilted 2-D images, again at cryogenic temperatures (either liquid nitrogen or liquid helium).
  11. Cryo-Electron Microscopy • Frozen, hydrated samples are used • An electron beam is used to create the image • Protein components such as C, N, H and O can be studied, but these light atoms scatter electrons only weakly, so image contrast is very low
  12. Prediction Methods: Why Attempt? • A good guess is better than nothing! – Enables the design of experiments – Does not need material – Complementary to crystallography/NMR/cryo-EM – Fairly high accuracy • Crystallography/NMR/cryo-EM do not always work! – Many important proteins do not crystallize – NMR has size limitations – Many important proteins contain atoms other than C, N, H, O
  13. Prediction of Protein Structure • Sequence dictates structure • Ideally, we should be able to determine structure with computer simulation programs that mimic the process of protein folding… BUT
  14. Prediction of Protein Structure • The protein folding problem is not yet solved • Folding occurs very rapidly, through several unstable intermediate states • Experimental structure determination methods fail to capture these transient states
  15. What determines fold? Anfinsen’s experiments in 1957 demonstrated that proteins can fold spontaneously into their native conformations under physiological conditions. This implies that primary structure does indeed determine folding or 3-D structure.
  16. Other factors • Physical properties of a protein that influence its stability and therefore determine its fold: – Rigidity of the backbone – Interaction of amino acids with water – Interactions among amino acids • Electrostatic interactions • Hydrogen and disulphide bonds
  17. Structure Prediction Methods • Secondary Structure Prediction • Tertiary Structure Prediction – Ab-initio prediction – Fold recognition – Homology modeling
  18. Why predict secondary structure? • Secondary structure prediction is a step towards 3-D structure prediction (ab-initio method) • It can be used in threading methods to identify distantly related proteins • It provides information about structural class and architecture, and can therefore give clues to further aspects of structure and function
  19. Secondary Structure Prediction Methods • Single Sequence based Procedure – Statistical Methods (e.g. Chou-Fasman, GOR) • Multiple Sequence based procedure – Neural Network Approach (e.g. PHD)
  20. Chou-Fasman Method Biochemistry, 13:222-245, 1974 • The Chou-Fasman method is an empirical technique for predicting the secondary structures of proteins, originally developed in the 1970s. • It is based on analyses of the relative frequencies of each amino acid in α-helices, β-sheets and turns in known protein structures solved by X-ray crystallography – A, E, L and M: α-helix formers – P and G: helix breakers
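The formers/breakers idea can be turned into a simplified helix-nucleation scan. This sketch assumes the commonly described Chou-Fasman nucleation rule (at least four helix formers within a six-residue window) and, as a simplification, uses only the formers and breakers listed on this slide and rejects any window containing a breaker; the published method instead uses numeric propensities for all 20 amino acids.

```python
# Helix formers and breakers as listed on the slide (a simplification of the
# full Chou-Fasman propensity table, which covers all 20 amino acids).
HELIX_FORMERS = set("AELM")
HELIX_BREAKERS = set("PG")

def helix_nucleation_sites(seq, window=6, min_formers=4):
    """Return start indices of windows that could nucleate an alpha-helix.

    Simplified rule: at least `min_formers` helix formers in `window`
    consecutive residues, and no helix breakers in that window.
    """
    sites = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        formers = sum(aa in HELIX_FORMERS for aa in segment)
        has_breaker = any(aa in HELIX_BREAKERS for aa in segment)
        if formers >= min_formers and not has_breaker:
            sites.append(i)
    return sites

print(helix_nucleation_sites("GSAELMKAELGPG"))
```

In the full method a nucleus found this way is then extended in both directions until the average helix propensity drops too low.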
  21. …continued • A table of predictive values is created for α-helices, β-sheets and turns/loops • The structure with the greatest overall prediction value, provided it is greater than 1, is used to determine the structure • The method is at most about 50-60% accurate in identifying the correct secondary structures
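The decision rule described here (take the structure with the largest average value, but only if it exceeds 1) can be sketched as follows. The propensity numbers are hypothetical placeholders, not the published Chou-Fasman table, and the sketch scores one fixed segment at a time instead of performing the full nucleation-and-extension procedure.

```python
# Hypothetical propensity values for a few residues (NOT the published
# Chou-Fasman table). 1.0 means "no preference" for that structure.
PROPENSITY = {
    "helix": {"A": 1.4, "E": 1.5, "L": 1.2, "V": 1.0, "G": 0.6, "P": 0.6, "H": 1.0},
    "sheet": {"A": 0.8, "E": 0.4, "L": 1.3, "V": 1.7, "G": 0.8, "P": 0.6, "H": 0.9},
    "turn":  {"A": 0.7, "E": 0.7, "L": 0.6, "V": 0.5, "G": 1.6, "P": 1.5, "H": 0.9},
}

def average_propensity(segment, structure):
    """Mean propensity of the residues in `segment` for one structure type."""
    table = PROPENSITY[structure]
    return sum(table[aa] for aa in segment) / len(segment)

def assign_segment(segment):
    """Choose the structure with the greatest average value if it exceeds 1.0, else coil."""
    scores = {s: average_propensity(segment, s) for s in PROPENSITY}
    best = max(scores, key=scores.get)
    return best if scores[best] > 1.0 else "coil"

for seg in ["AEAL", "VLVV", "GPG", "HH"]:
    print(seg, assign_segment(seg))
```

The threshold is strict: a segment whose best average is exactly 1.0 shows no preference and is left as coil.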
  22. GOR Method • The GOR method (Garnier-Osguthorpe-Robson) is an information-theory-based method for predicting protein secondary structure, developed in the late 1970s, shortly after the Chou-Fasman method • Like Chou-Fasman, the GOR method is based on probability parameters derived from empirical studies of known protein tertiary structures solved by X-ray crystallography • However, unlike Chou-Fasman, the GOR method takes into account not only the tendency of individual amino acids to form particular secondary structures, but also the influence of their neighbors: the conformation of each residue is predicted from the amino acids in a window around it (eight residues on each side in the original method)
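A GOR-style predictor can be sketched in two functions. This is an illustrative reconstruction in the spirit of the original GOR I method, under assumptions: information values are estimated as log[P(S | residue at offset m) / P(S)] from labelled training sequences with simple pseudocounts, and each residue is assigned the state whose information, summed over a window of eight residues on each side, is largest. Function names and the data format are hypothetical.

```python
import math
from collections import Counter

STATES = ("H", "E", "C")  # helix, strand, coil
HALF_WINDOW = 8           # original GOR window: residues j-8 .. j+8

def train_gor(examples, pseudo=1.0):
    """Estimate I(S; R at offset m) = log[P(S | R at offset m) / P(S)] from labelled sequences.

    `examples` is a list of (sequence, states) string pairs of equal length.
    Pseudocounts keep the logarithm finite for unseen combinations.
    """
    pair, single, state_count = Counter(), Counter(), Counter()
    for seq, states in examples:
        for j, s in enumerate(states):
            state_count[s] += 1
            for m in range(-HALF_WINDOW, HALF_WINDOW + 1):
                k = j + m
                if 0 <= k < len(seq):
                    pair[(s, m, seq[k])] += 1
                    single[(m, seq[k])] += 1
    total = sum(state_count.values())
    info = {}
    for (m, residue), count in single.items():
        for s in STATES:
            p_s_given_r = (pair[(s, m, residue)] + pseudo) / (count + pseudo * len(STATES))
            p_s = (state_count[s] + pseudo) / (total + pseudo * len(STATES))
            info[(s, m, residue)] = math.log(p_s_given_r / p_s)
    return info

def gor_predict(seq, info):
    """Give each residue the state whose information, summed over the window, is largest."""
    prediction = []
    for j in range(len(seq)):
        totals = {}
        for s in STATES:
            totals[s] = sum(info.get((s, m, seq[j + m]), 0.0)
                            for m in range(-HALF_WINDOW, HALF_WINDOW + 1)
                            if 0 <= j + m < len(seq))
        prediction.append(max(STATES, key=totals.get))
    return "".join(prediction)
```

With a real training set of solved structures, `train_gor` would be run once and its output passed to `gor_predict` for new sequences. Summing per-position contributions treats the neighboring residues as independent, which is a simplification.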
  23. What are neural networks? • An artificial neural network (ANN) is a mathematical or computational model inspired by biological neural networks. • It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. • In most cases an ANN is an adaptive system that changes its internal parameters (connection weights) based on external or internal information that flows through the network during the learning phase. • In more practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data. • They are parallel, distributed information-processing structures that draw their ultimate inspiration from neurons in the brain • The main class is the feed-forward network, also known as the multi-layer perceptron • They are a paradigm for tackling pattern classification and regression tasks
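Stripped of the biological analogy, a feed-forward network (multi-layer perceptron) is a chain of weighted sums followed by non-linear functions. This sketch assumes a secondary-structure setting: a window of 13 residues, each one-hot encoded over the 20 amino acids, feeds five sigmoid hidden units and three softmax outputs (helix, strand, coil). Window size, layer sizes and the random weights are illustrative assumptions; an untrained network produces meaningless probabilities.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(seq, center, half_window):
    """Encode residues center-half_window .. center+half_window as one-hot vectors (zeros off the ends)."""
    features = []
    for k in range(center - half_window, center + half_window + 1):
        vec = [0.0] * len(AMINO_ACIDS)
        if 0 <= k < len(seq):
            vec[AMINO_ACIDS.index(seq[k])] = 1.0
        features.extend(vec)
    return features

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer of sigmoid units, then softmax over three outputs (H, E, C)."""
    hidden = sigmoid(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))

# Untrained, randomly initialised weights (hypothetical sizes).
rng = random.Random(0)
HALF_WINDOW, HIDDEN = 6, 5
n_in = (2 * HALF_WINDOW + 1) * len(AMINO_ACIDS)
w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(3)]
b2 = [0.0] * 3

x = one_hot_window("MKTAYIAKQRQISFVKSHFSRQ", center=10, half_window=HALF_WINDOW)
probs = mlp_forward(x, w1, b1, w2, b2)
print(dict(zip("HEC", (round(p, 3) for p in probs))))
```

Training would adjust `w1`, `b1`, `w2` and `b2` (typically by back-propagation) so that the output for each window matches the known secondary structure of its central residue.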
  24. …continued • Neural network methods use training sets of solved structures to identify common sequence motifs associated with particular arrangements of secondary structure. • These methods are over 70% accurate in their predictions, although β-strands are still often underpredicted: without 3-D structural information, the hydrogen-bonding patterns that promote the extended conformation needed for a complete β-sheet cannot be assessed. • Support vector machines (SVMs) have proven particularly useful for predicting the locations of turns, which are difficult to identify with statistical methods • The SVMs' requirement for relatively small training sets has also been cited as an advantage, helping to avoid over-fitting to existing structural data
  25. Neural Network Models • Machine learning approach • Uses training sets of known structures (e.g. α-helix vs. non-α-helix) • Computers are trained to recognize the patterns in known secondary structures
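Training on labelled α-helix and non-α-helix examples can be illustrated with the simplest neural network, a single-layer perceptron. The training windows below are hypothetical, and real predictors use far larger data sets and multi-layer networks; the sketch only shows how the weights are adjusted whenever a known example is misclassified.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode(window):
    """Concatenated one-hot vectors, one per residue in the window."""
    x = []
    for aa in window:
        vec = [0] * len(AMINO_ACIDS)
        vec[AMINO_ACIDS.index(aa)] = 1
        x.extend(vec)
    return x

def predict(weights, bias, x):
    """1 = alpha-helix, 0 = non-helix."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(examples, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights toward every misclassified example."""
    n = len(encode(examples[0][0]))
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for window, label in examples:
            x = encode(window)
            error = label - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

# Hypothetical 5-residue training windows labelled 1 = alpha-helix, 0 = non-helix.
training = [
    ("AELMA", 1), ("LEEAM", 1), ("MAALE", 1),
    ("GPSGN", 0), ("PGTGS", 0), ("NGPDG", 0),
]
weights, bias = train_perceptron(training)
print([predict(weights, bias, encode(w)) for w, _ in training])
```

A perceptron can only separate classes with a linear boundary, which is one reason practical secondary-structure predictors use hidden layers.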
  26. …continued • An early, highly successful neural-network implementation for secondary structure prediction was PHD, by Rost and Sander (1993) • PHD combines a multiple sequence alignment (MSA) with neural networks • When a protein sequence is input, PHD finds its homologues, builds an MSA, derives the residue frequencies at every position, and feeds this information into a series of neural networks • The design of the system was guided by the following observations: – An MSA is useful (regular secondary structures are mostly structurally conserved) – To predict the state of a residue, it is useful to consider a local window around it – Helices and strands occur in runs (you do not see αβαβ; typically at least 4 α-helical residues in a row are needed to form an α-helix)
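The three design observations map naturally onto code. This sketch is not PHD itself: it assumes a tiny hypothetical alignment, turns it into a per-position residue-frequency profile, cuts a local window of profile columns around a residue as network input, and replaces the later stages PHD uses to make predictions consistent along the chain with a crude hard-coded rule that removes helix runs shorter than four residues.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def msa_profile(alignment):
    """Per-column residue frequencies from an MSA (gaps '-' are ignored)."""
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) or 1  # avoid dividing by zero in all-gap columns
        profile.append([counts[aa] / total for aa in AMINO_ACIDS])
    return profile

def window_features(profile, center, half_window):
    """Concatenate the profile columns around `center`; positions off the ends are zero-padded."""
    features = []
    for k in range(center - half_window, center + half_window + 1):
        if 0 <= k < len(profile):
            features.extend(profile[k])
        else:
            features.extend([0.0] * len(AMINO_ACIDS))
    return features

def enforce_min_helix(states, min_len=4):
    """Turn helix runs shorter than `min_len` into coil ('C')."""
    result = list(states)
    i = 0
    while i < len(result):
        if result[i] == "H":
            j = i
            while j < len(result) and result[j] == "H":
                j += 1
            if j - i < min_len:
                result[i:j] = ["C"] * (j - i)
            i = j
        else:
            i += 1
    return "".join(result)

# Hypothetical three-sequence alignment of a short segment.
alignment = ["MKV-A", "MRVLA", "MKILA"]
profile = msa_profile(alignment)
x = window_features(profile, center=2, half_window=2)  # 5 columns x 20 amino acids
print(len(x), enforce_min_helix("CHHCHHHHC"))
```

Using profile columns instead of single residues lets the network see which substitutions the protein family tolerates at each position.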
  27. Some interesting facts • Accuracy: 55% – 85% • Higher accuracy for α-helices than β-strands • Accuracy depends on the protein family • Predictions for engineered proteins are less accurate
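Accuracy figures like these are usually quoted as Q3, the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly; it is an assumption here that the figures on this slide are Q3 values. The sketch computes Q3 and the per-state accuracy for a hypothetical prediction; in this toy case the helix is predicted better than the strand, mirroring the trend noted above.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

def per_state_accuracy(predicted, observed, state):
    """Percentage of residues observed in `state` that were also predicted as `state`."""
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None
    hits = sum(predicted[i] == state for i in positions)
    return 100.0 * hits / len(positions)

# Hypothetical observed (e.g. from a solved structure) and predicted states.
observed  = "CHHHHHHCCEEEEECC"
predicted = "CHHHHHCCCEECCCCC"

print(f"Q3 = {q3(predicted, observed):.1f}%")
for state in "HE":
    print(state, f"{per_state_accuracy(predicted, observed, state):.1f}%")
```

Because coil residues are usually the most common, a high overall Q3 can hide poor strand prediction, so per-state figures are worth reporting alongside it.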
  28. Tertiary Structure Prediction • Ab-initio Method • Threading or Fold recognition Methods • Homology Modeling
  29. Ab-Initio Prediction. The assumption: the native structure is at the global energy minimum • Predicting the 3-D structure of a protein without any “prior knowledge” • Used when homology modeling or fold recognition has failed (no homologues are evident) • Equivalent to solving the “Protein Folding Problem”
  30. Ab-Initio Prediction. The algorithm: 1. Generate (ideally all) physically reasonable conformations by applying force fields 2. Score them with an appropriate scoring function to locate the global energy minimum 3. Choose the conformation with the best score
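The three steps can be made concrete on a toy problem. This sketch swaps in the two-dimensional HP lattice model (hydrophobic/polar residues on a square grid) for a real force field and exhaustively enumerates every conformation of a short chain; function names and energy values are illustrative. Exhaustive enumeration grows exponentially with chain length, which is one reason real ab-initio methods are so resource intensive.

```python
# Toy 2-D HP lattice model (a stand-in for a real force field): residues are
# H (hydrophobic) or P (polar), a conformation is a self-avoiding walk on a
# square grid, and each non-bonded H-H contact contributes -1 to the energy.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def energy(sequence, coords):
    """Scoring function: -1 for every pair of H residues that touch but are not chain neighbours."""
    index = {pos: i for i, pos in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and sequence[j] == "H":
                e -= 1  # counted once, from the lower-numbered residue
    return e

def conformations(n):
    """Step 1: enumerate every self-avoiding walk of n residues.

    The first bond is fixed along +x to skip rotated copies of the same shape.
    """
    if n < 1:
        return
    def extend(path):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()
    yield from extend([(0, 0), (1, 0)] if n > 1 else [(0, 0)])

def fold(sequence):
    """Steps 2 and 3: score every conformation and keep the lowest-energy one."""
    best = None
    for coords in conformations(len(sequence)):
        e = energy(sequence, coords)
        if best is None or e < best[0]:
            best = (e, coords)
    return best

print(fold("HHPPHH"))
```

Real ab-initio programs replace exhaustive enumeration with sampling strategies (for example Monte Carlo or molecular dynamics) and the contact count with a detailed energy function, but the generate, score and select structure is the same.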
  31. Ab-initio Method • Not always possible • Resource-intensive • Improved, simplified procedures are needed • Still an ongoing research problem, but becoming less essential as databases grow