FBW
19-11-2013

Wim Van Criekinge
The reason for “bioinformatics” to exist ?

• empirical finding: if two biological
sequences are sufficiently similar, almost
invariably they have similar biological
functions and will be descended from a
common ancestor.
• (i) function is encoded into
sequence, this means: the sequence
provides the syntax and
• (ii) there is a redundancy in the
encoding, many positions in the
sequence may be changed without
perceptible changes in the function, thus
the semantics of the encoding is robust.
Protein Structure

Introduction
Why ?
How do proteins fold ?
Levels of protein structure
0,1,2,3,4
X-ray / NMR
The Protein Database (PDB)
Protein Modeling
Bioinformatics & Proteomics
Weblems
Why protein structure ?

• Proteins perform a variety of tasks
in living cells
• Each protein adopts a particular fold
that determines its function
• The 3D structure of a protein can bring
into close proximity residues that are far
apart in the amino acid sequence
• Catalytic site: Business End of the
molecule
Rationale for understanding protein structure and function
Protein sequence
-large numbers of
sequences, including
whole genomes

?
Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

structure determination
structure prediction

Protein structure
- three dimensional
- complicated
- mediates function

homology
rational mutagenesis
biochemical analysis
model studies
About the use of protein models (Peitsch)

• Structure is preserved under evolution when
sequence is not
– Interpreting the impact of mutations/SNPs and conserved
residues on protein function. Potential link to disease
• Function ?
– Biochemical: the chemical interactions occurring in a protein
– Biological: role within the cell
– Phenotypic: the role in the organism

• Gene Ontology functional classification !

– Prioritisation of residues to mutate to determine protein
function
– Providing hints for protein function: catalytic mechanisms
of enzymes often require key residues to be close
together in 3D space
– (protein-ligand complexes, rational drug design, putative
interaction interfaces)
MIS-SENSE MUTATION
e.g. Sickle Cell Anaemia
Cause: defective haemoglobin due to mutation in β-globin gene
Symptoms: severe anaemia and death in homozygote
Normal β-globin - 146 amino acids

Position:         1     2     3     4     5     6     7
Normal β-globin:  val - his - leu - thr - pro - glu - glu - ...
Mutant β-globin:  val - his - leu - thr - pro - val - glu - ...

Codon 6       DNA   mRNA   Product
Normal gene   CTC   GAG    Glu
Mutant gene   CAC   GUG    Val
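The codon change can be traced in a few lines of Python. This is a minimal sketch, assuming the DNA codons on the slide (CTC, CAC) are the template strand written 3'→5', so that base-pairing gives the mRNA codon directly. The function name `transcribe_template` and the two-entry codon table are illustrative only.

```python
# Only the two codons needed for this example (not the full genetic code).
CODON_TABLE = {"GAG": "Glu", "GUG": "Val"}

# Base pairing from a DNA template strand to mRNA.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_codon: str) -> str:
    """Return the mRNA codon for a template-strand codon written 3'->5' (assumption)."""
    return "".join(COMPLEMENT[base] for base in template_codon.upper())


normal_mrna = transcribe_template("CTC")   # normal gene, codon 6
mutant_mrna = transcribe_template("CAC")   # sickle-cell allele, codon 6

print("normal:", normal_mrna, CODON_TABLE[normal_mrna])
print("mutant:", mutant_mrna, CODON_TABLE[mutant_mrna])
```

A single base change in the DNA (T to A in the template strand) is enough to swap a charged glutamate for a hydrophobic valine at position 6.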
Protein Conformation

• Christian Anfinsen
Studies on reversible denaturation
“Sequence specifies conformation”
• Chaperones and disulfide
interchange enzymes:
involved but not controlling final state, they
provide environment to refold if misfolded

• Structure implies function: The amino
acid sequence encodes the protein’s
structural information
How does a protein fold ?

• by itself:
– Anfinsen had developed what he called his
"thermodynamic hypothesis" of protein folding to explain
the native conformation of amino acid structures. He
theorized that the native or natural conformation occurs
because this particular shape is thermodynamically the
most stable in the intracellular environment. That is, it
takes this shape as a result of the constraints of the
peptide bonds as modified by the other chemical and
physical properties of the amino acids.
– To test this hypothesis, Anfinsen unfolded the RNase
enzyme under extreme chemical conditions and observed
that the enzyme's amino acid structure refolded
spontaneously back into its original form when he returned
the chemical environment to natural cellular conditions.
– "The native conformation is determined by the totality of
interatomic interactions and hence by the amino acid
sequence, in a given environment."
The Basics

• Proteins are linear heteropolymers: one or more
polypeptide chains
• Below about 40 residues the term peptide is frequently
used.
• A certain number of residues is necessary to perform a
particular biochemical function, and around 40-50
residues appears to be the lower limit for a functional
domain size.
• Protein sizes range from this lower limit to several
hundred residues in multi-functional proteins.
• Three-dimensional shapes (folds) adopted vary
enormously
• Experimental methods:
– X-ray crystallography
– NMR (nuclear magnetic resonance)
– Electron microscopy
– Ab initio calculations …
Levels of protein structure

• Zeroth: amino acid composition
(proteomics, %cysteine, %glycine)
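The zeroth level needs nothing more than residue counting. A minimal sketch, assuming one-letter amino acid codes; the function name `composition` and the example peptide are made up for illustration.

```python
from collections import Counter


def composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue (one-letter code) in the sequence."""
    seq = sequence.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


example = "MCGGACKLCG"  # hypothetical 10-residue peptide
comp = composition(example)
print(f"%Cys = {comp.get('C', 0.0):.1f}, %Gly = {comp.get('G', 0.0):.1f}")
```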
Amino Acid Residues

The basic structure of an α-amino acid is quite simple. R denotes any one of the
20 possible side chains (see table below). We notice that the Cα atom has 4
different substituents (the H is omitted in the drawing) and is thus chiral. An easy
trick to remember the correct L-form is the CORN rule: when the Cα atom is
viewed with the H in front, the substituents read "CO-R-N" in a clockwise
direction.
Levels of protein structure

• Primary: This is simply the order of
covalent linkages along the
polypeptide chain, i.e. the sequence
itself
Backbone Torsion Angles
Levels of protein structure

• Secondary
– Local organization of the protein backbone: alpha-helix, beta-strand (which assemble into beta-sheets), turn and interconnecting loop.
Ramachandran / Phi-Psi Plot
The alpha-helix
A Practical Approach: Interpretation

• Residues with hydrophobic properties
conserved at i, i+2, i+4 separated by
unconserved or hydrophilic residues
suggest surface beta-strands.
• A short run of hydrophobic amino acids
(4 residues) suggests a buried beta-strand.
• Pairs of conserved hydrophobic amino
acids separated by pairs of
unconserved or hydrophilic residues
suggest an alpha-helix with one face
packing in the protein core.
• Likewise, an i, i+3, i+4, i+7 pattern of
conserved hydrophobic residues suggests an alpha-helix.
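These periodicity rules can be sketched as a simple scan. The slide talks about conserved positions in an alignment; the sketch below is a simplification that looks at a single sequence, and the hydrophobic set `AVLIMFWC` is an assumption (other definitions exist). All names are illustrative.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic set, not taken from the slides


def matches(seq, start, hydrophobic_offsets, polar_offsets=()):
    """True if every hydrophobic offset is hydrophobic and every polar offset is not."""
    last = start + max(hydrophobic_offsets + tuple(polar_offsets))
    if last >= len(seq):
        return False
    return (all(seq[start + k] in HYDROPHOBIC for k in hydrophobic_offsets)
            and all(seq[start + k] not in HYDROPHOBIC for k in polar_offsets))


def scan(seq):
    """Return start indices (0-based) of helix-like and surface-strand-like patterns."""
    seq = seq.upper()
    # i, i+3, i+4, i+7 hydrophobic: one face of an alpha-helix
    helix = [i for i in range(len(seq)) if matches(seq, i, (0, 3, 4, 7))]
    # i, i+2, i+4 hydrophobic with hydrophilic residues in between: surface beta-strand
    strand = [i for i in range(len(seq)) if matches(seq, i, (0, 2, 4), (1, 3))]
    return helix, strand


print(scan("LKKLLKKL"))
print(scan("VKVEV"))
```

A hit is only a hint: the patterns say where one face of a helix or strand could pack against the core, not that the element is actually there.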
Beta-sheets
Topologies of Beta-sheets
Secondary structure prediction ?
Secondary structure prediction:CHOU-FASMAN

• Chou, P.Y. and Fasman, G.D. (1974).
Conformational parameters for amino acids in helical, sheet, and random coil regions calculated from proteins.
Biochemistry 13, 211-221.
• Chou, P.Y. and Fasman, G.D. (1974).
Prediction of protein conformation.
Biochemistry 13, 222-245.
Secondary structure prediction:CHOU-FASMAN

• Method
• Assigning a set of prediction values to a
residue, based on statistical analysis of 15
proteins
• Applying a simple algorithm to those
numbers
Secondary structure prediction:CHOU-FASMAN

Calculation of preference parameters
For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn):
• P = log(observed counts / expected counts) + 1.0
• Preference parameter > 1.0 → specific residue has a
preference for the specific secondary structure.
• Preference parameter = 1.0 → specific residue neither
prefers nor disfavours the specific secondary
structure.
• Preference parameter < 1.0 → specific residue disfavours the
specific secondary structure.
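The formula can be tried on made-up counts. This is a sketch that follows the expression as written on the slide; the slide does not state the base of the logarithm, so the natural logarithm is an assumption here, and the way the expected count is derived (residue total times the overall fraction of residues in that structure) is also an assumption. The counts are hypothetical.

```python
import math


def expected_count(residue_total: int, structure_fraction: float) -> float:
    """Expected count if the residue had no preference (assumed definition)."""
    return residue_total * structure_fraction


def preference(observed: float, expected: float) -> float:
    """P = log(observed / expected) + 1.0, natural log assumed."""
    return math.log(observed / expected) + 1.0


# Hypothetical data set: a residue seen 200 times, 30% of all residues are helical.
expected = expected_count(200, 0.30)   # 60 expected in helices
print(preference(90, expected))        # seen more often than expected: P > 1.0
print(preference(60, expected))        # exactly as expected: P = 1.0
print(preference(30, expected))        # seen less often than expected: P < 1.0
```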
Secondary structure prediction:CHOU-FASMAN

Preference parameters
Residue  P(a)  P(b)  P(t)  f(i)   f(i+1)  f(i+2)  f(i+3)
Ala      1.45  0.97  0.57  0.049  0.049   0.034   0.029
Arg      0.79  0.90  1.00  0.051  0.127   0.025   0.101
Asn      0.73  0.65  1.68  0.101  0.086   0.216   0.065
Asp      0.98  0.80  1.26  0.137  0.088   0.069   0.059
Cys      0.77  1.30  1.17  0.089  0.022   0.111   0.089
Gln      1.17  1.23  0.56  0.050  0.089   0.030   0.089
Glu      1.53  0.26  0.44  0.011  0.032   0.053   0.021
Gly      0.53  0.81  1.68  0.104  0.090   0.158   0.113
His      1.24  0.71  0.69  0.083  0.050   0.033   0.033
Ile      1.00  1.60  0.58  0.068  0.034   0.017   0.051
Leu      1.34  1.22  0.53  0.038  0.019   0.032   0.051
Lys      1.07  0.74  1.01  0.060  0.080   0.067   0.073
Met      1.20  1.67  0.67  0.070  0.070   0.036   0.070
Phe      1.12  1.28  0.71  0.031  0.047   0.063   0.063
Pro      0.59  0.62  1.54  0.074  0.272   0.012   0.062
Ser      0.79  0.72  1.56  0.100  0.095   0.095   0.104
Thr      0.82  1.20  1.00  0.062  0.093   0.056   0.068
Trp      1.14  1.19  1.11  0.045  0.000   0.045   0.205
Tyr      0.61  1.29  1.25  0.136  0.025   0.110   0.102
Val      1.14  1.65  0.30  0.023  0.029   0.011   0.029
Secondary structure prediction:CHOU-FASMAN

Applying algorithm
(Cut-offs are given on the scale of the table, where 1.00 corresponds to 100 on the ×100 scale.)
1. Assign parameters to residue.
2. Identify regions where 4 out of 6 residues have P(a) > 1.00: α-helix. Extend
helix in both directions until four contiguous residues have an average
P(a) < 1.00: end of α-helix. If segment is longer than 5 residues and P(a) > P(b):
α-helix.
3. Repeat this procedure to locate all of the helical regions.
4. Identify regions where 3 out of 5 residues have P(b) > 1.00: β-sheet. Extend
sheet in both directions until four contiguous residues have an average
P(b) < 1.00: end of β-sheet. If P(b) > 1.05 and P(b) > P(a): β-sheet.
5. Rest: P(a) > P(b) → α-helix. P(b) > P(a) → β-sheet.
6. To identify a bend at residue number i, calculate the following value:
p(t) = f(i) × f(i+1) × f(i+2) × f(i+3)
If: (1) p(t) > 0.000075; (2) average P(t) > 1.00 in the tetrapeptide; and (3)
averages for tetrapeptide obey P(a) < P(t) > P(b): β-turn.
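Two of these steps, helix nucleation and the bend test, can be sketched in Python. The numbers are copied from the preference-parameter table and keyed by one-letter code; cut-offs use the table's scale, so a threshold of 100 becomes 1.00. The function names are illustrative, and helix extension, sheet detection and overlap resolution are deliberately left out, so this is not a complete Chou-Fasman implementation.

```python
# (P(a), P(b), P(t), f(i), f(i+1), f(i+2), f(i+3)) per residue, from the table.
PARAMS = {
    "A": (1.45, 0.97, 0.57, 0.049, 0.049, 0.034, 0.029),
    "R": (0.79, 0.90, 1.00, 0.051, 0.127, 0.025, 0.101),
    "N": (0.73, 0.65, 1.68, 0.101, 0.086, 0.216, 0.065),
    "D": (0.98, 0.80, 1.26, 0.137, 0.088, 0.069, 0.059),
    "C": (0.77, 1.30, 1.17, 0.089, 0.022, 0.111, 0.089),
    "Q": (1.17, 1.23, 0.56, 0.050, 0.089, 0.030, 0.089),
    "E": (1.53, 0.26, 0.44, 0.011, 0.032, 0.053, 0.021),
    "G": (0.53, 0.81, 1.68, 0.104, 0.090, 0.158, 0.113),
    "H": (1.24, 0.71, 0.69, 0.083, 0.050, 0.033, 0.033),
    "I": (1.00, 1.60, 0.58, 0.068, 0.034, 0.017, 0.051),
    "L": (1.34, 1.22, 0.53, 0.038, 0.019, 0.032, 0.051),
    "K": (1.07, 0.74, 1.01, 0.060, 0.080, 0.067, 0.073),
    "M": (1.20, 1.67, 0.67, 0.070, 0.070, 0.036, 0.070),
    "F": (1.12, 1.28, 0.71, 0.031, 0.047, 0.063, 0.063),
    "P": (0.59, 0.62, 1.54, 0.074, 0.272, 0.012, 0.062),
    "S": (0.79, 0.72, 1.56, 0.100, 0.095, 0.095, 0.104),
    "T": (0.82, 1.20, 1.00, 0.062, 0.093, 0.056, 0.068),
    "W": (1.14, 1.19, 1.11, 0.045, 0.000, 0.045, 0.205),
    "Y": (0.61, 1.29, 1.25, 0.136, 0.025, 0.110, 0.102),
    "V": (1.14, 1.65, 0.30, 0.023, 0.029, 0.011, 0.029),
}


def helix_nuclei(seq, window=6, needed=4, cutoff=1.00):
    """Start indices of windows where `needed` of `window` residues have P(a) > cutoff."""
    return [i for i in range(len(seq) - window + 1)
            if sum(PARAMS[aa][0] > cutoff for aa in seq[i:i + window]) >= needed]


def is_turn(seq, i):
    """Apply the three bend conditions to the tetrapeptide starting at index i."""
    tetra = seq[i:i + 4]
    if len(tetra) < 4:
        return False
    p_t = 1.0
    for k, aa in enumerate(tetra):
        p_t *= PARAMS[aa][3 + k]  # f(i), f(i+1), f(i+2), f(i+3) in turn
    # Tetrapeptide averages of P(a), P(b), P(t)
    avg_a, avg_b, avg_t = (sum(PARAMS[aa][c] for aa in tetra) / 4 for c in range(3))
    return p_t > 0.000075 and avg_t > 1.00 and avg_a < avg_t > avg_b


print(helix_nuclei("EEAALMGPGP"))
print(is_turn("NPGS", 0), is_turn("EEEE", 0))
```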
Secondary structure prediction:CHOU-FASMAN

Successful method?
19 proteins evaluated:
• Successful in locating 88% of helical and 95% of
β regions
• Correctly predicting 80% of helical and 86% of sheet residues
• Accuracy of predicting the three conformational
states for all residues (helix, β, and coil) is 77%
Chou & Fasman: successful method
After 1974: improvement of preference parameters
Sander-Schneider: Evolution of overall structure

• Naturally occurring sequences with more than
about 25% sequence identity over 80 or more
residues almost always adopt the same basic
structure (Sander and Schneider 1991)
Sander-Schneider

• HSSP: homology-derived secondary structure of proteins
Structural Family Databases

• SCOP:
– Structural Classification of
Proteins

• FSSP:
– Family of Structurally Similar
Proteins

• CATH:
– Class, Architecture, Topology,
Homology
Levels of protein structure

• Tertiary
– Packing of secondary structure
elements into a compact spatial unit
– Fold or domain – this is the level to
which structure prediction is currently possible
Domains
Protein Architecture
Domains

• Protein dissection into domains
• Conserved Domain Architecture
Retrieval Tool (CDART) uses
information in Pfam and SMART to
assign domains along a sequence
• (automatic when blasting)
Domains

• From the analysis of alignments of protein
families
• Conserved sequence features, usually
associated with a specific function
• PROSITE database of protein
“signatures” (large number of false positives &
false negatives)
• From alignment of homologous sequences
(PRINTS/PRODOM)
• From Hidden Markov Models (PFAM)
• Meta approach: INTERPRO
Protein Architecture
Levels of protein structure: Topology
Hydrophobicity Plot
P53_HUMAN (P04637) human cellular tumor antigen p53
Kyte-Doolittle hydropathy, window=19
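A plot like this one comes from a sliding-window average over the Kyte-Doolittle hydropathy scale. A minimal sketch with the window of 19 used on the slide; the function name `hydropathy_profile` is illustrative, and each average is reported at the 0-based index of the window centre.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982), one-letter codes.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Return (centre index, mean hydropathy) for every full window along seq."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    half = window // 2
    return [(i + half, sum(scores[i:i + window]) / window)
            for i in range(len(seq) - window + 1)]


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
profile = hydropathy_profile("KKDE" + "LIVALLAVILLGAVLIVAL" + "RRKE")
peak = max(profile, key=lambda item: item[1])
print(f"most hydrophobic window centred at index {peak[0]}, mean {peak[1]:.2f}")
```

Long windows such as 19 suit the search for membrane-spanning segments; shorter windows are more sensitive to surface-exposed regions.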
The ‘positive inside’ rule
(EMBO J. 5:3021; EJB 174:671,205:1207; FEBS lett. 282:41)

Bacterial IM
In: 16% KR out: 4% KR
Eukaryotic PM
In: 17% KR out: 7% KR
Thylakoid membrane
In: 13% KR out: 5% KR
Mitochondrial IM
In: 10% KR out: 3% KR
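The rule turns into a simple orientation guess: count lysine and arginine in the loops on each side of the membrane and call the richer side "in". A minimal sketch, assuming the loops are already known and alternate sides along the chain; the function names and loop sequences are hypothetical.

```python
def percent_kr(segment: str) -> float:
    """Percentage of Lys (K) and Arg (R) residues in a segment."""
    seg = segment.upper()
    if not seg:
        return 0.0
    return 100.0 * sum(aa in "KR" for aa in seg) / len(seg)


def inside_loops(loops):
    """Loops alternate membrane sides; return which index set is richer in K+R."""
    even = "".join(loops[0::2])
    odd = "".join(loops[1::2])
    return "even" if percent_kr(even) >= percent_kr(odd) else "odd"


loops = ["KRKS", "DEGS", "RRKA"]  # hypothetical loops between transmembrane helices
print(percent_kr("".join(loops[0::2])), percent_kr("".join(loops[1::2])))
print("predicted inside:", inside_loops(loops), "numbered loops")
```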
GPCR Topology

• Membrane-bound receptors

• Transducing messages as photons, organic odorants,
nucleotides, nucleosides, peptides, lipids and proteins.
• 6 different families
• A very large number of different domains both to
bind their ligand and to activate G proteins.
• Pharmaceutically the most important class
• Challenge: Methods to find novel GPCRs in human genome
…
GPCR Topology
GPCR Topology

GPCR Structure

• Seven transmembrane regions
• Hydrophobic/ hydrophilic domains
• Conserved residues and motifs (e.g. NPXXY)
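A motif such as NPXXY, where X stands for any residue, maps directly onto a regular expression. A minimal sketch using Python's `re` module; the function name and the test sequence are hypothetical, and positions are reported 1-based.

```python
import re

# X = any amino acid; the lookahead lets overlapping occurrences be reported too.
NPXXY = re.compile(r"(?=(NP[A-Z]{2}Y))")


def find_npxxy(seq: str):
    """Return (1-based position, matched residues) for every NPXXY occurrence."""
    return [(m.start() + 1, m.group(1)) for m in NPXXY.finditer(seq.upper())]


print(find_npxxy("AAANPLIYAA"))
```

A match on its own is weak evidence: a five-residue pattern with two free positions occurs by chance in many unrelated proteins, so it is best combined with the seven-helix hydrophobicity profile.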
GPCR Topology

E.g. plot conserved residues (or multiple alignment: MSA to SSA)
Levels of protein structure

• Quaternary
– Difficult to predict
– Functional units:
apoptosome, proteasome
What is X-ray Crystallography

• X-ray crystallography is an experimental
technique that exploits the fact that X-rays are
diffracted by crystals.
• X-rays have the proper wavelength (in the
Ångström range, ~10^-8 cm) to be scattered by
the electron cloud of an atom of comparable
size.
• Based on the diffraction pattern obtained from
X-ray scattering off the periodic assembly of
molecules or atoms in the crystal, the electron
density can be reconstructed.
• A model is then progressively built into the
experimental electron density, refined against
the data and the result is a quite accurate
molecular structure.
NMR or Crystallography ?

• NMR uses protein in solution
– Can look at the dynamic properties of the protein structure
– Can look at the interactions between the protein and
ligands, substrates or other proteins
– Can look at protein folding
– Sample is not damaged in any way
– The maximum size of a protein for NMR structure determination is ~30
kDa. This eliminates ~50% of all proteins
– High solubility is a requirement

• X-ray crystallography uses protein crystals
– No size limit: as long as you can crystallise it
– Solubility requirement is less stringent
– Simple definition of resolution
– Direct calculation from data to electron density and back again
– Crystallisation is the process bottleneck: binary (all or nothing)
– Phase problem: relies on heavy atom soaks or SeMet incorporation

• Both techniques require large amounts of pure protein and require
expensive equipment!
PDB
Visualizing Structures

Cn3D version 4.0 (NCBI)
Visualizing Structures

Ball: Van der Waals radius
Stick: length joins center

N, blue/O, red/S, yellow/C, gray (green)
Visualizing Structures

From N to C
Visualizing Structures

• Demonstration of Protein explorer
• PDB, install Chime
• Search helicase (select structure where
DNA is present)
• Stop spinning, hide water molecules
• Show basic residues, interact with
negatively charged backbone
• RASMOL / Cn3D
Modeling
Protein Structure
Molecular Modeling:
building a 3D protein structure
from its sequence
Modeling

• Finding a structural homologue
• Blast
– versus PDB database, or PSI-BLAST (E<0.005)
– Domain coverage at least 60%

• Avoid gaps
– Prefer few gaps and
reasonable similarity scores
over lots of gaps and high
similarity scores
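These selection rules can be written down as a filter and a sort. The sketch below is not a BLAST parser: the `Hit` record, its fields and the template names are hypothetical, and "coverage" is taken here to mean the fraction of query residues aligned, which is one possible reading of the 60% rule.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one search hit against PDB sequences."""
    template_id: str
    evalue: float
    query_length: int
    aligned_query_residues: int
    gaps: int
    score: float

    @property
    def coverage(self) -> float:
        return self.aligned_query_residues / self.query_length


def select_templates(hits, max_evalue=0.005, min_coverage=0.60):
    """Keep significant, well-covering hits; prefer few gaps, then high score."""
    kept = [h for h in hits if h.evalue < max_evalue and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.gaps, -h.score))


hits = [
    Hit("tmplA", 1e-30, 200, 180, 12, 300.0),  # high score but many gaps
    Hit("tmplB", 1e-20, 200, 150, 2, 250.0),   # fewer gaps, reasonable score
    Hit("tmplC", 0.5, 200, 190, 0, 40.0),      # not significant
    Hit("tmplD", 1e-10, 200, 80, 1, 120.0),    # covers only 40% of the query
]
print([h.template_id for h in select_templates(hits)])
```

Sorting on gaps before score encodes the advice on the slide: a slightly lower-scoring template with few gaps is usually the easier one to build on.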
Modeling
• Extract “template” sequences and align with query
• Watch out for missing data (PDB file) and complement with additional
templates
• Try to get as much information as possible, X/NMR
• Sequence alignment from structure comparison of templates (SSA) can be
different from a simple sequence alignment
• >40% identity, any alignment method is OK
• <40%, checks are essential
– Residue conservation checks in functional regions (patterns/motifs)
– Indels: combine gaps separated by few residues
– Manual editing: move gaps from secondary elements to loops
– Within loops, move gaps to loop ends, i.e. turnaround point of backbone
• Align templates structurally, extract the corresponding SSA or QTA
(Query/template alignment)
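The 40% rule is easy to check once a query/template alignment exists. A minimal sketch, assuming both aligned strings have equal length with "-" for gaps and that identity is counted over columns where neither sequence has a gap (other conventions exist); the function names and sequences are illustrative.

```python
def percent_identity(aln_query: str, aln_template: str) -> float:
    """Percent identical residues over aligned (gap-free) columns."""
    if len(aln_query) != len(aln_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(aln_query.upper(), aln_template.upper())
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def needs_manual_checks(aln_query: str, aln_template: str, threshold=40.0) -> bool:
    """Below the threshold the alignment should be inspected and edited by hand."""
    return percent_identity(aln_query, aln_template) < threshold


print(percent_identity("ACDE-F", "ACDQGF"))
print(needs_manual_checks("ACDE-F", "ACDQGF"))
```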
Modeling

Input for model building
• Query sequence (the one you want the 3D
model for)
• Template sequences and structures
• Query/Template(s) (structure) sequence
alignment
Modeling

• Methods (for details on these, see paper):
– WHATIF,
– SWISS-MODEL,
– MODELLER,
– ICM,
– 3D-JIGSAW,
– CPH-models,
– SDC1
Modeling

• Model evaluation (how good is the prediction,
how much can the algorithm rely on and extract from
the provided templates)
– PROCHECK
– WHATIF
– ERRAT

• CASP (Critical Assessment of Structure
Prediction)
– Best method is manual alignment editing!
Comparative modelling at CASP
              BC         CASP1    CASP2    CASP3    CASP4
alignment     excellent  poor     fair     fair     fair
side chain    ~ 80%      ~ 50%    ~ 75%    ~ 75%    ~ 75%
short loops   1.0 Å      ~ 3.0 Å  ~ 1.0 Å  ~ 1.0 Å  ~ 1.0 Å
longer loops  2.0 Å      > 5.0 Å  ~ 3.0 Å  ~ 2.5 Å  ~ 2.0 Å

CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
**T128/sodm – 1.0 Å (198 residues; 50%)

**T111/eno – 1.7 Å (430 residues; 51%)

**T122/trpa – 2.9 Å (241 residues; 33%)

**T125/sp18 – 4.4 Å (137 residues; 24%)

**T112/dhso – 4.9 Å (348 residues; 24%)

**T92/yeco – 5.6 Å (104 residues; 12%)
Protein Engineering / Protein Design

 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 

Kürzlich hochgeladen (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 

Bioinformatics t7-protein structure-v2013_wim_vancriekinge

  • 4. The reason for “bioinformatics” to exist ? • empirical finding: if two biological sequences are sufficiently similar, almost invariably they have similar biological functions and will be descended from a common ancestor. • (i) function is encoded into sequence, this means: the sequence provides the syntax and • (ii) there is a redundancy in the encoding, many positions in the sequence may be changed without perceptible changes in the function, thus the semantics of the encoding is robust.
  • 5. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 6. Why protein structure ? • Proteins perform a variety of cellular tasks in the living cells • Each protein adopts a particular folding that determines its function • The 3D structure of a protein can bring into close proximity residues that are far apart in the amino acid sequence • Catalytic site: Business End of the molecule
  • 7. Rationale for understanding protein structure and function • Protein sequence: large numbers of sequences, including whole genomes • Protein structure (via structure determination or structure prediction): three dimensional, complicated, mediates function • Protein function (via homology, rational mutagenesis, biochemical analysis, model studies): rational drug design and treatment of disease; protein and genetic engineering; build networks to model cellular pathways; study organismal function and evolution
  • 8. About the use of protein models (Peitsch) • Structure is preserved under evolution when sequence is not – Interpreting the impact of mutations/SNPs and conserved residues on protein function. Potential link to disease • Function ? – Biochemical: the chemical interactions occurring in a protein – Biological: role within the cell – Phenotypic: the role in the organism • Gene Ontology functional classification ! – Prioritisation of residues to mutate to determine protein function – Providing hints for protein function: catalytic mechanisms of enzymes often require key residues to be close together in 3D space – (protein-ligand complexes, rational drug design, putative interaction interfaces)
  • 9. Missense mutation, e.g. sickle cell anaemia • Cause: defective haemoglobin due to a mutation in the β-globin gene • Symptoms: severe anaemia and death in homozygotes
  • 10. Normal β-globin (146 amino acids), residues 1-7: val - his - leu - thr - pro - glu - glu - ... • Normal gene (aa 6): DNA CTC, mRNA GAG, product Glu • Mutant gene (aa 6): DNA CAC, mRNA GUG, product Val • Mutant β-globin, residues 1-7: val - his - leu - thr - pro - val - glu - ...
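The codon change on this slide can be traced in a few lines. This is only a minimal sketch: the two-entry codon table and the function name `transcribe_template` are illustrative assumptions, and the DNA triplet is paired with the mRNA base by base, the way the slide writes them.

```python
# Position-wise complement used to go from the DNA triplet on the slide to mRNA.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the two codons shown on the slide; not a full genetic code table.
CODON_TO_AA = {"GAG": "Glu", "GUG": "Val"}


def transcribe_template(dna_triplet: str) -> str:
    """Return the mRNA codon paired with a DNA triplet, base by base."""
    return "".join(DNA_TO_MRNA[base] for base in dna_triplet)


for label, dna in (("normal", "CTC"), ("mutant", "CAC")):
    codon = transcribe_template(dna)
    print(label, dna, codon, CODON_TO_AA[codon])
```

A single base change in the middle position of the triplet is enough to swap a charged glutamate for a hydrophobic valine at position 6.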
  • 11. Protein Conformation • Christian Anfinsen: studies on reversible denaturation, “Sequence specifies conformation” • Chaperones and disulfide interchange enzymes: involved but not controlling the final state; they provide an environment to refold if misfolded • Structure implies function: the amino acid sequence encodes the protein’s structural information
  • 12. How does a protein fold ? • by itself: – Anfinsen had developed what he called his "thermodynamic hypothesis" of protein folding to explain the native conformation of amino acid structures. He theorized that the native or natural conformation occurs because this particular shape is thermodynamically the most stable in the intracellular environment. That is, it takes this shape as a result of the constraints of the peptide bonds as modified by the other chemical and physical properties of the amino acids. – To test this hypothesis, Anfinsen unfolded the RNase enzyme under extreme chemical conditions and observed that the enzyme's amino acid structure refolded spontaneously back into its original form when he returned the chemical environment to natural cellular conditions. – "The native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment."
  • 13. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 14. The Basics • Proteins are linear heteropolymers: one or more polypeptide chains • Below about 40 residues the term peptide is frequently used. • A certain number of residues is necessary to perform a particular biochemical function, and around 40-50 residues appears to be the lower limit for a functional domain size. • Protein sizes range from this lower limit to several hundred residues in multi-functional proteins. • Three-dimensional shapes (folds) adopted vary enormously • Experimental methods: – X-ray crystallography – NMR (nuclear magnetic resonance) – Electron microscopy – Ab initio calculations …
  • 15. Levels of protein structure • Zeroth: amino acid composition (proteomics, %cysteine, %glycine)
  • 16. Amino Acid Residues The basic structure of an α-amino acid is quite simple. R denotes any one of the 20 possible side chains (see table below). We notice that the Cα-atom has 4 different ligands (the H is omitted in the drawing) and is thus chiral. An easy trick to remember the correct L-form is the CORN-rule: when the Cα-atom is viewed with the H in front, the residues read "CO-R-N" in a clockwise direction.
  • 22. Levels of protein structure • Primary: This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself
  • 25. Levels of protein structure • Secondary – Local organization of the protein backbone: alpha-helix, beta-strand (which assemble into beta-sheets), turn and interconnecting loop.
  • 28. A Practical Approach: Interpretation • Residues with hydrophobic properties conserved at i, i+2, i+4, separated by unconserved or hydrophilic residues, suggest surface beta-strands. • A short run of hydrophobic amino acids (4 residues) suggests a buried beta-strand. • Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggest an alpha-helix with one face packing in the protein core; an i, i+3, i+4, i+7 pattern of conserved hydrophobic residues suggests the same.
  • 32. Secondary structure prediction:CHOU-FASMAN • Chou, P.Y. and Fasman, G.D. (1974). Conformational parameters for amino acids in helical, sheet, and random coil regions calculated from proteins. Biochemistry 13, 211-221. • Chou, P.Y. and Fasman, G.D. (1974). Prediction of protein conformation. Biochemistry 13, 222-245.
  • 33. Secondary structure prediction: CHOU-FASMAN • Method • Assigning a set of prediction values to a residue, based on statistical analysis of 15 proteins • Applying a simple algorithm to those numbers
  • 34. Secondary structure prediction: CHOU-FASMAN Calculation of preference parameters. For each of the 20 residues and each secondary structure (α-helix, β-sheet and β-turn): $P = \log\left(\frac{\text{observed counts}}{\text{expected counts}}\right) + 1.0$ • Preference parameter > 1.0: the specific residue has a preference for the specific secondary structure. • Preference parameter = 1.0: the specific residue neither prefers nor dislikes the specific secondary structure. • Preference parameter < 1.0: the specific residue dislikes the specific secondary structure.
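The three cases above follow directly from the formula as the slide writes it. The sketch below assumes a natural logarithm (the slide does not state the base) and uses made-up counts; the function name `preference` is hypothetical.

```python
import math


def preference(observed: float, expected: float) -> float:
    """Preference parameter as written on the slide: log(observed / expected) + 1.0."""
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return math.log(observed / expected) + 1.0


# Hypothetical counts for one residue type in one secondary structure class.
print(preference(30, 20))  # above 1.0: over-represented
print(preference(20, 20))  # exactly 1.0: no preference
print(preference(10, 20))  # below 1.0: under-represented
```

Whatever the base of the logarithm, the value 1.0 always corresponds to observed counts equal to expected counts.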
  • 35. Secondary structure prediction: CHOU-FASMAN Preference parameters
        Residue  P(a)  P(b)  P(t)  f(i)   f(i+1)  f(i+2)  f(i+3)
        Ala      1.45  0.97  0.57  0.049  0.049   0.034   0.029
        Arg      0.79  0.90  1.00  0.051  0.127   0.025   0.101
        Asn      0.73  0.65  1.68  0.101  0.086   0.216   0.065
        Asp      0.98  0.80  1.26  0.137  0.088   0.069   0.059
        Cys      0.77  1.30  1.17  0.089  0.022   0.111   0.089
        Gln      1.17  1.23  0.56  0.050  0.089   0.030   0.089
        Glu      1.53  0.26  0.44  0.011  0.032   0.053   0.021
        Gly      0.53  0.81  1.68  0.104  0.090   0.158   0.113
        His      1.24  0.71  0.69  0.083  0.050   0.033   0.033
        Ile      1.00  1.60  0.58  0.068  0.034   0.017   0.051
        Leu      1.34  1.22  0.53  0.038  0.019   0.032   0.051
        Lys      1.07  0.74  1.01  0.060  0.080   0.067   0.073
        Met      1.20  1.67  0.67  0.070  0.070   0.036   0.070
        Phe      1.12  1.28  0.71  0.031  0.047   0.063   0.063
        Pro      0.59  0.62  1.54  0.074  0.272   0.012   0.062
        Ser      0.79  0.72  1.56  0.100  0.095   0.095   0.104
        Thr      0.82  1.20  1.00  0.062  0.093   0.056   0.068
        Trp      1.14  1.19  1.11  0.045  0.000   0.045   0.205
        Tyr      0.61  1.29  1.25  0.136  0.025   0.110   0.102
        Val      1.14  1.65  0.30  0.023  0.029   0.011   0.029
  • 36. Secondary structure prediction: CHOU-FASMAN Applying algorithm (the thresholds 100 and 105 are on a ×100 scale, i.e. 1.00 and 1.05 on the scale of the preference-parameter table)
        1. Assign parameters to residue.
        2. Identify regions where 4 out of 6 residues have P(a)>100: α-helix. Extend helix in both directions until four contiguous residues have an average P(a)<100: end of α-helix. If segment is longer than 5 residues and P(a)>P(b): α-helix.
        3. Repeat this procedure to locate all of the helical regions.
        4. Identify regions where 3 out of 5 residues have P(b)>100: β-sheet. Extend sheet in both directions until four contiguous residues have an average P(b)<100: end of β-sheet. If P(b)>105 and P(b)>P(a): β-sheet.
        5. Rest: P(a)>P(b): α-helix. P(b)>P(a): β-sheet.
        6. To identify a bend at residue number i, calculate the following value: p(t) = f(i) · f(i+1) · f(i+2) · f(i+3). If (1) p(t) > 0.000075; (2) average P(t)>1.00 in the tetrapeptide; and (3) averages for the tetrapeptide obey P(a)<P(t)>P(b): β-turn.
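Two pieces of the procedure, helix nucleation (4 of 6 residues with P(a) above threshold) and the bend product p(t), can be sketched as follows. This is not a full Chou-Fasman implementation: extension, the sheet rules and overlap resolution are left out, the function names are hypothetical, and the test sequences are made up. The P(a) values are those of the preference-parameter table on its 1.00 scale, so the threshold 100 becomes 1.00; the turn frequencies are given for four residues only.

```python
# P(a) values copied from the preference-parameter table (1.00 scale, one-letter codes).
P_ALPHA = {
    "A": 1.45, "R": 0.79, "N": 0.73, "D": 0.98, "C": 0.77,
    "Q": 1.17, "E": 1.53, "G": 0.53, "H": 1.24, "I": 1.00,
    "L": 1.34, "K": 1.07, "M": 1.20, "F": 1.12, "P": 0.59,
    "S": 0.79, "T": 0.82, "W": 1.14, "Y": 0.61, "V": 1.14,
}

# Turn frequencies f(i), f(i+1), f(i+2), f(i+3) for four residues only (subset of the table).
F_TURN = {
    "N": (0.101, 0.086, 0.216, 0.065),
    "P": (0.074, 0.272, 0.012, 0.062),
    "G": (0.104, 0.090, 0.158, 0.113),
    "S": (0.100, 0.095, 0.095, 0.104),
}


def helix_nuclei(seq, window=6, needed=4, threshold=1.00):
    """Start indices of windows in which at least `needed` residues have P(a) > threshold."""
    starts = []
    for start in range(len(seq) - window + 1):
        formers = sum(1 for aa in seq[start:start + window] if P_ALPHA[aa] > threshold)
        if formers >= needed:
            starts.append(start)
    return starts


def turn_probability(tetrapeptide):
    """p(t) = f(i) * f(i+1) * f(i+2) * f(i+3) for a four-residue stretch."""
    p = 1.0
    for position, aa in enumerate(tetrapeptide):
        p *= F_TURN[aa][position]
    return p


print(helix_nuclei("AEELLKGPGS"))           # [0, 1, 2]
print(turn_probability("NPGS") > 0.000075)  # True
```

In the made-up sequence the helix formers at the N-terminus nucleate three overlapping windows, while the Gly/Pro-rich tail does not; the tetrapeptide NPGS clears the first of the three bend conditions.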
  • 37. Secondary structure prediction: CHOU-FASMAN Successful method? 19 proteins evaluated: • Successful in locating 88% of helical and 95% of β regions • Correctly predicting 80% of helical and 86% of β-sheet residues • Accuracy of predicting the three conformational states for all residues (helix, β and coil) is 77% • Chou & Fasman: successful method • After 1974: improvement of preference parameters
  • 39. Sander-Schneider: Evolution of overall structure • Naturally occurring sequences with more than 20% sequence identity over 80 or more residues always adopt the same basic structure (Sander and Schneider 1991)
  • 40. Sander-Schneider • HSSP: homology-derived secondary structure of proteins
  • 41. Structural Family Databases • SCOP: – Structural Classification of Proteins • FSSP: – Family of Structurally Similar Proteins • CATH: – Class, Architecture, Topology, Homology
  • 42. Levels of protein structure • Tertiary – Packing of secondary structure elements into a compact spatial unit – Fold or domain: this is the level to which structure prediction is currently possible
  • 45. Domains • Protein dissection into domains • Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence • (automatic when blasting)
  • 46. Domains • From the analysis of alignments of protein families • Conserved sequence features, usually associated with a specific function • PROSITE database for protein “signature” patterns (large number of false positives and false negatives) • From alignment of homologous sequences (PRINTS/PRODOM) • From Hidden Markov Models (PFAM) • Meta approach: INTERPRO
  • 48. Levels of protein structure: Topology
  • 49. Hydrophobicity Plot P53_HUMAN (P04637) human cellular tumor antigen p53 Kyte-Doolittle hydropathy, window=19
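A hydropathy plot of this kind is a sliding-window average over per-residue values. The sketch below uses the Kyte-Doolittle scale with the slide's window of 19; the function name `hydropathy_profile` and the toy sequence are assumptions for illustration, not the p53 data behind the plot.

```python
# Kyte-Doolittle hydropathy values (one-letter amino acid codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean Kyte-Doolittle value for every full window along the sequence."""
    values = [KD[aa] for aa in seq]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# Toy sequence: charged flanks around a 19-residue hydrophobic stretch.
toy = "DEKR" + "LIVALIVALIVALIVALIV" + "KRDE"
profile = hydropathy_profile(toy)
peak = max(range(len(profile)), key=profile.__getitem__)
print(len(profile), peak)  # 9 windows; the peak is the window starting at index 4
```

Peaks in such a profile are usually read as candidate membrane-spanning segments; a sequence shorter than the window yields an empty profile.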
  • 51. The ‘positive inside’ rule (EMBO J. 5:3021; EJB 174:671, 205:1207; FEBS Lett. 282:41), KR = Lys + Arg • Bacterial IM: in 16% KR, out 4% KR • Eukaryotic PM: in 17% KR, out 7% KR • Thylakoid membrane: in 13% KR, out 5% KR • Mitochondrial IM: in 10% KR, out 3% KR
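The rule can be turned into a crude orientation guess by comparing the Lys+Arg content of the loops on either side of the membrane. This is a sketch under strong assumptions: the loop sequences are invented, the function names are hypothetical, and real topology predictors combine this signal with hydropathy and other evidence.

```python
def kr_fraction(segment: str) -> float:
    """Fraction of Lys (K) and Arg (R) residues in a loop segment."""
    if not segment:
        return 0.0
    return sum(1 for aa in segment if aa in "KR") / len(segment)


def likely_inside(loops_side_a, loops_side_b):
    """Return 'a' or 'b' for the side whose pooled loops are richer in K/R."""
    a = kr_fraction("".join(loops_side_a))
    b = kr_fraction("".join(loops_side_b))
    return "a" if a >= b else "b"


# Hypothetical loop sequences on the two sides of a membrane protein.
print(likely_inside(["MKRLSK", "RRAK"], ["SGDTE", "NQPL"]))  # a
```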
  • 53. GPCR Topology • Membrane-bound receptors • Transducing messages such as photons, organic odorants, nucleotides, nucleosides, peptides, lipids and proteins. • 6 different families • A very large number of different domains both to bind their ligand and to activate G proteins. • Pharmaceutically the most important class • Challenge: Methods to find novel GPCRs in human genome …
  • 55. GPCR Topology GPCR Structure • Seven transmembrane regions • Hydrophobic/hydrophilic domains • Conserved residues and motifs (e.g. NPXXY)
  • 56. GPCR Topology E.g. plot conserved residues (or multiple alignment: MSA to SSA)
  • 57. Levels of protein structure • Quaternary • Difficult to predict • Functional units: Apoptosome, proteasome
  • 58. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 59. What is X-ray Crystallography • X-ray crystallography is an experimental technique that exploits the fact that X-rays are diffracted by crystals. • X-rays have the proper wavelength (in the Ångström range, ~10^-8 cm) to be scattered by the electron cloud of an atom of comparable size. • Based on the diffraction pattern obtained from X-ray scattering off the periodic assembly of molecules or atoms in the crystal, the electron density can be reconstructed. • A model is then progressively built into the experimental electron density and refined against the data; the result is a quite accurate molecular structure.
  • 60. NMR or Crystallography ? • NMR uses protein in solution – Can look at the dynamic properties of the protein structure – Can look at the interactions between the protein and ligands, substrates or other proteins – Can look at protein folding – Sample is not damaged in any way – The maximum size of a protein for NMR structure determination is ~30 kDa. This eliminates ~50% of all proteins – High solubility is a requirement • X-ray crystallography uses protein crystals – No size limit: as long as you can crystallise it – Solubility requirement is less stringent – Simple definition of resolution – Direct calculation from data to electron density and back again – Crystallisation is the process bottleneck, binary (all or nothing) – Phase problem: relies on heavy atom soaks or SeMet incorporation • Both techniques require large amounts of pure protein and require expensive equipment!
  • 61. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 62. PDB
  • 63. PDB
  • 64. PDB
  • 65. PDB
  • 67. Visualizing Structures • Ball: Van der Waals radius • Stick: length joins atom centers • Colors: N blue, O red, S yellow, C gray (or green)
  • 69. Visualizing Structures • Demonstration of Protein explorer • PDB, install Chime • Search helicase (select structure where DNA is present) • Stop spinning, hide water molecules • Show basic residues, interact with negatively charged backbone • RASMOL / Cn3D
  • 70. Protein Structure Introduction Why ? How do proteins fold ? Levels of protein structure 0,1,2,3,4 X-ray / NMR The Protein Database (PDB) Protein Modeling Bioinformatics & Proteomics Weblems
  • 72. Protein Structure Molecular Modeling: building a 3D protein structure from its sequence
  • 73. Modeling • Finding a structural homologue • BLAST versus the PDB database, or PSI-BLAST (E<0.005) – Domain coverage at least 60% • Avoid gaps – Choose few gaps and reasonable similarity scores over lots of gaps and high similarity scores
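The selection criteria above can be expressed as a filter followed by a sort. This is a minimal sketch, not a recommended pipeline: the `Hit` record, the identifiers and all numbers are hypothetical, and sorting by gap count before score is only one crude reading of "few gaps and reasonable similarity scores".

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pdb_id: str      # hypothetical identifiers below
    evalue: float
    coverage: float  # fraction of the query domain covered, 0..1
    gaps: int
    score: float


def rank_templates(hits, max_evalue=0.005, min_coverage=0.60):
    """Keep hits passing both thresholds; prefer few gaps, then higher score."""
    kept = [h for h in hits if h.evalue < max_evalue and h.coverage >= min_coverage]
    return sorted(kept, key=lambda h: (h.gaps, -h.score))


hits = [
    Hit("tmplA", 1e-30, 0.95, 12, 310.0),
    Hit("tmplB", 1e-22, 0.90, 2, 250.0),
    Hit("tmplC", 1e-40, 0.40, 0, 400.0),  # fails coverage
    Hit("tmplD", 0.5, 0.80, 1, 40.0),     # fails E-value
]
print([h.pdb_id for h in rank_templates(hits)])  # ['tmplB', 'tmplA']
```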
  • 74. Modeling • Extract “template” sequences and align with query • Watch out for missing data (PDB file) and complement with additional templates • Try to get as much information as possible, X-ray/NMR • Sequence alignment from structure comparison of templates (SSA) can be different from a simple sequence alignment • >40% identity: any alignment method is OK • <40%: checks are essential – Residue conservation checks in functional regions (patterns/motifs) – Indels: combine gaps separated by few residues – Manual editing: move gaps from secondary structure elements to loops – Within loops, move gaps to loop ends, i.e. turnaround point of backbone • Align templates structurally, extract the corresponding SSA or QTA (query/template alignment)
  • 75. Modeling Input for model building • Query sequence (the one you want the 3D model for) • Template sequences and structures • Query/template(s) (structure) sequence alignment
  • 76. Modeling • Methods (details on these see paper): – WHATIF, – SWISS-MODEL, – MODELLER, – ICM, – 3D-JIGSAW, – CPH-models, – SDC1
  • 77. Modeling • Model evaluation (how good is the prediction; how much can the algorithm rely on, and extract from, the provided templates) – PROCHECK – WHATIF – ERRAT • CASP (Critical Assessment of Structure Prediction) – Best method is manual alignment editing !
  • 78. Comparative modelling at CASP
                 alignment   side chain   short loops   longer loops
        BC       excellent   ~80%         1.0 Å         2.0 Å
        CASP1    poor        ~50%         ~3.0 Å        >5.0 Å
        CASP2    fair        ~75%         ~1.0 Å        ~3.0 Å
        CASP3    fair        ~75%         ~1.0 Å        ~2.5 Å
        CASP4    fair        ~75%         ~1.0 Å        ~2.0 Å
    CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
        T128/sodm: 1.0 Å (198 residues; 50%)
        T111/eno: 1.7 Å (430 residues; 51%)
        T122/trpa: 2.9 Å (241 residues; 33%)
        T125/sp18: 4.4 Å (137 residues; 24%)
        T112/dhso: 4.9 Å (348 residues; 24%)
        T92/yeco: 5.6 Å (104 residues; 12%)
  • 80. Protein Engineering / Protein Design