RNA bioinformatics
Paul Gardner
April 2, 2015
Paul Gardner RNA bioinformatics
Main questions
How can we predict RNA structure?
Why do we care about RNA?
RNA is important for translation and gene regulation.
2/3 of the ribosome is RNA. Ribosomal function is preserved
even after amino-acid residues are deleted from the active site!
Current estimates indicate that the number of ncRNA genes is
comparable to the number of protein-coding genes.
[Figure: RNAs involved in gene expression. Stages shown: transcription, splicing, translation, transport. Labels: mDNA, uDNA, rDNA, tDNA, pre-mRNA, mRNA, nascent protein, localised protein, spliceosome, ribosome, tRNA, RNase P, RNase MRP + snoRNP, snoRNP, SRP, tmRNA, RISC (miRNA).]
RNA: why is this stuff interesting?
RNA world was an essential step to modern protein-DNA
based life (using current reasonable models).
Which came first, DNA or protein?
RNA has catalytic potential (like protein), carries hereditary
information (like DNA).
Image by James W. Brown, www.mbio.ncsu.edu/JWB/soup.html
RNA interference
Image lifted from: http://en.wikipedia.org/wiki/RNA_interference
RNA: structure
[Figure: tRNA structure in three panels. A: primary structure. B: secondary structure, with the acceptor stem, D loop, anticodon loop, variable loop and TΨC loop labelled and positions numbered 5 to 75. C: tertiary structure, with the same elements and the 5' and 3' ends labelled.]
Primary structure:
5' GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3'
RNA: base-pairing
Canonical (Watson-Crick) base-pairs C · G, A · U.
Non-canonical (Wobble) base-pair G · U
Note: other non-canonical base-pairs do occur, but these are
“rare” and generally re-defined as “tertiary” interactions.
Central dogma of structural biology: structure is important for
function.
Images lifted from: http://en.wikipedia.org/wiki/Base_pair
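The pairing rules on this slide can be written down as a small lookup. This is only an illustrative sketch: the function name `classify_pair` and the three return labels are hypothetical, and anything that is neither Watson-Crick nor G·U is simply reported as "other".

```python
# Pairing rules as listed on the slide; names here are illustrative only.
CANONICAL = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(x, y):
    """Return 'canonical', 'wobble' or 'other' for two RNA bases."""
    pair = (x.upper(), y.upper())
    if pair in CANONICAL:
        return "canonical"
    if pair in WOBBLE:
        return "wobble"
    return "other"


if __name__ == "__main__":
    for a, b in [("G", "C"), ("G", "U"), ("A", "A")]:
        print(a, b, classify_pair(a, b))
```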
RNA: base-pairing
Images lifted from: http://eternawiki.org/wiki/index.php5/Base_Pair
RNA: base-pairing
bpC     C:G     U:A     U:G     G:A     C:A     U:C     A:A     C:C     G:G     U:U     Total
WC      49.8%   14.4%   0.01%   1.2%    0.1%    0.5%    -       -       -       -       66.1%
Wb      0.06%   0.06%   7.1%    -       0.2%    -       0.3%    0.5%    0.2%    0.9%    9.6%
Other   0.8%    5.8%    1.5%    9.4%    2.3%    0.6%    2.6%    0.5%    0.7%    0.3%    24.3%
Total   50.7%   20.3%   8.7%    10.6%   2.6%    1.0%    2.9%    1.0%    0.9%    1.3%    100.0%
(bpC: base-pair conformation; WC: Watson-Crick; Wb: wobble)
Just 71.3% of rRNA contacts are canonical or G:U wobble!
Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA
structural motifs J Mol Biol.
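As a quick check, the 71.3% figure appears to be the sum of three table cells: C:G and U:A in the Watson-Crick row plus U:G in the wobble row. The sketch below assumes that reading; the variable names are illustrative.

```python
# Percentages copied from the table (Lee & Gutell 2004).
watson_crick = {"C:G": 49.8, "U:A": 14.4}  # WC row
wobble_ug = 7.1                             # Wb row, U:G column

canonical_or_wobble = round(sum(watson_crick.values()) + wobble_ug, 1)
print(f"{canonical_or_wobble}% of contacts are canonical or G:U wobble")
```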
RNA stacking
Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome Nature. Image lifted from:
http://rna.ucsc.edu/pdbrestraints/index.html
RNA: number of structures
A_N is the number of possible sequences of length N.
\[ A_N = 4^N \]
S_N is the number of possible secondary structures of length N.
\[ S_0 = S_1 = 1 \]
\[ S_{N+1} = S_N + \sum_{j=1}^{N-1} S_{j-1}\, S_{N-j} \]
\[ S_N \sim 1.8^N \]
Hofacker et al. (1998) Combinatorics of RNA Secondary Structures, Discrete Applied Mathematics.
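A minimal sketch of the counting recurrence, written in the Waterman form in which the last base is either unpaired or paired with an earlier base j. The function name and the `min_loop` parameter (the minimum number of unpaired bases enclosed by a hairpin) are assumptions made for illustration; the growth rate of S_N depends on which constraints of this kind are imposed.

```python
def count_structures(n, min_loop=1):
    """Count secondary structures on a chain of n bases.

    min_loop is the smallest number of unpaired bases a hairpin may enclose
    (an illustrative parameter, not taken from the slide).
    """
    s = [1] * (n + 1)  # short chains admit only the open structure
    for length in range(2, n + 1):
        total = s[length - 1]  # case 1: the last base is unpaired
        # case 2: the last base pairs with base j (1-based), leaving
        # j - 1 bases outside the pair and length - j - 1 bases inside it
        for j in range(1, length - min_loop):
            total += s[j - 1] * s[length - j - 1]
        s[length] = total
    return s[n]


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, count_structures(n), 4 ** n)
```

Printing S_N next to 4^N shows that structures are far fewer than sequences, so many sequences must share a structure.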
How can we make a secondary structure prediction
algorithm?
Maximize the number of base-pairs in an
RNA sequence?
Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
Structure prediction: Nussinov
Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
Image from: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology.
Structure prediction: Nussinov
Maximize the number of base-pairs in an RNA sequence.
Seq = s_1 s_2 ... s_n
\[ N_{i,j} = 0 \quad \forall\; j - i < 3 \]
\[ N_{i,j} = \max \begin{cases}
N_{i+1,j-1} + \rho(i,j) & i, j \text{ pair} \\
N_{i+1,j} & i \text{ unpaired} \\
N_{i,j-1} & j \text{ unpaired} \\
\max_{i<k<j} \left[ N_{i,k} + N_{k+1,j} \right] & \text{bifurcation}
\end{cases} \]
O(n^3) in CPU time, O(n^2) in memory.
ρ(i, j) = 1 if s_i and s_j are complementary, otherwise ρ(i, j) = 0.
N_{1,n} = BP_max.
Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
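The recursion can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: "complementary" is taken to include the G·U wobble pair, the function name `nussinov` and the `min_sep` parameter are hypothetical, indices are 0-based, and the traceback returns just one of possibly many optimal structures.

```python
# Assumption: complementary means Watson-Crick or G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_sep=3, pairs=PAIRS):
    """Return (max number of base-pairs, one optimal dot-bracket structure)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # N[i][j] stays 0 whenever j - i < min_sep (the base case on the slide)
    N = [[0] * n for _ in range(n)]
    for span in range(min_sep, n):
        for i in range(n - span):
            j = i + span
            rho = 1 if (seq[i], seq[j]) in pairs else 0
            best = max(N[i + 1][j - 1] + rho, N[i + 1][j], N[i][j - 1])
            for k in range(i + 1, j):  # bifurcation
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Traceback: recover one structure that achieves N[0][n-1]
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < min_sep:
            continue
        if N[i][j] == N[i + 1][j]:
            stack.append((i + 1, j))
        elif N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in pairs and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return N[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

The three nested loops over span, i and k give the O(n^3) time, and the table N gives the O(n^2) memory.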
Structure prediction: Nussinov
There are a few problems with this approach:
The solution to Nussinov is frequently not unique. For example,
the 77-nucleotide tRNA-His has 22 base-pairs in the
phylogenetic structure, yet there are 149,126 structures with the
maximal number of 26 base-pairs!
The method ignores stacking interactions.
Fontana (2002) Modelling ‘evo-devo’ with RNA. BioEssays.
Structure prediction: Zuker
Nearest neighbour model
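Modified Nussinov algorithm to find minimal free energy
(most stable) structures
[Figure: a hairpin whose stem contains the base-pairs A·U, C·G, U·A and G·C, giving three stacks S1, S2 and S3, closed by a loop L (loop bases G, U, A, C).]
Free energy = L + S1 + S2 + S3
            = 5.00 − 2.11 − 2.35 − 2.24
            = −1.70 kcal/mol
\[ \Delta G_{stack} = \Delta H_{37,stack} - T\,\Delta S_{37,stack} \]
\[ \Delta G_{loop} = -T\,\Delta S_{37,loop} \]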
Tinoco et al. (1971) Estimation of secondary structure in RNA. Nature.
Structure prediction: Zuker
WX \ YZ    CG      GC      AU      UA      GU      UG
CG        -3.26   -2.36   -2.11   -2.08   -1.41   -2.11
GC        -3.42   -3.26   -2.35   -2.24   -1.53   -2.51
AU        -2.24   -2.08   -0.93   -1.10   -0.55   -1.36
UA        -2.35   -2.11   -1.33   -0.93   -1.00   -1.27
GU        -2.51   -2.11   -1.27   -1.36   +0.47   +1.29
UG        -1.53   -1.41   -1.00   -0.55   +0.30   +0.47
Energies (∆G in kcal/mol) of 5'-WY-3' / 3'-XZ-5' stacked base-pairs (rows: WX, columns: YZ).
Note that ∆G of 5'-WY-3' / 3'-XZ-5' stacks is the same as that of 5'-ZX-3' / 3'-YW-5' stacks.
Mathews et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA
secondary structure. JMB.
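To illustrate the nearest-neighbour sum, the sketch below adds one loop term to the stacking terms looked up in the table. It rests on assumptions: the table is read as STACK[first pair][second pair], the helix is listed as G·C, U·A, C·G, A·U from the open end towards the loop (the order that reproduces the stack values quoted in the earlier worked example), and the loop energy of 5.00 kcal/mol is taken from that example rather than computed. The function name `hairpin_energy` is hypothetical.

```python
# Canonical-pair subset of the stacking table (kcal/mol),
# read here as STACK[first pair][second pair] (an assumed orientation).
STACK = {
    "CG": {"CG": -3.26, "GC": -2.36, "AU": -2.11, "UA": -2.08},
    "GC": {"CG": -3.42, "GC": -3.26, "AU": -2.35, "UA": -2.24},
    "AU": {"CG": -2.24, "GC": -2.08, "AU": -0.93, "UA": -1.10},
    "UA": {"CG": -2.35, "GC": -2.11, "AU": -1.33, "UA": -0.93},
}


def hairpin_energy(pairs, loop_energy):
    """Sum the loop term and the stacking terms of consecutive base-pairs."""
    stacks = [STACK[a][b] for a, b in zip(pairs, pairs[1:])]
    return round(loop_energy + sum(stacks), 2), stacks


if __name__ == "__main__":
    energy, stacks = hairpin_energy(["GC", "UA", "CG", "AU"], loop_energy=5.00)
    print(stacks, energy)
```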
Suboptimal structures
“There is an embarrassing abundance of structures having a free
energy near that of the optimum.” (McCaskill 1990)
[Figure: free energy ∆G (kcal/mol, axis from −22 to −20) plotted against base-pair distance dBP(Si, Smfe) (axis from −5 to 35) for suboptimal structures, with three structures drawn alongside and labelled Biological, Suboptimal and MFE.]
Wuchty et al. (1999) Complete suboptimal folding of RNA and the stability of secondary structures, Biopolymers.
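The base-pair distance dBP on the horizontal axis can be sketched as the number of base-pairs present in one structure but not the other. This definition and the helper names are assumptions made for illustration; structures are given in dot-bracket notation without pseudoknots.

```python
def pair_set(dot_bracket):
    """Return the set of (i, j) base-pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def bp_distance(a, b):
    """Number of base-pairs found in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return len(pair_set(a) ^ pair_set(b))


if __name__ == "__main__":
    print(bp_distance("((..))", "(....)"))
```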
Accuracy of MFE predictions
Non-independent benchmarks:
Walter et al. (1994) Mean sensitivity 63.6%
Mathews et al. (1999) Mean sensitivity 72.9%
Independent benchmarks:
Doshi et al. (2004) Mean sensitivity 41%
Dowell & Eddy (2004) Mean sensitivity 56%, mean PPV 48%
Gardner & Giegerich (2004) Mean sensitivity 56%, mean PPV 46%
Data-sets: tRNA, SSU rRNA, LSU rRNA, SRP, RNase P, tmRNA.
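For reference, sensitivity and PPV can be computed from sets of base-pairs as sketched below, using the usual definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The function name and the toy base-pairs are illustrative, and the cited benchmarks may count matches with additional rules not modelled here.

```python
def sensitivity_ppv(predicted, reference):
    """Compare predicted and reference base-pairs, each a set of (i, j) tuples."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    reference = {(0, 9), (1, 8), (2, 7)}          # toy reference structure
    predicted = {(0, 9), (1, 8), (3, 6), (4, 5)}  # toy prediction
    print(sensitivity_ppv(predicted, reference))
```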
Limitations of MFE predictions
Energy parameters: estimated at constant salt
concentrations and temperatures.
Energy model: models of loop energies are extrapolated from
relatively few experiments, no pseudoknots, ...
Cellular environment: contains proteins, RNAs, DNAs,
sugars, etc.
Post-transcriptional modifications: many functional RNAs
have been covalently modified.
Folding kinetics: RNAs fold along “pathways”, perhaps
becoming trapped in sub-optimal conformations.
Co-transcriptional folding: RNAs fold during transcription,
while the transcriptional apparatus occludes 3’ portions of the
sequence.
Transcription is jerky: transcriptional pausing can influence
folding.
Comparative sequence analysis
Input: a set of sequences with the same biological function
which are assumed to have approximately the same structure.
Output: the common structural elements, aligned sequences
and a phylogeny which best explains the observed data.
[Figure: phylogeny relating sequences 1 to 5.]
>1
GCAUCCAUGGCUGAAUGGUUAAAGCGCCCAACUCAUAAUUGGCGAACUCGCGGGUUCAAUUCCUGCUGGAUGCA
>2
GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUGCCACGCGGGAGGCCCGGGUUCGAUUCCCGGCCAAUGCA
>3
UGGGCUAUGGUGUAAUUGGCAGCACGACUGAUUCUGGUUCAGUUAGUCUAGGUUCGAGUCCUGGUAGCCCAG
>4
GAAGAUCGUCGUCUCCGGUGAGGCGGCUGGACUUCAAAUCCAGUUGGGGCCGCCAGCGGUCCCGGGCAGGUUCGACUCCUGUGAUCUUCCG
>5
CUAAAUAUAUUUCAAUGGUUAGCAAAAUACGCUUGUGGUGCGUUAAAUCUAAGUUCGAUUCUUAGUAUUUACC
** *
1 GCAUCCAUGGCUGAAU-GGUU-AAAGCGCCCAACUCAUAAUUGGCGAA--
2 GCAUUGGUGGUUCAGU-GGU--AGAAUUCUCGCCUGCCACGCGG-GAG--
3 UGGGCUAUGGUGUAAUUGGC--AGCACGACUGAUUCUGGUUCAG-UUA--
4 GAAGAUCGUCGUCUCC-GGUG-AGGCGGCUGGACUUCAAAUCCA-GU-UG
5 CUAAAUAUAUUUCAAU-GGUUAGCAAAAUACGCUUGUGGUGCGU-UAA--
**** * **
1 ------------------CUCGCGGGUUCAAUUCCUGCUGGAUGC-A
2 ------------------G-CCCGGGUUCGAUUCCCGGCCAAUGC-A
3 ------------------G-UCUAGGUUCGAGUCCUGGUAGCCCA-G
4 GGGCCGCCAGCGGUCCCG--GGCAGGUUCGACUCCUGUGAUCUUCCG
5 ------------------A-UCUAAGUUCGAUUCUUAGUAUUUAC-C
[Figure: consensus secondary structure of the aligned sequences, drawn with IUPAC ambiguity codes and gap characters.]
Comparative sequence analysis
Evolution of RNA sequences
Base-pairs that covary have strong evolutionary support
[Figure: (a) an alignment of four sequences that share one secondary structure; (b, c) the structure drawn with the consensus and MIS sequences; and changes among the base-pairs G·U, A·U, G·C, U·G, C·G and U·A labelled "fast" or "slow".]
Structure   (((..(((....)))..)))
Seq 1       UACAAGAGUGCGCUUAAGUA
Seq 2       UGCAAAAGUCCGUUUAAGCA
Seq 3       UAUAACCUUUCGAGGAAAUA
Seq 4       CAUAAUAAUGCGUUGAAGUG
MIS         YAUAANADUGCGUUGAAGUR
Ancestral   UACAAGAGUGCGUUUAAGUA
Consensus   YRYAASMGUSCGYKKAAGYR
Alignment Folding: RNAalifold
Generate an alignment (e.g. with ClustalW)
Find a consensus structure that is both energetically stable in
all sequences and has covariation support
[Figure: an alignment and the consensus secondary structure predicted from it, drawn with IUPAC ambiguity codes.]
Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
Alignment Folding: RNAalifold
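RNAalifold: energy + covariation.
\[ \beta_{i,j} = \frac{1}{N} \sum_{\alpha}^{N} Z^{\alpha}_{i,j} - \mathrm{Cov} \]
\[ C_{i,j} = \frac{2}{N(N-1)} \sum_{b^{\alpha}_i b^{\alpha}_j,\; b^{\beta}_i b^{\beta}_j} D_H\!\left(b^{\alpha}_i b^{\alpha}_j,\; b^{\beta}_i b^{\beta}_j\right) \Pi^{\alpha}_{ij}\, \Pi^{\beta}_{ij} \]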
Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
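A rough sketch of the covariation term C_{i,j} for one pair of alignment columns. The interpretation is an assumption: D_H is taken as the Hamming distance between the two-letter base-pairs of sequences α and β, Π is taken as 1 when a sequence can form a Watson-Crick or G·U pair at (i, j) and 0 otherwise, and the sum runs over unordered sequence pairs. The function name is hypothetical and columns are 0-based. The example alignment is the four-sequence one from the comparative sequence analysis slide.

```python
from itertools import combinations

# Assumption: these are the pairs for which the indicator Pi equals 1.
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def covariation(alignment, i, j, pairs=PAIRS):
    """Covariation score for columns i and j of a gap-free alignment."""
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    columns = [seq[i] + seq[j] for seq in alignment]
    total = 0
    for a, b in combinations(columns, 2):
        if a in pairs and b in pairs:  # both sequences can pair
            total += sum(x != y for x, y in zip(a, b))  # Hamming distance
    return 2 * total / (n * (n - 1))


if __name__ == "__main__":
    alignment = [
        "UACAAGAGUGCGCUUAAGUA",
        "UGCAAAAGUCCGUUUAAGCA",
        "UAUAACCUUUCGAGGAAAUA",
        "CAUAAUAAUGCGUUGAAGUG",
    ]
    print(covariation(alignment, 0, 19), covariation(alignment, 10, 11))
```

Columns whose base-pairs change in a compensating way score high, while invariant pairs and non-pairing columns score zero.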
Covariation metrics
Lindgreen, Gardner & Krogh (2006) Measuring covariation in RNA alignments: physical realism improves
information measures. Bioinformatics.
Rfam: annotation hierarchy
Levels: Types, Clans, Families, Sequences
[Figure: Rfam annotation hierarchy. Type labels shown: Gene, Cis-reg., Intron, tRNA, rRNA, miRNA, snRNA, splicing, snoRNA, CD-box_snoRNA, HACA-box_snoRNA, scaRNA, sRNA, antisense, ribozyme, CRISPR, riboswitch, thermoregulator, leader, IRES, frameshift_element.]
Building an Rfam family
A structure from literature
An Rfam family: produced manually from publication figures
An example Rfam entry
Relevant reading
Reviews:
Eddy SR (2004) How do RNA folding algorithms work?
Nature Biotechnology.
Methods:
Hofacker et al. (2002) Secondary Structure Prediction for
Aligned RNA Sequences, J.Mol.Biol.
The End
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 

BIOL335: RNA bioinformatics

  • 1. RNA bioinformatics Paul Gardner April 2, 2015 Paul Gardner RNA bioinformatics
  • 2. Main questions: How can we predict RNA structure?
  • 3. Why do we care about RNA? RNA is important for translation and gene regulation. 2/3 of the ribosome is RNA. Ribosomal function is preserved even after amino-acid residues are deleted from the active site! Current estimates indicate that the number of ncRNA genes is comparable to the number of protein-coding genes. [Figure: from mDNA, uDNA, rDNA and tDNA through pre-mRNA and mRNA to nascent and localised protein (transcription, splicing, translation, transport), with the RNAs and RNPs involved: spliceosome, ribosome, tRNA, RNase P, RNase MRP, snoRNP, SRP, tmRNA, RISC (miRNA).]
  • 4. RNA: why is this stuff interesting? The RNA world was an essential step to modern protein-DNA based life (using current reasonable models). Which came first, DNA or protein? RNA has catalytic potential (like protein) and carries hereditary information (like DNA). Image by James W. Brown, www.mbio.ncsu.edu/JWB/soup.html
  • 5. RNA interference. Image lifted from: http://en.wikipedia.org/wiki/RNA_interference
  • 6. RNA: structure. [Figure: a tRNA shown as (A) primary structure, (B) secondary structure and (C) tertiary structure, with the acceptor stem, D loop, anticodon loop, variable loop and TΨC loop labelled.] Primary structure: 5’ GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3’
  • 7. RNA: base-pairing. Canonical (Watson-Crick) base-pairs: C · G, A · U. Non-canonical (wobble) base-pair: G · U. Note: other non-canonical base-pairs do occur, but these are “rare” and generally re-defined as “tertiary” interactions. Central dogma of structural biology: structure is important for function. Images lifted from: http://en.wikipedia.org/wiki/Base_pair
  • 8. RNA: base-pairing. Images lifted from: http://eternawiki.org/wiki/index.php5/Base Pair
  • 9. RNA: base-pairing. Occurrence of each base-pair type in rRNA, by base-pair conformation (bpC): Watson-Crick (WC), wobble (Wb) or other.
      bpC     C:G     U:A     U:G     G:A     C:A    U:C    A:A    C:C    G:G    U:U    Total
      WC      49.8%   14.4%   0.01%   1.2%    0.1%   0.5%   -      -      -      -      66.1%
      Wb      0.06%   0.06%   7.1%    -       0.2%   -      0.3%   0.5%   0.2%   0.9%   9.6%
      Other   0.8%    5.8%    1.5%    9.4%    2.3%   0.6%   2.6%   0.5%   0.7%   0.3%   24.3%
      Total   50.7%   20.3%   8.7%    10.6%   2.6%   1.0%   2.9%   1.0%   0.9%   1.3%   100.0%
    Just 71.3% of rRNA contacts are canonical or G:U wobble! Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs, J Mol Biol.
  • 10. RNA stacking. Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome, Nature. Image lifted from: http://rna.ucsc.edu/pdbrestraints/index.html
  • 11. RNA: number of structures. $A_N$ is the number of possible sequences of length $N$: $A_N \sim 4^N$. $S_N$ is the number of possible secondary structures of length $N$: $S_0 = S_1 = 1$, $S_{N+1} = S_N + \sum_{j=1}^{N} S_{j-1} S_{N-j+1}$, and $S_N \sim 1.8^N$. Hofacker et al. (1998) Combinatorics of RNA Secondary Structures, Discrete Applied Mathematics.
  • 12. How can we make a secondary structure prediction algorithm? Maximize the number of base-pairs in an RNA sequence? Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
  • 13. Structure prediction: Nussinov. Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math. Image from: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology.
  • 14. Structure prediction: Nussinov. Maximize the number of base-pairs in an RNA sequence $\mathrm{Seq} = s_1 s_2 \cdots s_n$. $N_{i,j} = 0 \;\; \forall \; j - i < 3$; otherwise $N_{i,j} = \max \begin{cases} N_{i+1,j-1} + \rho(i,j), & i, j \text{ pair} \\ N_{i+1,j}, & i \text{ unpaired} \\ N_{i,j-1}, & j \text{ unpaired} \\ \max_{i<k<j} [N_{i,k} + N_{k+1,j}], & \text{bifurcation} \end{cases}$ where $\rho(i,j) = 1$ if $s_i$ and $s_j$ are complementary, otherwise $\rho(i,j) = 0$. The result is $N_{1,n} = BP_{\max}$. Cost: $O(n^3)$ in CPU, $O(n^2)$ in memory. Nussinov et al. (1978) Algorithms for loop matching, SIAM J. Appl. Math.
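The recursion on slide 14 can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name `nussinov_max_pairs`, the 0-based indexing and the choice to count G-U wobble pairs as "complementary" are assumptions made here. It returns only the score $BP_{\max}$ and does not trace back a structure.

```python
# Pairs treated as complementary: Watson-Crick plus G-U wobble.
# (Assumption: the slide only says "complementary".)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_sep=3):
    """Return the maximum number of base-pairs (N[1][n] in the slide's notation)."""
    n = len(seq)
    if n == 0:
        return 0
    # N[i][j] stays 0 whenever j - i < min_sep, as on the slide.
    N = [[0] * n for _ in range(n)]
    for span in range(min_sep, n):          # fill by increasing subsequence length
        for i in range(n - span):
            j = i + span
            rho = 1 if (seq[i], seq[j]) in PAIRS else 0
            best = max(
                N[i + 1][j - 1] + rho,      # i and j pair
                N[i + 1][j],                # i unpaired
                N[i][j - 1],                # j unpaired
            )
            for k in range(i + 1, j):       # bifurcation into two substructures
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]


print(nussinov_max_pairs("GGGAAACCC"))  # prints 3
```

The three nested loops (span, i, k) give the $O(n^3)$ running time and the table `N` the $O(n^2)$ memory quoted on the slide.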
  • 15. Structure prediction: Nussinov. There are a few problems with this approach. The solution to Nussinov is frequently not unique: for example, the 77-nucleotide tRNA^His has 22 base-pairs in the phylogenetic structure, yet there are 149,126 structures with the maximal number of 26 base-pairs! The method also ignores stacking interactions. Fontana (2002) Modelling ‘evo-devo’ with RNA. BioEssays.
  • 16. Structure prediction: Zuker. Nearest neighbour model: a modified Nussinov algorithm to find minimal free energy (most stable) structures. [Figure: a hairpin with loop L closed by three base-pair stacks S1, S2 and S3.] Free energy $= L + S_1 + S_2 + S_3 = 5.00 - 2.11 - 2.35 - 2.24 = -1.70$ kcal/mol. $\Delta G_{\mathrm{stack}} = \Delta H_{37,\mathrm{stack}} - T \Delta S_{37,\mathrm{stack}}$ and $\Delta G_{\mathrm{loop}} = -T \Delta S_{37,\mathrm{loop}}$. Tinoco et al. (1971) Estimation of secondary structure in RNA. Nature.
  • 17. Structure prediction: Zuker. Energies ($\Delta G$ in kcal/mol) of stacked base-pairs 5’-WX-3’ / 3’-YZ-5’:
      WX/YZ   CG      GC      AU      UA      GU      UG
      CG      -3.26   -2.36   -2.11   -2.08   -1.41   -2.11
      GC      -3.42   -3.26   -2.35   -2.24   -1.53   -2.51
      AU      -2.24   -2.08   -0.93   -1.10   -0.55   -1.36
      UA      -2.35   -2.11   -1.33   -0.93   -1.00   -1.27
      GU      -2.51   -2.11   -1.27   -1.36   +0.47   +1.29
      UG      -1.53   -1.41   -1.00   -0.55   +0.30   +0.47
    Note that the $\Delta G$ of a 5’-WX-3’ / 3’-YZ-5’ stack is the same as that of a 5’-ZY-3’ / 3’-XW-5’ stack. Mathews et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. JMB.
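The free-energy arithmetic from slide 16 can be reproduced from the stacking table with a short Python sketch. The names `STACK` and `helix_energy` are invented for this illustration, and the table is indexed simply by row label, then column label. The helix used below is hypothetical: it is one assignment of base-pairs whose three stacks match the slide's values (-2.11, -2.35, -2.24), since the sequence of the hairpin in the original figure is not recoverable. The loop term of 5.00 kcal/mol is taken from the slide.

```python
LABELS = ["CG", "GC", "AU", "UA", "GU", "UG"]
ROWS = [
    [-3.26, -2.36, -2.11, -2.08, -1.41, -2.11],  # CG
    [-3.42, -3.26, -2.35, -2.24, -1.53, -2.51],  # GC
    [-2.24, -2.08, -0.93, -1.10, -0.55, -1.36],  # AU
    [-2.35, -2.11, -1.33, -0.93, -1.00, -1.27],  # UA
    [-2.51, -2.11, -1.27, -1.36, +0.47, +1.29],  # GU
    [-1.53, -1.41, -1.00, -0.55, +0.30, +0.47],  # UG
]
# STACK[row][column], in kcal/mol, exactly as laid out in the table.
STACK = {row: dict(zip(LABELS, values)) for row, values in zip(LABELS, ROWS)}


def helix_energy(pairs, loop_energy):
    """Sum the stacking energy of each adjacent pair of base-pairs, plus a loop term."""
    stacks = sum(STACK[a][b] for a, b in zip(pairs, pairs[1:]))
    return loop_energy + stacks


# Hypothetical helix of four base-pairs, giving three stacks.
energy = helix_energy(["UA", "GC", "AU", "CG"], loop_energy=5.00)
print(f"{energy:.2f} kcal/mol")  # prints -1.70 kcal/mol
```

Only stacks and a single loop penalty are modelled here; a full nearest-neighbour model also scores bulges, interior loops and multiloops.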
  • 18. Suboptimal structures. “There is an embarrassing abundance of structures having a free energy near that of the optimum.” (McCaskill 1990) [Figure: $\Delta G$ (kcal/mol) plotted against the base-pair distance $d_{BP}(S_i, S_{mfe})$ of each structure from the MFE structure, with the biological, a suboptimal and the MFE structure drawn.] Wuchty et al. (1999) Complete suboptimal folding of RNA and the stability of secondary structures, Biopolymers.
  • 19. Accuracy of MFE predictions. Non-independent benchmarks: Walter et al. (1994), mean sensitivity 63.6%; Mathews et al. (1999), mean sensitivity 72.9%. Independent benchmarks: Doshi et al. (2004), mean sensitivity 41%; Dowell & Eddy (2004), mean sensitivity 56%, mean PPV 48%; Gardner & Giegerich (2004), mean sensitivity 56%, mean PPV 46%. Data-sets: tRNA, SSU rRNA, LSU rRNA, SRP, RNase P, tmRNA.
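For readers unfamiliar with the two scores on slide 19, here is a hedged Python sketch of how sensitivity and positive predictive value (PPV) are commonly computed for base-pair predictions. It uses the usual definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), with exact matching of base-pairs. The cited benchmarks may use different matching rules, and the function name and example pairs below are made up for illustration.

```python
def sensitivity_and_ppv(predicted, reference):
    """Compare two sets of base-pairs, each pair given as a tuple (i, j)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Sensitivity: fraction of reference pairs that were predicted.
    sensitivity = true_positives / len(reference) if reference else 0.0
    # PPV: fraction of predicted pairs that are in the reference.
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Made-up example: 4 reference pairs, 3 predicted pairs, 2 in common.
reference = {(1, 20), (2, 19), (3, 18), (6, 15)}
predicted = {(1, 20), (2, 19), (4, 17)}
sens, ppv = sensitivity_and_ppv(predicted, reference)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")  # sensitivity=0.50 PPV=0.67
```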
  • 20. Limitations of MFE predictions. Energy parameters: estimated at constant salt concentrations and temperatures. Energy model: models of loop energies are extrapolated from relatively few experiments, no pseudoknots, ... Cellular environment: contains proteins, RNAs, DNAs, sugars, etc. Post-transcriptional modifications: many functional RNAs have been covalently modified. Folding kinetics: RNAs fold along “pathways”, perhaps becoming trapped in sub-optimal conformations. Co-transcriptional folding: RNAs fold during transcription, while the transcriptional apparatus occludes 3’ portions of the sequence. Transcription is jerky: transcriptional pausing can influence folding.
  • 21. Comparative sequence analysis. Input: a set of sequences with the same biological function which are assumed to have approximately the same structure. Output: the common structural elements, aligned sequences and a phylogeny which best explains the observed data.
    Input sequences:
      >1 GCAUCCAUGGCUGAAUGGUUAAAGCGCCCAACUCAUAAUUGGCGAACUCGCGGGUUCAAUUCCUGCUGGAUGCA
      >2 GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCCUGCCACGCGGGAGGCCCGGGUUCGAUUCCCGGCCAAUGCA
      >3 UGGGCUAUGGUGUAAUUGGCAGCACGACUGAUUCUGGUUCAGUUAGUCUAGGUUCGAGUCCUGGUAGCCCAG
      >4 GAAGAUCGUCGUCUCCGGUGAGGCGGCUGGACUUCAAAUCCAGUUGGGGCCGCCAGCGGUCCCGGGCAGGUUCGACUCCUGUGAUCUUCCG
      >5 CUAAAUAUAUUUCAAUGGUUAGCAAAAUACGCUUGUGGUGCGUUAAAUCUAAGUUCGAUUCUUAGUAUUUACC
    Alignment (first block):
      1 GCAUCCAUGGCUGAAU-GGUU-AAAGCGCCCAACUCAUAAUUGGCGAA--
      2 GCAUUGGUGGUUCAGU-GGU--AGAAUUCUCGCCUGCCACGCGG-GAG--
      3 UGGGCUAUGGUGUAAUUGGC--AGCACGACUGAUUCUGGUUCAG-UUA--
      4 GAAGAUCGUCGUCUCC-GGUG-AGGCGGCUGGACUUCAAAUCCA-GU-UG
      5 CUAAAUAUAUUUCAAU-GGUUAGCAAAAUACGCUUGUGGUGCGU-UAA--
    Alignment (second block):
      1 ------------------CUCGCGGGUUCAAUUCCUGCUGGAUGC-A
      2 ------------------G-CCCGGGUUCGAUUCCCGGCCAAUGC-A
      3 ------------------G-UCUAGGUUCGAGUCCUGGUAGCCCA-G
      4 GGGCCGCCAGCGGUCCCG--GGCAGGUUCGACUCCUGUGAUCUUCCG
      5 ------------------A-UCUAAGUUCGAUUCUUAGUAUUUAC-C
    [Figure: a phylogeny of sequences 1 to 5, asterisks marking conserved alignment columns, and the consensus secondary structure drawn with IUPAC ambiguity codes.]
  • 22. Comparative sequence analysis. Evolution of RNA sequences: base-pairs that covary have strong evolutionary support. [Figure, panels a to c: the sequences below on a tree with fast and slow branches, and the shared structure drawn with the ancestral, consensus and MIS sequences.]
      Structure:  (((..(((....)))..)))
      Sequences:  UACAAGAGUGCGCUUAAGUA
                  UGCAAAAGUCCGUUUAAGCA
                  UAUAACCUUUCGAGGAAAUA
                  CAUAAUAAUGCGUUGAAGUG
      Ancestral:  UACAAGAGUGCGUUUAAGUA
      Consensus:  YRYAASMGUSCGYKKAAGYR
      MIS:        YAUAANADUGCGUUGAAGUR
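To make "base-pairs that covary" concrete, the sketch below reads the dot-bracket structure from slide 22 and lists, for each paired position, which base-pair types occur across the four aligned sequences. The helper names are invented for this illustration, indices are 0-based, and it assumes the four sequences listed after the structure are the aligned ones. A paired position where several different, still-pairable combinations occur is one with covariation support.

```python
STRUCTURE = "(((..(((....)))..)))"
ALIGNMENT = [
    "UACAAGAGUGCGCUUAAGUA",
    "UGCAAAAGUCCGUUUAAGCA",
    "UAUAACCUUUCGAGGAAAUA",
    "CAUAAUAAUGCGUUGAAGUG",
]
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pairs_from_dot_bracket(structure):
    """Return sorted (i, j) index pairs for matching brackets."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def pair_types(alignment, i, j):
    """Set of two-letter base combinations seen at columns i and j."""
    return {seq[i] + seq[j] for seq in alignment}


for i, j in pairs_from_dot_bracket(STRUCTURE):
    types = pair_types(ALIGNMENT, i, j)
    # Report the observed pair types and whether all of them can base-pair.
    print(i, j, sorted(types), types <= CAN_PAIR)
```

In this example every paired column holds only pairable combinations, even though the individual bases differ between sequences.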
  • 23. Alignment Folding: RNAalifold. Generate an alignment (e.g. with ClustalW), then find a consensus structure that is both energetically stable in all sequences and has covariation support. [Figure: structure drawings for the sequence GCGGAAUUAGCUCAGUU_GGGAGAGCGCCAGACUGAAAAUCUGGAGGUCCCC_GGUUCGAAUCCCGGAAUCCGCA and the consensus structure of the alignment, labelled with IUPAC ambiguity codes.] Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
  • 24. Alignment Folding: RNAalifold. RNAalifold: energy + covariation. $\beta_{i,j} = \frac{1}{N} \sum_{\alpha}^{N} Z^{\alpha}_{i,j} - \mathrm{Cov}$, $C_{i,j} = \frac{2}{N(N-1)} \sum_{b^{\alpha}_i b^{\alpha}_j,\, b^{\beta}_i b^{\beta}_j} D_H(b^{\alpha}_i b^{\alpha}_j, b^{\beta}_i b^{\beta}_j) \, \Pi^{\alpha}_{ij} \Pi^{\beta}_{ij}$. Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
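The covariation term $C_{i,j}$ on slide 24 can be sketched in Python under a few stated assumptions: the sum runs over unordered pairs of sequences $\alpha < \beta$ (which is what the factor $2/(N(N-1))$ normalises), $D_H$ is the Hamming distance between the two-letter base combinations, and $\Pi^{\alpha}_{ij}$ is 1 when sequence $\alpha$ can form a Watson-Crick or G-U pair at columns $i, j$ and 0 otherwise. The function names are invented here, and this is not the RNAalifold source code.

```python
from itertools import combinations

CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}


def hamming(p, q):
    """Number of positions (0, 1 or 2) at which two base combinations differ."""
    return sum(a != b for a, b in zip(p, q))


def covariation_score(column_pairs):
    """C_ij for one pair of alignment columns.

    column_pairs holds one two-letter string per sequence, e.g. "GC".
    """
    n = len(column_pairs)
    if n < 2:
        return 0.0
    total = 0
    for p, q in combinations(column_pairs, 2):   # all alpha < beta
        if p in CAN_PAIR and q in CAN_PAIR:      # Pi^alpha * Pi^beta == 1
            total += hamming(p, q)
    return 2.0 * total / (n * (n - 1))


print(covariation_score(["UA", "UA", "UA", "CG"]))  # prints 1.0
print(covariation_score(["GC", "GC", "GC"]))        # prints 0.0
```

A fully conserved pair scores 0, while pairs that change both bases and remain pairable push the score towards its maximum of 2.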
  • 25. Covariation metrics. Lindgreen, Gardner & Krogh (2006) Measuring covariation in RNA alignments: physical realism improves information measures. Bioinformatics.
  • 26. Rfam: annotation hierarchy. Levels: Types, Clans, Families, Sequences. [Figure: tree of Rfam RNA types. Labels shown: Gene, Cis-reg., Intron, tRNA, rRNA, miRNA, sRNA, snRNA, snoRNA, CD-box_snoRNA, HACA-box_snoRNA, scaRNA, splicing, ribozyme, antisense, CRISPR, riboswitch, thermoregulator, leader, IRES, frameshift_element.]
  • 27. Building an Rfam family. Panel captions: “A structure from literature”; “An Rfam family: produced manually from publication figures”.
  • 28. An example Rfam entry.
  • 29. Relevant reading. Reviews: Eddy SR (2004) How do RNA folding algorithms work? Nature Biotechnology. Methods: Hofacker et al. (2002) Secondary Structure Prediction for Aligned RNA Sequences, J.Mol.Biol.
  • 30. The End