AA.AG@Helix.Ends: A:A and A:G Base-pairs at the Ends of 16 S and 23 S rRNA Helices

Tricia Elgavish1, Jamie J. Cannone2, Jung C. Lee3, Stephen C. Harvey1 and Robin R. Gutell2*

1 Department of Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
2 Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1095, USA
3 Division of Medicinal Chemistry, College of Pharmacy, University of Texas at Austin, Austin, TX 78712, USA

This study reveals that AA and AG oppositions occur frequently at the ends of helices in RNA crystal and NMR structures in the PDB database and in the 16 S and 23 S rRNA comparative structure models, with the G usually 3′ to the helix for the AG oppositions. In addition, these oppositions are frequently base-paired and usually in the sheared conformation, although other conformations are present in NMR and crystal structures. These A:A and A:G base-pairs are present in a variety of structural environments, including GNRA tetraloops, E and E-like loops, interfaces between two helices that are coaxially stacked, tandem G:A base-pairs, U-turns, and adenosine platforms. Finally, given structural studies that reveal conformational rearrangements occurring in regions of the RNA with AA and AG oppositions at the ends of helices, we suggest that these conformationally unique helix extensions might be associated with functionally important structural rearrangements.
© 2001 Academic Press
Keywords: ribosomal RNA structure; comparative sequence analysis; A:A and A:G base-pairs (non-canonical pairs); structure motifs; computational biology/bioinformatics (coaxial stacking)
*Corresponding author
Introduction
Our ultimate goal is to accurately predict RNA secondary and tertiary structure from its sequence. To begin to achieve this objective, we need a detailed set of RNA structure rules and principles that relate sequences to small structural elements as well as to global structure. Given that the number of possible secondary structures for an RNA sequence is very large (http://www.rna.icmb.utexas.edu/METHODS/) and the current set of RNA structure principles within the best of the RNA folding algorithms1,2 is not adequate to achieve these goals,3,4 we have utilized comparative sequence analysis5,6 to identify those base-pairs that would form similar structures for a set of sequences considered to be structurally and functionally equivalent. Traditionally, we have searched for positions in a sequence alignment with similar patterns of variation (also called covariation). Due to the strong congruence between these covariation-based comparative structure models and crystal structure solutions7 (Gutell et al., unpublished results), we are very confident in the authenticity of these proposed base-pairs. While the majority of the positions that covary with one another are associated with secondary structure base-pairs, there are a few short- and long-range tertiary interactions in the rRNAs8 (CRW Site; see Materials and Methods). We now aspire to predict additional base-pairings at the positions that are not base-paired in the covariation-based structure models. These base-pairs would add more secondary structure to the current comparative structure models and fold this model into a three-dimensional structure.
E-mail address of the corresponding author: robin.gutell@mail.utexas.edu
Abbreviations used: PDB, Protein Data Bank.
doi:10.1006/jmbi.2001.4807, available online at http://www.idealibrary.com. J. Mol. Biol. (2001) 310, 735–753
0022-2836/01/040735–19 $35.00/0 © 2001 Academic Press

Both of these latter aspirations will require a different type of comparative sequence analysis that goes beyond simple covariation analysis. Operationally, we define comparative sequence analysis as the general method that identifies structures that are common to different sequences, while covariation analysis is the method that identifies positions in a sequence alignment with similar patterns of variation. Covariation analysis will identify a subset of the total number of base-pairs that are in common to different sequences. While this latter type of analysis identifies structurally isomorphic base-pairs (e.g. A:U, G:C, C:G, and U:A) from the identification of positions with similar patterns of variation in a sequence alignment, it is possible to form isomorphic base-pair conformations from two positions that have different patterns of variation. To identify these, we need to know, a priori, the base-pair exchanges (e.g. G:U to G:C or A:G to A:A) that will form isomorphic base-pair conformations within a specific structural context. A few years ago, we developed a computer program that would return the isomorphic base-pair conformations that are possible for any known set of pairing types.9 However, this system by itself will not help us to identify new base-pairs at positions with no matching pattern of variation since, without additional information, we do not know which positions to base-pair. Ultimately, we need to have a larger set of structural constraints that will help us decipher the unique patterns of variation into isomorphic structures.
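
To make the distinction between covariation and the broader comparative approach concrete, the following is a minimal sketch that scores two alignment columns by mutual information. Mutual information is one common covariation measure and is used here only as an assumption; the text does not state which measure the authors used. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two equal-length alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

# Hypothetical columns: a Watson-Crick pair that covaries across sequences.
covarying = mutual_information("GGCCAAUU", "CCGGUUAA")

# Hypothetical columns: an A:G to A:A exchange, where one position is invariant.
aa_ag_exchange = mutual_information("AAAA", "GAGA")
```

In this toy case the Watson-Crick columns score high, while the A:G to A:A exchange scores zero because one column never varies. This is the situation described above: such positions cannot be found from matching patterns of variation alone.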
Beyond the canonical base-pairs (A:U, G:C, G:U) that are arranged into the standard secondary structure helices and tertiary interactions, several other RNA structural motifs have been identified with a sequence analysis perspective.5,6,8,10 These include tetraloops,11 lone-pair tri-loops,8 pseudoknots,6,12,13 dominant G:U base-pairs,14 tandem G:A base-pairs,15 E-loops,15–17 U-turns,18 base triples,19,20 tetraloop receptors,21–23 adenosine platforms,24,25 and base-pairs arranged in parallel.6 A structural perspective of these RNA motifs is presented in two recent reviews.26,27
In addition to the comparative sequence analysis of these RNA motifs, it was first observed in the early 1980s that helices in Escherichia coli 16 S rRNA were frequently flanked by AG oppositions.28,29 Consistent with this observation, it was observed that the majority of the 3′ ends of loops are an adenosine while the 5′ ends of loops are an adenosine or guanosine in the covariation-based 16 S and 23 S rRNA structure models.25
An AG opposition (where an opposition refers to two bases on opposite strands at the end of a helix that are in proximity with one another) at positions 1056:1103 (E. coli numbering) is base-paired in the crystal structure for the L11 binding fragment of 23 S rRNA.30 Position 1056 is a G in the majority of the Bacteria, Archaea, and chloroplasts, while it is an A in the majority of the Eucarya. Position 1103 is an A in nearly all of the Bacteria, Archaea, Eucarya, and chloroplasts. Thus, from a comparative perspective, we expect the majority of the Eucarya with an A at position 1056 to form an A1056:A1103 base-pair. The experimental support for this A:G base-pair, in addition to the earlier AG sightings at the ends of E. coli 16 S rRNA helices and the bias for unpaired As and Gs at the ends of helices, suggested that many helices in the rRNAs might be flanked with A:G and A:A base-pairs. During the preparation of this manuscript, high-resolution crystal structures were determined for the 30 S and 50 S ribosomal subunits.31–33 Our objectives for this paper are: (1) to identify the conserved AA and AG oppositions at the helix ends in the comparative structure models for 16 S and 23 S rRNA, (2) to determine if AA and AG oppositions are base-paired in all RNA crystal and NMR structures that contain an AA or AG at the end of a standard helix, and (3) to determine the conformations for these A:A or A:G base-pairs.
Results
Comparative sequence analysis of the ends of
rRNA helices
The nucleotide frequencies at the positions flanking the ends of all helices in our 16 S and 23 S rRNA alignments (see Materials and Methods and CRW Site) were determined for the nuclear encoded rRNAs from the three major phylogenetic groups (Bacteria, Archaea, and Eucarya) and the two Eucarya organelles (chloroplasts and mitochondria). Only helix ends in the Bacteria with an AA, AG, or AA/AG in more than 90 % of the sequences were scored as candidates. Since approximately 90 % of the AG oppositions have the G 3′ of the helix, we have focused on this orientation in this manuscript and in Table 1. However, a small number (eight in rRNA and 14 in the PDB structure database) of examples of AG oppositions where the G is 5′ to the helix are discussed below.
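
The frequency tally described above can be sketched as follows. This is an illustrative sketch only: the function name, the toy alignment, and the column indices are hypothetical and are not taken from the CRW alignments, and the pair is simply written as the nucleotide in column i followed by the nucleotide in column j.

```python
from collections import Counter

def opposition_frequencies(alignment, i, j):
    """Fraction of sequences carrying each nucleotide pair at columns i and j."""
    pairs = Counter(seq[i] + seq[j] for seq in alignment)
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}

# Hypothetical toy alignment (not real rRNA data); columns 0 and 5 are
# assumed to be the two positions flanking the end of a helix.
toy_alignment = ["AGCGCG", "AGCGCG", "AGUACA", "GGCGCC"]
freqs = opposition_frequencies(toy_alignment, 0, 5)

# Combined AA + AG fraction, the quantity compared against the 90 % cutoff.
aa_ag_fraction = freqs.get("AA", 0.0) + freqs.get("AG", 0.0)
```

With real alignments, the same tally would be repeated for each helix end and each alignment (Bacteria, Archaea, Eucarya nuclear, chloroplast, mitochondrial).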
All oppositions were subdivided into two categories: invariant and exchange. Invariant sites contain only AA or AG in the Bacterial alignment, while sites with both types of pairings (where the minimum for each pairing is 2 %) in at least one of the primary alignments (Archaea, Bacteria, Eucarya nuclear, chloroplast, or mitochondrial) were classified as exchanges. These oppositions are mapped onto the December 1999 version of the E. coli 16 S and 23 S rRNA covariation-based structure models (Figure 1; CRW Site). The base-pair frequencies for each of the AA and AG sites for each of the 16 S and 23 S alignments (Archaea, Bacteria, Eucarya nuclear, chloroplast, and mitochondrial) are all available at our web site, CRW AA.AG (see Materials and Methods).
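
One possible reading of these classification rules is sketched below. The 90 % candidate cutoff and the 2 % minimum come from the text; the function name, the dictionary layout, and the example frequencies are hypothetical, and the handling of borderline cases is an assumption rather than the authors' procedure.

```python
def classify_site(freqs, candidate_cutoff=0.90, min_each=0.02):
    """Classify a helix-end opposition from per-alignment pair frequencies.

    freqs maps an alignment name to a dict of pair -> fraction of sequences.
    """
    bacteria = freqs["Bacteria"]
    aa = bacteria.get("AA", 0.0)
    ag = bacteria.get("AG", 0.0)
    # Candidate: AA, AG, or AA/AG in more than 90 % of the Bacterial sequences.
    if aa + ag <= candidate_cutoff:
        return "not a candidate"
    # Exchange: both pairings reach the 2 % minimum in at least one alignment.
    for pairs in freqs.values():
        if pairs.get("AA", 0.0) >= min_each and pairs.get("AG", 0.0) >= min_each:
            return "exchange"
    return "invariant AA" if aa > ag else "invariant AG"

# Hypothetical frequencies, for illustration only.
site_a = {"Bacteria": {"AG": 0.99, "AA": 0.005},
          "Eucarya": {"AA": 0.90, "AG": 0.08}}
site_b = {"Bacteria": {"AG": 0.98, "GC": 0.02},
          "Archaea": {"AG": 0.97}}
site_c = {"Bacteria": {"AG": 0.60, "GC": 0.40}}
```

Here `site_a` would be scored as an exchange because both pairings pass the 2 % minimum in the hypothetical Eucarya alignment, `site_b` as an invariant AG, and `site_c` is not a candidate.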
There are 139 oppositions (as defined above) in the 16 S and 263 oppositions in the 23 S rRNA comparative structure models. In the hypothetical world where the frequency of each of the four nucleotides is 25 % at paired and unpaired positions and there is no bias for any nucleotide pairs at these positions, for each opposition, we expect a 12.5 % (2/16) chance of finding an AA or AG. Thus, for any one rRNA sequence, we expect, based upon this random sampling, there to be approximately 17 (139 × 0.125) sites in 16 S and 33 (263 × 0.125) sites in 23 S rRNA with an AA or AG opposition at the end of a helix (referred to hereunder as AA.AG@helix.ends). The expected number of AA and AG sites that occur at the same positions in 90 % of 5850 Bacterial 16 S sequences is 1.7 × 10^-4755, and for 325 Bacterial 23 S rRNA sequences the number is 7.0 × 10^-265. Thus, we conclude that the odds of finding the same pattern in 90 % of the sequence sets by random chance are extremely low; however, 30 % of the oppositions at the ends of 16 S rRNA helices (42 of 139) and 28 % of the oppositions at the ends of 23 S rRNA helices (73 of 263) have an AA or AG opposition in at least 90 % of the sequences.
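
The arithmetic in this paragraph can be checked with a short sketch. The expected counts follow directly from the stated 2/16 probability. The two very small numbers are consistent with raising 0.125 to the power 0.9 × N, where N is the number of sequences; that this is how the authors obtained them is an assumption. The computation is done in log space because the values are far below the smallest representable floating-point number. Function names are hypothetical.

```python
from math import floor, log10

P_AA_OR_AG = 2 / 16  # AA or AG among 16 equally likely ordered nucleotide pairs

def expected_sites(n_oppositions, p=P_AA_OR_AG):
    """Expected number of AA or AG oppositions in one sequence."""
    return n_oppositions * p

def sci_power(p, exponent):
    """Return (mantissa, power of ten) for p**exponent, computed via log10."""
    log_value = exponent * log10(p)
    power_of_ten = floor(log_value)
    mantissa = 10 ** (log_value - power_of_ten)
    return mantissa, power_of_ten

sites_16s = expected_sites(139)   # 17.375, quoted as approximately 17
sites_23s = expected_sites(263)   # 32.875, quoted as approximately 33

# Assumed form of the quoted odds: 0.125 ** (0.9 * number_of_sequences).
mantissa_16s, power_16s = sci_power(P_AA_OR_AG, 0.9 * 5850)
mantissa_23s, power_23s = sci_power(P_AA_OR_AG, 0.9 * 325)
```

Under that assumption the sketch gives about 1.7 × 10^-4755 for 5850 sequences and about 7.0 × 10^-265 for 325 sequences, matching the figures quoted above.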
Since the 1056:1103 base-pair in 23 S rRNA has a significant number of AA and AG oppositions with a minimal number of alternative base-pairs, we have flagged this base-pair, along with other similar positions that also have a more significant extent of A:A and A:G pairings. These sites are shown in Figure 1 with red and green asterisks on the 16 S and 23 S rRNA secondary structure diagrams and within the AA/AG base-pair frequency tables (CRW AA.AG Online Table 4). The red asterisk sites contain only AA and AG in all of the Archaea, Bacteria, Eucarya nuclear and chloroplast alignments, with a minimum number of exceptions. The 23 S rRNA 1056:1103 site contains significant amounts of AA/AG pairings in nearly all of the non-mitochondrial sequences; only a few sequences out of 582 do not have an AA or AG. The other red asterisk sites in 23 S rRNA are 627:636 and 2126:2162; sites with comparable nucleotide frequencies in 16 S rRNA are 780:802, 888:909, 959:976, 1408:1493, 1417:1483, and 1418:1482.
The green asterisks (Figure 1; CRW AA.AG) reveal those sites with significant amounts of AA/AG exchanges with a minimal amount of other oppositions in at least one alignment while at least one other alignment contains a larger number of exceptions to the pure AA/AG exchange pattern. Green sites in the 16 S rRNA are: 26:557, 60:107, 197:220, 447:487 (with a large percentage of Watson-Crick/G:U base-pairs in the Archaea), 691:696, 860:869, 1157:1179, and 1304:1333. Green asterisk sites in 23 S rRNA are 244:254, 463:466, 602:655, 603:625, 637:651 (with a large percentage of Watson-Crick base-pairs in the Archaea), 861:916, 945:972, 975:988, 1000:1155, 1354:1377, 1655:2005, 1791:1828, 2125:2173, 2199:2224, 2287:2345, 2346:2371, 2358:2429, 2587:2607, and 2639:2775.
Orientation of the AG oppositions
There are two orientations possible for AG oppositions relative to the helix to which they are adjacent: the G can be 5′ or 3′ to the adjacent helix. The analysis of an early version of the E. coli 16 S rRNA comparative structure model revealed that the G tends to be at the 3′ end of the helix.28 Our analysis here of the most recent versions of a large number of phylogenetically diverse 16 S and 23 S rRNA comparative structure models is consistent with this earlier result. Of the invariant AG and AA/AG oppositions that flank a helix, approximately 87 are oriented with the G 3′ to the helix, while eight AG oppositions have the G 5′ to the helix. This result, as discussed later, is consistent with the types and frequencies of A:A and A:G base-pair conformations present in the crystal structures.

Table 1. Distribution of AA/AG oppositions (with G 3′ to helix for AG oppositions) in the bacterial 16 S and 23 S rRNA comparative structure models

Loop type      Hairpin                    Internal                   Multi-stem
Opposition     C[+,ø,-](a)   [S,I,O](b)   C[+,ø,-](a)   [S,I,O](b)   C[+,ø,-](a)   [S,I,O](b)   Co(c)   Cr(d) (%)

16 S rRNA
Invariant      7[7,0,0]      [7,0,0]      9[6,0,3]      [3,2,1]      5[4,0,1]      [0,2,2]      21      17 (81%)
  AA           0[0,0,0]      [0,0,0]      5[2,0,3]      [0,1,1]      1[1,0,0]      [0,1,0]      6       3 (50%)
  AG           7[7,0,0]      [7,0,0]      4[4,0,0]      [3,1,0]      4[3,0,1]      [0,1,2]      15      14 (93%)
Exchange       2[2,0,0]      [2,0,0]      10[9,1,0]     [7,1,1]      9[4,0,5]      [2,0,2]      20      15 (75%)
Total          9[9,0,0]      [9,0,0]      19[15,1,3]    [10,3,2]     14[8,0,6]     [2,2,4]      41      32
% xtal.str.(e) 9/9=100%                   15/18=83%                  8/14=57%                   32/41=78%

23 S rRNA
Invariant      11[9,2,0]     [9,0,0]      13[10,2,1]    [9,0,1]      13[6,1,6]     [5,1,0]      32      25 (78%)
  AA           0[0,0,0]      [0,0,0]      4[1,2,1]      [0,0,1]      4[0,0,4]      [0,0,0]      6       1 (17%)
  AG           11[9,2,0]     [9,0,0]      9[9,0,0]      [9,0,0]      9[6,1,2]      [5,1,0]      26      24 (92%)
Exchange       4[2,1,1]      [2,0,0]      12[8,3,1]     [8,0,0]      20[9,6,5]     [7,0,2]      26      19 (74%)
Total          15[11,3,1]    [11,0,0]     25[18,5,2]    [17,0,1]     33[15,7,11]   [12,1,2]     58      44
% xtal.str.(e) 11/12=92%                  18/20=90%                  15/26=58%                  44/58=76%

rRNA Total     24[20,3,1]    [20,0,0]     44[33,6,5]    [27,3,3]     47[23,7,17]   [14,3,6]     99      76
% xtal.str.(e) 20/21=95%                  33/38=87%                  23/40=58%                  76/99=77%
               S: 20/20 (100%)            S: 27/33 (82%)             S: 14/23 (61%)             S: 61/76 (80%)
                                          I: 3/33 (9%)               I: 3/23 (13%)              I: 6/76 (8%)
                                          O: 3/33 (9%)               O: 6/23 (26%)              O: 9/76 (12%)

(a) C, number of predicted base-pairings based on the bacterial structure; +, number of predicted pairings in the crystal structure; ø, number of predicted pairings for which there is no homologous structure in the crystal structures (see the text for details); -, number of predicted pairings that are not present in the crystal structure.
(b) Conformation of the base-pair: S, sheared; I, imino or imino-like; O, other.
(c) Co, the total number of homologous base-pairs from that category in the comparative structure model.
(d) Cr, the total number and percentage of base-pairs in the crystal structure.
(e) The percentage of base-pairs predicted with comparative analysis that are present in the crystal structure ["+"/("C" - "ø")]. Percentage of the base-pairs having the conformation: S, sheared; I, imino; O, other.
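
The "% xtal.str." rows of Table 1 follow the formula in footnote e: the number of predicted pairings found in the crystal structure, divided by the number of predictions minus those with no homologous structure. The sketch below, with a hypothetical function name, recomputes a few of the table entries from their bracketed counts.

```python
def percent_in_crystal(c, present, no_homolog, absent):
    """Footnote e of Table 1: present / (C - no_homolog), as a whole percentage."""
    if present + no_homolog + absent != c:
        raise ValueError("bracketed counts must sum to C")
    return round(100 * present / (c - no_homolog))

# Entries taken from Table 1, written as C[present, no homolog, absent].
internal_16s = percent_in_crystal(19, 15, 1, 3)      # 15/18
multistem_16s = percent_in_crystal(14, 8, 0, 6)      # 8/14
hairpin_23s = percent_in_crystal(15, 11, 3, 1)       # 11/12
multistem_23s = percent_in_crystal(33, 15, 7, 11)    # 15/26
```

The results agree with the percentages printed in the table for these four cells (83 %, 57 %, 92 %, and 58 %).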
[Figure 1, panels (a) to (c): secondary structure diagrams of E. coli 16 S rRNA (domains I to III), the 5′ half of 23 S rRNA, and the 3′ half of 23 S rRNA (domains IV to VI), with nucleotide position numbering and asterisks marking AA/AG exchange sites. See the Figure 1 legend.]

An analysis of the helix ends in the crystal and
NMR structures and in the 16 S and 23 S rRNA
crystal structures
AA.AG@helix.ends in rRNAs
An analysis of approximately 6000 Bacterial 16 S and over 300 23 S rRNA sequences aligned for maximum structure similarity revealed 115 helix ends with AA, AG, and AA/AG oppositions in more than 90 % of the sequences (Table 1 and Figure 1). These are proportionately distributed in the 16 S and 23 S rRNAs, with 42 occurrences in 16 S and 73 in 23 S rRNA, and are present in the three loop categories, with 24 candidates in hairpins, 44 in internal loops, and 47 in multi-stem loops. Invariant and exchange cases occur at nearly the same frequencies. 75 % of the invariant sites contain an AG opposition, while only 25 % have an AA (Table 1). In addition, there is a bias for invariant A:G base-pairs in hairpin loops (with the majority of these occurring in tetraloops11), and a slight bias for multi-stem loops to have AA/AG exchanges (Table 1). The nucleotide frequencies for a larger set of sequences (approximately 8500 16 S and over 1000 23 S rRNA sequences) that includes the nuclear encoded rRNAs in the three primary phylogenetic groups, Archaea, Bacteria and Eucarya, and the two Eucarya organelles, chloroplasts and mitochondria (see Online Table 4 at CRW AA.AG), reveal that the majority of the positions contain the AA and AG oppositions in all of the alignments and phylogenetic groups, while some of the AA and AG oppositions in the Bacteria contain AU/GC or other nucleotide sets in one or more of the non-bacterial alignments. For example, 23 S rRNA positions 637:651 and 713:718 both contain AG oppositions in nearly all of the Bacteria, and both exchange between G:C and C:G in the Archaea.
During the preparation of this manuscript, the crystal structures for the 30 S32,33 and 50 S31 ribosomal subunits were solved. We have analyzed these structures to determine if the AA and AG oppositions at the ends of helices that occur in more than 90 % of the known Bacterial rRNA sequences are base-paired in the crystal structures. A total of 99 of the 115 Bacterial-centric oppositions were resolved in the crystal structures and had homologous positions in the Thermus thermophilus 16 S and Haloarcula marismortui 23 S rRNA crystal structures; these are tabulated in Table 1 and highlighted on the 16 S and 23 S rRNA secondary structure diagrams in Figure 1. Of these 99, 76 (77 %) form an A:A or A:G base-pair (78 % (32/41) in 16 S and 76 % (44/58) in 23 S rRNA). Invariant AG oppositions (41 examples) at the ends of helices occur more frequently than invariant AA oppositions (12 examples) in the 16 S and 23 S rRNAs (Table 1); our analysis of the rRNA crystal structures reveals that the AG oppositions form base-pairs more frequently than the AA oppositions. The 99 homologous oppositions have a slightly biased distribution in the three unpaired loop categories. A total of 40 % (40/99) occur in multi-stem loops, 38 % (38/99) in internal loops, and 21 % (21/99) in hairpin loops.
A total of 20 of the 21 (95 %) homologous AA and AG candidates in hairpin loops are base-paired (Table 1 and Figure 1). GNRA tetraloops occur at 62 % (13/21) of these hairpin loops, and all of these have base-pairing between the first and last nucleotide of this hairpin loop. As well, six of the seven (86 %) homologous hairpin loops with more than four nucleotides also have base-pairing at the two ends of the loop. Finally, all of these base-pairs are in the sheared conformation.
Figure 1. E. coli 16 S and 23 S rRNA comparative secondary structure models (based upon the sequences in GenBank Accession no. J01695) showing the AA and AG oppositions at the ends of helices that occur in more than 90 % of the bacterial sequences. These opposed nucleotides are shown in red. Highlights indicate additional information from crystal structures: orange, opposition is base-paired in the crystal structure; green, candidate is not base-paired in the crystal structure; blue, candidate is not homologous, was not determined or is a Watson-Crick base-pair in the crystal structure (e.g. this region is deleted, or is not an AA or AG opposition in the sequence of the organism that was crystallized). Candidates with AA/AG exchanges are marked with asterisks: red, significant exchanges in all alignments with minimal exceptions; green, significant exchanges in at least one alignment with minimal exceptions but with more exceptions in at least one other alignment; blue, exchanges in at least one alignment (excluding mitochondria). Nucleotides which are base-paired in the crystal structures but not in the comparative structure models which affect potential coaxial stacking and AA/AG oppositions that are not base-paired are colored blue and connected with blue lines and boxes to indicate the base-pairing. Highlights within helices indicate potential coaxial stacking: brown, not present in crystal structure; yellow, present in crystal structure. Base-pairs predicted with covariation analysis are denoted with - for canonical A:U and G:C base-pairs, small closed circles for G:U base-pairs, large open circles for G:A base-pairs, and large closed circles for non-canonical base-pairs. (a) 16 S rRNA (crystal structure: T. thermophilus33). (b) 23 S rRNA, 5′ half (crystal structure: H. marismortui31). (c) 23 S rRNA, 3′ half (crystal structure: H. marismortui31).

[Figure 2, panels (a) to (i): front and side views of sheared A:G base-pairs, imino A:G base-pairs, and A:A base-pairs. The legend appears on the opposite page of the original.]

For the AA and AG oppositions at the ends of helices in internal loops, 87 % (33/38) are base-paired (83 % and 90 % of the 16 S and 23 S rRNA candidates). In contrast with the hairpin loops, where 76 % (16/21) of the candidates have an invariant AG, 47 % (18/38) of the internal loops have an AA/AG exchange, while only 34 % (13/38) have an invariant AG. All of the invariant AG oppositions are base-paired, and all except one of these (92 %) form a sheared conformation. All but one of the 18 (94 %) AA/AG exchanges are also base-paired. 15 of the 17 (88 %) base-paired AA/AG exchanges are in the sheared conformation, one is in the imino conformation, and the other is in the unusual A:G N3-amino base-pair conformation (see CRW AA.AG Online Figure 3 for chemical structure drawings and abbreviations used in other online materials). A lower percentage of base-pairing occurs with the invariant AA oppositions. Here, base-pairing occurs in only three of the seven (43 %) homologous invariant AA oppositions. None of these form a sheared conformation, one forms an imino conformation, and two form unusual conformations. On the whole, the sheared conformation occurs in 82 % (27/33) of the paired oppositions in internal loops. 9 % (3/33) have the imino conformation and the remaining 9 % (3/33) have another type of conformation (Table 1).
Of the three loop categories, the lowest percen-
tage of base-pairs for AA/AG oppositions at the
ends of helices occurs in multi-stem loops. Here,
58 % (23/40) of these candidates are base-paired in
the 16 S and 23 S rRNA. Within this category, the
highest percentage of base-pairings occurs for the
invariant AG oppositions, where 75 % (9/12) are
base-paired. Base-pairing occurs in 57 % (13/23) of
the AA/AG exchanges, and for only one of five
(20 %) invariant AA oppositions. 61 % (14/23) of
the AA/AG oppositions in multi-stem loops form
sheared conformations, 13 % (3/23) form the imino
conformation, and six (26 %) form other conformation
types. For the rRNA oppositions overall, the
highest percentage of base-pairs occurs for the
invariant AGs, followed by the AA/AG exchanges,
with the lowest percentage of pairing for the
invariant AAs (Table 1). 93 % (38/41) of the invariant AG
oppositions are base-paired, 74 % (34/46) of the
AA/AG exchanges are base-paired, and only 33 %
(4/12) of the invariant AAs are base-paired.
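
The pairing percentages quoted above follow directly from the counts given in the text. As a quick arithmetic check, they can be recomputed as sketched below; the dictionary layout and the helper name `percent` are illustrative choices only and are not part of the original analysis.

```python
# Counts quoted in the text for the 16 S and 23 S rRNA crystal structures:
# (base-paired oppositions, total oppositions) for each conservation class.
rrna_counts = {
    "invariant AG": (38, 41),
    "AA/AG exchange": (34, 46),
    "invariant AA": (4, 12),
}


def percent(paired, total):
    """Return the percentage of paired oppositions, rounded to a whole number."""
    return round(100 * paired / total)


# Percentage of base-paired oppositions in each class.
by_class = {name: percent(p, t) for name, (p, t) in rrna_counts.items()}

# Totals over all three classes.
total_paired = sum(p for p, _ in rrna_counts.values())
total = sum(t for _, t in rrna_counts.values())

for name, pct in by_class.items():
    print(f"{name}: {pct} %")
print(f"all oppositions: {total_paired}/{total} = {percent(total_paired, total)} %")
```

The class totals sum to the 76 base-paired oppositions out of 99 that are cited elsewhere in the text.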
Several conformations are possible for these A:G
base-pairs. The most common and well-characterized
are sheared and imino (Figure 2(a) and (d)34).
The sheared conformation occurs in 80 % (61/76)
of the base-paired oppositions of the 16 S and 23 S
rRNAs. The sheared conformation forms in 87 %
(33/38) of the invariant A:G base-pairs
(Figure 2(a)), in 82 % (28/34) of the AA/AG
exchanges, and does not occur in any of the four
invariant A:A base-pairs (Figure 2(g), top). An
imino or imino-like conformation occurs six times
(6/76 = 8 %) in the 16 S and 23 S rRNAs. They
form in 8 % (3/38) of the invariant A:G base-pairs
(Figure 2(d)), in just one of the 34 (3 %) AA/AG
exchanges and in two of the four (50 %) invariant
A:A base-pairs (Figure 2(g), bottom). Beyond these
two well-characterized conformations, there are
five other conformations (CRW AA.AG Online
Figure 3 and Online Table 4): (1) A:A N7-amino
("A7-1"; one in 16 S rRNA at positions 1248:1289);
(2) A:A N7-amino symmetric ("A7"; one in 23 S
rRNA at positions 1689:1698); (3) A:G N1-amino
("G1"; one in 16 S rRNA at positions 983:1222); (4)
A:G N7-amino ("G7"; one in 23 S rRNA at positions
149:177); and (5) A:G N3-amino ("G3"; four
in 16 S rRNA at positions 60:107, 197:220, 687:700,
and 1067:1108; one in 23 S rRNA at positions
627:636).
There are eight examples of the A:G base-pair in
the 16 S and 23 S rRNA crystal structures where
the G is 5′ to the helix. These occur at 16 S rRNA
positions 112:315, 143:220, 321:332, 945:1236,
1160:1176, and 1357:1365, and at 23 S rRNA pos-
itions 75:111 and 2547:2561 (Figure 1 and base-pair
frequency tables at CRW AA.AG). Five of these
base-pairs were already in the covariation-based
rRNA structure models, with exchanges between
the G:A and G:C/G:U/A:U or A:G base-pairs. The
remaining three had minor exchanges with G:C/
G:U/A:U base-pairs. All eight of these rRNA base-
pairs are in the imino conformation, which is con-
sistent with the similarity between the G:A imino
and Watson-Crick conformations.
AA.AG@helix.ends in the PDB structure database
To characterize the conformations and structural
details of these AA and AG oppositions at the
ends of rRNA helices, and to establish a set of
RNA structure rules that define them
and will help us predict their occurrence,
we have also analyzed the ends of helices
in the crystal and NMR structures available at the
PDB structure database (http://www.rcsb.org/pdb/;
ref. 35). The crystal and NMR RNA structures that
are analyzed and discussed below are summarized
in Table 2 and detailed in CRW AA.AG Online
Table 5. These 29 crystal and 41 NMR structures
contain 116 AA and AG oppositions (61 in crystal
structures and 55 in NMR structures) at the end of
a helix. The 70 structures can be divided by RNA
molecule into the following categories: 12 rRNA
structures (22 cases), 11 tRNA structures (22 cases),
four group I intron structures (14 cases), and 43
other RNA structures (58 cases), including one SRP
structure (three cases), five ribozyme structures
(nine cases), five pseudoknot structures (five
cases), and four Rev response element structures
(six cases).

Figure 2. Stereo views of A:G and A:A base-pairs at helix ends in different structural motifs from X-ray crystallography.
NMR structures are omitted for clarity. The A in each base-pair is superimposed on the left of each panel.
Chemical drawings were created using ISIS/Draw and stereo images were created using Insight II. (a) Chemical
drawing of the G:A sheared base-pair (G:A N3-amino, amino-N7 base-pair34). (b) Front view of sheared A:G base-pairs:
blue, GNRA tetraloop; yellow, E loop; green, tandem GA; red, helix end. (c) Side view of (b). (d) Chemical
drawing of the G:A imino base-pair (G:A carbonyl-amino, imino-N1 base-pair34). (e) Front view of imino A:G base-pairs:
blue, 5′ helix end; yellow, 3′ helix end. (f) Side view of (e). (g) Chemical drawings of the A:A sheared-like base-pair
(top; A:A N3-amino base-pair44) and the A:A imino-like base-pair (bottom; A:A N1-amino base-pair44). (h) Front
view of A:A base-pairs: yellow, N1-amino conformation; blue, N3-amino conformation; red, N7-amino conformation;
green, tandem; gray, triple. (i) Side view of (h).

For the PDB structure database (Table 2), 80 %
(93/116) of the oppositions are base-paired. AG
oppositions at the ends of helices occur more fre-
quently than AA oppositions in the PDB structure
database (Table 2). Our analysis of the structure
database reveals that the AG oppositions form
base-pairs more frequently than the AA opposi-
tions. These oppositions also have a biased distri-
bution in the three loop categories. 44 % (51/116)
occur in internal loops, 39 % (45/116) in hairpin
loops, and 17 % (20/116) in multi-stem loops.
There is an even distribution of oppositions that
are base-paired in these loops: 76 % (34/45) in the
hairpin loops, 82 % (42/51) in the internal loops,
and 85 % (17/20) in the multi-stem loops.
A total of 90 % (70/78) of the AG oppositions at
the ends of helices in the PDB structure database
(Table 2) are base-paired. These include both
orientations (i.e. G 5′ and 3′ to the helix, and GA
tandems). However, 69 % (54/78) have the G 3′ to the
helix. 67 % (47/70) of the A:G base-pairs are in the
sheared conformation (Figure 2(a)), 30 % (21/70)
are in the imino conformation (Figure 2(d)), and
3 % (2/70) form the G:A+ carbonyl-amino, N7-N1
base-pair conformation (Online Figure 3(e)).
When the G is 3′ to the helix in the examples in
Table 2, the sheared conformation is formed in
83 % (40/48) of the A:G base-pairs. 12 % (6/48) are
in the imino conformation, and 4 % (2/48) form
other conformations. These A:G sheared base-pairs
are often a component of a larger motif that we
currently recognize. All 16 examples of A:G base-pairs
in GNRA tetraloops are in the sheared conformation,
and all of the A:G base-pairs in hairpin
loops and at the end of a helix are in the sheared
conformation (with the G 3′ to the helix). All 11 of
the A:G base-pairs in the E-loop and E-like loop
cases that occur in internal and multi-stem loops
are also sheared. 14 of the 22 other A:G base-pairs
with the G 3′ to the helix are also in the sheared
conformation. The sheared conformation induces a
bend in the backbone that does not distort the
flanking helix when the G is 3′ to the helix; however,
the flanking helix will be distorted when the
G is 5′ to the helix. The observed bias for sheared
conformations for those A:G base-pairs oriented
with the G 3′ to the helix is consistent with this
topological constraint. However, there is one
example from a lower-resolution crystal structure
of a sheared A:G base-pair when the G is 5′ to the
helix; this base-pair is at positions A299:G279
in the Tetrahymena thermophila group I intron, with
3-4 Å between the hydrogen bonding pairs.36
In contrast with the sheared conformation, A:G
base-pairs at the ends of helices can adopt an
imino conformation34 that can form at either end of
a helix (with the G 5′ or 3′ to the helix) without
distorting the surrounding base-pairs. There are six
examples in Table 2 where an A:G base-pair (with
the G 3′ to the helix) forms an imino conformation.
There are also a few examples where an A:G base-pair
with this orientation in Table 2 adopts another
conformation type (see below). As well, 71 % (15/21)
of the A:G base-pairs in the imino conformation
(including the two tandem GA cases) are oriented
with the G 5′ to the helix (Table 2). 93 % (13/14) of
the single A:G base-pairs with the G 5′ to the helix
are in the imino conformation; the other is a
sheared base-pair (see above). There are two
examples of tandem G:A imino base-pairs where
the G is 3′ to the helix in one case and 5′ to the
helix in the other.37 A total of four of the six
examples of imino A:G base-pairs with the G 3′ to
a helix are in single nucleotide bulges, adjacent to
the A:G or A:A base-pair, where only one nucleotide
remains unpaired.38-41 In these instances, an
imino conformation, with its non-helix-distorting
properties, may be preferred over the sheared
conformation.
We have investigated the A:G base-pair conformations
in different structural motifs to determine
if the nucleotides surrounding the A:G base-pair
influence the conformation of this base-pair. The
A:G base-pairs in Figure 2 are color-coded for the
GNRA tetraloop, E loop, and GA tandem motifs
and the unincorporated A:G base-pairs when the G
is 3′ to the helix. Our analysis revealed that the
conformations for the A:G base-pairs are nearly
identical in all of these motifs except for the GNRA
tetraloops (Figure 2(b) and (c), blue nucleotides),
where the G of the GNRA tetraloop G:A sheared
base-pair is shifted toward the major groove of the
A. This shift is due to the additional hydrogen
bonds between the guanosine base and the backbone
of A in the tetraloop, and between the backbone
atoms of G and other bases in the loop.42
There is a minimal amount of conformational flexibility
in tandem G:A base-pairs with sheared and
imino conformations (Figure 2(b), (c), (e) and (f)).
Imino base-pairs showed much less conformational
flexibility than sheared base-pairs, regardless of
whether the base-pair was 5′ or 3′ to the helix
(Figure 2(e) and (f)).

Table 2. Distribution of AA and AG juxtapositions at the ends of helices in the structures in the PDB Structure
Database

Loop type     Hairpin                 Internal                Multi-stem              Total
Opposition    C[+,−] (a)  [S,I,O] (b) C[+,−] (a)  [S,I,O] (b) C[+,−] (a)  [S,I,O] (b) C[+,−] (a)  [S,I,O] (b)
AA            18[11,7]    [11,0,0]    14[8,6]     [4,1,3]     6[4,2]      [1,1,2]     38[23,15]   [16,2,5]
AG (c)        27[23,4]    [23,0,0]    24[23,1]    [15,6,2]    3[2,1]      [2,0,0]     54[48,6]    [40,6,2]
GA (d)        0[0,0]      [0,0,0]     7[5,2]      [0,5,0]     9[9,0]      [1,8,0]     16[14,2]    [1,13,0]
GA tandem     0[0,0]      [0,0,0]     6[6,0]      [4,2,0]     2[2,0]      [2,0,0]     8[8,0]      [6,2,0]
AG totals     27[23,4]    [23,0,0]    37[34,3]    [19,13,2]   14[13,1]    [5,8,0]     78[70,8]    [47,21,2]
Total         45[34,11]   [34,0,0]    51[42,9]    [23,14,5]   20[17,3]    [6,9,2]     116[93,23]  [63,23,7]

(a) C, number of examples of AA or AG juxtapositions at the ends of helices from crystal or NMR structures. +, Base-pair is
present; −, base-pair is absent.
(b) Conformation of AA or AG base-pairs present in the crystal or NMR structures: S, sheared; I, imino or imino-like; O, other.
(c) G is 3′ to the helix.
(d) G is 5′ to the helix.

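
The entries of Table 2 use a compact `C[+,−] [S,I,O]` notation, and the row totals can be rebuilt from the individual cells. The sketch below is one way to parse the cells and check their internal consistency; the function names are hypothetical, the cell strings are transcribed from Table 2, and ASCII `+`/`-` stand in for the typeset symbols.

```python
import re

# Table 2 cells for the hairpin, internal, and multi-stem loop columns,
# transcribed as "C[+,-] [S,I,O]" strings.
TABLE2 = {
    "AA":        ["18[11,7] [11,0,0]", "14[8,6] [4,1,3]",   "6[4,2] [1,1,2]"],
    "AG":        ["27[23,4] [23,0,0]", "24[23,1] [15,6,2]", "3[2,1] [2,0,0]"],
    "GA":        ["0[0,0] [0,0,0]",    "7[5,2] [0,5,0]",    "9[9,0] [1,8,0]"],
    "GA tandem": ["0[0,0] [0,0,0]",    "6[6,0] [4,2,0]",    "2[2,0] [2,0,0]"],
}


def parse_cell(cell):
    """Turn '18[11,7] [11,0,0]' into (count, paired, unpaired, S, I, O)."""
    return tuple(int(x) for x in re.findall(r"\d+", cell))


def add(rows):
    """Element-wise sum of equal-length integer tuples."""
    return tuple(sum(col) for col in zip(*rows))


def check(cell):
    """A cell is consistent if C = (+) + (-) and (+) = S + I + O."""
    count, paired, unpaired, s, i, o = cell
    return count == paired + unpaired and paired == s + i + o


# Sum each row over the three loop types, then sum all rows.
row_totals = {name: add(parse_cell(c) for c in cells) for name, cells in TABLE2.items()}
grand_total = add(row_totals.values())
all_consistent = all(check(parse_cell(c)) for cells in TABLE2.values() for c in cells)

print(row_totals)
print("grand total:", grand_total, "consistent:", all_consistent)
```

The computed grand total can be compared with the final row of Table 2, and the AG, GA, and GA tandem rows together with the "AG totals" row.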
Two consecutive A:G base-pairs can both form
sheared base-pairs within a helix when the first
G:A base-pair is followed by another A:G base-
pair. Both A:G base-pairs distort the helix; how-
ever, they are oriented so that they offset or com-
pensate one another to maintain the overall
regularity of the helix.15,43
There are six examples
of tandem sheared G:A base-pairs in the database.
We have identified conformations for A:A base-pairs
that are analogous to the sheared and imino
A:G base-pairs. 61 % (23/38) of the AA oppositions
at the end of helices in the PDB NMR and crystal
structure database (Table 2) are base-paired. There
are five different A:A base-pairing conformations;
two are analogous to the conformations in the
sheared and imino A:G base-pairs. The A:A N3-amino
(A:A sheared) base-pair has one hydrogen
bond between N3 of the first adenosine and the
amino group on the second (Figure 2(g), top44); in
comparison, the sheared A:G base-pair forms two
hydrogen bonds, one from the N3 of the guanosine
to the adenosine amino group and the second
between N7 of A and the amino group of G
(Figure 2(a)). The A:A N1-amino (A:A imino-like)
base-pair conformation forms a single hydrogen
bond between N1 of one adenosine and the amino
group of the second (Figure 2(g), bottom44); while
the hydrogen bonding pattern is different, the
overall shape of the base-pair resembles that of the
A:G imino conformation and the orientation of the
backbone (Figure 2(d)). 70 % (16/23) of the A:A
base-pairs in the PDB structure database (Table 2)
are in the sheared (A:A N3-amino) conformation,
and 9 % (2/23) are in the A:A imino-like (A:A N1-amino)
conformation. The sheared A:A (A:A N3-amino)
base-pairs occur at the end of the D stem/hairpin
loop junction in tRNAs and within the A:A
tandem base-pairs. Other sheared A:A (A:A N3-amino)
base-pairs occur in a tetraloop and in the
unincorporated 3′ helix end category. All 11 of the
hairpin loops with the A:A base-pair have the
sheared conformation, while 50 % (4/8) of the
internal loops and 33 % (1/3) of the multi-stem
loops have this conformation for the A:A base-pair.
The imino-like A:A (A:A N1-amino) base-pairs
occur in the unincorporated 3′ helix end category.
The remaining 21 % (5/23) of the A:A base-pairs in
the structure database have three other conformations,
each with two hydrogen bonds, as
opposed to a single hydrogen bond for the sheared
(A:A N3-amino; Figure 2(g), top) and imino-like
(A:A N1-amino; Figure 2(g), bottom) conformations.
There are three "A:A N7-amino, amino-N1"
base-pairs (with hydrogen bonds between the
Watson-Crick and Hoogsteen faces of each A, one
from N7 of the first A to the amino group of the
second, and one from N1 of the second A to the
amino group of the first34), one "A:A N1-amino
symmetric" base-pair (similar to the imino-like
A:A (A:A N1-amino) base-pair, but with one adenosine
flipped so that two hydrogen bonds can
form between N1 on each adenosine and the
amino group of its partner34), and one "A:A N7-amino
symmetric" base-pair (with hydrogen bonds
between the N7 and amino groups of each A44),
which is analogous to a sheared A:G base-pair
where the G is in the syn conformation.
A:A and A:G base-pairs that stack onto the
ends of helices
Beyond the base-pairing of the AA and AG
oppositions at the ends of helices, we have investi-
gated the structures in the PDB structure database
to determine if these non-canonical base-pairs
stack onto the adjoining base-pair in the helix to
which they are adjacent. The results are affirmative:
all but one of the 72 A:G and 23 A:A base-pairs
are stacked, with stacking defined as one or
both of the base-paired nucleotides overlapping
with the adjoining base-pair in the helix. Examples
of the three-dimensional structures for stacked A:G
base-pairs in the sheared and imino conformations
are shown in Online Figure 4.
The one exception for the base stacking in the
PDB structure database occurs in the mouse mam-
mary tumor virus pseudoknot, where an A:A base-
pair does not stack onto the end of the helix. This
base-pair is composed of A14, situated between the
two helices of the pseudoknot, and A6, located in
one of the loops. This base-pair forms in one of the
two constructs of the mouse mammary tumor
virus. In the construct where A14 is unpaired, A14
stacks on G15 in the helix below.45
In the construct
where A14 is base-paired to A6, the A14:A6 base-
pair does not stack on the G15:C5 base-pair at the
end of the helix.46
Burkard et al.47
analyzed the nucleotide stackings
at the ends of helices in the PDB structure database
and found that all AG oppositions at the ends of
helices are base-paired and stacked when the G is
3′ to the helix. Our analysis of the rRNA crystal
structures31,33 revealed that both positions of the
A:G base-pairs at the ends of helices are stacked in
78 % (21/27) of the cases in the 16 S rRNA and
88 % (36/41) of the cases in the 23 S rRNA (infor-
mation about stacking is available from the base-
pair frequency tables at CRW AA.AG). In the
remaining six 16 S rRNA and five 23 S rRNA
cases, one nucleotide of each A:G base-pair is
stacked upon the neighboring base-pair. For A:A
base-pairs, four of the five 16 S rRNA cases and all
three of the 23 S rRNA cases have both nucleotides
stacked; in the lone exception, one nucleotide of
the A:A base-pair is stacked upon the neighboring
base-pair. In total, stacking occurs on both positions
in 78 % (25/32) of the 16 S rRNA base-pairs
and 89 % (39/44) of the 23 S rRNA base-pairs; the
remaining seven 16 S rRNA and five 23 S rRNA
A:A and A:G base-pairs have only one of the two
base-paired positions involved in stacking (see
Online Table 4 at CRW AA.AG).
Coaxial stacking with A:A and A:G
base-pairing at the helix interfaces
The ends of helices have a propensity to stack
onto one another. Transfer RNA contains two sets
of coaxial helices, the acceptor and TΨC helices,
and the D and anticodon helices.48
The two most
common base-pairs at positions 26:44, at the top of
the tRNA anticodon helix (and stacked onto the D
stem), are G:A and A:G (see CRW AA.AG Online
Table 7 for the base-pair frequencies for tRNA pos-
ition numbers 26:44, Saccharomyces cerevisiae
phenylalanine numbering). Other base-pairs pre-
sent in more than 5 % of the sequences are A:A,
A:U, A:C and U:A.
More recently, two sets of coaxial helices were
identified in the crystal structure for the L11 binding
region of 23 S rRNA (Figure 1(b)30,49). The
lone-pair 1082:1086 (E. coli numbering) is stacked
onto the 1057-1059/1079-1081 helix. A second lone-
pair, 1087:1102, is stacked onto the G1056:A1103
base-pair at the top of the 1051-1056/1103-1108
helix.
Given these two precedents for A:G and A:A
base-pairs at the interface between coaxially
stacked helices, we asked (1) whether there are other
examples of this motif in the RNA structure database
and (2) whether one of the functions of A:A and
A:G base-pairs at the termini of helices is to be at
the interface of two helices that are coaxially
stacked.
23 of the 116 examples in the PDB structure
database with an AA or AG at the end of a helix
(Online Table 6) are adjacent to another helix. 21 of
these are base-paired, while two are unpaired
(Table 2). A Curves analysis was performed on
these helix junctions to measure the angle between
the two helices and the overall helix dis-
placement.50,51
Helices are considered to be coaxial
when both the angle between the helix axes and
their displacement are minimal, as discussed in
Materials and Methods. Eight of the 21 examples
in the structure database with an A:A or A:G base-
pair at the end of one helix and adjacent to another
helix occur at the anticodon/D helix junction in
tRNA. All eight of these tRNA examples are coaxially
stacked with the G 5′ to the helix and the G:A
base-pair in the imino conformation (the N1-amino
conformation for the one A:A base-pair). In
addition to the eight tRNA cases, eight more
examples satisfy these strict criteria, including
examples in 23 S rRNA and the RRE RNA.
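
The coaxiality criterion described above combines two quantities, the angle between the helix axes and their displacement. The following is a minimal geometric sketch under stated assumptions, not the Curves procedure itself: each helix axis is assumed to be already available as a point and a direction vector, displacement is taken (as an assumption) to be the perpendicular distance from a point on the second axis to the line of the first, and the thresholds `max_angle` and `max_disp` are hypothetical placeholders rather than the values given in Materials and Methods.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def interhelix_angle(d1, d2):
    """Angle in degrees between two helix-axis direction vectors."""
    cos_t = _dot(d1, d2) / (_norm(d1) * _norm(d2))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def axis_displacement(p1, d1, p2):
    """Perpendicular distance from point p2 to the line through p1 along d1."""
    return _norm(_cross(_sub(p2, p1), d1)) / _norm(d1)


def is_coaxial(p1, d1, p2, d2, max_angle=30.0, max_disp=5.0):
    """Classify two helices as coaxial; the thresholds are hypothetical placeholders."""
    return (interhelix_angle(d1, d2) <= max_angle
            and axis_displacement(p1, d1, p2) <= max_disp)


# Two helices on the same line: angle and displacement are both zero.
print(is_coaxial((0, 0, 0), (0, 0, 1), (0, 0, 10), (0, 0, 1)))
# Same direction, but the second axis is offset sideways by 12 units.
print(is_coaxial((0, 0, 0), (0, 0, 1), (12, 0, 10), (0, 0, 1)))
```

With coordinates in Å, the placeholder thresholds would reject every non-coaxial junction whose angle and displacement are reported in the text, but that agreement should not be read as recovering the criteria actually used.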
However, there are a few cases where an A:A or
A:G base-pair at the end of a helix is not coaxial to
a second helix. The P4-P6 domain of the group I
intron contains tandem G:A base-pairs in a multi-stem
loop at positions 139:164 and 140:163 that
extend the P5b helix and flank and adjoin the
P5a and P5c helices (PDB ID 1GID52). The axis of
the P5c helix (165-167/173-175), which is 3′ to the
A139:G164 base-pair, lies at an angle of 94°
to, and is displaced 11.7 Å from, the P5b helix ending
in A:G. The axis of the P5a helix (136-138/180-182),
5′ to the A139:G164 base-pair, lies at an angle of
42° to, and is displaced 9.15 Å from, the P5b helix
ending in A:G. Helices P5a and P5c are therefore not
considered to be coaxial with P5b. The second exception
also occurs in the group I intron, where the P3
and P7 helices that end with A:A base-pairs
(A269:A306 and A270:A104) are not coaxial.36
Here, the two helices are separated by 3.9 Å and
occur at an angle of 40°.
While 21 of the 23 examples in the PDB database
with an AA or AG opposition at the end of one
helix and adjacent to another helix form an A:A or
A:G base-pair, A:A or A:G base-pairs do not form
in the remaining two examples. In both cases, the
helices are not coaxial with one another. The RNA
is kinked at the internal loop junction by 170° and
the axis is displaced by 16 Å when the spliceosomal
U1A protein is bound to its RNA.53
Helices are also not stacked for the unpaired AA
oppositions in the mouse mammary tumor virus
pseudoknot junction. Here, the angle between the
helices is 78° and the helix displacement is 5.3 Å.45
As noted earlier, there are 115 cases with an AA
or AG opposition at the end of a helix in the Bac-
terial 16 S and 23 S rRNA secondary structure
models. A total of 99 of these are homologous and
have their structures determined in the 16 S and
23 S rRNA crystal structures; 76 of these are base-
paired in the two crystal structures. All additional
base-pairs in the crystal structures that are not in
the comparative structure models, adjacent to A:A
and A:G base-pairs at the ends of helices, and
immediately opposed to another helix with no
intervening nucleotides were identified on the secondary
structure diagrams in Figure 1. Two helices
with an A:A or A:G base-pair at their interface and
no unpaired nucleotides on the strand connecting
them were considered as a possible coaxial helix
and identified in Figure 1; those stacked in the
crystal structures were identified.
Discussion
Our goal is to predict base-pairs for those pos-
itions with similar patterns of variation (covaria-
tion) and, more recently, for those positions with
either unique patterns of variation or no variation.
Toward this end, an earlier analysis of base-paired
and unpaired nucleotides in covariation-based
rRNA structure models has revealed that there is a
significant bias for adenosines to be unpaired, and
a more pronounced bias for unpaired As at the 3′
end of loops.25 The same analysis also determined
that Gs and As are the two most frequent nucleotides
at the 5′ end of a loop. Given that the GA/AA
opposition at positions 1056:1103 is base-paired
in the 23 S rRNA L11 crystal structures,30,49
we have searched for other examples of AA and
AG oppositions at the ends of helices.
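
A search of this kind can be illustrated on a secondary structure model. The sketch below is a simplified stand-in for the analysis, not the procedure actually used: it assumes an uppercase RNA sequence and a balanced dot-bracket string without pseudoknots, treats the two unpaired nucleotides immediately flanking a terminal base-pair as the opposition, and uses hypothetical names (`Opposition`, `pair_table`, `helix_end_oppositions`). Positions are reported 1-based, and the G is labeled 3′ or 5′ according to whether it follows or precedes the helix strand it adjoins.

```python
from typing import List, NamedTuple, Optional


class Opposition(NamedTuple):
    nt_3prime: int          # 1-based position of the nucleotide 3' to the helix
    nt_5prime: int          # 1-based position of the nucleotide 5' to the helix
    kind: str               # "AA" or "AG"
    g_side: Optional[str]   # "3'" or "5'" for AG oppositions, None for AA


def pair_table(dot_bracket: str) -> List[Optional[int]]:
    """Map each 0-based position to its pairing partner (None if unpaired)."""
    partner: List[Optional[int]] = [None] * len(dot_bracket)
    stack: List[int] = []
    for k, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[k] = k, i
    return partner


def helix_end_oppositions(seq: str, dot_bracket: str) -> List[Opposition]:
    """List AA and AG oppositions flanking the terminal base-pairs of helices."""
    partner = pair_table(dot_bracket)
    n = len(seq)
    hits = []
    for i, j in enumerate(partner):
        if j is None or i > j:
            continue  # visit each base-pair once, with i < j
        # (3'-side, 5'-side) neighbors at the loop-facing end, then the outer end.
        for three, five in ((i + 1, j - 1), (j + 1, i - 1)):
            if not (0 <= three < n and 0 <= five < n) or three == five:
                continue
            if partner[three] is not None or partner[five] is not None:
                continue  # a paired neighbor means this is not an unpaired opposition
            kind = "".join(sorted(seq[three] + seq[five]))
            if kind not in ("AA", "AG"):
                continue
            g_side = None
            if kind == "AG":
                g_side = "3'" if seq[three] == "G" else "5'"
            hits.append(Opposition(three + 1, five + 1, kind, g_side))
    return hits


# A GNRA-type hairpin (loop GAAA): the G follows the 5' strand of the helix.
print(helix_end_oppositions("GGGCGAAAGCCC", "((((....))))"))
```

In this toy hairpin the only opposition found is the G and A at the two ends of the GAAA loop, with the G 3′ to the helix, which is the orientation described in the text for GNRA tetraloops.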
AA and AG oppositions, base-pairs, and
conformations at the ends of helices
Our analysis of the 16 S and 23 S rRNA covaria-
tion-based models revealed that AA and AG oppo-
sitions that occur in more than 90 % of the rRNA
sequences at the ends of helices are very common.
Of the approximately 400 oppositions at the end of
a helix, more than 100 of them have a very conserved
AA, AG or an AA/AG exchange. Prior to
the determination of the 16 S and 23 S rRNA crystal
structures, our only examples with physical
evidence for A:G and A:A base-pairs at the
ends of helices were in the NMR and crystal
structures available from the PDB structure
database. Our analysis of both databases revealed,
as discussed earlier, the following trends. (1) More
than 75 % of these AA and AG oppositions are
base-paired. (2) Of the AA and AG oppositions,
AG oppositions occur more frequently and are
base-paired at a higher percentage. (3) For the two
AG orientations, the G is 3′ to the helix in approximately
90 % of the cases. (4) For the three loop categories,
the highest percentage of base-pairing
occurs in the hairpin loops, followed by internal
and multi-stem loops. (5) Overall, the most common
conformation for the base-paired oppositions
is sheared. The imino and several unusual conformations
occur at a much lower frequency. The percentage
of sheared conformations is higher for A:G
base-pairs (versus A:A) and higher when the G is 3′
to the helix. In contrast, essentially all of the A:G
base-pairs with the G 5′ to the helix have the imino
conformation.
AA and AG oppositions that are not
base-paired
While 80 % (93/116) of the AA and AG opposi-
tions at the ends of helices from the PDB structure
database are base-paired, 23 are not. 65 % (15/23)
of these involve AA oppositions and 35 % (8/23)
have AG oppositions. For the 16 S and 23 S rRNA,
we have similar percentages of unpaired AA and
AG oppositions. 77 % (76/99) of the oppositions
are base-paired while 23 are not. Here, the highest
percentage of non-pairing occurs for the invariant
AA oppositions (67 %; 8/12), followed by AA/AG
exchanges (26 %; 12/46) and invariant AGs (7 %;
3/41). It is not obvious why these oppositions are
not base-paired, while the majority of them are. A
higher percentage of AA oppositions are not base-
paired, and for the 16 S and 23 S rRNA a higher
percentage of oppositions in multi-stem loops are
not base-paired (42 % of the oppositions in multi-
stem loops are not base-paired, versus 5 % and
13 % of the oppositions in hairpin and internal
loops; Table 1). There are no obvious sequence patterns
flanking the oppositions that distinguish the
paired from the unpaired. The higher
percentage of unpaired oppositions in the multi-stem
loops may arise because these regions of the RNA have
more opportunities to form interactions with other
positions in the multi-stem loop. The
higher frequency of unpaired AA oppositions may arise
because these unpaired adenosines
insert into the minor groove of helices, as
recently documented in the A-minor motif54 and
type I/II base triples.55
Alternatively, these AA and AG oppositions
might not base-pair because one or both of these
positions are involved in a standard base-base
interaction with another region of the RNA or an
interaction with a protein. Some of the unpaired
oppositions in the PDB database are associated
with protein binding to the RNA, pseudoknots and
unusual base-pair conformations between one of
the positions in the opposition and another pos-
ition (entries with unpaired oppositions associated
with proteins are: 1CN8, 1AUD, 1RNK, 1ZDI,
1ZDJ, 7MSF, 1YFG, 1C04; 1QA6, 1TLR, and 1GID).
For the rRNAs, there are 23 oppositions that are
not base-paired. The positions in only four of these
are not involved in other intramolecular base-base
interactions, while both positions in 12 oppositions
are involved in other intramolecular RNA-RNA
interactions, and one of the positions in seven of
the oppositions is involved in another intramolecu-
lar RNA-RNA interaction (Figure 1).
However, in contrast, there are examples of A:A
and A:G base-pairs at the ends of helices in the
PDB database that are also interacting with pro-
teins (entries with paired oppositions associated
with proteins are: 1A4T, 1QFQ, 1D6K, 1DFU,
1ETF, 1ULL, 484D, 2TOB, 1NEM, and 1PBR). For
the rRNAs, there are examples of A:A and A:G
base-pairs at the ends of helices that are interacting
with other positions in the rRNA crystal
structures.31,33
Thus, there is no simple explanation
for why some of the AA and AG oppositions are
not base-paired. However, there is an example of
an A:A/A:G base-pair at the end of a helix in the
16 S rRNA that becomes unpaired during protein
synthesis, suggesting that these AA and AG oppo-
sitions might not be static, but instead involved in
movement (see below).
A:A and A:G base-pairs and conformations in
larger motifs
In 1985, it was observed that the majority of the
adenosines were unpaired in the E. coli 16 S rRNA
covariation-based structure model.56
More recently,
it was determined that this bias occurs in a large
collection of 16 S and 23 S rRNA structure models,25
and that there is an even stronger bias for
unpaired adenosines to be at the 3′ end of loops,
and guanines and adenines to occur at the 5′ end
of loops. These biases are consistent with and augment
our identification of AA and AG oppositions
at the ends of helices. Other biases in the distributions
of nucleotides in the loop structures with
these dominant adenosines at the 3′ ends of loops
were identified, with several different structural
motifs mapped onto these regions of the 16 S and
23 S rRNA25
(see also 16 S and 23 S rRNA second-
ary structure Figures with motifs mapped onto the
oppositions at CRW AA.AG). These include adeno-
sine platforms, E and E-like loops, tandem GAs,
GNRA tetraloops, and U-turns. The AA and AG
oppositions at the ends of helices are a component
in these motifs, although not necessarily in all
examples for each of these motifs. Sheared A:G
base-pairs with the G 3′ to the helix are present in
GNRA tetraloops, the E loop, tandem A:G base-
pairs, and in some of the U-turns. Thus, the
sheared base-pairing conformation appears to be
an important structural element utilized in these
larger structural motifs. The GNRA tetraloop is a
common structural element in various RNAs,
including the rRNAs.11 The second motif is the E
loop that was first identified in the 5 S rRNA and
subsequently observed in several other RNAs.16,57-61
The third motif is tandem G:A base-pairs. Here
the A:G base-pairs that are arranged in tandem can
be in the sheared or imino conformation. A single
A:G base-pair in the sheared conformation and
flanked by standard G:C or A:U base-pairs would
distort the helix; however, a second A:G base-pair
with a sheared conformation in the proper orien-
tation would offset this original distortion and
bring the helix back into register. An unexpectedly
high number of tandem G:A base-pairs was identified
with comparative sequence analysis of the
rRNAs15,62
(a revised list of tandem GA opposi-
tions in the rRNAs is available at CRW A Story).
The U-turn is the fourth motif, where the RNA
backbone undergoes a sharp bend after the single-
stranded U in a UNR sequence. This motif is most
notably present in the anticodon and T loops of
tRNAs.63,64 The UNR sequence, as revealed in a
recent study of comparative structures of 16 S and
23 S rRNAs,18 is sometimes flanked by an A:G
base-pair, and occurs within the three loop categories:
hairpin, internal, and multi-stem. (We
have also noted that there is usually an AG or AA
opposition that is adjacent to the G:U base-pair
associated with the adenosine platform14,25 (see
above).)
Given that these A:G base-pairs at the ends of
helices are associated with several larger motifs,
we have analyzed here the conformation of the
A:G base-pairs in various structural motifs and
have determined that the conformations are identi-
cal in all of these motifs, except for the GNRA tet-
raloops, where it is shifted slightly (Figure 2(b) and
(c)).
A:A and A:G base-pair and coaxial stacking
All but one of the A:A and A:G base-pairs at the
ends of helices in the PDB database and the 16 S
and 23 S rRNA crystal structures are stacked onto
the end of the helix. The extension of these helices
occurs for all of the A:A and A:G base-pairs in the
structure database, except for one example in a
conformationally constrained pseudoknot.45
This
preponderance of stacking is maintained in the
rRNAs, as noted earlier.
Given the tendency for helices to coaxially stack
onto one another when they are adjacent to one
another, we have asked whether A:A and A:G base-pairs
at the interface of two helices might influence
the coaxial stacking potential of these two helices.
Our analysis of the structures in the PDB structure
database was affirmative: 76 % (16/21) of adjacent
helices with an A:A or A:G base-pair between
them are coaxially stacked. Previously, it has been
shown that coaxial stacking at helix junctions
stabilizes the structure by about 2 kcal/mol.65-66
Additional studies confirmed that A:G base-pairs
at the junction between coaxially stacked helices
contribute the same energy as U:A base-pairs,
while tandem GAs are almost as stabilizing as
single AGs in a junction.67
The analysis of the potential coaxial helices in
the 16 S and 23 S rRNA revealed mixed results. A
total of 11 of the 12 (92 %) potential coaxial helices
are stacked in the 16 S rRNA crystal structure
(Figure 1(a); base-pair frequency tables at CRW
AA.AG). However, only 11 of the 22 (50 %) poten-
tial coaxially stacked helices are actually stacked in
the 23 S rRNA crystal structure (Figure 1(b) and
(c); base-pair frequency tables at CRW AA.AG).
Conformational changes in the 16 S rRNA A-site
In our paper about unpaired adenosines in the
covariation-based rRNA structure models,25
we
observed that some of the positions involved in
AA and AG oppositions at the ends of helices also
occur in adenosine platforms, E and E-like loops,
tandem GAs, and U-turn sequence motifs. We
speculated that conformational rearrangements
might be necessary if both of these sequence motifs
fold into their respective structural motifs. The
crystal structure of the A-site in 16 S rRNA has
been determined in the presence and absence of
the antibiotics paromomycin, streptomycin, and
spectinomycin,68
initiation factor 1 (IF1),69
and
mRNA/tRNA.70
The analysis of the crystal struc-
ture revealed the status of the 1408:1493 AA/AG
opposition at the end of a helix. This opposition is
adjacent to the invariant C1407:G1494 base-pair.
Position 1408 is an A in greater than 99 % of the
bacteria, 98 % of the chloroplasts, and 96 % of the
mitochondria (see Online Table 4(a) at CRW
AA.AG, and the individual nucleotide frequency
tables at the CRW Site). All of these sequences that
do not have an A at position 1408 have a G. Great-
er than 99 % of the Eucarya 16 S-like rRNA
sequences have a G at position 1408; the remaining
sequences have an A. 70 % of the Archaea 16 S
rRNA sequences have an A at position 1408, while
the remaining 30 % have a G. Position 1493 is an A
in more than 99 % of all 16 S and 16 S-like rRNA
sequences. Position 1492 is also equally conserved,
with an adenosine in more than 99 % of all 16 S
and 16 S-like rRNA sequences (CRW Site Single
Base Frequency Tables). Thus, in the Bacteria,
chloroplasts, and mitochondria, and 70 % of the
Archaea, the 1408:1493 opposition is an AA, while
it is a GA in the Eucarya and 30 % of the Archaea.
Positions 1408:1493 form an A:A base-pair in the
T. thermophilus 30 S ribosomal subunit crystal
structure that is not complexed with antibiotics,
IF1, or mRNA/tRNA33 (Online Table 8), while
they are unpaired in the three different crystal
structures that are complexed with the antibiotics
paromomycin, streptomycin, and spectinomycin,
with IF1, and with an mRNA/tRNA codon-anticodon
helix. When positions 1408:1493 are not base-paired, the
two invariant adenines at positions 1492 and 1493
are flipped out of the helix and are available for
interactions with IF1 and the codon-anticodon
helix. In conjunction with the unpairing of the
1408:1493 base-pair and the movement of positions
1492 and 1493 from the inside to the outside of the
helix, there are minor changes in the bend angle
and the displacement of the coaxial stack flanking
both sides of the 1408:1493 opposition (Online
Table 8). The base-pairs in proximity to the
1408:1493 opposition (C1399:G1504, G1401:C1501,
C1402:A1500, C1404:G1497, G1405:C1496,
U1406:U1495, C1407:G1494, C1409:G1491,
G1410:C1490, C1411:G1489, and C1412:G1488) are
all base-paired in both the presence and absence of
these molecules involved in protein synthesis
(Online Table 8). The conserved, but not invariant,
A1413:G1487 base-pair (see CRW Site base-pair fre-
quency tables for 16 S rRNA; predominantly A:G
in the Bacteria, Archaea, and chloroplasts, U:A in
the Eucarya, and C:G in the mitochondria) is base-
paired in the imino conformation in three of the
four crystal structures, and is unpaired in the pre-
sence of IF1. These results reveal that the 1408:1493
AA/AG opposition at the end of the helix is
involved in a conformational rearrangement
directly associated with protein synthesis. This
region of the A-site contains a set of commonly
occurring rRNA motifs, described earlier.25
More
than 50 % (527 in total) of the 3′ ends of loops in
16 S and 23 S rRNA contain a conserved adenosine
in the covariation-based structure models. 56
(11 %) of these "A-motifs" are flanked by an A on
the 5′ end and a paired G on the 3′ end. This highly
conserved AAG motif occurs at 16 S rRNA positions
1492-1494. While this sequence motif contains
some of the features characteristic of the
adenosine platform,24,25 we do not know if positions
1492 and 1493 are base-paired at some stage
in protein synthesis, as they are in the adenosine
platform.
Concluding statement
Our analysis of the PDB structure database and
the 16 S and 23 S rRNA crystal structures revealed
general similarities in the higher than expected fre-
quencies of AA and AG oppositions at the ends of
helices, and, for both sets of data, similar extents of
base-pairing (80 % for the PDB, 76 % for the two
rRNAs). The frequencies of AG oppositions and
oppositions that are base-paired were higher than
the frequencies of AA oppositions and their base-
pairs for both data sets. As well, the frequency of
oppositions that are base-paired is highest for the
hairpin loops for both data sets, followed by
internal and multi-stem loops for the rRNAs. The
frequencies of A:G base-pairs (when the G is 3′ to
the helix) in the sheared conformation are significantly
higher than the frequency of imino conformations
and other unusual conformations for both
data sets, while essentially all of the A:G base-pairs
with the G 5′ to the helix are in the imino conformation
for both data sets. The sheared conformation
occurs in 100 % of the A:A/A:G base-pairs
at the ends of helices in hairpin loops in both data
sets, a lower percentage in internal loops (82 %
(27/33) in rRNA, 55 % (23/42) in the PDB), and the
lowest percentage in multi-stem loops (61 % (14/
23) in rRNA, 35 % (6/17) in the PDB). In contrast,
the imino conformation occurs at the lowest per-
centage in hairpin loops (0 % in both data sets), a
higher percentage in internal loops (9 % (3/33) in
rRNA, 33 % (14/42) in the PDB), and the highest
percentage in multi-stem loops (13 % (6/23) in
rRNA, 53 % (9/17) in the PDB). Other confor-
mations occur in both data sets, although limited
to internal and multi-stem loops. For the rRNAs,
they are more prevalent than imino conformations,
especially in multi-stem loops (Table 1). All of
these A:A/A:G base-pairs are stacked in some
form onto the flanking helix. The one major, anomalous
difference between the two data sets is for
coaxial stacking. 91 % (21/23) of the potential coax-
ial stacks in the PDB database are coaxial. For 16 S
rRNA, this number is 92 % (11/12). However, for
23 S rRNA, this number is only 50 % (11/22). The
combined total for 16 S and 23 S rRNA is 65 %
(22/34).
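The rRNA coaxial-stacking percentages quoted above can be re-derived from the counts given in the text. The following is a minimal sketch for checking that arithmetic only; the names COAXIAL_COUNTS, percent, and combined are illustrative and are not part of the original analysis.

```python
from fractions import Fraction

# (coaxially stacked, potential) helix pairs, as quoted in the text.
COAXIAL_COUNTS = {
    "16 S rRNA": (11, 12),
    "23 S rRNA": (11, 22),
}


def percent(stacked, potential):
    """Return the stacked fraction as a whole-number percentage."""
    return round(100 * Fraction(stacked, potential))


def combined(counts):
    """Sum the stacked and potential counts over all molecules."""
    stacked = sum(s for s, _ in counts.values())
    potential = sum(p for _, p in counts.values())
    return stacked, potential


if __name__ == "__main__":
    for name, (s, p) in COAXIAL_COUNTS.items():
        print(f"{name}: {s}/{p} = {percent(s, p)} %")
    s, p = combined(COAXIAL_COUNTS)
    print(f"16 S + 23 S rRNA: {s}/{p} = {percent(s, p)} %")
```

Exact fractions are used so that the rounding of 11/12 and 22/34 does not depend on floating-point representation.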
A:A and A:G base-pairs at the ends of helices
are associated with several different structural
motifs, including E loops, U-turns, and GNRA tet-
raloops. While the majority of the AA and AG
oppositions are base-paired, approximately 25 % of
them are not. The percentage of unpaired AA
oppositions is higher than unpaired AG opposi-
tions. For the ribosomal RNAs, the highest percen-
tage of unpaired oppositions is for those that occur
in the multi-stem loops. Currently, there is no
obvious explanation for why 25 % of the opposi-
tions are not base-paired. However, given that the
16 S rRNA 1408:1493 AA/AG opposition is
dynamic, changing its form from paired to
unpaired during protein synthesis, we wonder if
the states of other AA/AG oppositions at the ends
of helices are also dynamic and associated with
ribosomal movement during assembly and protein
synthesis.71,72
Materials and Methods
The rRNA sequence alignments used for this analysis
are maintained by us at the University of Texas and are
available from the CRW AA.AG Site (see below).
Sequences were manually aligned with the alignment
editor AE2 (T. Macke, Scripps Clinic, San Diego, CA).
Our analysis of the AA and AG oppositions at the ends
of helices was performed on this large collection of 16 S
and 23 S rRNA sequences that span the three primary
phylogenetic lineages and the two Eucarya organelles, as
outlined in Table 3. The numbering systems from the
E. coli 16 S and 23 S rRNA sequences (GenBank Acces-
sion no. J01695) are used as the references for position
numbers for both 16 S and 23 S rRNAs.
AA and AG oppositions at the ends of helices in the
most recent (December 1999) 16 S and 23 S rRNA E. coli
covariation-based structure models (CRW Site; see
below) were manually identified. Each candidate was
classified into one of three loop types: hairpin, internal
or multi-stem. The program query (Gutell et al., unpublished)
was used to collect single nucleotide and base-pair
frequency data from the (AE2) sequence alignments.
Base frequencies for each candidate were computed
independently from each of the alignments (16 S and
23 S rRNAs; bacterial, archaeal, and eucaryal nuclear,
chloroplast, and mitochondrial). AA.AG@helix.ends candidates
with greater than 90 % AA, AG or AA/AG (with
the G 3′ to the helix for AG and AA/AG oppositions) in
the bacterial alignment were considered further. The
comparative sequence analysis data is summarized in
Table 1 and presented in greater detail in Online Table 4
at CRW AA.AG (see below).
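The 90 % selection criterion described above can be illustrated with a short sketch. This is not the unpublished query program; it is a hypothetical reimplementation that assumes each sequence contributes one two-letter opposition string, written so that the second letter is the nucleotide 3′ to the helix (so "AG" has the G 3′ to the helix and "GA" does not count). The function names and the input format are assumptions made for illustration.

```python
from collections import Counter


def opposition_frequencies(oppositions):
    """Return the fraction of sequences with each two-letter opposition.

    `oppositions` holds one string per aligned sequence, e.g. "AA" or "AG".
    Assumed convention: the second letter is the nucleotide 3' to the helix.
    """
    counts = Counter(oppositions)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def is_candidate(freqs, threshold=0.90):
    """Apply the >90 % AA, AG or AA/AG criterion to one alignment column pair.

    The combined AA + AG frequency is tested, which also covers the cases
    where AA alone or AG alone exceeds the threshold.
    """
    return freqs.get("AA", 0.0) + freqs.get("AG", 0.0) > threshold


if __name__ == "__main__":
    # Hypothetical bacterial alignment: 60 AA, 35 AG, 5 GA sequences.
    column = ["AA"] * 60 + ["AG"] * 35 + ["GA"] * 5
    freqs = opposition_frequencies(column)
    print(freqs, is_candidate(freqs))
```

In this hypothetical column the AA and AG oppositions together exceed 90 % of the sequences, so the site would be carried forward; a column with 20 % GA would not.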
Supplementary data that augments the Tables and
Figures in this manuscript is available from the CRW
AA.AG@helix.ends pages (abbreviated as CRW AA.AG;
http://www.rna.icmb.utexas.edu/ANALYSIS/AAAG/),
the CRW Site (http://www.rna.icmb.utexas.edu), and
the CRW A Story pages
(http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/). The information available
at CRW AA.AG includes: base-pair frequency tables for
all of the AA and AG oppositions at the ends of helices
that occur in more than 90 % of the bacterial sequences
(Online Table 4); tables of the PDB structures analyzed
in Table 2 (Online Table 5) and for the coaxial stacking
analysis (Online Table 6); chemical structure diagrams
for all of the base-pair types described here (Online
Figure 3); and 16 S and 23 S rRNA secondary structure
diagrams showing the AA/AG oppositions, potential
coaxial stackings (Figure 1) and multiple motifs (Online
Figure 5).
The tabulated information in Table 1 is culled from
Online Table 4 (16 S and 23 S rRNA base-pair frequency
tables), which includes: (1) the percent occurrences for all
16 base-pairing types (e.g., A:A, A:C, A:G, etc.) at each
of the AA and AG sites in five alignments (Bacteria,
Archaea, Eucarya nuclear, chloroplasts and mitochon-
dria); (2) the exchange patterns between AA and AG; (3)
the loop type (hairpin, internal, or multi-stem); (4) any
associated motifs (e.g. E loop); and (5) for all of the
oppositions that are base-paired in the rRNA crystal
structures, four additional entries: (a) a RasMol73,74
image of that base-pair created from the crystal struc-
tures (16 S rRNA, PDB ID 1FJF;33
23 S rRNA, PDB ID
1FFK31
); (b) the conformation of the base-pair;34,44
(c)
identification of the nucleotides of the opposition which
stack onto the adjoining helix; and (d) the adjoining
base-pair(s) upon which the opposition stacks. The
online tables describing the PDB structures (Online
Tables 5 and 6) present, for each of the NMR and crystal
structures, an expanded description of the experimental
systems, RasMol73,74
images highlighting the AA and
AG oppositions, links to the MEDLINE abstract, and
additional information pertinent to that analysis.
The secondary structure Figures showing the
AA.AG@helix.ends sites (Figure 1) and additional
secondary structure diagrams at CRW AA.AG (Online
Figure 5) were generated using the interactive graphics
program XRNA (Weiser & Noller, University of California,
Santa Cruz). Chemical structures were generated
using ISIS/Draw and CS ChemDraw Std. 3D images
were generated using Insight II.
The PDB file for each rRNA crystal structure was visualized
using RasMol.73,74
The conformation34,44
of each
base-pair was assessed.
We have analyzed the A:A and A:G oppositions at the
ends of helices in the NMR and crystal structures from
the PDB.35
Only one structure was analyzed when that
structure was solved more than once with the same
method. For NMR structures, we analyzed either the
minimized average structure (when available) or the first
structure. Both NMR and crystal structures were ana-
lyzed when a single structure was solved using both
methods. For sequences determined by both X-ray crys-
tallography and NMR spectroscopy, we analyzed one
structure from each method. Both the free and bound
forms were analyzed when the same RNA construct was
solved in the presence and absence of protein or other
ligands.

Table 3. Approximate number of sequences in the 16 S and 23 S rRNA alignments

                                                No. of sequences(b)
Alignment ID(a)  Phylogenetic group/organelle   16 S rRNA   23 S rRNA
B                Bacteria                       5850        325
A                Archaea                        260         40
C                Chloroplast                    180         100
E                Eucarya                        1050        265
M                Mitochondria                   160         310
Total            All                            8500        1040

(a) Single-letter code used to identify the alignment in the base-pair frequency tables (Online Table 4 at CRW AA.AG).
(b) Approximate number of sequences in each alignment at the time of this analysis.

Base-pairs were extracted from PDB files and superimposed
using Insight II. The atoms in the base of each
adenine in A:G base-pairs were superimposed. For A:A
base-pairs, the atoms in one adenine were superimposed
so that the other adenine of the base-pair sat on the
major groove side of the superimposed adenines. Base
stacking was evaluated manually using Insight II and
RasMol.
A Curves analysis50,51
was used to assess if adjacent
helices were coaxial by determining the angle and axis
displacement between the best linear axes of these
helices. Linear axes were calculated for helices with three
or more base-pairs, including the terminal A:G or A:A
base-pair. When the A was not base-paired, this nucleo-
tide was not included in axis calculations. Coaxial helices
should theoretically have no axis displacement and little
or no angle between axes. The D stem and anticodon
stem are relatively coaxial in the tRNA three-dimensional
structure. In this case, the average angle between
the anticodon stem ending in an imino A:G base-pair
and the D stem axes is 17.17° and the axis displacement
is 3.36 Å for the eight structures studied. These values
were used as a baseline to determine whether the axes in
other structures were also coaxially stacked, accounting
for a range of normal base-pair helicoidal parameters at
the junctions. For the analysis of the full set of 21
examples, we considered two helices to be coaxial when
the angle between them was less than 30° and the helix
displacement was less than 5 Å.
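The coaxiality criterion stated above (inter-axis angle below 30° and axis displacement below 5 Å) can be written as a small sketch. It assumes that the angle and displacement have already been obtained from a Curves-style linear-axis fit; the helper angle_between, which takes two axis direction vectors, and the function names are hypothetical and do not reproduce the Curves calculation.

```python
import math

MAX_ANGLE_DEG = 30.0       # angle cutoff stated in the text
MAX_DISPLACEMENT_A = 5.0   # axis displacement cutoff (angstroms) stated in the text


def angle_between(u, v):
    """Angle in degrees between two helix-axis direction vectors.

    Assumes both vectors point in a consistent direction along the stack.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def is_coaxial(angle_deg, displacement_a):
    """Return True when both values fall strictly below the cutoffs."""
    return angle_deg < MAX_ANGLE_DEG and displacement_a < MAX_DISPLACEMENT_A


if __name__ == "__main__":
    # tRNA baseline from the text: anticodon stem versus D stem.
    print(is_coaxial(17.17, 3.36))
    # Two hypothetical perpendicular axes are not coaxial.
    print(is_coaxial(angle_between((1, 0, 0), (0, 1, 0)), 1.0))
```

The tRNA baseline values (17.17° and 3.36 Å) satisfy both cutoffs, which is consistent with their use as the reference for relatively coaxial helices.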
Note Added in Proof
A re-analysis of the 50 S ribosomal crystal struc-
ture revealed that the 2650 helix in 23 S rRNA
(Figure 1(c), page 740) is coaxially stacked, and
thus should be colored yellow and not brown. The
counts of coaxially stacked helices on pages 748
and 749 have been corrected.
Acknowledgments
We greatly appreciate the constructive comments
from both reviewers. This work was supported by the
NIH (GM48207, awarded to R.R.G.; GM56544, awarded
to S.C.H.) and by startup funds from the Institute for
Cellular and Molecular Biology at the University of
Texas at Austin and the Welch Foundation (both
awarded to R.R.G.).
References
1. Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911-940.
2. Zuker, M., Mathews, D. H. & Turner, D. H. (1999). Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (Barciszewski, J. & Clark, B. F. C., eds), pp. 11-43, Kluwer Academic Publishers.
3. Konings, D. A. M. & Gutell, R. R. (1995). A comparison of thermodynamic foldings with comparatively derived structures of 16 S and 16 S-like rRNAs. RNA, 1, 559-574.
4. Fields, D. S. & Gutell, R. R. (1996). An analysis of large rRNA sequences folded by a thermodynamic method. Fold. Des. 1, 419-430.
5. Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function, and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91-118, Cold Spring Harbor Laboratory Press, Plainview, New York.
6. Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons from an evolving rRNA: 16 S and 23 S rRNA structures from a comparative perspective. Microbiol. Rev. 58, 10-26.
7. Gutell, R. R. (1999). Comparative analysis of RNA sequences. Nucl. Acids Symp. Ser. 41, 48-53.
8. Gutell, R. R. (1996). Comparative sequence analysis and the structure of 16 S and 23 S rRNA. In Ribosomal RNA. Structure, Evolution, Processing, and Function in Protein Biosynthesis (Zimmerman, R. A. & Dahlberg, A. E., eds), pp. 111-128, CRC Press, Boca Raton.
9. Gautheret, D. & Gutell, R. R. (1997). Inferring the conformation of RNA base pairs and triples from patterns of sequence variation. Nucl. Acids Res. 25, 1559-1564.
10. Michel, F., Costa, M., Massire, C. & Westhof, E. (2000). Modeling RNA tertiary structure from patterns of sequence variation. Methods Enzymol. 317, 491-510.
11. Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architecture of ribosomal RNA: constraints on the sequence of tetra-loops. Proc. Natl Acad. Sci. USA, 87, 8467-8471.
12. Gutell, R. R., Noller, H. F. & Woese, C. R. (1986). Higher order structure in ribosomal RNA. EMBO J. 5, 1111-1113.
13. Lehnert, V., Jaeger, L., Michel, F. & Westhof, E. (1996). New loop-loop tertiary interactions in self-splicing introns of subgroup IC and ID: a complete 3D model of the Tetrahymena thermophila ribozyme. Chem. Biol. 3, 993-1009.
14. Gautheret, D., Konings, D. & Gutell, R. R. (1995). G.U base pairing motifs in ribosomal RNA. RNA, 1, 807-814.
15. Gautheret, D., Konings, D. & Gutell, R. R. (1994). A major family of motifs involving G.A mismatches in ribosomal RNA. J. Mol. Biol. 242, 1-8.
16. Wimberly, B. (1994). A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nature Struct. Biol. 1, 820-827.
17. Leontis, N. B. & Westhof, E. (1998). A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
18. Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D. (2000). Predicting U-turns in ribosomal RNA with comparative sequence analysis. J. Mol. Biol. 300, 791-803.
19. Michel, F. & Westhof, E. (1990). Modeling of the three-dimensional architecture of group I catalytic introns based upon comparative sequence analysis. J. Mol. Biol. 216, 585-610.
20. Gautheret, D., Damberger, S. H. & Gutell, R. R. (1995). Identification of base triples in RNA using comparative sequence analysis. J. Mol. Biol. 248, 27-43.
21. Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271-1276.
22. Costa, M. & Michel, F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14, 1276-1285.
23. Costa, M. & Michel, F. (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. EMBO J. 16, 3289-3302.
24. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996). RNA tertiary structure mediation by adenosine platforms. Science, 273, 1696-1699.
25. Gutell, R. R., Cannone, J. J., Shang, Z., Du, Y. & Serra, M. (2000). A story: unpaired adenosine bases in ribosomal RNAs. J. Mol. Biol. 304, 335-354.
26. Hermann, T. & Patel, D. J. (1999). Stitching together RNA tertiary architectures. J. Mol. Biol. 294, 829-849.
27. Moore, P. B. (1999). Structural motifs in RNA. Annu. Rev. Biochem. 68, 287-300.
28. Traub, W. & Sussman, J. L. (1982). Adenine-guanine base pairing ribosomal RNA. Nucl. Acids Res. 10, 2701-2708.
29. Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F. (1983). Detailed analysis of the higher-order structure of 16 S-like ribosomal ribonucleic acids. Microbiol. Rev. 47, 621-669.
30. Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G. (1999). Crystal structure of a conserved ribosomal protein-RNA complex. Science, 284, 1171-1174.
31. Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905-920.
32. Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell, 102, 615-623.
33. Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr, Morgan-Warren, R. J., Carter, A. P., Vonhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30 S ribosomal subunit. Nature, 407, 327-339.
34. Burkard, M. E., Turner, D. H. & Tinoco, I., Jr (1999). Structures of base pairs involving at least two hydrogen bonds. In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), 2nd edit., pp. 675-680, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
35. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucl. Acids Res. 28, 235-242.
36. Golden, B. L., Gooding, A. R., Podell, E. R. & Cech, T. R. (1998). A preorganized active site in the crystal structure of the Tetrahymena ribozyme. Science, 282, 259-264.
37. Wu, M. & Turner, D. H. (1996). Solution structure of (rGCGGACGC)2 by two-dimensional NMR and the iterative relaxation matrix approach. Biochemistry, 35, 9677-9689.
38. Rowsell, S., Stonehouse, N. J., Convery, M. A., Adams, C. J., Ellington, A. D., Hirao, I., Peabody, D. S., Stockley, P. G. & Phillips, S. E. (1998). Crystal structures of a series of RNA aptamers complexed to the same protein target. Nature Struct. Biol. 5, 970-975.
39. Peterson, R. D. & Feigon, J. (1996). Structural change in Rev responsive element RNA of HIV-1 on binding Rev peptide. J. Mol. Biol. 264, 863-877.
40. Battiste, J., Mao, H., Rao, N., Tan, R., Muhandiram, D., Kay, L., Frankel, A. & Williamson, J. (1996). Alpha helix-RNA major groove recognition in an HIV-1 rev peptide-RRE RNA complex. Science, 273, 1547-1551.
41. Ye, X., Gorin, A., Ellington, A. D. & Patel, D. J. (1996). Deep penetration of an alpha-helix into a widened RNA major groove in the HIV-1 rev peptide-RNA aptamer complex. Nature Struct. Biol. 3, 1026-1033.
42. Jucker, F. M., Heus, H. A., Yip, P. F., Moors, E. H. & Pardi, A. (1996). A network of heterogeneous hydrogen bonds in GNRA tetraloops. J. Mol. Biol. 264, 968-980.
43. SantaLucia, J. J. & Turner, D. H. (1993). Structure of (rGGCGAGCC)2 in solution from NMR and restrained molecular dynamics. Biochemistry, 32, 12612-12623.
44. Nagaswamy, U., Voss, N., Zhang, Z. & Fox, G. E. (2000). Database of non-canonical base pairs found in known RNA structures. Nucl. Acids Res. 28, 375-376.
45. Shen, L. X. & Tinoco, I. J. (1995). The structure of an RNA pseudoknot that causes efficient frameshifting in mouse mammary tumor virus. J. Mol. Biol. 247, 963-978.
46. Kang, H., Hines, J. V. & Tinoco, I. J. (1996). Conformation of a non-frameshifting RNA pseudoknot from mouse mammary tumor virus. J. Mol. Biol. 259, 135-147.
47. Burkard, M. E., Kierzek, R. & Turner, D. H. (1999). Thermodynamics of unpaired terminal nucleotides on short RNA helixes correlates with stacking at helix termini in larger RNAs. J. Mol. Biol. 290, 967-982.
48. Sussman, J. L., Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S.-H. (1978). Crystal structure of yeast phenylalanine T-RNA. I. Crystallographic refinement. J. Mol. Biol. 123, 607-630.
49. Wimberly, B. T., Guymon, R., McCutcheon, J. P., White, S. W. & Ramakrishnan, V. (1999). A detailed view of a ribosomal active site: the structure of the L11-RNA complex. Cell, 97, 491-502.
50. Lavery, R. & Sklenar, H. (1988). The definition of generalized helicoidal parameters and of axis curvature for irregular nucleic acids. J. Biomol. Struct. Dynam. 6, 63-91.
51. Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dynam. 6, 655-667.
52. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678-1685.
53. Allain, F. H., Howe, P. W., Neuhaus, D. & Varani, G. (1997). Structural basis of the RNA-binding specificity of human U1A protein. EMBO J. 16, 5764-5772.
54. Nissen, P., Ippolito, J. A., Ban, N., Moore, P. B. & Steitz, T. A. (2001). RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl Acad. Sci. USA, 98, 4899-4903.
55. Doherty, E. A., Batey, R. T., Masquida, B. & Doudna, J. A. (2001). A universal mode of helix packing in RNA. Nature Struct. Biol. 8, 339-343.
56. Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16 S-like ribosomal RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
57. Varani, G., Wimberly, B. & Tinoco, I. J. (1989). Conformation and dynamics of an RNA internal loop. Biochemistry, 28, 7760-7772.
58. Wimberly, B., Varani, G. & Tinoco, I. J. (1993). The conformation of loop E of eukaryotic 5S ribosomal RNA. Biochemistry, 32, 1078-1087.
59. Szewczak, A. A., Moore, P. B., Chang, Y. L. & Wool, I. G. (1993). The conformation of the sarcin/ricin loop from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA, 90, 9581-9585.
60. Correll, C. C., Munishkin, A., Chan, Y. L., Ren, Z., Wool, I. G. & Steitz, T. A. (1998). Crystal structure of the ribosomal RNA domain essential for binding elongation factors. Proc. Natl Acad. Sci. USA, 95, 13436-13441.
61. Correll, C. C. & Munishkin, W. I. (1999). The two faces of the Escherichia coli 23 S rRNA Sarcin/Ricin domain: the structure at 1.11 Å resolution. J. Mol. Biol. 292, 275-287.
62. SantaLucia, J. J., Kierzek, R. & Turner, D. H. (1990). Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry, 29, 8813-8819.
63. Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796-806.
64. Kim, S.-H. (1979). Crystal structure of yeast tRNAphe and general structural features of other tRNAs. In Transfer RNA: Structure, Properties, and Recognition (Schimmel, P. R., Soll, D. & Abelson, J. N., eds), pp. 83-100, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
65. Walter, A. E. & Turner, D. H. (1994). Sequence dependence of stability for coaxial stacking of RNA helixes with Watson-Crick base paired interfaces. Biochemistry, 33, 12715-12719.
66. Walter, A. E., Turner, D. H., Kim, J., Lyttle, M. H., Müller, P., Mathews, D. H. & Zuker, M. (1994). Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. Proc. Natl Acad. Sci. USA, 91, 9218-9222.
67. Kim, J., Walter, A. E. & Turner, D. H. (1996). Thermodynamics of coaxially stacked helixes with GA and CC mismatches. Biochemistry, 35, 13753-13761.
68. Carter, A. P., Clemons, W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberly, B. T. & Ramakrishnan, V. (2000). Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature, 407, 340-348.
69. Carter, A. P., Clemons, W. M., Jr, Brodersen, D. E., Morgan-Warren, R. J., Hartsch, T., Wimberly, B. T. & Ramakrishnan, V. (2001). Crystal structure of an initiation factor bound to the 30S ribosomal subunit. Science, 291, 498-501.
70. Ogle, J. M., Brodersen, D. E., Clemons, W. M., Jr, Tarry, M. J., Carter, A. P. & Ramakrishnan, V. (2001). Recognition of cognate transfer RNA by the 30 S ribosomal subunit. Science, 292, 897-902.
71. Woese, C. R. (1980). Just so stories and Rube Goldberg machines: speculations on the origins of the protein synthetic machinery. In Ribosomes: Structure, Function, and Genetics (Chambliss, G., Craven, G. R., Davies, J., Davis, K., Kahan, L. & Nomura, M., eds), pp. 357-373, University Park Press, Baltimore, Maryland.
72. Frank, J. & Agrawal, R. K. (2000). A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature, 406, 318-322.
73. Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374.
74. Bernstein, H. J. (2000). Recent changes to RasMol, recombining the variants. Trends Biochem. Sci. 25, 453-455.
Edited by J. Doudna
(Received 27 December 2000; received in revised form 14 May 2001; accepted 29 May 2001)
Gutell 090.bmc.bioinformatics.2004.5.105Robin Gutell
 
Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167Robin Gutell
 
Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104Robin Gutell
 
Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785Robin Gutell
 
Gutell 016.pnas.1989.086.03119
Gutell 016.pnas.1989.086.03119Gutell 016.pnas.1989.086.03119
Gutell 016.pnas.1989.086.03119Robin Gutell
 
Application of graph theory in drug design
Application of graph theory in drug designApplication of graph theory in drug design
Application of graph theory in drug designReihaneh Safavi
 
Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocr
Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocrGutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocr
Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocrRobin Gutell
 
Protein Secondary Structure Prediction using HMM
Protein Secondary Structure Prediction using HMMProtein Secondary Structure Prediction using HMM
Protein Secondary Structure Prediction using HMMAbhishek Dabral
 
Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010Robin Gutell
 
Gutell 083.jmb.2002.321.0215
Gutell 083.jmb.2002.321.0215Gutell 083.jmb.2002.321.0215
Gutell 083.jmb.2002.321.0215Robin Gutell
 

Ähnlich wie Gutell 075.jmb.2001.310.0735 (20)

Gutell 085.jmb.2003.325.0065
Gutell 085.jmb.2003.325.0065Gutell 085.jmb.2003.325.0065
Gutell 085.jmb.2003.325.0065
 
Gutell 098.jmb.2006.360.0978
Gutell 098.jmb.2006.360.0978Gutell 098.jmb.2006.360.0978
Gutell 098.jmb.2006.360.0978
 
Gutell 068.rna.1999.05.1430
Gutell 068.rna.1999.05.1430Gutell 068.rna.1999.05.1430
Gutell 068.rna.1999.05.1430
 
Gutell 081.cosb.2002.12.0301
Gutell 081.cosb.2002.12.0301Gutell 081.cosb.2002.12.0301
Gutell 081.cosb.2002.12.0301
 
Gutell 061.nar.1997.25.01559
Gutell 061.nar.1997.25.01559Gutell 061.nar.1997.25.01559
Gutell 061.nar.1997.25.01559
 
Gutell 059.fold.design.01.0419
Gutell 059.fold.design.01.0419Gutell 059.fold.design.01.0419
Gutell 059.fold.design.01.0419
 
Gutell 054.jmb.1996.256.0701
Gutell 054.jmb.1996.256.0701Gutell 054.jmb.1996.256.0701
Gutell 054.jmb.1996.256.0701
 
Gutell 092.jmb.2004.344.1225
Gutell 092.jmb.2004.344.1225Gutell 092.jmb.2004.344.1225
Gutell 092.jmb.2004.344.1225
 
Gutell 080.bmc.bioinformatics.2002.3.2
Gutell 080.bmc.bioinformatics.2002.3.2Gutell 080.bmc.bioinformatics.2002.3.2
Gutell 080.bmc.bioinformatics.2002.3.2
 
Gutell 090.bmc.bioinformatics.2004.5.105
Gutell 090.bmc.bioinformatics.2004.5.105Gutell 090.bmc.bioinformatics.2004.5.105
Gutell 090.bmc.bioinformatics.2004.5.105
 
Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167Gutell 002.nar.1981.09.06167
Gutell 002.nar.1981.09.06167
 
Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104Gutell 062.jmb.1997.267.1104
Gutell 062.jmb.1997.267.1104
 
Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785Gutell 025.nar.1992.20.05785
Gutell 025.nar.1992.20.05785
 
Gutell 016.pnas.1989.086.03119
Gutell 016.pnas.1989.086.03119Gutell 016.pnas.1989.086.03119
Gutell 016.pnas.1989.086.03119
 
Application of graph theory in drug design
Application of graph theory in drug designApplication of graph theory in drug design
Application of graph theory in drug design
 
Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocr
Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocrGutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocr
Gutell 053.book r rna.1996.dahlberg.zimmermann.p111-128.ocr
 
Protein Secondary Structure Prediction using HMM
Protein Secondary Structure Prediction using HMMProtein Secondary Structure Prediction using HMM
Protein Secondary Structure Prediction using HMM
 
Lanjutan kimed
Lanjutan kimedLanjutan kimed
Lanjutan kimed
 
Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010Gutell 034.mr.1994.58.0010
Gutell 034.mr.1994.58.0010
 
Gutell 083.jmb.2002.321.0215
Gutell 083.jmb.2002.321.0215Gutell 083.jmb.2002.321.0215
Gutell 083.jmb.2002.321.0215
 

Mehr von Robin Gutell

Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803Robin Gutell
 
Gutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfigGutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfigRobin Gutell
 
Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473Robin Gutell
 
Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22Robin Gutell
 
Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Robin Gutell
 
Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011Robin Gutell
 
Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768Robin Gutell
 
Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Robin Gutell
 
Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485Robin Gutell
 
Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195Robin Gutell
 
Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Robin Gutell
 
Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200Robin Gutell
 
Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2Robin Gutell
 
Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043Robin Gutell
 
Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016Robin Gutell
 
Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535Robin Gutell
 
Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289Robin Gutell
 
Gutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.goodGutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.goodRobin Gutell
 
Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533Robin Gutell
 
Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931Robin Gutell
 

Mehr von Robin Gutell (20)

Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803Gutell 123.app environ micro_2013_79_1803
Gutell 123.app environ micro_2013_79_1803
 
Gutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfigGutell 118.plos_one_2012.7_e38203.supplementalfig
Gutell 118.plos_one_2012.7_e38203.supplementalfig
 
Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473Gutell 114.jmb.2011.413.0473
Gutell 114.jmb.2011.413.0473
 
Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22Gutell 117.rcad_e_science_stockholm_pp15-22
Gutell 117.rcad_e_science_stockholm_pp15-22
 
Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011
 
Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011Gutell 115.rna2dmap.bibm11.pp613-617.2011
Gutell 115.rna2dmap.bibm11.pp613-617.2011
 
Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768Gutell 113.ploso.2011.06.e18768
Gutell 113.ploso.2011.06.e18768
 
Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497Gutell 112.j.phys.chem.b.2010.114.13497
Gutell 112.j.phys.chem.b.2010.114.13497
 
Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485Gutell 111.bmc.genomics.2010.11.485
Gutell 111.bmc.genomics.2010.11.485
 
Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195Gutell 110.ant.v.leeuwenhoek.2010.98.195
Gutell 110.ant.v.leeuwenhoek.2010.98.195
 
Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277Gutell 109.ejp.2009.44.277
Gutell 109.ejp.2009.44.277
 
Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200Gutell 107.ssdbm.2009.200
Gutell 107.ssdbm.2009.200
 
Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2Gutell 106.j.euk.microbio.2009.56.0142.2
Gutell 106.j.euk.microbio.2009.56.0142.2
 
Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043Gutell 105.zoologica.scripta.2009.38.0043
Gutell 105.zoologica.scripta.2009.38.0043
 
Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016Gutell 104.biology.direct.2008.03.016
Gutell 104.biology.direct.2008.03.016
 
Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535Gutell 103.structure.2008.16.0535
Gutell 103.structure.2008.16.0535
 
Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289Gutell 102.bioinformatics.2007.23.3289
Gutell 102.bioinformatics.2007.23.3289
 
Gutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.goodGutell 101.physica.a.2007.386.0564.good
Gutell 101.physica.a.2007.386.0564.good
 
Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533Gutell 100.imb.2006.15.533
Gutell 100.imb.2006.15.533
 
Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931Gutell 099.nature.2006.443.0931
Gutell 099.nature.2006.443.0931
 

Kürzlich hochgeladen

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Kürzlich hochgeladen (20)

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Gutell 075.jmb.2001.310.0735

  • 1. AA.AG@Helix.Ends: A:A and A:G Base-pairs at the Ends of 16 S and 23 S rRNA Helices

Tricia Elgavish¹, Jamie J. Cannone², Jung C. Lee³, Stephen C. Harvey¹ and Robin R. Gutell²*

¹ Department of Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
² Institute for Cellular and Molecular Biology, University of Texas at Austin, 2500 Speedway, Austin, TX 78712-1095, USA
³ Division of Medicinal Chemistry, College of Pharmacy, University of Texas at Austin, Austin, TX 78712, USA
*Corresponding author

This study reveals that AA and AG oppositions occur frequently at the ends of helices in RNA crystal and NMR structures in the PDB database and in the 16 S and 23 S rRNA comparative structure models, with the G usually 3′ to the helix for the AG oppositions. In addition, these oppositions are frequently base-paired and usually in the sheared conformation, although other conformations are present in NMR and crystal structures. These A:A and A:G base-pairs are present in a variety of structural environments, including GNRA tetraloops, E and E-like loops, interfaced between two helices that are coaxially stacked, tandem G:A base-pairs, U-turns, and adenosine platforms. Finally, given structural studies that reveal conformational rearrangements occurring in regions of the RNA with AA and AG oppositions at the ends of helices, we suggest that these conformationally unique helix extensions might be associated with functionally important structural rearrangements.

© 2001 Academic Press

Keywords: ribosomal RNA structure; comparative sequence analysis; A:A and A:G base-pairs (non-canonical pairs); structure motifs; computational biology/bioinformatics (coaxial stacking)

Introduction

Our ultimate goal is to accurately predict RNA secondary and tertiary structure from its sequence.
To begin to achieve this objective, we need a detailed set of RNA structure rules and principles that relate sequences to small structural elements as well as to global structure. Given that the number of possible secondary structures for an RNA sequence is very large (http://www.rna.icmb.utexas.edu/METHODS/) and the current set of RNA structure principles within the best of the RNA folding algorithms [1,2] are not adequate to achieve these goals [3,4], we have utilized comparative sequence analysis [5,6] to identify those base-pairs that would form similar structures for a set of sequences considered to be structurally and functionally equivalent. Traditionally, we have searched for positions in a sequence alignment with similar patterns of variation (also called covariation). Due to the strong congruence between these covariation-based comparative structure models and crystal structure solutions [7] (Gutell et al., unpublished results), we are very confident in the authenticity of these proposed base-pairs. While the majority of the positions that covary with one another are associated with secondary structure base-pairs, there are a few short- and long-range tertiary interactions in the rRNAs [8] (CRW Site; see Materials and Methods). We now aspire to predict additional base-pairings at the positions that are not base-paired in the covariation-based structure models. These base-pairs would add more secondary structure to the current comparative structure models and fold this model into a three-dimensional structure. Both of these latter aspirations will require a different type of comparative sequence analysis that goes beyond simple covariation analysis.

Operationally, we define comparative sequence analysis as the general method that identifies structures that are common to different sequences, while covariation analysis is the method that identifies positions in a sequence alignment with similar patterns of variation.
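As a rough illustration of what "positions with similar patterns of variation" means in practice, the sketch below scores every pair of alignment columns by mutual information and reports the pairs that vary together. Mutual information is a commonly used covariation score, but it is an assumption here rather than the measure used in this paper; the function names, the threshold, and the toy alignment are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def covarying_pairs(alignment, threshold=0.5):
    """List (i, j, score) for column pairs scoring at or above threshold."""
    length = len(alignment[0])
    hits = []
    for i, j in combinations(range(length), 2):
        score = mutual_information(column(alignment, i), column(alignment, j))
        if score >= threshold:
            hits.append((i, j, score))
    return hits


# Hypothetical toy alignment: columns 0 and 4 change in step
# (G:C, C:G, A:U, U:A) while columns 1-3 are invariant.
TOY_ALIGNMENT = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
]

print(covarying_pairs(TOY_ALIGNMENT))
```

Invariant columns score zero against every other column, which is one way to see why covariation alone cannot propose pairings at conserved AA or AG oppositions.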
Covariation analysis will identify a subset of the total number of base-pairs that are in common to different sequences. While this latter type of analysis identifies structurally isomorphic base-pairs (e.g. A:U, G:C, C:G, and U:A) from the identification of positions with similar patterns of variation in a sequence alignment, it is possible to form isomorphic base-pair conformations from two positions that have different patterns of variation. To identify these, we need to know, a priori, the base-pair exchanges (e.g. G:U to G:C or A:G to A:A) that will form isomorphic base-pair conformations within a specific structural context. A few years ago, we developed a computer program that would return the isomorphic base-pair conformations that are possible for any known set of pairing types [9]. However, this system by itself will not help us to identify new base-pairs at positions with no matching pattern of variation since, without additional information, we do not know which positions to base-pair. Ultimately, we need to have a larger set of structural constraints that will help us decipher the unique patterns of variation into isomorphic structures.

E-mail address of the corresponding author: robin.gutell@mail.utexas.edu
Abbreviations used: PDB, Protein Data Bank.
doi:10.1006/jmbi.2001.4807, available online at http://www.idealibrary.com
J. Mol. Biol. (2001) 310, 735–753
0022-2836/01/040735–19 $35.00/0 © 2001 Academic Press

Beyond the canonical base-pairs (A:U, G:C, G:U) that are arranged into the standard secondary structure helices and tertiary interactions, several other RNA structural motifs have been identified with a sequence analysis perspective [5,6,8,10]. These include tetraloops [11], lone-pair tri-loops [8], pseudoknots [6,12,13], dominant G:U base-pairs [14], tandem G:A base-pairs [15], E-loops [15–17], U-turns [18], base triples [19,20], tetraloop receptors [21–23], adenosine platforms [24,25], and base-pairs arranged in parallel [6]. A structural perspective of these RNA motifs is presented in two recent reviews [26,27].

In addition to the comparative sequence analysis of these RNA motifs, it was first observed in the early 1980s that helices in Escherichia coli 16 S rRNA were frequently flanked by AG oppositions [28,29]. Consistent with this observation, it was observed that the majority of the 3′ ends of loops are an adenosine while the 5′ ends of loops are an adenosine or guanosine in the covariation-based 16 S and 23 S rRNA structure models [25]. An AG opposition (where an opposition refers to two bases on opposite strands at the end of a helix that are in proximity with one another) at positions 1056:1103 (E. coli numbering) is base-paired in the crystal structure for the L11 binding fragment of 23 S rRNA [30]. Position 1056 is a G in the majority of the Bacteria, Archaea, and chloroplasts, while it is an A in the majority of the Eucarya. Position 1103 is an A in nearly all of the Bacteria, Archaea, Eucarya, and chloroplasts. Thus, from a comparative perspective, we expect the majority of the Eucarya with an A at position 1056 to form an A1056:A1103 base-pair. The experimental support for this A:G base-pair, in addition to the earlier AG sightings at the ends of E. coli 16 S rRNA helices and the bias for unpaired As and Gs at the ends of helices, suggested that many helices in the rRNAs might be flanked with A:G and A:A base-pairs. During the preparation of this manuscript, high-resolution crystal structures were determined for the 30 S and 50 S ribosomal subunits [31–33].

Our objectives for this paper are: (1) to identify the conserved AA and AG oppositions at the helix ends in the comparative structure models for 16 S and 23 S rRNA, (2) to determine if AA and AG oppositions are base-paired in all RNA crystal and NMR structures that contain an AA or AG at the end of a standard helix, and (3) to determine the conformations for these A:A or A:G base-pairs.

Results

Comparative sequence analysis of the ends of rRNA helices

The nucleotide frequencies at the positions flanking the ends of all helices in our 16 S and 23 S rRNA alignments (see Materials and Methods and CRW Site) were determined for the nuclear encoded rRNAs from the three major phylogenetic groups (Bacteria, Archaea, and Eucarya) and the two Eucarya organelles (chloroplasts and mitochondria). Only helix ends in the Bacteria with an AA, AG, or AA/AG in more than 90 % of the sequences were scored as candidates.
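The 90 % criterion just described can be sketched as a small counting routine. This is only an illustration under stated assumptions: each sequence's opposition at a given helix end is represented as a two-letter string, and the names `aa_ag_fraction`, `is_candidate`, and the example data are hypothetical rather than taken from the authors' software.

```python
from collections import Counter

# Oppositions counted toward the criterion (assumed two-letter encoding).
AA_AG = {"AA", "AG"}


def aa_ag_fraction(oppositions):
    """Fraction of sequences whose opposition at this helix end is AA or AG."""
    counts = Counter(oppositions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[pair] for pair in AA_AG) / total


def is_candidate(oppositions, threshold=0.90):
    """True when AA/AG occurs in more than `threshold` of the sequences."""
    return aa_ag_fraction(oppositions) > threshold


# Hypothetical helix end observed in 100 aligned sequences.
example_site = ["AG"] * 93 + ["AA"] * 4 + ["GU"] * 3
print(aa_ag_fraction(example_site), is_candidate(example_site))
```

The comparison is strict because the text says "more than 90 %"; a site at exactly 90 % would not be scored under this reading.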
Since approximately 90 % of the AG oppositions have the G 3′ of the helix, we have focused on this orientation in this manuscript and in Table 1. However, a small number (eight in rRNA and 14 in the PDB structure database) of examples of AG oppositions where the G is 5′ to the helix are discussed below.

All oppositions were subdivided into two categories: invariant and exchange. Invariant sites contain only AA or AG in the Bacterial alignment, while sites with both types of pairings (where the minimum for each pairing is 2 %) in at least one of the primary alignments (Archaea, Bacteria, Eucarya nuclear, chloroplast, or mitochondrial) were classified as exchanges. These oppositions are mapped onto the December 1999 version of the E. coli 16 S and 23 S rRNA covariation-based structure models (Figure 1; CRW Site). The base-pair frequencies for each of the AA and AG sites for each of the 16 S and 23 S alignments (Archaea, Bacteria, Eucarya nuclear, chloroplast, and mitochondrial) are all available at our web site, CRW AA.AG (see Materials and Methods).

There are 139 oppositions (as defined above) in the 16 S and 263 oppositions in the 23 S rRNA comparative structure models. In the hypothetical world where the frequency of each of the four nucleotides is 25 % at paired and unpaired positions and there is no bias for any nucleotide pairs at these positions, for each opposition, we expect a 12.5 % (2/16) chance of finding an AA or AG. Thus, for any one rRNA sequence, we expect, based upon this random sampling, there to be approximately 17 (139 × 0.125) sites in 16 S and 33 (263 × 0.125) sites in 23 S rRNA with an AA or AG opposition at the end of a helix (referred to hereunder as AA.AG@helix.ends). The expected number of AA and AG sites that occur at the same positions in 90 % of 5850 Bacterial 16 S sequences is 1.7 × 10⁻⁴⁷⁵⁵, and for 325 Bacterial 23 S rRNA sequences the number is 7.0 × 10⁻²⁶⁵. Thus, we conclude that the odds of finding the same pattern in 90 % of the sequence sets by random chance are extremely low; however, 30 % of the oppositions at the ends of 16 S rRNA helices (42 of 139) and 28 % of the oppositions at the ends of 23 S rRNA helices (73 of 263) have an AA or AG opposition in at least 90 % of the sequences.

Since the 1056:1103 base-pair in 23 S rRNA has a significant number of AA and AG oppositions with a minimal number of alternative base-pairs, we have flagged this base-pair, along with other similar positions that also have a more significant extent of A:A and A:G pairings. These sites are shown in Figure 1 with red and green asterisks on the 16 S and 23 S rRNA secondary structure diagrams and within the AA/AG base-pair frequency tables (CRW AA.AG Online Table 4). The red asterisk sites contain only AA and AG in all of the Archaea, Bacteria, Eucarya nuclear and chloroplast alignments, with a minimum number of exceptions. The 23 S rRNA 1056:1103 site contains significant amounts of AA/AG pairings in nearly all of the non-mitochondrial sequences; only a few sequences out of 582 do not have an AA or AG. The other red asterisk sites in 23 S rRNA are 627:636 and 2126:2162; sites with comparable nucleotide frequencies in 16 S rRNA are 780:802, 888:909, 959:976, 1408:1493, 1417:1483, and 1418:1482.

The green asterisks (Figure 1; CRW AA.AG) reveal those sites with significant amounts of AA/AG exchanges with a minimal amount of other oppositions in at least one alignment while at least one other alignment contains a larger number of exceptions to the pure AA/AG exchange pattern. Green sites in the 16 S rRNA are: 26:557, 60:107, 197:220, 447:487 (with a large percentage of Watson-Crick/G:U base-pairs in the Archaea), 691:696, 860:869, 1157:1179, and 1304:1333.
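The random-expectation argument earlier in this section (an AA or AG at 2 of the 16 possible ordered nucleotide pairs) can be checked with a few lines of arithmetic. The sketch below reproduces only the per-sequence expectations and the observed percentages quoted in the text; it does not attempt to reproduce the 1.7 × 10⁻⁴⁷⁵⁵ and 7.0 × 10⁻²⁶⁵ figures, whose derivation is not given in this section. The function and variable names are hypothetical.

```python
from fractions import Fraction

# Chance of AA or AG at one opposition under uniform, independent bases.
P_AA_OR_AG = Fraction(2, 16)


def expected_sites(n_oppositions, p=P_AA_OR_AG):
    """Expected number of AA/AG oppositions in a single random sequence."""
    return float(n_oppositions * p)


def observed_fraction(n_conserved, n_oppositions):
    """Fraction of oppositions that are AA/AG in at least 90 % of sequences."""
    return n_conserved / n_oppositions


# Counts quoted in the text for the comparative structure models.
OPPOSITIONS = {"16S": 139, "23S": 263}
CONSERVED = {"16S": 42, "23S": 73}

for rrna, n in OPPOSITIONS.items():
    expected = expected_sites(n)
    observed = observed_fraction(CONSERVED[rrna], n)
    print(f"{rrna}: expected {expected:.1f} sites per sequence, "
          f"observed conserved fraction {observed:.0%}")
```

Rounded, these values correspond to the approximately 17 and 33 expected sites and the 30 % and 28 % observed fractions stated in the text.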
Green asterisk sites in 23 S rRNA are 244:254, 463:466, 602:655, 603:625, 637:651 (with a large percentage of Watson-Crick base-pairs in the Archaea), 861:916, 945:972, 975:988, 1000:1155, 1354:1377, 1655:2005, 1791:1828, 2125:2173, 2199:2224, 2287:2345, 2346:2371, 2358:2429, 2587:2607, and 2639:2775.

Table 1. Distribution of AA/AG oppositions (with G 3′ to helix for AG oppositions) in the bacterial 16 S and 23 S rRNA comparative structure models

Loop type         Hairpin                    Internal                   Multi-stem
Opposition        C[+,ø,−](a)   [S,I,O](b)   C[+,ø,−](a)   [S,I,O](b)   C[+,ø,−](a)   [S,I,O](b)   Co(c)   Cr(d) (%)
16 S rRNA
  Invariant       7[7,0,0]      [7,0,0]      9[6,0,3]      [3,2,1]      5[4,0,1]      [0,2,2]      21      17 (81%)
    AA            0[0,0,0]      [0,0,0]      5[2,0,3]      [0,1,1]      1[1,0,0]      [0,1,0]      6       3 (50%)
    AG            7[7,0,0]      [7,0,0]      4[4,0,0]      [3,1,0]      4[3,0,1]      [0,1,2]      15      14 (93%)
  Exchange        2[2,0,0]      [2,0,0]      10[9,1,0]     [7,1,1]      9[4,0,5]      [2,0,2]      20      15 (75%)
  Total           9[9,0,0]      [9,0,0]      19[15,1,3]    [10,3,2]     14[8,0,6]     [2,2,4]      41      32
  % xtal.str.(e)  9/9 = 100%                 15/18 = 83%                8/14 = 57%                 32/41 = 78%
23 S rRNA
  Invariant       11[9,2,0]     [9,0,0]      13[10,2,1]    [9,0,1]      13[6,1,6]     [5,1,0]      32      25 (78%)
    AA            0[0,0,0]      [0,0,0]      4[1,2,1]      [0,0,1]      4[0,0,4]      [0,0,0]      6       1 (17%)
    AG            11[9,2,0]     [9,0,0]      9[9,0,0]      [9,0,0]      9[6,1,2]      [5,1,0]      26      24 (92%)
  Exchange        4[2,1,1]      [2,0,0]      12[8,3,1]     [8,0,0]      20[9,6,5]     [7,0,2]      26      19 (74%)
  Total           15[11,3,1]    [11,0,0]     25[18,5,2]    [17,0,1]     33[15,7,11]   [12,1,2]     58      44
  % xtal.str.(e)  11/12 = 92%                18/20 = 90%                15/26 = 58%                44/58 = 76%
rRNA Total        24[20,3,1]    [20,0,0]     44[33,6,5]    [27,3,3]     47[23,7,17]   [14,3,6]     99      76
  % xtal.str.(e)  20/21 = 95%                33/38 = 87%                23/40 = 58%                76/99 = 77%
  S               20/20 (100%)               27/33 (82%)                14/23 (61%)                61/76 (80%)
  I                                          3/33 (9%)                  3/23 (13%)                 6/76 (8%)
  O                                          3/33 (9%)                  6/23 (26%)                 9/76 (12%)

(a) C, number of predicted base-pairings based on the bacterial structure; +, number of predicted pairings that are present in the crystal structure; ø, number of predicted pairings for which there is no homologous structure in the crystal structures (see the text for details); −, number of predicted pairings that are not present in the crystal structure.
(b) Conformation of the base-pair: S, sheared; I, imino or imino-like; O, other.
(c) Co, the total number of homologous base-pairs from that category in the comparative structure model.
(d) Cr, the total number and percentage of base-pairs in the crystal structure.
(e) The percentage of base-pairs predicted with comparative analysis that are present in the crystal structure ["+"/("C" − "ø")]. The S, I and O rows give the percentage of the base-pairs having each conformation: S, sheared; I, imino; O, other.

Orientation of the AG oppositions

There are two orientations possible for AG oppositions relative to the helix to which they are adjacent: the G can be 5′ or 3′ to the adjacent helix. The analysis of an early version of the E. coli 16 S rRNA comparative structure model revealed that
the G tends to be at the 3′ end of the helix [28]. Our analysis here of the most recent versions of a large number of phylogenetically diverse 16 S and 23 S rRNA comparative structure models is consistent with this earlier result. Of the invariant AG and AA/AG oppositions that flank a helix, approximately 87 are oriented with the G 3′ to the helix, while eight AG oppositions have the G 5′ to the helix. This result, as discussed later, is consistent with the types and frequencies of A:A and A:G base-pair conformations present in the crystal structures.

[Figure 1, panels (a) to (c): E. coli secondary structure diagrams for (a) 16 S rRNA, (b) 23 S rRNA, 5′ half, and (c) 23 S rRNA, 3′ half. The legend is given later in the text.]
An analysis of the helix ends in the crystal and NMR structures and in the 16 S and 23 S rRNA crystal structures

AA.AG@helix.ends in rRNAs

An analysis of approximately 6000 Bacterial 16 S and over 300 23 S rRNA sequences aligned for maximum structure similarity revealed 115 helix ends with AA, AG, and AA/AG oppositions in more than 90 % of the sequences (Table 1 and Figure 1). These are proportionately distributed in the 16 S and 23 S rRNAs, with 42 occurrences in 16 S and 73 in 23 S rRNA, and are present in the three loop categories, with 24 candidates in hairpins, 44 in internal loops, and 47 in multi-stem loops. Invariant and exchange cases occur at nearly the same frequencies. 75 % of the invariant sites contain an AG opposition, while only 25 % have an AA (Table 1). In addition, there is a bias for invariant A:G base-pairs in hairpin loops (with the majority of these occurring in tetraloops [11]), and a slight bias for multi-stem loops to have AA/AG exchanges (Table 1). The nucleotide frequencies for a larger set of sequences (approximately 8500 16 S and over 1000 23 S rRNA sequences) that includes the nuclear encoded rRNAs in the three primary phylogenetic groups, Archaea, Bacteria and Eucarya, and the two Eucarya organelles, chloroplasts and mitochondria (see Online Table 4 at CRW AA.AG), reveal that the majority of the positions contain the AA and AG oppositions in all of the alignments and phylogenetic groups, while some of the AA and AG oppositions in the Bacteria contain AU/GC or other nucleotide sets in one or more of the non-bacterial alignments. For example, 23 S rRNA positions 637:651 and 713:718 both contain AG oppositions in nearly all of the Bacteria, and both exchange between G:C and C:G in the Archaea.

During the preparation of this manuscript, the crystal structures for the 30 S [32,33] and 50 S [31] ribosomal subunits were solved. We have analyzed these structures to determine if the AA and AG oppositions at the ends of helices that occur in more than 90 % of the known Bacterial rRNA sequences are base-paired in the crystal structures. A total of 99 of the 115 Bacterial-centric oppositions were resolved in the crystal structures and had homologous positions in the Thermus thermophilus 16 S and Haloarcula marismortui 23 S rRNA crystal structures; these are tabulated in Table 1 and highlighted on the 16 S and 23 S rRNA secondary structure diagrams in Figure 1. Of these 99, 76 (77 %) form an A:A or A:G base-pair (78 % (32/41) in 16 S and 76 % (44/58) in 23 S rRNA). Invariant AG oppositions (41 examples) at the ends of helices occur more frequently than invariant AA oppositions (12 examples) in the 16 S and 23 S rRNAs (Table 1); our analysis of the rRNA crystal structures reveals that the AG oppositions form base-pairs more frequently than the AA oppositions. The 99 homologous oppositions have a slightly biased distribution in the three unpaired loop categories. A total of 40 % (40/99) occur in multi-stem loops, 38 % (38/99) in internal loops, and 21 % (21/99) in hairpin loops.

A total of 20 of the 21 (95 %) homologous AA and AG candidates in hairpin loops are base-paired (Table 1 and Figure 1). GNRA tetraloops occur at 62 % (13/21) of these hairpin loops, and all of these have base-pairing between the first and last nucleotide of the hairpin loop. As well, six of the seven (86 %) homologous hairpin loops with more than four nucleotides also have base-pairing at the two ends of the loop. Finally, all of these base-pairs are in the sheared conformation.

For the AA and AG oppositions at the ends of helices in internal loops, 87 % (33/38) are base-paired (83 % and 90 % of the 16 S and 23 S rRNA candidates).
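The base-pairing percentages quoted in this section follow the convention of Table 1, footnote (e): the number of candidates that are paired in the crystal structure divided by the number of candidates that have a homologous position there. The sketch below illustrates that bookkeeping using the rRNA totals transcribed from Table 1. The dictionary layout and function names are illustrative choices rather than anything used by the authors, and exact fractions are used so that no rounding is implied.

```python
from fractions import Fraction

# rRNA totals transcribed from Table 1, as (C, plus, no_homolog, minus):
#   C          predicted candidates in the comparative models
#   plus       candidates base-paired in the crystal structure
#   no_homolog candidates with no homologous position in the crystal structure
#   minus      candidates that are not base-paired in the crystal structure
RRNA_TOTALS = {
    "hairpin": (24, 20, 3, 1),
    "internal": (44, 33, 6, 5),
    "multi-stem": (47, 23, 7, 17),
}


def paired_fraction(c, plus, no_homolog, minus):
    """Fraction of homologous candidates that are paired: plus / (C - no_homolog)."""
    if plus + no_homolog + minus != c:
        raise ValueError("counts do not add up to C")
    return Fraction(plus, c - no_homolog)


def overall_fraction(rows):
    """Pool several loop categories before taking the ratio."""
    plus = sum(row[1] for row in rows)
    homologous = sum(row[0] - row[2] for row in rows)
    return Fraction(plus, homologous)


if __name__ == "__main__":
    for loop, counts in RRNA_TOTALS.items():
        print(f"{loop}: {float(paired_fraction(*counts)):.3f}")
    print(f"all loops: {float(overall_fraction(RRNA_TOTALS.values())):.3f}")
```

Excluding the non-homologous candidates from the denominator is what turns the 115 predicted oppositions into the 99 that can actually be scored against the crystal structures.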
In contrast with the hairpin loops, where 76 % (16/21) of the candidates have an invariant AG, 47 % (18/38) of the internal loops have an AA/AG exchange, while only 34 % (13/38) have an invariant AG. All of the invariant AG oppositions are base-paired, and all except one of these (92 %) form a sheared conformation. All but one of the 18 (94 %) AA/AG exchanges are also base-paired. 15 of the 17 (88 %) base-paired AA/AG exchanges are in the sheared conformation, one is in the imino conformation, and the other is in the unusual A:G N3-amino base-pair conformation (see CRW AA.AG Online Figure 3 for chemical structure drawings and abbreviations used in other online materials). A lower percentage of base-pairing occurs with the invariant AA oppositions. Here, base-pairing occurs in only three of the seven (43 %) homologous invariant AA oppositions. None of these form a sheared conformation, one forms an imino conformation, and two form unusual conformations. On the whole, the sheared conformation occurs in 82 % (27/33) of the paired oppositions in internal loops. 9 % (3/33) have the imino conformation and the remaining 9 % (3/33) have another type of conformation (Table 1).

Figure 1. E. coli 16 S and 23 S rRNA comparative secondary structure models (based upon the sequences in GenBank Accession no. J01695) showing the AA and AG oppositions at the ends of helices that occur in more than 90 % of the bacterial sequences. These opposed nucleotides are shown in red. Highlights indicate additional information from crystal structures: orange, opposition is base-paired in the crystal structure; green, candidate is not base-paired in the crystal structure; blue, candidate is not homologous, was not determined or is a Watson-Crick base-pair in the crystal structure (e.g. this region is deleted, or is not an AA or AG opposition in the sequence of the organism that was crystallized). Candidates with AA/AG exchanges are marked with asterisks: red, significant exchanges in all alignments with minimal exceptions; green, significant exchanges in at least one alignment with minimal exceptions but with more exceptions in at least one other alignment; blue, exchanges in at least one alignment (excluding mitochondria). Nucleotides which are base-paired in the crystal structures but not in the comparative structure models which affect potential coaxial stacking and AA/AG oppositions that are not base-paired are colored blue and connected with blue lines and boxes to indicate the base-pairing. Highlights within helices indicate potential coaxial stacking: brown, not present in crystal structure; yellow, present in crystal structure. Base-pairs predicted with covariation analysis are denoted with - for canonical A:U and G:C base-pairs, small closed circles for G:U base-pairs, large open circles for G:A base-pairs, and large closed circles for non-canonical base-pairs. (a) 16 S rRNA (crystal structure: T. thermophilus [33]). (b) 23 S rRNA, 5′ half (crystal structure: H. marismortui [31]). (c) 23 S rRNA, 3′ half (crystal structure: H. marismortui [31]).

[Figure 2, panels (a) to (i): chemical drawings with front and side stereo views of sheared A:G, imino A:G, and A:A base-pairs. The legend is given later in the text.]

Of the three loop categories, the lowest percentage of base-pairs for AA/AG oppositions at the ends of helices occurs in multi-stem loops. Here, 58 % (23/40) of these candidates are base-paired in the 16 S and 23 S rRNA. Within this category, the highest percentage of base-pairings occurs for the invariant AG oppositions, where 75 % (9/12) are base-paired. Base-pairing occurs in 57 % (13/23) of the AA/AG exchanges, and for only one of five (20 %) invariant AA oppositions. 61 % (14/23) of the AA/AG oppositions in multi-stem loops form sheared conformations, 13 % (3/23) form the imino conformation, and six (26 %) form other conformation types.

For these rRNA oppositions, the highest percentage of base-pairs occur for the invariant AGs, followed by the AA/AG exchanges, with the lowest percentage of pairing in multi-stem loops (Table 1). 93 % (38/41) of the invariant AG oppositions are base-paired, 74 % (34/46) of the AA/AG exchanges are base-paired, and only 33 % (4/12) of the invariant AAs are base-paired.

Several conformations are possible for these A:G base-pairs. The most common and well-characterized are sheared and imino (Figure 2(a) and (d) [34]).
The sheared conformation occurs in 80 % (61/76) of the base-paired oppositions of the 16 S and 23 S rRNAs. The sheared conformation forms in 87 % (33/38) of the invariant A:G base-pairs (Figure 2(a)), in 82 % (28/34) of the AA/AG exchanges, and does not occur in any of the four invariant A:A base-pairs (Figure 2(g), top). An imino or imino-like conformation occurs six times (6/76 = 8 %) in the 16 S and 23 S rRNAs. It forms in 8 % (3/38) of the invariant A:G base-pairs (Figure 2(d)), in just one of the 34 (3 %) AA/AG exchanges and in two of the four (50 %) invariant A:A base-pairs (Figure 2(g), bottom). Beyond these two well-characterized conformations, there are five other conformations (CRW AA.AG Online Figure 3 and Online Table 4): (1) A:A N7-amino ("A7-1"; one in 16 S rRNA at positions 1248:1289); (2) A:A N7-amino symmetric ("A7"; one in 23 S rRNA at positions 1689:1698); (3) A:G N1-amino ("G1"; one in 16 S rRNA at positions 983:1222); (4) A:G N7-amino ("G7"; one in 23 S rRNA at positions 149:177); and (5) A:G N3-amino ("G3"; four in 16 S rRNA at positions 60:107, 197:220, 687:700, and 1067:1108; one in 23 S rRNA at positions 627:636).

There are eight examples of the A:G base-pair in the 16 S and 23 S rRNA crystal structures where the G is 5′ to the helix. These occur at 16 S rRNA positions 112:315, 143:220, 321:332, 945:1236, 1160:1176, and 1357:1365, and at 23 S rRNA positions 75:111 and 2547:2561 (Figure 1 and base-pair frequency tables at CRW AA.AG). Five of these base-pairs were already in the covariation-based rRNA structure models, with exchanges between the G:A and G:C/G:U/A:U or A:G base-pairs. The remaining three had minor exchanges with G:C/G:U/A:U base-pairs. All eight of these rRNA base-pairs are in the imino conformation, which is consistent with the similarity between the G:A imino and Watson-Crick conformations.
AA.AG@helix.ends in the PDB structure database

To appreciate the conformation and structural details of these AA and AG oppositions at the ends of rRNA helices, and to establish a set of rules for RNA structure principles that define them and will help us predict their occurrence in the future, we have also analyzed the ends of helices in the crystal and NMR structures available at the PDB structure database (http://www.rcsb.org/pdb/ [35]). The crystal and NMR RNA structures that are analyzed and discussed below are summarized in Table 2 and detailed in CRW AA.AG Online Table 5. These 29 crystal and 41 NMR structures contain 116 AA and AG oppositions (61 in crystal structures and 55 in NMR structures) at the end of a helix. The 70 structures can be divided by RNA molecule into the following categories: 12 rRNA structures (22 cases), 11 tRNA structures (22 cases), four group I intron structures (14 cases), and 43 other RNA structures (58 cases), including one SRP structure (three cases), five ribozyme structures (nine cases), five pseudoknot structures (five cases), and four Rev response element structures (six cases).

Figure 2. Stereo views of A:G and A:A base-pairs at helix ends in different structural motifs from X-ray crystallography. NMR structures are omitted for clarity. The A in each base-pair is superimposed on the left of each panel. Chemical drawings were created using ISIS/Draw and stereo images were created using Insight II. (a) Chemical drawing of the G:A sheared base-pair (G:A N3-amino, amino-N7 base-pair [34]). (b) Front view of sheared A:G base-pairs: blue, GNRA tetraloop; yellow, E loop; green, tandem GA; red, helix end. (c) Side view of (b). (d) Chemical drawing of the G:A imino base-pair (G:A carbonyl-amino, imino-N1 base-pair [34]). (e) Front view of imino A:G base-pairs: blue, 5′ helix end; yellow, 3′ helix end. (f) Side view of (e). (g) Chemical drawings of the A:A sheared-like base-pair (top; A:A N3-amino base-pair [44]) and the A:A imino-like base-pair (bottom; A:A N1-amino base-pair [44]). (h) Front view of A:A base-pairs: yellow, N1-amino conformation; blue, N3-amino conformation; red, N7-amino conformation; green, tandem; gray, triple. (i) Side view of (h).

For the PDB structure database (Table 2), 80 % (93/116) of the oppositions are base-paired. AG oppositions at the ends of helices occur more frequently than AA oppositions in the PDB structure database (Table 2). Our analysis of the structure database reveals that the AG oppositions form base-pairs more frequently than the AA oppositions. These oppositions also have a biased distribution in the three loop categories. 44 % (51/116) occur in internal loops, 39 % (45/116) in hairpin loops, and 17 % (20/116) in multi-stem loops. There is an even distribution of oppositions that are base-paired in these loops: 76 % (34/45) in the hairpin loops, 82 % (42/51) in the internal loops, and 85 % (17/20) in the multi-stem loops.

A total of 90 % (70/78) of the AG oppositions at the ends of helices in the PDB structure database (Table 2) are base-paired. These include both orientations (i.e. G 5′ and 3′ to the helix, and GA tandems). However, 70 % (54/78) have the G 3′ to the helix. 67 % (47/70) of the A:G base-pairs are in the sheared conformation (Figure 2(a)), 30 % (21/70) are in the imino conformation (Figure 2(d)), and 3 % (2/70) form the G:A+ carbonyl-amino, N7-N1 base-pair conformation (Online Figure 3(e)).

When the G is 3′ to the helix in the examples in Table 2, the sheared conformation is formed in 83 % (40/48) of the A:G base-pairs. 12 % (6/48) are in the imino conformation, and 4 % (2/48) form other conformations. These A:G sheared base-pairs are often a component of a larger motif that we currently recognize. All 16 examples of A:G base-pairs in GNRA tetraloops are in the sheared conformation, and all of the A:G base-pairs in hairpin loops and at the end of a helix are in the sheared conformation (with the G 3′ to the helix).
All 11 of the A:G base-pairs in the E-loop and E-like loop cases that occur in internal and multi-stem loops are also sheared. 14 of the 22 other A:G base-pairs with the G 3′ to the helix are also in the sheared conformation. The sheared conformation induces a bend in the backbone that does not distort the flanking helix when the G is 3′ to the helix; however, the flanking helix will be distorted when the G is 5′ to the helix. The observed bias for sheared conformations for those A:G base-pairs oriented with the G 3′ to the helix is consistent with this topological constraint. However, there is one example from a lower-resolution crystal structure of a sheared A:G base-pair when the G is 5′ to the helix; this base-pair is at positions A299:G279 in the Tetrahymena thermophila group I intron, with 3-4 Å between the hydrogen bonding pairs [36].

In contrast with the sheared conformation, A:G base-pairs at the ends of helices can adopt an imino conformation [34] that can form at either end of a helix (with the G 5′ or 3′ to the helix) without distorting the surrounding base-pairs. There are six examples in Table 2 where an A:G base-pair (with the G 3′ to the helix) forms an imino conformation. There are also a few examples where an A:G base-pair with this orientation in Table 2 adopts another conformation type (see below). As well, 71 % (15/21) of the A:G base-pairs in the imino conformation (including the two tandem GA cases) are oriented with the G 5′ to the helix (Table 2). 93 % (13/14) of the single A:G base-pairs with the G 5′ to the helix are in the imino conformation; the other is a sheared base-pair (see above).
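The link between orientation and conformation described above can be checked directly against the counts in the Total column of Table 2. The sketch below is only an illustrative cross-tabulation: the dictionary keys and function names are hypothetical, and, following the wording of the text, the two imino base-pairs from the GA tandem row are grouped with the G 5′ cases when the imino share is computed.

```python
from fractions import Fraction

# Base-paired A:G counts from the Total column of Table 2,
# as (sheared, imino, other) for each orientation category.
PAIRED_AG = {
    "G 3' to helix": (40, 6, 2),
    "G 5' to helix": (1, 13, 0),
    "GA tandem": (6, 2, 0),
}

CONFORMATION_INDEX = {"sheared": 0, "imino": 1, "other": 2}


def conformation_share(orientation, conformation):
    """Share of base-pairs with this orientation that adopt the conformation."""
    counts = PAIRED_AG[orientation]
    return Fraction(counts[CONFORMATION_INDEX[conformation]], sum(counts))


def imino_share_with_g5():
    """Share of all imino A:G pairs attributed to G 5' in the text.

    Assumption taken from the text: the tandem GA imino cases are counted
    together with the single G 5' base-pairs.
    """
    imino_total = sum(counts[1] for counts in PAIRED_AG.values())
    g5_like = PAIRED_AG["G 5' to helix"][1] + PAIRED_AG["GA tandem"][1]
    return Fraction(g5_like, imino_total)


if __name__ == "__main__":
    print(float(conformation_share("G 3' to helix", "sheared")))
    print(float(conformation_share("G 5' to helix", "imino")))
    print(float(imino_share_with_g5()))
```

The three ratios correspond to the 40/48, 13/14 and 15/21 figures quoted in the text, which makes the asymmetry explicit: sheared pairs dominate when the G is 3′ to the helix, and imino pairs dominate when it is 5′.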
There are two examples of tandem G:A imino base-pairs where the G is 3′ to the helix in one case and 5′ to the helix in the other [37]. A total of four of the six examples of imino A:G base-pairs with the G 3′ to a helix are in single nucleotide bulges, adjacent to the A:G or A:A base-pair, where only one nucleotide remains unpaired [38-41]. In these instances, an imino conformation, with its non-helix-distorting properties, may be preferred over the sheared conformation.

Table 2. Distribution of AA and AG juxtapositions at the ends of helices in the structures in the PDB Structure Database

Loop type     Hairpin                 Internal                Multi-stem              Total
Opposition    C[+,−](a)   [S,I,O](b)  C[+,−](a)   [S,I,O](b)  C[+,−](a)   [S,I,O](b)  C[+,−](a)   [S,I,O](b)
AA            18[11,7]    [11,0,0]    14[8,6]     [4,1,3]     6[4,2]      [1,1,2]     38[23,15]   [16,2,5]
AG (c)        27[23,4]    [23,0,0]    24[23,1]    [15,6,2]    3[2,1]      [2,0,0]     54[48,6]    [40,6,2]
GA (d)        0[0,0]      [0,0,0]     7[5,2]      [0,5,0]     9[9,0]      [1,8,0]     16[14,2]    [1,13,0]
GA tandem     0[0,0]      [0,0,0]     6[6,0]      [4,2,0]     2[2,0]      [2,0,0]     8[8,0]      [6,2,0]
AG totals     27[23,4]    [23,0,0]    37[34,3]    [19,13,2]   14[13,1]    [5,8,0]     78[70,8]    [47,21,2]
Total         45[34,11]   [34,0,0]    51[42,9]    [23,14,5]   20[17,3]    [6,9,2]     116[93,23]  [63,23,7]

(a) C, number of examples of AA or AG juxtapositions at the ends of helices from crystal or NMR structures. +, base-pair is present; −, base-pair is absent.
(b) Conformation of AA or AG base-pairs present in the crystal or NMR structures: S, sheared; I, imino or imino-like; O, other.
(c) G is 3′ to the helix.
(d) G is 5′ to the helix.

We have investigated the A:G base-pair conformations in different structural motifs to determine if the nucleotides surrounding the A:G base-pair influence the conformation of this base-pair. The A:G base-pairs in Figure 2 are color-coded for the GNRA tetraloop, E loop, and GA tandem motifs and the unincorporated A:G base-pairs when the G is 3′ to the helix. Our analysis revealed that the conformations for the A:G base-pairs are nearly identical in all of these motifs except for the GNRA tetraloops (Figure 2(b) and (c), blue nucleotides), where the G of the GNRA tetraloop G:A sheared base-pair is shifted toward the major groove of the A. This shift is due to the additional hydrogen bonds between the guanosine base and the backbone of A in the tetraloop, and between the backbone atoms of G and other bases in the loop [42]. There is a minimal amount of conformational flexibility in tandem G:A base-pairs with sheared and imino conformations (Figure 2(b), (c), (e) and (f)). Imino base-pairs showed much less conformational flexibility than sheared base-pairs, regardless of whether the base-pair was 5′ or 3′ to the helix (Figure 2(e) and (f)).

Two consecutive A:G base-pairs can both form sheared base-pairs within a helix when the first G:A base-pair is followed by another A:G base-pair. Both A:G base-pairs distort the helix; however, they are oriented so that they offset or compensate one another to maintain the overall regularity of the helix [15,43]. There are six examples of tandem sheared G:A base-pairs in the database.

We have identified conformations for A:A base-pairs that are analogous to the sheared and imino A:G base-pairs. 61 % (23/38) of the AA oppositions at the end of helices in the PDB NMR and crystal structure database (Table 2) are base-paired. There are five different A:A base-pairing conformations; two are analogous to the conformations in the sheared and imino A:G base-pairs. The A:A N3-amino (A:A sheared) base-pair has one hydrogen bond between N3 of the first adenosine and the amino group on the second (Figure 2(g), top [44]); in comparison, the sheared A:G base-pair forms two hydrogen bonds, one from the N3 of the guanosine to the adenosine amino group and the second between N7 of A and the amino group of G (Figure 2(a)).
The A:A N1-amino (A:A imino-like) base-pair conformation forms a single hydrogen bond between N1 of one adenosine and the amino group of the second (Figure 2(g), bottom44); while the hydrogen bonding pattern is different, the overall shape of the base-pair resembles that of the A:G imino conformation and the orientation of the backbone (Figure 2(d)). 70 % (16/23) of the A:A base-pairs in the PDB structure database (Table 2) are in the sheared (A:A N3-amino) conformation, and 9 % (2/23) are in the A:A imino-like (A:A N1-amino) conformation. The sheared A:A (A:A N3-amino) base-pairs occur at the end of the D stem/hairpin loop junction in tRNAs and within the A:A tandem base-pairs. Other sheared A:A (A:A N3-amino) base-pairs occur in a tetraloop and in the unincorporated 3′ helix end category. All 11 of the hairpin loops with the A:A base-pair have the sheared conformation, while 50 % (4/8) of the internal loops and 33 % (1/3) of the multi-stem loops have this conformation for the A:A base-pair. The imino-like A:A (A:A N1-amino) base-pairs occur in the unincorporated 3′ helix end category. The remaining 21 % (5/23) of the A:A base-pairs in the structure database have three other conformations, each with two hydrogen bonds, as opposed to a single hydrogen bond for the sheared (A:A N3-amino; Figure 2(g), top) and imino-like (A:A N1-amino; Figure 2(g), bottom) conformations.
There are three "A:A N7-amino, amino-N1" base-pairs (with hydrogen bonds between the Watson-Crick and Hoogsteen faces of each A, one from N7 of the first A to the amino group of the second, and one from N1 of the second A to the amino group of the first34), one "A:A N1-amino symmetric" base-pair (similar to the imino-like A:A (A:A N1-amino) base-pair, but with one adenosine flipped so that two hydrogen bonds can form between N1 on each adenosine and the amino group of its partner34), and one "A:A N7-amino symmetric" base-pair (with hydrogen bonds between the N7 and amino groups of each A44), which is analogous to a sheared A:G base-pair where the G is in the syn conformation.

A:A and A:G base-pairs that stack onto the ends of helices

Beyond the base-pairing of the AA and AG oppositions at the ends of helices, we have investigated the structures in the PDB structure database to determine if these non-canonical base-pairs stack onto the adjoining base-pair in the helix to which they are adjacent. The results are affirmative: all but one of the 72 A:G and 23 A:A base-pairs are stacked, with stacking defined as one or both of the base-paired nucleotides overlapping with the adjoining base-pair in the helix. Examples of the three-dimensional structures for stacked A:G base-pairs in the sheared and imino conformations are shown in Online Figure 4.

The one exception for the base stacking in the PDB structure database occurs in the mouse mammary tumor virus pseudoknot, where an A:A base-pair does not stack onto the end of the helix. This base-pair is composed of A14, situated between the two helices of the pseudoknot, and A6, located in one of the loops. This base-pair forms in one of the two constructs of the mouse mammary tumor virus.
In the construct where A14 is unpaired, A14 stacks on G15 in the helix below.45 In the construct where A14 is base-paired to A6, the A14:A6 base-pair does not stack on the G15:C5 base-pair at the end of the helix.46

Burkard et al.47 analyzed the nucleotide stackings at the ends of helices in the PDB structure database and found that all AG oppositions at the ends of helices are base-paired and stacked when the G is 3′ to the helix. Our analysis of the rRNA crystal structures31,33 revealed that both positions of the A:G base-pairs at the ends of helices are stacked in 78 % (21/27) of the cases in the 16 S rRNA and 88 % (36/41) of the cases in the 23 S rRNA (information about stacking is available from the base-pair frequency tables at CRW AA.AG). In the remaining six 16 S rRNA and five 23 S rRNA cases, one nucleotide of each A:G base-pair is stacked upon the neighboring base-pair. For A:A base-pairs, four of the five 16 S rRNA cases and all three of the 23 S rRNA cases have both nucleotides stacked; in the lone exception, one nucleotide of the A:A base-pair is stacked upon the neighboring base-pair. In total, stacking occurs on both positions in 78 % (25/32) of the 16 S rRNA base-pairs and 89 % (39/44) of the 23 S rRNA base-pairs; the remaining seven 16 S rRNA and five 23 S rRNA A:A and A:G base-pairs have only one of the two base-paired positions involved in stacking (see Online Table 4 at CRW AA.AG).

Coaxial stacking with A:A and A:G base-pairing at the helix interfaces

The ends of helices have a propensity to stack onto one another. Transfer RNA contains two sets of coaxial helices, the acceptor and TΨC helices, and the D and anticodon helices.48 The two most common base-pairs at positions 26:44, at the top of the tRNA anticodon helix (and stacked onto the D stem), are G:A and A:G (see CRW AA.AG Online Table 7 for the base-pair frequencies for tRNA position numbers 26:44, Saccharomyces cerevisiae phenylalanine numbering). Other base-pairs present in more than 5 % of the sequences are A:A, A:U, A:C and U:A.

More recently, two sets of coaxial helices were identified in the crystal structure for the L11 binding region of 23 S rRNA (Figure 1(b)30,49). The lone-pair 1082:1086 (E. coli numbering) is stacked onto the 1057-1059/1079-1081 helix. A second lone-pair, 1087:1102, is stacked onto the G1056:A1103 base-pair at the top of the 1051-1056/1103-1108 helix. Given these two precedents for A:G and A:A base-pairs at the interface between coaxially stacked helices, we asked (1) whether there are other examples of this motif in the RNA structure database and (2) whether one of the functions of A:A and A:G base-pairs at the termini of helices is to be at the interface of two helices that are coaxially stacked.
23 of the 116 examples in the PDB structure database with an AA or AG at the end of a helix (Online Table 6) are adjacent to another helix. 21 of these are base-paired, while two are unpaired (Table 2). A Curves analysis was performed on these helix junctions to measure the angle between the two helices and the overall helix displacement.50,51 Helices are considered to be coaxial when both the angle between the helix axes and their displacement are minimal, as discussed in Materials and Methods. Eight of the 21 examples in the structure database with an A:A or A:G base-pair at the end of one helix and adjacent to another helix occur at the anticodon/D helix junction in tRNA. All eight of these tRNA examples are coaxially stacked with the G 5′ to the helix and the G:A base-pair in the imino conformation (the N1-amino conformation for the one A:A base-pair). In addition to the eight tRNA cases, eight more examples satisfy these strict criteria, including examples in 23 S rRNA and the RRE RNA.

However, there are a few cases where an A:A or A:G base-pair at the end of a helix is not coaxial to a second helix. The P4-P6 domain of the group I intron contains tandem G:A base-pairs in a multi-stem loop at positions 139:164 and 140:163 that extend the P5b helix and flank and adjoin the P5a and P5c helices (PDB ID 1GID52). The axis of the P5c helix (165-167/173-175) that is 3′ to the A139:G164 base-pair continues at an angle of 94° to and is 11.7 Å displaced from the P5b helix ending in A:G. The axis of the P5a helix (136-138/180-182), 5′ to the A139:G164 base-pair, has an angle of 42° to and 9.15 Å displacement from the P5b helix ending in A:G. Helices P5a and P5c are not considered to be coaxial with P5b. The second exception also occurs in the group I intron, where the P3 and P7 helices that end with A:A base-pairs (A269:A306 and A270:A104) are not coaxial.36 Here, the two helices are separated by 3.9 Å and occur at an angle of 40°.
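The coaxiality test described in Materials and Methods (inter-axis angle below 30° and axis displacement below 5 Å) can be illustrated with a minimal sketch applied to the junction measurements quoted in the text. The names `Junction` and `is_coaxial`, and the labels attached to each junction, are illustrative assumptions and are not part of the Curves program or of the original analysis.

```python
from dataclasses import dataclass

# Thresholds as stated in Materials and Methods of this paper.
MAX_ANGLE_DEG = 30.0
MAX_DISPLACEMENT_ANGSTROM = 5.0


@dataclass(frozen=True)
class Junction:
    label: str                  # hypothetical label, for illustration only
    angle_deg: float            # angle between the two helix axes
    displacement_angstrom: float  # displacement between the two helix axes


def is_coaxial(junction: Junction) -> bool:
    """Apply both criteria: a small angle AND a small axis displacement."""
    return (junction.angle_deg < MAX_ANGLE_DEG
            and junction.displacement_angstrom < MAX_DISPLACEMENT_ANGSTROM)


# Values taken from the text; the tRNA entry is the reported eight-structure mean.
JUNCTIONS = [
    Junction("tRNA anticodon/D stems (mean)", 17.17, 3.36),
    Junction("group I intron P5b/P5c", 94.0, 11.7),
    Junction("group I intron P5b/P5a", 42.0, 9.15),
    Junction("group I intron P3/P7", 40.0, 3.9),
]

for junction in JUNCTIONS:
    verdict = "coaxial" if is_coaxial(junction) else "not coaxial"
    print(f"{junction.label}: {verdict}")
```

Note that the P3/P7 junction passes the displacement criterion (3.9 Å) but fails the angle criterion (40°), which is why both conditions are combined with a logical AND.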
While 21 of the 23 examples in the PDB database with an AA or AG opposition at the end of one helix and adjacent to another helix form an A:A or A:G base-pair, A:A or A:G base-pairs do not form in the remaining two examples. In both cases, the helices are not coaxial with one another. The RNA is kinked at the internal loop junction by 170° and the axis is displaced by 16 Å when the spliceosomal U1A protein is bound to its RNA.53 Helices are also not stacked for the unpaired AA oppositions in the mouse mammary tumor virus pseudoknot junction. Here, the angle between the helices is 78° and the helix displacement is 5.3 Å.45

As noted earlier, there are 115 cases with an AA or AG opposition at the end of a helix in the Bacterial 16 S and 23 S rRNA secondary structure models. A total of 99 of these are homologous and have their structures determined in the 16 S and 23 S rRNA crystal structures; 76 of these are base-paired in the two crystal structures. All additional base-pairs in the crystal structures that are not in the comparative structure models, adjacent to A:A and A:G base-pairs at the ends of helices, and immediately opposed to another helix with no intervening nucleotides were identified on the secondary structure diagrams in Figure 1. Two helices with an A:A or A:G base-pair at their interface and no unpaired nucleotides on the strand connecting them were considered as a possible coaxial helix and identified in Figure 1; those stacked in the crystal structures were identified.

Discussion

Our goal is to predict base-pairs for those positions with similar patterns of variation (covariation) and, more recently, for those positions with either unique patterns of variation or no variation.
Toward this end, an earlier analysis of base-paired and unpaired nucleotides in covariation-based rRNA structure models has revealed that there is a significant bias for adenosines to be unpaired, and a more pronounced bias for unpaired As at the 3′ end of loops.25 The same analysis also determined that Gs and As are the two most frequent nucleotides at the 5′ end of a loop. Given that the GA/AA opposition at positions 1056:1103 is base-paired in the 23 S rRNA L11 crystal structures,30,49 we have searched for other examples of AA and AG oppositions at the ends of helices.

AA and AG oppositions, base-pairs, and conformations at the ends of helices

Our analysis of the 16 S and 23 S rRNA covariation-based models revealed that AA and AG oppositions that occur in more than 90 % of the rRNA sequences at the ends of helices are very common. Of the approximately 400 oppositions at the end of a helix, more than 100 have a very conserved AA, AG or an AA/AG exchange. Prior to the determination of the 16 S and 23 S rRNA crystal structures, our only examples with physical evidence for A:G and A:A base-pairs at the ends of helices were in the NMR and crystal structures available from the PDB structure database. Our analysis of both databases revealed, as discussed earlier, the following trends. (1) More than 75 % of these AA and AG oppositions are base-paired. (2) Of the AA and AG oppositions, AG oppositions occur more frequently and are base-paired at a higher percentage. (3) For the two AG orientations, the G is 3′ to the helix in approximately 90 % of the cases. (4) For the three loop categories, the highest percentage of base-pairing occurs in the hairpin loops, followed by internal and multi-stem loops. (5) Overall, the most common conformation for the base-paired oppositions is sheared. The imino and several unusual conformations occur at a much lower frequency.
The percentage of sheared conformations is higher for A:G base-pairs (versus A:A) and higher when the G is 3′ to the helix. In contrast, essentially all of the A:G base-pairs with the G 5′ to the helix have the imino conformation.

AA and AG oppositions that are not base-paired

While 80 % (93/116) of the AA and AG oppositions at the ends of helices from the PDB structure database are base-paired, 23 are not. 65 % (15/23) of these involve AA oppositions and 35 % (8/23) have AG oppositions. For the 16 S and 23 S rRNA, we have similar percentages of unpaired AA and AG oppositions. 77 % (76/99) of the oppositions are base-paired while 23 are not. Here, the highest percentage of non-pairing occurs for the invariant AA oppositions (66 %; 8/12), followed by AA/AG exchanges (26 %; 12/46) and invariant AGs (7 %; 3/41).

It is not obvious why these oppositions are not base-paired, while the majority of them are. A higher percentage of AA oppositions are not base-paired, and for the 16 S and 23 S rRNA a higher percentage of oppositions in multi-stem loops are not base-paired (42 % of the oppositions in multi-stem loops are not base-paired, versus 5 % and 13 % of the oppositions in hairpin and internal loops; Table 1). There are no obvious sequence patterns flanking the oppositions that distinguish the paired from the unpaired. Perhaps there is a higher percentage of unpaired oppositions in the multi-stem loops because these regions of the RNA have more opportunities to form interactions with other positions in the multi-stem loop. The higher frequency of unpaired AA oppositions might in turn be explained by these unpaired adenosines inserting into the minor groove of helices, as recently documented in the A-minor motif54 and type I/II base triples.55 Alternatively, these AA and AG oppositions might not base-pair because one or both of these positions are involved in a standard base-base interaction with another region of the RNA or an interaction with a protein.
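The PDB percentages quoted above follow directly from the "Total" column of Table 2. As a small arithmetic check, the sketch below recomputes them; the dictionary layout and the helper name `percent` are illustrative assumptions, and the counts are transcribed from Table 2.

```python
# Counts from the "Total" column of Table 2:
# opposition -> (examples, base-paired, unpaired)
TABLE2_TOTALS = {
    "AA": (38, 23, 15),
    "AG (G 3' to helix)": (54, 48, 6),
    "GA (G 5' to helix)": (16, 14, 2),
    "GA tandem": (8, 8, 0),
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)


total = sum(n for n, _, _ in TABLE2_TOTALS.values())
paired = sum(p for _, p, _ in TABLE2_TOTALS.values())
unpaired = sum(u for _, _, u in TABLE2_TOTALS.values())

unpaired_aa = TABLE2_TOTALS["AA"][2]
unpaired_ag = unpaired - unpaired_aa  # all AG orientations combined

print(f"base-paired: {paired}/{total} = {percent(paired, total)} %")
print(f"unpaired AA: {unpaired_aa}/{unpaired} = {percent(unpaired_aa, unpaired)} %")
print(f"unpaired AG: {unpaired_ag}/{unpaired} = {percent(unpaired_ag, unpaired)} %")
```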
Some of the unpaired oppositions in the PDB database are associated with protein binding to the RNA, pseudoknots and unusual base-pair conformations between one of the positions in the opposition and another position (entries with unpaired oppositions associated with proteins are: 1CN8, 1AUD, 1RNK, 1ZDI, 1ZDJ, 7MSF, 1YFG, 1C04; 1QA6, 1TLR, and 1GID). For the rRNAs, there are 23 oppositions that are not base-paired. The positions in only four of these are not involved in other intramolecular base-base interactions, while both positions in 12 oppositions are involved in other intramolecular RNA-RNA interactions, and one of the positions in seven of the oppositions is involved in another intramolecular RNA-RNA interaction (Figure 1).

However, in contrast, there are examples of A:A and A:G base-pairs at the ends of helices in the PDB database that are also interacting with proteins (entries with paired oppositions associated with proteins are: 1A4T, 1QFQ, 1D6K, 1DFU, 1ETF, 1ULL, 484D, 2TOB, 1NEM, and 1PBR). For the rRNAs, there are examples of A:A and A:G base-pairs at the ends of helices that are interacting with other positions in the rRNA crystal structures.31,33 Thus, there is no simple explanation for why some of the AA and AG oppositions are not base-paired. However, there is an example of an A:A/A:G base-pair at the end of a helix in the 16 S rRNA that becomes unpaired during protein synthesis, suggesting that these AA and AG oppositions might not be static, but instead involved in movement (see below).

A:A and A:G base-pairs and conformations in larger motifs

In 1985, it was observed that the majority of the adenosines were unpaired in the E. coli 16 S rRNA covariation-based structure model.56 More recently, it was determined that this bias occurs in a large collection of 16 S and 23 S rRNA structure models,25 and that there is an even stronger bias for unpaired adenosines to be at the 3′ end of loops, and guanines and adenines to occur at the 5′ end of loops. These biases are consistent with and augment our identification of AA and AG oppositions at the ends of helices. Other biases in the distributions of nucleotides in the loop structures with these dominant adenosines at the 3′ ends of loops were identified, with several different structural motifs mapped onto these regions of the 16 S and 23 S rRNA25 (see also 16 S and 23 S rRNA secondary structure Figures with motifs mapped onto the oppositions at CRW AA.AG). These include adenosine platforms, E and E-like loops, tandem GAs, GNRA tetraloops, and U-turns.

The AA and AG oppositions at the ends of helices are a component in these motifs, although not necessarily in all examples for each of these motifs. Sheared A:G base-pairs with the G 3′ to the helix are present in GNRA tetraloops, the E loop, tandem A:G base-pairs, and in some of the U-turns. Thus, the sheared base-pairing conformation appears to be an important structural element utilized in these larger structural motifs. The GNRA tetraloop is a common structural element in various RNAs, including the rRNAs.11 The second motif is the E loop that was first identified in the 5 S rRNA and subsequently observed in several other RNAs.16,57-61 The third motif is tandem G:A base-pairs. Here the A:G base-pairs that are arranged in tandem can be in the sheared or imino conformation. A single A:G base-pair in the sheared conformation and flanked by standard G:C or A:U base-pairs would distort the helix; however, a second A:G base-pair with a sheared conformation in the proper orientation would offset this original distortion and bring the helix back into register.
An unexpectedly high number of tandem G:A base-pairs was identified with comparative sequence analysis of the rRNAs15,62 (a revised list of tandem GA oppositions in the rRNAs is available at CRW A Story). The U-turn is the fourth motif, where the RNA backbone undergoes a sharp bend after the single-stranded U in a UNR sequence. This motif is most notably present in the anticodon and T loops of tRNAs.63,64 The UNR sequence, as revealed in a recent study of comparative structures of 16 S and 23 S rRNAs,18 is sometimes flanked by an A:G base-pair, and occurs within the three loop categories: hairpin, internal, and multi-stem. (We have also noted that there is usually an AG or AA opposition that is adjacent to the G:U base-pair associated with the adenosine platform14,25 (see above).) Given that these A:G base-pairs at the ends of helices are associated with several larger motifs, we have analyzed here the conformation of the A:G base-pairs in various structural motifs and have determined that the conformations are identical in all of these motifs, except for the GNRA tetraloops, where it is shifted slightly (Figure 2(b) and (c)).

A:A and A:G base-pair and coaxial stacking

All but one of the A:A and A:G base-pairs at the ends of helices in the PDB database and the 16 S and 23 S rRNA crystal structures are stacked onto the end of the helix. The extension of these helices occurs for all of the A:A and A:G base-pairs in the structure database, except for one example in a conformationally constrained pseudoknot.45 This preponderance of stacking is maintained in the rRNAs, as noted earlier.

Given the tendency for helices to coaxially stack onto one another when they are adjacent to one another, we have asked whether A:A and A:G base-pairs at the interface of two helices might influence the coaxial stacking potential of these two helices.
Our analysis of the structures in the PDB structure database was affirmative: 76 % (16/21) of adjacent helices with an A:A or A:G base-pair between them are coaxially stacked. Previously, it has been shown that coaxial stacking at helix junctions stabilizes the structure by about 2 kcal/mol.65-66 Additional studies confirmed that A:G base-pairs at the junction between coaxially stacked helices contribute the same energy as U:A base-pairs, while tandem GAs are almost as stabilizing as single AGs in a junction.67

The analysis of the potential coaxial helices in the 16 S and 23 S rRNA revealed mixed results. A total of 11 of the 12 (92 %) potential coaxial helices are stacked in the 16 S rRNA crystal structure (Figure 1(a); base-pair frequency tables at CRW AA.AG). However, only 11 of the 22 (50 %) potential coaxially stacked helices are actually stacked in the 23 S rRNA crystal structure (Figure 1(b) and (c); base-pair frequency tables at CRW AA.AG).

Conformational changes in the 16 S rRNA A-site

In our paper about unpaired adenosines in the covariation-based rRNA structure models,25 we observed that some of the positions involved in AA and AG oppositions at the ends of helices also occur in adenosine platforms, E and E-like loops, tandem GAs, and U-turn sequence motifs. We speculated that conformational rearrangements might be necessary if both of these sequence motifs fold into their respective structural motifs.

The crystal structure of the A-site in 16 S rRNA has been determined in the presence and absence of the antibiotics paromomycin, streptomycin, and spectinomycin,68 initiation factor 1 (IF1),69 and mRNA/tRNA.70 The analysis of the crystal structure revealed the status of the 1408:1493 AA/AG opposition at the end of a helix. This opposition is adjacent to the invariant C1407:G1494 base-pair.
Position 1408 is an A in greater than 99 % of the bacteria, 98 % of the chloroplasts, and 96 % of the mitochondria (see Online Table 4(a) at CRW AA.AG, and the individual nucleotide frequency tables at the CRW Site). All of these sequences that do not have an A at position 1408 have a G. Greater than 99 % of the Eucarya 16 S-like rRNA sequences have a G at position 1408; the remaining sequences have an A. 70 % of the Archaea 16 S rRNA sequences have an A at position 1408, while the remaining 30 % have a G. Position 1493 is an A in more than 99 % of all 16 S and 16 S-like rRNA sequences. Position 1492 is equally conserved, with an adenosine in more than 99 % of all 16 S and 16 S-like rRNA sequences (CRW Site Single Base Frequency Tables). Thus, in the Bacteria, chloroplasts, and mitochondria, and 70 % of the Archaea, the 1408:1493 opposition is an AA, while it is a GA in the Eucarya and 30 % of the Archaea.

Positions 1408:1493 form an A:A base-pair in the T. thermophilus 30 S ribosomal subunit crystal structure that is not complexed with antibiotics, IF1, or mRNA/tRNA33 (Online Table 8), while they are unpaired in the three different crystal structures that are complexed with the antibiotics paromomycin, streptomycin, and spectinomycin, IF1, and an mRNA/tRNA codon-anticodon helix. When positions 1408:1493 are not base-paired, the two invariant adenines at positions 1492 and 1493 are flipped out of the helix and are available for interactions with IF1 and the codon-anticodon helix. In conjunction with the unpairing of the 1408:1493 base-pair and the movement of positions 1492 and 1493 from the inside to the outside of the helix, there are minor changes in the bend angle and the displacement of the coaxial stack flanking both sides of the 1408:1493 opposition (Online Table 8). The base-pairs in proximity to the 1408:1493 opposition (C1399:G1504, G1401:C1501, C1402:A1500, C1404:G1497, G1405:C1496, U1406:U1495, C1407:G1494, C1409:G1491, G1410:C1490, C1411:G1489, and C1412:G1488) are all base-paired in both the presence and absence of these molecules involved in protein synthesis (Online Table 8).
The conserved, but not invariant, A1413:G1487 base-pair (see CRW Site base-pair frequency tables for 16 S rRNA; predominantly A:G in the Bacteria, Archaea, and chloroplasts, U:A in the Eucarya, and C:G in the mitochondria) is base-paired in the imino conformation in three of the four crystal structures, and is unpaired in the presence of IF1. These results reveal that the 1408:1493 AA/AG opposition at the end of the helix is involved in a conformational rearrangement directly associated with protein synthesis.

This region of the A-site contains a set of commonly occurring rRNA motifs, described earlier.25 More than 50 % (527 in total) of the 3′ ends of loops in 16 S and 23 S rRNA contain a conserved adenosine in the covariation-based structure models. 56 (11 %) of these "A-motifs" are flanked by an A on the 5′ side and a paired G on the 3′ side. This highly conserved AAG motif occurs at 16 S rRNA positions 1492-1494. While this sequence motif contains some of the features characteristic of the adenosine platform,24,25 we do not know if positions 1492 and 1493 are base-paired at some stage in protein synthesis, as they are in the adenosine platform.

Concluding statement

Our analysis of the PDB structure database and the 16 S and 23 S rRNA crystal structures revealed general similarities in the higher than expected frequencies of AA and AG oppositions at the ends of helices, and, for both sets of data, similar extents of base-pairing (80 % for the PDB, 76 % for the two rRNAs). The frequencies of AG oppositions and oppositions that are base-paired were higher than the frequencies of AA oppositions and their base-pairs for both data sets. As well, the frequency of oppositions that are base-paired is highest for the hairpin loops for both data sets, followed by internal and multi-stem loops for the rRNAs.
The frequencies of A:G base-pairs (when the G is 3′ to the helix) in the sheared conformation are significantly higher than the frequency of imino conformations and other unusual conformations for both data sets, while essentially all of the A:G base-pairs with the G 5′ to the helix are in the imino conformation for both data sets. The sheared conformation occurs in 100 % of the A:A/A:G base-pairs at the ends of helices in hairpin loops in both data sets, a lower percentage in internal loops (82 % (27/33) in rRNA, 55 % (23/42) in the PDB), and the lowest percentage in multi-stem loops (61 % (14/23) in rRNA, 35 % (6/17) in the PDB). In contrast, the imino conformation occurs at the lowest percentage in hairpin loops (0 % in both data sets), a higher percentage in internal loops (9 % (3/33) in rRNA, 33 % (14/42) in the PDB), and the highest percentage in multi-stem loops (13 % (6/23) in rRNA, 53 % (9/17) in the PDB). Other conformations occur in both data sets, although limited to internal and multi-stem loops. For the rRNAs, they are more prevalent than imino conformations, especially in multi-stem loops (Table 1). All of these A:A/A:G base-pairs are stacked in some form onto the flanking helix. The one major, anomalous difference between the two data sets is for coaxial stacking. 91 % (21/23) of the potential coaxial stacks in the PDB database are coaxial. For 16 S rRNA, this number is 92 % (11/12). However, for 23 S rRNA, this number is only 50 % (11/22). The combined total for 16 S and 23 S rRNA is 65 % (22/34).

A:A and A:G base-pairs at the ends of helices are associated with several different structural motifs, including E loops, U-turns, and GNRA tetraloops. While the majority of the AA and AG oppositions are base-paired, approximately 25 % of them are not. The percentage of unpaired AA oppositions is higher than unpaired AG oppositions.
For the ribosomal RNAs, the highest percentage of unpaired oppositions is for those that occur in the multi-stem loops. Currently, there is no obvious explanation for why 25 % of the oppositions are not base-paired. However, given that the 16 S rRNA 1408:1493 AA/AG opposition is dynamic, changing its form from paired to unpaired during protein synthesis, we wonder if the state of other AA/AG oppositions at the ends of helices is also dynamic and associated with ribosomal movement during assembly and protein synthesis.71,72

Materials and Methods

The rRNA sequence alignments used for this analysis are maintained by us at the University of Texas and are available from the CRW AA.AG Site (see below). Sequences were manually aligned with the alignment editor AE2 (T. Macke, Scripps Clinic, San Diego, CA). Our analysis of the AA and AG oppositions at the ends of helices was performed on this large collection of 16 S and 23 S rRNA sequences that span the three primary phylogenetic lineages and the two Eucarya organelles, as outlined in Table 3. The numbering systems from the E. coli 16 S and 23 S rRNA sequences (GenBank Accession no. J01695) are used as the references for position numbers for both 16 S and 23 S rRNAs.

AA and AG oppositions at the ends of helices in the most recent (December 1999) 16 S and 23 S rRNA E. coli covariation-based structure models (CRW Site; see below) were manually identified. Each candidate was classified into one of three loop types: hairpin, internal or multi-stem. The program query (Gutell et al., unpublished) was used to collect single nucleotide and base-pair frequency data from the (AE2) sequence alignments. Base frequencies for each candidate were computed independently from each of the alignments (16 S and 23 S rRNAs; bacterial, archaea, and eucarya nuclear, chloroplast, and mitochondrial). AA.AG@helix.ends candidates with greater than 90 % AA, AG or AA/AG (with the G 3′ to the helix for AG and AA/AG oppositions) in the bacterial alignment were considered further. The comparative sequence analysis data are summarized in Table 1 and presented in greater detail in Online Table 4 at CRW AA.AG (see below).
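The 90 % conservation filter described above can be sketched as follows. This is only an illustration under stated assumptions: it is not the unpublished program query, the function names are hypothetical, the toy data are invented, and each aligned sequence is assumed to contribute one observation written as (nucleotide 5′ to the helix, nucleotide 3′ to the helix), so that an AG opposition with the G 3′ to the helix is ("A", "G").

```python
from collections import Counter

# "Greater than 90 %" AA, AG or AA/AG in the bacterial alignment.
CONSERVATION_THRESHOLD = 0.90

# Accepted oppositions, written as (nt 5' to helix, nt 3' to helix).
ACCEPTED = {("A", "A"), ("A", "G")}


def aa_ag_fraction(observations):
    """Fraction of aligned sequences with AA or AG (G 3' to the helix)."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[pair] for pair in ACCEPTED) / total


def is_candidate(observations):
    """True when the opposition passes the >90 % conservation filter."""
    return aa_ag_fraction(observations) > CONSERVATION_THRESHOLD


# Invented toy data for two hypothetical helix-end oppositions.
conserved_site = [("A", "G")] * 93 + [("A", "A")] * 4 + [("G", "A")] * 2 + [("U", "G")]
variable_site = [("A", "G")] * 85 + [("C", "G")] * 15

print(aa_ag_fraction(conserved_site), is_candidate(conserved_site))
print(aa_ag_fraction(variable_site), is_candidate(variable_site))
```

In this sketch a GA observation (G 5′ to the helix) does not count toward the threshold, which mirrors the stated orientation requirement for AG and AA/AG oppositions.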
Supplementary data that augment the Tables and Figures in this manuscript are available from the CRW AA.AG@helix.ends pages (abbreviated as CRW AA.AG; http://www.rna.icmb.utexas.edu/ANALYSIS/AAAG/), the CRW Site (http://www.rna.icmb.utexas.edu), and the CRW A Story pages (http://www.rna.icmb.utexas.edu/ANALYSIS/A-STORY/). The information available at CRW AA.AG includes: base-pair frequency tables for all of the AA and AG oppositions at the ends of helices that occur in more than 90 % of the bacterial sequences (Online Table 4); tables of the PDB structures analyzed in Table 2 (Online Table 5) and for the coaxial stacking analysis (Online Table 6); chemical structure diagrams for all of the base-pair types described here (Online Figure 3); and 16 S and 23 S rRNA secondary structure diagrams showing the AA/AG oppositions, potential coaxial stackings (Figure 1) and multiple motifs (Online Figure 5).

The tabulated information in Table 1 is culled from Online Table 4 (16 S and 23 S rRNA base-pair frequency tables), which includes: (1) the percent occurrences for all 16 base-pairing types (e.g., A:A, A:C, A:G, etc.) at each of the AA and AG sites in five alignments (Bacteria, Archaea, Eucarya nuclear, chloroplasts and mitochondria); (2) the exchange patterns between AA and AG; (3) the loop type (hairpin, internal, or multi-stem); (4) any associated motifs (e.g. E loop); and (5) for all of the oppositions that are base-paired in the rRNA crystal structures, four additional entries: (a) a RasMol73,74 image of that base-pair created from the crystal structures (16 S rRNA, PDB ID 1FJF;33 23 S rRNA, PDB ID 1FFK31); (b) the conformation of the base-pair;34,44 (c) identification of the nucleotides of the opposition which stack onto the adjoining helix; and (d) the adjoining base-pair(s) upon which the opposition stacks.
The online tables describing the PDB structures (Online Tables 5 and 6) present, for each of the NMR and crystal structures, an expanded description of the experimental systems, RasMol [73,74] images highlighting the AA and AG oppositions, links to the MEDLINE abstract, and additional information pertinent to that analysis.

The secondary structure Figures showing the AA.AG@helix.ends sites (Figure 1) and additional secondary structure diagrams at CRW AA.AG (Online Figure 5) were generated using the interactive graphics program XRNA (Weiser & Noller, University of California, Santa Cruz). Chemical structures were generated using ISIS/Draw and CS ChemDraw Std. 3D images were generated using Insight II. The PDB file for each rRNA crystal structure was visualized using RasMol [73,74]. The conformation [34,44] of each base-pair was assessed.

We have analyzed the A:A and A:G oppositions at the ends of helices in the NMR and crystal structures from the PDB [35]. Only one structure was analyzed when that structure was solved more than once with the same method. For NMR structures, we analyzed either the minimized average structure (when available) or the first structure. When the same RNA was solved by both X-ray crystallography and NMR spectroscopy, we analyzed one structure from each method. Both the free and bound forms were analyzed when the same RNA construct was solved in the presence and absence of protein or other ligands.

Base-pairs were extracted from PDB files and superimposed using Insight II. The atoms in the base of each adenine in A:G base-pairs were superimposed. For A:A base-pairs, the atoms in one adenine were superimposed so that the other adenine of the base-pair sat on the major groove side of the superimposed adenines. Base stacking was evaluated manually using Insight II and RasMol.

A Curves analysis [50,51] was used to assess if adjacent helices were coaxial by determining the angle and axis displacement between the best linear axes of these helices. Linear axes were calculated for helices with three or more base-pairs, including the terminal A:G or A:A base-pair. When the A was not base-paired, this nucleotide was not included in axis calculations. Coaxial helices should theoretically have no axis displacement and little or no angle between axes. The D stem and anticodon stem are relatively coaxial in the tRNA three-dimensional structure. In this case, the average angle between the anticodon stem ending in an imino A:G base-pair and the D stem axes is 17.17° and the axis displacement is 3.36 Å for the eight structures studied. These values were used as a baseline to determine whether the axes in other structures were also coaxially stacked, accounting for a range of normal base-pair helicoidal parameters at the junctions. For the analysis of the full set of 21 examples, we considered two helices to be coaxial when the angle between them was less than 30° and the helix displacement was less than 5 Å.

Table 3. Approximate number of sequences in the 16 S and 23 S rRNA alignments

                                               No. of sequences (b)
Alignment ID (a)  Phylogenetic group/organelle  16 S rRNA   23 S rRNA
B                 Bacteria                      5850        325
A                 Archaea                       260         40
C                 Chloroplast                   180         100
E                 Eucarya                       1050        265
M                 Mitochondria                  160         310
Total             All                           8500        1040

(a) Single-letter code used to identify the alignment in the base-pair frequency tables (Online Table 4 at CRW AA.AG).
(b) Approximate number of sequences in each alignment at the time of this analysis.

Note Added in Proof
A re-analysis of the 50 S ribosomal crystal structure revealed that the 2650 helix in 23 S rRNA (Figure 1(c), page 740) is coaxially stacked, and thus should be colored yellow and not brown. The counts of coaxially stacked helices on pages 748 and 749 have been corrected.

Acknowledgments
We greatly appreciate the constructive comments from both reviewers.
This work was supported by the NIH (GM48207, awarded to R.R.G.; GM56544, awarded to S.C.H.) and by startup funds from the Institute for Cellular and Molecular Biology at the University of Texas at Austin and the Welch Foundation (both awarded to R.R.G.).

References
1. Mathews, D. H., Sabina, J., Zuker, M. & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911-940.
2. Zuker, M., Mathews, D. H. & Turner, D. H. (1999). Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In RNA Biochemistry and Biotechnology (Barciszewski, J. & Clark, B. F. C., eds), pp. 11-43, Kluwer Academic Publishers.
3. Konings, D. A. M. & Gutell, R. R. (1995). A comparison of thermodynamic foldings with comparatively derived structures of 16 S and 16 S-like rRNAs. RNA, 1, 559-574.
4. Fields, D. S. & Gutell, R. R. (1996). An analysis of large rRNA sequences folded by a thermodynamic method. Fold. Des. 1, 419-430.
5. Woese, C. R. & Pace, N. R. (1993). Probing RNA structure, function, and history by comparative analysis. In The RNA World (Gesteland, R. F. & Atkins, J. F., eds), pp. 91-118, Cold Spring Harbor Laboratory Press, Plainview, New York.
6. Gutell, R. R., Larsen, N. & Woese, C. R. (1994). Lessons from an evolving rRNA: 16 S and 23 S rRNA structures from a comparative perspective. Microbiol. Rev. 58, 10-26.
7. Gutell, R. R. (1999). Comparative analysis of RNA sequences. Nucl. Acids Symp. Ser. 41, 48-53.
8. Gutell, R. R. (1996). Comparative sequence analysis and the structure of 16 S and 23 S rRNA. In Ribosomal RNA. Structure, Evolution, Processing, and Function in Protein Biosynthesis (Zimmerman, R. A. & Dahlberg, A. E., eds), pp. 111-128, CRC Press, Boca Raton.
9. Gautheret, D. & Gutell, R. R. (1997). Inferring the conformation of RNA base pairs and triples from patterns of sequence variation. Nucl. Acids Res. 25, 1559-1564.
10. Michel, F., Costa, M., Massire, C. & Westhof, E. (2000). Modeling RNA tertiary structure from patterns of sequence variation. Methods Enzymol. 317, 491-510.
11. Woese, C. R., Winker, S. & Gutell, R. R. (1990). Architecture of ribosomal RNA: constraints on the sequence of tetra-loops. Proc. Natl Acad. Sci. USA, 87, 8467-8471.
12. Gutell, R. R., Noller, H. F. & Woese, C. R. (1986). Higher order structure in ribosomal RNA. EMBO J. 5, 1111-1113.
13. Lehnert, V., Jaeger, L., Michel, F. & Westhof, E. (1996). New loop-loop tertiary interactions in self-splicing introns of subgroup IC and ID: a complete 3D model of the Tetrahymena thermophila ribozyme. Chem. Biol. 3, 993-1009.
14. Gautheret, D., Konings, D. & Gutell, R. R. (1995). G.U base pairing motifs in ribosomal RNA. RNA, 1, 807-814.
15. Gautheret, D., Konings, D. & Gutell, R. R. (1994). A major family of motifs involving G.A mismatches in ribosomal RNA. J. Mol. Biol. 242, 1-8.
16. Wimberly, B. (1994). A common RNA loop motif as a docking module and its function in the hammerhead ribozyme. Nature Struct. Biol. 1, 820-827.
17. Leontis, N. B. & Westhof, E. (1998). A common motif organizes the structure of multi-helix loops in 16 S and 23 S ribosomal RNAs. J. Mol. Biol. 283, 571-583.
18. Gutell, R. R., Cannone, J. J., Konings, D. & Gautheret, D. (2000). Predicting U-turns in ribosomal RNA with comparative sequence analysis. J. Mol. Biol. 300, 791-803.
19. Michel, F. & Westhof, E. (1990). Modeling of the three-dimensional architecture of group I catalytic introns based upon comparative sequence analysis. J. Mol. Biol. 216, 585-610.
20. Gautheret, D., Damberger, S. H. & Gutell, R. R. (1995). Identification of base triples in RNA using comparative sequence analysis. J. Mol. Biol. 248, 27-43.
21. Jaeger, L., Michel, F. & Westhof, E. (1994). Involvement of a GNRA tetraloop in long-range RNA tertiary interactions. J. Mol. Biol. 236, 1271-1276.
22. Costa, M. & Michel, F. (1995). Frequent use of the same tertiary motif by self-folding RNAs. EMBO J. 14, 1276-1285.
23. Costa, M. & Michel, F. (1997). Rules for RNA recognition of GNRA tetraloops deduced by in vitro selection: comparison with in vivo evolution. EMBO J. 16, 3289-3302.
24. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Szewczak, A. A., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996). RNA tertiary structure mediation by adenosine platforms. Science, 273, 1696-1699.
25. Gutell, R. R., Cannone, J. J., Shang, Z., Du, Y. & Serra, M. (2000). A story: unpaired adenosine bases in ribosomal RNAs. J. Mol. Biol. 304, 335-354.
26. Hermann, T. & Patel, D. J. (1999). Stitching together RNA tertiary architectures. J. Mol. Biol. 294, 829-849.
27. Moore, P. B. (1999). Structural motifs in RNA. Annu. Rev. Biochem. 68, 287-300.
28. Traub, W. & Sussman, J. L. (1982). Adenine-guanine base pairing ribosomal RNA. Nucl. Acids Res. 10, 2701-2708.
29. Woese, C. R., Gutell, R., Gupta, R. & Noller, H. F. (1983). Detailed analysis of the higher-order structure of 16 S-like ribosomal ribonucleic acids. Microbiol. Rev. 47, 621-669.
30. Conn, G. L., Draper, D. E., Lattman, E. E. & Gittis, A. G. (1999). Crystal structure of a conserved ribosomal protein-RNA complex. Science, 284, 1171-1174.
31. Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905-920.
32. Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell, 102, 615-623.
33. Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr, Morgan-Warren, R. J., Carter, A. P., Vonhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30 S ribosomal subunit. Nature, 407, 327-339.
34. Burkard, M. E., Turner, D. H. & Tinoco, I., Jr (1999). Structures of base pairs involving at least two hydrogen bonds. In The RNA World (Gesteland, R. F., Cech, T. R. & Atkins, J. F., eds), 2nd edit., pp. 675-680, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
35. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucl. Acids Res. 28, 235-242.
36. Golden, B. L., Gooding, A. R., Podell, E. R. & Cech, T. R. (1998). A preorganized active site in the crystal structure of the Tetrahymena ribozyme. Science, 282, 259-264.
37. Wu, M. & Turner, D. H. (1996). Solution structure of (rGCGGACGC)2 by two-dimensional NMR and the iterative relaxation matrix approach. Biochemistry, 35, 9677-9689.
38. Rowsell, S., Stonehouse, N. J., Convery, M. A., Adams, C. J., Ellington, A. D., Hirao, I., Peabody, D. S., Stockley, P. G. & Phillips, S. E. (1998). Crystal structures of a series of RNA aptamers complexed to the same protein target. Nature Struct. Biol. 5, 970-975.
39. Peterson, R. D. & Feigon, J. (1996). Structural change in Rev responsive element RNA of HIV-1 on binding Rev peptide. J. Mol. Biol. 264, 863-877.
40. Battiste, J., Mao, H., Rao, N., Tan, R., Muhandiram, D., Kay, L., Frankel, A. & Williamson, J. (1996). Alpha helix-RNA major groove recognition in an HIV-1 rev peptide-RRE RNA complex. Science, 273, 1547-1551.
41. Ye, X., Gorin, A., Ellington, A. D. & Patel, D. J. (1996). Deep penetration of an alpha-helix into a widened RNA major groove in the HIV-1 rev peptide-RNA aptamer complex. Nature Struct. Biol. 3, 1026-1033.
42. Jucker, F. M., Heus, H. A., Yip, P. F., Moors, E. H. & Pardi, A. (1996). A network of heterogeneous hydrogen bonds in GNRA tetraloops. J. Mol. Biol. 264, 968-980.
43. SantaLucia, J. J. & Turner, D. H. (1993). Structure of (rGGCGAGCC)2 in solution from NMR and restrained molecular dynamics. Biochemistry, 32, 12612-12623.
44. Nagaswamy, U., Voss, N., Zhang, Z. & Fox, G. E. (2000). Database of non-canonical base pairs found in known RNA structures. Nucl. Acids Res. 28, 375-376.
45. Shen, L. X. & Tinoco, I. J. (1995). The structure of an RNA pseudoknot that causes efficient frameshifting in mouse mammary tumor virus. J. Mol. Biol. 247, 963-978.
46. Kang, H., Hines, J. V. & Tinoco, I. J. (1996). Conformation of a non-frameshifting RNA pseudoknot from mouse mammary tumor virus. J. Mol. Biol. 259, 135-147.
47. Burkard, M. E., Kierzek, R. & Turner, D. H. (1999). Thermodynamics of unpaired terminal nucleotides on short RNA helixes correlates with stacking at helix termini in larger RNAs. J. Mol. Biol. 290, 967-982.
48. Sussman, J. L., Holbrook, S. R., Warrant, R. W., Church, G. M. & Kim, S.-H. (1978). Crystal structure of yeast phenylalanine T-RNA. I. Crystallographic refinement. J. Mol. Biol. 123, 607-630.
49. Wimberly, B. T., Guymon, R., McCutcheon, J. P., White, S. W. & Ramakrishnan, V. (1999). A detailed view of a ribosomal active site: the structure of the L11-RNA complex. Cell, 97, 491-502.
50. Lavery, R. & Sklenar, H. (1988). The definition of generalized helicoidal parameters and of axis curvature for irregular nucleic acids. J. Biomol. Struct. Dynam. 6, 63-91.
51. Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dynam. 6, 655-667.
52. Cate, J. H., Gooding, A. R., Podell, E., Zhou, K., Golden, B. L., Kundrot, C. E., Cech, T. R. & Doudna, J. A. (1996). Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678-1685.
53. Allain, F. H., Howe, P. W., Neuhaus, D. & Varani, G. (1997). Structural basis of the RNA-binding specificity of human U1A protein. EMBO J. 16, 5764-5772.
54. Nissen, P., Ippolito, J. A., Ban, N., Moore, P. B. & Steitz, T. A. (2001). RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl Acad. Sci. USA, 98, 4899-4903.
55. Doherty, E. A., Batey, R. T., Masquida, B. & Doudna, J. A. (2001). A universal mode of helix packing in RNA. Nature Struct. Biol. 8, 339-343.
56. Gutell, R. R., Weiser, B., Woese, C. R. & Noller, H. F. (1985). Comparative anatomy of 16 S-like ribosomal RNA. Prog. Nucl. Acid Res. Mol. Biol. 32, 155-216.
57. Varani, G., Wimberly, B. & Tinoco, I. J. (1989). Conformation and dynamics of an RNA internal loop. Biochemistry, 28, 7760-7772.
58. Wimberly, B., Varani, G. & Tinoco, I. J. (1993). The conformation of loop E of eukaryotic 5S ribosomal RNA. Biochemistry, 32, 1078-1087.
59. Szewczak, A. A., Moore, P. B., Chang, Y. L. & Wool, I. G. (1993). The conformation of the sarcin/ricin loop from 28S ribosomal RNA. Proc. Natl Acad. Sci. USA, 90, 9581-9585.
60. Correll, C. C., Munishkin, A., Chan, Y. L., Ren, Z., Wool, I. G. & Steitz, T. A. (1998). Crystal structure of the ribosomal RNA domain essential for binding elongation factors. Proc. Natl Acad. Sci. USA, 95, 13436-13441.
61. Correll, C. C. & Munishkin, W. I. (1999). The two faces of the Escherichia coli 23 S rRNA Sarcin/Ricin domain: the structure at 1.11 Å resolution. J. Mol. Biol. 292, 275-287.
62. SantaLucia, J. J., Kierzek, R. & Turner, D. H. (1990). Effects of GA mismatches on the structure and thermodynamics of RNA internal loops. Biochemistry, 29, 8813-8819.
63. Quigley, G. J. & Rich, A. (1976). Structural domains of transfer RNA molecules. Science, 194, 796-806.
64. Kim, S.-H. (1979). Crystal structure of yeast tRNA-phe and general structural features of other tRNAs. In Transfer RNA: Structure, Properties, and Recognition (Schimmel, P. R., Soll, D. & Abelson, J. N., eds), pp. 83-100, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
65. Walter, A. E. & Turner, D. H. (1994). Sequence dependence of stability for coaxial stacking of RNA helixes with Watson-Crick base paired interfaces. Biochemistry, 33, 12715-12719.
66. Walter, A. E., Turner, D. H., Kim, J., Lyttle, M. H., Müller, P., Mathews, D. H. & Zuker, M. (1994). Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding. Proc. Natl Acad. Sci. USA, 91, 9218-9222.
67. Kim, J., Walter, A. E. & Turner, D. H. (1996). Thermodynamics of coaxially stacked helixes with GA and CC mismatches. Biochemistry, 35, 13753-13761.
68. Carter, A. P., Clemons, W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberly, B. T. & Ramakrishnan, V. (2000). Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature, 407, 340-348.
69. Carter, A. P., Clemons, W. M., Jr, Brodersen, D. E., Morgan-Warren, R. J., Hartsch, T., Wimberly, B. T. & Ramakrishnan, V. (2001). Crystal structure of an initiation factor bound to the 30S ribosomal subunit. Science, 291, 498-501.
70. Ogle, J. M., Brodersen, D. E., Clemons, W. M., Jr, Tarry, M. J., Carter, A. P. & Ramakrishnan, V. (2001). Recognition of cognate transfer RNA by the 30 S ribosomal subunit. Science, 292, 897-902.
71. Woese, C. R. (1980). Just so stories and rube goldberg machines: speculations on the origins of the protein synthetic machinery. In Ribosomes: Structure, Function, and Genetics (Chambliss, G., Craven, G. R., Davies, J., Davis, K., Kahan, L. & Nomura, M., eds), pp. 357-373, University Park Press, Baltimore, Maryland.
72. Frank, J. & Agrawal, R. K. (2000). A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature, 406, 318-322.
73. Sayle, R. A. & Milner-White, E. J. (1995). RASMOL: biomolecular graphics for all. Trends Biochem. Sci. 20, 374.
74. Bernstein, H. J. (2000). Recent changes to RasMol, recombining the variants. Trends Biochem. Sci. 25, 453-455.

Edited by J. Doudna
(Received 27 December 2000; received in revised form 14 May 2001; accepted 29 May 2001)