This document summarizes a study of transposable elements (TEs) in the plant order Asparagales. The researchers assembled TEs from several Asparagales taxa using low-coverage genome sequencing and de novo assembly. They classified the assembled TEs and estimated their relative abundances. Their results showed that taxa varied in their proportions of different TE types, independently of genome size and phylogeny. The study provides insights into TE diversity and evolution across Asparagales genomes.
Transposable element proliferation and genome size evolution in Asparagales
Kate L. Hertweck
National Evolutionary Synthesis Center (NESCent), Durham, NC, USA
k8hertweck@gmail.com, Twitter: @k8hert, http://k8hert.blogspot.com, http://www.slideshare.net/katehertweck
Image credits: Naturehills.com, Erica Wheeler, ag.arizona.edu, wikicommons
Introduction
Asparagales as a model system
● 14 families, 1122 genera, ~26000 species; diverged over 100 mya (Stevens, 2001 onwards)
● Many edible and ornamental species
● Variation in karyotype and genome size (Pires et al., 2006; Figure 1)
● Paucity of genomic resources, especially for TEs (see below)
● “Core” Asparagales represents a monophyletic lineage of three closely related families
Transposable elements (TEs)
● Mobile genetic elements able to replicate and move throughout a genome
● Represent at least 50% of the DNA in many eukaryotic genomes
● Both fine and coarse scale implications in genomic and organismal evolution
● Contribute to increases in genome size independent of, but sometimes in conjunction with, polyploidy and other types of sequence duplication (Fedoroff, 2012)
Research objectives
● Assemble consensus sequences of the most abundant (recently proliferated) TEs in Asparagales genomes
● Estimate the relative abundance of each type of TE

Family            Subfamily            # of species  Chromosome # (taxa sampled)  Genome size, pg/1C (taxa sampled)
Xeronemataceae    (outgroup)           2             34 (1)                       3.28 (1)
Xanthorrhoeaceae  Asphodeloideae*      785           12-78 (128)                  5.25-38.3 (139)
Xanthorrhoeaceae  Hemerocallidoideae   85            32 (1)                       0.76 (1)
Xanthorrhoeaceae  Xanthorrhoeoideae*   30            22 (1)                       1.04 (1)
Amaryllidaceae    Agapanthoideae       9             30 (7)                       11.23-23.78 (9)
Amaryllidaceae    Allioideae           795           10-66 (153)                  7.6-74.5 (162)
Amaryllidaceae    Amaryllidoideae      800           10-72 (93)                   6.15-82.15 (112)
Asparagaceae      Lomandroideae        178           8-32 (6)                     1.25-25.3 (8)
Asparagaceae      Asparagoideae        165-295       20-112 (3)                   1.28-4.18 (3)
Asparagaceae      Nolinoideae*         475           30-108 (11)                  0.93-53.5 (33)
Asparagaceae      Aphyllanthoideae     1             N/A                          0.65 (1)
Asparagaceae      Agavoideae*          637           16-180 (56)                  2.55-19.6 (98)
Asparagaceae      Scilloideae          770-1000      6-54 (75)                    2.6-75.9 (109)
Asparagaceae      Brodiaeoideae        62            4 (1)                        10.65-18.15 (3)
Figure 1. Phylogeny of subfamilies in core Asparagales based on all plastome genes (Steele et al., 2012). Classification based on APGIII (2009). Subfamilies in green were included in sampling for the present study. Species estimates for each subfamily are from the Angiosperm Phylogeny Website (Stevens, 2001 onwards). Chromosome number and genome size ranges obtained from the Plant DNA C-values Database (Bennett and Leitch, 2010). Asterisks (*) indicate subfamilies containing taxa with bimodal karyotypes.

● Raw fastq files from low-coverage, anonymous sequencing of total genomic DNA
● De novo genome assembly (MSR-CA, http://www.genome.umd.edu/SR_CA_MANUAL.htm)
● Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% are from grasses)
● Discard unknown scaffolds and “unimportant” repeats, categorize others by type
● Filter out scaffolds that BLAST to reference organellar genomes
● Map raw reads back to scaffolds to estimate relative proportion of TE
Figure 2. Diagram of the bioinformatics pipeline to assemble and annotate TEs from Illumina GSS data in this study.

Results, conclusions, future directions
● Assembly of all classes of TEs possible from GSS
● Most scaffolds are partial sequences, although full-length TEs occur
● Proportion of different TEs varies independent of genome size and phylogeny
● What variation is there among families of each TE type?
● Are there unique TE families in Asparagales?
● What is the sequence variation of reads mapping to these scaffolds?
● Are there correlations between TE presence/abundance and life history traits?
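
The scaffold filtering and read-mapping steps of the pipeline in Figure 2 can be illustrated with a minimal Python sketch. It is not the author's code (the custom scripts are in the GitHub repository cited under Methods). It assumes that organellar hits are available as BLAST tabular output (`-outfmt 6`, query ID in column 1 and e-value in column 11) and that read counts per scaffold have already been tallied from the mapping step. The function names, the e-value cutoff, the scaffold IDs, and the counts are all hypothetical.

```python
from collections import Counter


def organellar_scaffolds(blast_lines, max_evalue=1e-10):
    """Return IDs of scaffolds with a BLAST hit to an organellar reference.

    blast_lines: iterable of BLAST tabular (-outfmt 6) lines.
    max_evalue: hypothetical cutoff, not taken from the study.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        scaffold_id, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            hits.add(scaffold_id)
    return hits


def te_percentages(reads_per_scaffold, scaffold_type, exclude=frozenset()):
    """Percentage of mapped reads per TE type, out of all reads that map
    to annotated repeat scaffolds (the total repetitive fraction)."""
    totals = Counter()
    for scaffold_id, n_reads in reads_per_scaffold.items():
        # Drop organellar scaffolds and scaffolds with no repeat annotation.
        if scaffold_id in exclude or scaffold_id not in scaffold_type:
            continue
        totals[scaffold_type[scaffold_id]] += n_reads
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {te: 100.0 * n / grand_total for te, n in totals.items()}


# Toy data, invented for illustration only.
blast = ["scaf3\tchloroplast_ref\t99.0\t80\t0\t0\t1\t80\t1\t80\t1e-30\t150"]
reads = {"scaf1": 60, "scaf2": 30, "scaf3": 500, "scaf4": 10, "scaf5": 7}
types = {"scaf1": "Gypsy LTR", "scaf2": "Copia LTR",
         "scaf3": "Gypsy LTR", "scaf4": "DNA TE"}

excluded = organellar_scaffolds(blast)
print(te_percentages(reads, types, exclude=excluded))
```

Expressing each TE type as a share of the repetitive fraction, rather than of all reads, matches how Figure 3 reports its percentages.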
Methods
Sequencing
● Genome survey sequences (GSS): anonymous, low-coverage sequencing from total genomic DNA
● Illumina GAIIx, single-end, 80 bp reads (Steele et al., 2012)
● Proof-of-concept and quality control with six Poaceae taxa, the monocot genomic model system (data not shown)
Bioinformatics
● De novo genome assembly, TE annotation, scaffold filtering, read mapping (Figure 2)
● Custom scripts available at http://github.com/k8hertweck/AsparagalesTEscripts
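
Average genome coverage, as reported in Table 1, is described as being calculated from 1C genome size, read length, and number of reads. A minimal Python sketch of that arithmetic follows. It assumes the commonly used conversion of 1 pg of DNA to roughly 978 Mbp, which the poster does not state. The function name and the read count in the example are hypothetical and are not values from the study.

```python
# Assumed conversion factor (not given on the poster): 1 pg DNA ~ 0.978e9 bp.
BP_PER_PG = 0.978e9


def average_coverage(n_reads, read_length_bp, genome_size_pg_1c):
    """Expected average coverage (X) of a 1C genome by single-end reads."""
    if genome_size_pg_1c <= 0:
        raise ValueError("genome size must be positive")
    genome_size_bp = genome_size_pg_1c * BP_PER_PG
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical example: 5 million 80 bp reads from a 1.36 pg/1C genome.
coverage = average_coverage(5_000_000, 80, 1.36)
print(f"{coverage:.2f}X")
```

Because coverage scales inversely with genome size, the same sequencing effort gives much lower coverage for large genomes than for small ones.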
Subfamily         Taxon          Genome size (pg/1C)  Average genome coverage  MSR scaffolds  % organellar scaffolds  % repeat scaffolds
Asphodeloideae    Haworthia      15.2                 0.02X                    1360           2.6                     29.1
Agapanthoideae    Agapanthus     10.5                 0.01X                    438            7.8                     34.9
Allioideae        Allium         13.2                 0.03X                    1858           7.6                     23.9
Amaryllidoideae   Scadoxus       22.1                 0.02X                    1336           4.1                     30.0
Lomandroideae     Lomandra       1.15                 0.33X                    1491           7.6                     29.2
Asparagoideae     Asparagus      1.36                 0.30X                    1977           2.6                     26.5
Nolinoideae       Sansevieria    1.25                 0.32X                    835            6.9                     26.7
Aphyllanthoideae  Aphyllanthes   0.65                 0.34X                    436            15.3                    38.0
Agavoideae        Hosta          19.6                 N/A                      1084           6.1                     34.5
Scilloideae       Ledebouria     8.85                 0.04X                    2481           4.4                     24.8
Brodiaeoideae     Dichelostemma  9.35                 0.03X                    1706           1.5                     27.7
Table 1. Results of TE assembly and annotation from Asparagales taxa following the bioinformatics methods in Figure 2. Genome size data for the samples sequenced are described in Steele et al. (2012). Average genome coverage calculated from 1C genome size, read length, and number of reads for each sample; coverage data for Hosta is unavailable as sequencing was performed on DNA enriched for plastome. Percentages represent proportion of reads belonging to organelles and annotated repeats from total number of MSR scaffolds.

Figure 3. Percentage of different TE types of total repetitive fraction of representative core Asparagales taxa, arranged in order of increasing total genome size. Legend: LINEs; Copia LTRs; Gypsy LTRs; DNA TEs; other (RC, satellite, low complexity, simple repeats). Secondary axis: genome size (pg/1C).

Acknowledgements
I acknowledge the National Science Foundation for funding (DEB 0829849 and DEB 1146603), as well as collaborators on the Monocot AToL project.

References
APGIII. 2009. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Botanical Journal Of The Linnean Society 161: 105-121.
Bennett, M. D., and I. J. Leitch. 2010. Angiosperm DNA C-values database. http://www.kew.org/cvalues.
Fedoroff, N. V. 2012. Transposable Elements, Epigenetics, and Genome Evolution. Science 338: 758-767.
Pires, J. C., I. J. Maureira, T. J. Givnish, K. J. Sytsma, O. Seberg, G. Petersen, J. I. Davis, et al. 2006. Phylogeny, genome size, and chromosome evolution of Asparagales. Aliso 22: 285-302.
Steele, P. R., K. L. Hertweck, D. Mayfield, M. R. McKain, J. Leebens-Mack, and J. C. Pires. 2012. Quality and quantity of data recovered from massively parallel sequencing: Examples in Asparagales and Poaceae. American Journal Of Botany 99: 330-348.
Stevens, P. F. 2001 onwards. Angiosperm Phylogeny Website. http://www.mobot.org/MOBOT/research/APweb/ [accessed Jan 2013].