2. But first...a teaching interlude
●
Teaching half time for Duke Bio 202 (genetics and
evolution)
●
Responsible for one lab section, lab development,
and lecturing
●
Interesting integration of Duke course with Coursera
next semester
3. Overview
1. Transposable elements as a model system
2. Genomic contributions to life history evolution in
Asparagales
3. TEs and aging in Drosophila
4. What is in a genome?
●
The first step in analyzing genomes is usually to mask or filter repetitive
sequences, which often comprise a large portion of the nuclear genome
●
Repetitive sequences include satellites, telomeres, and other “junk” DNA
elements
●
“Selfish” DNA (or mobile genetic elements) is a category of repetitive
sequences representing transposable elements (parasitic, self-replicating
sequences derived from viruses)
●
Growing evidence (including ENCODE) suggests that “junk” DNA
performs essential functions and provides material for evolutionary
innovation
Class I (retrotransposons): LTR, LINE, SINE, ERV, SVA
Class II (DNA transposons): TIR, Crypton, Helitron, Maverick
www.virtualsciencefair.org
TEs Asparagales Drosophila
5. TEs directly affect organisms as they move throughout a genome
●
TEs interact with genes
●
TE insertion within a gene disrupts function
●
Exaptation of TEs into genes: Alu elements contributed to the
evolution of three-color vision (Dulai, 1999)
●
Gene expression and regulatory changes
●
TEs affect molecular evolution
●
Indels
●
increased recombination (chromosomal restructuring)
●
Links between TEs and adaptation/speciation
Kate Hertweck, NESCent, Genomic effects of junk DNA
6. TEs indirectly affect organisms through changes in genome size
Changes in overall genome size
Physical-mechanical effects of nuclear size and mass
Many historical hypotheses about relationships between genome size
and life history (complexity, mean generation time, ecology, growth
form)
8. Research questions and goals
●
What are patterns of genome expansion and contraction
throughout the evolutionary history of organisms?
●
Patterns in genome size change
●
Proliferation of TEs within lineages
●
Do genomic patterns correlate with changes in
life history?
●
Improving methods for comparative genomics
across broad taxonomic levels
●
Application of phylogenetic comparative
methods to genomic data
Evolutionnews.org
9. Overview
1. Transposable elements as a model system
2. Genomic contributions to life history evolution in
Asparagales
3. TEs and aging in Drosophila
Collaborators:
J. Chris Pires and lab (U of Missouri)
Patrick Edger
Dustin Mayfield
10. Genomic evolution in Asparagales
●
Many edible species (onion, asparagus, agave) and ornamentals
(orchid, amaryllis, yucca)
●
Lots of variation in life history traits: physiology, growth habit,
habitat
●
Interesting patterns of genomic evolution
●
Wide variation in genome size
●
Bimodal karyotypes
●
Despite possessing some of the largest angiosperm genomes, we
know little about the TEs in Asparagales
●
Possibility to test hypotheses of correlations between genomic
changes and life history traits
ag.arizona.edu Naturehills.com
16. Our data
●
Illumina (80-120 bp single end), 6 taxa per lane
●
GSS (Genome Survey Sequences): total genomic DNA!
●
Data originally collected for systematics
●
Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of
data (Steele et al 2012)
●
Poaceae (family of grasses, model system)
●
Medium-sized genomes
●
Well-annotated library of repeats
●
Asparagales (order of petaloid monocots, non-model system)
●
Very large genomes
●
Discovery of novel repeats
●
Is there a way to characterize repeats when the genome
is a big black box?
19. Bioinformatics approach
●
Sequence assembly:
●
Ab initio repeat construction: use raw sequence reads to build
pseudomolecules or ancestral sequences
●
De novo sequence assembly: standard genome assembly
methods, screen resulting contigs
●
Annotation method:
●
Motif searching
●
Reference library
Sidenote: improving the ontology for transposable elements
(classification and annotation)
Sequence Ontology (SO)
Comparative Data Analysis Ontology (CDAO)
20. Pipeline
Scripts available on GitHub: AsparagalesTEscripts
1. Raw fastq files
2. De novo genome assembly (MSR-CA)
3. Filter out scaffolds that BLAST to reference organellar genomes
4. Run RepeatMasker to identify similarity to known repeats
(3110 repeats, 98.7% are from grasses)
5. Discard unknown scaffolds and “unimportant” repeats, categorize others by type
6. Map raw reads back to scaffolds to estimate relative proportion of TEs
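The final pipeline step can be sketched as follows. This is a minimal illustration, not the code in AsparagalesTEscripts: the scaffold names, repeat categories, and read counts are hypothetical, and the sketch assumes each mapped read has already been assigned to a single scaffold.

```python
from collections import defaultdict


def te_proportions(reads_per_scaffold, scaffold_type, total_reads):
    """Return the fraction of all reads that map to each repeat type.

    reads_per_scaffold: {scaffold_id: number of reads mapped to it}
    scaffold_type: {scaffold_id: repeat type}; scaffolds missing from this
        mapping are treated as discarded (unknown or "unimportant").
    total_reads: number of reads in the raw dataset.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    reads_per_type = defaultdict(int)
    for scaffold, n_reads in reads_per_scaffold.items():
        te_type = scaffold_type.get(scaffold)
        if te_type is None:
            continue  # discarded scaffold, not counted toward any type
        reads_per_type[te_type] += n_reads
    return {t: n / total_reads for t, n in reads_per_type.items()}


# Hypothetical values for illustration only.
reads_per_scaffold = {"scf1": 1200, "scf2": 300, "scf3": 500, "scf4": 50}
scaffold_type = {"scf1": "LTR", "scf2": "LINE", "scf3": "LTR"}
proportions = te_proportions(reads_per_scaffold, scaffold_type, total_reads=10000)
```

Dividing by the total number of raw reads, rather than by mapped reads only, keeps the estimates comparable among taxa sequenced to different depths.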
22. Quality control: Poaceae
●
Largest scaffolds with deepest coverage are from the chloroplast and
mitochondrial genomes, but are easily identified for exclusion
●
All relevant classes of repeats are present in scaffolds from a single genome
●
Even long repeats can be reconstructed into a single scaffold
●
Characterization of repeats is not dependent on sequence coverage
●
Estimates of repeat quantity are not very accurate, but there is little
consensus on TE quantification in the published literature!
●
Decision: use a dataset constructed from similar data and analyzed in the
same pipeline so any error is systematic and shared among all taxa
●
How well do these methods work for non-model systems?
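The exclusion of chloroplast and mitochondrial scaffolds can be sketched as below. It assumes BLAST results in the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, and so on); the identity and length thresholds, scaffold names, and reference names are hypothetical choices for illustration, not values from the original pipeline.

```python
def organellar_scaffolds(blast_lines, min_identity=90.0, min_length=500):
    """Collect query IDs that have a strong hit to an organellar reference.

    blast_lines: iterable of tab-separated BLAST tabular lines.
    The thresholds are illustrative assumptions.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        identity = float(fields[2])   # column 3: percent identity
        aln_length = int(fields[3])   # column 4: alignment length
        if identity >= min_identity and aln_length >= min_length:
            hits.add(query_id)
    return hits


def filter_scaffolds(scaffold_ids, blast_lines, **thresholds):
    """Return the scaffolds that lack a strong organellar hit."""
    organellar = organellar_scaffolds(blast_lines, **thresholds)
    return [s for s in scaffold_ids if s not in organellar]


# Hypothetical BLAST output: scf1 matches a chloroplast reference strongly,
# scf2 has only a short, weak hit, and scf3 has no hit at all.
example_blast = [
    "scf1\tcp_ref\t99.10\t15000\t120\t5\t1\t15000\t200\t15199\t0.0\t27000",
    "scf2\tmt_ref\t85.00\t120\t18\t0\t1\t120\t500\t619\t1e-20\t100",
]
kept = filter_scaffolds(["scf1", "scf2", "scf3"], example_blast)
```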
23. Example: LTR from Hosta
●
Reads map across scaffold: assembly is reliable
●
Some divergence in reads: measure of diversity?
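One way to check that reads map across a whole scaffold is to compute per-base depth from the alignment coordinates. The sketch below assumes alignments are given as hypothetical (start, end) pairs, 0-based with an exclusive end; it is not taken from the original scripts.

```python
def coverage_depth(scaffold_length, alignments):
    """Return per-base read depth for a scaffold.

    alignments: iterable of (start, end) pairs, 0-based, end-exclusive.
    """
    if scaffold_length <= 0:
        raise ValueError("scaffold_length must be positive")
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (scaffold_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, scaffold_length)
        if start >= end:
            continue  # alignment lies outside the scaffold
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for i in range(scaffold_length):
        running += diff[i]
        depth.append(running)
    return depth


def fraction_covered(depth, min_depth=1):
    """Fraction of positions with at least min_depth reads."""
    return sum(d >= min_depth for d in depth) / len(depth)


# Hypothetical alignments on a 10 bp scaffold.
depth = coverage_depth(10, [(0, 4), (2, 8), (7, 10)])
covered = fraction_covered(depth)
```

A covered fraction near 1 with no sharp drops in depth is consistent with a reliable assembly of the repeat.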
24. REs in Core Asparagales
30. Developing genomic traits for comparative biology
●
Genomic traits can be treated just like any other phenotype
• Number of gene copies of a single family
• Genome size, intron size, GC content, number of chromosomes,
polyploidy, karyotype (sex chromosomes)
• Sometimes genomic traits evolve in such a way that models need to
be altered to accommodate their variation
●
We finally have enough information to be able to apply these methods
across robust phylogenies of organisms!
●
What about transposable elements?
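As a small illustration of treating a genomic characteristic as a trait, the sketch below computes GC content per taxon so that it can be tabulated like any other phenotype. The taxon names are placeholders and the sequences are made up for the example.

```python
def gc_content(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / counted


# Hypothetical sequences; real input would be assembled scaffolds or reads.
sequences = {
    "taxon_A": "GGCCAATT",
    "taxon_B": "ggccNNNN",
}

# One trait value per taxon, ready to pair with the tips of a phylogeny.
gc_trait = {taxon: gc_content(seq) for taxon, seq in sequences.items()}
```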
31. So what?
●
You can peek into the black box of large plant genomes with even very
limited genomic sequence data
●
There is a great deal of variation in TE complements among closely
related plant species
●
These methods can easily be applied to extant datasets to summarize
TEs
32. So what?
●
Data available for most plants are low coverage, with little known about
the TEs present and their direct effects on the genome and organism
●
Plant genomes tolerate more plasticity than animal genomes
• Polyploidy, chromosomal restructuring more common in plants
• Repetitive complement comprises a higher proportion of plant
genomes
• Differences in gene silencing
●
Pretty plants are great, but what if we want a more applied approach?
33. Overview
1. Transposable elements as a model system
2. Genomic contributions to life history evolution in
Asparagales
3. TEs and aging in Drosophila
Collaborators:
Joseph Graves (UNCG, NC A&T)
Michael Rose (UC Irvine)
Mira Han (NESCent)
34. Genomics of aging
●
Aging as “detuning” of adaptation
●
Age-related genes and expression patterns
●
Does the movement of TEs throughout a genome correspond to how
long an organism lives?
●
Previously discussed life history traits only involve TE proliferation in
gametic tissue
●
Questions about aging involve changes in organisms throughout
lifespan, especially if results can be transferred to human research
35. Experimental data
●
Replicate populations of fruit flies selected for both short and long life
spans (Burke et al 2010)
●
Next-gen sequencing of pooled populations
●
SNP analysis indicates allele frequency changes at many loci, but
little evidence for selective sweeps
●
Extensive gene expression change
36. Experimental approach
●
Does the frequency of a TE differ between control and treatment
populations?
●
Are there patterns consistent with type of TE?
●
T-lex: perl script for identifying presence and absence of annotated
transposable elements
●
2947 transposable elements from publicly available genome
sequence
Scripts available on GitHub: flyTEscripts
TE types: FB, MITE, LINE, LTR, TIR
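The two questions above can be sketched as a per-type summary of frequency shifts. This is a hedged illustration only: the record layout (TE ID, TE type, control frequency, selected frequency) and the values are hypothetical, and they do not represent T-lex output or the flyTEscripts code.

```python
from collections import defaultdict


def mean_shift_by_type(te_records):
    """Average (selected - control) frequency shift for each TE type.

    te_records: iterable of (te_id, te_type, control_freq, selected_freq),
    with frequencies expressed as fractions between 0 and 1.
    """
    shifts = defaultdict(list)
    for te_id, te_type, control_freq, selected_freq in te_records:
        shifts[te_type].append(selected_freq - control_freq)
    return {t: sum(values) / len(values) for t, values in shifts.items()}


# Hypothetical frequencies for three annotated insertions.
records = [
    ("te1", "LTR", 0.2, 0.6),
    ("te2", "LTR", 0.5, 0.5),
    ("te3", "LINE", 0.9, 0.4),
]
mean_shifts = mean_shift_by_type(records)
```

A positive mean indicates that insertions of that type tend to be more frequent in the selected population than in its control.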
37. Preliminary results
●
Controls and populations selected for shorter lifespan
●
Population pairs do not differ significantly (Kruskal-Wallis,
p = 0.9414)
[Figure: number of TEs (y-axis) by population 1 to 5 (x-axis); legend: NA, 0, 100, final]
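For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic (with the usual tie correction) in plain Python. The TE counts are invented for illustration and are not the data behind the reported p-value; in practice a statistics library such as SciPy (`scipy.stats.kruskal`) would return both the statistic and the p-value.

```python
def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all values equal to the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for two or more non-empty groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    values = [v for g in groups for v in g]
    n_total = len(values)
    ranks, tie_sizes = average_ranks(values)
    weighted = 0.0
    position = 0
    for g in groups:
        rank_sum = sum(ranks[position:position + len(g)])
        weighted += rank_sum ** 2 / len(g)
        position += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


# Hypothetical TE counts for three populations.
te_counts = [[610, 598, 620], [605, 615, 600], [612, 603, 618]]
h_statistic = kruskal_wallis_h(te_counts)
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom, so a small H corresponds to a large p-value.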
38. Preliminary results
●
Controls and populations selected for shorter lifespan
●
153 TEs vary in one or more populations
●
70 TEs vary in all five populations
●
some TE frequencies move to fixation
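Tallies like these can be produced with a short summary routine. The sketch below is an assumption-laden illustration: it defines "vary" as an absolute frequency change of at least `min_change` between a control and its paired selected population, a threshold chosen here for the example rather than taken from the analysis, and the frequencies are hypothetical.

```python
def summarize_variation(frequencies, min_change=0.05):
    """Classify TEs by how their frequency changes across population pairs.

    frequencies: {te_id: [(control_freq, selected_freq), ...]} with one
        pair per population.
    min_change: assumed minimum absolute change that counts as "varying".
    """
    varying_any, varying_all, fixed = [], [], []
    for te_id, pairs in frequencies.items():
        changed = [abs(selected - control) >= min_change for control, selected in pairs]
        if any(changed):
            varying_any.append(te_id)
        if changed and all(changed):
            varying_all.append(te_id)
        # Fixation: present in every sampled genome after selection only.
        if any(selected == 1.0 and control < 1.0 for control, selected in pairs):
            fixed.append(te_id)
    return {"any": varying_any, "all": varying_all, "fixed": fixed}


# Hypothetical frequencies for three TEs in two population pairs.
example = {
    "te1": [(0.2, 0.6), (0.3, 0.8)],
    "te2": [(0.5, 0.5), (0.4, 1.0)],
    "te3": [(0.7, 0.7), (0.1, 0.1)],
}
summary = summarize_variation(example)
```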
39. Finishing the job...
●
What are patterns from other population pairs (selection for longer
lifespan)?
●
Formal statistical testing for variation
●
Where are TEs of interest located in the genome? What genes are
located nearby?
●
T-lex de novo: searching for unannotated insertions
– Are there unique TE insertions related to longer life spans?
40. Conclusions
●
What are general patterns of TE evolution?
●
Different TEs contribute to genome size obesity.
●
We still need better methods to compare genomes.
●
Are there common patterns between TEs and life history trait evolution?
●
Yes, very specific insertions, at least in Drosophila.
●
How can comparative methods be appropriated for genomic
characteristics?
●
Does TE proliferation contribute to diversification or shifts in rates of
molecular evolution?
●
We are getting closer to possessing enough data to answer these
questions.
41. Conclusions
●
There are many interesting questions to be investigated using other
folks' genomic trash!
●
A little sequencing data can tell you a lot about a genome.
●
Many markers for systematic purposes
●
You can characterize major groups of repeats even in the absence
of a robust reference library for the species.
●
Informatics tools and resources abound!
42. Acknowledgements
NESCent (National Evolutionary Synthesis Center)
Allen Rodrigo
Karen Cranston (and bioinformatics group!)
www.nescent.org
k8hert.blogspot.com
Find me:
Twitter @k8hert
Google+ k8hertweck@gmail.com