1. A community-assembled, continually updated evolutionary
history of all life
Karen A. Cranston
National Evolutionary Synthesis Center
Duke University
7. DATA AVAILABILITY
High archival rate of sequence data
but only ~4% of all published
phylogenetic trees are archived
8. Most trees published
as (beautiful) figures
in PDF files
not reusable!
Wiegmann et al. PNAS, 2011
Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL = 344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–
10. • Ideas Lab = 5-day workshop
• Self-assembly into groups
• Pitched pre-proposals at end of lab
• NSF invited full proposals
11. Karen Cranston, lead PI (Duke)
Gordon Burleigh (Florida)
Keith Crandall (BYU)
Karl Gude (MSU)
David Hibbett (Clark)
Mark Holder (Kansas)
Laura Katz (Smith)
Rick Ree (FMNH)
Stephen Smith (Michigan)
Doug Soltis (Florida)
Tiffani Williams (TAMU)
opentreeoflife.org
AVAToL: Assembling, Visualizing and Analyzing the Tree of Life
13. Tree of life
• 1.8 million named species
• Millions more unnamed / undiscovered
14. COMPARATIVE BIOLOGY
Conventional statistics assume:
Evolutionary trees provide:
Modified from Garland and Carter, 1994
17. 1. Build the first complete draft tree of life
2. Engage the community in refinement and
annotation
3. Promote a culture of data sharing through software
products
4. Develop novel methods for phylogenetic
synthesis
18. + taxonomies of living and extinct species
+ any digital phylogenetic data we can get:
NSF Assembling the Tree of Life projects
recent high-profile phylogenies
ribosomal RNA trees for Bacteria and Archaea
TreeBASE and Dryad trees
Graph database holding a ‘cloud’ of thousands
of input trees with millions of nodes
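One way to picture a shared graph of many input trees is sketched below. This is a minimal illustration, not the project's actual database schema: it assumes each tree arrives as nested tuples of taxon names, identifies every node by the set of taxa beneath it, and records which source study supports each parent-child edge. The names `load_tree`, `graph`, `study1`, and `study2` are hypothetical.

```python
from collections import defaultdict


def load_tree(graph, tree, source):
    """Add one input tree to the shared graph.

    A leaf is a taxon name (str); an internal node is a tuple of children.
    Each node is keyed by the frozenset of taxa below it, so identical
    clades from different studies map onto the same graph node.
    Returns the key of the node that was just loaded.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    child_keys = [load_tree(graph, child, source) for child in tree]
    node = frozenset().union(*child_keys)
    for child in child_keys:
        # Record that this study supports the parent -> child relationship.
        graph[(node, child)].add(source)
    return node


# (parent, child) edge -> set of source studies supporting it
graph = defaultdict(set)
load_tree(graph, (("A", "B"), "C"), "study1")
load_tree(graph, (("A", "B"), ("C", "D")), "study2")

# Edges shared by both studies carry both source labels.
shared = graph[(frozenset({"A", "B"}), frozenset({"A"}))]
```

Edges supported by many sources and edges unique to one study then sit side by side in the same structure, which is what makes later filtering and synthesis possible.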
19. Graph database holding thousands of input
trees with millions of nodes
Filter / weight input data (number of taxa, size
of alignment, year of publication, etc)
Synthesis (supertrees, grafting)
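The filter / weight step could look something like the sketch below. The slide names the criteria (number of taxa, size of alignment, year of publication) but not how they are combined, so the scoring formula, the `min_taxa` cutoff, the `current_year` default, and the example studies are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class InputTree:
    study: str
    n_taxa: int
    alignment_length: int
    year: int


def weight(tree, min_taxa=4, current_year=2012):
    """Hypothetical weight: larger data sets score higher, older trees lower.

    Trees with fewer than `min_taxa` taxa are filtered out (weight 0).
    """
    if tree.n_taxa < min_taxa:
        return 0.0
    size_score = tree.n_taxa * tree.alignment_length
    age = max(current_year - tree.year, 0)
    return size_score / (1 + age)


def rank_trees(trees, **kwargs):
    """Drop filtered trees and return study names, highest weight first."""
    weighted = [(weight(t, **kwargs), t) for t in trees]
    kept = [(w, t) for w, t in weighted if w > 0]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [t.study for _, t in kept]


trees = [
    InputTree("old_small", n_taxa=10, alignment_length=1000, year=2002),
    InputTree("new_large", n_taxa=200, alignment_length=5000, year=2011),
    InputTree("too_few", n_taxa=3, alignment_length=800, year=2012),
]
ranking = rank_trees(trees)
```

The resulting ranking could then feed the synthesis step, for example by deciding which input tree wins when two conflict.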
20. Graph database holding a ‘cloud’ of thousands of input trees with millions of nodes
• filter input trees
• synthesize into summary trees
• compare to previous trees
• invite annotation
• input new data sets
21. Ability to annotate and improve
Clear links to source data and methods
Compare your results with synthetic tree
Flag
Get citations
Annotate
Upload alternate
Tree image modified from Tree of Life Web Project page http://tolweb.org/Nymphalidae/12172
Pictures by Katja Schulz (queen butterfly; CC Attribution-NonCommercial) and Charles Lam (via Flickr; CC Attribution-ShareAlike)
23. NESCent hackathon to architect and implement a
phylogenetic pruning service for megatrees
http://www.evoio.org/wiki/Phylotastic
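The core operation behind a pruning service can be sketched as follows. This is not the Phylotastic implementation; it is a minimal example assuming the megatree is held as nested tuples of taxon names, and the function name `prune` and the example taxa are hypothetical. It keeps only the requested species and collapses internal nodes left with a single child.

```python
def prune(tree, keep):
    """Return the subtree induced by the taxa in `keep`.

    A leaf is a taxon name (str); an internal node is a tuple of children.
    Returns None when no requested taxon lies below this node.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [p for p in (prune(c, keep) for c in tree) if p is not None]
    if not children:
        return None
    if len(children) == 1:
        # Collapse a node that would otherwise have a single child.
        return children[0]
    return tuple(children)


megatree = ((("A", "B"), "C"), ("D", ("E", "F")))
pruned = prune(megatree, {"A", "C", "F"})
```

A real service would also need to reconcile user-supplied names with the taxonomy used in the megatree and carry branch lengths through the collapse step, neither of which this sketch attempts.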
24. YEAR 2 & 3: SMART GENERATION OF FIGURES FOR PUBLICATION
• Semantic annotation layers
• Collaborative editing
• Integrated submission of topology, branch lengths and annotations to archives
25. YEAR 2 & 3: AUTOMATIC UPDATING
• update trees with new sequence data
• detect and incorporate newly published trees
26. Community assembly of the tree of life (Open Tree of Life)
Next generation Phenomics (PI O’Leary)
Arbor: Comparative Analysis Workflows (PI Harmon)
27. POTENTIAL IMPACTS
• Phylogenies for any set of species easily available
• Benchmark for current state of phylogenetic knowledge
• Increased rate of data archiving
• Placing “dark taxa” in global informatics framework
28. BIGGEST CHALLENGES?
• Lack of digitally-available trees
• Visualization
• Engaging community to annotate and update
• Producing usable and visually appealing software