OpenTree at NESCent Academy 2012
1. A community-assembled, continually updated evolutionary
history of all life
Karen A. Cranston
National Evolutionary Synthesis Center
Duke University
7. DATA AVAILABILITY
High archival rate of sequence data
Only ~4% of all published
phylogenetic trees are archived
8. Most trees published
as (beautiful) figures
in PDF files
EVOLUTION
not reusable!
Wiegmann et al. PNAS, 2011
Fig. 1. Combined molecular phylogenetic tree for Diptera. Partitioned ML analysis of combined taxon sets of tier 1 and tier 2 FLYTREE data samples (−lnL =
344155.6169) calculated in RAxML. Circles indicate bootstrap support >80% (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–88%). Nodes with improved
bootstrap values resulting from postanalysis pruning of unstable taxa are marked by stars (black/bp = 95–100%, gray/bp = 88–94%, white/bp = 80–
10. • Ideas Lab = 5-day workshop
• Self-assembly into groups
• Pitched pre-proposals at end of lab
• NSF invited full proposals
11. Karen Cranston, lead PI (Duke)
Gordon Burleigh (Florida)
Keith Crandall (BYU)
Karl Gude (MSU)
David Hibbett (Clark)
Mark Holder (Kansas)
Laura Katz (Smith)
opentreeoflife.org
Rick Ree (FMNH)
Stephen Smith (Michigan)
Doug Soltis (Florida)
Tiffani Williams (TAMU)
AVAToL: Assembling, Visualizing and Analyzing
the Tree of Life
13. Tree of life
• 1.8 million named species
• Millions more unnamed / undiscovered
14. COMPARATIVE BIOLOGY
Conventional statistics assume:
Evolutionary trees provide:
Modified from Garland and Carter, 1994
17. 1. Build the first complete draft tree of life
2. Engage the community in refinement and
annotation
3. Promote a culture of data sharing through software
products
4. Develop novel methods for phylogenetic
synthesis
18. + taxonomies of living and extinct species
+ any digital phylogenetic data we can get:
NSF Assembling the Tree of Life projects
recent high-profile phylogenies
ribosomal RNA trees for Bacteria and Archaea
TreeBASE and Dryad trees
Graph database holding a “cloud” of thousands
of input trees with millions of nodes
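A minimal sketch of what holding many input trees in one graph could look like, assuming each tree arrives as a list of (child, parent) name pairs. The `TreeGraph` class, the study ids, and the toy taxa are hypothetical illustrations, not the project's actual database or schema.

```python
from collections import defaultdict


class TreeGraph:
    """Toy in-memory graph: nodes are taxon names, edges remember their source trees."""

    def __init__(self):
        # (child, parent) -> set of source tree ids that assert this edge
        self.edges = defaultdict(set)

    def add_tree(self, tree_id, child_parent_pairs):
        """Merge one input tree into the shared graph."""
        for child, parent in child_parent_pairs:
            self.edges[(child, parent)].add(tree_id)

    def nodes(self):
        """All taxon names that appear in any input tree."""
        return {name for edge in self.edges for name in edge}

    def support(self, child, parent):
        """Number of input trees that contain this child -> parent edge."""
        return len(self.edges.get((child, parent), ()))

    def conflicts(self):
        """Children that different input trees attach to different parents."""
        parents = defaultdict(set)
        for child, parent in self.edges:
            parents[child].add(parent)
        return {c: ps for c, ps in parents.items() if len(ps) > 1}


# Hypothetical input trees from two studies
graph = TreeGraph()
graph.add_tree("study_A", [("Drosophila", "Diptera"), ("Anopheles", "Diptera")])
graph.add_tree("study_B", [("Drosophila", "Diptera"),
                           ("Anopheles", "Culicomorpha"),
                           ("Culicomorpha", "Diptera")])
```

Keeping the source tree id on every edge is what lets agreement (`support`) and disagreement (`conflicts`) between studies be read straight off the graph.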
19. Graph database holding thousands of input
trees with millions of nodes
Filter / weight input data (number of taxa, size
of alignment, year of publication, etc)
Synthesis (supertrees, grafting)
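The filter / weight step could be sketched as follows. This is only an illustration under stated assumptions: the `InputTree` record, the thresholds, and the scoring formula are hypothetical and are not the weighting scheme used by the project.

```python
from dataclasses import dataclass


@dataclass
class InputTree:
    """Hypothetical metadata record for one input tree."""
    tree_id: str
    num_taxa: int
    alignment_length: int
    year: int


def filter_trees(trees, min_taxa=4, min_year=1990):
    """Drop trees that are too small or too old (thresholds are assumptions)."""
    return [t for t in trees if t.num_taxa >= min_taxa and t.year >= min_year]


def weight(tree, newest_year):
    """Illustrative score: more taxa, longer alignments, and newer studies count more."""
    recency = 1.0 / (1 + newest_year - tree.year)
    return tree.num_taxa * tree.alignment_length * recency


def rank_trees(trees, **filter_kwargs):
    """Filter, then order the surviving trees from highest to lowest weight."""
    kept = filter_trees(trees, **filter_kwargs)
    if not kept:
        return []
    newest = max(t.year for t in kept)
    return sorted(kept, key=lambda t: weight(t, newest), reverse=True)


# Made-up example inputs
trees = [
    InputTree("a", num_taxa=100, alignment_length=5000, year=2011),
    InputTree("b", num_taxa=3, alignment_length=800, year=2009),
    InputTree("c", num_taxa=200, alignment_length=1000, year=2005),
]
ranked = rank_trees(trees)
```

In this toy ranking, tree "b" is filtered out for having too few taxa, and the remaining trees would be handed to the synthesis step in weight order.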
20. Graph database holding a “cloud” of thousands of input trees with millions of nodes
• filter input trees
• synthesize into summary trees
• compare to previous trees
• invite annotation
• input new data sets
21. Ability to annotate and improve
Clear links to source data and methods
Compare your results with synthetic tree
Flag
Get citations
Annotate
Upload alternate
Tree image modified from Tree of Life Web Project page http://tolweb.org/Nymphalidae/12172 Pictures by Katja Schulz (queen butterfly;
CC Attribution-NonCommercial) and Charles Lam (via Flickr; CC Attribution-ShareAlike)
23. NESCent hackathon to architect and implement a
phylogenetic pruning service for megatrees
http://www.evoio.org/wiki/Phylotastic
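To make the idea of a pruning service concrete, here is a minimal sketch of pruning a large tree down to a requested set of species. It assumes a tree is represented as nested tuples with leaf names as strings. The functions `prune` and `to_newick` and the toy tree are hypothetical and are not the Phylotastic API.

```python
def prune(tree, keep):
    """Return `tree` restricted to the leaf names in `keep`, or None if none remain.

    A tree is either a leaf name (str) or a tuple of subtrees.
    Internal nodes left with a single child are collapsed into that child.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return tuple(children)


def to_newick(tree):
    """Serialize the nested-tuple tree as a Newick string (topology only)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# Toy "megatree" and a user-supplied species list
megatree = ((("Drosophila", "Musca"), "Anopheles"), ("Apis", ("Bombus", "Vespa")))
pruned = prune(megatree, {"Drosophila", "Anopheles", "Bombus"})
print(to_newick(pruned) + ";")  # prints ((Drosophila,Anopheles),Bombus);
```

Collapsing single-child nodes keeps the result a proper tree over just the requested species. Branch lengths are ignored here, and a real service would also need to match the submitted names to the names used in the tree.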
24. YEAR 2 & 3: SMART GENERATION OF
FIGURES FOR PUBLICATION
• Semantic annotation layers
• Collaborative editing
• Integrated submission of topology, branch lengths and annotations to archives
25. YEAR 2 & 3: AUTOMATIC UPDATING
update trees
with new
sequence data
detect and incorporate
newly published trees
26. Community assembly of the
tree of life (Open Tree of Life)
Next generation Phenomics
(PI O'Leary)
Arbor: Comparative Analysis
Workflows (PI Harmon)
27. POTENTIAL
IMPACTS
• Phylogenies for any set of species easily available
• Benchmark for current state of phylogenetic knowledge
• Increasing rate of data archiving
• Placing “dark taxa” in a global informatics framework
28. BIGGEST
CHALLENGES?
• Lack of digitally-available trees
• Visualization
• Engaging community to annotate and update
• Producing usable and visually appealing software