Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
1. Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database Module
Hilmar Lapp and Todd Vision
National Evolutionary Synthesis Center
(NESCent)
3. Most data is not online
Syst. Biol.
Data Archive
Clark J.R. et al. (2008) A Comparative Study
in Ancestral Range Reconstruction Methods:
Retracing the Uncertain Histories of Insular
Lineages. Systematic Biology, 57(5), 693-707
5. Accelerating knowledge
dissemination: A Story
• Jane and her lab have accumulated molecular
data to resolve the phylogeny of a certain clade
of frogs, many of which are endangered species.
• Her group assembles a multiple alignment and
reconstructs the phylogeny using a variety of
methods, some developed by her lab, resulting
in 1000s of trees.
• The results show overwhelming support for
several new branch points. The results are
interesting and solid enough to be useful for
others working on those species.
9. • Jane downloads and installs PhyloDOM, a
freely available open source software package.
The software creates a database and Jane uses
the programs that come with it to import all
her data.
• As a result, Jane’s lab now has a web interface
to her results that others can use to query for
novel topologies and to explore her data.
• Her lab also updates the database from their
ongoing work, and uses it to add provenance
data and links to protocols, publications, and
taxonomic concepts.
13. • Other researchers easily download and
integrate her results in their own analyses.
• Even where Jane used new methods, other
software understands the meaning of the
metadata and can take advantage of it.
• Before long, her results appear in data
aggregators such as iSpecies, EOL, or
Scratchpads, along with those from other labs.
• Jane herself uses the LifeMap widget to map
her trees onto geo-coordinates and to link
branches to ecological and biodiversity
parameters of respective areas.
18. How to get there?
Architecture diagram, layers from top to bottom:
• Client-based Query Interfaces; Embeddable Tools (PhyloWidget, GBrowse TreeWidget); Data Aggregators, Mash-up Applications
• Data and other services API (PhyloWS) supporting exchange standards (NeXML, CDAO)
• Middleware: Query & Persistence Management; Data Management Tools
• Phylogenetic Database (PhyloDB / BioSQL) supporting ontologies and arbitrary metadata; Topology-oriented Queries; Precompute Query Optimization; Molecular Data (Sequences, Annotation)
• Language binding for database model (BioPerl, Biojava, Biopython, Bioruby); Data loading tools (BioSQL)
• Parser libraries for data and semantics standards (NeXML, CDAO); Ontologies
• Phylogenetic Trees (Gene, Species); Character Data; Metadata (Evolutionary, Biodiversity, Computational); Taxonomies (ITIS, NCBI Taxonomies)
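One way to read "Precompute Query Optimization" for "Topology-oriented Queries" is to store every ancestor-descendant pair ahead of time, so that a question such as "what is the most recent common ancestor of these two taxa?" becomes a lookup rather than a tree walk. The following is only a minimal sketch of that idea in plain Python. The toy tree and the names `PARENT`, `precompute_paths`, and `mrca` are assumptions made for illustration and are not taken from the slides or from any of the tools named in the diagram.

```python
# Hypothetical toy tree as child -> parent edges (labels are illustrative only).
PARENT = {
    "A": "n1", "B": "n1",
    "n1": "root", "C": "root",
}

def precompute_paths(parent):
    """Return {(ancestor, descendant): distance} for every ancestor-descendant
    pair, including each node paired with itself at distance 0."""
    paths = {}
    nodes = set(parent) | set(parent.values())
    for node in nodes:
        current, dist = node, 0
        paths[(current, node)] = 0
        # Walk up to the root, recording one path entry per ancestor.
        while current in parent:
            current = parent[current]
            dist += 1
            paths[(current, node)] = dist
    return paths

def mrca(paths, a, b):
    """Most recent common ancestor: the shared ancestor closest to `a`."""
    anc_a = {anc: d for (anc, desc), d in paths.items() if desc == a}
    anc_b = {anc for (anc, desc) in paths if desc == b}
    common = anc_a.keys() & anc_b
    return min(common, key=lambda n: anc_a[n])

PATHS = precompute_paths(PARENT)
```

The trade-off in this sketch is space for speed: the path table grows with the depth of each node, but queries no longer need to traverse the tree.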
19. Achieving the Vision:
Coordinated & open
development,
nurturing & harnessing
existing efforts
20. Database: PhyloDB module
Schema diagram, tables with their listed attributes:
• Tree: Name, Identifier, Is_Rooted
• Tree_Root: Is_Alternate, Significance
• Tree_Qualifier_Value: Value, Rank
• Tree_Dbxref
• Node: Label, Left_Idx, Right_Idx
• Node_Qualifier_Value: Value, Rank
• Node_Path: distance
• Node_Dbxref
• Node_Taxon: Rank
• Node_Bioentry: Rank
• Edge
• Edge_Qualifier_Value: Value, Rank
• Related tables: Taxon, Bioentry, Biodatabase, Dbxref, Term, Ontology
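The Left_Idx and Right_Idx attributes on Node suggest a nested-set encoding, in which a depth-first walk numbers each node on entry and on exit, and a node's descendants are exactly the nodes whose interval lies inside its own. The sketch below illustrates that technique with Python's built-in sqlite3 module. The simplified `node` table, its column types, the toy tree, and the helper names `assign_indices` and `descendants` are assumptions for illustration and are not the actual PhyloDB table definitions.

```python
import sqlite3

# Hypothetical toy tree: parent -> ordered children (labels are illustrative).
CHILDREN = {"root": ["n1", "C"], "n1": ["A", "B"]}

def assign_indices(children, root):
    """Depth-first walk assigning nested-set (left, right) indices per node."""
    indices = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter            # number assigned when entering the node
        for child in children.get(node, []):
            visit(child)
        counter += 1
        indices[node] = (left, counter)  # number assigned when leaving

    visit(root)
    return indices

conn = sqlite3.connect(":memory:")
# Simplified stand-in for a node table; not the real PhyloDB DDL.
conn.execute(
    "CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT, "
    "left_idx INTEGER, right_idx INTEGER)"
)
conn.executemany(
    "INSERT INTO node (label, left_idx, right_idx) VALUES (?, ?, ?)",
    [(label, l, r) for label, (l, r) in assign_indices(CHILDREN, "root").items()],
)

def descendants(conn, label):
    """All nodes whose interval is strictly inside the given node's interval."""
    rows = conn.execute(
        """SELECT d.label FROM node AS a JOIN node AS d
           ON d.left_idx > a.left_idx AND d.right_idx < a.right_idx
           WHERE a.label = ? ORDER BY d.left_idx""",
        (label,),
    )
    return [r[0] for r in rows]
```

With this encoding, a subtree query is a single range comparison in SQL, at the cost of renumbering when the tree changes.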
29. • James Estill (U. Georgia):
“A Perl-based Command Line Interface to a
Topological Query Application for BioSQL in Support
of High Throughput Classification and Analysis of LTR
Retrotransposons in Plant Genomes”
30. Acknowledgments
• Phyloinformatics Hackathon participants
• BioHackathon 2008 participants
• EvoInformatics Working Group participants
• Google Summer of Code Students: Jamie Estill
• Sponsors & support: NESCent, BioSynC, TDWG, DBCLS, CBRC (Japan)