Franz 2014 ESA Aligning Insect Phylogenies Perelleschus and Other Cases
1. Aligning insect phylogenies:
Perelleschus and other cases
Nico M. Franz 1,2
Arizona State University
http://taxonbytes.org/
1 Concepts and tools developed jointly with members of the Ludäscher Lab (UC Davis & UIUC):
Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers & Bertram Ludäscher
2 Systematics, Evolution and Biodiversity Section, Ten Minute Papers
Annual Meeting of the Entomological Society of America
November 18, 2014 - Portland, Oregon
On-line @ http://www.slideshare.net/taxonbytes/franz-2014-esa-aligning-insect-phylogenies-perelleschus-and-other-cases-41654235
2. Research motivation: 1
How can we represent, and reason over,
taxonomic concept provenance,
based on varying input classifications
and differentially sampled phylogenies?
1 This presentation concentrates on the "how?"; the "why?" is addressed in the References (listed at the end).
3. Definitional preliminaries, 1
Taxonomic concept: 1
The circumscription of a perceived
(or, more accurately, hypothesized)
taxonomic group, as advocated by
a particular author and source.
1 Not the same as species concepts, which are theories about what species are, and/or how they are recognized.
4. Definitional preliminaries, 2
Provenance: 1
Information describing the origin, derivation,
history, custody, or context of an entity (etc.).
Provenance establishes the authenticity, integrity
and trustworthiness of information about entities.
1 See, e.g.: http://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
5. Definitional preliminaries, 3
Alignment ("merge"):
A comprehensive, logically consistent, and
(where possible) well-specified reconciliation
of shared and unique Euler regions that result from
integrating two or more taxonomic concept
hierarchies ("trees") with RCC-5 articulations.1
1 RCC-5 = Region Connection Calculus, with five set-theoretic relationships: congruence (==), proper inclusion (<), inverse proper inclusion (>), overlap (><), and exclusion (|).
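The five RCC-5 relationships can be illustrated with ordinary sets. This is a minimal sketch, assuming each concept is modelled as a non-empty set of member labels; the function name `rcc5` and the toy labels are illustrative and not part of Euler/X.

```python
def rcc5(a, b):
    """Return the RCC-5 symbol relating two non-empty sets a and b."""
    a, b = frozenset(a), frozenset(b)
    if not a or not b:
        raise ValueError("RCC-5 regions must be non-empty")
    if a == b:
        return "=="   # congruence
    if a < b:
        return "<"    # a is properly included in b
    if a > b:
        return ">"    # a properly includes b
    if a & b:
        return "><"   # partial overlap
    return "|"        # exclusion (disjoint)


if __name__ == "__main__":
    # Toy member labels, chosen only to exercise each relation.
    print(rcc5({"x", "y"}, {"x", "y"}))
    print(rcc5({"x"}, {"x", "y"}))
    print(rcc5({"x", "y"}, {"y", "z"}))
    print(rcc5({"x"}, {"z"}))
```

Exactly one of the five symbols holds for any pair of non-empty sets, which is what makes the relations usable as articulations.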
8. Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013)
• Habitus, mouthparts
One might call this string a Taxonomic Concept Label.
[Figure panels: Female, habitus; Labium; Maxilla]
9. Perelleschus salpinflexus Cardona-Duque & Franz sec. Franz & Cardona-Duque (2013)
• Male & female terminalia, showing putative synapomorphies
Synapomorphy (genus-level): Spermatheca with an acute, sclerotized appendix at
insertion of the collum (character 17:1).
Synapomorphy (subclade-level): Aedeagus with endophallic sclerites extending in
apical half of aedeagus (character 11:1).
[Figure labels: "11", "17"]
19. Introducing the Euler/X software toolkit (Open Source)
"A toolkit for consistently aligning
sets of hierarchically arranged entities
under (relaxable) logic constraints,
and using RCC-5 articulations."
Desktop tool @ https://bitbucket.org/eulerx
Euler server @ http://euler.asu.edu
21. Euler/X uses Answer Set Programming.
The reasoner asks, and solves, the question:
"Which possible worlds can be generated
that satisfy (i.e., are consistent with)
a given set of input constraints?" 1
1 Input constraints:
• T1 – taxonomy 1
• T2 – taxonomy 2
• A – user-asserted articulations
• C – additional 'tree' constraints
23. Alignment 1 - Perelleschus sec. WOB (1986) versus sec. FOB (2001)
T1: Perelleschus sec. 1986
• Traditional classification
• 1 genus-level concept
• 3 species-level concepts
T2: Perelleschus sec. 2001
• Phylogenetic revision
• 2 genus-level concepts
• 7 clade-level concepts
• 9 species-level concepts
25. Format for alignment input file (constraints: T1, T2, A, C)
[Figure: annotated input file. Labels: year and source; T2 and T1, each listing parent concepts with their child concepts; T2-to-T1 articulations (as provided by the user).]
26. Input visualization
Six 1 user-asserted input articulations (pink lines) are sufficient to yield a single,
well-specified alignment.
1 Actually, three (species-level) articulations are sufficient to achieve this for the 2001/1986 alignment.
27. Alignment (merge) visualization
Reasoner infers 66 additional, logically implied articulations (MIR).1
2001.Perelleschus >< 1986.Perelleschus; provenance of overlapping articulation
is explained in the merge taxonomy.
3 congruent 2001/1986 species-level concepts.
6 species-level concepts unique to FOB (2001).
6 clade-level concepts unique to FOB (2001).
2001.PER & 2001.PHY in overlap with 1986.PER.
1 MIR = Maximally Informative Relations (among paired concepts of T1, T2).
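The figure of 66 inferred articulations can be checked against the concept counts given for T1 and T2. This is a back-of-the-envelope sketch, assuming the MIR set holds exactly one relation for every T1/T2 concept pair; the variable names are illustrative.

```python
# Concept counts as stated for the two input taxonomies.
t1_concepts = 1 + 3        # 1986: 1 genus-level + 3 species-level
t2_concepts = 2 + 7 + 9    # 2001: 2 genus-level + 7 clade-level + 9 species-level

# Assumption: one maximally informative relation per T1/T2 concept pair.
mir_total = t1_concepts * t2_concepts

user_asserted = 6          # input articulations provided by the user
inferred = mir_total - user_asserted

if __name__ == "__main__":
    print(mir_total, inferred)
```

Under that assumption the pair count is 4 x 18 = 72, and subtracting the six asserted articulations leaves the 66 inferred ones reported above.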
32. Alignment (merge) visualization
We can 'zoom in' on the overlap
and resolve the resulting subregions
in the "merge concept view".
33. Merge concept view (in part)
"2001.PER and 1986.PER share a region (2001.PER * 1986.PER) constituted (at lower
levels) by 2001/1986.P_rectirostris; this latter region is that which is entailed in
1986.PER and excluded from 2001.PHY (1986.PER \ 2001.PHY)."
[Legend: 2001 concepts; 2001/1986 concepts]
34. Merge concept view (in part)
"2001.PHYsubcin/1986.Psubcin differentially 'participates' in 2001.PHY and
1986.PER, but not in 2001.PER (or any of its children)."
[Legend: 2001 concepts; 2001/1986 concepts]
35. Alignment 2 - Perelleschus sec. FOB (2001) versus sec. F (2006)
T1: Perelleschus sec. 2001
• Phylogenetic revision
• 8 ingroup species concepts
• 2 outgroup concepts
• 18 concepts total
T2: Perelleschus sec. 2006
• Exemplar analysis
• 2 ingroup species concepts
• 1 outgroup concept
• 7 concepts total
37. Logic representation challenge:
Perelleschus sec. 2001 & 2006 concepts
have incongruent sets of subordinate members,
yet each concept has congruent synapomorphies.
38. Definitional preliminaries, 4 1
Ostensive alignment: the congruence among higher-level
concepts is assessed in relation to their entailed members.
→ Ostension: giving meaning through an act of pointing out.
Intensional alignment: the congruence among higher-level
concepts is assessed in relation to their properties.
→ Intension: giving meaning through the specification of properties.
1 See Bird & Tobin. 2012. Natural Kinds. URL: http://plato.stanford.edu/archives/win2012/entries/natural-kinds/
40. Ostensive alignment – members are all that counts
Challenge 1: Ostensive alignment
Solution: 11 ingroup concept articulations
are coded ostensively – either as
<, ><, or | – to represent non-congruence
in the representation
of child concepts
Result: 2006.PER < 2001.PER
2006.PER | 2001.[5 species concepts]
etc.
[Figure: input constraints and ostensive alignment, 2001 & 2006; 5 x |, 2 x ><]
44. Intensional alignment – representation of congruent synapomorphies
Challenge 2: Intensional alignment
Solution: An Implied Child (_IC) concept is
added to the undersampled (2006)
clade concept; and the (5) "missing"
species-level concepts are included
within this Implied Child.
11 ingroup concept articulations are
coded intensionally – as == or > –
to reflect congruent synapomorphies
(chars. 11, 17) of 2001 & 2006.
Result: The genus- and ingroup clade-level
concepts are inferred as congruent:
2006.PER == 2001.PER
2006.PcarPeve == 2001.PcarPsul
etc.
[Figure: input constraints and intensional alignment, 2001 & 2006, with characters "11" and "17" marked]
47. Review – representing ostensive versus intensional alignments
Ostensive alignment
2001.PER includes more
species-level concepts
than 2006.PER [>].
Intensional alignment
2006.PER reconfirms the
synapomorphies inferred
in 2001.PER [==].
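The contrast between the two readings can be sketched with sets. In this illustration the member labels (`sp1` to `sp7`) are placeholders rather than actual species concepts, and the helper `rcc5` is hypothetical. Only the pattern follows the slides: the 2006 analysis samples a subset of the 2001 members, while both sources recover the same genus-level synapomorphy (character 17:1).

```python
def rcc5(a, b):
    """RCC-5 relation between two non-empty sets."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "|"


# Placeholder member labels (assumption): 2006 samples 2 of the 2001 members.
members_2001 = {"sp1", "sp2", "sp3", "sp4", "sp5", "sp6", "sp7"}
members_2006 = {"sp1", "sp2"}

# Properties: the genus-level synapomorphy recovered by both analyses.
synapomorphies_2001 = {"17:1"}
synapomorphies_2006 = {"17:1"}

# Ostensive reading: compare the entailed members.
ostensive = rcc5(members_2006, members_2001)

# Intensional reading: compare the diagnostic properties.
intensional = rcc5(synapomorphies_2006, synapomorphies_2001)

# Implied Child: place the unsampled members under the 2006 concept,
# so that a member-based comparison also yields congruence.
implied_child = members_2001 - members_2006
with_implied_child = rcc5(members_2006 | implied_child, members_2001)

if __name__ == "__main__":
    print(ostensive, intensional, with_implied_child)
```

The Implied Child step is what lets a member-based reasoner reproduce the intensional result: once the unsampled members are assigned to the 2006 concept, its extent matches that of the 2001 concept.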
50. Use case: Alternative phylogenetic schemes of higher-level weevils
T1: Curculionoidea sec. Kuschel (1995)
• Cladistic analysis
• 41 concepts
T2: Curculionoidea sec. Marvaldi & Morrone (2000)
• Cladistic analysis
• 25 concepts
52. Alignment: Curculionoidea sec. K (1995) versus sec. MM (2000)
Initial visual impression: Lots of green rectangles, yellow octagons, and overlap (><).
Much taxonomic concept incongruence.
53. Use case: Dwarf lemurs sec. 1993 & 2005 1
Chirogaleus furcifer sec. Mühel (1890) – Brehms Tierleben.
Public Domain: http://books.google.com/books?id=sDgQAQAAMAAJ
1 Franz et al. 2014. Taxonomic provenance: Two influential primate classifications logically aligned. (in preparation)
54. The 2nd & 3rd Editions of the Mammal Species of the World
Primates sec. Groves (1993)
→ 317 taxonomic concepts,
233 at the species level.
Primates sec. Groves (2005)
→ 483 taxonomic concepts,
376 at the species level.
Δ = 143 species-level concepts
55. Alignment of Primates sec. Groves 1993 / 2005
Primates: 800 concepts
402 articulations
153,111 MIR
→ ~380x information gain!
[Figure labels: Strepsirrhini sec. MSW3; Haplorrhini sec. MSW3; Catarrhini sec. MSW3]
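The figures on this slide and the previous one can be reproduced from the stated concept counts. This is a small arithmetic sketch, assuming the MIR count is the number of 1993/2005 concept pairs and that "information gain" means MIR divided by input articulations; the variable names are illustrative.

```python
concepts_1993 = 317
concepts_2005 = 483
species_1993 = 233
species_2005 = 376
input_articulations = 402

total_concepts = concepts_1993 + concepts_2005      # concepts in both editions
species_delta = species_2005 - species_1993         # added species-level concepts
mir = concepts_1993 * concepts_2005                 # one relation per concept pair
gain = mir / input_articulations                    # relations per input articulation

if __name__ == "__main__":
    print(total_concepts, species_delta, mir, round(gain, 1))
```

Under these assumptions the script recovers the 800 concepts, the difference of 143 species-level concepts, and the 153,111 MIR, with a ratio a little above 380.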
56. Taxonomic provenance → quantify name/meaning dissociation
'Dissociation' means that either non-identical names are paired with congruent concepts,
or that identical names are paired with incongruent concepts.
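The definition above can be turned into a simple tally. This is a minimal sketch, assuming each aligned pair is given as (name in T1, name in T2, RCC-5 relation); the function names and category labels are hypothetical. Only the first example pair (the overlapping Perelleschus concepts) comes from the slides, and the other names are placeholders.

```python
from collections import Counter


def classify(name1, name2, relation):
    """Classify one aligned concept pair by name/meaning agreement."""
    same_name = name1 == name2
    congruent = relation == "=="
    if same_name and congruent:
        return "reliable"        # identical names, congruent concepts
    if same_name and not congruent:
        return "unreliable"      # identical names, incongruent concepts
    if congruent:
        return "unreliable"      # non-identical names, congruent concepts
    return "no dissociation"     # different names, different meanings


def tally(pairs):
    """Count pairs per category."""
    return Counter(classify(*pair) for pair in pairs)


pairs = [
    ("Perelleschus", "Perelleschus", "><"),   # from the 2001/1986 alignment
    ("Name A", "Name A", "=="),               # placeholder
    ("Name B", "Name C", "=="),               # placeholder
    ("Name D", "Name E", "|"),                # placeholder
]
counts = tally(pairs)

if __name__ == "__main__":
    print(dict(counts))
```

Counting only congruence as agreement in meaning is a design choice of this sketch; a finer-grained tally could report each RCC-5 relation separately.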
"Reliable names" "Unreliable names"
57. In summary (1) – What this approach can provide:
So, given an input set of [T1, T2, A, C], one gains:
(1) Logical consistency in the alignment;
(2) Intended degree of alignment resolution;
(3) Additional, logically implied articulations;
(4) Visualizations of taxonomic provenance;
(5) Quantifications of name/meaning relations.
58. In summary (2) – Representation and reasoning abilities
• Compatibility with contemporary Linnaean nomenclature (and PhyloCode too);
• Integration of many-to-many name/circumscription relationships across taxonomies;
• Reconciliation of traditional classifications with fully bifurcated phylogenies;
• Representation of monotypic concept lineages with congruent taxonomic extensions;
• Accounting for insufficiently specified higher-level entities:
  • Undersampled outgroup entities;
  • Differentially sampled ingroup entities;
• Resolution of taxonomically overlapping entities and merge concepts;
• Differentiation of ostensive versus intensional readings of concept articulations;
• Representation of topologically localized resolution versus ambiguity in alignments.
• Next critical step(s): accessible, scalable, usable, integrated web instance of Euler/X
61. In summary (3) – Take-home message
We can explain (much of)
taxonomy's legacy to computers (e.g.)
for superior name/meaning resolution.
Well, then, should we?
And at what cost?
65. Select references on concept taxonomy and the Euler/X toolkit
• Franz et al. 2008. On the use of taxonomic concepts in support of biodiversity
research and taxonomy. In: The New Taxonomy; pp. 63–86.
• Franz & Peet. 2009. Towards a language for mapping relationships among
taxonomic concepts. Systematics and Biodiversity 7: 5–20.
• Franz & Thau. 2010. Biological taxonomy and ontology development: Scope and
limitations. Biodiversity Informatics 7: 45–66.
• Chen et al. 2014. Euler/X: a toolkit for logic-based taxonomy integration. WFLP
2013 – 22nd International Workshop on Functional and (Constraint) Logic
Programming.
• Chen et al. 2014. A hybrid diagnosis approach combining Black-Box and White-Box
reasoning. Lecture Notes in Computer Science 8620: 127–141.
• Franz et al. 2014. Names are not good enough: Reasoning over taxonomic change in
the Andropogon complex. Semantic Web – Interoperability, Usability, Applicability –
Special Issue on Semantics for Biodiversity. (in press)
• Franz et al. 2014. Reasoning over taxonomic change: Exploring alignments for the
Perelleschus use case. PLoS ONE. (in press)
• Franz et al. 2015. Taxonomic provenance: Two influential primate classifications
logically aligned. (in preparation)
72. R32 lattice of RCC-5 articulations (lighter color = less certainty)
73. The other piece in the puzzle: Concept-to-voucher identifications
Source: Baskauf & Webb. 2014. Darwin-SW. URL: http://www.semantic-web-journal.net/system/files/swj635.pdf