Presentation on reconciling taxonomic concepts using the Euler approach, given at the 2012 Annual Meeting of the Entomological Society of America, Knoxville, TN.
Franz et al. 2012. Reconciling Succeeding Classifications, ESA 2012
1. Reconciling succeeding taxonomic classifications
Nico M. Franz
School of Life Sciences, Arizona State University
Mingmin Chen, Shizhuo Yu, Bertram Ludäscher *
Department of Computer Science, University of California at Davis
ESA Annual Meeting 2012
November 14, 2012 – Knoxville, TN
* PI – NSF-IIS 1118088: A logic-based, provenance-aware system for merging scientific data under context and classification constraints.
2. Challenge – describing classification provenance beyond synonymy
Andropogon spp. in the Carolinas, from Hackel 1889 to Weakley 2005
Source: Weakley. 2005. Flora of the Carolinas, Virginia, and Georgia. Available at http://www.herbarium.unc.edu/flora.htm
Individual columns represent past classifications of Andropogon.
Individual rows represent equivalent taxonomic entities (almost)
regardless of their name labels.
Name/synonymy relationships are not sufficiently granular to
capture this evolution of taxonomic views of Andropogon species.
6. Tracking classification provenance with concepts and articulations
Definition: A taxonomic concept is the underlying meaning of a scientific name as stated
by a particular author and publication. It represents the author's full-blown
view of how the name reaches out to un-/observed objects in nature.
Labeling: The abbreviation sec. for the Latin secundum, or "according to", is preceded by
the full Linnaean name and followed by the specific author and publication.
Source: Berendsohn. 1995. The concept of "potential taxa" in databases. Taxon 44: 207–212.
Examples: Andropogon virginicus L. sec. Radford et al. (1968) [earlier, wider concept]
Andropogon virginicus L. sec. Weakley (2005) [later, narrower concept]
Utility: Representing multiple classifications (revisions) through concepts makes it possible
to track their similarities and differences through articulations.
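The sec. convention can be mirrored in a data model. Here is a minimal sketch that assumes a concept is identified by its name string plus its "according to" reference. The class `TaxonConcept` and its fields are hypothetical and not taken from any existing tool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """One taxonomic concept: a scientific name as used by one author and publication."""

    name: str          # full Linnaean name with name author, e.g. "Andropogon virginicus L."
    according_to: str  # author and publication the concept is "secundum"

    def label(self) -> str:
        # Label convention: <full name> sec. <author and publication>
        return f"{self.name} sec. {self.according_to}"


radford = TaxonConcept("Andropogon virginicus L.", "Radford et al. (1968)")
weakley = TaxonConcept("Andropogon virginicus L.", "Weakley (2005)")

print(radford.label())  # Andropogon virginicus L. sec. Radford et al. (1968)
print(weakley.label())  # Andropogon virginicus L. sec. Weakley (2005)
# Same name, different concepts: dataclass equality compares both fields.
print(radford.name == weakley.name, radford == weakley)  # True False
```

Because the two objects share a name but differ as concepts, each can take part in its own articulations.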
8. Five basic articulations between two concepts C1, C2 (set theory)
equivalence
proper inclusion
inverse proper inclusion
overlap
exclusion
Use of "OR" to express uncertainty.
Example: C1 == OR > C2 (C1 is either equivalent to C2 or properly includes it)
Source: Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Syst. Biodiv. 7: 5–20.
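If each concept is treated as a finite set of the objects it refers to, the five articulations can be read directly off set relations. The sketch below works under that assumption. The specimen IDs are made up. Only `==`, `>` and `><` appear in these slides, so `<` (inverse proper inclusion) and `!` (exclusion) are assumed notation. An "OR" articulation is represented as a set of allowed relations:

```python
def articulation(c1: set, c2: set) -> str:
    """Return the one RCC-5 articulation that holds between two non-empty extensions."""
    if not c1 or not c2:
        raise ValueError("concept extensions must be non-empty")
    if c1 == c2:
        return "=="  # equivalence
    if c1 > c2:
        return ">"   # proper inclusion: C1 properly includes C2
    if c1 < c2:
        return "<"   # inverse proper inclusion (assumed symbol)
    if c1 & c2:
        return "><"  # overlap: shared and unshared members on both sides
    return "!"       # exclusion (assumed symbol): no shared members


def holds(c1: set, c2: set, disjunction: set) -> bool:
    """An 'OR' articulation such as C1 == OR > C2 holds if any of its parts holds."""
    return articulation(c1, c2) in disjunction


# Hypothetical extensions: specimen IDs each concept is taken to refer to.
wide = {"s1", "s2", "s3", "s4"}
narrow = {"s1", "s2"}
print(articulation(wide, narrow))        # >
print(holds(wide, narrow, {"==", ">"}))  # True
```

Exactly one of the five base relations holds for any two non-empty sets. That property is what makes the disjunctive "OR" form a clean way to record uncertainty.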
9. How does it work? Connecting Hackel 1889 and Small 1933
Step 1: Transcribe two concept hierarchies and add unique IDs
Hackel 1889 (1-12)
Small 1933 (13-16)
10. How does it work? Connecting Hackel 1889 and Small 1933
Step 2: Create a table with all concept labels
Hackel 1889 (1-12)
Small 1933 (13-16)
11. How does it work? Connecting Hackel 1889 and Small 1933
Step 3: Create a table with corresponding parent/child relationships ('is_a')
Hackel 1889 (1-12)
Small 1933 (13-16)
12. How does it work? Connecting Hackel 1889 and Small 1933
Step 4: Create a table with a suitable set of articulations
Hackel 1889 (1-12)
Small 1933 (13-16)
Translation
Congruence
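Steps 2 to 4 amount to three small tables. The sketch below shows one way to hold them in memory, with a basic sanity check of the rows. All concept labels and IDs are hypothetical placeholders, since the slides do not list the Hackel and Small concepts. `check_tables` is an illustrative helper, not part of CleanTax:

```python
# Step 2: concept table, id -> (taxonomy, label). Labels are hypothetical placeholders.
concepts = {
    1: ("T1", "Genus sec. T1"),
    2: ("T1", "Species_a sec. T1"),
    3: ("T1", "Species_b sec. T1"),
    13: ("T2", "Genus sec. T2"),
    14: ("T2", "Species_ab sec. T2"),
}

# Step 3: parent/child ('is_a') table as (child_id, parent_id) rows.
is_a = [(2, 1), (3, 1), (14, 13)]

# Step 4: articulation table as (left_id, relations, right_id) rows;
# a set with more than one relation encodes an 'OR' articulation.
articulations = [
    (1, {"=="}, 13),
    (14, {">"}, 2),
    (14, {">"}, 3),
]


def check_tables(concepts, is_a, articulations):
    """Return a list of problems found in the three input tables (empty if none)."""
    problems = []
    for child, parent in is_a:
        if child not in concepts or parent not in concepts:
            problems.append(f"is_a {child}->{parent}: unknown concept ID")
        elif concepts[child][0] != concepts[parent][0]:
            problems.append(f"is_a {child}->{parent}: crosses taxonomies")
    for left, relations, right in articulations:
        if left not in concepts or right not in concepts:
            problems.append(f"articulation {left}-{right}: unknown concept ID")
        elif concepts[left][0] == concepts[right][0]:
            problems.append(f"articulation {left}-{right}: both concepts in one taxonomy")
        elif not relations:
            problems.append(f"articulation {left}-{right}: empty relation set")
    return problems


print(check_tables(concepts, is_a, articulations))  # []
```

These checks only catch structural slips in the tables. Logical consistency of the articulations is a separate question, left to a reasoner.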
15. Technical challenges to creating articulations
Input of concept hierarchies
Lack of a server-based platform (e.g. Global Names Architecture)
Lack of user-friendly classification input / visualization tools
Input of articulations (goal: achieve a complete and consistent mapping)
Taxonomic experts will not input ∞ articulations
Taxonomic experts will miss relevant articulations ("mir")
Taxonomic experts could be uncertain of articulations ("possible worlds")
Taxonomic experts could posit logically inconsistent articulations
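Consistency checking and "possible worlds" can be illustrated by brute force, without a logic reasoner. Every Euler region (a combination of concepts a point may lie inside) is either occupied or empty, and each choice yields one RCC-5 relation per concept pair. This technique is swapped in for illustration only and is not how CleanTax works, since CleanTax delegates to external reasoners. The sketch also ignores taxonomy constraints such as parent/child inclusion or sibling disjointness unless they are supplied as articulations. The `!` exclusion symbol is assumed:

```python
from itertools import combinations, product


def relation(x: frozenset, y: frozenset) -> str:
    """RCC-5 relation between two non-empty sets ('!' for exclusion is an assumed symbol)."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "!"


def possible_worlds(concepts, articulations):
    """Enumerate the distinct RCC-5 configurations consistent with the articulations.

    Each Euler region is a bitmask saying which concepts it lies inside; a
    candidate world is any choice of occupied regions. Brute force, so only
    practical for about four concepts (2**15 candidate worlds).
    """
    n = len(concepts)
    index = {c: i for i, c in enumerate(concepts)}
    regions = list(range(1, 2 ** n))
    pairs = list(combinations(range(n), 2))
    worlds = set()
    for keep in product((False, True), repeat=len(regions)):
        present = [r for r, k in zip(regions, keep) if k]
        ext = [frozenset(r for r in present if r >> i & 1) for i in range(n)]
        if not all(ext):  # every concept must have a non-empty extension
            continue
        if all(relation(ext[index[a]], ext[index[b]]) in rels for a, rels, b in articulations):
            worlds.add(tuple(relation(ext[i], ext[j]) for i, j in pairs))
    return [
        {(concepts[i], concepts[j]): rel for (i, j), rel in zip(pairs, world)}
        for world in sorted(worlds)
    ]


# One uncertain ('OR') articulation -> two possible worlds.
worlds = possible_worlds(["A", "B", "C"], [("A", {">"}, "B"), ("B", {">", "=="}, "C")])
print(len(worlds))  # 2
# A cyclic chain of proper inclusions is logically inconsistent -> no world at all.
print(possible_worlds(["A", "B", "C"],
                      [("A", {">"}, "B"), ("B", {">"}, "C"), ("C", {">"}, "A")]))  # []
```

Each returned dict gives one relation per pair, including relations the expert never asserted (here A > C). An empty result signals an inconsistent set of articulations.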
"CleanTax" is being developed to explore solutions to these challenges.[1]
[1] There is continuation/overlap with the "Exploring Taxonomic Concepts" project, which focuses on character matching (DBI-1147266).
18. CleanTax – technical specifications
CleanTax = a set of Python programming scripts stored on bitbucket.org
(initially developed by Dave Thau; now being developed further on many fronts)
CleanTax reads in concept/articulation tables from a PostgreSQL database
CleanTax transforms the input for processing by logic reasoners, including:
Prover9 theorem prover / Mace4 model finder – first-order logic [thorough, yet slow]
OWL / HermiT – description logic, knowledge representation [complex]
DLV System – propositional logic, answer set programming [promising!]
CleanTax assesses consistency and completeness of articulations
Output of the set of maximally informative relationships – "mir"
Report, causal explanation, and interactive repair of inconsistent articulations
Calculate multiple possible worlds (if ambiguous articulations are present)
CleanTax creates multiple user-preferred views of the input and merge taxonomies
Reduced Containment Graph – RCG; and Directed Acyclic Graph – DAG
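A reduced containment graph can be approximated from one possible world's relations. Merge equivalent concepts into shared nodes, keep proper inclusions as child-to-parent edges, and drop every edge implied by transitivity. The sketch below makes those assumptions, and the function name and input format are hypothetical. Overlaps get no edge, consistent with the later note that the RCG does not depict overlap:

```python
def reduced_containment_graph(concepts, mir):
    """Hasse-style RCG for one possible world.

    mir maps each unordered concept pair (x, y) to one RCC-5 relation.
    Equivalent concepts are merged into one shared node; overlaps ('><')
    and exclusions ('!') produce no edge.
    """
    # 1. Merge '==' concepts with a small union-find.
    root = {c: c for c in concepts}

    def find(c):
        while root[c] != c:
            root[c] = root[root[c]]  # path halving
            c = root[c]
        return c

    for (x, y), rel in mir.items():
        if rel == "==":
            root[find(x)] = find(y)
    members = {}
    for c in concepts:
        members.setdefault(find(c), []).append(c)
    node = {c: ".".join(members[find(c)]) for c in concepts}

    # 2. Proper-inclusion edges, oriented child -> parent.
    edges = set()
    for (x, y), rel in mir.items():
        if rel == "<":
            edges.add((node[x], node[y]))
        elif rel == ">":
            edges.add((node[y], node[x]))

    # 3. Transitive reduction: drop u -> w whenever some v gives u -> v -> w.
    nodes = set(node.values())
    return sorted(
        (u, w) for u, w in edges
        if not any((u, v) in edges and (v, w) in edges for v in nodes)
    )


mir = {
    ("A1", "B1"): ">", ("A1", "A2"): "==", ("A1", "C2"): ">",
    ("B1", "A2"): "<", ("B1", "C2"): ">", ("A2", "C2"): ">",
}
print(reduced_containment_graph(["A1", "B1", "A2", "C2"], mir))
# [('B1', 'A1.A2'), ('C2', 'B1')]
```

The edge C2 -> A1.A2 is dropped because it follows from C2 -> B1 -> A1.A2. A DAG view would need extra nodes to show overlaps.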
22. 'Training' CleanTax on abstract examples
Input
Output – raw HTML list of articulations ("look-up" + inferred)
Output – 72 maximally informative relationships = mir
Based on the mir, all theoretically possible articulations
of the R32 lattice can be logically deduced.
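One reading of "R32" is the 2^5 = 32 disjunctions (subsets) of the five base articulations. Under that reading, a mir relation r for a pair entails every disjunction that contains r. The sketch below follows this reading, which is an assumption, and the relation symbols `<` and `!` are also assumed. With several possible worlds, a disjunction is entailed only if it contains the pair's relation in every world:

```python
from itertools import combinations

RCC5 = ("==", ">", "<", "><", "!")  # '<' and '!' notation assumed

# Every subset of the five base relations is one disjunctive articulation:
# 2**5 = 32 in total, the empty set standing for a contradiction.
R32 = [frozenset(s) for k in range(len(RCC5) + 1) for s in combinations(RCC5, k)]


def entailed(mir_relation: str) -> list:
    """Disjunctive articulations implied by one mir relation: all sets that contain it."""
    return [d for d in R32 if mir_relation in d]


print(len(R32))                                 # 32
print(len(entailed(">")))                       # 16
print(frozenset({"==", ">"}) in entailed(">"))  # True
```

Each mir relation therefore stands in for 16 weaker statements about the same pair. This is why the mir is the compact output to report.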
24. Abstract Example 1 – Reduced Containment Graph of the merge
Input
Blue circles: shared concepts
Black circles: unique concepts
Black solid arrows: expert input
Grey dashed arrows: deducible
Red solid arrows: newly inferred
25. More CleanTax training… our infamous Abstract Example 4
Example 4 – representing multiple 'possible worlds'
3 of the 5 articulations are disjunctive (OR)
26. Reduced Containment Graphs of 7 'possible worlds' (combined ORs)
Example 4 – CleanTax infers 7 possible worlds (user can view / select / repair / rerun)
Asserted by expert
Implied articulations
Inferred by CleanTax
Shared concepts
Unique concepts
Reduced Containment Graphs (RCGs)
27. Exploring "views" of the merge – circular Euler diagrams of PW1
Table of mir
Corresponding Euler diagram (circular)
Identical information content
28. Correspondence of circular Euler diagrams and Directed Acyclic Graphs (DAGs)
PW1: Typical Euler circles
Euler-DAG of PW1
Identical information content
30. Real-life examples, I – reconciling two weevil classifications[1]
Curculionoidea sec. Kuschel 1995 (concepts 117-157)
Curculionoidea sec. Marvaldi & Morrone 2000 (concepts 348-372)
[1] Initial articulations provided by NMF.
31. Merge taxonomy of Kuschel 1995 / Marvaldi & Morrone 2000
CleanTax RCG – 1 newly inferred articulation + several inconsistencies
Microcerinae sec. M&M 2000 [363] are included in Brachycerinae sec. KU 1995 [148]
(yes, I missed that; Kuschel 1995 only mentions it in the text, not in the main taxon list)
32. Real-life examples, II – reconciling two weevil classifications
Curculionoidea sec. Crowson 1981 (concepts 1-17)
Curculionoidea sec. Marvaldi & Morrone 2000 (concepts 348-372)
33. Merge taxonomy of Crowson 1981 / Marvaldi & Morrone 2000
CleanTax RCG – 4 newly inferred articulations; the RCG does not depict overlap (><)
e.g. {Aglycyderidae [2], Allocorynidae [3], Oxycorynidae [17]} sec. Crowson 1981
are included in Belidae [353] sec. M&M 2000
34. Euler-DAG of the Crowson / Marvaldi & Morrone merge taxonomy
Solid lines – proper inclusion
Black solid line given
Green solid line inferred
Orange solid line explanatory
[Red solid line inconsistent]
Dashed lines - overlap
Black dashed line given
Green dashed line inferred
Orange dashed line explanatory
Red dashed line inconsistent
Concept boxes - concepts
Orange square box shared
Black square box unique
Dashed square box combined
Dashed oval box inconsistent
35. DAGs generate "combined concepts"
Belidae sec. MM2000
Belidae sec. Cro1981
Intersections of overlaps: "Belidae" INT(Cro/MM)
Shared – [2,3,17,357]
36. New naming/viewing conventions – simple merges (shared, unique) *
Input
Concept A: Attelabidae CR81 (AttCR81 [9])
Concept B: Attelabidae MM00 (AttMM00 [55])
Output
Concept A – Concept B: AB
Attelabidae CR81 – Attelabidae MM00: AttCR81.AttMM00
* Simple extension to three or more congruent concepts.
37. New naming/viewing conventions – combined merges (overlap; T1, T2)
Input
Concept A: Belidae CR81 (BelCR81 [10])
Concept B: Belidae MM00 (BelMM00 [353])
Euler
Ab: BelCR81.belMM00
AB: BelCR81.BelMM00
aB: BelMM00.belCR81
DAG
Ab, AB, aB
38. New naming/viewing conventions – combined merges (T1, T2, T3)
Input
Concept A: Curculionidae CR81 (CurCR81)
Concept B: Curculionidae KU95 (CurKU95)
Concept C: Curculionidae s.s. MM00 (CurMM00)
Euler
ABc: CurCR81.CurKU95.curMM00
Abc: CurCR81.curKU95.curMM00
aBc: CurKU95.curCR81.curMM00
ABC: CurCR81.CurKU95.CurMM00
AbC: CurCR81.CurMM00.curKU95
aBC: CurKU95.CurMM00.curCR81
abC: CurMM00.curCR81.curKU95
DAG
A, B, C
Abc, ABc, aBc, AbC, ABC, aBC, abC
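The naming convention on slides 36 to 38 is regular enough to generate. The sketch below assumes the rule visible in the examples. Upper case means inside a concept and lower case means outside. Dotted labels list the inside concepts first, then the outside ones with a lower-cased initial. It lists every candidate region; which regions actually exist depends on the articulations of a given merge. The function name is hypothetical:

```python
from itertools import product


def region_names(concepts):
    """List (letter code, dotted label) for every candidate Euler region.

    concepts: (letter, abbreviation) pairs, e.g. [("A", "BelCR81"), ("B", "BelMM00")].
    Upper case = inside that concept, lower case = outside it.
    """
    names = []
    for inside in product((True, False), repeat=len(concepts)):
        if not any(inside):
            continue  # the region outside every concept is not itself a concept
        code = "".join(
            letter if i else letter.lower() for (letter, _), i in zip(concepts, inside)
        )
        ins = [abbr for (_, abbr), i in zip(concepts, inside) if i]
        outs = [abbr[0].lower() + abbr[1:] for (_, abbr), i in zip(concepts, inside) if not i]
        names.append((code, ".".join(ins + outs)))
    return names


for code, label in region_names([("A", "BelCR81"), ("B", "BelMM00")]):
    print(code, label)
# AB BelCR81.BelMM00
# Ab BelCR81.belMM00
# aB BelMM00.belCR81
```

With the three Curculionidae concepts, the same function yields the seven labels listed above. In a congruent merge only the all-inside region survives, giving a label such as AttCR81.AttMM00.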
40. Current workflow / "usability" (CleanTax on "Lore" server, UC Davis)
Input script
Possible worlds
Visualization
Euler-DAG
Output file
Inconsistency
Repair, explanation
Interactive reduction of PWs (decision tree)
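The slides do not describe how the decision tree picks its questions. One plausible greedy strategy, offered purely as a hypothetical sketch, asks the expert about the concept pair whose answer eliminates the most possible worlds in the worst case, then filters. Worlds are dicts of pair relations, as a reasoner might report them. `best_question` and `answer` are illustrative names:

```python
from collections import Counter


def best_question(worlds):
    """Pick the concept pair whose answer is guaranteed to eliminate the most worlds.

    worlds: list of dicts {(x, y): RCC-5 relation}, all over the same pairs.
    Returns None when the worlds agree on every pair (nothing left to ask).
    """
    best, best_score = None, 0
    for pair in worlds[0]:
        counts = Counter(world[pair] for world in worlds)
        # Worst case: the expert confirms the most common relation for this pair.
        score = len(worlds) - max(counts.values())
        if score > best_score:
            best, best_score = pair, score
    return best


def answer(worlds, pair, relation):
    """Keep only the worlds that agree with the expert's answer for one pair."""
    return [world for world in worlds if world[pair] == relation]


worlds = [
    {("A", "B"): ">", ("B", "C"): "=="},
    {("A", "B"): ">", ("B", "C"): ">"},
    {("A", "B"): "><", ("B", "C"): ">"},
]
question = best_question(worlds)        # ('A', 'B')
worlds = answer(worlds, question, ">")  # two worlds left
print(question, len(worlds), best_question(worlds))  # ('A', 'B') 2 ('B', 'C')
```

Repeating question and answer until `best_question` returns None traces one path through such a decision tree.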
41. Shared, real use cases (Perelleschus) with ETC feature-based project
5 taxonomies, 48 concepts, expert articulations, plus textual feature diagnoses
42. Conclusions and outlook
Improvements to CleanTax will remove many of the technical challenges towards a
full-blown taxon concept approach (and with it, improved tracking of classification provenance).
Other technical challenges are being addressed (server platform, algorithmic
scalability, intensional/ostensive articulations, visualization [Euler, combined
concepts], workflow integration).
Many non-technical challenges remain (in short: transparent/consistent use).
The current approach treats concepts as a 'black box' – the input data are simple and
make no reference to type specimens, synapomorphies, diagnostic features, etc.
The "Exploring Taxonomic Concepts" project will develop tools for a balanced view.
Nevertheless, the articulations can expose deep and varied semantic links among
succeeding classifications.
CleanTax may be the first attempt to 'explain' classification provenance to logic
reasoners. This could have considerable implications for future data integration.
45. Acknowledgments
Shawn Bowers, Dave Thau, Alan Weakley
NSF-IIS 1118088:
"III-SMALL: A logic-based, provenance-aware system for merging scientific data under
context and classification constraints"
"Euler" team, UC Davis