Integrating data with phylogenies, at scale
Nico Cellinese
University of Florida
&
Hilmar Lapp
Duke University
WHAT’S	IN	A	NAME?
What’s in a name?
Chaos!
•  Names and concepts do not reconcile that easily
•  Names are text strings
•  Context is lacking or subjective
•  Meaning is not computable
Linnean	names	point	to	concepts		
	
Antoine	Laurent	de	Jussieu	
Genera	Plantarum,	1789
I don’t understand any of those concepts, whether in Latin or English, but I can still link them to their names, as in one object to one object
…and	200+	
…and	400+
Idiosyncratic Russian dolls syndrome
From a human perspective, we lose track of concepts. Hard to reconcile all of them. We need help! Can we compute them?
•  We can unclutter concepts, and thereby nomenclature
•  How do we navigate along the Tree of Life repurposing Linnean names, which are linked to traditional concepts?
Dark	taxa!	
How	do	we	integrate	data	with	this	tree?
Tree-thinking
Common descent → evolution at the center of taxonomy
[Figure: tree with tips A, B, C, D, annotated with branches, synapomorphies, and clades = taxa]
Discovery
Communication: How??
[Figure: plot with axes labeled "Density" and "Diversification rate"]
Tree-thinking	
Berberidopsidaceae	
Opiliones	
Zingiberaceae	
Hamamelidaceae	
Sarcolaenaceae	
Lingulidae	
Hymenoptera	
Mammalia	
Apocynaceae	
Galliformes	
Rubiaceae 	
Anarthriaceae	
Lineidae	
Crocodylidae	
Stylosiphonia
Andrenidae Cracidae
Gavialis
Globba
Micrella
Rhodoleia
Phalangiidae Tachyglossa
Lyginia
Mediusella
Chamaeclitandra
These names are not generated in an evolutionary-based framework
(Groups defined by character similarity vs. common descent)
Both the Encyclopedia of Life (EOL) and the Open Tree of Life suggest that Campanuloideae is a misspelling of Campaniloidea (marine gastropods!).
GBIF does not currently have Campanuloideae in its backbone taxonomy.
Are	you	kidding	me?	
These	are	the	Campanuloideae!	
Wang	et	al.	2014
Life as a street map
How to navigate life as a machine
Mapping data to phylogenetic knowledge space
Street	signs	serve	people,	not	machines
•  How	do	we	build	a	reliable	GPS	for	phylogenies?	
•  How	do	we	reproducibly	find	the	right	nodes?	
	
FEED
Textual Definition –
The hyoglossus is a muscle that attaches to
the hyoid and tongue and is innervated by
Cranial Nerve XII.
Computable Definition –
('attached to' some 'hyoid bone')
and ('attached to' some tongue)
and ('innervated by' some 'hypoglossal nerve')
and spatially disjoint with 'intrinsic tongue muscle'
Druzinsky et al. (2015): Logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals
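As a rough illustration of what "computable" buys us, the definition above can be read as a membership test. The sketch below is a plain-Python approximation, not the OWL semantics of the original definition: the record layout, the field names, and both sample records are hypothetical, the "spatially disjoint" condition is reduced to a boolean flag, and a closed-world lookup stands in for what a reasoner would infer.

```python
# Hypothetical records: field names and sample data are illustrative assumptions,
# not taken from FEED or from Druzinsky et al. (2015).
def is_hyoglossus(muscle):
    """Closed-world approximation of the necessary and sufficient conditions."""
    return (
        "hyoid bone" in muscle["attached_to"]
        and "tongue" in muscle["attached_to"]
        and "hypoglossal nerve" in muscle["innervated_by"]
        # Stand-in for: spatially disjoint with 'intrinsic tongue muscle'
        and not muscle["is_intrinsic_tongue_muscle"]
    )

candidate = {
    "attached_to": {"hyoid bone", "tongue"},
    "innervated_by": {"hypoglossal nerve"},
    "is_intrinsic_tongue_muscle": False,
}
other = {
    "attached_to": {"mandible", "hyoid bone"},
    "innervated_by": {"trigeminal nerve"},
    "is_intrinsic_tongue_muscle": False,
}

print(is_hyoglossus(candidate))  # True
print(is_hyoglossus(other))      # False
```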
Nomenclature ≠ Semantics
Phyloreference = Logic definition of a clade, using the property common to all of life
Phyloreferences
Statements formally expressing the patterns we discover (analogous to map coordinates)
[Figure: three trees, each with tips A, B, C; X marks the apomorphy on the third]
Node-Based: The clade originating with the last common ancestor of B and C.
Branch-Based: The clade originating with the first ancestor of B that is not an ancestor of A.
Apomorphy-Based: The clade originating with the first ancestor of C to evolve X.
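The three definition types can be sketched on a toy tree. This is a minimal illustration, assuming the topology (A,(B,C)) and assuming that X arose on the branch leading to the B+C ancestor; the internal node names "BC" and "root" and all function names are made up for the example. A node is treated as part of its own lineage to keep the code short.

```python
# Toy tree (A,(B,C)) stored as child -> parent; "BC" and "root" are made-up node names.
PARENT = {"A": "root", "BC": "root", "B": "BC", "C": "BC", "root": None}

def lineage(node):
    """Nodes from `node` up to the root, starting with `node` itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def node_based(x, y):
    """Last common ancestor of x and y."""
    on_x = set(lineage(x))
    return next(n for n in lineage(y) if n in on_x)

def branch_based(internal, external):
    """Oldest node on the lineage of `internal` that is not on the lineage of `external`."""
    excluded = set(lineage(external))
    clade = None
    for n in lineage(internal):
        if n in excluded:
            break
        clade = n
    return clade

def apomorphy_based(tip, has_trait):
    """Oldest node in the unbroken run of trait-bearing nodes starting at `tip`."""
    clade = None
    for n in lineage(tip):
        if n not in has_trait:
            break
        clade = n
    return clade

# Assumption for illustration: X evolved on the branch leading to BC.
HAS_X = {"BC", "B", "C"}

print(node_based("B", "C"))         # BC
print(branch_based("B", "A"))       # BC
print(apomorphy_based("C", HAS_X))  # BC
```

On this small tree all three definitions happen to pick the same node; on a larger tree, or with a different placement of X, they can diverge.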
Phyloreferences	yield	a	
coordinate	system	for	the	Tree	of	Life	
•  Any	node,	branch,	subtree	is	referenceable	
•  References	are	unambiguous	
•  References	are	computable	
•  References	are	portable	
•  Adapts	to	new	and	changing	knowledge
Many needed technologies already exist
•  OWL ontologies designed for
–  Phylogenetic knowledge: CDAO
–  Phenotypic knowledge: Uberon, PATO, …
–  Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK
[Figure: phylogeny with tips Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; internal nodes numbered 1 to 10; labels Campanula, Lobelia, Cyphia]
Class: Campanulaceae_1889_to_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Crysanthemum
Class: Campanulaceae_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Lobelia
Class: Campanulaceae_after_1995
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
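To see how one name can resolve to three different clades, the class expressions above can be evaluated against a tree. The sketch below assumes a simplified, hypothetical topology with only six tips (it is not the tree from the slides), treats Lobelia as a single tip, and replaces OWL reasoning with a recursive search; `tips` and `resolve` are illustrative names.

```python
# Hypothetical toy topology for illustration only; NOT the tree shown on the slides.
TREE = ("Crysanthemum",
        ("Sphenoclea",
         ("Lobelia",
          ("Platycodon_grandiflorus",
           ("Campanula_latifolia", "Campanula_rotundifolia")))))

def tips(node):
    """Set of tip labels below a node (a tip is a string, a clade is a tuple)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))

def resolve(node, has_descendant, excludes_lineage):
    """Tips of the largest clade that contains has_descendant but not excludes_lineage."""
    members = tips(node)
    if has_descendant not in members:
        return None
    if excludes_lineage not in members:
        return members
    if isinstance(node, str):
        return None
    for child in node:
        found = resolve(child, has_descendant, excludes_lineage)
        if found is not None:
            return found
    return None

concepts = {
    "Campanulaceae_1889_to_1980": "Crysanthemum",
    "Campanulaceae_1980": "Lobelia",
    "Campanulaceae_after_1995": "Sphenoclea",
}
for name, excluded in concepts.items():
    clade = resolve(TREE, "Campanula_latifolia", excluded)
    print(name, sorted(clade))
```

Under these assumptions the three concepts resolve to clades of five, three, and four tips respectively, which is the point of the slides: the name stays the same while the referenced clade changes.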
Phyloreferences as ontological expressions
Phyloreference expressions can be:
•  Easily generated by anyone
•  Applied to any tree
•  Named and registered
– To promote reuse and consistency
– To improve usability and accessibility
Class: Campanulaceae
Annotations:
    rdfs:label "Campanulaceae_after_1995"
    dc:description "the clade that includes Campanula latifolia but not Sphenoclea"
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
vs.
Class: AGF4-SHRU-3560
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
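The comparison above suggests that the class name is incidental and the logical expression carries the identity. A minimal sketch of a registry built on that idea follows; the `register` function and the tuple key are assumptions made for illustration, and a real registry would presumably test logical equivalence with a reasoner rather than compare tuples.

```python
from collections import defaultdict

# Registry keyed by the logical definition, not by the class name (illustrative design).
registry = defaultdict(list)

def register(name, has_descendant, excludes_lineage):
    """Record a name under its definition and return the definition key."""
    key = (has_descendant, excludes_lineage)
    registry[key].append(name)
    return key

k1 = register("Campanulaceae", "taxon:Campanula_latifolia", "taxon:Sphenoclea")
k2 = register("AGF4-SHRU-3560", "taxon:Campanula_latifolia", "taxon:Sphenoclea")

print(k1 == k2)      # True: both names point at the same definition
print(registry[k1])  # ['Campanulaceae', 'AGF4-SHRU-3560']
```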
Challenges
•  OWL-based data model to satisfy phylogenetic taxonomy, reasoning expressivity, scalability
•  Conventions for data transformation, and consequences of different choices
•  Least common ancestor reasoning for OWL data
•  Lack of canonical specimen identifier system
•  Specifier mapping ontologies
Tree of Life, ontologized:
A universal coordinate system
•  The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge.
•  Phyloreferencing is addressing into a knowledge universe.
•  Ontologies, reasoning, and other KR techniques are powerful tools for this.
Acknowledgements
•  National Science Foundation (DBI-1458484)
•  Ken and Linda McGurn
•  Phenoscape
•  EvoIO
