The structure of insect—plant host data
as derived from museum collections:
An analysis based on data from the
NSF-funded Tritrophic Database —
Thematic Collections Network
(TTD-TCN)
Randall T. Schuh
Katja Seltmann
Christine A. Johnson
American Museum of Natural History
TTD-TCN Rationale
“The data captured via ADBC funding will
dramatically improve our understanding of the
relationships among the more than 11,000
species of North American Hemiptera (scale
insects, aphids, leafhoppers, true bugs, and
relatives), their food plants, and the wasps that
parasitize the hemipterans.”
The data we evaluate today were captured
through a Web-based application developed with
NSF Planetary Biodiversity Inventory funding and
used by the TTD-TCN. This application,
known as Arthropod Easy Capture (AEC), is
open source. It is being implemented as an
appliance by the ADBC-funded Home Uniting Biocollections (HUB, iDigBio); through that
implementation it will be installable with a
“one-click” installer. Server code is online at SourceForge:
http://sourceforge.net/projects/arthropodeasy/
Specimen Count by Project
(1,144,240)
Sources of Insect—Plant Host Data
Data on insect-plant relationships is available
primarily from labels on insect specimens—as
opposed to labels on plant specimens.
Substantial amounts of data were captured for
the family Miridae on a world basis under NSF
Planetary Biodiversity Inventory funding
between 2003 and 2011.
The TTD-TCN is a collaboration among 17 US
entomological institutions. The institutional
contributions from these two projects, as
represented by numbers of specimen records,
are seen in the following graph.
The TTD-TCN is defining the field structure for
host data as used by iDigBio and by other
Web aggregators such as DiscoverLife.org.
Choice of Groups for Analysis
In order to evaluate the nature of insect-host plant data
derived from collections, we need to look at groups
that offer large data sets. Necessary attributes are:
1. Large numbers of specimen records with host
information
2. Large numbers of collecting events
3. Substantial diversity of host taxa
At the present time the following taxa in our database
meet those criteria:
Hemiptera
Sternorrhyncha
Aphididae (4400 species worldwide)
Auchenorrhyncha
Membracidae (3200 species worldwide)
Heteroptera
Miridae (11,000 species worldwide)
Raw data for each taxon are distributed as seen in
the following four graphs.
[Figures: Collection Events by Year Specimens Collected, for Miridae, Aphididae, Membracidae, and combined data]
[Figures: Host Records as a Proportion of Collecting Events (hosts unique, hosts non-unique, without hosts), for Aphididae, Miridae, and Membracidae]
Algorithmic Assessment of
Data Quality
COLLECTING EVENT DATA:
The occurrence of an insect
species on a plant genus

ANALYSIS: evaluate insect/plant
associations with different scores

Modify algorithm to improve fit
of model to data based on results

Compute frequency
of occurrence on a
particular plant genus

Compare with all insect
collecting events on any plant

Scores: High, Medium, or Low
confidence in insect-plant
association
HEURISTIC DATA:
Larvae present?
Multiple specimens?
Voucher specimen available?
Scoring thresholds (from the flowchart):
High: f(y) ≥ 15.00% and y ≥ 5
Medium: (f(y) ≥ 2.00% and y ≥ 3) ∨ (f(y) ≥ 15.00% and y ≥ 2)
Low: not high or medium
x = y′ + y
x = 1
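The scoring step can be sketched in Python as follows. This is a minimal illustration, not the TTD-TCN implementation. It assumes that y is the number of collecting events for an insect species on one plant genus, that x = y′ + y is the number of all host-bearing collecting events for that species, and that f(y) = y / x; the slides do not define these symbols explicitly. The function name `score_associations` and the separate "single event" label for x = 1 are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (insect species, plant genus)


def score_associations(events: Iterable[Pair]) -> Dict[Pair, str]:
    """Score each insect/plant pair as high, medium, low, or single event.

    `events` holds one (insect species, plant genus) pair per collecting
    event that carries a host record.
    """
    events = list(events)
    y_counts = Counter(events)                        # y: events on this plant genus
    x_counts = Counter(insect for insect, _ in events)  # x: events on any plant

    scores: Dict[Pair, str] = {}
    for (insect, plant), y in y_counts.items():
        x = x_counts[insect]
        f = y / x  # assumed meaning of f(y): share of the insect's events on this plant
        if x == 1:
            # Assumption: a lone event is set aside for heuristic review
            label = "single event"
        elif f >= 0.15 and y >= 5:
            label = "high"
        elif (f >= 0.02 and y >= 3) or (f >= 0.15 and y >= 2):
            label = "medium"
        else:
            label = "low"  # not high or medium
        scores[(insect, plant)] = label
    return scores
```

With six events on Larrea, two on Prosopis, and one on Atriplex for the same insect, the sketch scores the three pairs high, medium, and low respectively.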

Analysis
Results of Analyses
Using Larrea (creosote bush) as an
example host
Miridae/Larrea Association Network
Miridae/Larrea Association Network with High Confidence
Reasons for Low Host Scores and
Methods for Improving Data Quality
Reasons for Low Scores

1. Actual low host specificity: Indicated when a large number of
collecting events are distributed across many plant taxa.
2. Movement of adult specimens to alternative food sources:
Algorithm points out apparent vagility when there are multiple
hosts and little or no host repetition across collecting events.
3. Commingling of specimens in the field: Algorithm points out
problem when insect specimen numbers are low for a host
taxon and when there is lack of repetition of host occurrence.
4. Mislabeling of insects for hosts from a collecting event: Difficult
to distinguish from actual polyphagy in cases where all
specimens from an event are mislabeled. Often seen as a
unique host for a given insect taxon. More fieldwork needed.
5. Single collecting events: Indistinguishable from absolute host
fidelity based on multiple events, except no confidence limit can
be assessed. Heuristics such as presence of larvae and large
numbers of specimens give credence to presumed association.
Resolved only by further fieldwork.
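Three of the patterns above (items 2, 3, and 5) can be detected mechanically. The sketch below is an illustration under stated assumptions, not the authors' algorithm: the function name `flag_low_score_causes`, the flag wording, and the `few_specimens` cutoff of 2 are all hypothetical, and the input is assumed to be one (plant genus, specimen count) pair per collecting event for a single insect species.

```python
from collections import Counter
from typing import List, Tuple


def flag_low_score_causes(
    events: List[Tuple[str, int]], few_specimens: int = 2
) -> List[str]:
    """Return possible explanations for a low host score for one insect species."""
    flags: List[str] = []
    if not events:
        return flags

    # Item 5: a single collecting event cannot be given a confidence limit
    if len(events) == 1:
        return ["single collecting event"]

    host_events = Counter(plant for plant, _ in events)
    specimens_per_host: Counter = Counter()
    for plant, n_specimens in events:
        specimens_per_host[plant] += n_specimens

    # Item 2: several hosts, none repeated across collecting events
    if len(host_events) > 1 and max(host_events.values()) == 1:
        flags.append("multiple hosts, no host repetition")

    # Item 3: a host seen once with few specimens (hypothetical cutoff)
    for plant in sorted(host_events):
        if host_events[plant] == 1 and specimens_per_host[plant] <= few_specimens:
            flags.append(f"possible commingling: {plant}")

    return flags
```

Items 1 and 4 are left out on purpose: the slides indicate that they are difficult to separate from genuine polyphagy without further fieldwork.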
Implication of Results
Conclusions
1. Insect collections offer substantial data on host
relationships even though a majority of the specimens
lack such information.
2. Our algorithm demonstrates a method for assessing
data quality on a large scale. Our initial analyses show
that:
- We can have confidence in a significant proportion
of the available information.
- The data demonstrate a substantial degree of host
specificity in our three target groups.

3. Degree of host specificity requires a scoring method
that takes into account biological attributes, collecting
techniques, and approaches to data capture in the field.
Acknowledgments
• Participating TCN and PBI Institutions
• iDigBio
• AMNH Database Data-entry Personnel
• Participating TCN Data-entry Personnel
• Michael D. Schwartz
• National Science Foundation

 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Schuh ecn2013 tcn_data_structure

  • 1. The structure of insect—plant host data as derived from museum collections: An analysis based on data from the NSF-funded Tritrophic Database — Thematic Collections Network (TTD-TCN) Randall T. Schuh Katja Seltmann Christine A. Johnson American Museum of Natural History
  • 2. TTD-TCN Rationale “The data captured via ADBC funding will dramatically improve our understanding of the relationships among the more than 11,000 species of North American Hemiptera (scale insects, aphids, leafhoppers, true bugs, and relatives), their food plants, and the wasps that parasitize the hemipterans.”
  • 3. The data we will evaluate today were captured through a Web-based application developed with NSF Planetary Biodiversity Inventory funding and used by the TTD-TCN. This software application, known as Arthropod Easy Capture (AEC), is built in open-source code, is being implemented as an appliance by the ADBC-funded Home Uniting Biocollections (HUB, iDigBio), and through that implementation will be able to be installed with a “one-click” installation application. Server code is online at Source Forge: http://sourceforge.net/projects/arthropodeasy/
  • 4. Specimen Count by Project (1,144,240)
  • 6. Data on insect-plant relationships is available primarily from labels on insect specimens—as opposed to labels on plant specimens. Substantial amounts of data were captured for the family Miridae on a world basis under NSF Planetary Biodiversity Inventory funding between 2003—2011. The TTD-TCN is a collaboration among 17 US entomological institutions. The institutional contributions from these two projects, as represented by numbers of specimen records, are seen in the following graph. The TTD-TCN is defining the field structure for host data as used by the iDigBio and for other Web-aggregators such as DiscoverLife.org.
  • 7.
  • 8. Choice of Groups for Analysis
  • 9. In order to evaluate the nature of insect-host plant data derived from collections, we need to look at groups that offer large data sets. Necessary attributes are: 1. Large numbers of specimen records with host information 2. Large numbers of collecting events 3. Substantial diversity of host taxa. At the present time the following taxa in our database meet those criteria:
  • 10. Hemiptera Sternorrhyncha Aphididae (4400 species worldwide) Auchenorrhyncha Membracidae (3200 species worldwide) Heteroptera Miridae (11,000 species worldwide) Raw data for each taxon are distributed as seen in the following four graphs.
  • 12. Host Records as a Proportion of Collecting Events Hosts unique Hosts non-unique Without hosts
  • 14.
  • 15.
  • 16.
  • 18. COLLECTING EVENT DATA: The occurrence of an insect species on a plant genus. ANALYSIS: Evaluate insect/plant associations with different scores. Modify algorithm to improve fit of model to data based on results. Compute frequency of occurrence on a particular plant genus. Compare with all insect collecting events on any plant. Scores: High, Medium, or Low confidence in insect-plant association. HEURISTIC DATA: Larvae present? Multiple specimens? Voucher specimen available?
  • 19. Analysis (decision-tree figure). High: f(y) ≥ 15.00% and y ≥ 5. Medium: (f(y) ≥ 2.00% and y ≥ 3) ∨ (f(y) ≥ 15.00% and y ≥ 2). Low: not high or medium. Other labels on the figure: x = y′ + y; x = 1; Data; Heuristic.
  • 21.
  • 22. Using Larrea (creosote bush) as an example host
  • 23.
  • 25. Miridae/Larrea Association Network with High Confidence
  • 26. Reasons for Low Host Scores and Methods for Improving Data Quality
  • 27. Reasons for Low Scores 1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa.
  • 28. Reasons for Low Scores 1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa. 2. Movement of adult specimens to alternative food sources: Algorithm points out apparent vagility when there are multiple hosts and little or no host repetition across collecting events.
  • 29. Reasons for Low Scores 1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa. 2. Movement of adult specimens to alternative food sources: Algorithm points out apparent vagility when there are multiple hosts and little or no host repetition across collecting events. 3. Commingling of specimens in the field: Algorithm points out problem when insect specimen numbers are low for a host taxon and when there is lack of repetition of host occurrence.
  • 30. Reasons for Low Scores 1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa. 2. Movement of adult specimens to alternative food sources: Algorithm points out apparent vagility when there are multiple hosts and little or no host repetition across collecting events. 3. Commingling of specimens in the field: Algorithm points out problem when insect specimen numbers are low for a host taxon and when there is lack of repetition of host occurrence. 4. Mislabeling of insects for hosts from a collecting event: Difficult to distinguish from actual polyphagy in cases where all specimens from an event are mislabeled. Often seen as a unique host for a given insect taxon. More fieldwork needed.
  • 31. Reasons for Low Scores 1. Actual low host specificity: Indicated when a large number of collecting events are distributed across many plant taxa. 2. Movement of adult specimens to alternative food sources: Algorithm points out apparent vagility when there are multiple hosts and little or no host repetition across collecting events. 3. Commingling of specimens in the field: Algorithm points out problem when insect specimen numbers are low for a host taxon and when there is lack of repetition of host occurrence. 4. Mislabeling of insects for hosts from a collecting event: Difficult to distinguish from actual polyphagy in cases where all specimens from an event are mislabeled. Often seen as a unique host for a given insect taxon. More fieldwork needed. 5. Single collecting events: Indistinguishable from absolute host fidelity based on multiple events, except no confidence limit can be assessed. Heuristics such as presence of larvae and large numbers of specimens give credence to presumed association. Resolved only by further fieldwork.
  • 33.
  • 35. 1. Insect collections offer substantial data on host relationships even though a majority of the specimens lack such information. 2. Our algorithm demonstrates a method for assessing data quality on a large scale. Our initial analyses show that: - We can have confidence in a significant proportion of the available information. - The data demonstrate a substantial degree of host specificity in our three target groups. 3. Degree of host specificity requires a scoring method that takes into account biological attributes, collecting techniques, and approaches to data capture in the field.
  • 36. Acknowledgments •Participating TCN and PBI Institutions •iDigBio •AMNH Database Data-entry Personnel •Participating TCN Data-entry Personnel •Michael D. Schwartz •National Science Foundation
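
The decision tree on slides 18 and 19 can be sketched in a few lines of Python. This is only an illustrative reading of the slide thresholds, not the authors' implementation. The names `Association`, `frequency`, and `score` are hypothetical. The sketch assumes that y is the number of collecting events for an insect species on one plant genus, that y′ counts its events on all other plants, that x = y′ + y, and that f(y) = y / x; the slides state the thresholds but not the formula itself. Treating x = 1 as the single-collecting-event category is likewise an assumption. The heuristic adjustments (larvae present, multiple specimens, voucher available) are left out because the slides do not say how they change a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Collecting-event counts for one insect species on one plant genus."""
    insect: str
    plant_genus: str
    y: int        # collecting events on this plant genus
    y_other: int  # y' on the slide; assumed to mean events on any other plant


def frequency(a: Association) -> float:
    """Assumed formula: f(y) = y / x, with x = y' + y."""
    x = a.y + a.y_other
    if a.y < 1 or a.y_other < 0:
        raise ValueError("need y >= 1 and y_other >= 0")
    return a.y / x


def score(a: Association) -> str:
    """Apply the frequency thresholds from slide 19 (heuristics omitted)."""
    f = frequency(a)
    x = a.y + a.y_other
    if x == 1:
        # Assumed reading of "x = 1": one collecting event in total,
        # so no confidence limit can be assessed.
        return "single event"
    if f >= 0.15 and a.y >= 5:
        return "high"
    if (f >= 0.02 and a.y >= 3) or (f >= 0.15 and a.y >= 2):
        return "medium"
    return "low"


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not TTD-TCN data.
    examples = [
        Association("insect sp. A", "Larrea", y=6, y_other=4),
        Association("insect sp. B", "Larrea", y=2, y_other=8),
        Association("insect sp. C", "Larrea", y=1, y_other=20),
        Association("insect sp. D", "Larrea", y=1, y_other=0),
    ]
    for a in examples:
        print(a.insect, a.plant_genus, score(a))
```

Keeping the thresholds in one function makes it easy to adjust them, which matches the slide's note that the algorithm is modified to improve the fit of the model to the data.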

Editor's Notes

  1. Good morning. Today I would like to speak to you about data on insect-plant associations as derived from insect collections. This presentation is a joint effort by Katja Seltmann, Christine Johnson, and me as part of our work on a TCN award from the NSF.
  2. In this talk we will use TCN data to examine host data for three families of herbivorous hemipterans and evaluate three propositions: (1) the degree to which collections contain information on host relationships; (2) the degree of confidence we are able to place in that information; and (3) the degree to which those data demonstrate host specificity or the lack thereof.
  3. The AEC database has supported data capture for a number of NSF-supported projects. This slide shows the relative proportion of data captured by these projects, which in aggregate represent more than 1 million specimen records, the largest numbers coming from the TCN project which represents about two-thirds of the red slice of the pie.
  4. Here we see the institutions with more than 10,000 specimen records, which have therefore made the most significant contributions to our knowledge of host relationships.
  5. These graphs plot specimens against time, with each point representing a collecting event. The graph in the lower right is the sum of collecting events for all three groups. Note that the scale for each graph is different, with the Miridae having a much greater number of specimens per collecting event than the Aphididae and Membracidae. These data represent all collecting events, irrespective of whether host data are involved.
  6. Here we see the data in the prior graphs transformed to show the numbers of collecting events with host records for each taxon, as well as information for the remaining taxa in the database. Comparison of the right-hand bar with the remaining three gives a clear indication of the reasons for the choice of taxa for this analysis. Blue is for records without host information; brown is for non-unique hosts; yellow represents unique hosts, in other words, all host records for the insect taxon are from the same host genus in this analysis. Aphids almost always come with host information as a result of the collecting methods that are used in the group.
  7. Here we can see the numbers of plant families on the left, and plant genera on the right, occupied by each of the three groups we have chosen to analyze. Relative to the size of the taxon sample, the Aphididae show the highest diversity of host information at both the family and generic levels. The family data also support the proposition that all three taxa are specializing on many of the same plant families, a phenomenon that is reinforced in the following graphs.
  8. Here we see numbers of collecting events by plant family for the Miridae,
  9. For the Membracidae, and
  10. For the Aphididae. You will note that a few families loom large as hosts, usually in all three groups, notably Asteraceae, Fabaceae, Fagaceae, Rosaceae, and Pinaceae, with most other families occurring in lower frequencies.
  11. Our approach to assessing the strength of host data is through a DECISION TREE: the first set of decisions is based on the frequencies and collecting event counts; the second set of decisions is based on the heuristic properties. Scores are based on fit of the model to the data and ranked from high to low. The main contributor to a score is frequency (f). Low frequency does not argue that information for a taxon should be completely disregarded. The score for an insect-plant association can be increased through information from the heuristics component; for example, insect larvae collected on a given plant species would indicate a strong association even when there is a low number of collecting events. The existence of large numbers of specimens or of authoritatively identified plant vouchers would also improve the score for a given association. Associations with a frequency of 1 make no argument for whether the data are strong or not because no confidence limits can be established. The only way to bolster the score is through more collecting. The single-event data do suggest that when going to the field the first host to be investigated should be the one for which we already have a presumed association. For example, in order to get into the high category, the frequency of y, f(y), has to be greater than or equal to 15% AND y has to be equal to or greater than 5. In order to get a medium score, you need either one or the other score. The value for f(y) [frequency of y] is obtained by the following formula:
  12. Here we see confidence values for the three families we have analyzed. As a proportion, the Membracidae have the most high scores (in yellow). In absolute terms the Miridae present the most data on host fidelity, with 842 associations with high scores and, in blue, 1844 associations with medium scores; but they also possess the greatest number of host data points based on a single collecting event (pink), a situation that obviously demands further fieldwork but may nonetheless be an indicator of a large number of valid host associations. All three insect families have large numbers of putative host associations with low scores (gray), a situation we will return to later in the presentation.
  13. Here we see a histogram showing all species of Miridae known to occur on Larrea (Zygophyllaceae) in the American Southwest. The gray portion of the bar indicates the proportion of collecting events known from Larrea, while the other colors indicate the proportions of collecting events from other plant families. What does our decision tree approach tell us about these data? Larrea served as the model from which we developed the decision-tree criteria.
  14. Here we see those same data plotted in the form of a graph: gray nodes represent insect species, green nodes represent plant genera, the large node representing Larrea. Red lines (edges) represent associations for which the decision tree indicates a high level of confidence in the host association. One might ask “Just what does this graph tell us?” The size of the balls is determined by the number of collecting events.
  15. This slide makes clear that in order to make sense of these data we need to tease apart the noise from the real signal. This graph shows the signal, whereas the prior graph commingles noise and signal. Even though specimen labels indicate that many taxa had been collected on Larrea, only 5 of those taxa actually appear to be host specific, as seen by their connections to the large green node. All other insect taxa are shown to have their actual (breeding) host associations with taxa other than Larrea, a result that may not be clear from a naïve interpretation of the data. We might therefore wish to look at the reasons why we get these spurious answers. This graph is based on high scores only.
  16. When the noise is filtered out for the Miridae as a group by the elimination of low scores, and insect genera are plotted against plant genera, we see distinct patterns develop in the group. Here we see plant groups which have high herbivore diversity and in which we also have high confidence in the data. If this graph were done on a species-by-species comparison, the strength of the signal would probably be even greater, although more complex in terms of presentation.