Dr Robert Hanner - Barcode Data standards for animals, plants & fungi
1. Informatics Workshop, Adelaide 28 November 2011
The BARCODE Data Standard:
Enabling Molecular Diagnostics
for Biodiversity
Robert Hanner, Ph.D.
Centre for Biodiversity Genomics
University of Guelph, Canada
2. The Infrastructure of Taxonomy
Collections and databases of specimens
Codes of Taxonomic Nomenclature
Compilations of taxonomic names
Monographs
Floristic and faunistic surveys/inventories
Revisions
The (undigitized) Taxonomic Literature
4. DNA Barcoding New tools for taxonomy
The ability to compare genotype
information across a huge range of
organisms is a powerful tool
6. Couplets Consisting of:
“Species Name - DNA Sequence”
Basis of a “look-up table” enabling
molecular diagnostic applications
However, both elements are assertions
Underlying specimens and associated
raw sequence data are not typically
available for secondary inspection
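The look-up table idea can be sketched in a few lines of Python. This is a minimal illustration only, assuming pre-aligned sequences of equal length; the species labels, the toy sequences, the `identify` function and the 0.98 identity threshold are all hypothetical and not part of any barcoding standard. The result is only as trustworthy as the name and the sequence asserted in each reference couplet.

```python
def identity(a, b):
    """Fraction of matching positions over the shared length of two pre-aligned sequences."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def identify(query, reference, threshold=0.98):
    """Return (name, score) for the best reference match, or (None, score) if below threshold.

    `reference` maps a species name to its barcode sequence (the couplets).
    The threshold is an assumed cut-off chosen for illustration.
    """
    best_name, best_score = None, 0.0
    for name, seq in reference.items():
        score = identity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Toy couplets: hypothetical names and sequences, far shorter than real barcodes.
reference = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGTAC",
}

print(identify("ACGTACGTAC", reference))  # exact match to the first couplet
print(identify("TTTTTTTTTT", reference))  # no reference is close enough
```

A query with no sufficiently close reference returns no name at all, which is the expected outcome for a species missing from the library.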
9. “Only [27%] of papers had a legitimate specimens
examined section, with museum numbers for each
voucher, and names of the museums where the
specimens used in the study could be examined”
13. Barcoding:
Integrating Best Practices
Genomics
Classical Taxonomy
14. Data Standards for BARCODE
Records in INSDC*
Community-based standards for COI
Creation of a reserved keyword BARCODE
- Required & recommended data elements
- Sequence quality and coverage
Recommended for identifying unknowns
Process to propose non-COI gene regions
*http://barcoding.si.edu/pdf/dwg_data_standards-final.pdf
17. Validation demonstrates that a procedure is
robust, reliable and reproducible.
PCR amplification and DNA sequencing:
• Are robust methods that produce
successful results a high percentage of the
time.
• Are reliable methods that produce accurate
results.
• Are reproducible methods producing similar
results each time a sample is tested.
19. 2009: Barcode Markers for Plants
52 authors from 24 institutions in 9
nations, proposed a pair of short
sequences (totaling about 1,450 base
pairs) from rbcL and matK as the
foundation for a DNA barcode library for
plants.
CBOL Plant Working Group (2009) A DNA barcode for
land plants. Proc Natl Acad Sci USA 106:12794–12797.
21. 2011: Barcode Marker for Fungi
149 authors from 71 institutions propose
ITS as fungal barcode target. It also has
demonstrated utility in some plants*.
Fungal Barcoding Consortium (2011) The nuclear
ribosomal internal transcribed spacer (ITS) region as a
universal DNA barcode marker for Fungi. Proc Natl Acad
Sci USA (Submitted).
*Hollingsworth (2011) Refining the DNA barcode for land
plants. www.pnas.org/cgi/doi/10.1073/pnas.1116812108
22. Move toward rapid data release:
In 2009 the community acknowledged
the value of the “Ft Lauderdale Accord”
Raw sequence data and high-level
taxonomy (e.g. order) deposited in INSDC
prior to publication
Gave rise to “Dark taxa” in INSDC and
subsequent arguments pro & con
23. Issues that need to be addressed:
Legacy BARCODE records lack trace
files
Many recent BARCODE records lack
valid names
Not all potential BARCODE data is in the
public domain
24. Question: What is barcoding?
A method for species identification and
discovery through the analysis of short,
standardized DNA sequences
Should BARCODE be applied only to
known species as an ID tag, or should it
be used to designate a sequence entry
conforming to a meta-data standard?
25. DNA Barcodes: a tool of integrative taxonomy
DNA Identification <-- Barcoding --> DNA Taxonomy
Low ambiguity <--> High ambiguity
Species well-known <--> Species unknown
26. Evolution of Standards
Even among well-studied vertebrates:
serious discrepancies exist in the
application of names across labs
Identification accuracy of reference
collections highly variable
Perhaps BARCODE is a better process
tag unless reserved for published data
27. 2011: BOLD 3.0
Supports assembly of BARCODE
compliant data records for all markers
Includes specimen images and
introduces BINs to aid data validation
Introduces features for 3rd party
annotation of data records to facilitate
library curation
28. What other issues remain?
Barcode annotation of plants and fungi?
Registration of institutions/collections
Synchronization of databases
32. Accomplishments:
Integration of genomics and biodiversity
science via creation of a robust molecular
diagnostic interface between them
Increased community awareness of
taxonomy and collections
34. Rationale for Defining
“BARCODE” keyword in GenBank
Provides the community with reference
records with verifiable and retrievable data:
Associated with retrievable voucher specimens
(liberally defined: tissue, DNA, etc.)
Linked to on-line metadata
Meet an agreed-upon standard of taxonomic
identification
Provide an assured level of data completeness
On an agreed-upon gene region
Recommended for use in identifying unknowns
35. The Barcode Data Standard
Establishing a new data standard for “BARCODE”
keyword records in DDBJ/EMBL/GenBank:
1. Minimum 500 bp, <1% ambiguous base calls
2. Double-stranded sequence
3. Trace files and associated quality scores
4. Primers used to generate sequence
5. Linkages to:
   1. A morphological voucher specimen
   2. Structured reference to collections
   3. Geospatial reference information
   4. Valid species name
   5. Who performed the identification
   6. Literature citations
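Several of the elements listed above lend themselves to an automated check. The following is a minimal sketch, assuming a simplified record structure: the `BarcodeRecord` class, its field names and the `check_record` function are hypothetical and do not correspond to any INSDC or BOLD schema. Only the two numeric thresholds (500 bp minimum, under 1% ambiguous base calls) are taken from the slide; any base other than A, C, G or T is treated as ambiguous here, which is an assumption.

```python
from dataclasses import dataclass, field

# Assumption: anything other than A, C, G, T counts as an ambiguous base call.
UNAMBIGUOUS = set("ACGT")


@dataclass
class BarcodeRecord:
    """Hypothetical, simplified record; not an actual INSDC or BOLD schema."""
    sequence: str
    has_trace_files: bool = False
    primers: list = field(default_factory=list)
    voucher_id: str = ""
    species_name: str = ""
    identified_by: str = ""


def check_record(rec):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    seq = rec.sequence.upper()
    if len(seq) < 500:
        problems.append("sequence shorter than 500 bp")
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    if seq and ambiguous / len(seq) >= 0.01:
        problems.append("1% or more ambiguous base calls")
    if not rec.has_trace_files:
        problems.append("no trace files")
    if not rec.primers:
        problems.append("no primers listed")
    if not rec.voucher_id:
        problems.append("no voucher specimen link")
    if not rec.species_name:
        problems.append("no species name")
    if not rec.identified_by:
        problems.append("no identifier recorded")
    return problems


# Illustrative records with made-up values.
good = BarcodeRecord(
    sequence="ACGT" * 150,  # 600 bp, no ambiguous calls
    has_trace_files=True,
    primers=["forward-primer-name", "reverse-primer-name"],
    voucher_id="MUSEUM:COLLECTION:12345",
    species_name="Genus species",
    identified_by="A. Taxonomist",
)
bad = BarcodeRecord(sequence="ACGT" * 100 + "N" * 10)  # 410 bp, 10 ambiguous calls

print(check_record(good))
print(check_record(bad))
```

A check like this covers only the presence of each element. Whether a species name is valid, or a voucher is actually retrievable, still needs curation against external sources.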