Döring dwc basisofrecord
1. Typing in Darwin Core
do we need dwc:basisOfRecord?
TDWG 2014
Markus Döring, GBIF
Jönköping, October 2014
2. dc:type
“The nature or genre of the resource.”
“Recommended best practice is to use a controlled vocabulary such
as the DCMI Type Vocabulary” - but it has range rdfs:Class
Collection PhysicalObject
Dataset Service
Event Software
Image Sound
InteractiveResource StillImage
MovingImage Text
3. dwc:basisOfRecord
“The Darwin Core Type Vocabulary extends and refines terms from the
Dublin Core Type Vocabulary to describe and categorize resources
more specifically for biodiversity applications. The basisOfRecord
should be populated with the value from the Darwin Core Type
Vocabulary that best corresponds to the resource being shared.”
Occurrence        dc:Event            HumanObservation
MaterialSample    dc:StillImage       MachineObservation
Taxon             dc:MovingImage      PreservedSpecimen
Nomen*            dc:Sound            FossilSpecimen
                  dc:Text             LivingSpecimen
                  dc:PhysicalObject
The DwC type vocabulary has been merged into the DwC namespace!
http://rs.tdwg.org/dwc/terms/PreservedSpecimen
4. Typing DwC XML
• XML protocols (DiGIR, TAPIR) use simple, flat DwC
• Occurrences typed by
• dc:type
• BasisOfRecord
5. Typing DwC Archives
• rowType with class term defines type for all records in a file
• extension files have their own rowType
• dwc:basisOfRecord in addition
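As a rough illustration of how rowType types every record in a file, the sketch below reads the rowType of the core and of each extension from an archive metafile. The metafile content and the file names event.txt and occurrence.txt are made up for the example, and row_types is a hypothetical helper, not part of any DwC-A library.

```python
import xml.etree.ElementTree as ET

# XML namespace used by the DwC-A metafile (meta.xml).
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"

# Hypothetical metafile: an Event core with an Occurrence extension.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files><location>event.txt</location></files>
  </core>
  <extension rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
  </extension>
</archive>"""


def row_types(meta_xml: str) -> dict:
    """Map each data file named in the metafile to the rowType of its records."""
    root = ET.fromstring(meta_xml)
    tables = root.findall(DWC_TEXT_NS + "core") + root.findall(DWC_TEXT_NS + "extension")
    result = {}
    for table in tables:
        location = table.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location")
        result[location.text] = table.get("rowType")
    return result


ROW_TYPES = row_types(META_XML)
```

Every row of occurrence.txt is typed as dwc:Occurrence by the extension's rowType alone; a dwc:basisOfRecord column in that file would carry a second, per-record type on top of it.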
6. Occurrences in DwC-A
• dwc:Occurrence core rowType
• single flat Occurrence core, similar to XML
• dwc:Occurrence extension (inherits all core values!)
• dwc:Taxon core, “checklists”
• dwc:Event core, sampling / monitoring
• dwc:MaterialSample for specimens (not observations)
• as core or extension?
• subset of basisOfRecord values is applicable
• should dwc:Occurrence be restricted to observations?
https://github.com/mdoering/dwca-examples
7. Typing in DwC RDF
• rdf:type primarily defines type for an RDF resource
• values can be URIs from dcmitype, dwc, …
• other terms could describe the resource nature
• dc:type, dcterms:type, dwc:basisOfRecord
• … but not recommended
8. basisOfRecord @ GBIF
Value                      Records   Value                     Records
HumanObservation         265573716   Museum specimen            175324
PreservedSpecimen        139572007   Reportado                  162128
Observation               72567152   F                          143949
O                         26314003   herbarium specimen         141170
S                         19055660   Published Report           134974
specimen                   9809257   L                          131428
Occurrence                 6733802   genomic DNA                118827
Colectado                  1774372   Observado                  113815
voucher                    1672883   Plant                      110017
OtherSpecimen              1575388   preserved                  107149
specimen(SP)               1325600   still image                 96591
FossilSpecimen             1228399   Especimen preservado        85682
HO                         1119094   FishPreparation             75023
Accession                   976457   Unknown                     74037
FossileSpecimen             906052   Fossil Specimen             73975
Observaciõn humana          842668   collected specimen          73311
fossil                      764820   Compound observation        72926
MachineObservation          672918   Especimen preservado        70272
LivingSpecimen              558926   Literature                  40380
FossilRecord                372226   DrawingOrPhotograph         37105
Observasjon                 349598   VirtualSpecimen             26505
Objekt                      267634   living organism             11630
Unpublished Report          255305   PreservedTissue             11091
Voucher                     211771   living, growing plant        6189
Personal Communication      180665   fluid specimen               7870
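The spread of raw values above suggests some kind of normalization step before basisOfRecord can be used for filtering. A minimal sketch of one, assuming a hand-maintained synonym table: the SYNONYMS entries, normalize and tally are all hypothetical, cover only a few of the listed values, and do not describe how GBIF actually interprets the field.

```python
from collections import Counter

# Hypothetical synonym table (lower-cased raw value -> controlled term).
# Only a few unambiguous spellings from the list above are covered.
SYNONYMS = {
    "humanobservation": "HumanObservation",
    "machineobservation": "MachineObservation",
    "preservedspecimen": "PreservedSpecimen",
    "especimen preservado": "PreservedSpecimen",
    "fossilspecimen": "FossilSpecimen",
    "fossilespecimen": "FossilSpecimen",
    "fossil specimen": "FossilSpecimen",
    "livingspecimen": "LivingSpecimen",
}


def normalize(raw: str) -> str:
    """Collapse whitespace and case, then look the value up; unmapped values become 'Unknown'."""
    key = " ".join(raw.split()).lower()
    return SYNONYMS.get(key, "Unknown")


def tally(raw_counts):
    """Sum record counts per controlled term for (raw value, count) pairs."""
    totals = Counter()
    for raw, count in raw_counts:
        totals[normalize(raw)] += count
    return totals


# A few (value, count) pairs taken from the list above.
SAMPLE = [
    ("FossilSpecimen", 1228399),
    ("FossileSpecimen", 906052),
    ("Fossil Specimen", 73975),
    ("Objekt", 267634),
]
TOTALS = tally(SAMPLE)
```

Anything the table does not cover falls into "Unknown", so the share of unmapped records stays visible rather than being guessed at.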
10. Evidence model
• Keep evidence for Occurrence as distinct entities
• Occurrence only for organism in place and time
• Feasible for publishers?
• overly normalized for flat sources?
• Evidence location != occurrence location
[Diagram: an Occurrence (Organism, Place, Time) linked by hasEvidence to StillImage, MachineObservation, MaterialSample and Literature]
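One way to picture the evidence model as a data structure is sketched below. The class and field names (Evidence, Occurrence, has_evidence, held_at) and all example values are assumptions made for illustration; the slides only name the hasEvidence relation and the entity types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """A distinct entity that documents an occurrence."""
    evidence_type: str              # e.g. "MaterialSample", "StillImage", "Literature"
    identifier: str
    held_at: Optional[str] = None   # location of the evidence, not of the organism


@dataclass
class Occurrence:
    """An organism in a place at a time; the evidence hangs off it."""
    organism: str
    place: str
    time: str
    has_evidence: List[Evidence] = field(default_factory=list)

    def evidence_types(self) -> List[str]:
        """Sorted, de-duplicated evidence types attached to this occurrence."""
        return sorted({e.evidence_type for e in self.has_evidence})


# Hypothetical example: one occurrence backed by a specimen and a photo.
OCC = Occurrence(
    organism="Puma concolor",
    place="field site A",
    time="2014-10-27",
    has_evidence=[
        Evidence("MaterialSample", "specimen-1", held_at="museum B"),
        Evidence("StillImage", "photo-1"),
    ],
)
```

Keeping held_at on the evidence separates the museum from the collecting site, which is the "evidence location != occurrence location" point; a flat source would have to be split into both record kinds to fit this shape.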
11. Typing Evidence
• Basic evidence types
• MaterialSample
• Observation
• Media
• Extend base types as class hierarchy in DwC?
• rdf:type/rowType
• An evidenceType property with an external vocabulary?
• is this just another name for basisOfRecord?
12. Managing type vocabulary
• Many dimensions. Multiple inheritance or many vocabularies?
• preservationMethod
• samplingProtocol
• organismPart
• Manage vocabulary
• simply Github?
• Format
• YAML
• BCO OWL
• RDF
• SKOS
• OBO file format
13. Do we need basisOfRecord?
• Occurrence → type=dwc:Occurrence
• HumanObservation → samplingProtocol=human
• MachineObservation → samplingProtocol=machine
• PhysicalObject → type=dwc:MaterialSample
• PreservedSpecimen → preparations=preserved
• FossilSpecimen → preparations=fossil (???)
• LivingSpecimen → preparations=alive (seed)
• legacy values
• Germplasm → preparations=seed, culture collection, …
• Literature → dc:source (evidence in literature)
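The replacement idea on this slide can be sketched as a small rewrite step that drops basisOfRecord and sets a type plus one property instead. The sketch assumes that observation subtypes inherit type=dwc:Occurrence and specimen subtypes inherit type=dwc:MaterialSample, which the slide does not state outright. REPLACEMENTS and replace_basis_of_record are hypothetical names, the legacy values are left out, and preparations=fossil is carried over even though the slide itself questions it.

```python
# Assumption: subtypes inherit the type of their parent value
# (observations -> dwc:Occurrence, specimens -> dwc:MaterialSample).
REPLACEMENTS = {
    "Occurrence": {"type": "dwc:Occurrence"},
    "HumanObservation": {"type": "dwc:Occurrence", "samplingProtocol": "human"},
    "MachineObservation": {"type": "dwc:Occurrence", "samplingProtocol": "machine"},
    "PhysicalObject": {"type": "dwc:MaterialSample"},
    "PreservedSpecimen": {"type": "dwc:MaterialSample", "preparations": "preserved"},
    "FossilSpecimen": {"type": "dwc:MaterialSample", "preparations": "fossil"},  # questioned on the slide
    "LivingSpecimen": {"type": "dwc:MaterialSample", "preparations": "alive"},
}


def replace_basis_of_record(record: dict) -> dict:
    """Return a copy of the record with basisOfRecord expressed as type + property.

    Records with a missing or unmapped basisOfRecord are returned unchanged,
    and values already present in the record are never overwritten.
    """
    basis = record.get("basisOfRecord")
    if basis not in REPLACEMENTS:
        return dict(record)
    out = {k: v for k, v in record.items() if k != "basisOfRecord"}
    for term, value in REPLACEMENTS[basis].items():
        out.setdefault(term, value)
    return out


# Hypothetical flat record.
REWRITTEN = replace_basis_of_record(
    {"basisOfRecord": "PreservedSpecimen", "scientificName": "Puma concolor"}
)
```

Unmapped values pass through untouched, so legacy entries such as Germplasm or Literature are not silently lost.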
18. Discussion
• Restrict Occurrence to observations?
• use MaterialSample for all physical things
• Do we want to use all new DwC terms as classes?
• is it legitimate to use them as rowType / rdf:type?
• do we need new id terms, e.g. FossilSpecimenID ?
• Type by single vocabulary or multiple “dimensions”
• typing by class hierarchy or properties
• How do we want to manage a type vocabulary?
• dc:type needed if we have a more specific type?