Presentation describing recent work on observation-related vocabularies, undertaken by CSIRO as part of a contribution to Australia's National Environmental Information Infrastructure.
Presented at the 2nd workshop of the Ocean Data Interoperability Platform, La Jolla, Ca. 3rd-6th December, 2013
Apidays New York 2024 - The value of a flexible API Management solution for O...
Developing and publishing vocabularies
1. Developing and publishing vocabularies
Simon Cox, Bruce Simons, Jonathan Yu | Environmental Information Systems
4 December 2013
LAND AND WATER
2. Are we talking about the same thing?
- Beer glasses in Australia
Glass
Size
115 ml
4 oz
140 ml
5 oz
170 ml
6 oz
200 ml
7 oz
225 ml
8 oz
255 ml
9 oz
NSW
-
Pony
-
Seven
-
-
Middy Schooner
Pint
NT
-
-
-
Seven
-
-
Handle Schooner
-
QLD
-
Small
Beer
-
-
Glass
-
Pot
-
-
SA
-
Pony
-
Butcher
-
-
Schooner
Pint
-
TAS
Small
Beer
-
A Beer or
Six
-
Eight
-
Ten or
Pot
-
-
VIC
-
Pony
Small
Glass
-
-
Pot
Schooner
-
WA
Shetland
Pony
Pony
Bobbie
Glass
-
-
Source: http://www.liquormerchants.org.au/Q&A.html
2 | Linked Vocabularies | Simon Cox
285 ml
10 oz
425 ml
15 oz
Middy Schooner
575 ml
20 oz
Pot
3. Healthy Headwater - NGIS Terms
cas_rn
number
EC
ANGDTS
Code
EC
PH
pH
1688700-6
TDS
1688700-6
TDS
ANGDTS Description
ease at which conduction current can be
caused to flow through material in
microSiemens/centimetre
negative logarithm of hydrogen ion
concentration in ph units
Units_used
us/cm
ms/cm
mg/L
pH units
concentration of chloride as Cl in
milligrams/litre
mg/L
mg/kg
the portion of total solids that passes
through filter and deemed to have been
dissolved in sample in milligrams/litre
mg/L
WDTF
Parameter
hierarchy
alt names
chemical name
ADWG
name
IUPAC
name
ElectricalConduc
tivityAt25C_uSc Electrical
m
Conductivity
WaterpH_pH
pH
TOTALAL
KALINITY
ALKT
concentration in milligrams/litre CaCO3 of
titratable bases using a methyl-orange
endpoint of about pH 4.3
mg/L
HARD
the ability of water to precipitate soap and
is sum of calcium and magnesium
concentrations as milligrams/litre CaCO3
mg/L
SAR
ratio of sodium to magnesium and calcium
and used to assess risk of excess sodium in
irrigation water
Ratio
alkalinity ascribed to carbonate in
milligrams/litre CO3
mg/L
%MOL
Hardness (as
CaCO3)
Sodium
Adsorption
Ratio
Carbonate
Alkalinity (as
CaCO3)
concentration of nitrate as N in
milligrams/litre
mg/L
mg/kg
Nitrate
mg/L
mg/kg
ug/L
Iron
SAR
3812-326
ALKC
NITRATE
1479755-8
7439-896
7439-89- concentration of iron as Fe in
6
milligrams/litre
AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
Ion
pH, alkalinity,
acidity
pH
Chloride
Total
Dissolved
Solids
Chloride
Anion
Salinity
Total Alkalinity
(as CaCO3)
HARDNE
SS_CACO
3
Group
Conductivity
Chloride
Total Dissolved
Solids
collection
pH, alkalinity,
acidity
Hardness
(as calcium
carbonate)
Hardness (as
calcium
carbonate)
Salinity
Carbonate
Nitrate and
Nitrite
pH, alkalinity,
acidity
Nitrate and
Nitrite
Iron
Anion
Metal
Cation
4. Are these the same?
“nitrogen”
“dissolved nitrogen”
“Total nitrogen, water, filtered, milligrams per liter”
“Concentration of nitrogen (total) per unit volume of the water body [dissolved
plus reactive particulate phase] by oxidation and colorimetric autoanalysis“
“Concentration of nitrogen (total) per unit mass of the water body *dissolved plus
reactive particulate <GF/F phase] by filtration and high temperature Pt catalytic
oxidation”
“Concentration (moles or mass) of total nitrogen (i.e. nitrogen in all chemical
forms) in suspended particulate material per unit volume of the water column.”
“Concentration of nitrogen (total) ,'PON'- per unit volume of the water body
[particulate 2-10um phase+ by filtration, acidification and elemental analysis”
“Dissolved total and organic nitrogen concentrations in the water column”
4 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
6. We are not
alone!
OKFN Linked Open
Vocabularies
http://lov.okfn.org/dataset/lov/
6 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
7. Conceptual Model of QUDT
AGU Fall 2013 | IN52B-08 | Cox,
Simons, Yu | Vocabulary re-use
7
8. Standard ontology
of chemicals
>36 000 chemical entities
http://purl.obolibrary.org/obo/chebi.owl
8 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
9. Dissolved nitrogen concentration objects
SubstanceOrTaxon
ScaledQuantityKind
+object Of In t er est
+qud t :gen er alizat ion
n it r ogen
n it r ogen con cen t r at ion
d issolved n it r ogen
con cen t r at ion
+qud t :gen er alizat ion
+exact Mat ch
Con cen t r at ion
elem en t al n it r ogen
+qud t :un it
+qud t :quan t it yKin d
(CHEBI_3 3 2 6 7 )
+qud t :gen er alizat ion
MolePer cen t
+qud t :un it
Unit
A m oun t Of Subst an cePer Un it Volum e
MilliGr am sPer Lit r e
+qud t :quan t it yKin d
AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
10. Extension to QUDT
QUDT
AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
WQOP
11. Linked to SKOS
AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
16. NVS implementation/interfaces
• SQL store
http://vocab.nerc.ac.uk/
• View item as SKOS Concept
http://vocab.nerc.ac.uk/collection/C18/current/72/
• SPARQL endpoint
http://vocab.nerc.ac.uk/sparql
• SISSvoc service
http://auscope-services-test.arrc.csiro.au/elda-demo/nerc
• SISSvoc search
http://sissvoc.ereefs.info/search#
16 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
17. Harmonization
• WQOP Model extends QUDT
• Items refer to ChEBI
• http://environment.data.gov.au/water/quality/def/object/anthracene
• Mappings to NVS
• http://environment.data.gov.au/water/quality/def/property/air_pressure
• http://environment.data.gov.au/water/quality/def/property/anthracene_con
centration
17 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
18. [ODIP-1] Conflation
• Property/parameter vocabularies have 100s-1000s entries
• each definition includes
– Semantics – the quantity being observed e.g. ‘Nitrogen’
Plus one or more of
– Procedure – the instrument or method used
– Sampling protocol e.g. Weekly-mean
– Units of measure
– Aggregation with other primitive parameters
• This makes it difficult to discover and combine data from different
projects
18 | Linked Vocabularies | Simon Cox
19. Harmonization challenges
• Even standard mapping props have no effect without reasoning
(sameAs/exactMatch/equivalentClass)
• items may be conceptualised as both classes and individuals:
complicates mapping mechanics
• URIs generally reflect ownership/maintenance
• re-use of items across vocabularies may lead to surprises
• versioning complicates cross-eferences and re-use
http://sweet.jpl.nasa.gov/2.2/matrRock.owl#Serpentinite
vs.
http://sweet.jpl.nasa.gov/2.1/matrRock.owl#Serpentinite
???
19 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
21. Summary
• Vocabularies should be
• Standardized
• Published
• Harmonized
• Extend / re-use existing vocabularies where possible
http://environment.data.gov.au/water/quality/def/property/
http://sissvoc.ereefs.info/search
21 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
22. Acknowledgements
This work was undertaken as part of CSIRO’s contribution to eReefs
– a National Environmental Information Infrastructure project.
22 | AGU Fall 2013 | IN52B-08 | Cox, Simons, Yu | Vocabulary re-use
23. Thank you
CSIRO Land and Water
Simon Cox
Research Scientist
t +61 3 9252 6342
e simon.cox@csiro.au
w www.csiro.au/people/simon.cox
LAND AND WATER
Hinweis der Redaktion
Here’s a set of important vocabularies. One of the key lessons from these is that words are not sufficient to act as keys. If you ask for a ‘pot’ of beer in different places in Australia, you could get 100% different quantity. At the very least, the name must be scoped. But probably a differnet kind of identifier is required.
Example of the data mapped between the HH and NGIS databases.Pretty good: Mostlyseparates Units and methods from the observable property Maps to terms from other related vocabularies (CAS, IUPAC, WDTF) Some grouping (simple hierarchy) and collections Not modulardifficult to incorporate vocabularies not governed by that discipline (e.g. ‘units of measure’) Inconsistent Governancesame concepts (different terms?) appear in multiple vocabularies, no relationships between terms and collections from different data providers, Not interoperablelocal, non-resolvable identifiers, lack of a formal definitions,lack of an ontology describing the relationship between conceptsAmbiguityconcepts are poorly defined,
If we look at a ‘nitrogen’ example we have theScaledQuantityKind“dissolved nitrogen concentration” which has the more general concept “nitrogen concentration”, which in turn is a specialization of “Concentration”.“dissolved nitrogen concentration” can be measured in “MolPerCent” units, which can be used for “Concentration” QuantityKinds, or “MilliGramsPerLitre” which can be used for “AmountOfSubstancePerUnitVolume” QuantityKinds.“dissolved nitrogen concentration” is a measure of the “nitrogen” SubstanceOrTaxon, which is an exactMatch of the ChEBI “elemental nitrogen” concept.
A number of formal units of measure systems exist: ‘The Unified Code For Units of Measure’ (UCUM; Schadow and McDonald, 2009), ‘Ontology of units of measure’ (OM; Rijgersberg et al., 2013), ‘Measurement Units Ontology’ (MUO; Berrueta and Polo, 2009), ‘Ontology of Units of Measurement’ (UO), ‘Semantic Web for Earth and Environmental Terminology’ (SWEET v2.2), ‘Quantities, Units, Dimensions, Values’ (QUDV; de Kooning et al. 2009) ‘Quantities, Units, Dimensions and Data Types’ (QUDT; Hodgson and Keller, 2011). These all use different modelling approaches and formalisms ranging from simple vocabularies enumerating units of measures, to alignment of measurement related concepts to upper ontologies. They also differ in their coverage of the set of unit of measures. Of the established and well-governed unit of measure ontology options, QUDT is well-aligned with our understanding of the relationships between measurements and units of measure. Our extension to QUDT recognised that QuantityKinds are the subset of all PropertyKinds that can be measured. We created the ScaledQuantityKind class as an equivalent class to allow specifying the units associated with the QuantityKind via the qudt:unit property. I understand that QUDT v2 intends to add this property to QuantityKind, making ScaledQuantityKind redundant.The objectOfInterest property allows specifying what the substance (or taxon if biological) is that is being measured.Vocabularies for the units of measure (instances of the ‘qudt:Unit’ concept) and the kinds of quantities (instances of the ‘qudt:QuantityKind’ concept) were imported from QUDT, supplemented with units of measure from the existing water quality vocabularies and any additional required quantity kinds.QUDT contains 1484 instances of units of measure and 236 quantity kinds. However, it is missing some units of measure and kinds of quantities required for the water quality data. To rectify this, we added 41 additional units of measure, and 17 quantity kinds, all in separate namespaces to the original QUDT. The conversion factors between these new units and the existing QUDT units still need to be defined using the QUDT mechanics.
HIC 2014Harmonization of vocabularies for water dataCox, Yu, SimonsObservational data encodes values of properties associated with a feature of interest, estimated by a specified procedure. For water the properties are physical parameters like level, volume, flow and pressure, and concentrations and counts of chemicals, substances and organisms. Water property vocabularies have been assembled at project, agency and jurisdictional level. Organizations such as EPA, USGS, CEH, GA and BoM maintain vocabularies for internal use, and may make them available externally as text files. BODC and MMI have harvested many water vocabularies alongside others of interest in their domain, formalized the content using SKOS, and published them through web interfaces. Scope is highly variable both within and between vocabularies. Individual items may conflate multiple concerns (e.g. property, instrument, statistical procedure, units). There is significant duplication between vocabularies. Semantic web technologies provide the opportunity both to publish vocabularies more effectively, and achieve harmonization to support greater interoperability between datasets. Models for vocabulary items (property, substance/taxon, process, unit-of-measure, etc) may be formalized OWL ontologies, supporting semantic relations between items in related vocabularies;By specializing the ontology elements from SKOS concepts and properties, diverse vocabularies may be published through a common interface;Properties from standard vocabularies (e.g. OWL, SKOS, PROV-O and VAEM) support mappings between vocabularies having a similar scopeExisting items from various sources may be assembled into new virtual vocabularies However, there are a number of challenges: use of standard properties such as sameAs/exactMatch/equivalentClass require reasoning support; items have been conceptualised as both classes and individuals, complicating the mapping mechanics;re-use of items across vocabularies may conflict with expectations concerning URI patterns;versioning complicates cross-references and re-use. This presentation will discuss ways to harness semantic web technologies to publish harmonized vocabularies, and will summarise how many of the challenges may be addressed.