3. Linked Data journey ...
explore
what is linked data?
what use is it for us?
4. Linked Data journey ... explore
what is linked data?
what use is it for us?
  self-describing – carries semantics with it, annotate and explain, data in context
  integration – comparable, slice and dice, web API
5. Linked Data journey ... explore
what’s involved?
7. Linked Data journey ... explore → pilot → routine?
Great pilot, but ...
  can we reduce the time and cost?
  how do we handle changes and updates?
  how can we make the published data easier to use?
  how do we make Linked Data “business as usual”?
8. Example case study: Environment Agency
monitoring of bathing water quality
  static pilot – historic annual assessments
  live pilot – weekly assessments
  operational system – additional data feeds, live update, integrated API, data explorer
9. From pilot to practice
reduce modelling costs
  patterns (dive 1)
  reuse
handling change and update
  patterns
publication process
  automation
  conversion
  publication
embed in the business process
  use internally as well as externally
  publish once, use many
  data platform
10. Reduce costs – modelling
1. Don’t do it
  map source data into isomorphic RDF, synthesize URIs
  loses some of the value proposition
2. Reuse existing ontologies, intact or mix-and-match
  best solution when available
  W3C GLD work on vocabularies – people, organizations, datasets ...
3. Reusable vocabulary patterns
  example: Data Cube plus reference URI sets
  adaptable to a broad range of data – environmental, statistical, financial ...
11. Reusable patterns: Data Cube
Much public sector data has regularities
  sets of measures
  observations, forecasts, budgets, assessments, statistics ...
[diagram: a grid of example measure values such as “>0.1”, 34, “good”, “excellent”, “poor”]
12. Reusable patterns: Data Cube
Much public sector data has regularities
  sets of measures
  observations, forecasts, budgets, assessments, estimates ...
  organized along some dimensions
  region, agency, time, category, cost centre ...
[diagram: a cube of spend figures arranged by objective code, cost centre and time]
13. Reusable patterns: Data Cube
Much public sector data has regularities
  sets of measures
  observations, forecasts, budgets, assessments, estimates ...
  organized along some dimensions
  region, agency, time, category, cost centre ...
  interpreted according to attributes
  units, multipliers, status
[diagram: the same spend cube, now with units ($k) and a status attribute marking values as provisional or final]
15. Data Cube pattern
Pattern, not a fixed ontology
  customize by selecting measures, dimensions and attributes
  originated in publishing of statistics
  applied to environmental measurements, weather forecasts, budgets and spend, quality assessments, regional demographics ...
Supports reuse
  widely reusable URI sets – geography, time periods, agencies, units
  organization-wide sets
  modelling often only requires small increments on top of the core pattern and reusable components
  opens the door to reusable visualization tools
  standardization through W3C GLD
16. Application to case study
Data Cubes for water quality measurement
  in-season weekly assessments
  end-of-season annual assessments
dimensions:
  time intervals – UK reference time service
  location – reference URI set for bathing waters and sample points
cubes can reuse these dimensions
  just need to define specific measures
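The reuse point above can be sketched in Data Cube terms. The bwq: namespace and term names below are hypothetical; the point is the shape: only the measure is feed-specific, while the dimensions reuse shared reference URI sets.

```turtle
@prefix qb:  <http://purl.org/linked-data/cube#> .
@prefix bwq: <http://environment.data.gov.uk/def/bathing-water-quality/> .

# structure definition for the weekly assessments cube
bwq:weeklyDsd a qb:DataStructureDefinition ;
    qb:component
        [ qb:dimension bwq:samplingPoint ] ,   # reused bathing-water location URI set
        [ qb:dimension bwq:sampleWeek ] ,      # reused UK reference time intervals
        [ qb:measure   bwq:classification ] .  # the only feed-specific part
```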
17. From pilot to practice
reduce modelling costs
patterns
reuse
handling change and update
patterns dive 2
publication process
automation
conversion
publication
embed in the business process
use internally as well as externally
publish once, use many
data platform
18. Handling change
critical challenge
  most initial pilots choose a snapshot dataset – and go stale, fast
  understanding the nature of data updates and how to handle them is critical to successfully scaling to business as usual
types of change
  new data relating to a different time period
  corrections to data
  entities change – properties, identity
19. Modelling change
1. Individual data items relate to a new time period
Pattern: n-ary relation
  observation resource relates the value to a time period and other context
  use Data Cube dimensions for this
[diagram: Clevedon Beach (http://environment.data.gov.uk/id/bathing-water/ukk1202-36000) linked via bwq:sampleYear to the years 2009, 2010 and 2011, with bwq:classification values Higher, Minimum and Higher respectively]
History or latest?
  latest is non-monotonic but helpful for many practical uses
  materialize (SPARQL Update), implement in query, or implement in the API
  choice whether to keep history as well – water quality vs. weather forecasts
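The n-ary relation diagram above, written out as triples. The resource URIs appear on the slide; the prefix declarations and exact bwq: property URIs are assumptions.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bwq:  <http://environment.data.gov.uk/def/bathing-water-quality/> .

<http://environment.data.gov.uk/id/bathing-water/ukk1202-36000>
    rdfs:label "Clevedon Beach" .

# one observation resource per assessment carries the full context
[] a qb:Observation ;
    bwq:bathingWater <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000> ;
    bwq:sampleYear <http://reference.data.gov.uk/id/year/2010> ;
    bwq:classification bwq:Minimum .
```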
21. Modelling change
3. Mutation
  infrequent change of properties; essential identity remains
  e.g. renaming a school, adding another building
  routine accesses see the property value, not a function of time
patterns
  in-place update
  named graphs – current graph + graphs for each previous state + meta-graph
  explicit versioning with open periods
22. Modelling change
3. Mutation – explicit versioning with open periods
[diagram: an endurant resource with dct:hasVersion links to two versions – “Clevedon Beach”, with a validity interval starting 2003 and finishing 2011, and “Clevedon Sands”, with an interval starting 2011 and still open]
find the right version by querying on the validity interval
simplify use through
  a non-monotonic “latest value” link
  an API to implement the query filters automatically
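Finding the right version by validity interval might look like the query below. The dct:valid / OWL-Time shape follows the diagram; the date encoding and comparison are assumptions about how the intervals are represented.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX time: <http://www.w3.org/2006/time#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?version WHERE {
  ?endurant dct:hasVersion ?version .
  ?version  dct:valid      ?interval .
  ?interval time:intervalStarts ?start .
  FILTER (?start <= "2010-06-01"^^xsd:date)
  # an open-ended (current) version has no finish
  OPTIONAL { ?interval time:intervalFinishes ?finish }
  FILTER (!bound(?finish) || ?finish > "2010-06-01"^^xsd:date)
}
```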
23. Application to case study
weekly and annual samples
  use the Data Cube pattern (n-ary relation)
withdrawn samples
  replacement pattern (no explicit change event)
  Data Cube slice for “latest valid assessment”, generated by a SPARQL Update query
  API gives easy access to the latest valid values
  linked data following or a raw SPARQL query allows drilling into changes
changes to bathing water profile
  versioning pattern
  bathing water entity points to the latest profile (SPARQL Update again)
24. From pilot to practice
reduce modelling costs
  patterns
  reuse
handling change and update
  patterns
publication process
  automation
  conversion (dive 3)
  publication
embed in the business process
  use internally as well as externally
  publish once, use many
  data platform
25. Automation
Transform and publish data feed increments
  transformation engine service
    reusable mappings, low cost to adapt to new feeds
    linking to reference data
  publication service that supports non-monotonic changes
[architecture diagram: data increments (CSV) → transform service, driven by xform specs and reconciliation against reference data → publication service → replicated publication servers]
26. Transformation service
declarative specification of the transform
  a single service supports a range of transformations
  easy to adapt a transformation to new feeds and modelling changes
R2RML – RDB to RDF Mapping Language
  specifies mappings from database tables to RDF triples
  W3C candidate recommendation
D2RML
  R2RML extension to treat a CSV feed as a database table
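For orientation, a minimal R2RML triples map looks like this; the table, column and property names are invented for illustration.

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# each row of the SAMPLES table becomes one resource
<#samplesMap> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "SAMPLES" ] ;
    rr:subjectMap   [ rr:template "http://example.org/sample/{ID}" ] ;
    rr:predicateObjectMap [
        rr:predicate <http://example.org/def/reading> ;
        rr:objectMap [ rr:column "READING" ; rr:datatype xsd:decimal ]
    ] .
```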
28. Using patterns
raw mappings are verbose, which increases reuse costs
extend to support modelling patterns
Data Cube
  specify the mapping to an observation with measures and dimensions
  engine generates the Data Set and Data Structure Definition automatically
29. D2RML cube map example

:dataCubeMap a dr:DataCubeMap ;
    rr:logicalTable "dataSource" ;
    dr:datasetIRI "http://example.org/datacube1"^^xsd:anyURI ;
    dr:dsdIRI "http://example.org/myDsd"^^xsd:anyURI ;
    dr:observationMap [
        # instances will automatically link to the base Data Set
        rr:subjectMap [
            rr:termType rr:IRI ;
            rr:template "http://example.org/observation/{PLACE}/{DATE}" ] ;
        rr:componentMap [
            # implies an auto-generated entry in the Data Structure Definition
            dr:componentType qb:measure ;
            rr:predicate aq:concentration ;
            # defines how the measure value is to be represented
            rr:objectMap [ rr:column "NO2" ; rr:datatype xsd:decimal ] ]
    ] ;
    ... .
30. But what about linking?
connect observations to reference data – a core value of linked data
R2RML has Term Maps to create values
  constants and templates
  extend to allow maps based on other data sources
Lookup map
  look up a resource in a store, fetch a predicate
Reconcile
  specify a lookup in a remote service
  use the Google Refine reconciliation API
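A lookup map might be declared along these lines; the dr: property names here are hypothetical sketches of such an extension, not part of R2RML itself.

```turtle
# fragment of an object map: instead of a constant or template, the
# object URI is found by matching a CSV column against reference data
rr:objectMap [
    a dr:LookupMap ;
    dr:lookupStore    <http://environment.data.gov.uk/sparql> ;  # where to look
    dr:lookupKey      "SITE_CODE" ;       # column in the incoming feed
    dr:lookupProperty skos:notation       # predicate to match the key against
] ;
```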
31. Automation
Transform and publish data feed increments
32. Publication service
goals
  cope with the non-monotonic effects of the change representation
  so replication is robust and cheap (=> make it idempotent)
solution
  SPARQL Update
  publish the transformed increment as a simple INSERT DATA
  then run a SPARQL Update script for the non-monotonic links
    dct:replacedBy links
    latest value slices
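The two steps can be sketched in SPARQL Update. The bwq: property names and resource URIs below are assumptions; dct:replacedBy and the overall shape follow the slides.

```sparql
PREFIX bwq: <http://environment.data.gov.uk/def/bathing-water-quality/>

# Step 1: publish the transformed increment as plain INSERT DATA,
# so re-running the same increment is harmless (idempotent)
INSERT DATA {
  <http://example.org/assessment/w23>
      bwq:samplingPoint <http://example.org/sampling-point/36000> ;
      bwq:classification bwq:Good .
} ;

# Step 2: re-derive the non-monotonic "latest" link for each point
DELETE { ?pt bwq:latestAssessment ?old }
INSERT { ?pt bwq:latestAssessment ?new }
WHERE {
  ?new bwq:samplingPoint ?pt ; bwq:sampleDate ?d .
  FILTER NOT EXISTS {
    ?later bwq:samplingPoint ?pt ; bwq:sampleDate ?d2 .
    FILTER (?d2 > ?d)
  }
  OPTIONAL { ?pt bwq:latestAssessment ?old }
}
```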
34. Automation
Transform and publish data feed increments
35. Application to case study
Update server
  transforms based on scripts (earlier scripting utility)
  linking to reference data
  distributed publication via SPARQL Update
extensible range of data sets
  annual assessments
  in-season assessments
  bathing water profile
  features (e.g. pollution sources)
  reference data
36. From pilot to practice
reduce modelling costs
  patterns
  reuse
handling change and update
  patterns
publication process
  automation
  conversion
  publication
embed in the business process (dive 4)
  use internally as well as externally
  publish once, use many
  data platform
37. Embed in business process
embedding is critical to ensure the data is kept up to date
which in turn needs usage => lower the barrier to use
[diagram: two feedback loops – data not used → data goes stale → investment hard to justify; external and internal use → rich, up-to-date data → investment justified]
38. Lowering the barrier to use
simple REST APIs
  use the Linked Data API specification
  rich query without learning SPARQL
  easy consumption as JSON, XML
  gets developers used to the data and the data model
[diagram: transform service → publication service → LD API]
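As an illustration of the idea (the URL layout and property name are invented): a request such as /doc/bathing-water/ukk1202-36000/latest.json could be translated by the LD API into roughly the query below, with the results rendered as JSON for the developer.

```sparql
PREFIX bwq: <http://environment.data.gov.uk/def/bathing-water-quality/>

DESCRIBE ?assessment WHERE {
  <http://environment.data.gov.uk/id/bathing-water/ukk1202-36000>
      bwq:latestSampleAssessment ?assessment .
}
```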
39. Application to case study
embedded in the process for weekly/daily updates
  infrastructure to automate conversion and publishing
  API plus extensive developer documentation
  third-party and in-house applications built over the API
publish once, use many
  information products as applications over a data platform, usable externally as well as internally
40. The next stage
grow the range of data publications and uses
a range of reference data and sets brings new challenges
  discover reference terms and models to reuse
  discover datasets to use for an application
  discover models and links between sets
needs a coordination or registry service – a story for another day ...
41. Conclusions
illustrated how public sector users of linked data are moving from static pilots to operational systems
keys are:
  reduce modelling costs through patterns and reuse
  design for continuous update
  automate publication using declarative mappings and SPARQL Update
  lower the barrier to use through API design and documentation
  embed in the organization’s processes so the data is used and useful
Acknowledgements
  only possible thanks to many smart colleagues: Stuart Williams, Andy Seaborne, Ian Dickinson, Brian McBride, Chris Dollin
  plus Alex Coley and team from the Environment Agency