The Other Side of Linked Open Data: Managing Metadata Aggregation
1. The Other Side of Linked Data:
Managing Metadata Aggregation
ALCTS Metadata Interest Group
ALA Midwinter 2014
2. Where Are We Now?
• Major projects so far focused on exposing
selected portions of their data for
‘experimentation’
– Who’s using this data?
– Can LOD for libraries succeed on that basis?
• LOD is not just outputs; it needs actual use to
inform practice
– A more complete view of the environment and
workflow should help
3. Outline
• Limitations of the traditional database strategy
– Including records, normalization, de-duplication, etc.
• Components of a fuller view
– Workflow
– Inputs, outputs
– Data cache and services
– Need for automated orchestration
– The maintenance conundrum
4. Substituting a Cache for a Database
• Supports multiple streams of data
• Allows detailed provenance to be carried over
time
• Separates services from data storage
• Allows more extensive automation (and
orchestration of services)
• Focuses valuable human effort where it’s
needed: analysis, design and implementation
of improvement services
5. Workflow
• Obtain data (possibly as ‘records’)
• Store data as statements in cache
• Evaluate data by source or collection
• Improve data using specific services, as
determined by evaluation
• Publish improved data
• [Rinse, repeat]
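The first steps of this workflow can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the `Statement` class, the helper functions, the `providerA` source name, and the sample records are all hypothetical. It shows records being broken into statements held in a cache, then evaluated per source to see which properties are missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str  # identifier of the described resource
    prop: str     # property name, e.g. "title"
    value: str
    source: str   # provider or collection that supplied the statement


def record_to_statements(record_id, record, source):
    """Break one incoming 'record' (a dict) into per-property statements."""
    return [Statement(record_id, prop, value, source)
            for prop, value in record.items()]


def evaluate(cache, source, expected_props):
    """For one source, count how many resources lack each expected property."""
    subjects = {s.subject for s in cache if s.source == source}
    missing = {}
    for prop in expected_props:
        have = {s.subject for s in cache
                if s.source == source and s.prop == prop}
        missing[prop] = len(subjects - have)
    return missing


# Obtain data (made-up sample records) and store it as statements.
cache = []
incoming = {
    "rec1": {"title": "Moby Dick", "date": "1851"},
    "rec2": {"title": "Walden"},
}
for rec_id, rec in incoming.items():
    cache.extend(record_to_statements(rec_id, rec, source="providerA"))

# Evaluate by source: the result suggests which improvement services to run.
report = evaluate(cache, "providerA", ["title", "date"])
print(report)
```

In this sketch the evaluation report is what decides the next step: a source with many missing or irregular values would be routed to the relevant improvement services before publishing.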
9. Yellow=Data we share now
Orange=Data we propose to share
Green=Data categories we can share
10. Developing and Defining Services
• Small, single-purpose services are easier to
develop and maintain
– Which services you need is determined by goals,
evaluation results, etc.
– ‘Orchestration’ of services applies them to specific
kinds of data, in order
– Services can be described, and linked, to expose
who, what, when and how to downstream users
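One way to picture small services and their orchestration is sketched below. It is an assumption-laden illustration: the service functions `trim` and `extract_year`, the `PLAN` table, and the sample value are invented for the example and are not taken from the presenters' system.

```python
import re


def trim(value):
    """Single-purpose service: remove surrounding whitespace."""
    return value.strip()


def extract_year(value):
    """Single-purpose service: reduce a date string to its first four-digit year."""
    match = re.search(r"\d{4}", value)
    return match.group(0) if match else value


# Orchestration plan (hypothetical): which services apply to which
# kind of data, and in what order.
PLAN = {
    "title": [trim],
    "date": [trim, extract_year],
}


def orchestrate(prop, value, plan=PLAN):
    """Run the planned services in order; return the result and a log
    naming each service that actually changed the value."""
    applied = []
    for service in plan.get(prop, []):
        new_value = service(value)
        if new_value != value:
            applied.append(service.__name__)
        value = new_value
    return value, applied


result, log = orchestrate("date", " c1851. ")
print(result, log)
```

The returned log is a stand-in for the idea of describing services so that downstream users can see what was done to a value and by which service.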
11. Developing Automated Interaction
• Rule: Use humans for things requiring human
understanding and decision making
– Use machines for everything else
– A manual process for something a machine can do as
well or better is a failure
• Improvement services can be granular, invoked in
prescribed order, and report results for later use
– Continuous improvement necessary to respond to
continuous change
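The rule above can be read as a routing decision. The sketch below is one hypothetical interpretation: values a machine can resolve unambiguously are fixed automatically, and only the remainder is queued for a person. The `KNOWN_LANGUAGES` table and the sample values are made up for illustration.

```python
# Hypothetical lookup table of values a machine can map with certainty.
KNOWN_LANGUAGES = {
    "english": "eng",
    "eng": "eng",
    "french": "fre",
    "fre": "fre",
}


def route(values):
    """Fix what can be fixed mechanically; queue the rest for human review."""
    fixed = {}
    review_queue = []
    for value in values:
        code = KNOWN_LANGUAGES.get(value.strip().lower())
        if code is not None:
            fixed[value] = code          # machine handles it
        else:
            review_queue.append(value)   # needs human judgment
    return fixed, review_queue


fixed, review_queue = route(["English", " fre ", "Latin?"])
print(fixed, review_queue)
```

Both outputs are kept: the fixes feed back into the data, and the review queue records where human effort is actually needed.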
12.
13. Data Maintenance
• Improved data returns as statements to the data
cache, with provenance attached
• Statement strategy avoids overwriting of new data
over ‘improved’ data
• Each new statement adds to what is known about a
described resource
• Statements can be cherry-picked and exposed to others as
statements or records, in ‘flavors’ or as ‘everything we
have’
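The statement strategy can be illustrated with an append-only cache. This is a sketch under assumptions: the `Statement` fields, the provenance labels, and the `flavor` function are hypothetical names chosen for the example. It shows that a later re-harvest of raw data does not overwrite an improved value, and that a 'flavor' is just a selection over the statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    provenance: str  # who or what produced this statement


# Made-up cache contents: a raw value and its improved counterpart.
cache = [
    Statement("rec1", "date", "c1851.", "providerA harvest 1"),
    Statement("rec1", "date", "1851", "extract_year service"),
]


def add(cache, statement):
    """Append only: a new statement never replaces an existing one."""
    if statement not in cache:
        cache.append(statement)


# A re-harvest delivers the raw value again; the improved one survives.
add(cache, Statement("rec1", "date", "c1851.", "providerA harvest 2"))


def flavor(cache, subject, preferred):
    """Build a record view, choosing per property the statement whose
    provenance ranks highest in the `preferred` list."""
    best = {}
    for s in cache:
        if s.subject != subject:
            continue
        rank = (preferred.index(s.provenance)
                if s.provenance in preferred else len(preferred))
        if s.prop not in best or rank < best[s.prop][0]:
            best[s.prop] = (rank, s.value)
    return {prop: value for prop, (rank, value) in best.items()}


improved_view = flavor(cache, "rec1", ["extract_year service"])
everything = [s for s in cache if s.subject == "rec1"]
print(improved_view, len(everything))
```

Here the 'everything we have' view is simply the unfiltered list of statements, while the improved flavor prefers statements from a named service.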
If LOD exists in multiple versions, and nobody uses it, does it make noise?
Evaluation using statistical analysis tool, from: Analyzing Metadata for Effective Use and Re-Use
Naomi Dushay, Diane I. Hillmann
http://dcpapers.dublincore.org/pubs/article/view/744
Revised diagram from: Orchestrating metadata enhancement services: Introducing Lenny
Jon Phipps, Diane I. Hillmann, Gordon Paynter. Note that ‘XForms’ in this context means ‘Transforms’; this usage long predates the XForms standard, in which the term has a specific meaning.
http://dcpapers.dublincore.org/pubs/article/view/803