1. Measuring Metadata Quality
ELAG 2018
Péter Király, peter.kiraly@gwdg.de
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
2. Measuring metadata quality. Generic title and bad thumbnail
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
3. Measuring metadata quality. Multilinguality problem
★ Mona Lisa → 456 results
★ La Gioconda → 365 results
★ La Joconde → 71 results
http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
4. Measuring metadata quality. Problems with title
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
★ Same title and description
title: "VOETBAL-EREDIVISIE-FEYENOORD - GO AHEAD 3-1",
description: "VOETBAL-EREDIVISIE-FEYENOORD - GO AHEAD 3-1"
★ Machine-readable ID in title
title: "NLD-820630-AMSTERDAM: Straatmuzikanten proberen geld te verdienen voor..." (Dutch: "Street musicians try to earn money for...")
★ Leftover
title: "+++EMPTY+++"
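The three title problems above lend themselves to simple automatic checks, the kind of entry a problem catalog could hold. A minimal sketch, assuming a record is a plain dict with `title` and `description` strings; the function name `title_problems`, the leftover list and the ID-prefix pattern are hypothetical and tuned only to these examples.

```python
import re

# Hypothetical placeholder values left behind by export or migration scripts.
LEFTOVER_VALUES = {"+++EMPTY+++"}

# Hypothetical pattern for an ID-like prefix such as "NLD-820630-AMSTERDAM:"
# (upper-case code, six digits, upper-case place name, colon).
ID_PREFIX = re.compile(r"^[A-Z]{2,}-\d{6}-[A-Z]+:")


def title_problems(record: dict) -> list:
    """Return the names of the catalogued title problems found in one record."""
    title = record.get("title", "").strip()
    description = record.get("description", "").strip()
    problems = []
    if title and title == description:
        problems.append("same title and description")
    if ID_PREFIX.match(title):
        problems.append("machine-readable ID in title")
    if title in LEFTOVER_VALUES:
        problems.append("leftover")
    return problems


print(title_problems({"title": "+++EMPTY+++"}))  # ['leftover']
```

Each check is cheap and independent, so new catalog entries can be added as further `if` branches without touching the existing ones.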
5. Measuring metadata quality. Non-informative values
Non-informative dc:title values (bad):
“photograph, framed”,
“group photograph”,
“photograph”
Informative dc:title values (good):
“Photograph of Sir Dugald Clerk”,
“Photograph of "Puffing Billy"”
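A crude way to flag such values, sketched under the assumption that a title made up only of generic object-type words is non-informative. The word list `GENERIC_TERMS` and the function name are hypothetical; a real list would have to be built from the collection itself, for example from its most frequent title values.

```python
import re

# Hypothetical vocabulary of generic object-type words.
GENERIC_TERMS = {"photograph", "group", "framed", "image", "untitled"}


def is_non_informative(title: str) -> bool:
    """True if the title is non-empty and consists only of generic terms."""
    words = re.findall(r"\w+", title.lower())
    return bool(words) and all(word in GENERIC_TERMS for word in words)


for t in ["photograph, framed", "group photograph", "Photograph of Sir Dugald Clerk"]:
    print(t, "->", is_non_informative(t))
```

A single specific word ("Dugald", "Billy") is enough to make a title pass, which matches the good/bad split above but would need tuning on real data.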
6. Measuring metadata quality. Copy & paste cataloging
from a template?
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
7. Measuring metadata quality. The problem
There are “good” and “bad” metadata records, but we don’t have clear metrics like this:
(diagram: records graded good / acceptable / bad against functional requirements)
8. Measuring metadata quality. Why is data quality important?
“Fitness for purpose” (QA principle)
purpose: to access content
no metadata → no access to data → no data usage
more explanation:
Data on the Web Best Practices
W3C Recommendation (January 2017), https://www.w3.org/TR/dwbp/
9. Measuring metadata quality. Hypothesis
By measuring structural elements we can approximate metadata record quality (≃ metadata smell).
10. Measuring metadata quality. Purposes
★ improve the metadata
★ services: good data → reliable functions
★ better metadata schema & documentation
★ propagate “good practice”
11. Measuring metadata quality. Organisational approach
Europeana Data Quality Committee, DLF Metadata Assessment WG, Metadata 2020, Research Data Alliance
★ Analysing/revising metadata schema
★ Functional requirement analysis
★ Problem catalog
★ Multilinguality
12. Measuring metadata quality. Community bibliography
zotero.org/groups/metadata_assessment
dlfmetadataassessment.github.io
13. Measuring metadata quality. Tooling
Existing tools: Catmandu, UNT digital library, SHACL, ShEx, Luzzu
Metadata Quality Assurance Framework: generic tools for measuring (meta)data quality
★ adaptable to different metadata schemes
★ scalable (to Big Data)
★ understandable reports for data curators
★ open source
14. Measuring metadata quality. What to measure?
★ Structural and semantic features
Completeness, cardinality, uniqueness, length, dictionary entry, data type conformance, multilinguality (generic metrics)
★ Functional requirement analysis / Discovery scenarios
Requirements of the most important functions
★ Problem catalog
Known metadata problems
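The generic structural metrics in the first group can be computed without knowing what the records are used for. Below is a minimal sketch of two of them, completeness and cardinality, assuming a record is a dict mapping field names to lists of string values; the field list `SCHEMA_FIELDS` and the sample record are made up for illustration.

```python
# Hypothetical list of the fields defined by the schema under evaluation.
SCHEMA_FIELDS = ["dc:title", "dc:description", "dc:subject", "dc:creator", "dc:date"]


def non_empty(values):
    """Values that contain something other than whitespace."""
    return [v for v in values if v.strip()]


def completeness(record, fields=SCHEMA_FIELDS):
    """Share of schema fields that have at least one non-empty value."""
    filled = sum(1 for field in fields if non_empty(record.get(field, [])))
    return filled / len(fields)


def cardinality(record, fields=SCHEMA_FIELDS):
    """Number of non-empty values in each schema field."""
    return {field: len(non_empty(record.get(field, []))) for field in fields}


# A made-up record: two filled fields, one whitespace-only value.
record = {
    "dc:title": ["Brandenburger Tor"],
    "dc:subject": ["Germany", "Deutschland"],
    "dc:date": ["   "],  # whitespace only: treated as missing
}
print(completeness(record))  # 0.4
print(cardinality(record))
```

Collection-level figures (mean completeness, distribution of cardinalities per field) follow by aggregating these per-record values.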
15. Measuring metadata quality. Metadata requirements / User scenario
“As a user I want to be able to filter by whether a person is the subject of a book, or its author, engraver, printer etc.”
Metadata analysis: description of the relevant metadata elements and their rules
Measurement rules:
★ the relevant field values should be resolvable URIs
★ each URI should be associated with labels in multiple languages
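The two measurement rules can be turned into per-field scores. A minimal sketch, assuming field values arrive as strings and that the labels for each URI are available in a lookup dict (`labels`, hypothetical, e.g. built from a vocabulary dump); the URIs under `example.org` are placeholders. Note that `is_uri` only checks syntax: actually resolving a URI would require an HTTP request, which is left out here.

```python
from urllib.parse import urlparse


def is_uri(value: str) -> bool:
    """Syntactic check only: an http(s) URI with a host part.
    Dereferencing it (the 'resolvable' part of the rule) would need an HTTP request."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def rule_scores(values, labels, min_languages=2):
    """Return (share of values that are URIs,
               share of values whose URI has labels in >= min_languages languages).

    labels: hypothetical lookup {uri: {language_code: label}}.
    """
    if not values:
        return 0.0, 0.0
    uris = [v for v in values if is_uri(v)]
    multilingual = [u for u in uris if len(labels.get(u, {})) >= min_languages]
    return len(uris) / len(values), len(multilingual) / len(values)


values = ["http://example.org/agent/1", "Dürer, Albrecht"]
labels = {"http://example.org/agent/1": {"en": "Albrecht Dürer", "de": "Albrecht Dürer"}}
print(rule_scores(values, labels))  # (0.5, 0.5)
```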
16. Measuring metadata quality. Metadata requirements / element-function map
(figures: Europeana sub-dimensions; MARC “Summary of Mapping to User Tasks”)
17. Measuring metadata quality. Measurement - Distinct Languages
Text without language annotation (dc:subject: Germany) → 0
Text with language annotation (dc:subject: Germany@en) → 1
Text with several language annotations (dc:subject: Germany@en, Deutschland@de) → 2
Link to a (multilingual) vocabulary (http://www.geonames.org/2921044/federal-republic-of-germany) → n
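The scale above can be applied to the values of each field. A minimal sketch, assuming values are strings in the slide's notation (`Germany`, `Germany@en`, or an http(s) URI), so that a trailing `@tag` is a language annotation and any http(s) value counts as a vocabulary link; the function name and the parsing regex are assumptions, not the tool's actual implementation.

```python
import re
from typing import List, Union
from urllib.parse import urlparse

# A literal with an RDF-style language tag, e.g. "Germany@en" or "Germany@de-AT".
LANG_TAGGED = re.compile(r"^(?P<text>.+)@(?P<lang>[A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)$")


def distinct_languages(values: List[str]) -> Union[int, str]:
    """Score one field on the 0 / 1 / 2 / n scale.

    Returns "n" if any value links to a vocabulary (its labels are then
    taken from the vocabulary); otherwise the number of distinct language
    tags, which is 0 when no value is annotated.
    """
    if any(urlparse(v).scheme in ("http", "https") for v in values):
        return "n"
    languages = set()
    for value in values:
        match = LANG_TAGGED.match(value)
        if match:
            languages.add(match.group("lang").lower())
    return len(languages)


print(distinct_languages(["Germany"]))                       # 0
print(distinct_languages(["Germany@en"]))                    # 1
print(distinct_languages(["Germany@en", "Deutschland@de"]))  # 2
print(distinct_languages(["http://www.geonames.org/2921044/federal-republic-of-germany"]))  # n
```

Language tags are lower-cased so that `@EN` and `@en` count once.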
18. Measuring metadata quality. Measurement - Good example
Descriptive fields
dc:title: "Die Mauer muß weg!"@de, "Die Mauer muß weg! (The Wall must go!)"@en
dc:description: "Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin"@de, "Annotated images from 1989-1990 in Berlin"@en
Subject headings
Place/skos:prefLabel: "Brandenburger Tor"@de, "Brandenburg Gate"@en; "Grenzübergang Potsdamer Platz"@de, "Postdamer Platz border crossing"@en; "Reichstag"@de, "Reichstag building"@en
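To show how a record like this one scores at record level, here is a small sketch that counts the distinct languages per field of the example and summarises them. Representing values as (text, language) pairs and the two summary figures (`mean_languages`, `multilingual_share`) are illustrative assumptions, not metrics defined on the slide.

```python
from statistics import mean

# The example record, as (text, language) pairs per field.
RECORD = {
    "dc:title": [
        ("Die Mauer muß weg!", "de"),
        ("Die Mauer muß weg! (The Wall must go!)", "en"),
    ],
    "dc:description": [
        ("Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin", "de"),
        ("Annotated images from 1989-1990 in Berlin", "en"),
    ],
    "Place/skos:prefLabel": [
        ("Brandenburger Tor", "de"), ("Brandenburg Gate", "en"),
        ("Grenzübergang Potsdamer Platz", "de"), ("Postdamer Platz border crossing", "en"),
        ("Reichstag", "de"), ("Reichstag building", "en"),
    ],
}


def languages_per_field(record):
    """Distinct language tags used in each field (untagged values are ignored)."""
    return {field: {lang for _, lang in values if lang} for field, values in record.items()}


def multilingual_summary(record):
    """Mean number of languages per field and share of fields with 2+ languages."""
    counts = [len(langs) for langs in languages_per_field(record).values()]
    return {
        "mean_languages": mean(counts),
        "multilingual_share": sum(1 for c in counts if c >= 2) / len(counts),
    }


print(multilingual_summary(RECORD))
```

Every field here carries both German and English, so the record reaches the maximum share of multilingual fields.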
19. Measuring metadata quality. Engineering - Batch API
client ↔ Metadata QA
Measurement:
/batch/measuring/start → sessionID
/batch/[recordId] (for each record) → csv
/batch/measuring/stop → “success” | “failure”
Analysis:
/batch/analyzing/start → “success” | “failure”
/batch/analyzing/status (periodically) → “in progress” | “ready”
/batch/analyzing/retrieve → compressed package
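The call sequence above can be driven by a short client loop. A minimal sketch, assuming a hypothetical transport function `call(path)` that sends one request to the Metadata QA service and returns the response body; the slide does not show the HTTP method, the base URL, or how `sessionID` is passed to later calls, so those are left to the transport. The `max_polls` limit is also an addition of this sketch.

```python
import time
from typing import Callable, Iterable


def run_batch(call: Callable[[str], object], record_ids: Iterable[str],
              poll_seconds: float = 5.0, max_polls: int = 120) -> dict:
    """Run the measurement and analysis phases in the order of the batch API.

    call(path) is a hypothetical transport returning the response body.
    """
    # Measurement phase: open a session, measure every record, close it.
    session_id = call("/batch/measuring/start")
    csv_rows = [call(f"/batch/{record_id}") for record_id in record_ids]
    if call("/batch/measuring/stop") != "success":
        raise RuntimeError("measuring did not finish successfully")

    # Analysis phase: start it, poll the status periodically, fetch the result.
    if call("/batch/analyzing/start") != "success":
        raise RuntimeError("analysis could not be started")
    for _ in range(max_polls):
        if call("/batch/analyzing/status") == "ready":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("analysis still not ready")
    package = call("/batch/analyzing/retrieve")  # compressed package

    return {"session": session_id, "csv": csv_rows, "package": package}
```

Passing the transport in keeps the protocol logic separate from the HTTP library and lets the loop be exercised with a fake `call` that returns canned responses.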
20. Measuring metadata quality. Further steps
Human analysis
★ Translate the results into documentation and recommendations
★ Communication with data providers
★ Human evaluation of metadata quality
★ Cooperation with other projects
Technical
★ Incorporating into the ingestion process
★ Shape Constraint Language (SHACL) for defining patterns
★ Processing usage statistics
★ Measuring changes of scores
★ Machine learning based classification & clustering
21. Measuring metadata quality. Links
★ Europeana Data Quality Committee // http://pro.europeana.eu/europeana-tech/data-quality-committee
★ DLF Metadata Assessment group // http://dlfmetadataassessment.github.io
★ Zotero group // https://www.zotero.org/groups/metadata_assessment
★ Metadata 2020 // http://www.metadata2020.org/
★ Research Data Alliance, Research Data Provenance WG // https://rd-alliance.org/groups/research-data-provenance.html