SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Measuring Metadata Quality
ELAG 2018
Péter Király, peter.kiraly@gwdg.de
Gesellschaft für wissenschaftliche
Datenverarbeitung mbH Göttingen (GWDG)
Measuring metadata quality. Generic title and bad thumbnail
2
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
Measuring metadata quality. Multilinguality problem
3
★ Mona Lisa → 456
results
★ La Gioconda → 365
results
★ La Joconde → 71
results
http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
Measuring metadata quality. Problems with title
4
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
title: "VOETBAL-EREDIVISIE-
FEYENOORD - GO AHEAD 3-1",
description: "VOETBAL-EREDIVISIE-
FEYENOORD - GO AHEAD 3-1"
Same title and description
title: "NLD-820630-AMSTERDAM:
Straatmuzikanten proberen
geld te verdienen voor...",
Machine-readable ID in title
title: "+++EMPTY+++"
Leftover
Measuring metadata quality. Non-informative values
5
non informative dc:title:
“photograph, framed”,
“group photograph”
“photograph”
informative dc:title:
“Photograph of Sir Dugald Clerk”,
“Photograph of "Puffing Billy"”
bad good
Measuring metadata quality. Copy & paste cataloging
6
from a template?
more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
Measuring metadata quality. The problem
7
there are “good” and “bad” metadata records
but we don’t have clear metrics like this:
functional requirements
good
acceptable
bad
Measuring metadata quality. Why data quality is important?
8
“Fitness for purpose” (QA principle)
purpose: to access content
no metadata no access to data no data usage
more explanation:
Data on the Web Best Practices
W3C Working Draft, https://www.w3.org/TR/dwbp/
Measuring metadata quality. Hypothesis
9
by measuring structural elements we
can approximate metadata record quality
≃ metadata smell
Measuring metadata quality. Purposes
10
★improve the metadata
★services: good data → reliable functions
★better metadata schema & documentation
★propagate “good practice”
Measuring metadata quality. Organisational approach
11
Europeana Data Quality Committee, DLF
Metadata Assessment WG, Metadata 2020,
Research Data Alliance
★ Analysing/revising metadata schema
★ Functional requirement analysis
★ Problem catalog
★ Multilinguality
Measuring metadata quality. Community bibliography
12
zotero.org/groups/metadata_assessment
dlfmetadataassessment.github.io
Measuring metadata quality. Tooling
13
Catmandu, UNT digital library, SHACL, ShEx, Luzzu,
Metadata Quality Assurance Framework
generic tools for measuring (meta)data quality
★ adaptable to different metadata schemes
★ scalable (to Big Data)
★ understandable reports for data curators
★ open source
Measuring metadata quality. What to measure?
14
★Structural and semantic features
Completeness, cardinality, uniqueness, length, dictionary entry, data type
conformance, multilinguality (generic metrics)
★Functional requirement analysis / Discovery scenarios
Requirements of the most important functions
★Problem catalog
Known metadata problems
Measuring metadata quality. Metadata requirements / User scenario
15
“As a user I want to be able to filter by whether a person is the
subject of a book, or its author, engraver, printer etc.”
Metadata analysis
Description of relevant metadata elements and their rules
Measurement rules
★ the relevant field values should be resolvable URI
★ each URI should be associated with labels in multiple languages
Measuring metadata quality. Metadata requirements / element—function map
16
Europeana sub-dimensions MARC Summary of Mapping to User Tasks
Measuring metadata quality. Measurement - Distinct Languages
17
Text w/o language annotation (dc.subject: Germany):
Text w language annotation (dc.subject: Germany@en)
Text w several language annotations (dc.subject:
Germany@en, Deutschland@de)
Link to (multilingual) vocabulary (http://www.geonames.org
/2921044/federal-republic-of-germany)
0
1
2
n
Measuring metadata quality. Measurement - Good example
18
dc:description
dc:title
Place/skos:prefLabel
Descriptive fields Subject headings
"Brandenburger Tor"@de
"Brandenburg Gate"@en
"Grenzübergang Potsdamer Platz"@de
"Postdamer Platz border crossing"@en
"Reichstag"@de
"Reichstag building"@en
"Die Mauer muß weg!"@de
"Die Mauer muß weg! (The
Wall must go!)"@en
"Kommentiertes Fotorama mit
Bildern von 1989-1990 in
Berlin"@de
"Annotated images from 1989-
1990 in Berlin"@en
Measuring metadata quality. Engineering - Batch API
19
client Metadata QA
/batch/measuring/start
sessionID
/batch/[recordId]
csv
for each records
/batch/measuring/stop
“success” | “failure”
/batch/analyzing/start
“success” | “failure”
/batch/analyzing/status
“in progress” | “ready”
/batch/analyzing/retriev
e
compressed package
periodically
measurement
analysis
Measuring metadata quality. Further steps
20
★Translate the results into
documentation,
recommendations
★Communication with data
providers
★Human evaluation of metadata
quality
★Cooperation with other projects
★Incorporating into ingestion
process
★Shape Constraint Language
(SHACL) for defining patterns
★Process usage statistics
★Measuring changes of scores
★Machine learning based
classification & clustering
human analysis technical
Measuring metadata quality. Links
21
★Europeana Data Quality Committee // http://pro.europeana.eu/europeana-
tech/data-quality-committee
★DLF Metadata Assessment group // http://dlfmetadataassessment.github.io
★Zotero group // https://www.zotero.org/groups/metadata_assessment
★Metadata 2020 // http://www.metadata2020.org/
★Research Data Alliance, Reseach Data Provenance WG // https://rd-
alliance.org/groups/research-data-provenance.html

Weitere ähnliche Inhalte

Ähnlich wie Measuring Metadata Quality (ELAG, 2018)

Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Péter Király
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In PracticeMarcia Zeng
 
Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Péter Király
 
Enterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsEnterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsBoris Otto
 
Extending DCAM for Metadata Provenance
Extending DCAM for Metadata ProvenanceExtending DCAM for Metadata Provenance
Extending DCAM for Metadata ProvenanceKai Eckert
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Péter Király
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in productionStepan Pushkarev
 
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...HostedbyConfluent
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningSanghamitra Deb
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfData Science Council of America
 
[PythonPH] Transforming the call center with Text mining and Deep learning (C...
[PythonPH] Transforming the call center with Text mining and Deep learning (C...[PythonPH] Transforming the call center with Text mining and Deep learning (C...
[PythonPH] Transforming the call center with Text mining and Deep learning (C...Paul Lo
 
Benchmarking Commercial RDF Stores with Publications Office Dataset
Benchmarking Commercial RDF Stores with Publications Office DatasetBenchmarking Commercial RDF Stores with Publications Office Dataset
Benchmarking Commercial RDF Stores with Publications Office DatasetGhislain Atemezing
 
Understanding voice of the member via text mining
Understanding voice of the member via text miningUnderstanding voice of the member via text mining
Understanding voice of the member via text miningChi-Yi Kuan
 
Data quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)dataData quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)dataValentine Charles
 
Lider Reference Model ld4lt session March, 3rd, 2015
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015Sebastian Hellmann
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Anastasija Nikiforova
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionProvectus
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Péter Király
 

Ähnlich wie Measuring Metadata Quality (ELAG, 2018) (20)

Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)Measuring Metadata Quality (doctoral defense 2019)
Measuring Metadata Quality (doctoral defense 2019)
 
Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In Practice
 
Konrad cedem praesi
Konrad cedem praesiKonrad cedem praesi
Konrad cedem praesi
 
Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)
 
Enterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and OptionsEnterprise Master Data Architecture: Design Decisions and Options
Enterprise Master Data Architecture: Design Decisions and Options
 
Extending DCAM for Metadata Provenance
Extending DCAM for Metadata ProvenanceExtending DCAM for Metadata Provenance
Extending DCAM for Metadata Provenance
 
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
Metadata quality in cultural heritage institutions (ReIRes-FAIR 2018)
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in production
 
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
 
Multi-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learningMulti-modal sources for predictive modeling using deep learning
Multi-modal sources for predictive modeling using deep learning
 
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdfThe Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
The Simple 5-Step Process for Creating a Winning Data Pipeline.pdf
 
[PythonPH] Transforming the call center with Text mining and Deep learning (C...
[PythonPH] Transforming the call center with Text mining and Deep learning (C...[PythonPH] Transforming the call center with Text mining and Deep learning (C...
[PythonPH] Transforming the call center with Text mining and Deep learning (C...
 
Benchmarking Commercial RDF Stores with Publications Office Dataset
Benchmarking Commercial RDF Stores with Publications Office DatasetBenchmarking Commercial RDF Stores with Publications Office Dataset
Benchmarking Commercial RDF Stores with Publications Office Dataset
 
Understanding voice of the member via text mining
Understanding voice of the member via text miningUnderstanding voice of the member via text mining
Understanding voice of the member via text mining
 
Data quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)dataData quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)data
 
Lider Reference Model ld4lt session March, 3rd, 2015
Lider Reference Model ld4lt session  March, 3rd, 2015Lider Reference Model ld4lt session  March, 3rd, 2015
Lider Reference Model ld4lt session March, 3rd, 2015
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
Stakeholder-centred Identification of Data Quality Issues: Knowledge that Can...
 
MLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in ProductionMLOps and Data Quality: Deploying Reliable ML Models in Production
MLOps and Data Quality: Deploying Reliable ML Models in Production
 
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
Introduction to data quality management (BVB KVB FDM-KompetenzPool, 2021)
 

Mehr von Péter Király

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Péter Király
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Péter Király
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Péter Király
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)Péter Király
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Péter Király
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Péter Király
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Péter Király
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Péter Király
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Péter Király
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)Péter Király
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)Péter Király
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Péter Király
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Péter Király
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Péter Király
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Péter Király
 
Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)Péter Király
 
SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)Péter Király
 
Measuring completeness as metadata quality metric in Europeana (DH 2017)
Measuring completeness as metadata quality metric in Europeana (DH 2017)Measuring completeness as metadata quality metric in Europeana (DH 2017)
Measuring completeness as metadata quality metric in Europeana (DH 2017)Péter Király
 
Stiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of MetadataStiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of MetadataPéter Király
 

Mehr von Péter Király (20)

Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
Requirements of DARIAH community for a Dataverse repository (SSHOC 2020)
 
Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)Validating 126 million MARC records (DATeCH 2019)
Validating 126 million MARC records (DATeCH 2019)
 
Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)Empirical evaluation of library catalogues (SWIB 2019)
Empirical evaluation of library catalogues (SWIB 2019)
 
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
GRO.data - Dataverse in Göttingen (Dataverse Europe 2020)
 
Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)Data element constraints for DDB (DDB 2021)
Data element constraints for DDB (DDB 2021)
 
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
Incubating Göttingen Cultural Analytics Alliance (SUB 2021)
 
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
Continuous quality assessment for MARC21 catalogues (MINI ELAG 2021)
 
Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)Magyar irodalom idegen nyelven (BTK ITI 2021)
Magyar irodalom idegen nyelven (BTK ITI 2021)
 
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
Validating JSON, XML and CSV data with SHACL-like constraints (DINI-KIM 2022)
 
FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)FRBR a book history perspective (Bibliodata WG 2022)
FRBR a book history perspective (Bibliodata WG 2022)
 
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
GRO.data - Dataverse in Göttingen (Magdeburg Coffee Lecture, 2022)
 
Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...Understanding, extracting and enhancing catalogue data (CE Book history works...
Understanding, extracting and enhancing catalogue data (CE Book history works...
 
Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)Measuring cultural heritage metadata quality (Semantics 2017)
Measuring cultural heritage metadata quality (Semantics 2017)
 
Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)Measuring library catalogs (ADOCHS 2017)
Measuring library catalogs (ADOCHS 2017)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)Researching metadata quality (ORKG 2018)
Researching metadata quality (ORKG 2018)
 
Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)Measuring MARC (ELAG 2018)
Measuring MARC (ELAG 2018)
 
SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)SHACL shortly (ELAG 2018)
SHACL shortly (ELAG 2018)
 
Measuring completeness as metadata quality metric in Europeana (DH 2017)
Measuring completeness as metadata quality metric in Europeana (DH 2017)Measuring completeness as metadata quality metric in Europeana (DH 2017)
Measuring completeness as metadata quality metric in Europeana (DH 2017)
 
Stiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of MetadataStiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of Metadata
 

Kürzlich hochgeladen

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 

Kürzlich hochgeladen (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 

Measuring Metadata Quality (ELAG, 2018)

  • 1. Measuring Metadata Quality ELAG 2018 Péter Király, peter.kiraly@gwdg.de Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
  • 2. Measuring metadata quality. Generic title and bad thumbnail 2 more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
  • 3. Measuring metadata quality. Multilinguality problem 3 ★ Mona Lisa → 456 results ★ La Gioconda → 365 results ★ La Joconde → 71 results http://www.europeana.eu/portal/en/record/90402/RP_F_00_351.html
  • 4. Measuring metadata quality. Problems with title 4 more examples in Report and Recommendations from the Task Force on Metadata Quality (2015) title: "VOETBAL-EREDIVISIE- FEYENOORD - GO AHEAD 3-1", description: "VOETBAL-EREDIVISIE- FEYENOORD - GO AHEAD 3-1" Same title and description title: "NLD-820630-AMSTERDAM: Straatmuzikanten proberen geld te verdienen voor...", Machine-readable ID in title title: "+++EMPTY+++" Leftover
  • 5. Measuring metadata quality. Non-informative values 5 non informative dc:title: “photograph, framed”, “group photograph” “photograph” informative dc:title: “Photograph of Sir Dugald Clerk”, “Photograph of "Puffing Billy"” bad good
  • 6. Measuring metadata quality. Copy & paste cataloging 6 from a template? more examples in Report and Recommendations from the Task Force on Metadata Quality (2015)
  • 7. Measuring metadata quality. The problem 7 there are “good” and “bad” metadata records but we don’t have clear metrics like this: functional requirements good acceptable bad
  • 8. Measuring metadata quality. Why data quality is important? 8 “Fitness for purpose” (QA principle) purpose: to access content no metadata no access to data no data usage more explanation: Data on the Web Best Practices W3C Working Draft, https://www.w3.org/TR/dwbp/
  • 9. Measuring metadata quality. Hypothesis 9 by measuring structural elements we can approximate metadata record quality ≃ metadata smell
  • 10. Measuring metadata quality. Purposes 10 ★improve the metadata ★services: good data → reliable functions ★better metadata schema & documentation ★propagate “good practice”
  • 11. Measuring metadata quality. Organisational approach 11 Europeana Data Quality Committee, DLF Metadata Assessment WG, Metadata 2020, Research Data Alliance ★ Analysing/revising metadata schema ★ Functional requirement analysis ★ Problem catalog ★ Multilinguality
  • 12. Measuring metadata quality. Community bibliography 12 zotero.org/groups/metadata_assessment dlfmetadataassessment.github.io
  • 13. Measuring metadata quality. Tooling 13 Catmandu, UNT digital library, SHACL, ShEx, Luzzu, Metadata Quality Assurance Framework generic tools for measuring (meta)data quality ★ adaptable to different metadata schemes ★ scalable (to Big Data) ★ understandable reports for data curators ★ open source
  • 14. Measuring metadata quality. What to measure? 14 ★Structural and semantic features Completeness, cardinality, uniqueness, length, dictionary entry, data type conformance, multilinguality (generic metrics) ★Functional requirement analysis / Discovery scenarios Requirements of the most important functions ★Problem catalog Known metadata problems
  • 15. Measuring metadata quality. Metadata requirements / User scenario 15 “As a user I want to be able to filter by whether a person is the subject of a book, or its author, engraver, printer etc.” Metadata analysis Description of relevant metadata elements and their rules Measurement rules ★ the relevant field values should be resolvable URI ★ each URI should be associated with labels in multiple languages
  • 16. Measuring metadata quality. Metadata requirements / element—function map 16 Europeana sub-dimensions MARC Summary of Mapping to User Tasks
  • 17. Measuring metadata quality. Measurement - Distinct Languages 17 Text w/o language annotation (dc.subject: Germany): Text w language annotation (dc.subject: Germany@en) Text w several language annotations (dc.subject: Germany@en, Deutschland@de) Link to (multilingual) vocabulary (http://www.geonames.org /2921044/federal-republic-of-germany) 0 1 2 n
  • 18. Measuring metadata quality. Measurement - Good example 18 dc:description dc:title Place/skos:prefLabel Descriptive fields Subject headings "Brandenburger Tor"@de "Brandenburg Gate"@en "Grenzübergang Potsdamer Platz"@de "Postdamer Platz border crossing"@en "Reichstag"@de "Reichstag building"@en "Die Mauer muß weg!"@de "Die Mauer muß weg! (The Wall must go!)"@en "Kommentiertes Fotorama mit Bildern von 1989-1990 in Berlin"@de "Annotated images from 1989- 1990 in Berlin"@en
  • 19. Measuring metadata quality. Engineering - Batch API 19 client Metadata QA /batch/measuring/start sessionID /batch/[recordId] csv for each records /batch/measuring/stop “success” | “failure” /batch/analyzing/start “success” | “failure” /batch/analyzing/status “in progress” | “ready” /batch/analyzing/retriev e compressed package periodically measurement analysis
  • 20. Measuring metadata quality. Further steps 20 ★Translate the results into documentation, recommendations ★Communication with data providers ★Human evaluation of metadata quality ★Cooperation with other projects ★Incorporating into ingestion process ★Shape Constraint Language (SHACL) for defining patterns ★Process usage statistics ★Measuring changes of scores ★Machine learning based classification & clustering human analysis technical
  • 21. Measuring metadata quality. Links 21 ★Europeana Data Quality Committee // http://pro.europeana.eu/europeana- tech/data-quality-committee ★DLF Metadata Assessment group // http://dlfmetadataassessment.github.io ★Zotero group // https://www.zotero.org/groups/metadata_assessment ★Metadata 2020 // http://www.metadata2020.org/ ★Research Data Alliance, Reseach Data Provenance WG // https://rd- alliance.org/groups/research-data-provenance.html