Open Archives Initiative for Sheet Music: Data Mapping

This document discusses the data mapping required for the OAI Sheet Music Harvester project. Data mapping was necessary because OAI requires unqualified Dublin Core, while contributed data used different formats and definitions. Mapping addressed inconsistencies between MARC, EAD, Dublin Core and local formats used by partner institutions. Issues included field formatting, creator/contributor distinctions, and date/subject standards. Outstanding issues concerned authority control, robust data formats, and improving participation. The document outlines the mapping process and challenges of integrating diverse legacy metadata into a single discovery interface.

Data Mapping: OAI Sheet Music Harvester
Jenn Riley
Digital Media Specialist
Indiana University Digital Library Program

Why was data mapping required?
- OAI requires unqualified Dublin Core
- Contributed data only needed to support resource discovery
- Dublin Core field definitions need interpretation
- For efficient searching, data from different institutions must be consistent

Mapping inconsistencies

Dublin Core fields
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
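Whatever its source format, each contributor's mapped record is ultimately exposed as these elements inside an OAI `oai_dc` wrapper. The Python sketch below shows roughly what that serialization involves, using only the standard library. The `to_oai_dc` function and the sample values are hypothetical; only the two namespace URIs come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the OAI-PMH oai_dc container and the DC 1.1 elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the output uses oai_dc: and dc: instead of ns0:, ns1:.
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def to_oai_dc(record: dict[str, list[str]]) -> str:
    """Serialize a mapped record (DC element name -> values) as an oai_dc XML string.

    Every DC element is optional and repeatable, so each value gets its own element.
    The xsi:schemaLocation attribute that real OAI responses carry is omitted here.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, not drawn from any contributor's data.
print(to_oai_dc({"title": ["Example Song"], "creator": ["Doe, John"], "date": ["1890"]}))
```

Because every element is repeatable, the input maps each element name to a list rather than a single string.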

Limitations to Dublin Core
- Heavily slanted towards electronic resources
- No content standards enforced
- Without qualifiers, fields not granular enough for sheet music needs
- Field definitions open to interpretation

Existing metadata formats
- MARC
- Encoded Archival Description (EAD)
- Dublin Core (DC)
- Local custom formats

MARC
- Library of Congress
  - some records in AACR2 MARC
  - many records in non-AACR2 MARC
  - already had data mapped “based on” the MARC to Dublin Core crosswalk
  - not able to alter their mapping for participation in the sheet music project

EAD
- Duke – item-level finding aid
  - records weren’t contributed for phase 1
  - very robust and specific
  - conversion was relatively simple because data had been converted to EAD from a collection-specific database
  - carried virtually all information in the EAD documents over to DC records

Dublin Core
- UCLA – 4 types of DC records
  - songs
  - sheet music
  - covers et al.
  - recordings
- mapping basically only required inheritance of songs and sheet music data elements down to the covers level
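The UCLA mapping amounted to copying song-level and sheet-music-level elements down into each cover record. A minimal sketch of that inheritance, assuming a hypothetical structure in which every record is a dict of DC element lists and the caller already knows each cover's parent sheet music and song records. Appending inherited values (rather than overwriting the cover's own) is an assumed design choice.

```python
Record = dict[str, list[str]]  # DC element name -> values (hypothetical structure)


def inherit(child: Record, *ancestors: Record) -> Record:
    """Return a copy of `child` with each ancestor's values appended.

    Ancestors are applied in the order given (e.g. sheet music, then song).
    The child's own values come first and exact duplicates are skipped;
    the inputs are not modified.
    """
    merged: Record = {element: list(values) for element, values in child.items()}
    for ancestor in ancestors:
        for element, values in ancestor.items():
            target = merged.setdefault(element, [])
            for value in values:
                if value not in target:
                    target.append(value)
    return merged


# Hypothetical song -> sheet music -> cover chain.
song = {"title": ["Example Song"], "creator": ["Doe, John"]}
sheet_music = {"publisher": ["Example Music Co."], "date": ["1905"]}
cover = {"title": ["Cover illustration"], "type": ["Image"]}
print(inherit(cover, sheet_music, song))
```

With this approach a search for the song title or its composer also retrieves the cover record, which is the point of pushing the data down a level.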

Local custom formats (1)
- Johns Hopkins – simple DTD
  - publication (location, publisher, date)
  - subject
  - call num (box, item)
  - title
  - composer/lyricist/arranger
  - form of composition
  - instrumentation
  - first line
  - first line of chorus
  - performer
  - dedicatee
  - engraver/lithographer/artist
  - advertisement
  - plate num
  - duplication

Local custom formats (2)
- Indiana – simple database
  - title
  - composer
  - lyricist
  - place of publication
  - publisher
  - copyright
  - first line
  - first line of chorus
  - subject
  - form of composition
  - performance medium
  - copies
  - call #
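A local database like Indiana's maps to Dublin Core field by field through a crosswalk table. The sketch below shows the general shape of such a crosswalk in Python; the `CROSSWALK` table and `map_record` function are hypothetical, and the particular element assignments and label prefixes are illustrative assumptions, not the project's actual mapping.

```python
# Hypothetical crosswalk: local field -> (DC element, label prefix or "").
# These assignments are illustrative only, not the project's actual mapping.
CROSSWALK = {
    "title": ("title", ""),
    "composer": ("creator", ""),
    "lyricist": ("creator", ""),
    "publisher": ("publisher", ""),
    "copyright": ("date", ""),
    "first line": ("description", "First line: "),
    "first line of chorus": ("description", "First line of chorus: "),
    "subject": ("subject", ""),
    "call #": ("identifier", ""),
}


def map_record(row: dict[str, str]) -> dict[str, list[str]]:
    """Map one local database row to unqualified DC.

    Fields without a crosswalk entry, and empty values, are dropped.
    Label prefixes keep values that share an element (e.g. description) distinguishable.
    """
    dc: dict[str, list[str]] = {}
    for field, value in row.items():
        value = value.strip()
        if field not in CROSSWALK or not value:
            continue
        element, label = CROSSWALK[field]
        dc.setdefault(element, []).append(label + value)
    return dc
```

Note that `composer` and `lyricist` both collapse into `creator` here, so the role distinction that sheet music users rely on is lost in unqualified DC.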

Data inconsistencies
- Different depths of description
- Different levels of authority control
- No common subject vocabulary between collections

Some mapping issues
- Field formatting important, not just contents
- Choices heavily influenced by LC practice
- Can’t force institutions to comply
- Sheet music has many alternative titles
- Creator vs. contributor
- Plate numbers: they’re important, but where to put them and how to label them?
- Uncertain dates and date ranges
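Where MARC records do carry relator codes, one way to settle creator vs. contributor is a lookup table from code to DC element. The codes `cmp` (composer), `lyr` (lyricist), and `arr` (arranger) are real MARC relator codes; which roles count as creators is a policy decision, and the split in this Python sketch (like the hypothetical `assign_names` function) is only an assumption.

```python
from typing import Optional

# cmp, lyr, and arr are MARC relator codes (composer, lyricist, arranger).
# Treating composers and lyricists as creators and arrangers as contributors
# is an assumed policy, not the project's documented choice.
RELATOR_TO_DC = {"cmp": "creator", "lyr": "creator", "arr": "contributor"}


def assign_names(names: list[tuple[str, Optional[str]]]) -> dict[str, list[str]]:
    """Sort (name, relator code) pairs into DC creator and contributor lists.

    A missing or unrecognized code falls back to 'contributor' (also an assumption),
    which is about all a mapper can do when a MARC record lacks relator codes.
    """
    result: dict[str, list[str]] = {"creator": [], "contributor": []}
    for name, code in names:
        result[RELATOR_TO_DC.get(code or "", "contributor")].append(name)
    return result
```

The fallback branch is where records without relator codes end up, which is why those records are hard to map consistently.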

Outstanding issues
- Authority control for names
- Date formats
- Data clean-up: what can be done at the harvester end, and what must we ask data providers to do?
- What will a more robust data format look like?
- How do we make it easier for more institutions to participate?
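Date formats are one clean-up task a harvester can partly take on itself. A minimal Python sketch, assuming the target is a four-digit year or a `YYYY/YYYY` interval in the style of ISO 8601. It handles only a few common catalog forms (plain years, bracketed or "ca." years, copyright years like `c1905`, explicit ranges, and AACR2-style uncertain decades such as `[189-?]`) and returns `None` for anything else so a person can review it. The `normalize_date` function is hypothetical.

```python
import re
from typing import Optional

# Prefixes like "circa", "ca.", "c." and the copyright "c" in "c1905".
_APPROX_PREFIX = re.compile(r"^(circa|ca\.|c\.?)\s*", re.IGNORECASE)


def normalize_date(raw: str) -> Optional[str]:
    """Normalize a catalog date to 'YYYY' or an interval 'YYYY/YYYY'.

    Brackets, approximation prefixes, and trailing question marks are discarded,
    so the uncertainty itself is lost. Unrecognized forms return None for review.
    """
    s = raw.strip().strip("[]").strip()
    s = _APPROX_PREFIX.sub("", s).rstrip("?").strip()
    if re.fullmatch(r"\d{4}", s):
        return s
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", s)
    if m:  # explicit range, e.g. "1890-1899"
        return f"{m.group(1)}/{m.group(2)}"
    m = re.fullmatch(r"(\d{3})-", s)
    if m:  # uncertain decade, e.g. "189-" in AACR2 style
        return f"{m.group(1)}0/{m.group(1)}9"
    return None
```

Because normalization throws away markers like "ca." and "?", a mapping built this way would probably want to keep the original string in a display field alongside the normalized value.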

More information
- Harvester site (still in development): http://digital.library.ucla.edu/sheetmusic/
- Jenn Riley, Indiana University Digital Library Program: jenlrile@indiana.edu
- These presentation slides: http://www.dlib.indiana.edu/~jenlrile/rbms2003/

Similar to Open Archives Initiative for Sheet Music: Data Mapping
- RDA (Resource Description & Access) – Jennifer Joyner
- Metadata for Music: Understanding the Landscape – Jenn Riley
- Tools Of Our Trade – Farrukhshahzad
- The tools of our trade: AACR2/RDA and MARC – Ann Chapman
- Notating pop music – xjkoboe
- Tillett, Hillmann, and Moen, "Bibliographic Control Alphabet Soup: AACR to R... – National Information Standards Organization (NISO)
- NCompass Live: Cataloging with RDA – Nebraska Library Commission
- Cataloging with RDA: An Overview – Emily Nimsakont
- Tools of our Trade (RDA, MARC21) 2010-03-15 – Ann Chapman
- Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Pr... – Jenn Riley
- Smart Interfaces through Domain Knowledge: Facets, Metadata Displays, Analysi... – Charleston Conference
- Intro to rda – Anna Enos
- Aplicații Web Semantice - Descriere Proiect – Vlad Posea
- Music Therapy Bi Fall 2005 – New England chapter of Music Library Association
- RDA State of the Union – John Baga
- April 8 NISO Webinar: Experimenting with BIBFRAME: Reports from Early Adopters – National Information Standards Organization (NISO)
- RDA for Original Catalogers – Shana McDanold
- RDA from Scratch for Catalogers – Shana McDanold
- RDA Presentation – jendibbern
- Ontology based metadata schema for digital library projects in China – AIMS (Agricultural Information Management Standards)

More from Jenn Riley
- Understanding Metadata: Looking Forward
- The future of cataloguing? Future cataloguers!
- Discovery elsewhere
- Designing the Garden: Getting Grounded in Linked Data
- Launching metaware.buzz
- Getting Comfortable with Metadata Reuse
- Handout for Digital Imaging of Photographs
- Digital Imaging of Photographs
- Cushman Exposed! Exploiting Controlled Vocabularies to Enhance Browsing and S...
- Handout for FRBR; or, How I learned to stop worrying and love the model
- Metadata for Brittle Books Page Turner
- Digitizing and Delivering Audio and Video
- Variations2
- Handout for Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
- Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
- Handout for Merging Metadata from Multiple Traditions: IN Harmony Sheet Music...
- Merging Metadata from Multiple Traditions: IN Harmony Sheet Music from Librar...
- Challenges in the Nursery: Linking a Finding Aid with Online Content
- Making Interoperability Easier: Creating Shareable Metadata
- Tagging and User-Contributed Metadata

Editor's notes

Unqualified DC is required, but more robust formats are also allowed; more on this later. Since the purpose of the harvester is resource discovery, and users leave the harvester to view items at the individual institutions, it was not necessary to force all of the information in each institution's records into DC. We needed to define only what was required for discovery, not figure out how to squeeze every MARC field into DC. The name "Dublin Core" reveals something about its purpose: it was designed as a core set of metadata elements applicable to all types of resources. It is therefore meant to be flexible, with a low barrier to entry, which means its field definitions are open to wide interpretation. We needed to develop a single interpretation for all contributors to follow, to make searching and browsing more effective.
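On the harvester side, unqualified DC is requested from each data provider with the OAI-PMH `ListRecords` verb and `metadataPrefix=oai_dc`; later pages of a large result set are requested with a `resumptionToken`, which OAI-PMH treats as an exclusive argument. The Python sketch below only builds the request URLs. The `list_records_url` function and the base URL are placeholders for illustration, and no network call is made.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for unqualified Dublin Core.

    resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests send it without metadataPrefix.
    """
    if resumption_token is None:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    else:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, for illustration only.
print(list_records_url("https://example.org/oai"))
# → https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```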
Demo: search Michigan's OAIster for "Einstein" with format=image. In record 1, note the two forms of the name in author/creator, and that the subject field holds a description. In record 2, type=image, but very little indicates that it's a photograph; note also the odd note text and the subjects run together in a single string.
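Subjects run together in a single string, as in the second record above, are the kind of problem a harvester can plausibly repair on its own end. A minimal Python sketch, assuming semicolons separate the terms (the delimiter in any given feed is an assumption to check) and leaving LCSH-style `--` subdivisions intact. The `split_subjects` function is hypothetical.

```python
def split_subjects(values: list[str], delimiter: str = ";") -> list[str]:
    """Split delimiter-joined subject strings into separate DC subject values.

    Surrounding whitespace, empty pieces, and exact duplicates are dropped;
    the delimiter default of ";" is an assumption about the incoming data.
    """
    result: list[str] = []
    for value in values:
        for term in value.split(delimiter):
            term = term.strip()
            if term and term not in result:
                result.append(term)
    return result


# Hypothetical run-together subject string.
print(split_subjects(["Popular music; Songs with piano;; Waltzes"]))
# → ['Popular music', 'Songs with piano', 'Waltzes']
```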
These are the 15 Dublin Core fields. None is required, and all are repeatable. You'll notice they're pretty basic; that's because they're supposed to be "core." Many of them are obvious inclusions: title, creator, description, publisher. Others are not so obvious from their names. "Type" indicates a general category to which the resource belongs; some suggested types are image, physical object, text, collection, and event. "Format" describes the physical or digital manifestation of a resource, so you would record things like dimensions, duration, and software needed there. You'll also notice that these terms are extremely generic ("creator," for example). Again, they must be this generic because Dublin Core is meant to be usable for describing any type of resource. You may look at this list and think these elements are TOO generic to be really useful. To address that, Dublin Core comes in two forms: qualified and unqualified. Unqualified DC uses just this list of fields, exactly as written here. Qualified DC allows a qualifier to refine the meaning of a field or to specify the encoding scheme used for its value. A refinement could be used with "creator," for example, to specify what role that creator had in the development of the resource. An encoding scheme could be used with "subject" to name the controlled vocabulary from which a subject term was taken. Unfortunately, OAI requires unqualified Dublin Core, so the harvester couldn't take advantage (at least in phase 1) of the greater specificity that qualifiers provide.
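Going from qualified to unqualified DC means discarding refinements and encoding schemes and keeping each value under its base element, following what DCMI calls the dumb-down principle. A minimal Python sketch, assuming a hypothetical `QualifiedValue` record type and `dumb_down` function (illustrative names, not from any library); it also rejects anything that isn't one of the 15 elements.

```python
from typing import NamedTuple, Optional

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


class QualifiedValue(NamedTuple):
    element: str               # base DC element, e.g. "date"
    refinement: Optional[str]  # e.g. "created" refining date
    scheme: Optional[str]      # e.g. "LCSH" for a subject term
    value: str


def dumb_down(values: list[QualifiedValue]) -> dict[str, list[str]]:
    """Drop refinements and encoding schemes, keeping each value under its base element."""
    result: dict[str, list[str]] = {}
    for qv in values:
        if qv.element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {qv.element}")
        result.setdefault(qv.element, []).append(qv.value)
    return result
```

What gets lost is exactly the extra specificity the note describes: after this step, nothing says which vocabulary a subject came from or what kind of date a date is.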
Even though unqualified DC is required for OAI, there are some reasons why it's not the best metadata format for describing and searching sheet music collections. Although it doesn't explicitly limit its scope to online materials, DC was designed in the networked world, and its field definitions tend to work better for networked materials. Many DC fields come with suggested controlled vocabularies; for example, the "format" element description suggests using Internet MIME types. For subject, the "recommended best practice" is to use terms from a controlled vocabulary (CV), but no specific vocabulary is identified. DC itself, however, does not require field values to conform to these suggestions. As we saw earlier, unqualified DC doesn't offer much specificity for describing resources, and sheet music has specific needs that DC can't meet. For example, being able to distinguish between composers, lyricists, and arrangers is important for users of sheet music collections. Despite our efforts to clarify DC field definitions for use with the harvester, it is inevitable that different institutions will use the fields differently.
Duke (rare materials emphasis) and IU (little authority control) had records in MARC too, but these weren’t contributed in phase 1 of the project.
Very specific information, for example subject type (LCSH, AAT, TGM), dedicatee, and recordings available.
Very specific to sheet music; not well suited even to other types of printed music.
No authority control
MARC has more fields than the custom databases, but the custom databases have more applicable fields. Many MARC records don't have relator codes, so we can't tell who's a composer and who's a lyricist. Even if each individual collection were under name authority control, the collections still might not interoperate. But the problem was much worse: there wasn't even agreement on name order! Local subject vocabularies were in use, so even a complex mapping between LCSH, AAT, and TGM wouldn't solve the problem.
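The name-order problem can be partly handled at the harvester by converting direct-order names to inverted order before comparing them. The Python sketch below uses a naive heuristic assumed for illustration (the hypothetical `inverted_name` and `same_name` functions): it treats the last whitespace-separated token as the surname, which is wrong for many names (compound surnames, suffixes, corporate names), so it is a matching aid rather than a substitute for authority control.

```python
def inverted_name(name: str) -> str:
    """Return 'Surname, Forenames' for a personal name (naive heuristic).

    Names already containing a comma are assumed to be inverted and are only
    whitespace-normalized; single-word names are returned unchanged.
    """
    cleaned = " ".join(name.split())
    if "," in cleaned or " " not in cleaned:
        return cleaned
    forenames, surname = cleaned.rsplit(" ", 1)
    return f"{surname}, {forenames}"


def same_name(a: str, b: str) -> bool:
    """Case-insensitive comparison after converting both names to inverted order."""
    return inverted_name(a).casefold() == inverted_name(b).casefold()


print(inverted_name("Stephen Collins Foster"))
# → Foster, Stephen Collins
```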
