1:1 Principle Violations
Revisiting the Dublin Core 1:1 Principle
Richard J. Urban rjurban@illinois.edu http://www.richardurban.net
The Problem
Pilot Study
The 1:1 Principle
In general, Dublin Core metadata describes one
manifestation or version of a resource, rather
than assuming that manifestations stand in for
one another. For instance, a jpeg image of the
Mona Lisa has much in common with the
original painting, but it is not the same as the
painting. As such, the digital image should be
described as itself, most likely with the creator
of the original image included as a Creator or
Contributor rather than just the painter of the
original Mona Lisa. The relationship between
the metadata for the original and the
reproduction is part of the metadata description,
and assists the user in determining whether
his/her need can be met by a reproduction
(Hillmann, 2003)
Although Dublin Core (DC) metadata emerged
from the need to describe "document-like objects"
on the World Wide Web in the mid-1990s, libraries,
archives and museums soon adopted it to share
information about hidden cultural heritage
collections. In response to concerns from this
community about distinguishing between records
describing "originals" and records describing
"reproductions," DCMI introduced the 1:1 Principle:
"each resource should have a discrete
metadata description and each description
should include elements describing a single
resource" (Weibel and Hakala, 1998).
However, metadata creators indicate that the 1:1
Principle causes "a great deal of confusion" in
practice (Park & Childress, 2009). Even when the
Principle is understood, software for metadata
creation lacks affordances for creating compliant
records (Miller, 2010). Studies find that records
frequently describe both physical and digital
resources and are "particularly problematic" in
large-scale metadata aggregations (Shreeves et
al., 2005; Han et al., 2009; Hutt & Riley, 2005).
Multiple accounts of the Principle, such as the
description provided in Using Dublin Core
(Hillmann, 2003), contribute to confusion about
what the Principle means.
While these accounts of the 1:1 Principle may provide
guidance for metadata creators, additional rules are
needed to understand how particular records "violate"
the Principle. This pilot study explores techniques to
identify records that describe different classes of
resources.
Data Collection
IMLS Digital Collections & Content Project
25 collections
55,000 item-level OAI-PMH XML records
Data Analysis
Using the SIMILE Gadget (http://simile.mit.edu/wiki/Gadget)
XML data explorer, overviews of Dublin Core
properties and the frequency of unique values were
generated for each collection. Each statement was
assigned to a class of resources (PhysicalResources
or DigitalResources).
Using the statement classifications, each collection
was classified according to three categories:
Non-violating collections: records conformed to
the 1:1 Principle.
Violating collections: records included statements
about both physical and digital resources.
Non-violating violations: records described
physical resources, but identified digital resources.
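The three categories above can be sketched as a small rule set. This is a minimal illustration, not the procedure used in the pilot study, where classification was done by inspecting Gadget overviews. The record layout (`classes`, `identifier`), the test for a "digital" identifier (an HTTP URL), and the rule that a collection takes the most severe label found among its records are all assumptions made for the example.

```python
# Hypothetical record layout: each record is a dict with
#   "classes":    set of resource classes assigned to its statements
#   "identifier": the dc:identifier value
# Ordered from least to most severe (assumed ordering).
SEVERITY = ["non-violating", "non-violating violation", "violating"]


def classify_record(classes, identifier):
    """Label one record according to the three categories."""
    classes = set(classes)
    # Statements about both a physical and a digital resource.
    if {"PhysicalResource", "DigitalResource"} <= classes:
        return "violating"
    # Describes a physical resource but identifies a digital one.
    # Assumption: an HTTP(S) URL identifies a digital resource.
    is_url = identifier.lower().startswith(("http://", "https://"))
    if "PhysicalResource" in classes and is_url:
        return "non-violating violation"
    return "non-violating"


def classify_collection(records):
    """Assumption: a collection takes the most severe record label."""
    labels = [classify_record(r["classes"], r["identifier"]) for r in records]
    return max(labels, key=SEVERITY.index, default="non-violating")


if __name__ == "__main__":
    sample = [
        {"classes": {"PhysicalResource"}, "identifier": "MS-1234"},
        {"classes": {"PhysicalResource"}, "identifier": "http://example.org/img/1"},
    ]
    print(classify_collection(sample))
```

A stricter rule, such as requiring a minimum share of violating records before labelling a whole collection, would be an equally defensible choice.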
Results
[Figure: results of classifying the 25 collections (n=25). Surviving labels: Digital Resource, Physical Resource.]
Acknowledgments
Portions of this research were supported by a 2007 IMLS National Leadership
Research and Demonstration Grant (LG-06-07-0020-07) hosted by the
GSLIS Center for Informatics Research in Science and Scholarship (CIRSS),
Dr. Carole L. Palmer, Principal Investigator.
Resource classes:
PhysicalResources: resources described by format values for physical mediums and extents.
DigitalResources: resources described by format values about file formats and extents.
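As a rough illustration of how format values might be tallied and assigned to these two classes, the sketch below counts unique dc:format values in simple Dublin Core XML and labels each one with keyword cues. The cue lists, the `<records>` wrapper element, and the sample data are hypothetical. The study itself used the SIMILE Gadget overviews rather than a script like this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the simple Dublin Core elements used in oai_dc records.
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical keyword cues; a real study would need a curated vocabulary.
DIGITAL_CUES = ("image/", "jpeg", "tiff", "pdf", "bytes", "pixels")
PHYSICAL_CUES = (" cm", "inches", "photograph", "canvas", "negative")


def format_overview(xml_text):
    """Count how often each unique dc:format value occurs."""
    root = ET.fromstring(xml_text)
    values = [el.text.strip() for el in root.iter(DC_NS + "format") if el.text]
    return Counter(values)


def classify_format(value):
    """Assign a format value to a resource class using the cue lists."""
    text = value.lower()
    if any(cue in text for cue in DIGITAL_CUES):
        return "DigitalResource"
    if any(cue in text for cue in PHYSICAL_CUES):
        return "PhysicalResource"
    return "Unclassified"


# Invented sample data; <records> is an illustrative wrapper element.
SAMPLE = """
<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc>
    <dc:title>Street scene</dc:title>
    <dc:format>1 photograph; 20 x 25 cm</dc:format>
    <dc:format>image/jpeg</dc:format>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>Harbor view</dc:title>
    <dc:format>image/jpeg</dc:format>
  </oai_dc:dc>
</records>
"""

if __name__ == "__main__":
    for value, count in format_overview(SAMPLE).most_common():
        print(count, classify_format(value), value)
```

Keyword matching of this kind is brittle, so values that match neither list are left as "Unclassified" for manual review.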
What is the 1:1 Principle, really?
Ongoing Research
n:1 Principle, DCAM & OAI-PMH XML
Although the Dublin Core Abstract Model
(DCAM) embodies the 1:1 Principle and may
help prevent errors, it does not directly help
identify violations in legacy OAI-PMH XML
that may include implicit description sets.
Nor does DCAM's generalized resources
("anything that can be identified") help
systematically recognize records that
describe more than one resource.
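One way to picture an implicit description set is to regroup the statements of a flat record into one description per resource. The sketch below assumes each statement has already been given a resource class (or None when it cannot be decided). The tuple layout and the "Unassigned" bucket are illustrative choices, not part of DCAM or OAI-PMH.

```python
from collections import defaultdict


def split_into_descriptions(statements):
    """Group (property, value, resource_class) statements by resource class.

    Statements whose class is None are collected under "Unassigned",
    since a flat record gives no way to tell which resource they describe.
    """
    descriptions = defaultdict(list)
    for prop, value, resource_class in statements:
        descriptions[resource_class or "Unassigned"].append((prop, value))
    return dict(descriptions)


if __name__ == "__main__":
    # Invented flat record mixing statements about two resources.
    flat_record = [
        ("title", "Street scene", None),
        ("format", "1 photograph; 20 x 25 cm", "PhysicalResource"),
        ("format", "image/jpeg", "DigitalResource"),
    ]
    for resource, stmts in split_into_descriptions(flat_record).items():
        print(resource, stmts)
```

A record that yields more than one classed group is a candidate for describing more than one resource. The unassigned statements show why the legacy XML alone cannot settle the question.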
1:1 Principle & Bibliographic Relationships
If the concern of cultural heritage institutions
is about "originals", "reproductions" or
"surrogates," different kinds of bibliographic
relationships need to be considered. For
example, the museum community would not
classify the relationship between a jpeg
and the Mona Lisa as an Equivalence
Relationship that involves related FRBR
Manifestations. Rather, surrogate resources
may stand in Derivative or Descriptive
relationships involving FRBR Expressions or
FRBR Works. Unfortunately, "the problem of
defining reproductions in relationship to
originals has proven elusive through all of the
cataloging codes of the 20th
Century" (Knowlton, 2009).
Ongoing work will provide a conceptual
definition of a 1:1 Principle that reflects the
concerns of cultural heritage repositories and
is grounded in contemporary theories of the
bibliographic universe.
Identifying 1:1 Principle Violations
A conceptual definition will inform the
development of rules and techniques that
identify records that violate the 1:1 Principle.
Ongoing work will adapt the Getty Art &
Architecture Thesaurus to identify distinct
manifestation classes. Additional violation
categories based on other relationships or FRBR
Group 1 Entities will also be explored (e.g., is it
possible to identify DC records that describe
more than one FRBR Expression or FRBR
Work?).
Violation identification techniques will be applied
to 148,000 item-level OAI-PMH records from the
IMLS DCC Opening History aggregation in order
to identify patterns of 1:1 Principle violations
(http://imlsdcc.grainger.illinois.edu/history).
Bibliography
Hillmann, D. (2003, August 26). Using Dublin Core. Dublin Core Metadata Initiative. Retrieved from http://dublincore.org/documents/2003/08/26/usageguide/
Hutt, A., & Riley, J. (2005). Semantics and syntax of Dublin Core usage in Open Archives Initiative data providers of cultural heritage materials. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries (p. 270).
Knowlton, S. A. (2009). How the current draft of RDA addresses the cataloging of reproductions, facsimiles, and microforms. Library Resources and Technical Services, 53(3), 159–165.
Miller, S. (2010). The One-To-One Principle: Challenges in Current Practice. International Conference On Dublin Core And Metadata Applications. Retrieved October 23, 2010, from http://dcpapers.dublincore.org/ojs/pubs/article/view/1043
Park, J., & Childress, E. (2009). Dublin Core metadata semantics: An analysis of the perspectives of information professionals. Journal of Information Science, XX(X), 1-13.
Powell, A., Nilsson, M., Naeve, A., Johnston, P., & Baker, T. (2007). DCMI Abstract Model. Dublin Core Metadata Initiative. Retrieved from http://dublincore.org/documents/abstract-model/
Shreeves, S. L., Knutson, E. M., Stvilia, B., Palmer, C. L., Twidale, M. B., & Cole, T. W. (2005). Is “Quality” Metadata “Shareable” Metadata? The Implications of Local Metadata Practices for Federated Collections. In Currents and convergence: navigating the rivers of change: proceedings of the Twelfth National Conference of the Association of College and Research Libraries April 7-10, 2005, Minneapolis, Minnesota (p. 223).
Tillett, B. (2001). Bibliographic Relationships. In C. Bean & R. Green (Eds.), Relationships in the organization of knowledge. Boston: Kluwer Academic Publishers.
Weibel, S., & Hakala, J. (1998, February). DC-5: The Helsinki Metadata Workshop; A Report on the Workshop and Subsequent Developments. D-Lib Magazine. Retrieved from http://www.dlib.org/dlib/february98/02weibel.html