Digital Object Identifiers for EOSDIS data
1. Digital Object Identifiers for EOSDIS data
HDF Workshop
April 17, 2012
John Moses, ESDIS
John.f.moses@nasa.gov
2. Assessment of identification schemes
Study by ESIP Cluster on Preservation and Stewardship in 2009
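Table columns (each rated separately for Data Set and Item):
ID Scheme | Unique Identifier | Unique Locator | Citable Locator | Scientifically Unique ID
ID schemes assessed (one row each): URL/N/I, PURL, XRI, Handle, DOI, ARK, LSID, OID, UUID
(Cell ratings were color-coded on the slide.)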
Adapted from Duerr, R. E., et al. 2011 (submitted). On the utility of identification schemes for digital Earth
science data: An assessment and recommendations. Earth Science Informatics.
3. Digital Object Identifier for EOS products
• The DOI® system and the Handle System provide an Internet
resolution service for unique and persistent identifiers of
digital objects
– Internet infrastructure components owned by the International DOI Foundation
(IDF): www.doi.org
• A DOI is an alphanumeric string in two parts
– doi:[prefix]/[suffix]; for example, doi: 10.5067/123
– In the prefix, 10 identifies the DOI registry and 5067 identifies the registrant
– The suffix, here the alphanumeric string 123, uniquely identifies the data item
• The purpose in assigning DOIs to EOSDIS products is to
provide a permanent data identifier for citation in
publications
– ESIP citation guideline using doi:
– Doe, J. and R. Roe. 2001. The FOO Data Set. Version 2.3. The FOO Data Center.
http://dx.doi.org/10.xxxx/notfoo.547983. Accessed 1 May 2011.
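The two-part structure described above can be sketched in a few lines of Python. This is a minimal illustration, not an official DOI parser: the names `Doi`, `parse_doi` and `resolver_url` are hypothetical, and the resolver address is simply the http://dx.doi.org/ form used in the citation example.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    directory: str   # "10" in the slide's example
    registrant: str  # e.g. "5067"
    suffix: str      # string chosen by the registrant


def parse_doi(text: str) -> Doi:
    """Split a string such as 'doi: 10.5067/123' into its parts."""
    name = text.strip()
    if name.lower().startswith("doi:"):
        name = name[4:].strip()
    # The first "/" separates prefix from suffix; the suffix may contain more.
    prefix, slash, suffix = name.partition("/")
    directory, dot, registrant = prefix.partition(".")
    if not (slash and dot and directory and registrant and suffix):
        raise ValueError(f"not a DOI name: {text!r}")
    return Doi(directory, registrant, suffix)


def resolver_url(doi: Doi) -> str:
    """Build the link form used in the citation guideline example."""
    return f"http://dx.doi.org/{doi.directory}.{doi.registrant}/{doi.suffix}"


example = parse_doi("doi: 10.5067/123")
print(example)
print(resolver_url(example))
```

Splitting on the first "/" only means a structured suffix such as Aura/HIRDLS/data1 stays intact.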
4. Implementing DOIs for EOSDIS
– Develop ops concept through pilot processes
• Guidelines for DOI suffix, location & citation information.
• Request, assign, monitor DOIs, location & citation metadata
• Add DOIs to DAAC product citation web pages
• Embed DOIs into product metadata at next reprocessing
– HIRDLS, GLAS, AMSR-E data providers are in final
reprocessing
• Add DOIs to GCMD and ECHO through metadata updates
• Add DOI metadata to NTRS for searchable documentation
• Set up metrics collection from journal citation reports
6. Attributes for embedding DOIs
• Framework structures in HDF and netCDF
– HDF global attribute name and value versus naming an identifier
group (which would allow discovery of identifier types)
– ECS CoreMetadata Product Specific Attributes in the
AdditionalAttributes group section
– netCDF file-level attribute names: “Id” and “naming authority”
• Consider attribute names for DOI value:
– Advantage to having two parts: a key code to indicate that this is an
identifier, and a namespace that indicates the type/application of the DOI;
e.g., that it applies to the data product level (i.e., has the same value for
all granules/files of the series, a series identifier).
• Hypothetical DOI example
– Attribute name: identifier_product_DOI
– Attribute value: 10.5067/Aura/HIRDLS/data1
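A minimal sketch of the two-part naming idea, assuming the hypothetical attribute name above (key code "identifier", a namespace for the level, then the type "DOI"). The functions `doi_attribute` and `find_identifiers` are illustrative names, and a plain dictionary stands in for a file's global attributes; writing to a real HDF or netCDF file would need the corresponding library and is not shown.

```python
from __future__ import annotations


def doi_attribute(level: str, doi: str) -> tuple[str, str]:
    """Return a (name, value) pair for a file-level DOI attribute.

    The name joins the key code 'identifier', the level the DOI applies
    to, and the identifier type with underscores.
    """
    if not level.isidentifier():
        raise ValueError(f"level must be a simple name: {level!r}")
    return f"identifier_{level}_DOI", doi


def find_identifiers(attributes: dict[str, str]) -> dict[str, str]:
    """Select every attribute whose name starts with the key code."""
    return {k: v for k, v in attributes.items() if k.startswith("identifier_")}


# Stand-in for the global attributes of one granule (hypothetical values).
name, value = doi_attribute("product", "10.5067/Aura/HIRDLS/data1")
global_attributes = {"title": "hypothetical granule", name: value}
print(find_identifiers(global_attributes))
```

The shared key code is what lets a reader discover identifiers without knowing their types in advance.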
8. DOI Examples for Pilot Projects
Suffix model strings, with examples:
• [mission]/[instrument]/data[1-n]
– doi: 10.5067/Aura/HIRDLS/data1234
– doi: 10.5067/ICESat/GLAS/data1234
– doi: 10.5067/Aqua/AMSR-E/data1234
• [campaign]/[measurement group]/data[1-n]
– doi: 10.5067/BOREAS/Airborne/data1234
• [campaign]/[platform group]/data[1-n]
• [program]/[measurement group]/data[1-n]
– doi: 10.5067/MEaSUREs/OceanFluxes/data1234
– doi: 10.5067/MEaSUREs/SnowExtent/data1234
• [measurement group]/data[1-n]
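The suffix models above all follow one pattern: one or more name groups, then "data" and a sequence number. A short sketch of building such a DOI, assuming the 10.5067 prefix from the examples; the helper name `pilot_doi` and its checks are hypothetical, not part of any ESDIS tool.

```python
PREFIX = "10.5067"  # prefix shown in the slide's examples


def pilot_doi(*groups: str, number: int) -> str:
    """Join name groups and a sequence number as '<prefix>/<groups>/data<n>'."""
    if not groups:
        raise ValueError("at least one group name is required")
    if number < 1:
        raise ValueError("data number must be 1 or greater")
    for group in groups:
        # Each group is a single path segment, so no slashes or spaces.
        if not group or "/" in group or " " in group:
            raise ValueError(f"bad group name: {group!r}")
    return f"{PREFIX}/{'/'.join(groups)}/data{number}"


print(pilot_doi("Aura", "HIRDLS", number=1234))
print(pilot_doi("MEaSUREs", "SnowExtent", number=1234))
```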
9. DOI Registration and Guidelines
• A DOI will be assigned to each EOSDIS standard data
product
• The DOI subscription holder (ESDIS) will provide location &
citation metadata to the DOI subscription provider (CDL EZID) and
will be notified when the DOI has been registered
– Ideally we want one DOI per data item but the registry
does not preclude multiple registrations of similar data
• New DOI metadata can be uploaded as frequently as desired
– Typically when location or citation information changes
• A major new version of the data product would be assigned a
new DOI. DOIs of old versions that are no longer available
would have updated locators that point to the new version
(with explanation)
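The versioning rule above can be sketched with a small in-memory model. This is an illustration only: `DoiRegistry` is a hypothetical stand-in, not the EZID or DataCite service, and the DOIs and example.org locators are made up.

```python
from dataclasses import dataclass


@dataclass
class Record:
    locator: str    # URL the DOI currently resolves to
    note: str = ""  # explanation shown when a DOI has been redirected


class DoiRegistry:
    """Toy registry mapping each DOI to its current locator."""

    def __init__(self):
        self._records = {}

    def register(self, doi: str, locator: str) -> None:
        if doi in self._records:
            raise ValueError(f"{doi} is already registered")
        self._records[doi] = Record(locator)

    def resolve(self, doi: str) -> Record:
        return self._records[doi]

    def supersede(self, old_doi: str, new_doi: str, new_locator: str) -> None:
        """Give the new major version its own DOI and repoint the old one."""
        self.register(new_doi, new_locator)
        old = self._records[old_doi]
        old.locator = new_locator
        old.note = f"superseded by {new_doi}"


registry = DoiRegistry()
registry.register("10.5067/Aura/HIRDLS/data1", "http://example.org/hirdls/v1")
registry.supersede(
    "10.5067/Aura/HIRDLS/data1",
    "10.5067/Aura/HIRDLS/data2",
    "http://example.org/hirdls/v2",
)
print(registry.resolve("10.5067/Aura/HIRDLS/data1"))
```

The old DOI is never deleted, so a citation to the earlier version still resolves.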
10. Guidelines for DOI suffix
• The DOI itself should be a relatively short string so that users
can read it from printed material or a display and key it into a
browser with minimum error.
• The DOI suffix (ASCII characters with no spaces):
– Would be a descriptive name of domain-specific structure that reflects
the science data product contents
– Should have some recognition by the research community, such as a
semantic name or acronym, e.g.,
instrument/platform/campaign/investigation name or measurement
parameter
– Should help readers distinguish between published paper and dataset
– Should not have organizational reference subject to change (i.e.,
publisher, archive, owner)
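The mechanical parts of these guidelines (ASCII only, no spaces, relatively short) can be checked automatically; the semantic ones cannot. A minimal sketch follows, in which the function name `suffix_problems` is hypothetical and the 40-character limit is an assumption, since the guideline gives no number.

```python
from __future__ import annotations

MAX_LENGTH = 40  # assumed limit; the guideline only says "relatively short"


def suffix_problems(suffix: str, max_length: int = MAX_LENGTH) -> list[str]:
    """Return the guideline violations found in a proposed DOI suffix."""
    problems = []
    if not suffix:
        problems.append("empty suffix")
    if not suffix.isascii():
        problems.append("contains non-ASCII characters")
    if any(ch.isspace() for ch in suffix):
        problems.append("contains spaces")
    if len(suffix) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems


print(suffix_problems("Aura/HIRDLS/data1"))
print(suffix_problems("Aura HIRDLS data1"))
```

An empty list means only that the suffix passes these checks; whether the name is recognizable to the research community still needs human review.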
11. Member Institute using DataCite (RA):
California Digital Library and EZID
• EZID is a service providing researchers a way to manage identifiers
persistently for datasets, files, and resources of all types.
• The service is available via a machine to machine programming
interface (an API) and as a web user interface.
• Core functions:
– Create a persistent identifier: DOI
– Add object location (URL landing page, separate from citation)
– Add citation metadata (DataCite repository, mandatory shown below)
• Creator (person or organization)
• Title (long name of dataset)
• Publisher (holder of the data – organization making it available)
• Publication Year (year when data was, or will be first available)
– Update object location
– Update object metadata
13. Registration Agent: DataCite
• DataCite established a scientific data application with IDF.
• Service is run by an open membership organization of government and
educational libraries.
• Focused on improving the scholarly infrastructure around datasets.
• Most appropriate RA because of its focus on working with data centers to
assign persistent identifiers to datasets, leveraging the Digital Object
Identifier (DOI) infrastructure.
United States Member Institutes
– California Digital Library (Founding Member)
• Recommended subscription provider because of bulk pricing and EZID
Web/API services
– Office of Scientific and Technical Information, US
Department of Energy (new Member Dec 2010)
– Purdue University Libraries (Member)
– Interuniversity Consortium for Political and
Social Research - ICPSR (Associate Member)
– Microsoft Research (Associate Member)
TIB: German National Library of Science and Technology
Editor's notes
Debate in the LSID community weakens it: an LSID is a locator, but the ObjectID part of it is also an identifier, and most people use a UUID for the ObjectID part.
OID problem
ARK is a bit better than the rest of the locators because it has additional trust value ... maybe the color should have been more orange than yellow but I didn’t want to add more colors.
Working with California Digital Library (Joan Starr), Dept Of Energy (Sharon Jordan), and NASA Scientific & Technical Information (Gerald Steeman)
Metadata – descriptions that assign meaning to the data product.
DOI added by DAAC to product metadata sent to GCMD and ECHO
DOI embedded into product metadata by science product generation system in next reprocessing campaign
RE: NSIDC chart Metadata Evolution for NASA Data Systems (MENDS)
Metadata for some instrument-products is entirely contained in sections of the product data files.
Extensions must be kept in separate files and linked to the product files.
Provenance collection (e.g., DOI) could be accomplished at various places and times – TBD support for provenance services.
DOI could be inserted into granules, collection or granule level metadata, added to technical documentation in NTRS to allow DOI-based queries.
The suffix is an alpha-numeric string and has no special significance to the DOI system other than uniqueness and permanence.
Founding Members: the British Library; the Technical Information Center of Denmark; TU Delft Library; the National Research Council’s Canada Institute for Scientific and Technical Information (NRC-CISTI); California Digital Library; Purdue University; and the German National Library of Science and Technology.
Membership:
DataCite has two levels of participation: full membership and associate membership. Full membership is geared towards national libraries and data centers, while associate membership is open to a broader group of organizations who support the aims and interests of DataCite.
Managing Agent: TIB defers to local DataCite Member who provides access to DataCite service for minting DOIs.