2. Topics
•Why metadata is important
•Types of metadata in HDF-EOS files
•Required metadata
•How metadata is encoded and delivered
HDF-EOS Workshop II
SJSK 2
3. What is Metadata?
•Metadata is information that identifies and
characterizes an information product.
•Sometimes called “data about data”
4. Users Need Metadata
•Metadata is needed to answer questions such
as:
- What time and location does this data apply to?
- What type of instrument and processing produced
the data?
- What other inputs were used to generate the data?
- What QA has been performed on this data?
- Who do I contact if I have questions about this data?
5. Metadata is Essential
•Large data archive systems cannot function
without metadata.
•Metadata is used to keep track of such things
as:
- where the data is
- what type of operations are possible on the data
- whether there are any access restrictions on the data
- how individual data files are logically grouped into
“collections.”
6. Key Concepts
•A granule is the smallest aggregation of data
that is independently described and inventoried
by the ECS. A granule consists of 1 or more
physical files.
•A collection is a logical grouping of granules.
•The ECS Data Model allows for:
- “Core” attributes
- “Product-Specific” Attributes (PSAs)
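The relationship between granules, collections, and the two attribute families can be sketched as a small data structure. This is only an illustrative model, not the ECS Data Model itself: the class names, field names, and the placeholder values ("EXAMPLE", "1", "example_granule.hdf") are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Granule:
    """Smallest independently described unit; made of one or more files."""
    files: List[str]
    core: Dict[str, str] = field(default_factory=dict)  # "Core" attributes
    psas: Dict[str, str] = field(default_factory=dict)  # Product-Specific Attributes

    def __post_init__(self) -> None:
        # A granule with no physical file is not a valid granule.
        if not self.files:
            raise ValueError("a granule consists of one or more physical files")


@dataclass
class Collection:
    """Logical grouping of granules (hypothetical illustration)."""
    short_name: str
    version_id: str
    granules: List[Granule] = field(default_factory=list)

    def add(self, granule: Granule) -> None:
        # Copy the collection identity onto the granule so both levels agree.
        granule.core["ShortName"] = self.short_name
        granule.core["VersionID"] = self.version_id
        self.granules.append(granule)


# Placeholder values, chosen only for illustration.
collection = Collection(short_name="EXAMPLE", version_id="1")
collection.add(Granule(files=["example_granule.hdf"]))
```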
7. Types of Metadata
•Metadata in HDF files
- stored as global text attributes
•Types of Metadata used in HDF-EOS files:
- Structural Metadata
- Core Metadata (inventory, can include PSAs)
- Archive Metadata (non-searchable, product-specific)
•Collection level metadata
- core and product-specific
8. Required Metadata
•Origins of metadata requirements:
- what is required to archive and retrieve files
- what is required to provide search and other
services on data
- what is federally mandated (FGDC)
•There are 287 attributes in the ECS data model
- only a subset are used for any given product
- 101 are applicable at the granule level
9. Metadata Coverage
•Science Data that are delivered for archiving in
ECS must meet what is called the Intermediate
level of metadata coverage. This involves as
few as:
- 31 collection level attributes
- 4 granule level attributes
•Compliance at this level is not enforced by the
system.
11. Granule-Level Metadata for
Intermediate Coverage
•There are only four granule-level metadata attributes
required:
- ShortName
- VersionID
- SizeMBECSDataGranule
- ProductionDateTime
•ShortName and VersionID are identical to the collection-level attributes with these names.
•For granules coming into ECS, SizeMBECSDataGranule
and ProductionDateTime are supplied by the system
upon insertion.
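A data provider could check a granule's metadata against this list before delivery. The sketch below assumes the metadata is already available as a plain dictionary; the function name, the `before_insert` flag, and the placeholder values are hypothetical and are not part of any ECS tool.

```python
from typing import Dict, List

# The four granule-level attributes listed for Intermediate coverage.
REQUIRED_GRANULE_ATTRIBUTES = (
    "ShortName",
    "VersionID",
    "SizeMBECSDataGranule",
    "ProductionDateTime",
)

# Attributes the system supplies upon insertion for granules coming into ECS.
SYSTEM_SUPPLIED_ON_INSERT = ("SizeMBECSDataGranule", "ProductionDateTime")


def missing_attributes(metadata: Dict[str, str], before_insert: bool = True) -> List[str]:
    """Return the required attribute names that are absent or empty.

    With before_insert=True, attributes that the system fills in on
    insertion are not expected from the provider and are skipped.
    """
    missing = []
    for name in REQUIRED_GRANULE_ATTRIBUTES:
        if before_insert and name in SYSTEM_SUPPLIED_ON_INSERT:
            continue
        if not metadata.get(name):
            missing.append(name)
    return missing


# Placeholder values, chosen only for illustration.
provider_metadata = {"ShortName": "EXAMPLE", "VersionID": "1"}
still_needed = missing_attributes(provider_metadata)
after_insert_view = missing_attributes(provider_metadata, before_insert=False)
```

Here `still_needed` is empty because the provider has supplied both attributes it is responsible for, while `after_insert_view` names the two attributes that the system would add.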
12. How is Metadata Supplied?
•Collection-level metadata is carried in an Earth
Science Data Type (ESDT) Descriptor file.
•Granule-level metadata is defined in the
descriptor file and populated using a Metadata
Configuration File (MCF).
•Granule-level metadata is delivered in the HDF-EOS granule *or* in a populated MCF
accompanying a non-HDF granule.
•The DAAC where a collection will reside is
responsible for descriptors and ingest routines.
13. Metadata Work Flow for External
Data Providers
[Figure: workflow diagram; layout not recoverable. Labels shown: Data Provider Responsibility; DAAC Tasks; Analysis; Population; MDWorks; Data Model; Specs; Data/Docs; collection core attribute values; granule core attribute and PSA definitions; type and format check; PSA_Reg; Tools; Validation; ODL Parser; ODL syntax check; Descriptor; Validated Desc.; MCF Build; MCF; Science Software; DLL coding; SDP Toolkit; granule core values; PSA values; structural metadata; Test & Valid.; Constraints checks; Data Base Load File; HDF-EOS file; Ingest Subsystem; ESDT Insert; DAAC Data Archive.]
14. Metadata Resources on the Web
•ECS Metadata Homepage
http://ecsinfo.hitc.com/metadata/metadata.html
•Metadata Works (ESDT Descriptor Tool)
http://et3ws1.HITC.COM/metadata_works/
•EOSDIS Information Architecture
http://spsosun.gsfc.nasa.gov/InfoArch.html
•Federal Geographic Data Committee
http://www.fgdc.gov/
15. Q&A w/ Experts Panel
•Q: “If you are a new data provider, how do you get your data into an HDF-EOS granule, given
the bewildering array of utilities and tools available? What is the simplest solution for this?”
•A: The recommended solution is to obtain the HCR package, which includes the HDF-EOS and
HDF libraries. For populating the required metadata in the granule, obtain the Metadata/Time
Toolkit_MDT. The steps would be:
1. Write an HCR and use the tools to turn this into a skeletal HDF-EOS granule. (This step is
optional).
2. Use the HDF-EOS library to create a granule. (If starting with a skeletal HDF-EOS file
generated from an HCR, then plain HDF calls can be used to insert data into the granule.)
3. Use Toolkit_MDT calls to insert metadata into the granule. This requires generation of an
MCF in ODL. Metadata_Works is available for doing this. As an alternative, a simple HDF call
can be used to attach minimum metadata (in ODL) to an HDF file.
Note: if the data are going to reside in a DAAC, or in an archive that must be interoperable with
ECS, you will need to generate collection-level metadata. Metadata_Works is the recommended
tool for this.
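To give a feel for what "minimum metadata (in ODL)" might look like as text, the sketch below renders a dictionary of attributes into an ODL-style block. The GROUP/OBJECT/VALUE layout is a simplified assumption modelled on the general shape of ODL; it is not the exact output of Toolkit_MDT or Metadata_Works, and the function name and placeholder values are hypothetical. Attaching the resulting string to a file would be done with an HDF library call, which is not shown here.

```python
from typing import Dict


def render_minimal_odl(attributes: Dict[str, str]) -> str:
    """Render attributes as a simplified ODL-style text block (assumed layout)."""
    lines = ["GROUP = INVENTORYMETADATA", "  GROUPTYPE = MASTERGROUP"]
    for name, value in attributes.items():
        if '"' in value:
            # Quoting rules are not covered by this sketch, so refuse rather than guess.
            raise ValueError(f"value for {name} contains a double quote")
        obj = name.upper()
        lines += [
            f"  OBJECT = {obj}",
            "    NUM_VAL = 1",
            f'    VALUE = "{value}"',
            f"  END_OBJECT = {obj}",
        ]
    lines += ["END_GROUP = INVENTORYMETADATA", "END"]
    return "\n".join(lines) + "\n"


# Placeholder values, chosen only for illustration.
odl_text = render_minimal_odl({"ShortName": "EXAMPLE", "VersionID": "1"})
print(odl_text)
```

A real granule would normally rely on the MCF and the toolkit to produce this text, so that names, types, and mandatory fields are checked against the descriptor.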
Editor's Notes
In short, without metadata, a user of the data is in the dark.
Not all metadata is used in searching. Some metadata is merely informative and will not be used in database queries. This metadata can be viewed to assist data consumers in deciding whether to order data or not.
Metadata is needed to identify a data product once it is archived in the system.
Without metadata, users could never find a file unless they knew the precise ID of the file (like a filename in some systems, or in ECS a UR).
By supplying a rich set of metadata attributes for the data, providers make it easier for users to find the data, and to find it through a greater variety of routes and search methods.
All textual metadata (i.e. excluding things that are specifically provided for by HDF like scales and units) should be contained in HDF text attributes.
ECS compliant metadata must be written to HDF text attributes with specific names, and may span multiple attributes, numbered sequentially, to accommodate all metadata.
This metadata must also be written in ODL, or Object Description Language.
These tasks are best handled by using the SDP Toolkit.
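The idea of spreading one long ODL string across sequentially numbered text attributes can be sketched as follows. The base name "CoreMetadata", the ".0", ".1" suffix pattern, and the chunk size are assumptions made for illustration; the SDP Toolkit handles the real naming and size limits, and no HDF library call is made here.

```python
from typing import Dict


def split_into_attributes(
    odl_text: str,
    base_name: str = "CoreMetadata",  # assumed attribute base name
    max_chars: int = 65000,           # assumed per-attribute size limit
) -> Dict[str, str]:
    """Split ODL text into chunks keyed by sequentially numbered names."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = [odl_text[i:i + max_chars] for i in range(0, len(odl_text), max_chars)]
    if not chunks:
        chunks = [""]  # always emit at least attribute number 0
    return {f"{base_name}.{n}": chunk for n, chunk in enumerate(chunks)}


def join_attributes(attrs: Dict[str, str], base_name: str = "CoreMetadata") -> str:
    """Reassemble the ODL text by reading the numbered attributes in order."""
    parts = []
    n = 0
    while f"{base_name}.{n}" in attrs:
        parts.append(attrs[f"{base_name}.{n}"])
        n += 1
    return "".join(parts)


# A tiny limit is used here only to force several chunks.
sample = "A" * 25
numbered = split_into_attributes(sample, max_chars=10)
restored = join_attributes(numbered)
```

Reading the attributes back in numeric order, as `join_attributes` does, is what lets a consumer recover the complete ODL text regardless of how many attributes it was spread over.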
Collection level metadata is delivered separately from the granules and will be discussed later.
ECS requires only 2 attributes to insert and acquire granules: ShortName and VersionID. Upon granule generation, ProductionDateTime is generated by the system, and this can also be used to identify granules belonging to a collection.
Temporal coverage can also be designated by a range or by periodic attributes.
Spatial coverage can also be designated by a single point, a point and circle, or a polygon.
ECS needs to be made aware of a data set prior to the arrival of the first “granule” of data, so that the archives that will hold the data and the database tables that will hold the metadata can be set up.
This is done by defining an Earth Science Data Type (ESDT). An ESDT “descriptor” file contains all the metadata values that describe the entire “collection” of data granules.
The ESDT descriptor also identifies the metadata that will pertain to the individual granules and whose values will be supplied as each granule is “inserted” into the system.
The Distributed Active Archive Centers (DAACs) are responsible for generating ESDT descriptor files, DLLs and any custom code necessary to ingest granules into the system.
(is it appropriate to say this?)