Moving from HDF4 to HDF5/netCDF-4
1. Moving from HDF4 to HDF5/netCDF-4
Elena Pourmal, Kent Yang, Joe Lee
epourmal@hdfgroup.org, myang6@hdfgroup.org, hyoklee@hdfgroup.org
The HDF Group
This work was supported by NASA/GSFC under
Raytheon Co. contract number NNG15HZ39C
2. Outline
• Differences between the HDF4 and HDF5 data
models and capabilities
• Moving data and applications from HDF4
to HDF5
• Taking advantage of HDF5 when
converting data
• Creating compatibility with netCDF-4 when
migrating data from HDF4 to HDF5
3. HDF4 and HDF5 Data Models
HDF4 Objects
• A scientific dataset (SD), a multidimensional
array with dimension scales
• An 8-bit raster image (DFR8), a 2-dimensional
array of 8-bit pixels
• A 24-bit raster image (DF24), a 2-dimensional
array of 24-bit pixels
• A general raster image (GR), a 2-dimensional
array of multi-component pixels
• An 8-bit color lookup table or palette (DFP),
a 256 by 3 array of 8-bit integers
• A table (Vdata), a sequence of records
• An annotation (AN), a stream of text that
can be attached to any object
• A group (Vgroup), a structure for grouping
objects
HDF5 Objects
• A dataset, a multidimensional array of
records; dimension scales are not built in but
are provided by the high-level (HL) library
• HDF5 dataset with attributes
• HDF5 dataset with attributes
• HDF5 dataset with attributes
• HDF5 dataset with attributes
• HDF5 dataset with attributes
• HDF5 one-dimensional dataset of records
• Attributes, a scaled-down version of an HDF5
dataset
• A group, a structure for grouping objects
4. HDF4 and HDF5 Capabilities
HDF4
• 2GB limit on file size
• Limit on number of objects
(~20000)
• Only one unlimited dimension; a dataset with
an unlimited dimension cannot be compressed
• Compression doesn’t require
chunking storage
• Limited number of compression
methods
• Limited number of supported
datatypes
• Numerical data is always stored in
big-endian (BE) format
HDF5
• No limit on file size
• No limit on number of objects
• Up to 32 unlimited dimensions;
dataset can be compressed
• Compression requires chunking
storage
• Custom compression methods
supported
• Datatypes of any complexity
• User-defined endianness
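The endianness difference matters most when data written in one byte order is read on hardware that uses the other. The following is a minimal sketch using only Python's standard `struct` module, not the HDF libraries. It shows how the same 32-bit integer is laid out in big-endian order, as HDF4 always stores it, and in little-endian order, which an HDF5 writer may choose. The sample value is an arbitrary assumption.

```python
import struct
import sys

value = 0x12345678  # arbitrary sample value, chosen so each byte is distinct

# Big-endian layout: most significant byte first (how HDF4 stores numbers).
big_endian = struct.pack(">i", value)

# Little-endian layout: least significant byte first (one choice HDF5 permits).
little_endian = struct.pack("<i", value)

print(big_endian.hex())     # 12345678
print(little_endian.hex())  # 78563412

# The two layouts are byte-reversed copies of each other, so a reader whose
# native order differs from the stored order must swap bytes for every value.
assert big_endian == little_endian[::-1]

# Decoding with the matching byte order recovers the original value.
assert struct.unpack(">i", big_endian)[0] == value
assert struct.unpack("<i", little_endian)[0] == value

print(sys.byteorder)  # native byte order of the machine running this sketch
```

Choosing the byte order of the machines that will read the data most often avoids that per-value swap.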
5. Moving data and applications from HDF4 to HDF5
• Moving Data
– H4h5tools conversion toolkit
• Mapping Spec
https://support.hdfgroup.org/HDF5/doc/ADGuide/H4toH5Mapping.pdf
• Library
• Command-line tools h4toh5 and h5toh4
• Moving Applications
– Software that calls the HDF4 library directly has to be rewritten
– HDF-EOS2 and netCDF-based applications
require minimal rework
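As a rough illustration of the command-line route, the sketch below wraps `h4toh5` in a small Python helper. The helper names `build_h4toh5_command` and `convert` and the file names are hypothetical. The sketch assumes `h4toh5` is on the `PATH` and takes the input HDF4 file followed by the output HDF5 file; check the tool's own help for the options your installed version supports.

```python
import shutil
import subprocess


def build_h4toh5_command(hdf4_path, hdf5_path):
    """Return the argument list for converting one HDF4 file to HDF5."""
    return ["h4toh5", hdf4_path, hdf5_path]


def convert(hdf4_path, hdf5_path):
    """Run h4toh5 if it is installed; return True only if a conversion ran."""
    if shutil.which("h4toh5") is None:
        # Tool not found on PATH: report it instead of raising.
        print("h4toh5 not found; install H4H5Tools to run the conversion")
        return False
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(build_h4toh5_command(hdf4_path, hdf5_path), check=True)
    return True


# Hypothetical file names, used only to show the resulting command line.
print(" ".join(build_h4toh5_command("input.hdf", "output.h5")))
```

Keeping the command construction separate from the call makes it easy to loop over a directory of HDF4 files and to log exactly what was run.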
6. Taking advantage of HDF5 and avoiding pitfalls
• Data endianness
• Chunked storage for compression and
data extensibility
– Contiguous vs. chunked storage
– Chunk sizes
• Compression methods in HDF4 and HDF5
• Using strings in HDF5
– HDF4 fixed character arrays vs HDF5 strings
• Working with dimension scales
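Chunk size choices are easier to reason about with numbers. The sketch below is plain Python arithmetic, not an HDF5 API call. For a hypothetical dataset shape, chunk shape, and element size, it reports how many chunks the dataset is divided into and how many bytes each chunk holds before compression. The function name `chunk_stats` and the example dimensions are assumptions chosen for illustration.

```python
import math


def chunk_stats(shape, chunk_shape, element_size):
    """Return (number_of_chunks, bytes_per_chunk) for a chunked dataset.

    Partial chunks at the edges are counted as whole chunks.
    """
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same rank")
    # Number of chunks needed along each dimension, rounding up at the edges.
    chunks_per_dim = [math.ceil(n / c) for n, c in zip(shape, chunk_shape)]
    number_of_chunks = math.prod(chunks_per_dim)
    bytes_per_chunk = math.prod(chunk_shape) * element_size
    return number_of_chunks, bytes_per_chunk


# Hypothetical 3-D dataset of 4-byte values: 365 time steps on a 720 x 1440 grid.
shape = (365, 720, 1440)

# One chunk per time step: reading one full map touches exactly one chunk.
print(chunk_stats(shape, (1, 720, 1440), 4))  # (365, 4147200)

# Small spatial tiles spanning many time steps: suits time-series reads at a point.
print(chunk_stats(shape, (100, 36, 36), 4))   # (3200, 518400)
```

The two layouts store the same data, but each favors a different access pattern, which is why the chunk shape should follow how the data will be read.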
7. Creating compatibility with netCDF-4
• HDF5 files can be read by netCDF-4 library and
tools unless they use features that are not
supported by netCDF-4
• For the HDF5 features that netCDF-4 does not support, see
http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#fv15
• To ensure maximum interoperability, do not use
– Hierarchical HDF5 structure (nested groups)
– HDF5 user-defined datatypes
– HDF5 compound datatypes
• http://www.unidata.ucar.edu/software/netcdf/docs/interoperability_hdf5.html
8. Creating compatibility with netCDF-4 (to be expanded)
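Tool support for each output format:
• HDF-EOS5 augmentation tool
– NetCDF-3 format: No
– NetCDF-3 format following CF conventions: No
– NetCDF-4 format following the netCDF-4 generic model: Yes. Note: Users have flexibility to specify dimension scales. Tested with NASA Aura files.
– NetCDF-4 format following the netCDF-4 classic model: No
– NetCDF-4 format following the netCDF-4 classic model and CF: No
• HDF-EOS5 to netCDF-4 converter
– NetCDF-3 format: No
– NetCDF-3 format following CF conventions: No
– NetCDF-4 format following the netCDF-4 generic model: Yes. Note: Users have no control. The converter tries to map the HDF-EOS5 dimension information provided by the file to the netCDF-4 enhanced model.
– NetCDF-4 format following the netCDF-4 classic model: No
– NetCDF-4 format following the netCDF-4 classic model and CF: No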
NetCDF-4 intentionally supports a simpler data model than HDF5, which means there are HDF5 files that cannot be converted to netCDF-4, including files that make use of features in the following list:
• Multidimensional data that doesn't use shared dimensions implemented using HDF5 "dimension scales". (This restriction was eliminated in netCDF 4.1.1, permitting access to HDF5 datasets that don't use dimension scales.)
• Non-hierarchical organizations of Groups, in which a Group may have multiple parents or may be both an ancestor and a descendant of another Group, creating cycles in the subgroup graph. In the netCDF-4 data model, Groups form a tree with no cycles, so each Group (except the top-level unnamed Group) has a unique parent.
• HDF5 "references", which are like pointers to objects and data regions within a file. The netCDF-4 data model does not support references.
• Additional primitive types not included in the netCDF-4 data model, including H5T_TIME, H5T_BITFIELD, and user-defined atomic types.
• Multiple names for data objects such as variables and groups. The netCDF-4 data model requires that each variable and group have a single distinguished name.
• Attributes attached to user-defined types.
• Stored property lists.
• Object names that begin or end with a space.
If you know that an HDF5 file conforms to the netCDF-4 enhanced data model, either because it was written with netCDF function calls or because it doesn't make use of HDF5 features in the list above, then it can be accessed using netCDF-4, and analyzed, visualized, and manipulated through other applications that can access netCDF-4 files.
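The restriction on Groups described above can be checked mechanically. The following is a minimal sketch, assuming the group links of a file have already been collected into a plain dictionary that maps each group name to the names of its child groups. The dictionary layout, the group names, and the function name `is_tree` are hypothetical, and no HDF5 or netCDF library is involved. It reports whether the groups reachable from the root form a tree, meaning that no group is reached twice, whether through a second parent, a second name, or a cycle.

```python
def is_tree(children, root="/"):
    """Return True if the group graph reachable from root is a tree."""
    seen = {root}
    stack = [root]
    while stack:
        group = stack.pop()
        for child in children.get(group, []):
            if child in seen:
                # Reached a second time: the child has more than one parent
                # or name, or the links form a cycle.
                return False
            seen.add(child)
            stack.append(child)
    return True


# A plain hierarchy: every group has exactly one parent.
tree = {"/": ["/a", "/b"], "/a": ["/a/c"]}

# One group linked from two parents, which netCDF-4 cannot represent.
shared = {"/": ["/a", "/b"], "/a": ["/shared"], "/b": ["/shared"]}

# A group that links back to the root, creating a cycle.
cycle = {"/": ["/a"], "/a": ["/"]}

print(is_tree(tree))    # True
print(is_tree(shared))  # False
print(is_tree(cycle))   # False
```

Groups that are not reachable from the root are ignored by this sketch, so a complete check would also compare `seen` against the full set of groups in the file.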
In addition to using the tools, one can create netCDF-4 files directly through the HDF5 API. Appendix B, "Creating a valid netCDF4 file using the HDF5 API", of the NASA file format standard document (https://cdn.earthdata.nasa.gov/conduit/upload/497/ESDS-RFC-022v1.pdf) can serve as a reference.