Update on HDF, including recent changes to the software, upcoming releases, collaborations, and future plans. Will include an overview of the upcoming HDF5 1.8 release, and updates on the netCDF4/HDF5 merge, HDF5 support for indexing, BioHDF, the HDF5-Storage Resource Broker project, the NPOESS BAA, the HDF5-OPeNDAP project, HDF-EOS library and website support, and the HDF spin-off THG.
4. “The HDF Group” = “THG”
Founded Dec. 2006
Went solo July 15, 2006
Non-profit
5. THG mission
To support the vast community of HDF
users and to ensure the sustainable
development of HDF technologies and
the ongoing accessibility of HDF-stored
data.
6. The HDF Team
Frank Baker
Christian Chilan
Peter Cao
Vailin Choi
Mike Folk
Anne Jennings
Barbara Jones
Quincey Koziol
James Laird
Raymond Lu
John Mainzer
Matthew Needham
Pedro Nunes
Tammi O’Neill
Elena Pourmal
Binh-minh Ribler
Randy Ribler
Rishi Sinha
Kent Yang
And all those wonderful folks out there
who contribute ideas, requests, bug
reports, code, and support.
9. Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • Crays SV1 and TS IEEE
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.7, 2.8, 2.9
  • Windows 2000
  • MAC OSX 10.3
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
10. Platforms to be added
• Systems
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel
  • Cray XT3
  • Windows 64-bit (?)
  • Linux 2.6
  • HPUX 11.23
  • IBM Power 5
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
11. New features
• Configuration
• Switched to the F77_FUNC macro for better Fortran support (no more hard-coded compilers!)
• Support for shared libraries
• Library
• No hard-coded limit on number of opened files
• New APIs to control number of files opened by
application
• Fortran support for SZIP compression
12. Bug fixes
• Tools
  • Many improvements to the hdp, hrepack, hdiff, and hdfimport utilities based on user feedback
• Library
  • Fixed data corruption with several open unlimited-dimension SDSs
  • Better handling of SDSs with duplicate names in SDgetdimscale, and more
14. No new releases!
• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from:
  hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
  http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
15. Platforms to be dropped
• Operating systems
  • HPUX 11.00
  • MAC OS 10.3
  • AIX 5.1 and 5.2
  • SGI IRIX64-6.5
  • Linux 2.4
  • Solaris 2.8 and 2.9
• Compilers
  • GNU C compilers older than 3.4 (Linux)
  • Intel 8.*
  • PGI V. 5.*, 6.0
  • MPICH 1.2.5
16. Platforms to be added
• Systems
  • Alpha Open VMS
  • MAC OSX 10.4 (Intel)
  • Solaris 2.* on Intel (?)
  • Cray XT3
  • Windows 64-bit (32-bit binaries)
  • Linux 2.6
  • BG/L
• Compilers
  • g95
  • PGI V. 6.1
  • Intel 9.*
  • MPICH 1.2.7
  • MPICH2
18. HDF5 1.8 – new library features
• Datatype and dataspace features
  • Serialized dataspaces and datatypes
  • Ability to create data type from text description
  • Integer to float conversions during I/O
  • Revised exception handling during type conversion
  • Compact storage for N-bit data types
  • Offset+size storage filter, saving space
  • “Null” dataspace – datasets with no elements
  • Data transformation filter
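The space savings from compact N-bit storage and the offset+size filter come from storing each integer as a small offset from a minimum value, using only as many bits as the value range needs. The C sketch below illustrates that arithmetic only. It is not the HDF5 filter code: the function names are hypothetical, and it assumes the released 1.8 library exposes the idea through filters such as H5Pset_nbit and H5Pset_scaleoffset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to store any value in
 * the range [0, span]. A span of 0 (all values equal) needs 0 bits. */
static unsigned bits_for_span(uint64_t span)
{
    unsigned bits = 0;
    while (span > 0) {
        bits++;
        span >>= 1;
    }
    return bits;
}

/* Hypothetical helper: return the minimum of n (n >= 1) signed values and
 * report in *bits_out how many bits each (value - minimum) offset needs.
 * A writer would store the minimum once, then each offset in that many bits. */
static int64_t offset_size_params(const int64_t *vals, size_t n,
                                  unsigned *bits_out)
{
    int64_t min = vals[0], max = vals[0];
    for (size_t i = 1; i < n; i++) {
        if (vals[i] < min) min = vals[i];
        if (vals[i] > max) max = vals[i];
    }
    /* Unsigned subtraction avoids signed overflow for wide ranges. */
    *bits_out = bits_for_span((uint64_t)max - (uint64_t)min);
    return min;
}
```

For the values 1000, 1003, 1007, and 1001, the minimum is 1000 and the largest offset is 7, so each value fits in 3 bits instead of 64.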
19. HDF5 1.8 – new library features
• Group revisions
  • Creation order access
  • Compact groups – small groups take less space
  • Large group storage improvements
  • Intermediate group creation
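Creation order access means a group remembers the order in which links were added, so an application can iterate in that order rather than only alphabetically by name. The C sketch below models this with a hypothetical in-memory link list. In the released 1.8 API the feature is presumably enabled through the group creation property list (H5Pset_link_creation_order) and used by iterating with the H5_INDEX_CRT_ORDER index type. The types and functions here are illustrative assumptions, not HDF5 code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory model of a group's links: each link keeps the
 * counter value assigned when it was created. */
typedef struct {
    const char *name;
    unsigned long corder; /* creation order: 0, 1, 2, ... */
} link_entry;

static int by_name(const void *a, const void *b)
{
    return strcmp(((const link_entry *)a)->name,
                  ((const link_entry *)b)->name);
}

static int by_corder(const void *a, const void *b)
{
    unsigned long x = ((const link_entry *)a)->corder;
    unsigned long y = ((const link_entry *)b)->corder;
    return (x > y) - (x < y);
}

/* Sort links for iteration either alphabetically (use_corder == 0) or by
 * creation order (use_corder != 0). */
static void order_links(link_entry *links, size_t n, int use_corder)
{
    qsort(links, n, sizeof *links, use_corder ? by_corder : by_name);
}
```

Storing the creation counter with each link is what makes both orders cheap to produce. Without it, the original insertion order could not be recovered once links are indexed by name.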
20. HDF5 1.8 – new library features
• Link improvements
  • External links -- can refer to objects in another file
  • User-defined links – apps create their own kinds of links
• Attribute improvements
  • Storage improvements for large numbers of attributes
  • Iterate or look up by creation order
21. HDF5 1.8 – new library features
• Support for Unicode UTF-8 character set
• Shared header info – duplicated header information is stored once, possibly saving space
• Metadata cache improvements – faster I/O on
files with many objects
• Data transformation filter
• Stackable Virtual File Drivers
• Better UNIX/Linux portability
22. HDF5 1.8 – new APIs
• New extensible error-handling API
• New APIs for fast copying of objects between files
• Dimension scale model and API
• “HDFpacket” – API to read/write packets efficiently
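The packet API targets streams of fixed-size records that are appended at the end and later read back by position. The C sketch below illustrates that access pattern with a hypothetical in-memory table. It is not the HDFpacket implementation. The HDF5 high-level library does have a packet table (H5PT) interface built on this idea, but the names here are invented for illustration, and a real table would write to an extendible HDF5 dataset rather than a heap buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory packet table: fixed-size records appended at the
 * end and read back by index. */
typedef struct {
    unsigned char *data;
    size_t packet_size; /* bytes per packet */
    size_t count;       /* packets stored */
    size_t capacity;    /* packets that fit before regrowing */
} packet_table;

static void pt_init(packet_table *pt, size_t packet_size)
{
    pt->data = NULL;
    pt->packet_size = packet_size;
    pt->count = 0;
    pt->capacity = 0;
}

/* Append n packets; returns 0 on success, -1 if allocation fails. */
static int pt_append(packet_table *pt, size_t n, const void *packets)
{
    if (pt->count + n > pt->capacity) {
        size_t cap = pt->capacity ? pt->capacity : 16;
        while (cap < pt->count + n)
            cap *= 2; /* geometric growth keeps appends cheap on average */
        unsigned char *p = realloc(pt->data, cap * pt->packet_size);
        if (p == NULL)
            return -1;
        pt->data = p;
        pt->capacity = cap;
    }
    memcpy(pt->data + pt->count * pt->packet_size, packets,
           n * pt->packet_size);
    pt->count += n;
    return 0;
}

/* Copy packet `index` into `out`; returns 0 on success, -1 if out of range. */
static int pt_read(const packet_table *pt, size_t index, void *out)
{
    if (index >= pt->count)
        return -1;
    memcpy(out, pt->data + index * pt->packet_size, pt->packet_size);
    return 0;
}

static void pt_free(packet_table *pt)
{
    free(pt->data);
    pt->data = NULL;
    pt->count = pt->capacity = 0;
}
```

Appending many packets per call, rather than one at a time, is what makes high-speed ingest practical. Each call then amortizes its overhead over a whole batch.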
24. HDF5 1.8 vs. 1.6.5
• Differences between 1.8 and 1.6.5
  • Some file format changes
  • Several new routines added
  • Old APIs deprecated; to be removed in a later release
• Consequences
  • Applications that use features requiring 1.8 format changes will write objects that the 1.6.5 library cannot read
  • To exploit 1.8 changes, apps need to be rewritten
25. Principle of “Maximum file format compatibility”
Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information.
This assures forward compatibility with older versions whenever possible: objects in new files can be read with old libraries if those objects are “known” to the old libraries.
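The principle is essentially a selection rule: pick the oldest format version whose features cover what the object needs, unless the application asks for the newest. The C sketch below shows that rule with a hypothetical feature table. It is not how the library stores its format versions. In the released 1.8 library, the "instructed otherwise" case is expected to go through the file access property list (for example H5Pset_libver_bounds with H5F_LIBVER_LATEST).

```c
/* Hypothetical feature flags that an object's header message might need. */
enum {
    FEAT_BASIC     = 1u << 0,
    FEAT_COMPACT   = 1u << 1, /* e.g. compact storage */
    FEAT_CRT_ORDER = 1u << 2  /* e.g. tracked creation order */
};

#define NVERSIONS 3

/* Hypothetical table: the feature set each format version can describe,
 * indexed by version number. Later versions are supersets of earlier ones. */
static const unsigned version_features[NVERSIONS] = {
    FEAT_BASIC,
    FEAT_BASIC | FEAT_COMPACT,
    FEAT_BASIC | FEAT_COMPACT | FEAT_CRT_ORDER
};

/* Return the earliest version that can describe an object needing the
 * features in `needed`, or the newest version when `use_latest` is set
 * (the "instructed otherwise" case). Returns -1 if no version fits. */
static int choose_version(unsigned needed, int use_latest)
{
    if (use_latest) {
        int newest = NVERSIONS - 1;
        return (version_features[newest] & needed) == needed ? newest : -1;
    }
    for (int v = 0; v < NVERSIONS; v++) {
        if ((version_features[v] & needed) == needed)
            return v;
    }
    return -1;
}
```

An object that needs only basic features is written in version 0, which the oldest library can read. Only an object that needs creation-order tracking forces version 2, and only that object becomes unreadable to older libraries.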
27. New features for old tools
• h5dump
• Dump data in binary format
• h5diff
• Compare dataset regions
• Parallel h5diff (ph5diff)
• Compare two files in MPI parallel environment
• h5repack
• Efficient data copy using H5Gcopy()
• Able to handle big datasets
28. New HDF5 Tools
• h5copy
• Copies a group, dataset, or named datatype from one location to another
• Copies within a file or across files
• h5check
• Verifies an HDF5 file against the defined HDF5
File Format Specification
• h5stat
• Reports statistics about a file and objects in a file
30. HDFView changes
• Quality improvements for HDF-java package
• Full documentation of hdf-java object package
• Test suite for hdf-java object package
• Support 64-bit Java on Linux and Solaris
• Many new features, including
  • Change font size easily
  • Grab and move image
  • Create new table (compound dataset) from template
  • Filter out fill value for image creation
  • -geometry option for very high resolution displays
31. Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView 2.4 with bug fixes/new
features with HDF5 1.8 release
• New GUI features dealing with table, image
and animation
• Writing capability for HDF5-SRB model
33. Website for HDF-EOS Tools
• THG now manages HDF-EOS web site
  • Registered domain names: hdfeos.net/.org/.com
  • Re-implemented major topic areas
  • Re-designed interface
  • Registered Google search
• Will continue maintenance
• Phase two
• Host mailing list
• Support simple forum features
37. HDF5 - PnetCDF performance comparison
Flash I/O Benchmark (Checkpoint files)
[Figure: bandwidth (MB/s) vs. number of processors on Power 5, for PnetCDF, HDF5 collective, and HDF5 independent I/O]
I/O performance of PnetCDF is comparable with
parallel HDF5 when the libraries are used in similar
manners.
38. Parallel NetCDF4 - PnetCDF comparison
[Figure: bandwidth (MB/s) vs. number of processors (up to 144), for PnetCDF collective and NetCDF4 collective I/O]
I/O performance of parallel NetCDF4 is comparable with PnetCDF, about 15% slower on average when writing the ROMS history file.
39. Collective I/O improvements
• HDF5 supports collective I/O for non-regular selections
• Collective I/O for chunked storage is not trivial
• Non-regular selection performance optimizations:
  • Added I/O options to achieve good collective I/O performance
  • Added APIs for applications to participate in the optimization process
• See the poster
41. DOE ASC* and Others
• Support HDF5 on major systems at Sandia &
Lawrence Livermore National Laboratories
• R&D efforts underway
  • File recovery after a crash
  • Very fast write speed – goal is 300 MB/sec
  • Read-while-writing capability
  • Java library and HDFView improvements
* Advanced Simulation and Computing program
44. Boeing HDF5 for flight test data
• Boeing 787 active archive
• 10 TB per flight-test day
• Must handle raw, real-time data
• High speed ingest, by “packet”
• Post-processing, by “time-history”
• Boeing high-level APIs
• HDFpacket – released with HDF5 1.8
• HDFtime_history – new, open version likely
48. Agilent C# project
• Why?
• Heavy use of C# at Agilent
• Compatibility with Matlab
• Other interest in HDF5 at Agilent
• What?
• Prototype API in C# for Windows XP
• Basic functions to create, open, close, read, write
• Limited datatypes, no partial I/O
• When?
• March 2007
49. HDF5 Software (layers, top to bottom)
• Tools & Applications
• Fortran, C++, Java, C#
• C API
• HDF I/O Library
• HDF File
51. NetCDF 4 project
• Enhanced NetCDF-4 Interface to HDF5
• Combine features of netCDF and HDF5
• Take advantage of their separate strengths
• Collaboration between NCSA, THG, Unidata
• Currently in Alpha Release
• Waiting for beta release
53. Archival formats
• Proposal to NOAA Scientific Data
Stewardship program
• Will investigate use of OAIS “Archive
Information Package” standard with HDF5
• PI: Ruth Duerr (NSIDC) and Kent Yang
OAIS: Open Archival Information System
55.
• Huge streams of data collected …
• To be accessed in little bits…
56. Challenge – efficient remote access
• How do we efficiently find and access data
from distributed repositories, when the data
are big and complex?
• Storage Resource Broker (SRB)
• Efficient access to HDF5 objects in repository
• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
57. Example – Storage resource broker
• Storage Resource Broker – repository for
heterogeneous data collections
• Simplifies storage, query and access to massive
amounts of scientific data
• Has data in HDF5, netCDF, other formats
59. OPeNDAP-HDF5 project
• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
• Replaces direct file access with remote query and
access
• Widely used in Earth Sciences
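A core step in serving HDF5 through OPeNDAP is translating a client's index constraint into an HDF5 selection. The C sketch below assumes the DAP2 array-constraint syntax `[start:stride:stop]`, `[start:stop]`, or `[index]`, with an inclusive stop index. It maps one dimension onto the start/stride/count triple that H5Sselect_hyperslab takes. The type and function names are hypothetical, and this is not the project's server code.

```c
#include <ctype.h>
#include <stdlib.h>

/* One dimension of an HDF5-style hyperslab selection. */
typedef struct {
    unsigned long start, stride, count;
} hyperslab_dim;

/* Translate a DAP2-style constraint for one dimension, such as "[0:2:10]",
 * "[3:7]" or "[5]", into start/stride/count. DAP2 stop indices are
 * inclusive. Returns 0 on success, -1 on a malformed or empty range. */
static int dap_to_hyperslab(const char *c, hyperslab_dim *out)
{
    unsigned long v[3];
    int n = 0;
    char *end;

    if (*c++ != '[')
        return -1;
    for (;;) {
        if (!isdigit((unsigned char)*c))
            return -1;                 /* rejects signs, spaces, empty fields */
        v[n++] = strtoul(c, &end, 10);
        c = end;
        if (*c == ']')
            break;
        if (*c != ':' || n == 3)
            return -1;                 /* bad separator or too many fields */
        c++;
    }
    if (c[1] != '\0')
        return -1;                     /* trailing characters after ']' */

    unsigned long start = v[0], stride = 1, stop = v[0];
    if (n == 2) {
        stop = v[1];
    } else if (n == 3) {
        stride = v[1];
        stop = v[2];
    }
    if (stride == 0 || stop < start)
        return -1;
    out->start = start;
    out->stride = stride;
    out->count = (stop - start) / stride + 1; /* inclusive stop */
    return 0;
}
```

For example, `[0:2:10]` selects indices 0, 2, 4, 6, 8, and 10, which gives start 0, stride 2, and count 6. A full server would apply this per dimension and pass the arrays to the dataspace selection call.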
60. OPeNDAP – HDF5 Project
• A NASA ROSES NRA project
• Tasks
  • HDF5-DAP2 server (now a prototype)
  • HDF5-DAP4 server
  • DAP4 to HDF5 conversion utility
  • Investigate integrated DAP-aware HDF5 library
62. SQL Server and HDF5
• Microsoft “dream environment for scientists”
• Combine data management, computing
• SQL Server 2005 solution
• Combine RDBMS with scientific analysis tools,
together in one integrated system.
• HDF5 & other formats manage scientific objects
63. HDF5 in SQL server
• Visualization Libraries (MATLAB,…); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting
• .NET Languages with Language Integrated Query
• Entity Framework (EDM, eSQL, O-R mapping)
• HDF5 EDM model
• SQL Server: HDF5 TVFs, HDF5 Index, HDF5 type, HDF5 FS blob
• HDF5 files
65. Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and
conclusions or recommendations expressed in this
material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.