Are you curious about what is coming next? Are you willing to try some prototype software? If so, come to this talk to learn about new HDF5 features, such as the HDF5 Fortran 2003 APIs and metadata journaling, and how your application can benefit from them.
5. Performance Improvements in HDF5
• Examples of completed work:
  • New implementation of the metadata cache to improve I/O performance and memory usage when accessing many objects (HDF5 1.8.0)
  • Faster, more scalable storage and access for large groups (HDF5 1.8.0)
October 15, 2008
HDF and HDF-EOS Workshop XII
6. Performance Improvements in HDF5
• Work in progress
  • New implementation of free-space management
    • Affects “dynamic” applications that add/delete/modify existing objects
    • When creating HDF5 objects, space is allocated from available space tracked by the free-space manager
    • When deleting objects, unused space is added to the free-space pool via the free-space manager
    • Current implementation uses O(N²) operations for every N allocations or frees
    • New implementation: O(log₂ N)
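The allocate/free cycle described above can be illustrated with a toy free-space tracker. This is only a sketch under stated assumptions, not the HDF5 implementation: the class name `FreeSpaceManager`, the best-fit policy, and the size-sorted list are all hypothetical choices made for illustration. The binary search makes the lookup logarithmic; the list insertions and removals still shift elements, so a balanced tree or similar structure would be needed for fully logarithmic updates.

```python
import bisect


class FreeSpaceManager:
    """Toy free-space tracker (hypothetical, not the HDF5 data structure).

    Free sections are stored as (size, address) pairs, sorted by size.
    """

    def __init__(self):
        self._sections = []  # sorted list of (size, address)
        self._eof = 0        # end of the simulated file

    def free(self, address, size):
        """Return a section to the pool (adjacent sections are not merged)."""
        bisect.insort(self._sections, (size, address))

    def allocate(self, size):
        """Best-fit allocation: binary-search for the smallest section >= size."""
        # Addresses are never negative, so (size, -1) sorts before any
        # real section of the same size.
        i = bisect.bisect_left(self._sections, (size, -1))
        if i == len(self._sections):
            # No free section is large enough: extend the file instead.
            address = self._eof
            self._eof += size
            return address
        found_size, address = self._sections.pop(i)
        if found_size > size:
            # Give the unused tail of the section back to the pool.
            bisect.insort(self._sections, (found_size - size, address + size))
        return address
```

In this sketch, freeing an object and then creating a smaller one reuses the freed space rather than growing the file, which is the behavior that matters for applications that add and delete objects frequently.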
7. Free-space Management
• Test: creating/deleting attributes
  • Create first set of attributes
  • Delete odd-numbered attributes from the first set
  • Create second set of attributes
  • Delete all attributes from the second set
Number of attributes | Old implementation | New implementation | Improvement ratio
500                  | 786.5 sec          | 68.2 sec           | 11.5x
1000                 | 11000 sec          | 289 sec            | 38x
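The improvement ratio in the table is the old time divided by the new time. A small sketch that recomputes it from the reported timings (the names `timings` and `improvement_ratio` are illustrative and do not come from the slides):

```python
# Reported timings in seconds: number of attributes -> (old, new)
timings = {
    500: (786.5, 68.2),
    1000: (11000.0, 289.0),
}


def improvement_ratio(old_seconds, new_seconds):
    """Speedup of the new implementation relative to the old one."""
    return old_seconds / new_seconds


for count, (old, new) in timings.items():
    print(f"{count} attributes: {improvement_ratio(old, new):.1f}x faster")
```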
8. Performance Improvements in HDF5
• Work in progress
  • Fast data append (along the slowest-changing dimension)
• Future areas of interest
  • Efficient chunking cache implementation (NetCDF4)
  • Efficient handling of variable-length data, including compression (will affect sizes of NPOESS files)
10. Status of the HDF5 Fortran Library
• The HDF5 Fortran library is part of the standard HDF5 distribution
• First release goes back to 1999
• Implemented as Fortran 90 wrappers on top of the HDF5 C library
• Supported on Linux, Windows, Mac Intel, Solaris, VMS, clusters, etc.
• Compilers
  • Open source: gfortran, g95
  • Vendors: Sun, Intel, PGI, Absoft
• 32-bit and 64-bit versions
11. HDF5 Fortran Library
• Mimics HDF5 C APIs
• Fortran 90 features used
  • Modules
  • Function overloading
  • Function interfaces
  • Dynamic memory allocation
  • Optional parameters
• Many Fortran APIs are simpler than their C counterparts
12. Current Limitations
• Supports only “native” Fortran types such as
  • INTEGER
  • REAL
  • CHARACTER
  • DOUBLE PRECISION (obsolete Fortran feature)
• No support for INTEGER*1, INTEGER*2, INTEGER*8, INTEGER*16, REAL*8, REAL*16
  • Cannot write/read buffers of those types
• Fortran types have to match C types
  • No support for -r8 and -r16 flags (Fortran real ≠ C float)
13. Current Limitations
• No support for INTEGER(KIND=n) and REAL(KIND=m)
• Integers n and m are called KIND parameters
• Returned by
  • selected_int_kind(r): kind able to represent all integer values between -10^r and 10^r (exclusive)
  • selected_real_kind(p, r)
    p – precision
    r – decimal exponent range
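To make the range rule for selected_int_kind(r) concrete, here is a minimal sketch of the selection logic. It is an illustration only: the function name `selected_int_bytes` is hypothetical, and it assumes a compiler that offers two's-complement integers of 1, 2, 4, and 8 bytes. Actual KIND values are compiler-dependent and need not equal the byte size. As in Fortran, -1 signals that no suitable kind exists.

```python
def selected_int_bytes(r, byte_sizes=(1, 2, 4, 8)):
    """Smallest assumed storage size (in bytes) covering -10**r < n < 10**r.

    Returns -1 when none of the assumed sizes is large enough.
    """
    largest_needed = 10 ** r - 1  # biggest magnitude that must be representable
    for nbytes in byte_sizes:
        max_value = 2 ** (8 * nbytes - 1) - 1  # two's-complement maximum
        if largest_needed <= max_value:
            return nbytes
    return -1


for r in (2, 4, 9, 10, 19):
    print(r, selected_int_bytes(r))
```

Under these assumptions, a request for 10 decimal digits already needs an 8-byte integer, which is exactly the kind of buffer the Fortran 90 wrappers described here cannot write or read.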
14. Current Limitations
• Limited support for derived types (compare with C structures and HDF5 compound datatypes)
  • Supports derived types with “native” fields only
  • Doesn’t support complex HDF5 datatypes (e.g., with array member, or nested compound)
  • Writes/reads HDF5 compound datasets by fields only
  • Cannot be used in parallel applications
• No support for enum types
• No support for callback functions
15. Current Limitations - Summary
• Any application written according to the Fortran 95/2003 standard will struggle to use the HDF5 Fortran Library
• Many HDF5 features are not available to Fortran applications
16. Fortran 2003 Features
• Fortran 2003 provides a standardized mechanism for interoperating with C
  • Module ISO_C_BINDING for interoperability of intrinsic types
  • C_PTR and C_FUNPTR for interoperability with C pointers
  • C_LOC(x) and C_FUNLOC(x) inquiry functions for getting addresses of variables and procedures
  • BIND attribute for interoperability of derived types and C structures and enumerated types
17. Fortran 2003 Features and HDF5
• New 2003 features allowed us to support
  • Any Fortran INTEGER and REAL type data in HDF5 files
  • Fortran derived types and HDF5 compound datatypes
  • Fortran enumerated types and HDF5 enumerated types
  • HDF5 APIs with callbacks
18. Fortran Compilers with 2003 Features
Compiler | Current versions and status of HDF5 | Future versions
Intel | Versions 10.1 and 11; all F2003 functionality works in HDF5 | Version 11 will have a fix that will allow us to remove common blocks
g95 | August 2008 and later; all Fortran 2003 functionality works in HDF5 |
gfortran | Version 4.4; limited support for C interoperability; HDF5 doesn’t work | No plans from compiler developers to improve the support
SUN compilers | Express-July 2008 build; HDF5 doesn’t work | No timeline for fixes; may be in a year
PGI | Version 7.2.1; HDF5 doesn’t work | Fixes will be available in Version 8
23. HDF5 File Recovery, or Surviving a System Failure through Metadata Journaling
24. Surviving a System Failure in HDF5
• Problem:
  • Data in an open HDF5 file is susceptible to corruption in the event of an application or system crash.
  • Corruption is possible if an open HDF5 file has been updated when the crash occurs.
• Initial Objective:
  • Guarantee that an HDF5 file with consistent metadata can be reconstructed in the event of a crash.
  • No guarantee on the state of raw data: it contains whatever made it to disk prior to the crash.
25. Crash Survivability in HDF5
• Approach: Metadata Journaling
  • When an HDF5 file is opened with Metadata Journaling enabled, a companion Journal file is created.
  • When an HDF5 API function that modifies metadata is completed, a transaction is recorded in the Journal file.
  • If the application crashes, a recovery program can replay the journal by applying, in order, all metadata writes up to the end of the last completed transaction written to the Journal file.
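The replay step described above can be sketched as follows. This is a conceptual illustration only: the record layout (`"begin"`, `"write"`, `"end"` tuples), the function name `replay_journal`, and the dictionary standing in for on-disk metadata are all assumptions made for this example. They do not reflect the actual HDF5 journal file format or the h5recover tool.

```python
def replay_journal(file_image, journal):
    """Apply, in order, the metadata writes of every completed transaction.

    file_image: dict mapping address -> metadata bytes (stand-in for the file)
    journal: list of hypothetical records:
        ("begin", txn_id), ("write", txn_id, address, data), ("end", txn_id)
    Writes from a transaction with no "end" record are discarded.
    """
    recovered = dict(file_image)  # leave the caller's image untouched
    pending = {}                  # writes of the transaction being read
    open_txn = None
    for record in journal:
        kind, txn_id = record[0], record[1]
        if kind == "begin":
            open_txn, pending = txn_id, {}
        elif kind == "write" and txn_id == open_txn:
            _, _, address, data = record
            pending[address] = data
        elif kind == "end" and txn_id == open_txn:
            recovered.update(pending)  # transaction completed: commit it
            open_txn, pending = None, {}
    return recovered


# Transaction 2 was cut short by the crash, so only transaction 1 is applied.
crashed_image = {0: b"superblock-v1"}
journal = [
    ("begin", 1),
    ("write", 1, 0, b"superblock-v2"),
    ("write", 1, 64, b"group-A"),
    ("end", 1),
    ("begin", 2),
    ("write", 2, 64, b"group-B"),
]
restored = replay_journal(crashed_image, journal)
```

Buffering each transaction's writes until its end record is seen is what keeps the recovered metadata consistent: a half-written transaction never reaches the restored image.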
26. HDF5 Metadata Journaling Recovery
[Diagram: after the application crashes, the h5recover tool takes the corrupted HDF5 file and its companion Journal file and produces a restored HDF5 file.]
27. Implementation Status
• Serial HDF5 with synchronous write mode
  • Alpha1 released August 2008
  • User interface (API definition and h5recover tool) and file format may change
28. Metadata Journaling Plans
• Serial HDF5 with synchronous write mode
  • Finalize user interface definitions and file format
• Serial HDF5 with asynchronous write mode
  • To improve Journal file write speed
• More features (need funding)
  • Make raw data operations atomic
  • Allow "super-transactions" to be created by applications
  • Enable journaling for Parallel HDF5
30. Acknowledgement
• This report is based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA Awards NNX06AC83A and NNX08AO77A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.