1. HDF4 and HDF5 Performance
Preliminary Results
Elena Pourmal
IV HDF-EOS Workshop
September 19-21, 2000
2. Why compare?
• HDF5 emerges as a new standard
  – proved to be robust
  – most of the planned features have been implemented in HDF5-1.2.2
  – has many new features compared to HDF4
  – time for performance study and tuning
• Users are moving their data and applications to HDF5
• HDF4 is not "bad," but it has limited capabilities
3. HDF5 vs. HDF4

  HDF5                                                    | HDF4
  --------------------------------------------------------|------------------------------------------------------
  Files over 2GB                                          | Files less than 2GB
  Unlimited number of objects                             | Max limit of 20000 objects
  One data model (multidimensional array of structures)   | Different data models for SD, GR, RI, Vdatas
  Parallel (||) support                                   | N/A
  Thread safe                                             | N/A
  Mounting files                                          | N/A
  Diversity of datatypes (compound, VL, opaque) and       | Only predefined datatypes such as float32, int16,
  operations (create, write, read, delete, shared)        | char8
  "Native" file is portable                               | "Native" file is not portable
  Modifiable I/O pipeline (registration of compression    | N/A
  methods)                                                |
  Selections (unions and regular blocks)                  | Selections (simple regular subsampling)
4. What to compare?
(short list of common features)
• File I/O operations
  – plain read and write
  – hyperslab selections
  – regular subsampling
  – access to a large number of objects
  – storage overhead
• Data organization in the file and access to it
  – Vdata vs. compound datasets
• Chunking, unlimited dimensions, compression
5. Benchmark Environment
• 440-MHz UltraSPARC-IIi
  – 1GB memory
  – SunOS 5.7
  – gettimeofday()
• 2 x 550-MHz Pentium III Xeon
  – 1GB memory
  – RedHat 6.2
  – clock()
• Each measurement was taken 10 times; average and best times were collected
6. Benchmarks
• Writing 1Dim and 2Dim datasets of integers
• Reading 2Dim contiguous hyperslabs of integers
• Reading 2Dim contiguous hyperslabs of integers with subsampling
• Reading fixed-size hyperslabs of integers from different locations in the dataset
• Writing and reading Vdatas and compound datasets
• CERES data
8. Writing 1Dim Datasets
• In this test we created one-dimensional arrays of integers with sizes varying from 8 Kbytes to 8000 Kbytes in steps of 8 Kbytes. We measured the average and best times for writing these arrays into HDF4 and HDF5 files.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
10. Writing 2Dim Datasets
• In this test we created two-dimensional arrays with sizes varying from 40 x 40 bytes to 4000 x 4000 bytes in steps of 40 bytes in each dimension. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. The graphs were plotted by averaging the values obtained for the same array size, without considering the shape of the array.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
13. Reading Contiguous Hyperslabs
• In this test we created a file with a 1000 x 1000 array of integers. Subsequently, we read hyperslabs of different sizes starting from a fixed position in the array, and the read measurements were averaged over 10 runs. The HDF5-1.2.2, HDF5-1.2.2-patched, and HDF5 development libraries were tested.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
15. Reading Hyperslabs
(latest version of the HDF5 development branch)
[Figure: "Hyperslab selection, best time" for the HDF5 development branch — best read time (microseconds) vs. size of hyperslab (number of elements, 100 to 8E+05), for HDF4 and HDF5.]
For hyperslabs > 2MB, HDF5 becomes about 1.5 times slower than HDF4. It still shows nonlinear growth.
16. Reading Contiguous Hyperslabs
(fixed size)
• In this test, the size of the hyperslab was fixed at 100x100 elements. The hyperslab was moved first along the X axis, then along the Y axis, and finally along the diagonal, and the read performance was measured.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
17. Reading 100x100 Hyperslabs from Different Locations
[Figure: "Selection of 100x100 hyperslab (best time)" — best read time (microseconds) vs. hyperslab location (events 1 to 10), for HDF4, HDF5-1.2.2, HDF5-1.2.2-patched, and the HDF5 development branch.]
For small hyperslabs HDF5 performs about 3 times better than HDF4.
19. Subsampling Hyperslabs
• In this test we created a file with a 1000x1000 array of integers. Subsequently, we read every second element of hyperslabs of different sizes starting from a fixed position in the array, and the read measurements were averaged over 10 runs. The HDF5-1.2.2 and HDF5 development libraries were tested.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
20. Reading Each Second Element of the Hyperslabs
[Figure: "Hyperslabs with subsampling each second element (best time)" — best read time (seconds, 0 to 35) vs. size of hyperslab (number of elements, 100 to 3E+05), for HDF4 and HDF5.]
HDF5 shows nonlinear growth. HDF4 performs about 3 times better for hyperslabs larger than 0.5MB.
21. First Attempt to Improve the Performance
[Figure: "Hyperslabs with selection (best time)" — best read time (minutes, 0 to 30) vs. size of hyperslab (100 to 3E+05 elements), for HDF4, HDF5, and HDF5 (latest).]
HDF4 still performs 2 times better for hyperslabs > 2MB. HDF5 shows nonlinear growth.
22. Current Behavior (HDF5 development branch)
[Figure: "Hyperslab with selection (best time)" — best read time (seconds, 0 to 18) vs. hyperslab size (number of elements, 100 to 3E+05), for HDF4 and HDF5.]
HDF5 growth is linear, and it performs about 10 times better than HDF4.
24. Vdatas and Compound Datasets
• In this test we created HDF4 files with Vdatas and HDF5 files with compound datasets, with sizes from 1000 to 1000000 records:
• float a; short b; float c[3]; char d;
• Write, write with data packing, and partial read operations were tested.
• The test was performed on Linux platforms. We also looked into data conversion issues.
25. Writing Data (VSwrite and H5Dwrite)
[Figure: "Writing Vdatas and Compound Datasets (average time)" — average write time (seconds, 0 to 1.8) vs. number of records (19 bytes each, 1000 to 4E+05), for HDF4 native, HDF4 with conversion, HDF5 native, and HDF5 with conversion.]
Conversion does not affect HDF4 performance. It does affect HDF5 (by more than a factor of 15).
26. Writing Data
(timing includes packing: VSpack and H5Tpack)
[Figure: "Writing Vdatas and Compound Datasets — effect of data packing in HDF4 and HDF5 (average time)" — average write time (seconds, 0 to 3.5) vs. number of records (1000 to 9E+05), for HDF4, HDF4 with packing, HDF5, and HDF5 with packing.]
Data packing was added to the previous test. For HDF5 the effect is very small.
27. Reading Two Fields
[Figure: "Reading Vdatas and Compound Datasets — native read (average time)" — average read time (seconds, 0 to 1) vs. number of records (1000 to 9E+05), for HDF4, HDF4 without unpacking data, and HDF5.]
Unpacking slows down HDF4 significantly (about 8 times). HDF5 was reading packed data in this test.
30. CERES File
• Used the H4toH5 converter to create an HDF5 version of the file
  – 81MB (HDF4), 80MB (HDF5)
  – 1 min 55 sec on Linux
  – 3 min 56 sec on Solaris
• Benchmarks
  – read up to 14 datasets (2148x660 floats)
  – subsampling: read two columns from the same datasets
• The benchmark was run on the Solaris and Linux platforms
31. Reading CERES Data on Big- and Little-Endian Machines
[Figure: "Reading CERES data on different platforms (best times)" — best read time (seconds, 0 to 3) vs. number of 2148x660 datasets read (1 to 14), for HDF4 and HDF5 on little-endian (LE) and big-endian (BE) machines.]
On the Solaris platform, HDF5 was twice as fast as HDF4. On Linux (data conversion is on), HDF4 was about 1.3-1.5 times faster.
32. Subsetting CERES Data
[Figure: "Selection of two columns from 2148x660 CERES dataset (best times)" — best read time (seconds, 0 to 25) vs. number of datasets (1 to 14), for HDF4, HDF5, and HDF5 tuned.]
The current version of HDF5 shows about 3 times better performance.
33. Conclusion
• Goal: tune HDF5 and give our users recommendations on its efficient usage
• Continue to study HDF4 and HDF5 performance
  – try more platforms: O2K, NT/Windows
  – try other features (e.g. chunking, compression)
  – try specific HDF5 features (e.g. writing/reading big files, VL datatypes, compound datatypes, selections)
• User input is necessary; send us the access patterns you use!
• Results will be available at http://hdf.ncsa.uiuc.edu