HDF5 Advanced Topics
Neil Fortner
The HDF Group
The 14th HDF and HDF-EOS Workshop
September 28-30, 2010

Outline
• Overview of HDF5 datatypes
• Partial I/O in HDF5
• Chunking and compression

HDF5 Datatypes
Quick overview of the most difficult topics

An HDF5 Datatype is…
• A description of dataset element type
• Grouped into “classes”:
  • Atomic – integers, floating-point values
  • Enumerated
  • Compound – like C structs
  • Array
  • Opaque
  • References
    • Object – similar to soft link
    • Region – similar to soft link to dataset + selection
  • Variable-length
    • Strings – fixed and variable-length
    • Sequences – similar to Standard C++ vector class
HDF5 Datatypes
• HDF5 has a rich set of pre-defined datatypes and
supports the creation of an unlimited variety of
complex user-defined datatypes.
• Self-describing:
• Datatype definitions are stored in the HDF5 file
with the data.
• Datatype definitions include information such as
byte order (endianness), size, and floating point
representation to fully describe how the data is
stored and to insure portability across platforms.

Datatype Conversion
• Datatypes that are compatible, but not identical, are converted automatically when I/O is performed
• Compatible datatypes:
  • All atomic datatypes are compatible
  • Identically structured array, variable-length and compound datatypes whose base type or fields are compatible
  • Enumerated datatype values on a “by name” basis
• Make datatypes identical for best performance (see the sketch below)
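
For example, a minimal sketch (assuming "dset" is an already-open dataset) of checking whether the file datatype already matches the native memory type, so that no conversion will occur:

/* A minimal sketch, assuming "dset" is an open dataset.
 * H5Tequal returns a positive value if the two datatypes
 * are identical, i.e. no conversion will happen during I/O. */
hid_t file_type = H5Dget_type(dset);
hid_t mem_type  = H5Tget_native_type(file_type, H5T_DIR_ASCEND);
if (H5Tequal(file_type, mem_type) > 0)
    printf("No datatype conversion will occur\n");
H5Tclose(mem_type);
H5Tclose(file_type);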

Datatype Conversion Example
Array of integers on IA32 platform: native integer is little-endian, 4 bytes
Array of integers on SPARC64 platform: native integer is big-endian, 8 bytes

[Figure: both platforms describe their buffers as H5T_NATIVE_INT; H5Dwrite and H5Dread convert to and from the file datatype H5T_STD_I32LE (little-endian 4-byte integer).]
Datatype Conversion
Datatype of data on disk:
dataset = H5Dcreate(file, DATASETNAME, H5T_STD_I64BE, space,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

Datatype of data in memory buffer:
H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
         H5P_DEFAULT, buf);
H5Dwrite(dataset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
         H5P_DEFAULT, buf);
Storing Records with HDF5

HDF5 Compound Datatypes
• Compound types
  • Comparable to C structs
  • Members can be any datatype
  • Can write/read by a single field or a set of fields
  • Not all data filters can be applied (shuffling, SZIP)

Creating and Writing Compound Dataset
h5_compound.c example

typedef struct s1_t {
    int    a;
    float  b;
    double c;
} s1_t;

s1_t s1[LENGTH];
Creating and Writing Compound Dataset
/* Create datatype in memory. */
s1_tid = H5Tcreate(H5T_COMPOUND, sizeof(s1_t));
H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a),
H5T_NATIVE_INT);
H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c),
H5T_NATIVE_DOUBLE);
H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b),
H5T_NATIVE_FLOAT);

Note:
• Use HOFFSET macro instead of calculating offset by hand.
• Order of H5Tinsert calls is not important if HOFFSET is used.

Creating and Writing Compound Dataset
/* Create dataset and write data */
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL,
                  H5P_DEFAULT, s1);

Note:
• In this example memory and file datatypes are the same.
• Type is not packed.
• Use H5Tpack to save space in the file:
status  = H5Tpack(s1_tid);
dataset = H5Dcreate(file, DATASETNAME, s1_tid, space,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

Reading Compound Dataset
/* Create datatype in memory and read data. */
dataset = H5Dopen(file, DATASETNAME, H5P_DEFAULT);
s2_tid  = H5Dget_type(dataset);
mem_tid = H5Tget_native_type(s2_tid, H5T_DIR_ASCEND);
buf = malloc(H5Tget_size(mem_tid) * number_of_elements);
status = H5Dread(dataset, mem_tid, H5S_ALL, H5S_ALL,
                 H5P_DEFAULT, buf);

Note:
• We could construct the memory type as we did in the writing example.
• For general applications we need to discover the type in the file, find the corresponding memory type, allocate space, and do the read.

Reading Compound Dataset by Fields
typedef struct s2_t {
    double c;
    int    a;
} s2_t;

s2_t s2[LENGTH];
…
s2_tid = H5Tcreate(H5T_COMPOUND, sizeof(s2_t));
H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c),
          H5T_NATIVE_DOUBLE);
H5Tinsert(s2_tid, "a_name", HOFFSET(s2_t, a),
          H5T_NATIVE_INT);
…
status = H5Dread(dataset, s2_tid, H5S_ALL,
H5S_ALL, H5P_DEFAULT, s2);

Table Example
a_name    b_name   c_name
(integer) (float)  (double)
   0        0.     1.0000
   1        1.     0.5000
   2        4.     0.3333
   3        9.     0.2500
   4       16.     0.2000
   5       25.     0.1667
   6       36.     0.1429
   7       49.     0.1250
   8       64.     0.1111
   9       81.     0.1000

Multiple ways to store a table:
• Dataset for each field
• Dataset with compound datatype
• If all fields have the same type:
  ◦ 2-dim array
  ◦ 1-dim array of array datatype
• Continued…

Choose to achieve your goal!
• Storage overhead?
• Do I always read all fields?
• Do I read some fields more often?
• Do I want to use compression?
• Do I want to access some records?
Storing Variable Length Data with HDF5
HDF5 Fixed and Variable Length Array Storage

[Figure: rows of data elements over time, contrasting fixed-length array storage with variable-length storage.]
Storing Variable Length Data in HDF5
• Each element is represented by a C structure:

typedef struct {
    size_t len;
    void   *p;
} hvl_t;

• Base type can be any HDF5 type:

H5Tvlen_create(base_type)

Example
hvl_t data[LENGTH];

for(i=0; i<LENGTH; i++) {
    data[i].p   = malloc((i+1)*sizeof(unsigned int));
    data[i].len = i+1;
}
tvl = H5Tvlen_create(H5T_NATIVE_UINT);

[Figure: each data[i].p points to that element's values and data[i].len gives their count, e.g. data[0] holds one value and data[4] holds five.]
Reading HDF5 Variable Length Array
• HDF5 library allocates memory to read data in
• Application only needs to allocate an array of hvl_t elements (pointers and lengths)
• Application must reclaim memory for data read in

hvl_t rdata[LENGTH];
/* Create the memory vlen type */
tvl = H5Tvlen_create(H5T_NATIVE_INT);
ret = H5Dread(dataset, tvl, H5S_ALL, H5S_ALL,
              H5P_DEFAULT, rdata);
/* Reclaim the read VL data */
H5Dvlen_reclaim(tvl, H5S_ALL, H5P_DEFAULT, rdata);

Variable Length vs. Array
• Pros of variable length datatypes vs. arrays:
  • Uses less space if compression unavailable
  • Automatically stores length of data
  • No maximum size
    • Size of an array is its effective maximum size
• Cons of variable length datatypes vs. arrays:
  • Substantial performance overhead
    • Each element is a “pointer” to a piece of metadata
  • Variable length data cannot be compressed
    • Unused space in arrays can be “compressed away”
  • Must be 1-dimensional (unlike array datatypes; see the sketch below)
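
As a point of comparison, a minimal sketch of the fixed-size array alternative (the maximum of 5 values per element is an assumption for illustration):

/* A minimal sketch: a fixed-size array datatype whose elements
 * each hold at most 5 unsigned ints; unused slots waste space
 * but can be compressed away when chunking is enabled. */
hsize_t adims[1] = {5};
hid_t   atype    = H5Tarray_create2(H5T_NATIVE_UINT, 1, adims);
/* ... use atype as the element type of a dataset ... */
H5Tclose(atype);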
Storing Strings in HDF5

Storing Strings in HDF5
• Array of characters (Array datatype or extra dimension in dataset)
  • Quick access to each character
  • Extra work to access and interpret each string
• Fixed length:
    string_id = H5Tcopy(H5T_C_S1);
    H5Tset_size(string_id, size);
  • Wasted space in shorter strings
  • Can be compressed
• Variable length (sketched in full below):
    string_id = H5Tcopy(H5T_C_S1);
    H5Tset_size(string_id, H5T_VARIABLE);
  • Overhead as for all VL datatypes
  • Compression will not be applied to actual data
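
Putting the variable-length case together, a minimal sketch (assuming "file" is an open file; the dataset name "strings" and the sample data are illustrative):

/* A minimal sketch of writing two variable-length strings. */
const char *wdata[2] = {"short", "a considerably longer string"};
hsize_t     dims[1]  = {2};
hid_t str_tid = H5Tcopy(H5T_C_S1);
H5Tset_size(str_tid, H5T_VARIABLE);
hid_t space = H5Screate_simple(1, dims, NULL);
hid_t dset  = H5Dcreate(file, "strings", str_tid, space,
                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Dwrite(dset, str_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, wdata);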

HDF5 Reference Datatypes

Reference Datatypes
• Object Reference
  • Pointer to an object in a file
  • Predefined datatype H5T_STD_REF_OBJ
• Dataset Region Reference
  • Pointer to a dataset + dataspace selection
  • Predefined datatype H5T_STD_REF_DSETREG (creation sketched below)
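
A minimal sketch of creating a region reference (assuming "file" is an open file, "Matrix" is an existing dataset in it, and "space" is its dataspace carrying a hyperslab selection; all three names are assumptions here):

/* A minimal sketch; the reference records both the dataset
 * and the selection currently set on "space". */
hdset_reg_ref_t ref;
H5Rcreate(&ref, file, "Matrix", H5R_DATASET_REGION, space);
/* ref can then be written to a dataset of type
 * H5T_STD_REF_DSETREG with H5Dwrite. */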

Saving Selected Region in a File
Need to select and access the same elements of a dataset

Reference to Dataset Region
[Figure: file REF_REG.h5 contains a Root group with a "Matrix" dataset (2x9, values 1 1 2 3 3 4 5 5 6 / 1 2 2 3 4 4 5 6 6) and a "Region References" dataset whose elements point to selections of the matrix.]
Working with subsets

Collect data one way ….
Array of images (3D)

Display data another way …

Stitched image (2D array)

Data is too big to read….

HDF5 Library Features
• HDF5 Library provides capabilities to:
  • Describe subsets of data and perform write/read operations on subsets
    • Hyperslab selections and partial I/O
  • Store descriptions of the data subsets in a file
    • Object references
    • Region references
  • Use efficient storage mechanisms to achieve good performance while writing/reading subsets of data
    • Chunking, compression

Partial I/O in HDF5

How to Describe a Subset in HDF5?
• Before writing and reading a subset of data, one has to describe it to the HDF5 Library.
• HDF5 APIs and documentation refer to a subset as a “selection” or “hyperslab selection”.
• If a selection is specified, the HDF5 Library will perform I/O only on that selection and not on all elements of a dataset.

Types of Selections in HDF5
• Two types of selections:
  • Hyperslab selection
    • Regular hyperslab
    • Simple hyperslab
    • Result of set operations on hyperslabs (union, difference, …)
  • Point selection
• Hyperslab selection is especially important for doing parallel I/O in HDF5 (see Parallel HDF5 Tutorial)

Regular Hyperslab

Collection of regularly spaced, equal-size blocks

Simple Hyperslab

Contiguous subset or sub-array

Hyperslab Selection

Result of union operation on three simple hyperslabs

Hyperslab Description
• Start - starting location of a hyperslab (1,1)
• Stride - number of elements that separate each block (3,2)
• Count - number of blocks (2,6)
• Block - block size (2,1)
• Everything is “measured” in number of elements

Simple Hyperslab Description
• Two ways to describe a simple hyperslab:
  • As several blocks
    • Stride – (1,1)
    • Count – (4,6)
    • Block – (1,1)
  • As one block
    • Stride – (1,1)
    • Count – (1,1)
    • Block – (4,6)
• No performance penalty for one way or another
H5Sselect_hyperslab Function

space_id   Identifier of dataspace
op         Selection operator: H5S_SELECT_SET or H5S_SELECT_OR
start      Array with starting coordinates of hyperslab
stride     Array specifying which positions along a dimension to select
count      Array specifying how many blocks to select from the dataspace, in each dimension
block      Array specifying size of element block (NULL indicates a block size of a single element in a dimension)
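
Putting the parameters together, a minimal sketch that selects the hyperslab described earlier (start (1,1), stride (3,2), count (2,6), block (2,1); "space_id" is an assumed, already-open 2-D dataspace):

hsize_t start[2]  = {1, 1};
hsize_t stride[2] = {3, 2};
hsize_t count[2]  = {2, 6};
hsize_t block[2]  = {2, 1};
H5Sselect_hyperslab(space_id, H5S_SELECT_SET,
                    start, stride, count, block);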

Reading/Writing Selections
Programming model for reading from a dataset in a file:
1. Open a dataset.
2. Get the file dataspace handle of the dataset and specify the subset to read from.
   a. H5Dget_space returns the file dataspace handle.
      • The file dataspace describes the array stored in the file (number of dimensions and their sizes).
   b. H5Sselect_hyperslab selects the elements of the array that participate in the I/O operation.
3. Allocate a data buffer of an appropriate shape and size.

Reading/Writing Selections
Programming model (continued)
4. Create a memory dataspace and specify the subset to write to.
   a. The memory dataspace describes the data buffer (its rank and dimension sizes).
   b. Use the H5Screate_simple function to create the memory dataspace.
   c. Use H5Sselect_hyperslab to select the elements of the data buffer that participate in the I/O operation.
5. Issue H5Dread or H5Dwrite to move the data between the file and the memory buffer.
6. Close the file dataspace and memory dataspace when done.
Example: Reading Two Rows

Data in a file: 4x6 matrix

     1   2   3   4   5   6
     7   8   9  10  11  12
    13  14  15  16  17  18
    19  20  21  22  23  24

Buffer in memory: 1-dim array of length 14

    -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
Example: Reading Two Rows

Selection in the file dataspace (rows 2 and 3 of the matrix):

    start  = {1,0}
    count  = {2,6}
    block  = {1,1}
    stride = {1,1}

filespace = H5Dget_space(dataset);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                    start, NULL, count, NULL);
Example: Reading Two Rows

Selection in the memory dataspace:

    start[1] = {1}
    count[1] = {12}
    dim[1]   = {14}

memspace = H5Screate_simple(1, dim, NULL);
H5Sselect_hyperslab(memspace, H5S_SELECT_SET,
                    start, NULL, count, NULL);
Example: Reading Two Rows

H5Dread(…, …, memspace, filespace, …, …);

Buffer after the read:

    -1   7   8   9  10  11  12  13  14  15  16  17  18  -1
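
Putting the three steps together, a minimal consolidated sketch of this example (assuming "dataset" is already open and holds native integers):

/* A minimal sketch: read rows 2 and 3 of the 4x6 dataset
 * into the middle of a 14-element buffer. */
hsize_t fstart[2] = {1, 0}, fcount[2] = {2, 6};
hsize_t mstart[1] = {1},    mcount[1] = {12}, dim[1] = {14};
int     buf[14];
hid_t filespace = H5Dget_space(dataset);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, fstart, NULL, fcount, NULL);
hid_t memspace = H5Screate_simple(1, dim, NULL);
H5Sselect_hyperslab(memspace, H5S_SELECT_SET, mstart, NULL, mcount, NULL);
H5Dread(dataset, H5T_NATIVE_INT, memspace, filespace, H5P_DEFAULT, buf);
H5Sclose(memspace);
H5Sclose(filespace);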
Things to Remember
• The number of elements selected in the file and in the memory buffer must be the same
  • H5Sget_select_npoints returns the number of selected elements in a hyperslab selection (see the check below)
• HDF5 partial I/O is tuned to move data between selections that have the same dimensionality; avoid choosing subsets that have different ranks (as in the example above)
• Allocate a buffer of an appropriate size when reading data; use H5Tget_native_type and H5Tget_size to get the correct size of the data element in memory.
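
For the first point, a minimal sanity check (assuming "filespace" and "memspace" from the example above):

/* Both selections must contain the same number of elements
 * before H5Dread or H5Dwrite is issued. */
hssize_t nfile = H5Sget_select_npoints(filespace);
hssize_t nmem  = H5Sget_select_npoints(memspace);
if (nfile != nmem)
    printf("selection size mismatch\n");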
Chunking in HDF5

HDF5 Dataset

[Figure: an HDF5 dataset consists of metadata and dataset data. The metadata includes the dataspace (rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7), the datatype (IEEE 32-bit float), attributes (Time = 32.4, Pressure = 987, Temp = 56), and storage info (chunked, compressed).]
Contiguous storage layout
• Metadata header separate from dataset data
• Data stored in one contiguous block in HDF5 file

[Figure: the dataset header (datatype, dataspace, attributes, …) is held in the metadata cache in application memory; the dataset data is stored as one contiguous block in the file.]
What is HDF5 Chunking?
• Data is stored in chunks of predefined size
• Two-dimensional instance may be referred to as data tiling
• HDF5 library usually writes/reads the whole chunk

[Figure: the same dataset stored contiguously vs. chunked (tiled).]
What is HDF5 Chunking?
• Dataset data is divided into equally sized blocks (chunks).
• Each chunk is stored separately as a contiguous block in the HDF5 file.

[Figure: in application memory the dataset data is divided into chunks A, B, C, D; in the file, the dataset header and a chunk index locate the chunks, which may be stored in any order (e.g., A, C, D, B).]
Why HDF5 Chunking?
• Chunking is required for several HDF5 features:
  • Enabling compression and other filters, like checksum
  • Extendible datasets
Why HDF5 Chunking?
• If used appropriately, chunking improves partial I/O for big datasets

[Figure: only two chunks are involved in the I/O]
Creating Chunked Dataset

1. Create a dataset creation property list.
2. Set property list to use chunked storage layout.
3. Create dataset with the above property list.

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 200;
H5Pset_chunk(dcpl_id, rank, ch_dims);
dset_id = H5Dcreate(…, dcpl_id);
H5Pclose(dcpl_id);
Creating Chunked Dataset
• Things to remember:
  • Chunk always has the same rank as the dataset
  • Chunk’s dimensions do not need to be factors of the dataset’s dimensions
    • Caution: may cause more I/O than desired (see white portions of the chunks below)
Creating Chunked Dataset
• Chunk size cannot be changed after the dataset is created
• Do not make chunk sizes too small (e.g. 1x1)!
  • Metadata overhead for each chunk (file space)
  • Each chunk is read individually
    • Many small reads are inefficient
Writing or Reading Chunked Dataset
1. Chunking mechanism is transparent to the application.
2. Use the same set of operations as for a contiguous dataset, for example:
   H5Dopen(…);
   H5Sselect_hyperslab(…);
   H5Dread(…);
3. Selections do not need to coincide precisely with chunk boundaries.
HDF5 Chunking and compression
• Chunking is required for compression and other filters
• HDF5 filters modify data during I/O operations
• Filters provided by HDF5:
  • Checksum (H5Pset_fletcher32)
  • Data transformation (in 1.8.*)
  • Shuffling filter (H5Pset_shuffle)
• Compression (also called filters) in HDF5:
  • Scale + offset (in 1.8.*) (H5Pset_scaleoffset)
  • N-bit (in 1.8.*) (H5Pset_nbit)
  • GZIP (deflate) (H5Pset_deflate)
  • SZIP (H5Pset_szip)
HDF5 Third-Party Filters
• Compression methods supported by the HDF5 user community (an availability check is sketched below):
  http://wiki.hdfgroup.org/Community-Support-for-HDF5
  • LZO lossless compression (PyTables)
  • BZIP2 lossless compression (PyTables)
  • BLOSC lossless compression (PyTables)
  • LZF lossless compression (h5py)
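
Since these filters are optional, a minimal sketch of checking at run time whether a filter is present in the current build (szip shown; the same call works for any registered filter ID):

/* H5Zfilter_avail returns a positive value if the filter
 * is available in this build of the library. */
if (H5Zfilter_avail(H5Z_FILTER_SZIP) <= 0)
    printf("szip filter is not available\n");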

Creating Compressed Dataset
1. Create a dataset creation property list.
2. Set property list to use chunked storage layout.
3. Set property list to use filters.
4. Create dataset with the above property list.

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 100;
H5Pset_chunk(dcpl_id, rank, ch_dims);
H5Pset_deflate(dcpl_id, 9);
dset_id = H5Dcreate(…, dcpl_id);
H5Pclose(dcpl_id);
Performance Issues
or: What everyone needs to know about chunking and the chunk cache
Accessing a row in contiguous dataset

One seek is needed to find the starting location of the row of data. Data is read/written using one disk access.

Accessing a row in chunked dataset

Five seeks are needed to find each chunk. Data is read/written using five disk accesses. For this access pattern, chunked storage is less efficient than contiguous storage.
Quiz time
• How might I improve this situation, if it is
common to access my data in this way?

Accessing data in contiguous dataset

[Figure: reading one element from each of M rows]

M seeks are needed to find the starting location of each element. Data is read/written using M disk accesses. Performance may be very bad.

Motivation for chunking storage

[Figure: the same access pattern on a chunked dataset]

Two seeks are needed to find two chunks. Data is read/written using two disk accesses. For this pattern, chunking helps with I/O performance.

Motivation for chunk cache
[Figure: a selection spanning chunks A and B, written by two H5Dwrite calls]

The selection shown is written by two H5Dwrite calls (one for each row). Chunks A and B are each accessed twice (once for each row). If both chunks fit into the cache, only two I/O accesses are needed to write the shown selection.
Motivation for chunk cache
[Figure: the same selection spanning chunks A and B, written by two H5Dwrite calls]

Question: What happens if there is space for only one chunk at a time?

Advanced Exercise
• Write data to a dataset
• Dataset is 512x2048, 4-byte native integers
• Chunks are 256x128: 128KB each, 2MB rows
• Write by rows
Advanced Exercise
• Very slow performance
• What is going wrong?
• Chunk cache is only 1MB by default

[Figure (animation): chunks are repeatedly read into the cache and written back to disk as each row is written.]
Exercise 1
• Improve performance by changing only the chunk size
  • Access pattern is fixed, limited memory
• One solution: 64x2048 chunks
  • Row of chunks fits in cache
Exercise 2
• Improve performance by changing only the access pattern
  • File already exists, cannot change chunk size
• One solution: access by chunk
  • Each selection fits in cache, contiguous on disk
Exercise 3
• Improve performance while not changing chunk size or access pattern
  • No memory limitation
• One solution: set the chunk cache to the size of a row of chunks (see the sketch below)
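
A minimal sketch of this solution (assuming HDF5 1.8.3 or later; a row of 256x128 chunks is 16 chunks x 128KB = 2MB, and the slot count 401 is an illustrative choice):

/* A minimal sketch: give this dataset a 2MB chunk cache so a
 * full row of chunks stays resident while it is written. */
hid_t dapl_id = H5Pcreate(H5P_DATASET_ACCESS);
H5Pset_chunk_cache(dapl_id, 401, 2*1024*1024,
                   H5D_CHUNK_CACHE_W0_DEFAULT);
dset_id = H5Dopen(file, DATASETNAME, dapl_id);
H5Pclose(dapl_id);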

Exercise 4
• Improve performance while not changing chunk size or access pattern
  • Chunk cache size can be set to max. 1MB
• One solution: disable the chunk cache (see the sketch below)
  • Avoids repeatedly reading/writing whole chunks
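
A minimal sketch of this solution (the assumption here is that a 0-byte chunk cache makes the library bypass the cache and access chunks directly):

hid_t dapl_id = H5Pcreate(H5P_DATASET_ACCESS);
H5Pset_chunk_cache(dapl_id, 0, 0, H5D_CHUNK_CACHE_W0_DEFAULT);
dset_id = H5Dopen(file, DATASETNAME, dapl_id);
H5Pclose(dapl_id);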

More Information
• More detailed information on chunking and the chunk cache can be found in the draft “Chunking in HDF5” document at:
  http://www.hdfgroup.org/HDF5/doc/_topic/Chunking
Thank You!

Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Questions/comments?

Weitere ähnliche Inhalte

Ähnlich wie Advanced HDF5 Features

Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...The HDF-EOS Tools and Information Center
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And HdfsCloudera, Inc.
 

Ähnlich wie Advanced HDF5 Features (20)

Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
HDF5 Advanced Topics - Datatypes and Partial I/O
HDF5 Advanced Topics - Datatypes and Partial I/OHDF5 Advanced Topics - Datatypes and Partial I/O
HDF5 Advanced Topics - Datatypes and Partial I/O
 
HDF5 Advanced Topics
HDF5 Advanced TopicsHDF5 Advanced Topics
HDF5 Advanced Topics
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 dataUsage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
Usage of NCL, IDL, and MATLAB to access NASA HDF4/HDF-EOS2/HDF-EOS5 data
 
UML Representation of NPOESS Data Products in HDF5
UML Representation of NPOESS Data Products in HDF5UML Representation of NPOESS Data Products in HDF5
UML Representation of NPOESS Data Products in HDF5
 
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
HDF5 Life cycle of data
HDF5 Life cycle of dataHDF5 Life cycle of data
HDF5 Life cycle of data
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Advanced HDF5 Features
Advanced HDF5 FeaturesAdvanced HDF5 Features
Advanced HDF5 Features
 
Data Interoperability
Data InteroperabilityData Interoperability
Data Interoperability
 
HDF5 Advanced Topics
HDF5 Advanced TopicsHDF5 Advanced Topics
HDF5 Advanced Topics
 
HDF5 Tools
HDF5 ToolsHDF5 Tools
HDF5 Tools
 
Modular HDFView
Modular HDFViewModular HDFView
Modular HDFView
 
Subsetting at UAH
Subsetting at UAHSubsetting at UAH
Subsetting at UAH
 
HDF-EOS Overview and Status
HDF-EOS Overview and StatusHDF-EOS Overview and Status
HDF-EOS Overview and Status
 
Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...Improving long-term preservation of EOS data by independently mapping HDF4 da...
Improving long-term preservation of EOS data by independently mapping HDF4 da...
 
Hw09 Hadoop Development At Facebook Hive And Hdfs
Hw09   Hadoop Development At Facebook  Hive And HdfsHw09   Hadoop Development At Facebook  Hive And Hdfs
Hw09 Hadoop Development At Facebook Hive And Hdfs
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
 

Mehr von The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

Mehr von The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 

Kürzlich hochgeladen

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Advanced HDF5 Features

  • 1. HDF5 Advanced Topics Neil Fortner The HDF Group The 14th HDF and HDF-EOS Workshop September 28-30, 2010 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 1
  • 2. Outline • Overview of HDF5 datatypes • Partial I/O in HDF5 • Chunking and compression Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 2
  • 3. HDF5 Datatypes Quick overview of the most difficult topics Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 3
  • 4. An HDF5 Datatype is… • A description of dataset element type • Grouped into “classes”: • • • • • • Atomic – integers, floating-point values Enumerated Compound – like C structs Array Opaque References • Object – similar to soft link • Region – similar to soft link to dataset + selection • Variable-length • Strings – fixed and variable-length • Sequences – similar to Standard C++ vector class Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 4
  • 5. HDF5 Datatypes • HDF5 has a rich set of pre-defined datatypes and supports the creation of an unlimited variety of complex user-defined datatypes. • Self-describing: • Datatype definitions are stored in the HDF5 file with the data. • Datatype definitions include information such as byte order (endianness), size, and floating point representation to fully describe how the data is stored and to insure portability across platforms. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 5
  • 6. Datatype Conversion • Datatypes that are compatible, but not identical are converted automatically when I/O is performed • Compatible datatypes: • All atomic datatypes are compatible • Identically structured array, variable-length and compound datatypes whose base type or fields are compatible • Enumerated datatype values on a “by name” basis • Make datatypes identical for best performance Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 6
  • 7. Datatype Conversion Example Array of integers on IA32 platform Native integer is little-endian, 4 bytes Array of integers on SPARC64 platform Native integer is big-endian, 8 bytes H5T_NATIVE_INT H5T_NATIVE_INT Little-endian 4 bytes integer H5Dwrite H5Dread H5Dwrite H5T_STD_I32LE Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV VAX G-floating 7
  • 8. Datatype Conversion Datatype of data on disk dataset = H5Dcreate(file, DATASETNAME, H5T_STD_I64BE, space, H5P_DEFAULT, H5P_DEFAULT); Datatype of data in memory buffer H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf); H5Dwrite(dataset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf); Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 8
  • 9. Storing Records with HDF5 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 9
  • 10. HDF5 Compound Datatypes • Compound types • Comparable to C structs • Members can be any datatype • Can write/read by a single field or a set of fields • Not all data filters can be applied (shuffling, SZIP) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 10
  • 11. Creating and Writing Compound Dataset h5_compound.c example typedef struct s1_t { int a; float b; double c; } s1_t; s1_t Sep. 28-30, 2010 s1[LENGTH]; HDF/HDF-EOS Workshop XIV 11
  • 12. Creating and Writing Compound Dataset /* Create datatype in memory. */ s1_tid = H5Tcreate(H5T_COMPOUND, sizeof(s1_t)); H5Tinsert(s1_tid, "a_name", HOFFSET(s1_t, a), H5T_NATIVE_INT); H5Tinsert(s1_tid, "c_name", HOFFSET(s1_t, c), H5T_NATIVE_DOUBLE); H5Tinsert(s1_tid, "b_name", HOFFSET(s1_t, b), H5T_NATIVE_FLOAT); Note: • Use HOFFSET macro instead of calculating offset by hand. • Order of H5Tinsert calls is not important if HOFFSET is used. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 12
  • 13. Creating and Writing Compound Dataset /* Create dataset and write data */ dataset = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT, H5P_DEFAULT); status = H5Dwrite(dataset, s1_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s1); Note: • In this example memory and file datatypes are the same. • Type is not packed. • Use H5Tpack to save space in the file. status = H5Tpack(s1_tid); status = H5Dcreate(file, DATASETNAME, s1_tid, space, H5P_DEFAULT, H5P_DEFAULT); Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 13
  • 14. Reading Compound Dataset /* Create datatype in memory and read data. */ dataset = H5Dopen(file, DATASETNAME, H5P_DEFAULT); s2_tid = H5Dget_type(dataset); mem_tid = H5Tget_native_type(s2_tid); buf = malloc(H5Tget_size(mem_tid)*number_of_elements); status = H5Dread(dataset, mem_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf); Note: • We could construct memory type as we did in writing example. • For general applications we need to discover the type in the file, find out corresponding memory type, allocate space and do read. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 14
  • 15. Reading Compound Dataset by Fields typedef struct s2_t { double c; int a; } s2_t; s2_t s2[LENGTH]; … s2_tid = H5Tcreate (H5T_COMPOUND, sizeof(s2_t)); H5Tinsert(s2_tid, "c_name", HOFFSET(s2_t, c), H5T_NATIVE_DOUBLE); H5Tinsert(s2_tid, “a_name", HOFFSET(s2_t, a), H5T_NATIVE_INT); … status = H5Dread(dataset, s2_tid, H5S_ALL, H5S_ALL, H5P_DEFAULT, s2); Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 15
  • 16. Table Example a_name b_name c_name (integer) (float) (double) 0 0. 1.0000 1 2 3 1. 4. 9. 0.5000 0.3333 0.2500 4 5 6 7 8 9 16. 25. 36. 49. 64. 81. 0.2000 0.1667 0.1429 0.1250 0.1111 0.1000 Sep. 28-30, 2010 Multiple ways to store a table • Dataset for each field • Dataset with compound datatype • If all fields have the same type: ◦ 2-dim array ◦ 1-dim array of array datatype • Continued… Choose to achieve your goal! • • • • • Storage overhead? Do I always read all fields? Do I read some fields more often? Do I want to use compression? Do I want to access some records? HDF/HDF-EOS Workshop XIV 16
  • 17. Storing Variable Length Data with HDF5 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 17
  • 18. HDF5 Fixed and Variable Length Array Storage •Data •Data Time •Data •Data •Data •Data Time •Data •Data •Data Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 18
  • 19. Storing Variable Length Data in HDF5 • Each element is represented by C structure typedef struct { size_t length; void *p; } hvl_t; • Base type can be any HDF5 type H5Tvlen_create(base_type) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 19
  • 20. Example hvl_t data[LENGTH]; for(i=0; i<LENGTH; i++) { data[i].p = malloc((i+1)*sizeof(unsigned int)); data[i].len = i+1; } tvl = H5Tvlen_create (H5T_NATIVE_UINT); data[0].p •Data •Data •Data •Data data[4].len Sep. 28-30, 2010 •Data HDF/HDF-EOS Workshop XIV 20
  • 21. Reading HDF5 Variable Length Array • HDF5 library allocates memory to read data in • Application only needs to allocate array of hvl_t elements (pointers and lengths) • Application must reclaim memory for data read in hvl_t rdata[LENGTH]; /* Create the memory vlen type */ tvl = H5Tvlen_create(H5T_NATIVE_INT); ret = H5Dread(dataset, tvl, H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata); /* Reclaim the read VL data */ H5Dvlen_reclaim(tvl, H5S_ALL, H5P_DEFAULT,rdata); Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 21
  • 22. Variable Length vs. Array • Pros of variable length datatypes vs. arrays: • Uses less space if compression unavailable • Automatically stores length of data • No maximum size • Size of an array is its effective maximum size • Cons of variable length datatypes vs. arrays: • Substantial performance overhead • Each element a “pointer” to piece of metadata • Variable length data cannot be compressed • Unused space in arrays can be “compressed away” • Must be 1-dimensional Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 22
  • 23. Storing Strings in HDF5 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 23
  • 24. Storing Strings in HDF5 • Array of characters (Array datatype or extra dimension in dataset) • Quick access to each character • Extra work to access and interpret each string • Fixed length string_id = H5Tcopy(H5T_C_S1); H5Tset_size(string_id, size); • Wasted space in shorter strings • Can be compressed • Variable length string_id = H5Tcopy(H5T_C_S1); H5Tset_size(string_id, H5T_VARIABLE); • Overhead as for all VL datatypes • Compression will not be applied to actual data Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 24
  • 25. HDF5 Reference Datatypes Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 25
  • 26. Reference Datatypes • Object Reference • Pointer to an object in a file • Predefined datatype H5T_STD_REG_OBJ • Dataset Region Reference • Pointer to a dataset + dataspace selection • Predefined datatype H5T_STD_REF_DSETREG Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 26
  • 27. Saving Selected Region in a File Need to select and access the same elements of a dataset Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 27
  • 28. Reference to Dataset Region REF_REG.h5 Root Matrix Region References 1 1 2 3 3 4 5 5 6 1 2 2 3 4 4 5 6 6 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 28
  • 29. Working with subsets Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 30
  • 30. Collect data one way …. Array of images (3D) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 31
  • 31. Display data another way … Stitched image (2D array) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 32
  • 32. Data is too big to read…. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 33
  • 33. HDF5 Library Features • HDF5 Library provides capabilities to • Describe subsets of data and perform write/read operations on subsets • Hyperslab selections and partial I/O • Store descriptions of the data subsets in a file • Object references • Region references • Use efficient storage mechanism to achieve good performance while writing/reading subsets of data • Chunking, compression Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 34
  • 34. Partial I/O in HDF5 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 35
  • 35. How to Describe a Subset in HDF5? • Before writing and reading a subset of data one has to describe it to the HDF5 Library. • HDF5 APIs and documentation refer to a subset as a “selection” or “hyperslab selection”. • If specified, HDF5 Library will perform I/O on a selection only and not on all elements of a dataset. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 36
  • 36. Types of Selections in HDF5 • Two types of selections • Hyperslab selection • Regular hyperslab • Simple hyperslab • Result of set operations on hyperslabs (union, difference, …) • Point selection • Hyperslab selection is especially important for doing parallel I/O in HDF5 (See Parallel HDF5 Tutorial) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 37
  • 37. Regular Hyperslab Collection of regularly spaced equal size blocks Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 38
  • 38. Simple Hyperslab Contiguous subset or sub-array Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 39
  • 39. Hyperslab Selection Result of union operation on three simple hyperslabs Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 40
  • 40. Hyperslab Description • Start - starting location of a hyperslab (1,1) • Stride - number of elements that separate each block (3,2) • Count - number of blocks (2,6) • Block - block size (2,1) • Everything is “measured” in number of elements Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 41
  • 41. Simple Hyperslab Description • Two ways to describe a simple hyperslab • As several blocks • Stride – (1,1) • Count – (4,6) • Block – (1,1) • As one block • Stride – (1,1) • Count – (1,1) • Block – (4,6) No performance penalty for one way or another Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 42
  • 42. H5Sselect_hyperslab Function space_id Identifier of dataspace op Selection operator H5S_SELECT_SET or H5S_SELECT_OR start Array with starting coordinates of hyperslab stride Array specifying which positions along a dimension to select count Array specifying how many blocks to select from the dataspace, in each dimension block Array specifying size of element block (NULL indicates a block size of a single element in a dimension) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 43
  • 43. Reading/Writing Selections Programming model for reading from a dataset in a file 1. Open a dataset. 2. Get file dataspace handle of the dataset and specify subset to read from. a. H5Dget_space returns file dataspace handle a. File dataspace describes array stored in a file (number of dimensions and their sizes). b. H5Sselect_hyperslab selects elements of the array that participate in I/O operation. 3. Allocate data buffer of an appropriate shape and size Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 44
  • 44. Reading/Writing Selections Programming model (continued) 4. Create a memory dataspace and specify subset to write to. 1. 2. Memory dataspace describes data buffer (its rank and dimension sizes). Use H5Screate_simple function to create memory dataspace. Use H5Sselect_hyperslab to select elements of the data buffer that participate in I/O operation. Issue H5Dread or H5Dwrite to move the data between 3. 5. file and memory buffer. 6. Close file dataspace and memory dataspace when done. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 45
  • 45. Example : Reading Two Rows 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 -1 -1 -1 Data in a file 4x6 matrix Buffer in memory 1-dim array of length 14 -1 -1 Sep. 28-30, 2010 -1 -1 -1 -1 -1 HDF/HDF-EOS Workshop XIV -1 -1 46 -1 -1
  • 46. Example: Reading Two Rows 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 start count block stride 24 filespace = H5Dget_space (dataset); H5Sselect_hyperslab (filespace, H5S_SELECT_SET, start, NULL, count, NULL) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 47 = = = = {1,0} {2,6} {1,1} {1,1}
  • 47. Example: Reading Two Rows start[1] = {1} count[1] = {12} dim[1] = {14} -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 memspace = H5Screate_simple(1, dim, NULL); H5Sselect_hyperslab (memspace, H5S_SELECT_SET, start, NULL, count, NULL) Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 48 -1 -1
  • 48. Example: Reading Two Rows 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 H5Dread (…, …, memspace, filespace, …, …); -1 7 Sep. 28-30, 2010 8 9 10 11 12 13 14 15 16 17 18 -1 HDF/HDF-EOS Workshop XIV 49
  • 49. Things to Remember • Number of elements selected in a file and in a memory buffer must be the same • H5Sget_select_npoints returns number of selected elements in a hyperslab selection • HDF5 partial I/O is tuned to move data between selections that have the same dimensionality; avoid choosing subsets that have different ranks (as in example above) • Allocate a buffer of an appropriate size when reading data; use H5Tget_native_type and H5Tget_size to get the correct size of the data element in memory. Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 50
  • 50. Chunking in HDF5 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 51
  • 51. HDF5 Dataset Metadata Dataset data Dataspace Rank Dimensions 3 Dim_1 = 4 Dim_2 = 5 Dim_3 = 7 Datatype IEEE 32-bit float Attributes Storage info Time = 32.4 Chunked Pressure = 987 Compressed Temp = 56 Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 52
  • 52. Contiguous storage layout • Metadata header separate from dataset data • Data stored in one contiguous block in HDF5 file Metadata cache Dataset header …………. Datatype Dataspace …………. Attributes … Dataset data Application memory File Sep. 28-30, 2010 Dataset data HDF/HDF-EOS Workshop XIV 53
  • 53. What is HDF5 Chunking? • Data is stored in chunks of predefined size • Two-dimensional instance may be referred to as data tiling • HDF5 library usually writes/reads the whole chunk Contiguous Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV Chunked 54
  • 54. What is HDF5 Chunking? • Dataset data is divided into equally sized blocks (chunks). • Each chunk is stored separately as a contiguous block in HDF5 file. Metadata cache Dataset data Dataset header …………. Datatype Dataspace …………. Attributes … File Sep. 28-30, 2010 A B C D Chunk index Application memory header Chunk index A HDF/HDF-EOS Workshop XIV C D B 55
  • 55. Why HDF5 Chunking? • Chunking is required for several HDF5 features • Enabling compression and other filters like checksum • Extendible datasets Sep. 28-30, 2010 HDF/HDF-EOS Workshop XIV 56
• 56. Why HDF5 Chunking?
• If used appropriately, chunking improves partial I/O for big datasets.
(Figure: only two chunks are involved in the I/O.)
• 57. Creating Chunked Dataset
1. Create a dataset creation property list.
2. Set the property list to use chunked storage layout.
3. Create the dataset with the above property list.

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 200;
H5Pset_chunk(dcpl_id, rank, ch_dims);
dset_id = H5Dcreate(…, dcpl_id);
H5Pclose(dcpl_id);

(A sketch with the full H5Dcreate signature follows.)
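The slide elides the H5Dcreate arguments. A minimal complete sketch, assuming an open file handle named file; the dataset name, element type, and dataset dimensions are illustrative assumptions:

hsize_t dims[2]    = {1000, 2000};   /* dataset dimensions (illustrative) */
hsize_t ch_dims[2] = {100, 200};     /* chunk dimensions, as on the slide */
hid_t   space   = H5Screate_simple(2, dims, NULL);
hid_t   dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl_id, 2, ch_dims);
hid_t   dset_id = H5Dcreate2(file, "chunked_dset", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, dcpl_id, H5P_DEFAULT);
H5Pclose(dcpl_id);   /* the property list is no longer needed */
H5Sclose(space);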
• 58. Creating Chunked Dataset
Things to remember:
• A chunk always has the same rank as the dataset.
• Chunk dimensions do not need to be factors of the dataset's dimensions.
• Caution: this may cause more I/O than desired (see the white portions of the chunks in the figure).
• 59. Creating Chunked Dataset
• Chunk size cannot be changed after the dataset is created.
• Do not make chunk sizes too small (e.g. 1x1)!
  • There is metadata overhead for each chunk (file space).
  • Each chunk is read individually.
  • Many small reads are inefficient.
• 60. Writing or Reading Chunked Dataset
1. The chunking mechanism is transparent to the application.
2. Use the same set of operations as for a contiguous dataset, for example:
   H5Dopen(…); H5Sselect_hyperslab(…); H5Dread(…);
3. Selections do not need to coincide with chunk boundaries.
• 61. HDF5 Chunking and Compression
• Chunking is required for compression and other filters.
• HDF5 filters modify data during I/O operations.
• Filters provided by HDF5:
  • Checksum (H5Pset_fletcher32)
  • Data transformation (in 1.8.*)
  • Shuffling filter (H5Pset_shuffle)
• Compression filters in HDF5:
  • Scale + offset (in 1.8.*) (H5Pset_scaleoffset)
  • N-bit (in 1.8.*) (H5Pset_nbit)
  • GZIP (deflate) (H5Pset_deflate)
  • SZIP (H5Pset_szip)
• 62. HDF5 Third-Party Filters
• Compression methods supported by the HDF5 user community:
  http://wiki.hdfgroup.org/Community-Support-for-HDF5
  • LZO lossless compression (PyTables)
  • BZIP2 lossless compression (PyTables)
  • BLOSC lossless compression (PyTables)
  • LZF lossless compression (H5Py)
• 63. Creating Compressed Dataset
1. Create a dataset creation property list.
2. Set the property list to use chunked storage layout.
3. Set the property list to use filters.
4. Create the dataset with the above property list.

dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
rank = 2;
ch_dims[0] = 100;
ch_dims[1] = 100;
H5Pset_chunk(dcpl_id, rank, ch_dims);
H5Pset_deflate(dcpl_id, 9);
dset_id = H5Dcreate (…, dcpl_id);
H5Pclose(dcpl_id);
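Filters run in the order they are added to the property list. A hedged variant that adds the shuffle filter before gzip, which often improves compression ratios for numeric data; the file handle, dataspace, and dataset name are illustrative assumptions:

hid_t   dcpl_id    = H5Pcreate(H5P_DATASET_CREATE);
hsize_t ch_dims[2] = {100, 100};
H5Pset_chunk(dcpl_id, 2, ch_dims);
H5Pset_shuffle(dcpl_id);      /* byte-shuffle first, so gzip sees similar bytes together */
H5Pset_deflate(dcpl_id, 9);   /* gzip level 9: best compression, slowest */
hid_t dset_id = H5Dcreate2(file, "compressed_dset", H5T_NATIVE_FLOAT, space,
                           H5P_DEFAULT, dcpl_id, H5P_DEFAULT);
H5Pclose(dcpl_id);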
• 64. Performance Issues, or what everyone needs to know about chunking and the chunk cache
• 65. Accessing a row in a contiguous dataset
One seek is needed to find the starting location of the row of data. The data is read/written using one disk access.
• 66. Accessing a row in a chunked dataset
Five seeks are needed, one to find each chunk. The data is read/written using five disk accesses. For this access pattern, chunked storage is less efficient than contiguous storage.
• 67. Quiz time
• How might I improve this situation, if it is common to access my data in this way?
• 68. Accessing data in a contiguous dataset (M rows)
M seeks are needed to find the starting location of each element. The data is read/written using M disk accesses. Performance may be very bad.
• 69. Motivation for chunked storage (M rows)
Only two seeks are needed to find the two chunks. The data is read/written using two disk accesses. For this access pattern, chunking helps I/O performance.
• 70. Motivation for chunk cache
(Figure: a selection spanning chunks A and B, written by two H5Dwrite calls.)
The selection shown is written by two H5Dwrite calls (one for each row). Chunks A and B are each accessed twice (once for each row). If both chunks fit into the cache, only two I/O accesses are needed to write the shown selection.
• 71. Motivation for chunk cache
Question: what happens if there is space for only one chunk at a time?
• 72. Advanced Exercise
• Write data to a dataset.
• The dataset is 512x2048, 4-byte native integers.
• Chunks are 256x128: 128 KB each, so a row of chunks (16 chunks) is 2 MB.
• Write by rows.
• 73-80. Advanced Exercise (animation)
• Very slow performance. What is going wrong?
• The chunk cache is only 1 MB by default, but a row of chunks is 2 MB.
Writing one dataset row touches all 16 chunks in a chunk row, while only 8 of the 128 KB chunks fit in the 1 MB cache. Chunks are therefore evicted before the next row arrives, so each chunk is read into the cache and written back to disk roughly once per dataset row instead of once overall. (The animation frames show this cycle: read into cache, write to disk, repeat.)
• 81. Exercise 1
• Improve performance by changing only the chunk size (the access pattern is fixed, memory is limited).
• One solution: 64x2048 chunks.
  • A row of chunks fits in the cache: one 64x2048 chunk of 4-byte integers is 512 KB.
• 82. Exercise 2
• Improve performance by changing only the access pattern.
• The file already exists, so the chunk size cannot change.
• One solution: access the data chunk by chunk.
  • Each selection fits in the cache and is contiguous on disk.
• 83. Exercise 3
• Improve performance while changing neither the chunk size nor the access pattern.
• No memory limitation.
• One solution: set the chunk cache to the size of a row of chunks (2 MB); see the sketch after Exercise 4.
• 84. Exercise 4
• Improve performance while changing neither the chunk size nor the access pattern.
• The chunk cache size can be set to at most 1 MB.
• One solution: disable the chunk cache.
  • This avoids repeatedly reading/writing whole chunks; selections move directly to/from disk.
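A hedged sketch covering both cache settings via a dataset access property list (H5Pset_chunk_cache, available since HDF5 1.8.3). The file handle and dataset name are assumed from earlier sketches, and the slot count is an illustrative rule-of-thumb value, not from the slides:

hid_t dapl_id = H5Pcreate(H5P_DATASET_ACCESS);

/* Exercise 3: cache one row of chunks (16 chunks x 128 KB = 2 MB).
   The slot count should be a prime several times larger than the
   number of chunks that fit in the cache, to limit hash collisions;
   12421 is just an example value. */
H5Pset_chunk_cache(dapl_id, 12421, 2 * 1024 * 1024,
                   H5D_CHUNK_CACHE_W0_DEFAULT);

/* Exercise 4: disable the cache by setting its byte size to 0, so
   selections move directly between the application buffer and disk. */
/* H5Pset_chunk_cache(dapl_id, 0, 0, H5D_CHUNK_CACHE_W0_DEFAULT); */

hid_t dset_id = H5Dopen2(file, "chunked_dset", dapl_id);
H5Pclose(dapl_id);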
• 85. More Information
• More detailed information on chunking and the chunk cache can be found in the draft “Chunking in HDF5” document at:
  http://www.hdfgroup.org/HDF5/doc/_topic/Chunking
• 86. Thank You!
• 87. Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.