Caching and Buffering in HDF5
The HDF Group

Nov. 6, 2007

HDF-EOS Workshop XI Tutorial

Software stack and the “magic box”
• Life cycle: What happens to data when it is transferred from application buffer to HDF5 file?

[Figure: software stack, top to bottom: Application (data buffer), Object API (H5Dwrite), Library internals (magic box), Virtual file I/O (unbuffered I/O), File or other “storage” (data in a file).]

Inside the magic box
• Understanding what happens to data inside the magic box helps in writing efficient applications
• The HDF5 library has mechanisms to control behavior inside the magic box
• Goals of this talk:
  • Describe some basic operations and data structures and explain how they affect performance and storage sizes
  • Give some “recipes” for how to improve performance

Topics
• Dataset metadata and array data storage layouts
• Types of dataset storage layouts
• Factors affecting I/O performance
• I/O with compact datasets
• I/O with contiguous datasets
• I/O with chunked datasets
• Variable length data and I/O

HDF5 dataset metadata and array data storage layouts

HDF5 Dataset
• Data array
  • Ordered collection of identically typed data items distinguished by their indices
• Metadata
  • Dataspace: Rank, dimensions of dataset array
  • Datatype: Information on how to interpret data
  • Storage Properties: How array is organized on disk
  • Attributes: User-defined metadata (optional)

Separate Components of a Dataset

Header
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Storage info: Chunked, Compressed
• Attributes: Time = 32.4, Pressure = 987, Temp = 56

Data array

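As a small worked example, the size of the uncompressed data array follows directly from the dataspace and datatype in the header. The sketch below uses the example values from this slide (4 x 5 x 7 elements, 32-bit float); the function name is made up for illustration and is not part of HDF5.

```python
from math import prod


def raw_data_bytes(dims, element_size):
    """Bytes occupied by the uncompressed data array of a dataset."""
    return prod(dims) * element_size


if __name__ == "__main__":
    # 4 * 5 * 7 = 140 elements, 4 bytes each
    print(raw_data_bytes((4, 5, 7), 4))  # 560
```

The header for such a dataset is small by comparison, which is why it is handled separately from the array data.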
Metadata cache and array data
• Dataset array data is typically kept in application memory
• Dataset header is kept in a separate space, the metadata cache

[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attributes, …) and the dataset array data sits outside the cache; in the file, the HDF5 metadata and the dataset array data are stored separately.]

Metadata and metadata cache
• HDF5 metadata
  • Information about HDF5 objects used by the library
  • Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, superblock, etc.
  • Usually small compared to raw data sizes (KB vs. MB-GB)
• Metadata cache
  • Space allocated to handle pieces of the HDF5 metadata
  • Allocated by the HDF5 library in the application’s memory space
  • Cache behavior affects overall performance
  • Metadata cache implementation prior to HDF5 1.6.5 could cause performance degradation for some applications

Types of data storage layouts

HDF5 dataset storage layouts
• Contiguous
• Chunked
• Compact

Contiguous storage layout
• Metadata header separate from raw data
• Raw data stored in one contiguous block on disk
[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attributes, …) next to the dataset array data; in the file, the array data is stored as one block, separate from the header.]

Chunked storage
• Chunking: a storage layout where a dataset is partitioned into fixed-size multi-dimensional tiles, or chunks
• Used for extendible datasets and datasets with filters applied (checksum, compression)
• HDF5 library treats each chunk as an atomic object
• Greatly affects performance and file sizes

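To make the effect on file size concrete, here is a minimal sketch that counts how many chunks a dataset needs and how many elements those chunks cover. It assumes every chunk, including partially filled edge chunks, occupies a full chunk's worth of elements before filters are applied; the function names are illustrative only.

```python
from math import prod


def chunks_per_dim(dims, chunk_dims):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple((d + c - 1) // c for d, c in zip(dims, chunk_dims))


def chunk_count(dims, chunk_dims):
    """Total number of chunks needed to cover the dataset."""
    return prod(chunks_per_dim(dims, chunk_dims))


def allocated_elements(dims, chunk_dims):
    """Elements covered by whole chunks, including padding in edge chunks."""
    return chunk_count(dims, chunk_dims) * prod(chunk_dims)


if __name__ == "__main__":
    dims, chunk_dims = (4, 5, 7), (2, 2, 4)
    print(chunks_per_dim(dims, chunk_dims))      # (2, 3, 2)
    print(chunk_count(dims, chunk_dims))         # 12
    print(allocated_elements(dims, chunk_dims))  # 192, versus 140 real elements
```

Chunk dimensions that do not divide the dataset dimensions evenly leave padding in the edge chunks, which is one way chunking can inflate storage.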
Chunked storage layout
• Raw data divided into equal sized blocks (chunks).
• Each chunk stored separately as a contiguous block on disk
[Figure: in application memory, the dataset array data is divided into chunks A, B, C, D, and the metadata cache holds the dataset header (datatype, dataspace, attributes, …) and the chunk index; in the file, the header and chunk index are stored along with the chunks, shown in the order A, C, D, B.]

Compact storage layout
• Data array and metadata stored together in the header
[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attributes, …) together with the array data; in the file*, the header and the array data are stored together.]

* “File” may in fact be a collection of files, memory, or other storage destination.

Factors affecting I/O performance

What goes on inside the magic box?
• Operations on data inside the magic box
  • Copying to/from internal buffers
  • Datatype conversion
  • Scattering - gathering
  • Data transformation (filters, compression)
• Data structures used
  • B-trees (groups, dataset chunks)
  • Hash tables
  • Local and global heaps (variable length data: link names, strings, etc.)
• Other concepts
  • HDF5 metadata, metadata cache
  • Chunking, chunk cache

Operations on data inside the magic box
• Copying to/from internal buffers
• Datatype conversion, such as
  • between float and integer
  • between little-endian (LE) and big-endian (BE)
  • 64-bit integer to 16-bit integer
• Scattering - gathering
  • Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O
• Data transformation (filters, compression)
  • Checksum on raw data and metadata (in 1.8.0)
  • Algebraic transform
  • GZIP and SZIP compression
  • User-defined filters

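To illustrate why datatype conversion costs time, the sketch below performs two of the conversions listed here by hand: a byte-order change (LE to BE) combined with a narrowing from 64-bit to 16-bit integers. It is a stand-alone illustration using Python's struct module, not the HDF5 conversion code; clamping out-of-range values to the 16-bit limits is an assumption made for this sketch.

```python
import struct

INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def convert_le_int64_to_be_int16(raw):
    """Convert packed little-endian int64 values to big-endian int16 values.

    Every element is unpacked, range-checked, and repacked, so the cost
    grows with the number of elements. Out-of-range values are clamped
    (an assumption of this sketch).
    """
    if len(raw) % 8:
        raise ValueError("input length must be a multiple of 8 bytes")
    count = len(raw) // 8
    values = struct.unpack(f"<{count}q", raw)
    clamped = [min(max(v, INT16_MIN), INT16_MAX) for v in values]
    return struct.pack(f">{count}h", *clamped)


if __name__ == "__main__":
    source = struct.pack("<3q", 1, -2, 70000)
    converted = convert_le_int64_to_be_int16(source)
    print(len(source), len(converted))     # 24 6
    print(struct.unpack(">3h", converted))  # (1, -2, 32767)
```

When the memory datatype and the file datatype match, none of this per-element work is needed.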
I/O performance depends on
• Storage layouts
• Dataset storage properties
• Chunking strategy
• Metadata cache performance
• Datatype conversion performance
• Other filters, such as compression
• Access patterns

I/O with different storage layouts

Writing compact dataset

[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attributes, …) together with the array data; both go to the file together.]

• One write to store header and data array

Writing contiguous dataset – no conversion

[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attributes, …); the dataset array data is written from application memory to the file.]

Writing a contiguous dataset with datatype conversion

[Figure: in application memory, the metadata cache holds the dataset header (datatype, dataspace, attribute 1, attribute 2, …); the dataset array data passes through a 1MB conversion buffer on its way to the file.]

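A rough way to reason about the 1MB conversion buffer is to count how many passes are needed to push a dataset through it. The sketch below is a simplified model, assuming each element occupies the larger of its source and destination sizes while it is in the buffer; the function name and the model itself are illustrative, not the library's exact algorithm.

```python
def conversion_passes(n_elements, src_size, dst_size, buffer_bytes=1024 * 1024):
    """Estimate how many buffer fills are needed to convert n_elements.

    Simplified model: each element needs max(src_size, dst_size) bytes
    of buffer space while it is being converted.
    """
    per_element = max(src_size, dst_size)
    elements_per_pass = buffer_bytes // per_element
    if elements_per_pass == 0:
        raise ValueError("conversion buffer is smaller than one element")
    return -(-n_elements // elements_per_pass)  # ceiling division


if __name__ == "__main__":
    # One million 8-byte elements converted to 4-byte elements
    print(conversion_passes(1_000_000, 8, 4))                    # 8
    print(conversion_passes(1_000_000, 8, 4, 16 * 1024 * 1024))  # 1
```

Under this model, a larger conversion buffer means fewer passes, which is the motivation for tuning its size.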
Partial I/O with contiguous datasets

Writing whole dataset – contiguous rows
[Figure: application data in memory is an array of M rows by N columns; all M rows are written to the file.]

• One I/O operation
• Data is contiguous in the file

Sub-setting of contiguous dataset
Series of adjacent rows
[Figure: application data in memory is a block of M adjacent full rows of N elements each.]

• One I/O operation
• The subset is contiguous in the file, inside the entire dataset, which is also contiguous in the file

Sub-setting of contiguous dataset
Adjacent, partial rows
[Figure: application data in memory is a block of M adjacent partial rows of N elements each.]

• Several small I/O operations
• Data is scattered in the file in M contiguous blocks of N elements

Sub-setting of contiguous dataset
Extreme case: writing a column
[Figure: application data in memory is a single column of M elements.]

• Several small I/O operations of 1 element each
• Subset data is scattered in the file in M different locations

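The three subsetting cases (full rows, partial rows, a single column) differ only in how many contiguous runs the selection forms in the file. The sketch below counts those runs for a rectangular selection of a 2-D dataset stored contiguously in row-major order; it is a back-of-the-envelope model with a made-up function name, not library code.

```python
def contiguous_runs(dataset_cols, sel_rows, sel_cols):
    """Return (number_of_runs, elements_per_run) for a rectangular selection.

    The dataset is assumed to be stored contiguously in row-major order.
    Selecting complete rows yields one run; selecting partial rows yields
    one run per selected row.
    """
    if not 0 < sel_cols <= dataset_cols:
        raise ValueError("selection width must fit inside a dataset row")
    if sel_cols == dataset_cols:
        return 1, sel_rows * sel_cols
    return sel_rows, sel_cols


if __name__ == "__main__":
    print(contiguous_runs(100, 10, 100))  # (1, 1000)  adjacent full rows
    print(contiguous_runs(100, 10, 20))   # (10, 20)   adjacent partial rows
    print(contiguous_runs(100, 10, 1))    # (10, 1)    a single column
```

Without buffering, each run corresponds to a separate small I/O operation, so the column case is the most expensive per element.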
Sub-setting of contiguous dataset
Data sieve buffer
[Figure: a single column of M elements in application memory is exchanged with the file through a sieve buffer.]

• Data is gathered in a 64K sieve buffer in memory and moved to/from the application buffer with memcopy
• Data is scattered in the file in M contiguous blocks of 1 element each

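To see why a sieve buffer helps, the sketch below counts file reads for a scattered selection under a deliberately simple model: whenever a requested element is not in the buffer, the buffer is refilled with one contiguous read starting at that element. The real library's policy is more involved; the function name and the refill rule are assumptions of this sketch.

```python
def count_sieve_reads(offsets, elem_size, sieve_size=64 * 1024):
    """Count file reads needed to fetch elements at the given byte offsets.

    Simplified model: one contiguous read of sieve_size bytes refills the
    buffer whenever the next element is not already fully inside it.
    """
    reads = 0
    buf_start = None
    for off in sorted(offsets):
        inside = (
            buf_start is not None
            and buf_start <= off
            and off + elem_size <= buf_start + sieve_size
        )
        if not inside:
            buf_start = off
            reads += 1
    return reads


if __name__ == "__main__":
    # One column of a 1024 x 1024 dataset of 4-byte elements (row-major)
    row_bytes = 1024 * 4
    column = [row * row_bytes for row in range(1024)]
    print(count_sieve_reads(column, 4))                # 64 reads with a 64K buffer
    print(count_sieve_reads(column, 4, sieve_size=4))  # 1024 reads, one per element
```

In this example each 64K read covers 16 rows, so the number of file accesses drops by a factor of 16 at the price of extra memory copies.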
Performance tuning for contiguous dataset
• Datatype conversion
  • Avoid for better performance
  • Use the H5Pset_buffer function to customize the conversion buffer size
• Partial I/O
  • Write/read in big contiguous blocks
  • Use H5Pset_sieve_buf_size to improve performance for complex subsetting

I/O with Chunking

Reminder – chunked storage layout

[Figure: in application memory, the dataset array data is divided into chunks A, B, C, D, and the metadata cache holds the dataset header (datatype, dataspace, attributes, …) and the chunk index; in the file, the header and chunk index are stored along with the chunks, shown in the order A, C, D, B.]

Information about chunking
• HDF5 library treats each chunk as an atomic object
  • Compression is applied to each chunk
  • Datatype conversion and other filters are applied per chunk
• Chunk size greatly affects performance
  • Chunk overhead adds to file size
  • Chunk processing involves many steps
• Chunk cache
  • Caches chunks for better performance
  • Created for each chunked dataset
  • Size of chunk cache is set for file (default size 1MB)
  • Each chunked dataset has its own chunk cache
  • Chunk may be too big to fit into cache
  • Memory may grow if application keeps opening datasets

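Two of the chunk cache points can be checked with simple arithmetic: whether a chunk fits in the cache at all, and how much memory the caches could take when many chunked datasets are open. The sketch below uses the 1MB default quoted in the slides; the function names are illustrative, and the memory figure is an upper bound under the assumption that every open dataset fills its cache.

```python
from math import prod

DEFAULT_CHUNK_CACHE_BYTES = 1024 * 1024  # default size quoted in the slides


def chunk_bytes(chunk_dims, element_size):
    """Uncompressed size of one chunk in bytes."""
    return prod(chunk_dims) * element_size


def chunks_that_fit(chunk_dims, element_size, cache_bytes=DEFAULT_CHUNK_CACHE_BYTES):
    """How many whole chunks fit in one chunk cache (0 means none)."""
    return cache_bytes // chunk_bytes(chunk_dims, element_size)


def worst_case_cache_memory(open_datasets, cache_bytes=DEFAULT_CHUNK_CACHE_BYTES):
    """Upper bound on chunk cache memory if every open dataset fills its cache."""
    return open_datasets * cache_bytes


if __name__ == "__main__":
    print(chunks_that_fit((256, 256), 4))    # 4 chunks of 256 KB each
    print(chunks_that_fit((1024, 1024), 4))  # 0: a 4 MB chunk does not fit
    print(worst_case_cache_memory(500))      # 524288000 bytes for 500 open datasets
```

A result of 0 from chunks_that_fit signals a chunk that cannot be cached, so repeated partial access to it gets no help from the cache.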
Chunk cache

[Figure: in application memory, the metadata cache holds the headers of Dataset_1 … Dataset_N and the chunking B-tree nodes; the chunk cache is a separate space. Default chunk cache size is 1MB.]

Writing chunked dataset
[Figure: chunks A, B, C of a chunked dataset pass through the chunk cache and the filter pipeline before being written to the file.]

• Compression is performed when a chunk is evicted from the chunk cache
• Other filters are applied as data goes through the filter pipeline

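Because each chunk goes through the filter pipeline on its own, every chunk ends up with its own stored size and can be read back without touching its neighbours. The sketch below imitates that with Python's zlib module (a deflate implementation); it illustrates the idea only and does not reproduce the HDF5 file format or filter pipeline.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Compress each chunk independently, as a per-chunk filter would."""
    return [zlib.compress(chunk, level) for chunk in chunks]


def decompress_chunk(stored, index):
    """Recover a single chunk without reading any of the others."""
    return zlib.decompress(stored[index])


if __name__ == "__main__":
    chunks = [bytes(1024), bytes(range(256)) * 4]  # two 1 KB chunks
    stored = compress_chunks(chunks)
    # Stored sizes differ per chunk, so an index must track where each one lives.
    print([len(s) for s in stored])
    print(decompress_chunk(stored, 1) == chunks[1])  # True
```

The varying stored sizes are one reason a chunk index is needed to locate each chunk in the file.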
Partial I/O with Chunking

Partial I/O for chunked dataset
[Figure: a dataset of six chunks; a green subset overlaps four of them, numbered 1-4.]

• Example: write the green subset from the dataset, converting the data
• Dataset is stored as six chunks in the file.
• The subset spans four chunks, numbered 1-4 in the figure.
• Hence four chunks must be written to the file.
• But first, the four chunks must be read from the file, to preserve those parts of each chunk that are not to be overwritten.

Partial I/O for chunked dataset
• For each of the four chunks:
  • Read chunk from file into chunk cache, unless it’s already there.
  • Determine which part of the chunk will be replaced by the selection.
  • Replace that part of the chunk in the cache with the corresponding elements from the application’s array.
  • Move those elements to the conversion buffer and perform conversion.
  • Move those elements back from the conversion buffer to the chunk cache.
  • Apply filters (compression) when the chunk is flushed from the chunk cache.
• Three memcopy operations are performed for each element.

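The first step, finding which chunks a selection touches, is plain index arithmetic. The sketch below computes the chunk coordinates overlapped by a rectangular selection; the 4 x 6 dataset with 2 x 2 chunks is a made-up layout that happens to give six chunks, and the function name is illustrative.

```python
from itertools import product


def chunks_touched(start, count, chunk_dims):
    """Chunk coordinates overlapped by a rectangular selection.

    start and count give the selection's first element and its extent
    along each dimension; chunk_dims gives the chunk shape.
    """
    ranges = []
    for s, n, c in zip(start, count, chunk_dims):
        first = s // c
        last = (s + n - 1) // c
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


if __name__ == "__main__":
    # 4 x 6 dataset stored as six 2 x 2 chunks (hypothetical layout)
    print(chunks_touched((1, 1), (2, 2), (2, 2)))
    # [(0, 0), (0, 1), (1, 0), (1, 1)]: four chunks must be read and rewritten
    print(chunks_touched((0, 0), (2, 2), (2, 2)))
    # [(0, 0)]: a chunk-aligned selection touches a single chunk
```

Aligning selections with chunk boundaries keeps the number of chunks that go through the read, modify, convert, and filter steps as small as possible.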
Partial I/O for chunked dataset

[Figure: in application memory, elements participating in I/O are gathered with memcopy from the application buffer into the corresponding chunk (chunk 3) in the chunk cache.]

Partial I/O for chunked dataset

[Figure: in application memory, the elements of chunk 3 are moved with memcopy from the chunk cache to the conversion buffer and, after conversion, moved back with memcopy; the chunk is then compressed and written to the file.]

Variable length data and I/O

Examples of variable length data
• String
  A[0] “the first string we want to write”
  …
  A[N-1] “the N-th string we want to write”
• Each element is a record of variable length
  A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
  A[1] (0,0,110,2005) [length = 4]
  …
  A[N] (1,2,3,4,5,6,7,8,9,10,11,12,…,M) [length = M]

Variable length data in HDF5
• Variable length description in HDF5 application
  typedef struct {
      size_t len;  /* number of base-type elements */
      void  *p;    /* pointer to the element data */
  } hvl_t;
• Base type can be any HDF5 type
  H5Tvlen_create(base_type)
• ~20 bytes overhead for each element
• Variable length data itself cannot be compressed

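To show what an application buffer of variable length elements looks like in memory, the sketch below mirrors the hvl_t layout with Python's ctypes module and builds one descriptor per record. The class and helper names are made up for this illustration, and int is assumed as the base type; no HDF5 call is involved.

```python
import ctypes


class HVL(ctypes.Structure):
    """Python mirror of the C hvl_t struct: a length plus a data pointer."""
    _fields_ = [("len", ctypes.c_size_t), ("p", ctypes.c_void_p)]


def make_vl_buffer(records):
    """Build an array of HVL descriptors, one per variable-length int record.

    The returned arrays own the element data and must stay alive for as
    long as the descriptors are in use.
    """
    arrays = [(ctypes.c_int * len(r))(*r) for r in records]
    descriptors = (HVL * len(records))()
    for i, arr in enumerate(arrays):
        descriptors[i].len = len(arr)
        descriptors[i].p = ctypes.addressof(arr)
    return descriptors, arrays


def read_record(descriptor):
    """Follow a descriptor's pointer and return its elements as a list."""
    ptr = ctypes.cast(descriptor.p, ctypes.POINTER(ctypes.c_int))
    return [ptr[i] for i in range(descriptor.len)]


if __name__ == "__main__":
    descs, keepalive = make_vl_buffer([[1, 1, 0, 0, 0, 5, 6, 7, 8, 9], [0, 0, 110, 2005]])
    print(descs[0].len, descs[1].len)  # 10 4
    print(read_record(descs[1]))       # [0, 0, 110, 2005]
```

Each element of the buffer is only a small descriptor; the actual data lives elsewhere, which is the same split that appears in the file between the dataset and the global heap.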
How variable length data is stored in HDF5

[Figure: in the file, the dataset header describes a dataset with variable length elements; each element is a pointer into the global heap, where the actual variable length data is stored.]

Variable length datasets and I/O
• When writing variable length data, elements in the application buffer point to global heaps in the metadata cache, where the actual data is stored.

[Figure: elements of the application buffer (raw data) point into a global heap.]

There may be more than one global heap

[Figure: elements of the application buffer (raw data) point into two different global heaps.]

Variable length datasets and I/O
[Figure: the raw data and the global heaps are written to the file.]

VL chunked dataset in a file

[Figure: layout in the file of the dataset header, the chunk B-tree, the dataset chunks, and the heaps with VL data.]

Writing chunked VL datasets
[Figure: writing a selected region of a VL chunked dataset. Application memory holds the metadata cache (dataset header, B-tree nodes, global heap), the chunk cache, and the raw data; data passes through the chunk cache, the conversion buffer, and the filter pipeline on its way to the file.]

Hints for variable length data I/O
• Avoid closing/opening a file while writing VL datasets
  • Global heap information is lost
  • Global heaps may have unused space
• Avoid alternately writing different VL datasets
  • Data from different datasets will go into the same heap
• If the maximum length of the record is known, consider using fixed-length records and compression

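The last hint can be weighed with a rough size estimate. The sketch below compares variable length storage, using the "~20 bytes" per-element overhead quoted earlier in the slides, with fixed-length records padded to the longest record. Both formulas are simplifications made for this illustration (heap fragmentation and compression are ignored), and the function names are hypothetical.

```python
VL_OVERHEAD_BYTES = 20  # approximate per-element overhead quoted in the slides


def vl_storage_bytes(lengths, element_size, overhead=VL_OVERHEAD_BYTES):
    """Rough size of VL storage: the data itself plus per-element overhead."""
    return sum(lengths) * element_size + len(lengths) * overhead


def fixed_storage_bytes(lengths, element_size):
    """Uncompressed size if every record is padded to the longest one."""
    if not lengths:
        return 0
    return len(lengths) * max(lengths) * element_size


if __name__ == "__main__":
    lengths = [10, 4, 12]  # record lengths in elements
    print(vl_storage_bytes(lengths, 4))     # 26 * 4 + 3 * 20 = 164
    print(fixed_storage_bytes(lengths, 4))  # 3 * 12 * 4 = 144
```

When record lengths are similar, the padded fixed-length form can already be smaller before compression, and its padding tends to compress well.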
Thank you!

Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

Was ist angesagt? (20)

HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
HDF4 and HDF5 Performance Preliminary Results
HDF4 and HDF5 Performance Preliminary ResultsHDF4 and HDF5 Performance Preliminary Results
HDF4 and HDF5 Performance Preliminary Results
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 I/O Performance
HDF5 I/O PerformanceHDF5 I/O Performance
HDF5 I/O Performance
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 
MODIS Land and HDF-EOS
MODIS Land and HDF-EOSMODIS Land and HDF-EOS
MODIS Land and HDF-EOS
 
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
 
HDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF ConverterHDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF Converter
 
NetCDF and HDF5
NetCDF and HDF5NetCDF and HDF5
NetCDF and HDF5
 
Efficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAPEfficiently serving HDF5 via OPeNDAP
Efficiently serving HDF5 via OPeNDAP
 
HDF Project Update
HDF Project UpdateHDF Project Update
HDF Project Update
 
NEON HDF5
NEON HDF5NEON HDF5
NEON HDF5
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
Open-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDFOpen-source Scientific Computing and Data Analytics using HDF
Open-source Scientific Computing and Data Analytics using HDF
 
Easy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAPEasy Access of NASA HDF data via OPeNDAP
Easy Access of NASA HDF data via OPeNDAP
 
HDF Tools Tutorial
HDF Tools TutorialHDF Tools Tutorial
HDF Tools Tutorial
 

Ähnlich wie Caching and Buffering in HDF5

Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
co-Hadoop: Data co-location on Hadoop.
co-Hadoop: Data co-location on Hadoop.co-Hadoop: Data co-location on Hadoop.
co-Hadoop: Data co-location on Hadoop.Yousef Fadila
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
 
A quick start guide to using HDF5 files in GLOBE Claritas
A quick start guide to using HDF5 files in GLOBE ClaritasA quick start guide to using HDF5 files in GLOBE Claritas
A quick start guide to using HDF5 files in GLOBE ClaritasGuy Maslen
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object Sandeep Patil
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?gvernik
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it bettergvernik
 

Ähnlich wie Caching and Buffering in HDF5 (20)

HDF5 Life cycle of data
HDF5 Life cycle of dataHDF5 Life cycle of data
HDF5 Life cycle of data
 
Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Migrating from HDF5 1.6 to 1.8
Migrating from HDF5 1.6 to 1.8Migrating from HDF5 1.6 to 1.8
Migrating from HDF5 1.6 to 1.8
 
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIsIntroduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
co-Hadoop: Data co-location on Hadoop.
co-Hadoop: Data co-location on Hadoop.co-Hadoop: Data co-location on Hadoop.
co-Hadoop: Data co-location on Hadoop.
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
HDF5 Advanced Topics - Chunking
HDF5 Advanced Topics - ChunkingHDF5 Advanced Topics - Chunking
HDF5 Advanced Topics - Chunking
 
Unit-3.pptx
Unit-3.pptxUnit-3.pptx
Unit-3.pptx
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
Update on HDF5 1.8
Update on HDF5 1.8Update on HDF5 1.8
Update on HDF5 1.8
 
A quick start guide to using HDF5 files in GLOBE Claritas
A quick start guide to using HDF5 files in GLOBE ClaritasA quick start guide to using HDF5 files in GLOBE Claritas
A quick start guide to using HDF5 files in GLOBE Claritas
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it better
 
Metadata Requirements for EOSDIS Data Providers
Metadata Requirements for EOSDIS Data ProvidersMetadata Requirements for EOSDIS Data Providers
Metadata Requirements for EOSDIS Data Providers
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 

Mehr von The HDF-EOS Tools and Information Center

Mehr von The HDF-EOS Tools and Information Center (18)

Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 
Google Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOSGoogle Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOS
 
HDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's GuideHDF-EOS Data Product Developer's Guide
HDF-EOS Data Product Developer's Guide
 
HDF Status Update
HDF Status UpdateHDF Status Update
HDF Status Update
 
NASA Terra Data Fusion
NASA Terra Data FusionNASA Terra Data Fusion
NASA Terra Data Fusion
 
HDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at ScaleHDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at Scale
 
HDF for the Cloud
HDF for the CloudHDF for the Cloud
HDF for the Cloud
 
S3 VFD
S3 VFDS3 VFD
S3 VFD
 
HDF Data in the Cloud
HDF Data in the CloudHDF Data in the Cloud
HDF Data in the Cloud
 

Kürzlich hochgeladen

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality

Caching and Buffering in HDF5

  • 1. Caching and Buffering in HDF5. The HDF Group, Nov. 6, 2007, HDF-EOS Workshop XI Tutorial
  • 2. Software stack and the “magic box”
      • Life cycle: what happens to data when it is transferred from the application buffer to an HDF5 file?
      • Figure: software stack, top to bottom: Application (data buffer), Object API (H5Dwrite), Library internals (magic box), Virtual file I/O (unbuffered I/O), File or other “storage” (data in a file)
  • 3. Inside the magic box
      • Understanding what happens to data inside the magic box helps in writing efficient applications
      • The HDF5 library has mechanisms to control behavior inside the magic box
      • Goals of this talk:
          • Describe some basic operations and data structures and explain how they affect performance and storage sizes
          • Give some “recipes” for improving performance
  • 4. Topics
      • Dataset metadata and array data storage layouts
      • Types of dataset storage layouts
      • Factors affecting I/O performance
      • I/O with compact datasets
      • I/O with contiguous datasets
      • I/O with chunked datasets
      • Variable length data and I/O
  • 5. HDF5 dataset metadata and array data storage layouts
  • 6. HDF5 Dataset
      • Data array
          • Ordered collection of identically typed data items distinguished by their indices
      • Metadata
          • Dataspace: rank, dimensions of dataset array
          • Datatype: information on how to interpret data
          • Storage properties: how array is organized on disk
          • Attributes: user-defined metadata (optional)
  • 7. Separate Components of a Dataset
      • Header
          • Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
          • Datatype: IEEE 32-bit float
          • Storage info: chunked, compressed
          • Attributes: Time = 32.4, Pressure = 987, Temp = 56
      • Data array
  • 8. Metadata cache and array data
      • Dataset array data is typically kept in application memory
      • Dataset header is kept in a separate space, the metadata cache
      • Figure: dataset header (datatype, dataspace, attributes) in the metadata cache and dataset array data in application memory; the file holds HDF5 metadata and dataset array data
  • 9. Metadata and metadata cache
      • HDF5 metadata
          • Information about HDF5 objects used by the library
          • Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, super-block, etc.
          • Usually small compared to raw data sizes (KB vs. MB-GB)
      • Metadata cache
          • Space allocated to handle pieces of the HDF5 metadata
          • Allocated by the HDF5 library in the application’s memory space
          • Cache behavior affects overall performance
          • Metadata cache implementation prior to HDF5 1.6.5 could cause performance degradation for some applications
  • 10. Types of data storage layouts
  • 11. HDF5 dataset storage layouts
      • Contiguous
      • Chunked
      • Compact
  • 12. Contiguous storage layout
      • Metadata header separate from raw data
      • Raw data stored in one contiguous block on disk
      • Figure: dataset header (datatype, dataspace, attributes) in the metadata cache and dataset array data in application memory, both mapped to the file
  • 13. Chunked storage
      • Chunking: storage layout where a dataset is partitioned into fixed-size multi-dimensional tiles, or chunks
      • Used for extendible datasets and datasets with filters applied (checksum, compression)
      • HDF5 library treats each chunk as an atomic object
      • Greatly affects performance and file sizes
  • 14. Chunked storage layout
      • Raw data divided into equal-sized blocks (chunks)
      • Each chunk stored separately as a contiguous block on disk
      • Figure: dataset array data split into chunks A, B, C, D in application memory; dataset header and chunk index in the metadata cache; the file holds the header, the chunk index, and the chunks as separate blocks
  • 15. Compact storage layout
      • Data array and metadata stored together in the header
      • Figure: dataset header (datatype, dataspace, attributes, array data) in the metadata cache; array data in application memory; header with data written to the file*
      • * “File” may in fact be a collection of files, memory, or other storage destination.
  • 16. Factors affecting I/O performance
  • 17. What goes on inside the magic box?
      • Operations on data inside the magic box
          • Copying to/from internal buffers
          • Datatype conversion
          • Scattering - gathering
          • Data transformation (filters, compression)
      • Data structures used
          • B-trees (groups, dataset chunks)
          • Hash tables
          • Local and global heaps (variable length data: link names, strings, etc.)
      • Other concepts
          • HDF5 metadata, metadata cache
          • Chunking, chunk cache
  • 18. Operations on data inside the magic box
      • Copying to/from internal buffers
      • Datatype conversion, such as
          • float ↔ integer
          • LE ↔ BE
          • 64-bit integer to 16-bit integer
      • Scattering - gathering
          • Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O
      • Data transformation (filters, compression)
          • Checksum on raw data and metadata (in 1.8.0)
          • Algebraic transform
          • GZIP and SZIP compressions
          • User-defined filters
  • 19. I/O performance depends on
      • Storage layouts
      • Dataset storage properties
      • Chunking strategy
      • Metadata cache performance
      • Datatype conversion performance
      • Other filters, such as compression
      • Access patterns
  • 20. I/O with different storage layouts
  • 21. Writing compact dataset
      • One write to store header and data array
      • Figure: dataset header (datatype, dataspace, attributes, array data) in the metadata cache; array data in application memory; header and data written to the file
  • 22. Writing contiguous dataset – no conversion
      • Figure: dataset header (datatype, dataspace, attributes) in the metadata cache; dataset array data written from application memory to the file
  • 23. Writing a contiguous dataset with datatype conversion
      • Figure: dataset header (datatype, dataspace, attributes) in the metadata cache; dataset array data passes through a 1MB conversion buffer in application memory on its way to the file
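The effect of a fixed-size conversion buffer can be sketched in plain C. This is an illustrative model only, not the library's implementation: `convert_in_strips` is a hypothetical helper, the 64-bit to 16-bit conversion is borrowed from the earlier list of conversion examples, and clamping out-of-range values is this sketch's own choice. It shows that when the data is larger than the buffer, conversion has to proceed in several strips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an HDF5 function: converts n 64-bit integers to
 * 16-bit integers by staging them through a scratch buffer that holds at
 * most buf_elems elements. Returns the number of strips processed.
 * Out-of-range values are clamped (an assumption of this sketch). */
size_t convert_in_strips(const int64_t *src, int16_t *dst, size_t n,
                         int16_t *scratch, size_t buf_elems)
{
    assert(buf_elems > 0);
    size_t strips = 0;
    for (size_t done = 0; done < n; done += buf_elems) {
        size_t count = (n - done < buf_elems) ? n - done : buf_elems;
        /* Gather one strip into the scratch buffer, converting as we go. */
        for (size_t i = 0; i < count; i++) {
            int64_t v = src[done + i];
            if (v > INT16_MAX) v = INT16_MAX;
            if (v < INT16_MIN) v = INT16_MIN;
            scratch[i] = (int16_t)v;
        }
        /* Stand-in for writing the converted strip to the file. */
        for (size_t i = 0; i < count; i++)
            dst[done + i] = scratch[i];
        strips++;
    }
    return strips;
}
```

With five source elements and a two-element scratch buffer, the helper makes three passes; a larger buffer reduces the number of passes, which is the knob a conversion buffer size setting exposes.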
  • 24. Partial I/O with contiguous datasets
  • 25. Writing whole dataset – contiguous rows
      • One I/O operation
      • Data is contiguous in a file
      • Figure: application data in memory (M rows of N elements) written to the file
  • 26. Sub-setting of contiguous dataset: series of adjacent rows
      • One I/O operation
      • Subset is contiguous in file
      • Entire dataset is contiguous in file
      • Figure: application data in memory (M rows of N elements) written to M adjacent rows of the file
  • 27. Sub-setting of contiguous dataset: adjacent, partial rows
      • Several small I/O operations
      • Data is scattered in a file in M contiguous blocks
      • Figure: application data in memory (M rows of N elements) written to M partial rows of the file
  • 28. Sub-setting of contiguous dataset: extreme case, writing a column
      • Several small I/O operations, 1 element each
      • Subset data is scattered in a file in M different locations
      • Figure: application data in memory (M elements) written to one column of the file
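The four sub-setting cases above reduce to one rule for a row-major contiguous dataset: full-width adjacent rows merge into a single block, and anything narrower produces one block per row. The sketch below captures that rule; `contiguous_blocks` is a hypothetical helper written for illustration and it assumes the selected rows are adjacent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of contiguous file blocks touched by a
 * rectangular selection of sel_rows adjacent rows by sel_cols elements in a
 * row-major dataset with n_cols columns, stored contiguously. */
size_t contiguous_blocks(size_t n_cols, size_t sel_rows, size_t sel_cols)
{
    assert(sel_cols >= 1 && sel_cols <= n_cols);
    if (sel_rows == 0)
        return 0;
    /* Full-width rows are adjacent in the file, so they merge into one block. */
    if (sel_cols == n_cols)
        return 1;
    /* Partial rows are separated by the unselected elements of each row. */
    return sel_rows;
}
```

For a dataset with 100 columns, selecting 50 full rows gives one block, while selecting 50 rows of 10 elements, or a single column of 50 elements, gives 50 blocks each.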
  • 29. Sub-setting of contiguous dataset: data sieve buffer
      • Data is gathered in a sieve buffer (64K) in memory by memcopy
      • Data is scattered in a file in M contiguous blocks
      • Figure: application data in memory (M elements), sieve buffer, and one column of the file
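A rough way to see why a sieve buffer helps is to count file accesses for the column case. The model below is a simplification and not the library's actual algorithm: `sieve_buffer_accesses` is a hypothetical helper that assumes the buffer is refilled, starting at the requested offset, whenever an element falls outside the currently buffered range.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model, not the library's implementation: counts file accesses
 * needed to touch column 0 of every row in a row-major contiguous dataset
 * when a sieve buffer of sieve_bytes is refilled at the requested offset
 * each time an element lies outside the buffered byte range. */
size_t sieve_buffer_accesses(size_t n_rows, size_t n_cols, size_t elem_size,
                             size_t sieve_bytes)
{
    assert(elem_size > 0 && sieve_bytes >= elem_size);
    size_t accesses = 0;
    size_t buf_start = 0, buf_end = 0;   /* buffered range is [start, end) */
    for (size_t row = 0; row < n_rows; row++) {
        size_t offset = row * n_cols * elem_size;   /* byte offset of the element */
        if (accesses == 0 || offset < buf_start || offset + elem_size > buf_end) {
            buf_start = offset;
            buf_end = offset + sieve_bytes;
            accesses++;
        }
    }
    return accesses;
}
```

Under these assumptions, a column of a 1024 x 1024 dataset of 4-byte elements needs 64 accesses with a 64K buffer, compared with 1024 accesses when the buffer holds only one element.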
  • 30. Performance tuning for contiguous dataset
      • Datatype conversion
          • Avoid for better performance
          • Use H5Pset_buffer function to customize conversion buffer size
      • Partial I/O
          • Write/read in big contiguous blocks
          • Use H5Pset_sieve_buf_size to improve performance for complex subsetting
  • 31. I/O with Chunking
  • 32. Reminder – chunked storage layout
      • Figure: dataset array data split into chunks A, B, C, D in application memory; dataset header and chunk index in the metadata cache; the file holds the header, the chunk index, and the chunks as separate blocks
  • 33. Information about chunking
      • HDF5 library treats each chunk as an atomic object
          • Compression is applied to each chunk
          • Datatype conversion, other filters applied per chunk
      • Chunk size greatly affects performance
          • Chunk overhead adds to file size
          • Chunk processing involves many steps
      • Chunk cache
          • Caches chunks for better performance
          • Created for each chunked dataset
          • Size of chunk cache is set for file (default size 1MB)
          • Each chunked dataset has its own chunk cache
          • Chunk may be too big to fit into cache
          • Memory may grow if application keeps opening datasets
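Whether a chunk is too big for the cache is simple arithmetic on the chunk shape. The sketch below is a back-of-the-envelope check, assuming the cache is measured against the uncompressed chunk size; `chunk_bytes` and `chunks_that_fit` are hypothetical helpers, and the 1 MB constant is the default quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Default chunk cache size quoted in the slides (1 MB). */
#define DEFAULT_CHUNK_CACHE_BYTES (1024u * 1024u)

/* Hypothetical helper: size in bytes of one uncompressed chunk. */
size_t chunk_bytes(const size_t *chunk_dims, size_t rank, size_t elem_size)
{
    assert(rank > 0);
    size_t bytes = elem_size;
    for (size_t i = 0; i < rank; i++)
        bytes *= chunk_dims[i];
    return bytes;
}

/* Hypothetical helper: how many whole chunks of this shape fit in a cache
 * of cache_bytes. A result of 0 means a single chunk is too big. */
size_t chunks_that_fit(const size_t *chunk_dims, size_t rank,
                       size_t elem_size, size_t cache_bytes)
{
    size_t bytes = chunk_bytes(chunk_dims, rank, elem_size);
    assert(bytes > 0);
    return cache_bytes / bytes;
}
```

A 100 x 100 chunk of 4-byte elements occupies 40,000 bytes, so 26 such chunks fit in the default cache; a 1000 x 1000 chunk of the same type occupies 4,000,000 bytes and does not fit at all.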
  • 34. Chunk cache
      • Default size is 1MB
      • Figure: application memory holding the metadata cache (Dataset_1 header … Dataset_N header, chunking B-tree nodes) and the chunk cache
  • 35. Writing chunked dataset
      • Compression performed when chunk evicted from the chunk cache
      • Other filters applied as data goes through filter pipeline
      • Figure: chunks A, B, C of a chunked dataset pass through the chunk cache and the filter pipeline into the file
  • 36. Partial I/O with Chunking
  • 37. Partial I/O for chunked dataset
      • Example: write the green subset from the dataset, converting the data
      • Dataset is stored as six chunks in the file.
      • The subset spans four chunks, numbered 1-4 in the figure.
      • Hence four chunks must be written to the file.
      • But first, the four chunks must be read from the file, to preserve those parts of each chunk that are not to be overwritten.
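The number of chunks a rectangular selection touches can be computed per dimension from the first and last chunk index it reaches. The helper below, `chunks_spanned`, is a hypothetical illustration; the slide gives no sizes, so the 20 x 30 dataset with 10 x 10 chunks used in the note that follows is an assumed layout that happens to give six chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of chunks touched by a rectangular selection,
 * given per-dimension start offsets, element counts, and chunk dimensions. */
size_t chunks_spanned(const size_t *start, const size_t *count,
                      const size_t *chunk_dims, size_t rank)
{
    size_t total = 1;
    for (size_t i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0 && count[i] > 0);
        size_t first = start[i] / chunk_dims[i];                  /* first chunk index */
        size_t last  = (start[i] + count[i] - 1) / chunk_dims[i]; /* last chunk index */
        total *= last - first + 1;
    }
    return total;
}
```

In the assumed layout, a 10 x 10 selection starting at element (5, 5) straddles two chunks in each dimension and so touches four chunks, each of which has to be read, modified, and written back. The same selection aligned at (0, 0) touches only one.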
  • 38. Partial I/O for chunked dataset
      • For each of the four chunks (numbered 1-4 in the figure):
          • Read chunk from file into chunk cache, unless it’s already there.
          • Determine which part of the chunk will be replaced by the selection.
          • Replace that part of the chunk in the cache with the corresponding elements from the application’s array.
          • Move those elements to conversion buffer and perform conversion.
          • Move those elements back from conversion buffer to chunk cache.
          • Apply filters (compression) when chunk is flushed from chunk cache.
      • For each element, 3 memcopies are performed
  • 39. Partial I/O for chunked dataset
      • Elements participating in I/O are gathered into the corresponding chunk
      • Figure: memcopy from the application buffer into chunk 3 in the chunk cache, both in application memory
  • 40. Partial I/O for chunked dataset
      • Figure: memcopy of chunk 3 from the chunk cache to the conversion buffer and memcopy back, in application memory; the chunk is then compressed and written to the file
  • 41. Variable length data and I/O
  • 42. Examples of variable length data
      • String
          A[0] “the first string we want to write”
          …
          A[N-1] “the N-th string we want to write”
      • Each element is a record of variable length
          A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
          A[1] (0,0,110,2005) [length = 4]
          …
          A[N] (1,2,3,4,5,6,7,8,9,10,11,12,….,M) [length = M]
  • 43. Variable length data in HDF5
      • Variable length description in HDF5 application:
          typedef struct {
              size_t len;   /* number of base-type elements */
              void  *p;     /* pointer to the data */
          } hvl_t;
      • Base type can be any HDF5 type: H5Tvlen_create(base_type)
      • ~ 20 bytes overhead for each element
      • Data cannot be compressed
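To make the per-element overhead concrete, the sketch below builds the two integer records from the earlier examples and estimates their stored size. It is an approximation only: `vl_record_t` is a local stand-in with the same length-plus-pointer shape as `hvl_t` so that the code builds without the HDF5 headers, `vl_estimated_bytes` is a hypothetical helper, and the 20-byte figure is the rough overhead quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in with the length-plus-pointer shape of HDF5's hvl_t, so the
 * sketch builds without the HDF5 headers. */
typedef struct {
    size_t len;   /* number of base-type elements in this record */
    void  *p;     /* pointer to the record's data */
} vl_record_t;

/* Approximate per-element overhead quoted in the slides. */
#define VL_OVERHEAD_BYTES 20u

/* Hypothetical helper: rough estimate of stored bytes, counting the payload
 * of each record plus the quoted per-element overhead. */
size_t vl_estimated_bytes(const vl_record_t *recs, size_t n, size_t base_size)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(recs[i].len == 0 || recs[i].p != NULL);
        total += recs[i].len * base_size + VL_OVERHEAD_BYTES;
    }
    return total;
}
```

For the records of length 10 and 4, the estimate is 14 elements of payload plus 40 bytes of overhead, so short records pay proportionally more for being variable length.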
  • 44. How variable length data is stored in HDF5
      • Figure: in the file, the dataset header and the dataset with variable length elements; each element is a pointer into the global heap, which holds the actual variable length data
  • 45. Variable length datasets and I/O
      • When writing variable length data, elements in application buffer point to global heaps in the metadata cache where actual data is stored.
      • Figure: raw data, application buffer, global heap
  • 46. There may be more than one global heap
      • Figure: raw data, application buffer, two global heaps
  • 47. Variable length datasets and I/O
      • Figure: raw data and two global heaps written to the file
  • 48. VL chunked dataset in a file
      • Figure: file containing the dataset header, the chunk B-tree, the dataset chunks, and heaps with VL data
  • 49. Writing chunked VL datasets
      • Figure: application memory holding the metadata cache (B-tree nodes, dataset header, global heap), raw data, the chunk cache, and the conversion buffer; data for the selected region of the VL chunked dataset passes through the filter pipeline into the file
  • 50. Hints for variable length data I/O
      • Avoid closing/opening a file while writing VL datasets
          • Global heap information is lost
          • Global heaps may have unused space
      • Avoid alternately writing different VL datasets
          • Data from different datasets will go into the same heap
      • If maximum length of the record is known, consider using fixed-length records and compression
  • 51. Thank you! Questions?