HDF4 and HDF5 Performance
Preliminary Results

Elena Pourmal
IV HDF-EOS Workshop
September 19 - 21 2000
Why compare?
• HDF5 is emerging as the new standard
– it has proved to be robust
– most of the planned features have been implemented in HDF5-1.2.2
– it has many new features compared with HDF4
– it is time for a performance study and tuning

• Users are moving their data and applications to HDF5
• HDF4 is not "bad," but its capabilities are limited
HDF5 vs. HDF4
• HDF5: files over 2GB. HDF4: files less than 2GB.
• HDF5: unlimited number of objects. HDF4: maximum of 20000 objects.
• HDF5: one data model (multidimensional array of structures). HDF4: different data models for SD, GR, RI and Vdatas.
• HDF5: parallel (||) support. HDF4: N/A.
• HDF5: thread safe. HDF4: N/A.
• HDF5: mounting files. HDF4: N/A.
• HDF5: diversity of datatypes (compound, VL, opaque) and operations (create, write, read, delete, shared). HDF4: only predefined datatypes such as float32, int16, char8.
• HDF5: "native" file is portable. HDF4: "native" file is not portable.
• HDF5: modifiable I/O pipeline (registration of compression methods). HDF4: N/A.
• HDF5: selections (unions and regular blocks). HDF4: selections (simple regular subsampling).
What to compare?
(short list of common features)
• File I/O operations
– plain read and write
– hyperslab selections
– regular subsampling
– access to a large number of objects
– storage overhead

• Data organization in the file and access to it
– Vdata vs. compound datasets

• Chunking, unlimited dimensions, compression
Benchmark Environment
• 440 MHz UltraSPARC-IIi
– 1GB memory
– SunOS 5.7
– timer: gettimeofday()

• Dual 550 MHz Pentium III Xeon
– 1GB memory
– Red Hat Linux 6.2
– timer: clock()

• Each measurement was repeated 10 times, and the average and best times were recorded
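The repeat-and-summarize procedure in the last bullet can be sketched as a small harness. This is a hypothetical Python helper, not the code behind these measurements. `time.perf_counter` plays the role of the wall-clock `gettimeofday()`. To mimic `clock()`, which measures processor time rather than elapsed time, you could pass `time.process_time` instead.

```python
import time
from typing import Callable, Tuple


def time_repeated(op: Callable[[], None], runs: int = 10,
                  timer: Callable[[], float] = time.perf_counter) -> Tuple[float, float]:
    """Run `op` `runs` times and return (average, best) elapsed time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = timer()
        op()  # hypothetical workload, e.g. one HDF4 or HDF5 write
        samples.append(timer() - start)
    return sum(samples) / len(samples), min(samples)
```

Usage: `avg, best = time_repeated(lambda: bytes(8 * 1024 * 1024))` returns the average and best time for building an 8 MB buffer. Reporting the best time alongside the average helps filter out the system-activity noise mentioned in the results.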
Benchmarks
• Writing 1Dim and 2Dim datasets of integers
• Reading 2Dim contiguous hyperslabs of integers
• Reading 2Dim contiguous hyperslabs of integers
with subsampling
• Reading fixed size hyperslabs of integers from
different locations in the dataset
• Writing and reading Vdatas and Compound
Datasets
• CERES data
Writing 1Dim and 2Dim Datasets
Writing 1Dim Datasets
• In this test we created one-dimensional integer arrays ranging in size from 8 Kbytes to 8000 Kbytes in steps of 8 Kbytes. We measured the average and best times for writing these arrays to HDF4 and HDF5 files.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
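The size sweep described above can be enumerated as follows. This is a minimal sketch that assumes 4-byte integers and 1 Kbyte = 1024 bytes. The slide states neither, so both constants are labeled assumptions.

```python
from typing import Iterator, Tuple

INT_SIZE = 4   # assumption: 4-byte integers
KBYTE = 1024   # assumption: 1 Kbyte = 1024 bytes


def one_dim_test_sizes(start_kb: int = 8, stop_kb: int = 8000,
                       step_kb: int = 8) -> Iterator[Tuple[int, int]]:
    """Yield (size in Kbytes, number of integer elements) for each 1Dim test array."""
    for kb in range(start_kb, stop_kb + 1, step_kb):
        yield kb, kb * KBYTE // INT_SIZE
```

Under these assumptions the sweep produces 1000 arrays. They range from 2048 to 2,048,000 integers.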
Writing 1Dim Datasets
[Chart: Writing 1Dim dataset (best time). Time (seconds, 0 to 2) vs. dataset size (Kbytes, 8 to 7896); series: HDF4, HDF5.]

HDF5 writes about 8 times faster than HDF4. System activity affects the timing results.
Writing 2Dim Datasets
• In this test we created two-dimensional arrays with sizes varying from 40 x 40 bytes to 4000 x 4000 bytes in steps of 40 bytes in each dimension. We measured the average and best times for writing these arrays to HDF4 and HDF5 files. The graphs were plotted by averaging the values obtained for arrays of the same total size, regardless of their shape.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
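The averaging step above merges every shape that has the same total size, such as 40 x 80 and 80 x 40, into a single point. The following hypothetical Python sketch does this. It assumes that the two dimensions step independently, which the slide implies but does not state outright.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shape_grid(step: int = 40, largest: int = 4000) -> List[Tuple[int, int]]:
    """All (rows, cols) pairs, each dimension stepping independently (assumption)."""
    dims = range(step, largest + 1, step)
    return [(r, c) for r in dims for c in dims]


def average_by_size(results: Iterable[Tuple[int, int, float]]) -> Dict[int, float]:
    """Average (rows, cols, seconds) timings over all shapes with the same rows*cols."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for rows, cols, seconds in results:
        totals[rows * cols] += seconds  # shape is ignored; only total size matters
        counts[rows * cols] += 1
    return {size: totals[size] / counts[size] for size in sorted(totals)}
```

Usage: `average_by_size([(40, 80, 1.0), (80, 40, 3.0)])` yields `{3200: 2.0}`.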
Writing 2Dim Datasets
[Chart: Writing 2Dim Datasets (best time). Time (microseconds, 0 to 450000) vs. dataset size (Kbytes, 0 to 3563); series: HDF4, HDF5.]

HDF4 write time grows nonlinearly with dataset size. HDF5 writes about 10 times faster than HDF4.
Reading 2Dim Contiguous Hyperslabs
Reading Contiguous Hyperslabs
• In this test we created a file with a 1000 x 1000 array of integers. We then read hyperslabs of different sizes starting from a fixed position in the array, and averaged the read times over 10 runs. The HDF5-1.2.2, HDF5-1.2.2-patched and HDF5 development libraries were tested.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
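The storage geometry of a contiguous hyperslab read can be made concrete with a hypothetical helper. It lists the contiguous pieces of storage that a 2-D hyperslab covers. The helper assumes the 1000 x 1000 integer array is stored contiguously in row-major (C) order. It does not model how either library actually performs the read.

```python
from typing import List, Tuple


def hyperslab_runs(dims: Tuple[int, int], start: Tuple[int, int],
                   count: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Contiguous (element offset, run length) pieces of a 2-D hyperslab.

    Assumes the dataset is stored contiguously in row-major (C) order.
    """
    rows, cols = dims
    r0, c0 = start
    nr, nc = count
    if r0 < 0 or c0 < 0 or r0 + nr > rows or c0 + nc > cols:
        raise ValueError("hyperslab extends outside the dataset")
    if nr == 0 or nc == 0:
        return []
    if c0 == 0 and nc == cols:
        # Full-width rows sit back to back in storage, so they form one run.
        return [(r0 * cols, nr * cols)]
    # Otherwise each selected row contributes one run of nc elements.
    return [((r0 + i) * cols + c0, nc) for i in range(nr)]
```

For example, a 500 x 500 hyperslab starting at the origin of a 1000 x 1000 array touches 500 separate runs of 500 elements each. A selection of full-width rows collapses into a single run.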
Reading Hyperslabs
[Chart: Hyperslab selection, best time, HDF5-1.2.2. Time (microseconds, 0 to 250000) vs. size of hyperslab (number of elements, 100 to about 8E+05); series: HDF4, HDF5.]

For hyperslabs larger than 1MB, HDF5 is more than 3 times slower than HDF4, and its read time grows nonlinearly.
Reading Hyperslabs
(latest version of the HDF5 development branch)
[Chart: Hyperslab selection, best time, HDF5 development branch. Time (microseconds, 0 to 100000) vs. size of hyperslab (number of elements, 100 to about 8E+05); series: HDF4, HDF5.]

For hyperslabs larger than 2MB, HDF5 is about 1.5 times slower than HDF4, and its read time still grows nonlinearly.
Reading contiguous hyperslabs
(fixed size)
• In this test, the size of the hyperslab was fixed at 100 x 100 elements. The hyperslab was moved first along the X axis, then along the Y axis, and finally along the diagonal. Read performance was measured at each location.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
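The X, Y and diagonal sweeps above can be written out as a list of hyperslab start corners. The slide gives neither the array size nor the step for this test. The sketch therefore assumes the 1000 x 1000 array from the previous tests, a step of 100 elements, and X as the column index. All three are hypothetical parameters.

```python
from typing import List, Tuple


def sweep_positions(dim: int = 1000, block: int = 100,
                    step: int = 100) -> List[Tuple[str, int, int]]:
    """Start corners (path, row, col) for a block x block hyperslab moved
    along X (columns), then Y (rows), then the main diagonal."""
    if block > dim or step < 1:
        raise ValueError("block must fit in the array and step must be positive")
    offsets = range(0, dim - block + 1, step)
    positions = [("x", 0, o) for o in offsets]
    positions += [("y", o, 0) for o in offsets]
    positions += [("diagonal", o, o) for o in offsets]
    return positions
```

With these assumed defaults each path has 10 positions. Each position can be passed to a hyperslab read as its start coordinate.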
Reading 100x100 Hyperslabs from Different Locations
[Chart: Selection of 100x100 hyperslab (best time). Time (microseconds, 0 to 6000) vs. event (1 to 10); series: HDF4, HDF5-1.2.2, HDF5-1.2.2-patched, HDF5 development.]

For small hyperslabs HDF5 is about 3 times faster than HDF4.
Reading Hyperslabs with Subsampling
Subsampling Hyperslabs
• In this test we created a file with a 1000 x 1000 array of integers. We then read every second element of hyperslabs of different sizes, starting from a fixed position in the array, and averaged the read times over 10 runs. The HDF5-1.2.2 and HDF5 development libraries were tested.
• The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
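HDF5 describes a regular subsampled selection with the start, stride, count and block arrays of `H5Sselect_hyperslab`. HDF4's `SDreaddata` takes start, stride and edges arrays. The following pure-Python sketch enumerates the coordinates such a selection covers, following the start/stride/count/block convention. It is not a binding to either library. "Every second element" is assumed to mean a stride of 2 in both dimensions.

```python
from itertools import product
from typing import List, Sequence, Tuple


def strided_selection(start: Sequence[int], stride: Sequence[int],
                      count: Sequence[int], block: Sequence[int]) -> List[Tuple[int, ...]]:
    """Enumerate coordinates of a regular hyperslab selection.

    Along each axis, `count` blocks of `block` consecutive indices are taken,
    with block starts `stride` apart; axes are combined as a Cartesian product.
    """
    per_axis = []
    for s, st, c, b in zip(start, stride, count, block):
        if b > st and c > 1:
            raise ValueError("blocks must not overlap (block <= stride)")
        per_axis.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_axis))
```

Usage: `strided_selection((0, 0), (2, 2), (2, 3), (1, 1))` selects rows 0 and 2 and columns 0, 2 and 4, giving six coordinates. A 100 x 100 region subsampled this way yields 2500 elements.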
Reading Each Second Element of the Hyperslabs
[Chart: Hyperslabs with subsampling each second element (best time). Time (seconds, 0 to 35) vs. size of hyperslab (number of elements, 100 to about 3E+05); series: HDF4, HDF5.]

HDF5 read time grows nonlinearly. For hyperslabs larger than 0.5MB, HDF4 performs about 3 times better.
First Attempt to Improve the Performance
[Chart: Hyperslabs with selection (best time). Time (minutes, 0 to 30) vs. size of hyperslab (100 to about 3E+05); series: HDF4, HDF5, HDF5 (latest).]

HDF4 still performs 2 times better for hyperslabs larger than 2MB. HDF5 read time grows nonlinearly.
Current Behavior (HDF5 development branch)
[Chart: Hyperslab with selection (best time). Time (seconds, 0 to 18) vs. hyperslab size (number of elements, 100 to about 3E+05); series: HDF4, HDF5.]

HDF5 read time now grows linearly, and HDF5 performs about 10 times better than HDF4.
Vdatas vs Compound Datasets
Vdatas and Compound Datasets
• In this test we created HDF4 files containing a Vdata and HDF5 files containing a compound dataset, each with 1,000 to 1,000,000 records of the form:
– float a; short b; float c[3]; char d;
• Plain writes, writes with data packing, and partial reads were tested.
• The test was performed on Linux platforms. We also looked into data conversion issues.
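The record above occupies 19 bytes when packed, which matches the "19 bytes each" label on the write chart. A C compiler usually pads the same struct in memory. The ctypes sketch below compares the two layouts. Removing that padding from a compound datatype is what `H5Tpack` does on the HDF5 side. The class names are hypothetical. The native size depends on the platform ABI and is typically 24 bytes.

```python
import ctypes
import struct

# Field list of the benchmark record: float a; short b; float c[3]; char d;
FIELDS = [("a", ctypes.c_float), ("b", ctypes.c_short),
          ("c", ctypes.c_float * 3), ("d", ctypes.c_char)]


class NativeRecord(ctypes.Structure):
    """Record as a C compiler lays it out in memory (with alignment padding)."""
    _fields_ = FIELDS


class PackedRecord(ctypes.Structure):
    """Same fields with all padding removed."""
    _pack_ = 1
    _fields_ = FIELDS


# Equivalent packed layout for the struct module ("<" means no alignment padding).
PACKED_FORMAT = "<fh3fc"

if __name__ == "__main__":
    for cls in (NativeRecord, PackedRecord):
        offsets = {name: getattr(cls, name).offset for name, _ in FIELDS}
        print(cls.__name__, ctypes.sizeof(cls), offsets)
```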
Writing Data (VSwrite and H5Dwrite)
[Chart: Writing Vdatas and Compound Datasets (average time). Time (seconds, 0 to 1.8) vs. number of records (19 bytes each, 1000 to about 4E+05); series: HDF4 native, HDF4 with conversion, HDF5 native, HDF5 with conversion.]

Conversion does not affect HDF4 performance. It does affect HDF5, making writes more than 15 times slower.
Writing Data
(timing includes packing: VSpack and H5Tpack)
[Chart: Writing Vdatas and Compound Datasets, effect of data packing in HDF4 and HDF5 (average time). Time (seconds, 0 to 3.5) vs. number of records (1000 to about 9E+05); series: HDF4, HDF4 with packing, HDF5, HDF5 with packing.]

Data packing was added to the previous test. For HDF5, packing has very little effect.
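As a rough analogy for the packing step the slide attributes to VSpack and H5Tpack, the following hypothetical sketch packs records into one contiguous buffer of 19 bytes per record. It uses Python's struct module, then unpacks the records again. It does not reproduce either library's API. The little-endian byte order is an arbitrary assumption.

```python
import struct
from typing import Iterable, Iterator, Tuple

# float a; short b; float c[3]; char d; with no padding ("<" disables alignment).
RECORD = struct.Struct("<fh3fc")  # RECORD.size == 19

Rec = Tuple[float, int, Tuple[float, float, float], bytes]


def pack_records(records: Iterable[Rec]) -> bytes:
    """Pack records field by field into one contiguous buffer."""
    buf = bytearray()
    for a, b, c, d in records:
        buf += RECORD.pack(a, b, *c, d)
    return bytes(buf)


def unpack_records(buf: bytes) -> Iterator[Rec]:
    """Inverse of pack_records; len(buf) must be a multiple of 19."""
    for a, b, c0, c1, c2, d in RECORD.iter_unpack(buf):
        yield a, b, (c0, c1, c2), d
```

Every record costs one pack call on write and one unpack call on read. This per-record work is the kind of overhead the packing and unpacking measurements capture.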
Reading Two Fields
[Chart: Reading Vdatas and Compound datasets, native read (average time). Time (seconds, 0 to 1) vs. number of records (1000 to about 9E+05); series: HDF4, HDF4 without unpacking data, HDF5.]

Unpacking slows HDF4 down significantly (about 8 times). In this test HDF5 was reading packed data.
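A partial read of two fields from packed records only needs each field's byte offset inside the 19-byte record, so the remaining fields need not be unpacked. The slide does not say which two fields were read. This hypothetical sketch picks `a` (offset 0) and `d` (offset 18) and assumes little-endian storage.

```python
import struct
from typing import List, Tuple

RECORD_SIZE = 19            # packed size of (float a; short b; float c[3]; char d)
A_OFFSET, D_OFFSET = 0, 18  # byte offsets of fields a and d in a packed record


def read_two_fields(buf: bytes) -> List[Tuple[float, bytes]]:
    """Extract (a, d) from every packed record without decoding b or c."""
    if len(buf) % RECORD_SIZE:
        raise ValueError("buffer is not a whole number of records")
    out = []
    for base in range(0, len(buf), RECORD_SIZE):
        (a,) = struct.unpack_from("<f", buf, base + A_OFFSET)
        (d,) = struct.unpack_from("<c", buf, base + D_OFFSET)
        out.append((a, d))
    return out
```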
CERES Data File
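Structure of CERES file
[Diagram: a CERES_ES8 Vgroup containing a "Data Fields" Vgroup and a "Geolocation Fields" Vgroup, which hold SDS and Vdata objects; the object counts shown in the diagram are 18, 19, 2 and 1.]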
CERES File
• We used the H4toH5 converter to create an HDF5 version of the file
– file size: 81MB (HDF4), 80MB (HDF5)
– 1 min 55 sec to convert on Linux
– 3 min 56 sec to convert on Solaris

• Benchmarks
– read up to 14 datasets (2148x660 floats)
– subsampling: read two columns from the same datasets

• The benchmark was run on the Solaris and Linux platforms
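The two CERES benchmarks move very different amounts of data. The following sketch does the arithmetic. It assumes 32-bit floats and contiguous row-major storage of each 2148x660 dataset; the slide states neither. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

ROWS, COLS = 2148, 660  # CERES dataset shape from the slide
FLOAT_SIZE = 4          # assumption: 32-bit floats


def read_volume(n_datasets: int,
                columns: Optional[Sequence[int]] = None) -> Tuple[int, int]:
    """Return (bytes read, contiguous runs touched) for n_datasets datasets.

    columns=None reads whole datasets; otherwise only the listed columns.
    Assumes each dataset is stored contiguously in row-major order.
    """
    if columns is None:
        # A whole contiguous dataset is a single run.
        return n_datasets * ROWS * COLS * FLOAT_SIZE, n_datasets
    cols = sorted(set(columns))
    if any(c < 0 or c >= COLS for c in cols):
        raise ValueError("column index out of range")
    # Adjacent columns merge into one run within each row.
    runs_per_row = sum(1 for i, c in enumerate(cols) if i == 0 or c != cols[i - 1] + 1)
    return (n_datasets * ROWS * len(cols) * FLOAT_SIZE,
            n_datasets * ROWS * runs_per_row)
```

Under these assumptions a full read of 14 datasets moves about 79 MB. Reading two non-adjacent columns from one dataset moves only about 17 KB, but that data is spread over 4296 separate runs.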
Reading CERES data on big- and little-endian machines
[Chart: Reading CERES data on different platforms (best times). Time (seconds, 0 to 3) vs. number of 2148x660 datasets read (1 to 14); series: HDF4 (LE), HDF5 (LE), HDF4 (BE), HDF5 (BE).]

On the Solaris platform, HDF5 was twice as fast as HDF4. On Linux (with data conversion), HDF4 was about 1.3 to 1.5 times faster.
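The UltraSPARC machine is big-endian and the Pentium III machine is little-endian. Values stored in one byte order must therefore be byte-swapped when read on the other. The slide's note that conversion is on for Linux suggests the data is stored big-endian. This hypothetical sketch shows that conversion step for 32-bit floats using Python's `array` module. It is not how either library implements conversion.

```python
import sys
from array import array


def to_native_floats(raw: bytes, file_byte_order: str = "big") -> array:
    """Decode 32-bit IEEE floats stored in `file_byte_order` into native values."""
    values = array("f")
    if values.itemsize != 4:
        raise RuntimeError("this platform's C float is not 32 bits")
    values.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    if sys.byteorder != file_byte_order:
        values.byteswap()  # per-element swap, paid only on the "other" endianness
    return values
```

On a big-endian host the swap is skipped entirely. On a little-endian host every element is touched once more after it is read.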
Subsetting CERES Data
[Chart: Selection of two columns from 2148x660 CERES dataset (best times). Time (seconds, 0 to 25) vs. number of datasets (1 to 14); series: HDF4, HDF5, HDF5 tuned.]

The current version of HDF5 performs about 3 times better.
Conclusion
• Goal: tune HDF5 and give our users recommendations on using it efficiently
• Continue to study HDF4 and HDF5 performance
– try more platforms: O2K, NT/Windows
– try other features (e.g. chunking, compression)
– test specific HDF5 features (e.g. writing/reading big files, VL datatypes, compound datatypes, selections)

• User input is necessary: send us the access patterns you use!
• Results will be available at http://hdf.ncsa.uiuc.edu

Weitere ähnliche Inhalte

Was ist angesagt?

Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
The HDF-EOS Tools and Information Center
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
Takrim Ul Islam Laskar
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
larsgeorge
 

Was ist angesagt? (20)

NetCDF and HDF5
NetCDF and HDF5NetCDF and HDF5
NetCDF and HDF5
 
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure CodingLess is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
 
Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 
HDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF ConverterHDF-EOS 2/5 to netCDF Converter
HDF-EOS 2/5 to netCDF Converter
 
Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5Data Analytics using MATLAB and HDF5
Data Analytics using MATLAB and HDF5
 
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 productsInteroperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)HDF Update for DAAC Managers (2017-02-27)
HDF Update for DAAC Managers (2017-02-27)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
Introduction to NetCDF-4
Introduction to NetCDF-4Introduction to NetCDF-4
Introduction to NetCDF-4
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in HadoopBackup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
Status of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and ToolsStatus of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and Tools
 
Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4Moving form HDF4 to HDF5/netCDF-4
Moving form HDF4 to HDF5/netCDF-4
 
Product Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the WebProduct Designer Hub - Taking HPD to the Web
Product Designer Hub - Taking HPD to the Web
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
co-Hadoop: Data co-location on Hadoop.
co-Hadoop: Data co-location on Hadoop.co-Hadoop: Data co-location on Hadoop.
co-Hadoop: Data co-location on Hadoop.
 
SQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - HadoopSQLRally Amsterdam 2013 - Hadoop
SQLRally Amsterdam 2013 - Hadoop
 

Ähnlich wie HDF4 and HDF5 Performance Preliminary Results

Update on HDF5 1.8
Update on HDF5 1.8Update on HDF5 1.8

Ähnlich wie HDF4 and HDF5 Performance Preliminary Results (20)

HDF5 Advanced Topics
HDF5 Advanced TopicsHDF5 Advanced Topics
HDF5 Advanced Topics
 
Using HDF5 tools for performance tuning and troubleshooting
Using HDF5 tools for performance tuning and troubleshootingUsing HDF5 tools for performance tuning and troubleshooting
Using HDF5 tools for performance tuning and troubleshooting
 
Introduction to HDF5
Introduction to HDF5Introduction to HDF5
Introduction to HDF5
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF5 Advanced Topics - Chunking
HDF5 Advanced Topics - ChunkingHDF5 Advanced Topics - Chunking
HDF5 Advanced Topics - Chunking
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
HDF5 Life cycle of data
HDF5 Life cycle of dataHDF5 Life cycle of data
HDF5 Life cycle of data
 
Bridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data ProductsBridging ICESat and ICESat-2 Standard Data Products
Bridging ICESat and ICESat-2 Standard Data Products
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
 
FITSIO, HDF4, NetCDF, PDB and HDF5 Performance - Some Benchmark Results
FITSIO, HDF4, NetCDF, PDB and HDF5 Performance - Some Benchmark ResultsFITSIO, HDF4, NetCDF, PDB and HDF5 Performance - Some Benchmark Results
FITSIO, HDF4, NetCDF, PDB and HDF5 Performance - Some Benchmark Results
 
HDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at ScaleHDF Cloud: HDF5 at Scale
HDF Cloud: HDF5 at Scale
 
HDF Update
HDF UpdateHDF Update
HDF Update
 
HDF Status and Development
HDF Status and DevelopmentHDF Status and Development
HDF Status and Development
 
HDF5 Tools Updates
HDF5 Tools UpdatesHDF5 Tools Updates
HDF5 Tools Updates
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
Comparative Performance Analysis of AWS EC2 Instance Types Commonly Used for ...
 
Hdf5 parallel
Hdf5 parallelHdf5 parallel
Hdf5 parallel
 
Update on HDF5 1.8
Update on HDF5 1.8Update on HDF5 1.8
Update on HDF5 1.8
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 

Mehr von The HDF-EOS Tools and Information Center

Mehr von The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 
Google Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOSGoogle Colaboratory for HDF-EOS
Google Colaboratory for HDF-EOS
 

KĂźrzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

KĂźrzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

HDF4 and HDF5 Performance Preliminary Results

  • 1. HDF4 and HDF5 Performance Preliminary Results Elena Pourmal IV HDF-EOS Workshop September 19 - 21 2000
  • 2. Why compare? • HDF5 emerges as a new standard – proved to be robust – most of the planned features have been implemented in HDF5-1.2.2 – has a lot of new features compared to HDF4 – time for performance study and tuning • Users move their data and applications to HDF5 • HDF4 is not “bad,” but has limited capabilities
  • 3. HDF5 • • • • • • • • • • Files over 2GB Unlimited number of objects One data model (multidimensional array of structures) || support Thread safe Mounting files Diversity of datatypes (compound, VL, opaque) and operations (create, write, read, delete, shared) “Native” file is portable Modifiable I/O pipe-line (registration of compression methods) Selections (unions and regular blocks) HDF4 • • • Files less than 2GB Max limit 20000 of objects Different data models for SD, GR, RI, Vdatas • • • • N/A N/A N/A Only predefined datatypes such as float32, int16, char8 • • “Native” file is not portable N/A • Selections (simple regular subsampling)
  • 4. What to compare? (short list of common features) • File I/O operations – – – – – plain read and write hyperslab selections regular subsampling access to large number of objects storage overhead • Data organization in the file and access to it – Vdata vs compound datasets • Chunking, unlimited dimensions, compression
  • 5. Benchmark Environment • 440-Mhz UltraSPARC i-IIi – 1G memory – Sun OS 5.7 – gettimeofday() • 2 - 550 Mhz Pentium III Xeon – 1G memory – RedHat 6.2 – clock() • each measurement was taken 10 times, average and best times were collected
  • 6. Benchmarks • Writing 1Dim and 2Dim datasets of integers • Reading 2Dim contiguous hyperslabs of integers • Reading 2Dim contiguous hyperslabs of integers with subsampling • Reading fixed size hyperslabs of integers from different locations in the dataset • Writing and reading Vdatas and Compound Datasets • CERES data
  • 7. Writing 1Dim and 2Dim Datasets
  • 8. Writing 1Dim Datasets • In this test we created one-dimensional arrays of integers with sizes varying from 8Kbytes to 8000 Kbytes in steps of 8Kbytes. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. • Test was performed on Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
  • 9. Writing 1Dim Datasets 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 0 7896 7432 6968 6504 6040 5576 5112 4648 4184 3720 3256 2792 2328 1864 1400 936 472 HDF4 HDF5 8 Time (seconds) Writing 1Dim dataset (best time) Dataset size (Kbytes) HDF5 performs about 8 times better than HDF4. System activity affects timing results.
  • 10. Writing 2Dim Datasets • In this test we created two-dimensional arrays with sizes varying from 40 X 40 bytes to 4000 X 4000 bytes in steps of 40 bytes for each dimension. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. The graphs were plotted by averaging the values obtained for the same array size, without considering the shape of the array. • Test was performed on Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
  • 11. Writing 2Dim Datasets Writing 2Dim Datasets (best time) Time (microseconds) 450000 400000 350000 300000 250000 HDF4 HDF5 200000 150000 100000 50000 3563 2883 2490 2188 1920 1684 1466 1266 1076 899 732 577 432 302 183 79.3 0.39 0 Dataset size (Kbytes) HDF4 shows nonlinear growth. HDF5 performs about 10 times better than HDF4.
  • 13. Reading Contiguous Hyperslabs • In this test we created a file with 1000 X 1000 array of integers. Subsequently, we read hyperslabs of different sizes starting from a fixed position in the array and the measurements for read were averaged over 10 runs. HDF51.2.2, HDF5-1.2.2-patched and HDF5 development libraries were tested. • Test was performed on Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
  • 14. Reading Hyperslabs 250000 200000 150000 HDF4 HDF5 100000 50000 8E+05 7E+05 6E+05 5E+05 4E+05 3E+05 3E+05 2E+05 2E+05 1E+05 64800 27900 0 100 Time (microseconds) Hyperslab selection, best time HDF5-1.2.2 Size of hyperslab (number of elements) For hyperslabs > 1MB, HDF5 becomes more than 3 times slower than HDF4. It also shows nonlinear growth.
  • 15. Reading Hyperslabs (latest version of the HDF5 development branch) 100000 80000 60000 HDF4 HDF5 40000 20000 8E+05 7E+05 6E+05 5E+05 4E+05 3E+05 3E+05 2E+05 2E+05 1E+05 64500 27600 0 100 Time (microseconds) Hyperslab selection, best time HDF5 development branch Size of hyperslab (number of elements) For hyperslabs > 2MB, HDF5 becomes more about 1.5 times slower than HDF4. It still shows nonlinear growth.
  • 16. Reading contiguous hyperslabs (fixed size) • In this test, the size of the hyperslab was fixed to 100x100 elements. The hyperslab was moved, first along the X axis, then along the Y axis, and finally along the diagonal and the read performance was measured. • Test was performed on Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
  • 17. Reading 100x100 Hyperslabs from Different Locations Selection of 100x100 hyperslab (best time) Time (microseconds) 6000 5000 HDF4 HDF5-1.2.2 HDF5-1.2.2-patched HDF5 development 4000 3000 2000 1000 0 1 2 3 4 5 6 7 8 9 10 Events For small hyperslabs HDF5 performs about 3 times better than HDF4.
  • 19. Subsampling Hyperslabs • In this test we created a file with 1000x1000 array of integers. Subsequently, we read every second element of the hyperslabs of different sizes starting from a fixed position in the array and the measurements for read were averaged over 10 runs. HDF5-1.2.2, and HDF5 development libraries were tested. • Test was performed on Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
  • 20. Reading Each Second Element of the Hyperslabs Hyperslabs with subsampling each second element (best time) 35 Time (seconds) 30 25 20 HDF4 15 HDF5 10 5 3E+05 3E+05 3E+05 2E+05 2E+05 2E+05 2E+05 1E+05 1E+05 1E+05 91000 74700 59500 45500 32000 19600 8900 100 0 Size of hyperslab (number of elements) HDF5 shows nonlinear growth. HDF4 performs about 3 times for the hyperslabs with the size > .5MB
  • 21. First Attempt to Improve the Performance [Figure: Hyperslabs with selection (best time). Time (minutes) vs. size of hyperslab (100 to about 3E+05) for HDF4, HDF5, and HDF5 (latest).] HDF4 still performs 2 times better for hyperslabs larger than 2 MB. HDF5 shows nonlinear growth.
  • 22. Current Behavior (HDF5 development branch) [Figure: Hyperslab with selection (best time). Time (seconds) vs. hyperslab size (number of elements, 100 to about 3E+05) for HDF4 and HDF5.] HDF5 read time grows linearly, and HDF5 performs about 10 times better than HDF4.
  • 23. Vdatas vs Compound Datasets
  • 24. Vdatas and Compound Datasets • In this test we created HDF4 files with a Vdata and HDF5 files with a compound dataset, with sizes from 1000 to 1000000 records of the form: • float a; short b; float c[3]; char d; • Writing, writing with data packing, and partial reading were tested. • The test was performed on Linux platforms. We also looked into data conversion issues.
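The record holds 19 bytes of field data, but a C compiler normally pads the equivalent struct for alignment. Packing, in the sense of HDF5's H5Tpack for a compound type, means removing that padding so each record occupies exactly 19 bytes. A minimal sketch of the difference, assuming 4-byte floats and 2-byte shorts; the struct and function names are hypothetical, and the padded size depends on the compiler (24 bytes is typical on common platforms).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Native in-memory record: the compiler may insert padding after b and d. */
typedef struct {
    float a;
    short b;
    float c[3];
    char  d;
} record_t;

/* Size of one record with no padding: 4 + 2 + 12 + 1 = 19 bytes,
   assuming 4-byte float and 2-byte short. */
#define PACKED_SIZE (sizeof(float) + sizeof(short) + 3 * sizeof(float) + sizeof(char))

/* Pack n native records into buf, which must hold n * PACKED_SIZE bytes.
   Fields are copied back to back in declaration order, dropping the padding. */
void pack_records(const record_t *recs, size_t n, unsigned char *buf)
{
    assert(n == 0 || (recs != NULL && buf != NULL));
    for (size_t i = 0; i < n; i++) {
        unsigned char *p = buf + i * PACKED_SIZE;
        memcpy(p, &recs[i].a, sizeof recs[i].a); p += sizeof recs[i].a;
        memcpy(p, &recs[i].b, sizeof recs[i].b); p += sizeof recs[i].b;
        memcpy(p, recs[i].c, sizeof recs[i].c);  p += sizeof recs[i].c;
        memcpy(p, &recs[i].d, sizeof recs[i].d);
    }
}
```

On a platform where sizeof(record_t) is 24, the packed form saves 5 bytes per record, at the cost of an extra copy pass over the data.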
  • 25. Writing Data (VSwrite and H5Dwrite) [Figure: Writing Vdatas and Compound Datasets (average time). Time (seconds) vs. number of records (19 bytes each, 1000 to about 4E+05) for HDF4 native, HDF4 with conversion, HDF5 native, and HDF5 with conversion.] Conversion does not affect HDF4 performance, but it slows HDF5 down by more than 15 times.
  • 26. Writing Data (timing includes packing: VSpack and H5Tpack) [Figure: Writing Vdatas and Compound Datasets, effect of data packing in HDF4 and HDF5 (average time). Time (seconds) vs. number of records (1000 to about 9E+05) for HDF4, HDF4 with packing, HDF5, and HDF5 with packing.] Data packing was added to the previous test. For HDF5, packing has a very small effect.
  • 27. Reading Two Fields [Figure: Reading Vdatas and Compound datasets, native read (average time). Time (seconds) vs. number of records (1000 to about 9E+05) for HDF4, HDF4 without unpacking data, and HDF5.] Unpacking slows HDF4 down significantly (about 8 times). HDF5 was reading packed data in this test.
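A partial read of two fields from packed 19-byte records amounts to pulling values out at fixed byte offsets within each record. A minimal sketch, assuming the field order and sizes of the record on slide 24 (4-byte float, 2-byte short) and choosing fields a and d purely as an example, since the slides do not say which two fields were read; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offsets inside one packed record (assumed layout):
   float a (4) | short b (2) | float c[3] (12) | char d (1) = 19 bytes. */
enum { OFF_A = 0, OFF_B = 4, OFF_C = 6, OFF_D = 18, PACKED_SIZE = 19 };

/* Extract fields a and d from n packed records into separate arrays.
   memcpy is used for a because packed floats are generally misaligned. */
void read_a_and_d(const unsigned char *buf, size_t n, float *a, char *d)
{
    assert(n == 0 || (buf != NULL && a != NULL && d != NULL));
    for (size_t i = 0; i < n; i++) {
        const unsigned char *rec = buf + i * PACKED_SIZE;
        memcpy(&a[i], rec + OFF_A, sizeof a[i]);
        d[i] = (char)rec[OFF_D];
    }
}
```

Every record still has to be visited even though only 5 of its 19 bytes are kept, which is one reason a partial read is not proportionally cheaper than a full one.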
  • 29. Structure of CERES file [Diagram of the file hierarchy: Vgroup CERES_ES8; Vgroup Data Fields, 18 SDS; Vgroup Geolocation Fields, 19; Vdata 2; 1 SDS; Vdata.]
  • 30. CERES File • The H4toH5 converter was used to create an HDF5 version of the file – file size: 81MB (HDF4), 80MB (HDF5) – conversion time: 1 min 55 sec on Linux, 3 min 56 sec on Solaris • Benchmarks – read up to 14 datasets (2148x660 floats) – subsampling: read two columns from the same datasets • The benchmarks were run on Solaris and Linux platforms
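Two quick calculations help interpret this benchmark. Each 2148x660 dataset of 4-byte floats holds 1,417,680 values, or 5,670,720 bytes, so reading all 14 moves 79,390,080 bytes. Reading two columns from a row-major dataset takes one value per row, with consecutive values 660 floats (2640 bytes) apart. A minimal sketch of both, assuming row-major storage and 4-byte floats; the names and the column indices are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CERES_ROWS 2148
#define CERES_COLS 660

/* Bytes in one CERES dataset: 2148 * 660 * sizeof(float),
   which is 5,670,720 when float is 4 bytes. */
size_t ceres_dataset_bytes(void)
{
    return (size_t)CERES_ROWS * CERES_COLS * sizeof(float);
}

/* Copy columns c0 and c1 of a row-major CERES_ROWS x CERES_COLS array into
   col0 and col1. Successive values of a column are CERES_COLS floats apart,
   so each value comes from a different row. */
void read_two_columns(const float *data, size_t c0, size_t c1,
                      float *col0, float *col1)
{
    assert(c0 < CERES_COLS && c1 < CERES_COLS);
    for (size_t r = 0; r < CERES_ROWS; r++) {
        col0[r] = data[r * CERES_COLS + c0];
        col1[r] = data[r * CERES_COLS + c1];
    }
}
```

Only 4,296 values are returned per dataset, yet they are spread across all 2148 rows, so the access pattern is scattered rather than contiguous.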
  • 31. Reading CERES data on big- and little-endian machines [Figure: Reading CERES data on different platforms (best times). Time (seconds) vs. number of 2148x660 datasets read (1-14) for HDF4 (LE), HDF5 (LE), HDF4 (BE), and HDF5 (BE).] On the Solaris platform, HDF5 was twice as fast as HDF4. On Linux (with data conversion on), HDF4 was about 1.3-1.5 times faster.
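On a little-endian Linux machine, reading big-endian floats means reversing the bytes of every value, an extra pass over the data that a big-endian Solaris machine does not need. A minimal sketch of that conversion, assuming 4-byte IEEE floats and a file that stores values in big-endian order; the function names are hypothetical and this is not the conversion code of either library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Convert n big-endian 4-byte floats in `in` to host floats in `out`.
   Bytes are swapped only when the host is little-endian. */
void be_floats_to_host(const unsigned char *in, size_t n, float *out)
{
    const uint16_t probe = 1;
    const int little = *(const unsigned char *)&probe == 1;
    assert(sizeof(float) == sizeof(uint32_t));
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, in + 4 * i, sizeof bits);
        if (little)
            bits = swap32(bits);
        memcpy(&out[i], &bits, sizeof bits);
    }
}
```

For the 14 CERES datasets this loop would touch about 19.8 million values, which illustrates why the conversion cost shows up in the Linux timings.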
  • 32. Subsetting CERES Data [Figure: Selection of two columns from 2148x660 CERES dataset (best times). Time (seconds) vs. number of datasets (1-14) for HDF4, HDF5, and HDF5 tuned.] The current version of HDF5 shows about 3 times better performance.
  • 33. Conclusion • Goal: tune HDF5 and give our users recommendations on its efficient usage • Continue to study HDF4 and HDF5 performance – try more platforms: O2K, NT/Windows – try other features (e.g. chunking, compression) – specific HDF5 features (e.g. writing/reading big files, VL datatypes, compound datatypes, selections) • User input is necessary: send us the access patterns you use! • Results will be available at http://hdf.ncsa.uiuc.edu