HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop X
November 29, 2006
HDF
Outline
• Organizational info
• HDF Software Update
• Other Activities of Interest
Organizational info
“The HDF Group” = “THG”

Founded Dec. 2006

Went solo July 15, 2006
Non-profit
THG mission
To support the vast community of HDF
users and to ensure the sustainable
development of HDF technologies and
the ongoing accessibility of HDF-stored
data.
The HDF Team
Frank Baker
Christian Chilan
Peter Cao
Vailin Choi
Mike Folk
Anne Jennings
Barbara Jones
Quincey Koziol
James Laird
Raymond Lu

John Mainzer
Matthew Needham
Pedro Nunes
Tammi O’Neill
Elena Pourmal
Binh-minh Ribler
Randy Ribler
Rishi Sinha
Kent Yang

And all those wonderful folks out there
who contribute ideas, requests, bug
reports, code, and support.
HDF Software Update
HDF4 update
Platforms to be dropped
• Operating systems
• HPUX 11.00
• Crays SV1 and TS IEEE
• AIX 5.1 and 5.2
• SGI IRIX64-6.5
• Linux 2.4
• Solaris 2.7, 2.8, 2.9
• Windows 2000
• MAC OSX 10.3

• Compilers
• GNU C compilers older than 3.4 (Linux)
• Intel 8.*
• PGI V. 5.*, 6.0
Platforms to be added
• Systems
• MAC OSX 10.4 (Intel)
• Solaris 2.* on Intel
• Cray XT3
• Windows 64-bit (?)
• Linux 2.6
• HPUX 11.23
• IBM Power 5

• Compilers
• g95
• PGI V. 6.1
• Intel 9.*
New features
• Configuration
• Switched to use F77_FUNC macro for better
Fortran support (no hard-coded compilers
anymore!)
• Support for shared libraries

• Library
• No hard-coded limit on number of opened files
• New APIs to control number of files opened by
application
• Fortran support for SZIP compression
Bug fixes
• Tools
• Many improvements to the hdp, hrepack,
hdiff and hdfimport utilities based on user
feedback

• Library
• Fixed a data corruption bug that occurred when several
unlimited-dimension SDSs were open
• Better handling of SDSs with duplicate
names in SDgetdimscale, and more
HDF5 update
No new releases!
• Focus on HDF5 release 1.8
• HDF5-1.8.0 Alpha 5 release is available from:
hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
Platforms to be dropped
• Operating systems
• HPUX 11.00
• MAC OS 10.3
• AIX 5.1 and 5.2
• SGI IRIX64-6.5
• Linux 2.4
• Solaris 2.8 and 2.9

• Compilers
• GNU C compilers older
than 3.4 (Linux)
• Intel 8.*
• PGI V. 5.*, 6.0
• MPICH 1.2.5

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
Platforms to be added
• Systems
• Alpha Open VMS
• MAC OSX 10.4 (Intel)
• Solaris 2.* on Intel (?)
• Cray XT3
• Windows 64-bit (32-bit binaries)
• Linux 2.6
• BG/L

• Compilers
• g95
• PGI V. 6.1
• Intel 9.*
• MPICH 1.2.7
• MPICH2
New Features
in HDF5 1.8
HDF5 1.8 new library features
• Datatype and dataspace features
• Serialized dataspaces and datatypes
• Ability to create data type from text description
• Integer to float conversions during I/O
• Revised exception handling during type conversion
• Compact storage for N-bit data types
• Offset+size storage filter, saving space
• “Null” dataspace – datasets with no elements
• Data transformation filter
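
The offset+size idea in the list above can be illustrated with a minimal sketch. It is not the HDF5 filter's on-disk layout, and the names `pack_offset_size` and `unpack_offset_size` are hypothetical. It only shows why storing one minimum value plus fixed-width differences can save space when integers span a narrow range.

```python
def pack_offset_size(values):
    """Return (offset, bits, count, packed) for a list of integers.

    Each value is stored as (value - offset) in `bits` bits inside one
    Python integer. This is an illustration, not the HDF5 storage format.
    """
    if not values:
        return 0, 0, 0, 0
    offset = min(values)
    span = max(values) - offset
    bits = max(1, span.bit_length())  # width needed for the largest difference
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - offset) << (i * bits)
    return offset, bits, len(values), packed


def unpack_offset_size(offset, bits, count, packed):
    """Invert pack_offset_size and return the original list of integers."""
    mask = (1 << bits) - 1
    return [((packed >> (i * bits)) & mask) + offset for i in range(count)]


# Hypothetical sample data: five readings that differ by at most 7.
readings = [1000, 1003, 1001, 1007, 1002]
offset, bits, count, packed = pack_offset_size(readings)
restored = unpack_offset_size(offset, bits, count, packed)
```

In this sample each reading needs 3 bits after the offset is subtracted, instead of a full machine integer.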
HDF5 1.8 – new library features
• Group revisions
• Creation order access
• Compact groups – small groups take less space
• Large group storage improvements
• Intermediate group creation
HDF5 1.8 – new library features
• Link improvements
• External links -- can refer to objects in another file
• User defined links – apps create own kinds of
links

• Attribute improvements
• Storage improvements for large numbers of attributes
• Iterate or look up by creation order
HDF5 1.8 – new library features
• Support for Unicode UTF-8 character set
• Shared header info – duplicate header info
shared, possibly saving space
• Metadata cache improvements – faster I/O on
files with many objects
• Data transformation filter
• Stackable Virtual File Drivers
• Better UNIX/Linux portability
HDF5 1.8 – new APIs
• New extendible error-handling API
• New APIs to copy objects between files fast
• Dimension scale model and API
• “HDFpacket” – API to read/write packets efficiently
HDF5 1.8 – backward and
forward compatibility
HDF5 1.8 vs. 1.6.5
• Differences between 1.8 and 1.6.5
• Some file format changes
• Several new routines added
• Old APIs deprecated -- to be removed in a later release

• Consequences
• Applications requiring 1.8 format changes will write
objects that the 1.6.5 library cannot read
• To exploit 1.8 changes, apps need to be rewritten
Principle of
“Maximum file format compatibility”
Unless instructed otherwise, the HDF5 library will
write objects using the earliest version of the format
possible for describing the information.
Assures forward compatibility with the older
versions whenever possible – objects in new
files can be read with old libraries if those
objects are “known” to the old libraries.
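
The principle above can be sketched as a small lookup, assuming each feature has a known earliest release whose format can describe it. The table entries and the names `FEATURE_MIN_VERSION`, `earliest_format_version` and `readable_by` are hypothetical illustrations, not part of the HDF5 library or its format specification.

```python
# Hypothetical table: earliest release whose file format can describe a feature.
# External links and creation order tracking are listed as new in 1.8 above;
# the exact mapping here is an assumption made for illustration.
FEATURE_MIN_VERSION = {
    "contiguous_dataset": (1, 6),
    "chunked_dataset": (1, 6),
    "external_link": (1, 8),
    "creation_order_tracking": (1, 8),
}


def earliest_format_version(features_used):
    """Return the lowest version able to describe every feature an object uses."""
    return max((FEATURE_MIN_VERSION[f] for f in features_used), default=(1, 6))


def readable_by(library_version, features_used):
    """True if a library of the given version 'knows' every feature used."""
    return earliest_format_version(features_used) <= library_version


old_style = ["chunked_dataset"]
new_style = ["chunked_dataset", "external_link"]
old_readable = readable_by((1, 6), old_style)
new_readable = readable_by((1, 6), new_style)
```

An object that uses only old features is written in the old format and stays readable by a 1.6 library, while an object that needs an external link is not.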
Command line tools
New features for old tools
• h5dump
• Dump data in binary format

• h5diff
• Compare dataset regions

• Parallel h5diff (ph5diff)
• Compare two files in MPI parallel environment

• h5repack
• Efficient data copy using H5Gcopy()
• Able to handle big datasets
New HDF5 Tools
• h5copy
• Copies a group, dataset or named datatype from
one location to another location
• Copies within a file or across files

• h5check
• Verifies an HDF5 file against the defined HDF5
File Format Specification

• h5stat
• Reports statistics about a file and objects in a file
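
As a rough illustration of what checking a file against the format specification involves, the sketch below looks only for the HDF5 format signature, which the file format places at byte offset 0 or at 512, 1024, 2048 and so on. This is a very small part of what a verifier such as h5check does; the function name `find_hdf5_signature` and the `max_offset` limit are assumptions made for this example.

```python
import os
import tempfile

# The 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(path, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if not found.

    Candidate offsets are 0, 512, 1024, 2048, ... (doubling each time).
    The search stops after max_offset, a limit chosen for this sketch.
    """
    with open(path, "rb") as f:
        offset = 0
        while offset <= max_offset:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return offset
            offset = 512 if offset == 0 else offset * 2
    return None


def _demo():
    """Write three small test files and report where the signature is found."""
    with tempfile.TemporaryDirectory() as d:
        plain = os.path.join(d, "plain.h5")
        shifted = os.path.join(d, "shifted.h5")
        other = os.path.join(d, "other.bin")
        with open(plain, "wb") as f:
            f.write(HDF5_SIGNATURE + b"\0" * 64)
        with open(shifted, "wb") as f:
            f.write(b"\0" * 512 + HDF5_SIGNATURE)  # signature after a 512-byte user block
        with open(other, "wb") as f:
            f.write(b"not an HDF5 file")
        return tuple(find_hdf5_signature(p) for p in (plain, shifted, other))


demo_offsets = _demo()
```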
HDF Java Products
HDFView changes
• Quality improvements for HDF-java package
• Full documentation of hdf-java object package
• Test suite for hdf-java object package

• Support 64-bit Java on Linux and Solaris
• Many new features, including
• Change font size easily
• Grab and move image
• Create new table (compound dataset) from template
• Filter out fill value for image creation
• -geometry option for very high resolution displays
Future work for Java
• Update HDF5 JNI APIs for HDF5 1.8 release
• Release HDFView 2.4 with bug fixes/new
features with HDF5 1.8 release
• New GUI features dealing with table, image
and animation
• Writing capability for HDF5-SRB model
Website Development for
HDF-EOS Tools &
Information Center
Website for HDF-EOS Tools
• THG now manages HDF-EOS web site
• Registered domain names: hdfeos.net/.org/.com
• Re-implemented major topic areas
• Re-designed interface
• Registered Google search

• Will continue maintenance
• Phase two
• Host mailing list
• Support simple forum features
Website for HDF-EOS Tools
Other Activities of
Interest
Performance R&D
HDF5 - PnetCDF performance comparison
Flash I/O Benchmark (Checkpoint files)
[Figure: I/O bandwidth in MB/s (axis 0 to 2500) versus number of processors (10 to 310) for PnetCDF, HDF5 collective, and HDF5 independent; uP: Power 5]

I/O performance of PnetCDF is comparable with
parallel HDF5 when the libraries are used in similar
manners.
Parallel NetCDF4 - PnetCDF comparison

[Figure: bandwidth in MB/s (axis 0 to 160) versus number of processors (0 to 144) for PNetCDF collective and NetCDF4 collective]

I/O performance of parallel NetCDF4 is comparable
with PnetCDF, though about 15% slower on average for
the output of the ROMS history file.
Collective I/O improvements
• HDF5 supports collective IO for non-regular
selections
• Collective IO for chunked storage is not trivial.
• Non-regular selection performance optimizations:
• Added IO options to achieve good collective IO
performance
• Added APIs for applications to participate in the
optimization process

• See the poster
DOE Labs
Sandia National Laboratory
Lawrence Livermore National Laboratory
DOE ASC* and Others
• Support HDF5 on major systems at Sandia &
Lawrence Livermore National Laboratories
• R&D efforts underway
• File recovery after a crash
• Very fast write speed – goal is 300 MB/sec
• Read-while-writing capability
• Java library and HDFView improvements

* Advanced Simulation and Computing program
Flight test
Flight test – collect, then process
Boeing HDF5 for flight test data
• Boeing 787 active archive
• 10 TB per flight-test day

• Must handle raw, real-time data
• High speed ingest, by “packet”
• Post-processing, by “time-history”

• Boeing High Level APIs
• HDFpacket – released with HDF5 1.8
• HDFtime_history – new, open version likely
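
The two access patterns on this slide, ingest by packet and post-processing by time history, can be sketched as a regrouping step. This is not the HDFpacket or HDFtime_history API; the packet layout (a timestamp plus a dictionary of parameter values), the parameter names and the function `packets_to_time_histories` are all hypothetical.

```python
from collections import defaultdict


def packets_to_time_histories(packets):
    """Regroup (timestamp, {parameter: value}) packets into per-parameter series."""
    histories = defaultdict(list)
    for timestamp, samples in packets:
        for name, value in samples.items():
            histories[name].append((timestamp, value))
    # Sort each series by time in case packets arrived out of order.
    return {name: sorted(series) for name, series in histories.items()}


# Hypothetical packets as they might arrive during ingest.
packets = [
    (0.0, {"airspeed": 250.0, "altitude": 10000.0}),
    (0.2, {"airspeed": 252.0, "altitude": 10040.0}),
    (0.1, {"airspeed": 251.5}),
]
histories = packets_to_time_histories(packets)
```

Writing favors the packet layout because each packet is appended as it arrives; analysis favors the time-history layout because one parameter can be read without touching the others.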
Product data
STEP
Bioinformatics
caacaagccaaaactcgtacaa
Cgagatatctcttggaaaaact
gctcacaatattgacgtacaag
gttgttcatgaaactttcggta
Acaatcgttgacattgcgacct
aatacagcccagcaagcagaat

Managing genomic data
C# HDF5 API
for Agilent
Agilent C# project
• Why?
• Heavy use of C# at Agilent
• Compatibility with Matlab
• Other interest in HDF5 at Agilent

• What?
• Prototype API in C# for Windows XP
• Basic functions to create, open, close, read, write
• Limited datatypes, no partial I/O

• When?
• March 2007
HDF5 Software
Tools & Applications

Fortran C++ Java C#
C API
HDF I/O Library

HDF File
NetCDF 4
NetCDF 4 project
• Enhanced NetCDF-4 Interface to HDF5
• Combine features of netCDF and HDF5
• Take advantage of their separate strengths

• Collaboration between NCSA, THG, Unidata
• Currently in Alpha Release
• Waiting for beta release
NetCDF-4 Architecture
[Diagram components: netCDF-3 applications; netCDF-4 applications; HDF5 applications; netCDF-3 interface; netCDF-4 library; HDF5 library; netCDF files; netCDF-4 HDF5 files; HDF5 files]


• Supports access to netCDF files and HDF5 files
created through netCDF-4 interface
Archival formats
• Proposal to NOAA Scientific Data
Stewardship program
• Will investigate use of OAIS “Archive
Information Package” standard with HDF5
• PI: Ruth Duerr (NSIDC) and Kent Yang

OAIS: Open Archival Information System
Asymmetries between
collecting and accessing data
• Huge streams of data
collected …

• To be accessed in little
bits…
Challenge – efficient remote access
• How do we efficiently find and access data
from distributed repositories, when the data
are big and complex?
• Storage Resource Broker (SRB)
• Efficient access to HDF5 objects in repository

• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
Example – Storage resource broker
• Storage Resource Broker – repository for
heterogeneous data collections
• Simplifies storage, query and access to massive
amounts of scientific data
• Has data in HDF5, netCDF, other formats
Normal SRB configuration

[Diagram components: HDF5 client; SRB server; MCAT; HDF5 file (whole file or a sequence of bytes)]
OPeNDAP-HDF5 project
• OPeNDAP
• Powerful protocol for remote querying and
subsetting of scientific data
• Replaces direct file access with remote query and
access
• Widely used in Earth Sciences
OPeNDAP – HDF5 Project
• A NASA ROSES NRA project
• Tasks
• HDF5-DAP2 server (now a prototype)
• HDF5-DAP4 server
• DAP4 to HDF5 conversion utility
• Investigate integrated DAP-aware HDF5 library
SQL Server and HDF5
with Microsoft
SQL Server and HDF5
• Microsoft “dream environment for scientists”
• Combine data management, computing
• SQL Server 2005 solution
• Combine RDBMS with scientific analysis tools,
together in one integrated system.
• HDF5 & other formats manage scientific objects
HDF5 in SQL server
[Diagram components: Visualization; Libraries (MATLAB, …); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; .NET Languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server; HDF5 TVFs; Index; HDF5 type; HDF5 FS blob; HDF5 files]
Thank you all
and
Thank you NASA!
Acknowledgement
This report is based upon work supported in part by a
Cooperative Agreement with NASA under NASA
NNG05GC60A. Any opinions, findings, and
conclusions or recommendations expressed in this
material are those of the author(s) and do not
necessarily reflect the views of the National
Aeronautics and Space Administration.
Questions/comments?
Information Sources
• HDF website
http://hdfgroup.org/

• HDF5 Information Center
http://hdfgroup.org/HDF5/

• HDF Helpdesk
hdfhelp@hdfgroup.org

• HDF users mailing list
hdfnews@ncsa.uiuc.edu
coming soon: news@hdfgroup.org

HDF Update

  • 1. HDF Update Mike Folk The HDF Group HDF and HDF-EOS Workshop X November 29, 2006 HDF
  • 2. Outline • Organizational info • HDF Software Update • Other Activities of Interest
  • 4. “The HDF Group” = “THG” Founded Dec. 2006 Went solo July 15, 2006 Non-profit
  • 5. THG mission To support the vast community of HDF users and to ensure the sustainable development of HDF technologies and the ongoing accessibility of HDF-stored data.
  • 6. The HDF Team • Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu, John Mainzer, Matthew Needham, Pedro Nunes, Tammi O’Neill, Elena Pourmal, Binh-minh Ribler, Randy Ribler, Rishi Sinha, Kent Yang • And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.
  • 9. Platforms to be dropped • Operating systems: HPUX 11.00; Crays SV1 and TS IEEE; AIX 5.1 and 5.2; SGI IRIX64-6.5; Linux 2.4; Solaris 2.7, 2.8, 2.9; Windows 2000; MAC OSX 10.3 • Compilers: GNU C compilers older than 3.4 (Linux); Intel 8.*; PGI V. 5.*, 6.0
  • 10. Platforms to be added • Systems: MAC OSX 10.4 (Intel); Solaris 2.* on Intel; Cray XT3; Windows 64-bit (?); Linux 2.6; HPUX 11.23; IBM Power 5 • Compilers: g95; PGI V. 6.1; Intel 9.*
  • 11. New features • Configuration • Switched to use F77_FUNC macro for better Fortran support (no hard-coded compilers anymore!) • Support for shared libraries • Library • No hard-coded limit on number of opened files • New APIs to control number of files opened by application • Fortran support for SZIP compression
  • 12. Bug fixes • Tools • Many improvements to the hdp, hrepack, hdiff and hdfimport utilities based on users’ feedback • Library • Data corruption bug with several opened unlimited-dimension SDSs • Better handling of SDSs with duplicated names in SDgetdimscale, and more
  • 14. No new releases! • Focus on HDF5 release 1.8 • HDF5-1.8.0 Alpha 5 release is available from: hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html
  • 15. Platforms to be dropped • Operating systems: HPUX 11.00; MAC OS 10.3; AIX 5.1 and 5.2; SGI IRIX64-6.5; Linux 2.4; Solaris 2.8 and 2.9 • Compilers: GNU C compilers older than 3.4 (Linux); Intel 8.*; PGI V. 5.*, 6.0; MPICH 1.2.5 • http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html
  • 16. Platforms to be added • Systems: Alpha Open VMS; MAC OSX 10.4 (Intel); Solaris 2.* on Intel (?); Cray XT3; Windows 64-bit (32-bit binaries); Linux 2.6; BG/L • Compilers: g95; PGI V. 6.1; Intel 9.*; MPICH 1.2.7; MPICH2
  • 18. HDF5 1.8 new library features • Datatype and dataspace features: serialized dataspaces and datatypes; ability to create a datatype from a text description; integer-to-float conversions during I/O; revised exception handling during type conversion; compact storage for N-bit datatypes; offset+size storage filter, saving space; “Null” dataspace (datasets with no elements); data transformation filter
  • 19. HDF5 1.8 – new library features • Group revisions: creation order access; compact groups (small groups take less space); large group storage improvements; intermediate group creation
  • 20. HDF5 1.8 – new library features • Link improvements • External links – can refer to objects in another file • User-defined links – apps create own kinds of links • Attribute improvements • Storage improvements for large numbers of attributes • Iterate or look up by creation order
  • 21. HDF5 1.8 – new library features • Support for Unicode UTF-8 character set • Shared header info – duplicate header info shared, possibly saving space • Metadata cache improvements – faster I/O on files with many objects • Data transformation filter • Stackable Virtual File Drivers • Better UNIX/Linux portability
  • 22. HDF5 1.8– new APIs • New extendible error-handling API • New APIs to copy objects between files fast • Dimension scale model and API • “HDFpacket” – API to read/write packets efficiently
  • 23. HDF5 1.8 – backward and forward compatibility
  • 24. HDF5 1.8 vs. 1.6.5 • Differences between 1.8 vs. 1.6.5 • Some file format changes • Several new routines added • Old APIs deprecated -- removed in later release • Consequences • Application requiring 1.8 format changes will write objects that 1.6.5 library cannot read • To exploit 1.8 changes, apps need to be rewritten
  • 25. Principle of “Maximum file format compatibility” Unless instructed otherwise, the HDF5 library will write objects using the earliest version of the format possible for describing the information. Assures forward compatibility with the older versions whenever possible – objects in new files can be read with old libraries if those objects are “known” to the old libraries.
  • 27. New features for old tools • h5dump • Dump data in binary format • h5diff • Compare dataset regions • Parallel h5diff (ph5diff) • Compare two files in MPI parallel environment • h5repack • Efficient data copy using H5Gcopy() • Able to handle big datasets
  • 28. New HDF5 Tools • h5copy • Copies a group, dataset or named datatype from one location to another location • Copies within a file or across files • h5check • Verifies an HDF5 file against the defined HDF5 File Format Specification • h5stat • Reports statistics about a file and objects in a file
  • 30. HDFView changes • Quality improvements for HDF-java package • Full documentation of hdf-java object package • Test suite for hdf-java object package • Support 64-bit Java on Linux and Solaris • Many new features, including: change font size easily; grab and move image; create new table (compound dataset) from template; filter out fill value for image creation; -geometry option for very high resolution displays
  • 31. Future work for Java • Update HDF5 JNI APIs for HDF5 1.8 release • Release HDFView 2.4 with bug fixes/new features with HDF5 1.8 release • New GUI features dealing with table, image and animation • Writing capability for HDF5-SRB model
  • 32. Website Development for HDF-EOS Tools & Information Center
  • 33. Website for HDF-EOS Tools • THG now manages HDF-EOS web site: registered domain names hdfeos.net/.org/.com; re-implemented major topic areas; re-designed interface; registered Google search • Will continue maintenance • Phase two • Host mailing list • Support simple forum features
  • 37. HDF5 - PnetCDF performance comparison • Flash I/O Benchmark (checkpoint files) • [Figure: bandwidth (MB/s) vs. number of processors on Power 5, for PnetCDF, HDF5 collective and HDF5 independent] • I/O performance of PnetCDF is comparable with parallel HDF5 when the libraries are used in similar manners.
  • 38. PnetCDF4 - PnetCDF comparison • [Figure: bandwidth (MB/s) vs. number of processors, for PnetCDF collective and NetCDF4 collective] • I/O performance of parallel NetCDF4 is comparable with PnetCDF, about 15% slower on average for the output of the ROMS history file.
  • 39. Collective I/O improvements • HDF5 supports collective IO for non-regular selections • Collective IO for chunked storage is not trivial. • Non-regular selection performance optimizations: • Added IO options to achieve good collective IO performance • Added APIs for applications to participate in the optimization process • See the poster
  • 41. DOE ASC* and Others • Support HDF5 on major systems at Sandia & Lawrence Livermore National Laboratories • R&D efforts underway: file recovery after a crash; very fast write speed (goal is 300 MB/sec); read-while-writing capability; Java library and HDFView improvements • * Advanced Scientific Computing project
  • 43. Flight test – collect, then process
  • 44. Boeing HDF5 for flight test data • Boeing 787 active archive • 10 TB per flight-test day • Must handle raw, real-time data • High speed ingest, by “packet” • Post-processing, by “time-history” • Boeing High Level APIs • HDFpacket – released with HDF5 1.8 • HDFtime_history – new, open version likely
  • 47. C# HDF5 API for Agilent
  • 48. Agilent C# project • Why? • Heavy use of C# at Agilent • Compatibility with Matlab • Other interest in HDF5 at Agilent • What? • Prototype API in C# for Windows XP • Basic functions to create, open, close, read, write • Limited datatypes, no partial I/O • When? • March 2007
  • 49. HDF5 Software • [Diagram of software layers: Tools & Applications; Fortran, C++, Java, C#; C API; HDF I/O Library; HDF File]
  • 51. NetCDF 4 project • Enhanced NetCDF-4 Interface to HDF5 • Combine features of netCDF and HDF5 • Take advantage of their separate strengths • Collaboration between NCSA, THG, Unidata • Currently in Alpha Release • Waiting for beta release
  • 53. Archival formats • Proposal to NOAA Scientific Data Stewardship program • Will investigate use of OAIS “Archive Information Package” standard with HDF5 • PI: Ruth Duerr (NSIDC) and Kent Yang • OAIS: Open Archival Information System
  • 55. • Huge streams of data collected … • To be accessed in little bits…
  • 56. Challenge – efficient remote access • How do we efficiently find and access data from distributed repositories, when the data are big and complex? • Storage Resource Broker (SRB) • Efficient access to HDF5 objects in repository • OPeNDAP • Powerful protocol for remote querying and subsetting of scientific data
  • 57. Example – Storage resource broker • Storage Resource Broker – repository for heterogeneous data collections • Simplifies storage, query and access to massive amounts of scientific data • Has data in HDF5, netCDF, other formats
  • 58. Normal SRB configuration • [Diagram: client; HDF5; HDF5 File (whole file or a sequence of bytes); SRB Server; MCAT]
  • 59. OPeNDAP-HDF5 project • OPeNDAP • Powerful protocol for remote querying and subsetting of scientific data • Replaces direct file access with remote query and access • Widely used in Earth Sciences
  • 60. OPeNDAP – HDF5 Project • A NASA ROSES NRA project • Tasks: HDF5-DAP2 server (now a prototype); HDF5-DAP4 server; DAP4 to HDF5 conversion utility; investigate integrated DAP-aware HDF5 library
  • 61. SQL Server and HDF5 with Microsoft
  • 62. SQL Server and HDF5 • Microsoft “dream environment for scientists” • Combine data management, computing • SQL Server 2005 solution • Combine RDBMS with scientific analysis tools, together in one integrated system. • HDF5 & other formats manage scientific objects
  • 63. HDF5 in SQL server • [Diagram components: Visualization Libraries (MATLAB,…); Web Services (XML, REST, RSS); OLAP and Data Mining; Reporting; .NET Languages with Language Integrated Query; Entity Framework (EDM, eSQL, O-R mapping); HDF5 EDM model; SQL Server; HDF5; HDF5 TVFs; Index; HDF5 type; HDF5 FS blob; HDF5 files]
  • 65. Acknowledgement This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
  • 67. Information Sources • HDF website: http://hdfgroup.org/ • HDF5 Information Center: http://hdfgroup.org/HDF5/ • HDF Helpdesk: hdfhelp@hdfgroup.org • HDF users mailing list: hdfnews@ncsa.uiuc.edu (coming soon: news@hdfgroup.org)
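
The “maximum file format compatibility” principle on slide 25 can be sketched as a toy model: each feature an object uses has a minimum format version, and the object is written with the lowest version that covers all of them. This is only an illustration. The feature names, the version numbers and both functions are hypothetical and are not taken from the HDF5 library or its file format specification.

```python
# Toy model of the "maximum file format compatibility" rule.
# Feature names and version numbers are hypothetical, chosen for
# illustration only; they do not come from the HDF5 format spec.
MIN_VERSION_FOR_FEATURE = {
    "basic_dataset": 0,
    "creation_order_tracking": 2,
    "external_link": 2,
}


def earliest_format_version(features_used):
    """Return the lowest format version that can describe every feature used."""
    return max((MIN_VERSION_FOR_FEATURE[f] for f in features_used), default=0)


def readable_by(library_max_version, features_used):
    """An older library can read an object only if it knows the object's version."""
    return earliest_format_version(features_used) <= library_max_version
```

In this model, an object that uses only `basic_dataset` is written at version 0 and stays readable by an older library, while an object that uses `external_link` needs version 2 and is not readable by a library that only knows version 0. This mirrors the consequence stated on slide 24.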

Editor's notes

  • Investigate integrated DAP-aware HDF5 library that could provide seamless access to both local and remote data