SUBSETTING
Matt Smith
Information Technology and Systems Center (ITSC)
University of Alabama in Huntsville (UAH)
http://subset.itsc.uah.edu

UAH

The University of Alabama in
Huntsville
Subsetting
• Goal: to provide a science data user with only the data
they request as quickly as possible.
• Benefits science data users and data centers:
- reduces analysis time by reducing amount of data
- reduces time for data delivery
- reduces resources (network, personnel, media, etc.)
• Steps:
- locate spatial / temporal / spectral area of interest
- extract
- re-assemble for distribution
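
The three steps above can be sketched in a few lines of Python. This is only an illustration for a small regular grid held in plain lists, not the HEW implementation; the function names `locate` and `subset_grid` and the sample coordinates are assumptions made for the example.

```python
def locate(coords, lo, hi):
    """Step 1 (locate): indices of the coordinates that fall inside [lo, hi]."""
    return [i for i, c in enumerate(coords) if lo <= c <= hi]


def subset_grid(lats, lons, grid, lat_range, lon_range, stride=1):
    """Steps 2 and 3: extract the selected cells and re-assemble a smaller grid.

    grid[r][c] is the value at latitude lats[r] and longitude lons[c].
    stride keeps every stride-th selected row and column (subsampling).
    """
    rows = locate(lats, lat_range[0], lat_range[1])[::stride]
    cols = locate(lons, lon_range[0], lon_range[1])[::stride]
    return {
        "lats": [lats[r] for r in rows],
        "lons": [lons[c] for c in cols],
        "data": [[grid[r][c] for c in cols] for r in rows],
    }


# Made-up 4 x 3 grid; each value encodes its row and column as row*10 + col.
lats = [30.0, 35.0, 40.0, 45.0]
lons = [-80.0, -75.0, -70.0]
grid = [[r * 10 + c for c in range(3)] for r in range(4)]

coast = subset_grid(lats, lons, grid, (35.0, 40.0), (-77.0, -72.0))
```

The result carries its own coordinate lists, so the subset can be distributed without the parent grid.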

27-28 February 2002

Science Data Processing Workshop
Greenbelt, MD

HEW
• HDF-EOS Web-based Subsetter
• Prototype software designed to be dataset-independent
(HDF-EOS)
• Front-end/GUI
• Uses HTML forms and JavaScript
• Optional

• Back-end
• Needs subset criteria file and HDF-EOS data
• Performs subsetting as a “batch” job

• http://subset.itsc.uah.edu/hew2k

Subset Criteria File
• File(s) to subset (Req’d)
• Parameters/channels (Req’d)
• E-mail address (Req’d)
• Bounding box (Opt.)
  - Latitude/Longitude bounds
  - Row/Column bounds (grids only)
• Time range (Opt.)
• Subsampling stride (Req’d)
• Non-geolocated objects (also_include) (Opt.)
• Output file prefix (Opt.)
• .met file (Opt.)
• Sub-scan subsetting (swaths only) (Opt.)
Example Subset Criteria File
GROUP = SUBSET
  PARENT_FILE = ("/AQUA/AMSR/AE_L2A.hdfeos")
  LATITUDE_RANGE = (35.000000, 40.000000)
  LONGITUDE_RANGE = (-77.000000, -72.000000)
  EMAIL = "user@company.com"
  OUTPUT_PREFIX = "NC_coast"
  MET_FILE = "YES"
  GROUP = SPOG
    NAME = "swath_1"
    TYPE = "SWATH"
    PARAMETERS = ("89.0V_Res.1_TB",
                  "89.0V_Res.2_TB")
    SUBSAMPLING = ("GeoTrack", 1,
                   "GeoXtrack", 1)
  END_GROUP = SPOG
END_GROUP = SUBSET
END
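
A criteria file like the one above could be generated by a script rather than typed by hand. The following Python sketch emits text with the same keywords as the slide's example. The helpers `odl_value` and `build_criteria` are hypothetical, and whether the HEW back-end accepts this exact spacing or single-line lists is an assumption that has not been checked against the software.

```python
def odl_value(value):
    """Format a Python value in the style used by the slide's example."""
    if isinstance(value, str):
        return f'"{value}"'
    if isinstance(value, float):
        return f"{value:.6f}"
    if isinstance(value, (list, tuple)):
        return "(" + ", ".join(odl_value(v) for v in value) + ")"
    return str(value)


def build_criteria(parent_files, email, swath_name, parameters,
                   lat_range=None, lon_range=None, stride=1):
    """Return criteria-file text for a single swath object (hypothetical helper)."""
    lines = ["GROUP = SUBSET",
             f"  PARENT_FILE = {odl_value(list(parent_files))}"]
    if lat_range is not None:
        lats = [float(v) for v in lat_range]
        lines.append(f"  LATITUDE_RANGE = {odl_value(lats)}")
    if lon_range is not None:
        lons = [float(v) for v in lon_range]
        lines.append(f"  LONGITUDE_RANGE = {odl_value(lons)}")
    subsampling = ["GeoTrack", stride, "GeoXtrack", stride]
    lines += [
        f"  EMAIL = {odl_value(email)}",
        "  GROUP = SPOG",
        f"    NAME = {odl_value(swath_name)}",
        '    TYPE = "SWATH"',
        f"    PARAMETERS = {odl_value(list(parameters))}",
        f"    SUBSAMPLING = {odl_value(subsampling)}",
        "  END_GROUP = SPOG",
        "END_GROUP = SUBSET",
        "END",
    ]
    return "\n".join(lines) + "\n"


text = build_criteria(
    ["/AQUA/AMSR/AE_L2A.hdfeos"], "user@company.com", "swath_1",
    ["89.0V_Res.1_TB", "89.0V_Res.2_TB"],
    lat_range=(35, 40), lon_range=(-77, -72),
)
```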

HEW Back-end
• Uses HDF-EOS (and HDF) library
• Instructions via a subset criteria file (ODL)
• Handles multiple similar files
• Handles Swath and/or Grid objects
• Unix (SGI & Sun) executables available
• Subsetted output files contain:
  - StructMetadata (HDF-EOS)
  - ArchiveMetadata*
  - ProductMetadata (added by HEW ODL file)
  - CoreMetadata* (w/ modified bounding box & time info), optionally placed in .met file

* if present in parent file

HEW Subsettable data
EOS DATASETS
• Terra
  - MODIS
  - MOPITT
  - ASTER
• Aqua
  - AMSR-E

OTHERS
• TRMM
  - TMI
• NOAA-15
  - AMSU-A
• Any other HDF-EOS2 (HDF4) data written with HDF-EOS library subsetting calls in mind

HEW integration with ECS
[Diagram: numbered flow (steps 1-7) among the end user, the EDG System (EDG), ECS, and the Subsetting System (Subsetter). Arrow labels: order submission (HTML); data order and reply; subset ODL and reply; input data; output data; output data (reingested).]

ECS integration plans
• UAH/ITSC-written interface software
• 6a.05 to be released in March
• NSIDC, GDAAC, EDC
• EDG v3.4 will have subsetting options
• Enhancements for DAACs

Subsetting web-site
• http://www.subset.org
• Hope to create “portal” for everyone involved in subsetting
  - Advertising
  - Forums
  - Data
  - Software
  - Glossary
  - Tutorials
  - Links to specialized subsetters

Other HDF-EOS Tools
• SPOT – Subsettability Checker
• eospeek – HDF-EOS file display
• hdfpeek – HDF file display
• HDF-EOS user’s manual (in work)

Subsetting Plans
• Complete ECS Integration
  - Maintain/Improve as needed for DAACs
  - Front-end polar projection coverage map (NSIDC)
• Certify software with new datasets (Aqua, Aura, …)
• Incorporate ESML usage
• Provide support for HDF-EOS5
• Provide additional specialized subsetting applications for instrument teams and others

Earth Science Markup Language
“Define Once, Use Anywhere”

Information Technology and Systems Center
University of Alabama in Huntsville

http://esml.itsc.uah.edu

Contact Info:
Rahul Ramachandran
rramachandran@itsc.uah.edu
Research Effort Supported by:
Karen Moe
Earth Science Technology Office, NASA

Data Characteristics
• Different Data Formats
  - BUFR (DoD, WMO)
  - CDF, NetCDF
  - GRIB (WMO)
  - HDF, HDF-EOS (NASA)
  - Free formats (Binary, ASCII)
  - GrADS, McIDAS, Phoenix, URF, etc.
• Different states of processing
  - raw, calibrated, derived, modeled or interpreted
• Data/application interoperability problem
• Most scientists are not programmers
• Writing data decoders takes time and effort!

Data/Application Interoperability Problem
[Diagram: Data Format 1, Data Format 2, Data Format 3, Format Converter, Reader 1, Reader 2, Application]

• Specialized code for every format
  - Difficult to assimilate new data types
• Enforce a Standard Data Format
  - Not practical for legacy datasets

What is ESML?
• Specialized markup language for Earth Science metadata based on XML
• Machine-readable and -interpretable representation of the structure and content of any data file, regardless of data format
• ESML is NOT a new data format
• External metadata files that can be generated by either data producer or data consumer (at collection, data set, and/or granule level)
• ESML will provide the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, …) without the cost of data conversion
• ESML consists of three types of metadata:
  - Syntactic
  - Semantic
  - Content

Three types of metadata
• Syntactic
– Structural information, bits, word-length, endianness,
sequence

• Semantic
– Meaning of the data, units, frame of reference

• Content
– “Typical” metadata, general information
• Producer contact info, version

– Searchable information, keywords, etc.

WMO Sounding data
4  10000    94  99999  99999  99999  99999
4   9250   757  99999  99999  99999  99999
5   8890  1102    142    -28  99999  99999
5   8830  1159    150    -20  99999  99999
6   8765  1219  99999  99999    310     26
4   8500  1468    130    -40    330     36
6   8136  1828  99999  99999    320     46
6   7840  2133  99999  99999    315     62
6   7554  2438  99999  99999    310     67
6   7279  2743  99999  99999    295     62
4   7000  3065     16   -124    275     67
5   6910  3169     20   -170  99999  99999
6   6753  3352  99999  99999    270     93
6   6499  3657  99999  99999    270    118
5   6130  4121    -39   -199  99999  99999
6   6017  4267  99999  99999    255     93
5   5840  4500    -73   -193  99999  99999
5   5630  4784    -87   -297  99999  99999
6   5563  4876  99999  99999    240     87
6   5348  5181  99999  99999    230    103
4   5000  5700   -161   -241    235    139
6   4940  5791  99999  99999    235    139
5   4890  5867   -173   -243  99999  99999
6   4740  6096  99999  99999    235    129
4   4000  7340   -287   -367    245    108
Ex: ESML for WMO Sounding Data
<a:ESML xmlns:a="ESML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="ESML R:SchemaESML.xsd">
  <SyntacticMetaData>
    <Ascii>
      <AsciiStructure name="DataTable" geoInfo="NoGeoInfo" instances="1">
        <Array occurs="*">
          <Field format="%d" name="number">
            <Attribute/>
          </Field>
          <Field format="%d" name="PRESSURE">
            <Data unit="mb" equation="X/10" FillValue="99999"/>
          </Field>
          <Field format="%d" name="HEIGHT">
            <Data unit="m"/>
          </Field>
          <Field format="%d" name="TEMPERATURE">
            <Data unit="K" equation="X/10"/>
          </Field>
          <Field format="%d" name="DEWPOINT">
            <Data unit="K" equation="X/10"/>
          </Field>
          <Field format="%d" name="WIND DIRECTION">
            <Data unit="DDD"/>
          </Field>
          <Field format="%d" name="WIND SPEED">
            <Data unit="m/s"/>
          </Field>
        </Array>
      </AsciiStructure>
    </Ascii>
  </SyntacticMetaData>
</a:ESML>
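
To show how a description like this can drive a generic decoder, the Python sketch below reads a reduced copy of the field list and applies it to rows of the sounding file. It does not use the ESML Library. The function names, the handling of only the `X/<number>` equation form, and the treatment of 99999 as a missing-value marker in every column are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Reduced descriptor modeled on the slide's example (not the full ESML schema).
DESCRIPTOR = """
<Array occurs="*">
  <Field format="%d" name="number"/>
  <Field format="%d" name="PRESSURE"><Data unit="mb" equation="X/10"/></Field>
  <Field format="%d" name="HEIGHT"><Data unit="m"/></Field>
  <Field format="%d" name="TEMPERATURE"><Data equation="X/10"/></Field>
  <Field format="%d" name="DEWPOINT"><Data equation="X/10"/></Field>
  <Field format="%d" name="WIND DIRECTION"><Data unit="DDD"/></Field>
  <Field format="%d" name="WIND SPEED"><Data unit="m/s"/></Field>
</Array>
"""

FILL_VALUE = 99999  # assumption: marks missing data in every column


def load_fields(descriptor_xml):
    """Return (name, divisor) pairs; only equations of the form X/<number> are handled."""
    fields = []
    for field in ET.fromstring(descriptor_xml).iter("Field"):
        data = field.find("Data")
        equation = data.get("equation") if data is not None else None
        divisor = 1.0
        if equation:
            lhs, _, rhs = equation.partition("/")
            if lhs.strip() != "X" or not rhs:
                raise ValueError(f"unsupported equation: {equation!r}")
            divisor = float(rhs)
        fields.append((field.get("name"), divisor))
    return fields


def decode_row(line, fields):
    """Decode one whitespace-separated row into a dict of scaled values (None if missing)."""
    raw = [int(token) for token in line.split()]
    if len(raw) != len(fields):
        raise ValueError(f"expected {len(fields)} columns, got {len(raw)}")
    return {name: (None if value == FILL_VALUE else value / divisor)
            for (name, divisor), value in zip(fields, raw)}


fields = load_fields(DESCRIPTOR)
level_850 = decode_row("4 8500 1468 130 -40 330 36", fields)
```

Because the column names and scaling come from the descriptor rather than from the code, adding or reordering a column only requires editing the XML.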

ESML Schema/Library
• ESML Schema Version 0.5
• Supports ASCII, Binary and HDF-EOS
• W3C compliant schema
• Extensible allowing new formats or modifications
• ESML Library Version 0.5
• C++ version
• Windows 95/98/NT/2000 version
• Porting to LINUX version
• Changed from Oracle XML parser to Apache XML parser

Future Plans
• Addition of new data formats to both Schema and Library
• GRIB
• McIDAS
• BUFR, HDF4/5 and others
• Versions of the Library
• C/C++ version – UNIX
• C/C++ version – Mac
• Java version

ESML Data Browser
• Hybrid product (Java Client/C++ Library backend/JNI Interface)
• Prototype version is the extension of the ESML Demo Tool
• Features:
  - Browse and view data values using an ESML file
  - Browse the metadata for each data field
• Future Features:
  - Allow format conversion with automatic generation of ESML metadata file
  - Allow selection of multiple fields
  - Additional functionality such as Subsetting
  - Browse data images

ESML Editor
• 100% Java version prototype
• Unable to find a COTS editor
• Utilizes Expert System principles to give users correct
options
• Hides XML tags from the users
• Future Features:
• Allow text editing of the XML tags also
• Incorporate feedback from users

ESML Web Page
• URL: esml.itsc.uah.edu
• Post latest products, news, presentations, papers
• Schema and related documents available to all
• Beta version of the Library available on limited basis
• Beta versions of ESML Editor and ESML Data Browser
will be made available soon

Other UAH/ITSC work
• AMSR-E
• Passive Microwave (PM)-ESIP
• ADaM – Algorithm Development and Mining
• EVE – An EnVironmEnt for On-board Processing

Contact UAH/ITSC
HEW (HDF-EOS Web-based Subsetter)
http://subset.itsc.uah.edu/hew2k
ESML (Earth Science Markup Language)
http://esml.itsc.uah.edu
General Purpose Subsetting (ADaM)
http://datamining.itsc.uah.edu
On-Demand Subsetting (Passive Microwave – ESIP)
http://pm-esip.msfc.nasa.gov
SSM/I Coarse-grain Subsetting
http://ghrc.msfc.nasa.gov/ssmi/ssmi_subset.html

27-28 February 2002

Science Data Processing Workshop
Greenbelt, MD

UAH

The University of Alabama in
Huntsville

Weitere ähnliche Inhalte

Andere mochten auch (7)

HDF-EOS Aura File Format Guidelines
HDF-EOS Aura File Format GuidelinesHDF-EOS Aura File Format Guidelines
HDF-EOS Aura File Format Guidelines
 
The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)
The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)
The LEISA Atmospheric Corrector (LAC) on Earth Observer 1 (EO1)
 
HDF and HDF-EOS Experiences and Applications
HDF and HDF-EOS Experiences and ApplicationsHDF and HDF-EOS Experiences and Applications
HDF and HDF-EOS Experiences and Applications
 
Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5Transitioning from HDF4 to HDF5
Transitioning from HDF4 to HDF5
 
Welcome to HDF Workshop V
Welcome to HDF Workshop VWelcome to HDF Workshop V
Welcome to HDF Workshop V
 
Workshop Discussion: HDF & HDF-EOS Future Direction
Workshop Discussion: HDF & HDF-EOS Future DirectionWorkshop Discussion: HDF & HDF-EOS Future Direction
Workshop Discussion: HDF & HDF-EOS Future Direction
 
HDF Update
HDF UpdateHDF Update
HDF Update
 

Ähnlich wie Subsetting

Data Mobility Exhibition
Data Mobility ExhibitionData Mobility Exhibition
Data Mobility ExhibitionGlobus
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big DataOmnia Safaan
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingGlobus
 
Scientific
Scientific Scientific
Scientific marpierc
 
Memory-Driven Near-Data Acceleration and its application to DOME/SKA
 Memory-Driven Near-Data Acceleration and its application to DOME/SKA Memory-Driven Near-Data Acceleration and its application to DOME/SKA
Memory-Driven Near-Data Acceleration and its application to DOME/SKAinside-BigData.com
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...San Diego Supercomputer Center
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational ScienceChelle Gentemann
 
2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)Rudolf Husar
 
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...The HDF-EOS Tools and Information Center
 
GSM UMTS LTE Site Commissioning software
GSM UMTS LTE Site Commissioning softwareGSM UMTS LTE Site Commissioning software
GSM UMTS LTE Site Commissioning softwareAhmet Ozturk
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Futureinside-BigData.com
 

Ähnlich wie Subsetting (20)

Earth Science Markup Language (ESML) - A Tutorial
Earth Science Markup Language (ESML) - A TutorialEarth Science Markup Language (ESML) - A Tutorial
Earth Science Markup Language (ESML) - A Tutorial
 
Data Mobility Exhibition
Data Mobility ExhibitionData Mobility Exhibition
Data Mobility Exhibition
 
Subsetting at UAH
Subsetting at UAHSubsetting at UAH
Subsetting at UAH
 
Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)
 
Dataset Independent Subsetting
Dataset Independent SubsettingDataset Independent Subsetting
Dataset Independent Subsetting
 
Metadata Requirements for EOSDIS Data Providers
Metadata Requirements for EOSDIS Data ProvidersMetadata Requirements for EOSDIS Data Providers
Metadata Requirements for EOSDIS Data Providers
 
Inroduction to Big Data
Inroduction to Big DataInroduction to Big Data
Inroduction to Big Data
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data Sharing
 
Scientific
Scientific Scientific
Scientific
 
Memory-Driven Near-Data Acceleration and its application to DOME/SKA
 Memory-Driven Near-Data Acceleration and its application to DOME/SKA Memory-Driven Near-Data Acceleration and its application to DOME/SKA
Memory-Driven Near-Data Acceleration and its application to DOME/SKA
 
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
 
Empowering Transformational Science
Empowering Transformational ScienceEmpowering Transformational Science
Empowering Transformational Science
 
2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)2004-11-13 Supersite Relational Database Project: (Data Portal?)
2004-11-13 Supersite Relational Database Project: (Data Portal?)
 
Srds Pres011120
Srds Pres011120Srds Pres011120
Srds Pres011120
 
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
Implementation of OGC Web Coverage Service Using HDF5/HDF-EOS5 as the Base Fi...
 
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLABAccessing Cloud Data and Services Using EDL, Pydap, MATLAB
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
 
SEEDS Standards Process
SEEDS Standards ProcessSEEDS Standards Process
SEEDS Standards Process
 
Subsetting
SubsettingSubsetting
Subsetting
 
GSM UMTS LTE Site Commissioning software
GSM UMTS LTE Site Commissioning softwareGSM UMTS LTE Site Commissioning software
GSM UMTS LTE Site Commissioning software
 
Blue Waters and Resource Management - Now and in the Future
 Blue Waters and Resource Management - Now and in the Future Blue Waters and Resource Management - Now and in the Future
Blue Waters and Resource Management - Now and in the Future
 

Mehr von The HDF-EOS Tools and Information Center

STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...The HDF-EOS Tools and Information Center
 

Mehr von The HDF-EOS Tools and Information Center (20)

Cloud-Optimized HDF5 Files
Cloud-Optimized HDF5 FilesCloud-Optimized HDF5 Files
Cloud-Optimized HDF5 Files
 
Accessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDSAccessing HDF5 data in the cloud with HSDS
Accessing HDF5 data in the cloud with HSDS
 
The State of HDF
The State of HDFThe State of HDF
The State of HDF
 
Highly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance FeaturesHighly Scalable Data Service (HSDS) Performance Features
Highly Scalable Data Service (HSDS) Performance Features
 
Creating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 FilesCreating Cloud-Optimized HDF5 Files
Creating Cloud-Optimized HDF5 Files
 
HDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance DiscussionHDF5 OPeNDAP Handler Updates, and Performance Discussion
HDF5 OPeNDAP Handler Updates, and Performance Discussion
 
Hyrax: Serving Data from S3
Hyrax: Serving Data from S3Hyrax: Serving Data from S3
Hyrax: Serving Data from S3
 
HDF - Current status and Future Directions
HDF - Current status and Future DirectionsHDF - Current status and Future Directions
HDF - Current status and Future Directions
 
HDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and FutureHDFEOS.org User Analsys, Updates, and Future
HDFEOS.org User Analsys, Updates, and Future
 
HDF - Current status and Future Directions
HDF - Current status and Future Directions HDF - Current status and Future Directions
HDF - Current status and Future Directions
 
H5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only LibraryH5Coro: The Cloud-Optimized Read-Only Library
H5Coro: The Cloud-Optimized Read-Only Library
 
MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10MATLAB Modernization on HDF5 1.10
MATLAB Modernization on HDF5 1.10
 
HDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDFHDF for the Cloud - Serverless HDF
HDF for the Cloud - Serverless HDF
 
HDF5 <-> Zarr
HDF5 <-> ZarrHDF5 <-> Zarr
HDF5 <-> Zarr
 
HDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server FeaturesHDF for the Cloud - New HDF Server Features
HDF for the Cloud - New HDF Server Features
 
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
Apache Drill and Unidata THREDDS Data Server for NASA HDF-EOS on S3
 
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
 
HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?HDF5 and Ecosystem: What Is New?
HDF5 and Ecosystem: What Is New?
 
HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020HDF5 Roadmap 2019-2020
HDF5 Roadmap 2019-2020
 
Leveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software TestingLeveraging the Cloud for HDF Software Testing
Leveraging the Cloud for HDF Software Testing
 

Kürzlich hochgeladen

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Kürzlich hochgeladen (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Subsetting

  • 1. SUBSETTING
      Matt Smith
      Information Technology and Systems Center (ITSC)
      University of Alabama in Huntsville (UAH)
      http://subset.itsc.uah.edu
  • 2. Subsetting
      • Goal: to provide a science data user with only the data they request, as quickly as possible.
      • Benefits science data users and data centers:
        - reduces analysis time by reducing the amount of data
        - reduces time for data delivery
        - reduces resources (network, personnel, media, etc.)
      • Steps:
        - locate the spatial / temporal / spectral area of interest
        - extract
        - re-assemble for distribution
      27-28 February 2002, Science Data Processing Workshop, Greenbelt, MD
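The three steps on this slide (locate, extract, re-assemble) can be illustrated with a minimal Python sketch. It is not HEW code: the function name `subset_swath`, the flat per-pixel lists, and the dictionary output are all assumptions made for illustration, and a real subsetter would work on HDF-EOS swath or grid objects rather than Python lists.

```python
from typing import Dict, List, Tuple


def subset_swath(
    lats: List[float],
    lons: List[float],
    values: List[float],
    lat_range: Tuple[float, float],
    lon_range: Tuple[float, float],
    stride: int = 1,
) -> Dict[str, List[float]]:
    """Hypothetical helper: bounding-box subset of per-pixel geolocated data."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range

    # Step 1 (locate): indices of pixels that fall inside the bounding box.
    hits = [
        i
        for i, (lat, lon) in enumerate(zip(lats, lons))
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]

    # Step 2 (extract): keep every `stride`-th located pixel (subsampling).
    keep = hits[::stride]

    # Step 3 (re-assemble): package the selected pixels for distribution.
    return {
        "lat": [lats[i] for i in keep],
        "lon": [lons[i] for i in keep],
        "value": [values[i] for i in keep],
    }


# Illustrative data only; the bounds are borrowed from the example criteria file.
sample = subset_swath(
    lats=[34.0, 36.0, 38.0, 41.0],
    lons=[-76.0, -75.0, -73.0, -74.0],
    values=[1.0, 2.0, 3.0, 4.0],
    lat_range=(35.0, 40.0),
    lon_range=(-77.0, -72.0),
)
```

Only the two pixels inside the box survive, which is where the reduction in delivery and analysis time comes from.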
  • 3. HEW
      • HDF-EOS Web-based Subsetter
      • Prototype software designed to be dataset-independent (HDF-EOS)
      • Front-end/GUI
        • Uses HTML forms and JavaScript
        • Optional
      • Back-end
        • Needs subset criteria file and HDF-EOS data
        • Performs subsetting as a “batch” job
      • http://subset.itsc.uah.edu/hew2k
  • 4.
  • 5. Subset Criteria File
      • File(s) to subset (Req’d)
      • Parameters/channels (Req’d)
      • E-mail address (Req’d)
      • Bounding box (Opt.)
        • Latitude/Longitude bounds
        • Row/Column bounds (grids only)
      • Time range (Opt.)
      • Subsampling stride (Req’d)
      • Non-geolocated objects (also_include) (Opt.)
      • Output file prefix (Opt.)
      • .met file (Opt.)
      • Sub-scan subsetting (swaths only) (Opt.)
  • 6. Example Subset Criteria File
      GROUP = SUBSET
        PARENT_FILE = ("/AQUA/AMSR/AE_L2A.hdfeos")
        LATITUDE_RANGE = (35.000000, 40.000000)
        LONGITUDE_RANGE = (-77.000000, -72.000000)
        EMAIL = "user@company.com"
        OUTPUT_PREFIX = "NC_coast"
        MET_FILE = "YES"
        GROUP = SPOG
          NAME = "swath_1"
          TYPE = "SWATH"
          PARAMETERS = ("89.0V_Res.1_TB", "89.0V_Res.2_TB")
          SUBSAMPLING = ("GeoTrack", 1, "GeoXtrack", 1)
        END_GROUP = SPOG
      END_GROUP = SUBSET
      END
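A criteria file like the example on this slide could be generated programmatically. The Python sketch below is one hedged way to do so: the keywords are copied from the slide, but the function `build_criteria`, its arguments, and its required-field check (based on the Req'd/Opt. markings on the Subset Criteria File slide) are hypothetical and are not part of HEW.

```python
from typing import List, Optional, Sequence, Tuple


def _quoted_list(items: Sequence[str]) -> str:
    """Format strings as an ODL-style parenthesized list of quoted values."""
    return "(" + ", ".join(f'"{item}"' for item in items) + ")"


def build_criteria(
    parent_files: Sequence[str],
    email: str,
    object_name: str,
    object_type: str,
    parameters: Sequence[str],
    subsampling: Sequence[Tuple[str, int]],
    lat_range: Optional[Tuple[float, float]] = None,
    lon_range: Optional[Tuple[float, float]] = None,
    output_prefix: Optional[str] = None,
    met_file: bool = False,
) -> str:
    """Hypothetical helper: assemble criteria-file text in the slide's layout."""
    # The slide marks files, parameters, e-mail, and stride as required.
    if not (parent_files and email and parameters and subsampling):
        raise ValueError("files, e-mail, parameters and subsampling are required")

    lines: List[str] = ["GROUP = SUBSET", f"PARENT_FILE = {_quoted_list(parent_files)}"]
    if lat_range is not None:
        lines.append(f"LATITUDE_RANGE = ({lat_range[0]:.6f}, {lat_range[1]:.6f})")
    if lon_range is not None:
        lines.append(f"LONGITUDE_RANGE = ({lon_range[0]:.6f}, {lon_range[1]:.6f})")
    lines.append(f'EMAIL = "{email}"')
    if output_prefix is not None:
        lines.append(f'OUTPUT_PREFIX = "{output_prefix}"')
    if met_file:
        lines.append('MET_FILE = "YES"')

    # One group per swath/grid object, with its parameters and strides.
    stride_text = ", ".join(f'"{dim}", {step}' for dim, step in subsampling)
    lines += [
        "GROUP = SPOG",
        f'NAME = "{object_name}"',
        f'TYPE = "{object_type}"',
        f"PARAMETERS = {_quoted_list(parameters)}",
        f"SUBSAMPLING = ({stride_text})",
        "END_GROUP = SPOG",
        "END_GROUP = SUBSET",
        "END",
    ]
    return "\n".join(lines)


criteria_text = build_criteria(
    parent_files=["/AQUA/AMSR/AE_L2A.hdfeos"],
    email="user@company.com",
    object_name="swath_1",
    object_type="SWATH",
    parameters=["89.0V_Res.1_TB", "89.0V_Res.2_TB"],
    subsampling=[("GeoTrack", 1), ("GeoXtrack", 1)],
    lat_range=(35.0, 40.0),
    lon_range=(-77.0, -72.0),
    output_prefix="NC_coast",
    met_file=True,
)
```

Called with the slide's values, the sketch reproduces the example file line for line (without indentation); leaving out a required field raises `ValueError` before any text is produced.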
  • 7. HEW Back-end
      • Uses HDF-EOS (and HDF) library
      • Instructions via a subset criteria file (ODL)
      • Handles multiple similar files
      • Handles Swath and/or Grid objects
      • Unix (SGI & Sun) executables available
      • Subsetted output files contain:
        • StructMetadata (HDF-EOS)
        • ArchiveMetadata*
        • ProductMetadata (added by HEW ODL file)
        • CoreMetadata* (w/ modified bounding box & time info)
          • optionally placed in .met file
      * if present in parent file
  • 8. HEW Subsettable data
      • EOS datasets
        • Terra: MODIS, MOPITT, ASTER
        • Aqua: AMSR-E
      • Others
        • TRMM TMI
        • NOAA-15 AMSU-A
        • Any other HDF-EOS2 (HDF4) data written with HDF-EOS library subsetting calls in mind
  • 9. HEW integration with ECS
      [Diagram: seven numbered steps linking the End user, the EDG System, ECS, and the Subsetting System (Subsetter). Labeled flows: EDG order submission (HTML); data order and reply; subset ODL and reply; input data; output data; output data (reingested).]
  • 10. ECS integration plans
      • UAH/ITSC-written interface software
      • 6a.05 to be released in March
      • NSIDC, GDAAC, EDC
      • EDG v3.4 will have subsetting options
      • Enhancements for DAACs
  • 11. Subsetting web-site
      • http://www.subset.org
      • Hope to create “portal” for everyone involved in subsetting
        • Advertising
        • Forums
        • Data
        • Software
        • Glossary
        • Tutorials
        • Links to specialized subsetters
  • 12. Other HDF-EOS Tools
      • SPOT – Subsettability Checker
      • eospeek – HDF-EOS file display
      • hdfpeek – HDF file display
      • HDF-EOS user’s manual (in work)
  • 13. Subsetting Plans
      • Complete ECS Integration
        • Maintain/Improve as needed for DAACs
        • Front-end polar projection coverage map (NSIDC)
      • Certify software with new datasets (Aqua, Aura,…)
      • Incorporate ESML usage
      • Provide support for HDF-EOS5
      • Provide additional specialized subsetting applications for instrument teams and others
  • 14. Earth Science Markup Language
      “Define Once, Use Anywhere”
      Information Technology and Systems Center, University of Alabama in Huntsville
      http://esml.itsc.uah.edu
      Contact Info: Rahul Ramachandran, rramachandran@itsc.uah.edu
      Research Effort Supported by: Karen Moe, Earth Science Technology Office, NASA
  • 15. Data Characteristics
      • Different data formats
        • BUFR (DoD, WMO)
        • CDF, NetCDF
        • GRIB (WMO)
        • HDF, HDF-EOS (NASA)
        • Free formats (Binary, ASCII)
        • GrADS, McIDAS, Phoenix, URF, etc.
      • Different states of processing
        • raw, calibrated, derived, modeled or interpreted
      • Data/application interoperability problem
        • Most scientists are not programmers
        • Writing data decoders takes time and effort!
  • 16. Data/Application Interoperability Problem
      [Diagram: data in Format 1, Format 2, and Format 3 reach the Application through Reader 1, Reader 2, and a Format Converter.]
      • Specialized code for every format
        • Difficult to assimilate new data types
      • Enforce a Standard Data Format
        • Not practical for legacy datasets
  • 17. What is ESML?
      • Specialized markup language for Earth Science metadata based on XML
      • Machine-readable and -interpretable representation of the structure and content of any data file, regardless of data format
      • ESML is NOT a new data format
      • External metadata files that can be generated by either data producer or data consumer (at collection, data set, and/or granule level)
      • ESML will provide the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF, geoTIFF, …) without the cost of data conversion
      • ESML consists of three types of metadata:
        • Syntactic
        • Semantic
        • Content
  • 18. Three types of metadata
      • Syntactic
        - Structural information, bits, word-length, endianness, sequence
      • Semantic
        - Meaning of the data, units, frame of reference
      • Content
        - “Typical” metadata, general information
          • Producer contact info, version
        - Searchable information, keywords, etc.
  • 19. WMO Sounding data
      4  10000    94  99999  99999  99999  99999
      4   9250   757  99999  99999  99999  99999
      5   8890  1102    142    -28  99999  99999
      5   8830  1159    150    -20  99999  99999
      6   8765  1219  99999  99999    310     26
      4   8500  1468    130    -40    330     36
      6   8136  1828  99999  99999    320     46
      6   7840  2133  99999  99999    315     62
      6   7554  2438  99999  99999    310     67
      6   7279  2743  99999  99999    295     62
      4   7000  3065     16   -124    275     67
      5   6910  3169     20   -170  99999  99999
      6   6753  3352  99999  99999    270     93
      6   6499  3657  99999  99999    270    118
      5   6130  4121    -39   -199  99999  99999
      6   6017  4267  99999  99999    255     93
      5   5840  4500    -73   -193  99999  99999
      5   5630  4784    -87   -297  99999  99999
      6   5563  4876  99999  99999    240     87
      6   5348  5181  99999  99999    230    103
      4   5000  5700   -161   -241    235    139
      6   4940  5791  99999  99999    235    139
      5   4890  5867   -173   -243  99999  99999
      6   4740  6096  99999  99999    235    129
      4   4000  7340   -287   -367    245    108
  • 20. Ex: ESML for WMO Sounding Data
      <a:ESML xmlns:a="ESML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="ESML R:SchemaESML.xsd">
        <SyntacticMetaData>
          <Ascii>
            <AsciiStructure name="DataTable" geoInfo="NoGeoInfo" instances="1">
              <Array occurs="*">
                <Field format="%d" name="number"> <Attribute/> </Field>
                <Field format="%d" name="PRESSURE"> <Data unit="mb" equation="X/10" FillValue="99999"/> </Field>
                <Field format="%d" name="HEIGHT"> <Data unit="m"/> </Field>
                <Field format="%d" name="TEMPERATURE"> <Data unit="K" equation="X/10"/> </Field>
                <Field format="%d" name="DEWPOINT"> <Data unit="K" equation="X/10"/> </Field>
                <Field format="%d" name="WIND DIRECTION"> <Data unit="DDD"/> </Field>
                <Field format="%d" name="WIND SPEED"> <Data unit="m/s"/> </Field>
              </Array>
            </AsciiStructure>
          </Ascii>
        </SyntacticMetaData>
      </a:ESML>
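To make the idea of an external, machine-readable description concrete, the Python sketch below reads a trimmed copy of the ESML example with the standard-library XML parser and uses it to decode one sounding row. This is an illustration only and not the ESML Library: `load_fields` and `decode_row` are hypothetical names, only equations of the form `X/<number>` are handled, and treating 99999 as the fill value for every field (the example declares it only for PRESSURE) is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

# Trimmed copy of the slide's ESML description (schema attributes omitted).
ESML_SNIPPET = """
<a:ESML xmlns:a="ESML">
  <SyntacticMetaData><Ascii>
    <AsciiStructure name="DataTable" geoInfo="NoGeoInfo" instances="1">
      <Array occurs="*">
        <Field format="%d" name="number"><Attribute/></Field>
        <Field format="%d" name="PRESSURE"><Data unit="mb" equation="X/10" FillValue="99999"/></Field>
        <Field format="%d" name="HEIGHT"><Data unit="m"/></Field>
        <Field format="%d" name="TEMPERATURE"><Data unit="K" equation="X/10"/></Field>
        <Field format="%d" name="DEWPOINT"><Data unit="K" equation="X/10"/></Field>
        <Field format="%d" name="WIND DIRECTION"><Data unit="DDD"/></Field>
        <Field format="%d" name="WIND SPEED"><Data unit="m/s"/></Field>
      </Array>
    </AsciiStructure>
  </Ascii></SyntacticMetaData>
</a:ESML>
"""

Value = Optional[Union[int, float]]


def load_fields(esml_text: str) -> List[Dict[str, Optional[str]]]:
    """Collect name, unit, equation and fill value for each <Field>, in order."""
    root = ET.fromstring(esml_text.strip())
    fields = []
    for field in root.iter("Field"):
        data = field.find("Data")
        attrs = data.attrib if data is not None else {}
        fields.append({
            "name": field.get("name"),
            "unit": attrs.get("unit"),
            "equation": attrs.get("equation"),
            "fill": attrs.get("FillValue"),
        })
    return fields


def decode_row(line: str, fields, default_fill: str = "99999") -> Dict[str, Value]:
    """Decode one whitespace-separated row; fill values become None."""
    record: Dict[str, Value] = {}
    for token, spec in zip(line.split(), fields):
        # Assumption: fields without a declared FillValue share the default.
        if token == (spec["fill"] or default_fill):
            record[spec["name"]] = None
            continue
        value: Union[int, float] = int(token)
        equation = spec["equation"]
        if equation is not None:
            if not equation.startswith("X/"):
                raise ValueError(f"unsupported equation: {equation}")
            value = value / float(equation[2:])  # e.g. "X/10" scales by 1/10
        record[spec["name"]] = value
    return record


fields = load_fields(ESML_SNIPPET)
level_700 = decode_row("4 7000 3065 16 -124 275 67", fields)
surface = decode_row("4 10000 94 99999 99999 99999 99999", fields)
```

The reader itself knows nothing about soundings; swapping in a different description is all that would be needed to decode a differently laid-out ASCII table, which is the "Define Once, Use Anywhere" point of the slides.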
  • 21. ESML Schema/Library
      • ESML Schema Version 0.5
        • Supports ASCII, Binary and HDF-EOS
        • W3C compliant schema
        • Extensible, allowing new formats or modifications
      • ESML Library Version 0.5
        • C++ version
        • Windows 95/98/NT/2000 version
        • Porting to LINUX version
        • Changed from Oracle XML parser to Apache XML parser
  • 22. Future Plans
      • Addition of new data formats to both Schema and Library
        • GRIB
        • McIDAS
        • BUFR, HDF4/5 and others
      • Versions of the Library
        • C/C++ version – UNIX
        • C/C++ version – Mac
        • Java version
  • 23. ESML Data Browser
      • Hybrid product (Java Client/C++ Library backend/JNI Interface)
      • Prototype version is the extension of the ESML Demo Tool
      • Features
        • Browse and view data values using an ESML file
        • Browse the metadata for each data field
      • Future Features:
        • Allow format conversion with automatic generation of ESML metadata file
        • Allow selection of multiple fields
        • Additional functionality such as Subsetting
        • Browse data images
  • 24. ESML Editor
      • 100% Java version prototype
      • Unable to find a COTS editor
      • Utilizes Expert System principles to give users correct options
      • Hides XML tags from the users
      • Future Features:
        • Allow text editing of the XML tags also
        • Incorporate feedback from users
  • 25. ESML Web Page
      • URL: esml.itsc.uah.edu
      • Post latest products, news, presentations, papers
      • Schema and related documents available to all
      • Beta version of the Library available on limited basis
      • Beta versions of ESML Editor and ESML Data Browser will be made available soon
  • 26. Other UAH/ITSC work
      • AMSR-E
      • Passive Microwave (PM)-ESIP
      • ADaM – Algorithm Development and Mining
      • EVE – An EnVironmEnt for On-board Processing
  • 27. Contact UAH/ITSC
      • HEW (HDF-EOS Web-based Subsetter): http://subset.itsc.uah.edu/hew2k
      • ESML (Earth Science Markup Language): http://esml.itsc.uah.edu
      • General Purpose Subsetting (ADaM): http://datamining.itsc.uah.edu
      • On-Demand Subsetting (Passive Microwave – ESIP): http://pm-esip.msfc.nasa.gov
      • SSM/I Coarse-grain Subsetting: http://ghrc.msfc.nasa.gov/ssmi/ssmi_subset.html