This presentation was given at the 2019 GlobusWorld Conference in Chicago, IL by Giri Prakash from the ARM Data Center at Oak Ridge National Laboratory.
Recent Upgrades to ARM Data Transfer and Delivery Using Globus
1. May 6, 2019 1
Improving Data Transfer and Delivery using Globus
Recent Upgrades to the Atmospheric Radiation Measurement (ARM) Facility Data Center Architecture
GIRI PRAKASH, ZACH PRICE, JOSEPH OLATT, AND JITU KUMAR
ARM Data Center, Oak Ridge National Laboratory
Globus World, May 01, 2019
palanisamyg@ornl.gov
2. ARM’s Vision
To provide a detailed & accurate description of the Earth atmosphere in diverse climate regimes to resolve the uncertainties in climate and Earth system models toward the development of sustainable solutions for the Nation’s energy & environmental challenges.
4. ARM Data – Disaster Recovery
Offsite Data backup
ARM data files copied into the HPSS system at ORNL are also copied to the HPSS system at ANL. The globus-url-copy program and the ESnet network between the two labs are used for this purpose.
■ Date copying to ANL started: 03-26-2018
■ Total size transferred: 188.03 TB
■ Total number of files transferred: 3,938,465
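As a rough illustration of the offsite copy described above, the sketch below assembles one globus-url-copy command line in Python. The slide does not show the actual invocation, so the helper name `build_offsite_copy_command`, the example.org URLs, and the choice of four parallel streams are all assumptions made for illustration; only the globus-url-copy flags `-vb`, `-cd`, and `-p` are standard options of that tool.

```python
import shlex


def build_offsite_copy_command(source_url, dest_url, parallel_streams=4):
    """Return the argv list for one globus-url-copy invocation (hypothetical helper)."""
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    return [
        "globus-url-copy",
        "-vb",                        # report transfer performance while copying
        "-cd",                        # create destination directories if missing
        "-p", str(parallel_streams),  # number of parallel data streams
        source_url,
        dest_url,
    ]


# Placeholder endpoints, not the real ORNL or ANL HPSS hosts.
source = "gsiftp://source.example.org/archive/datastream/file.nc"
dest = "gsiftp://backup.example.org/archive/datastream/file.nc"

cmd = build_offsite_copy_command(source, dest)
print(shlex.join(cmd))  # shell-quoted command line, ready for review or logging
```

Building the argument list separately from running it keeps the command easy to log and inspect before it is handed to something like `subprocess.run`.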
7. Data Pipeline and Software Architecture
[Diagram: data pipeline] Data Processing; Storage & Data Model; Querying; Analytics; Scientific Users
[Diagram: software architecture] Frontend; Analytic Server; Backend
Components shown: Interface, Visualization, Analytics, Output, Spark, ARM HPC Computing Clusters, JupyterLab, Relational Database, NoSQL Database
• Supports fast analysis of voluminous data
• Hides architectural complexities
• Stage data in HPC
• Metadata
• Order History
• Data from multiple instruments
Dr. Bhargavi Krishna, Yuping Lu, and Dr. Jitu Kumar
8. Data Retrieval, Packaging, and Delivery
§ Merging
§ DQR filtering
§ Conversion
[Diagram labels] Retrieval; Future capability; Datastreams; HPSS; Online copy; Discovery UI & Web services; NetCDF data extractions; Data staging; order; Live Data WS
Link to data access
Data quality
Access to plots
DOI based citation guidance
Publication request
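The slide names DQR filtering as a packaging step but does not describe how it works. The sketch below assumes one plausible reading: files whose time coverage overlaps an interval flagged by a Data Quality Report are left out of the delivered order. The `Interval` type, the `filter_by_dqr` function, and the file names are hypothetical and are not taken from the ARM Data Center software.

```python
from datetime import datetime
from typing import NamedTuple


class Interval(NamedTuple):
    start: datetime
    end: datetime


def overlaps(a, b):
    """True if two half-open intervals [start, end) share any time."""
    return a.start < b.end and b.start < a.end


def filter_by_dqr(files, flagged):
    """Keep only files whose coverage avoids every flagged interval.

    files:   dict mapping file name -> Interval covered by that file
    flagged: list of Intervals marked by Data Quality Reports (assumed input)
    """
    return [
        name
        for name, span in files.items()
        if not any(overlaps(span, bad) for bad in flagged)
    ]


# Hypothetical order: two daily files and one flagged six-hour window.
files = {
    "day1.nc": Interval(datetime(2019, 1, 1), datetime(2019, 1, 2)),
    "day2.nc": Interval(datetime(2019, 1, 2), datetime(2019, 1, 3)),
}
flagged = [Interval(datetime(2019, 1, 2, 6), datetime(2019, 1, 2, 12))]

print(filter_by_dqr(files, flagged))  # ['day1.nc']
```

Dropping whole files is the coarsest possible policy; a real implementation might instead mask only the flagged samples inside each file.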
10. Data Discovery for LASSO
§ Based on a big data analysis platform (NoSQL)
§ ARM HPC Clusters for data processing
§ Provides an interactive web interface for users to find simulations of interest through examination of the LES performance relative to select ARM observations
§ Allows users to visualize LASSO data bundle diagnostics and skill scores on the fly using plots and tables
§ Globus as a delivery option
[Diagram labels] Cassandra; D3 & NodeJS; Spark