SlideShare ist ein Scribd-Unternehmen logo
1 von 32
Adaptive Transfer Adjustment in Efficient Bulk
Data Transfer Management for Climate
Datasets
Alex Sim, Mehmet Balman, Dean Williams,
Arie Shoshani, Vijaya Natarajan
Lawrence Berkeley National Laboratory
Lawrence Livermore National Laboratory
The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems
PDCS2010 - Nov 9th, 2010
Earth System Grid
ESG (Earth System Grid)
Supports the infrastructure for climate research
Provide technology to access, distributed, transport, catalog climate
simulation data
Production since 2004
ESG-I (1999-2001)
ESG-II (2002-2006)
ESG-CET(2006-present)
ANL, LANL, LBNL, LLNL,
NCAR, ORNL, NERSC, …
NCAR CCSM ESG portal
237 TB of data at four locations: (NCAR, LBNL, ORNL, LANL)
965,551 files
Includes the past 7 years of joint DOE/NSF climate modeling experiments
LLNL CMIP-3 (IPCC AR4) ESG portal
35 TB of data at one location
83,337 files
model data from 13 countries
Generated by a modeling campaign coordinated by the
Intergovernmental Panel on Climate Change (IPCC)
Over 565 scientific peer-review publications
ESG (Earth System Grid) / Climate Simulation Data
ESG (Earth System Grid) / Climate Simulation Data
ESG web portals distributed worldwide
Over 2,700 sites
120 countries
25,000 users
Over 1 PB downloaded
Climate Data : ever-increasing sizes
Early 1990’s (e.g., AMIP1, PMIP, CMIP1)
modest collection of monthly mean 2D files: ~1 GB
Late 1990’s (e.g., AMIP2)
large collection of monthly mean and 6-hourly 2D and 3D fields:
~500 GB
2000’s (e.g., IPCC/CMIP3)
fairly comprehensive output from both ocean and atmospheric
components; monthly, daily, and 3 hourly: ~35 TB
2011:
The IPCC 5th Assessment Report (AR5) in 2011: expected 5 to 15 PB
The Climate Science Computational End Station (CCES) project at
ORNL: expected around 3 PB
The North American Regional Climate Change Assessment Program
(NARCCAP): expected around 1 PB
The Cloud Feedback Model Intercomparison Project (CFMIP) archives:
expected to be .3 PB
ESG (Earth System Grid) / Climate Simulation Data
Results from the Parallel Climate Model (PCM)
depicting wind vectors, surface pressure, sea
surface temperature, and sea ice concentration.
Prepared from data published in the ESG using the
FERRET analysis tool by Gary Strand, NCAR.
Massive data collections:
● shared by thousands of
researchers
● distributed among many data
nodes around the world
Replication of published core
collection (for scalability and
availability)
Bulk Data Movement in ESG
● Move terabytes to petabytes (many thousands of files)
● Extreme variance in file sizes
● Reliability and Robustness
● Asynchronous long-lasting operation
● Recovery from transient failures and automatic restart
● Support for checksum verification
● On-demand transfer request status
● Estimation of request completion time
Replication Use-case
Compute Nodes
Compute Nodes
Data Flow
Data Nodes
Gateways
Compute Nodes
Clients
Data Nodes
Data Node
Japan Gateway
Germany Gateway
Data Node
Data Nodes
Data Node
Data Node
Data Nodes Data Nodes
Data Nodes
Australia Gateway
NCAR Gateway
LLNL Gateway
ORNL Gateway
Canada Gateway
LBNL Gateway
UK Gateway
Compute Nodes
Compute Nodes
WAN
Distributed File System
. . .
: : : : : :
BeStMan . . .
Data Node
Load Balancing Module
Parallelism / Concurrency
Module
File Replication Module
Transfer Servers
. . .
. . .
. . .
. . .
. . .
Hot File Replications
Transfer Servers Transfer Servers
Data Flow
Faster Data Transfers
End-to-end bulk data transfer (latency wall)
 TCP based solutions
 Fast TCP, Scalable TCP etc
 UDP based solutions
 RBUDP, UDT etc
 Most of these solutions require kernel level changes
 Not preferred by most domain scientists
Application Level Tuning
 Take an application-level transfer protocol (i.e.
GridFTP) and tune-up for better performance:
 Using Multiple (Parallel) streams
 Tuning Buffer size
(efficient utilization of available network capacity)
Level of Parallelism in End-to-end Data Transfer
 number of parallel data streams connected to a data transfer
service for increasing the utilization of network bandwidth
 number of concurrent data transfer operations that are
initiated at the same time for better utilization of system
resources.
Parallel TCP Streams
 Instead of a single connection at a time, multiple TCP
streams are opened to a single data transfer service in
the destination host.
 We gain larger bandwidth in TCP especially in a
network with less packet loss rate; parallel connections
better utilize the TCP buffer available to the data
transfer, such that N connections might be N times
faster than a single connection
 Multiple TCP streams puts extra system overhead
Parallel TCP Streams
Bulk Data Mover (BDM)
● Monitoring and statistics collection
● Support for multiple protocols (GridFTP, HTTP)
● Load balancing between multiple servers
● Data channel connection caching, pipelining
● Parallel TCP stream, concurrent data transfer operations
● Multi-threaded transfer queue management
● Adaptability to the available bandwidth
Bulk Data Mover (BDM)
DBFileFile File
…
FileFileFileFile
File File File
…
WAN
Network Connection and
Concurrency Manager
Transfer
Queue
Network
Connections
Transfer
Control Manager
Transfer Queue
Monitor and Manager
Concurrent Connections
…
DB/Storage
Queue Manager
Local Storage
…
File
File
…
File
File
…
File
File
…
Transfer Queue Management
* Plots generated from NetLogger
time time
timetime
The number of concurrent
transfers on the left
column shows consistent
over time in well-managed
transfers shown at the
bottom row, compared to
the ill or non-managed
data connections shown at
the top row. It leads to the
higher overall throughput
performance on the lower-
right column.
Adaptive Transfer Management
Bulk Data Mover (BDM)
● number parallel TCP streams?
● number of concurrent data transfer operations?
● Adaptability to the available bandwidth
Parallel Stream vs Concurrent Transfer
16x232x1 4x8
• Same number of total streams, but different number of concurrent connections
Parameter Estimation
 Can we predict this
behavior?
 Yes, we can come up with
a good estimation for the
parallelism level
 Network statistics
 Extra measurement
 Historical data
Parallel Stream Optimization
single stream, theoretical calculation of
throughput based on MSS, RTT and packet loss
rate:
n streams gains as much as total throughput of
n single stream: (not correct)
A better model: a relation is established
between RTT, p and the number of streams n:
Parameter Estimation
 Might not reflect the best possible current settings
(Dynamic Environment)
 What if network condition changes?
 Requires three sample transfers (curve fitting)
 need to probe the system and make
measurements with external profilers
 Does require a complex model for parameter
optimization
Adaptive Tuning
 Instead of predictive sampling, use data from
actual transfer
 transfer data by chunks (partial transfers) and
also set control parameters on the fly.
 measure throughput for every transferred data
chunk
 gradually increase the number of parallel
streams till it comes to an equilibrium point
Adaptive Tuning
 No need to probe the system and make
measurements with external profilers
 Does not require any complex model for
parameter optimization
 Adapts to changing environment
But, overhead in changing parallelism level
Fast start (exponentially increase the number of
parallel streams)
Adaptive Tuning
 Start with single stream (n=1)
 Measure instant throughput for every data chunk transferred
(fast start)
 Increase the number of parallel streams (n=n*2),
 transfer the data chunk
 measure instant throughput
 If current throughput value is better than previous one,
continue
 Otherwise, set n to the old value and gradually increase
parallelism level (n=n+1)
 If no throughput gain by increasing number of streams
(found the equilibrium point)
 Increase chunk size (delay measurement period)
Dynamic Tuning Algorithm
Dynamic Tuning Algorithm
Dynamic Tuning Algorithm
Parallel Streams (estimate starting point)
Log-log scale
Can we predict
this behavior?
Power-law model
Achievable throughput in percentage over the
number of streams with low/medium/high RTT;
(a) RTT=1ms, (b) RTT=5ms, (c) RTT=10ms, (d) RTT=30ms, (e)
RTT=70ms, (f) RTT=140ms (c=100 (n/c)<1 k =300 max RRT)
Power-law model
T = (n / c) (RTT / k)
80-20 rule-Pareto dist.
0.8 = (n / c) (RTT / k)
n = (e ( k * ln 0.8 / RTT )
) ∙ c
Extend power-law model:
Unlike other models in the literature (trying to find an approximation model for
the multiple streams and throughput relationship), this model only focuses on the
initial behavior of the transfer performance. When RTT is low, the achievable
throughput starts high with the low number of streams and quickly approaches to
the optimal throughput. When RTT is high, more number of streams is needed
for higher achievable throughput.
Get prepared to next-generation networks:
-100Gbps
-RDMA
Future Work
The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems
PDCS2010 - Nov 9th, 2010
Earth System Grid
Bulk Data Mover
http://sdm.lbl.gov/bdm/
Earth System Grid
http://www.earthsystemgrid.org
http://esg-pcmdi.llnl.gov/
Support emails
esg-support@earthsystemgrid.org
srm@lbl.gov

Weitere ähnliche Inhalte

Was ist angesagt?

Download-manuals-surface water-waterlevel-40howtocompiledischargedata
 Download-manuals-surface water-waterlevel-40howtocompiledischargedata Download-manuals-surface water-waterlevel-40howtocompiledischargedata
Download-manuals-surface water-waterlevel-40howtocompiledischargedatahydrologyproject001
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Efficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceEfficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceSpiros Oikonomakis
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883Vivek Sharma
 
HACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a SupercomputerHACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a Supercomputerinside-BigData.com
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 
Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)Jonathan Rinfret
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy Ehsan Sharifi
 
Pcgrid presentation qos p2p grid
Pcgrid presentation   qos p2p gridPcgrid presentation   qos p2p grid
Pcgrid presentation qos p2p gridmarcuswac
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGijccsa
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTDavide Gallitelli
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)Josep Flix
 
NNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run IINNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run IIjuanrojochacon
 
An Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingAn Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingIOSR Journals
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-things829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-thingsnickyn
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced antIJCI JOURNAL
 

Was ist angesagt? (19)

Download-manuals-surface water-waterlevel-40howtocompiledischargedata
 Download-manuals-surface water-waterlevel-40howtocompiledischargedata Download-manuals-surface water-waterlevel-40howtocompiledischargedata
Download-manuals-surface water-waterlevel-40howtocompiledischargedata
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Efficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/ReduceEfficient processing of Rank-aware queries in Map/Reduce
Efficient processing of Rank-aware queries in Map/Reduce
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883Paper2_CSE6331_Vivek_1001053883
Paper2_CSE6331_Vivek_1001053883
 
FDSE2015
FDSE2015FDSE2015
FDSE2015
 
HACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a SupercomputerHACC: Fitting the Universe Inside a Supercomputer
HACC: Fitting the Universe Inside a Supercomputer
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)Rinfret, Jonathan poster(2)
Rinfret, Jonathan poster(2)
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
 
Pcgrid presentation qos p2p grid
Pcgrid presentation   qos p2p gridPcgrid presentation   qos p2p grid
Pcgrid presentation qos p2p grid
 
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTINGLOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
LOAD BALANCING ALGORITHM TO IMPROVE RESPONSE TIME ON CLOUD COMPUTING
 
Mining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDTMining high speed data streams: Hoeffding and VFDT
Mining high speed data streams: Hoeffding and VFDT
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)
 
NNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run IINNPDF3.0: parton distributions for the LHC Run II
NNPDF3.0: parton distributions for the LHC Run II
 
An Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather ForecastingAn Enhanced Support Vector Regression Model for Weather Forecasting
An Enhanced Support Vector Regression Model for Weather Forecasting
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-things829 tdwg-2015-nicolson-kew-strings-to-things
829 tdwg-2015-nicolson-kew-strings-to-things
 
Continental division of load and balanced ant
Continental division of load and balanced antContinental division of load and balanced ant
Continental division of load and balanced ant
 

Ähnlich wie Pdcs2010 balman-presentation

Presentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopPresentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopbalmanme
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networksHarshavardhan Achrekar
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balmanbalmanme
 
Lblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminarLblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminarbalmanme
 
Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011balmanme
 
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...csandit
 
Streaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networksStreaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networksbalmanme
 
Experiences with High-bandwidth Networks
Experiences with High-bandwidth NetworksExperiences with High-bandwidth Networks
Experiences with High-bandwidth Networksbalmanme
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...IJMER
 
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
 Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
Orchestrating Bulk Data Transfers across Geo-Distributed Datacentersnexgentechnology
 
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERSORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERSNexgen Technology
 
Orchestrating bulk data transfers across
Orchestrating bulk data transfers acrossOrchestrating bulk data transfers across
Orchestrating bulk data transfers acrossnexgentech15
 
A location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsA location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsIAEME Publication
 
From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...Globus
 

Ähnlich wie Pdcs2010 balman-presentation (20)

Presentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshopPresentation southernstork 2009-nov-southernworkshop
Presentation southernstork 2009-nov-southernworkshop
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networks
 
Dynamic adaptation balman
Dynamic adaptation balmanDynamic adaptation balman
Dynamic adaptation balman
 
Lblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminarLblc sseminar jun09-2009-jun09-lblcsseminar
Lblc sseminar jun09-2009-jun09-lblcsseminar
 
Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011Balman climate-c sc-ads-2011
Balman climate-c sc-ads-2011
 
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
REDUCING THE MONITORING REGISTER FOR THE DETECTION OF ANOMALIES IN SOFTWARE D...
 
Journal paper 1
Journal paper 1Journal paper 1
Journal paper 1
 
Streaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networksStreaming exa-scale data over 100Gbps networks
Streaming exa-scale data over 100Gbps networks
 
Experiences with High-bandwidth Networks
Experiences with High-bandwidth NetworksExperiences with High-bandwidth Networks
Experiences with High-bandwidth Networks
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
Fast Data Collection with Interference and Life Time in Tree Based Wireless S...
 
Bg4101335337
Bg4101335337Bg4101335337
Bg4101335337
 
cpc-152-2-2003
cpc-152-2-2003cpc-152-2-2003
cpc-152-2-2003
 
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
 Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
Orchestrating Bulk Data Transfers across Geo-Distributed Datacenters
 
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERSORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
ORCHESTRATING BULK DATA TRANSFERS ACROSS GEO-DISTRIBUTED DATACENTERS
 
Orchestrating bulk data transfers across
Orchestrating bulk data transfers acrossOrchestrating bulk data transfers across
Orchestrating bulk data transfers across
 
DIET_BLAST
DIET_BLASTDIET_BLAST
DIET_BLAST
 
A location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applicationsA location based least-cost scheduling for data-intensive applications
A location based least-cost scheduling for data-intensive applications
 
From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...From System Logs to System Optimization with the Power of Data Science and Ma...
From System Logs to System Optimization with the Power of Data Science and Ma...
 
Telegraph Cq English
Telegraph Cq EnglishTelegraph Cq English
Telegraph Cq English
 

Mehr von balmanme

Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...balmanme
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...balmanme
 
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1balmanme
 
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...balmanme
 
Balman stork cw09
Balman stork cw09Balman stork cw09
Balman stork cw09balmanme
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...balmanme
 
Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...balmanme
 
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010balmanme
 
Cybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterCybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterbalmanme
 
Presentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summerPresentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summerbalmanme
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balmanbalmanme
 
Aug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarAug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarbalmanme
 
Analyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation NetworksAnalyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation Networksbalmanme
 
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...balmanme
 
Opening ndm2012 sc12
Opening ndm2012 sc12Opening ndm2012 sc12
Opening ndm2012 sc12balmanme
 
Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation balmanme
 
Welcome ndm11
Welcome ndm11Welcome ndm11
Welcome ndm11balmanme
 
2011 agu-town hall-100g
2011 agu-town hall-100g2011 agu-town hall-100g
2011 agu-town hall-100gbalmanme
 
Rdma presentation-kisti-v2
Rdma presentation-kisti-v2Rdma presentation-kisti-v2
Rdma presentation-kisti-v2balmanme
 
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAAPM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAbalmanme
 

Mehr von balmanme (20)

Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...Network-aware Data Management for Large Scale Distributed Applications, IBM R...
Network-aware Data Management for Large Scale Distributed Applications, IBM R...
 
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...Network-aware Data Management for High Throughput Flows   Akamai, Cambridge, ...
Network-aware Data Management for High Throughput Flows Akamai, Cambridge, ...
 
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
Hpcwire100gnetworktosupportbigscience 130725203822-phpapp01-1
 
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
A 100 gigabit highway for science: researchers take a 'test drive' on ani tes...
 
Balman stork cw09
Balman stork cw09Balman stork cw09
Balman stork cw09
 
Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...Available technologies: algorithm for flexible bandwidth reservations for dat...
Available technologies: algorithm for flexible bandwidth reservations for dat...
 
Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...Berkeley lab team develops flexible reservation algorithm for advance network...
Berkeley lab team develops flexible reservation algorithm for advance network...
 
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
Nersc dtn-perf-100121.test_results-nercmeeting-jan21-2010
 
Cybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-posterCybertools stork-2009-cybertools allhandmeeting-poster
Cybertools stork-2009-cybertools allhandmeeting-poster
 
Presentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summerPresentation summerstudent 2009-aug09-lbl-summer
Presentation summerstudent 2009-aug09-lbl-summer
 
Balman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet BalmanBalman dissertation Copyright @ 2010 Mehmet Balman
Balman dissertation Copyright @ 2010 Mehmet Balman
 
Aug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminarAug17presentation.v2 2009-aug09-lblc sseminar
Aug17presentation.v2 2009-aug09-lblc sseminar
 
Analyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation NetworksAnalyzing Data Movements and Identifying Techniques for Next-generation Networks
Analyzing Data Movements and Identifying Techniques for Next-generation Networks
 
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
MemzNet: Memory-Mapped Zero-copy Network Channel -- Streaming exascala data o...
 
Opening ndm2012 sc12
Opening ndm2012 sc12Opening ndm2012 sc12
Opening ndm2012 sc12
 
Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation Sc10 nov16th-flex res-presentation
Sc10 nov16th-flex res-presentation
 
Welcome ndm11
Welcome ndm11Welcome ndm11
Welcome ndm11
 
2011 agu-town hall-100g
2011 agu-town hall-100g2011 agu-town hall-100g
2011 agu-town hall-100g
 
Rdma presentation-kisti-v2
Rdma presentation-kisti-v2Rdma presentation-kisti-v2
Rdma presentation-kisti-v2
 
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CAAPM project meeting - June 13, 2012 - LBNL, Berkeley, CA
APM project meeting - June 13, 2012 - LBNL, Berkeley, CA
 

Kürzlich hochgeladen

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Pdcs2010 balman-presentation

  • 1. Adaptive Transfer Adjustment in Efficient Bulk Data Transfer Management for Climate Datasets Alex Sim, Mehmet Balman, Dean Williams, Arie Shoshani, Vijaya Natarajan Lawrence Berkeley National Laboratory Lawrence Livermore National Laboratory The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems PDCS2010 - Nov 9th, 2010 Earth System Grid
  • 2. ESG (Earth System Grid) Supports the infrastructure for climate research Provide technology to access, distributed, transport, catalog climate simulation data Production since 2004 ESG-I (1999-2001) ESG-II (2002-2006) ESG-CET(2006-present) ANL, LANL, LBNL, LLNL, NCAR, ORNL, NERSC, …
  • 3. NCAR CCSM ESG portal 237 TB of data at four locations: (NCAR, LBNL, ORNL, LANL) 965,551 files Includes the past 7 years of joint DOE/NSF climate modeling experiments LLNL CMIP-3 (IPCC AR4) ESG portal 35 TB of data at one location 83,337 files model data from 13 countries Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change (IPCC) Over 565 scientific peer-review publications ESG (Earth System Grid) / Climate Simulation Data
  • 4. ESG (Earth System Grid) / Climate Simulation Data ESG web portals distributed worldwide Over 2,700 sites 120 countries 25,000 users Over 1 PB downloaded
  • 5. Climate Data : ever-increasing sizes Early 1990’s (e.g., AMIP1, PMIP, CMIP1) modest collection of monthly mean 2D files: ~1 GB Late 1990’s (e.g., AMIP2) large collection of monthly mean and 6-hourly 2D and 3D fields: ~500 GB 2000’s (e.g., IPCC/CMIP3) fairly comprehensive output from both ocean and atmospheric components; monthly, daily, and 3 hourly: ~35 TB 2011: The IPCC 5th Assessment Report (AR5) in 2011: expected 5 to 15 PB The Climate Science Computational End Station (CCES) project at ORNL: expected around 3 PB The North American Regional Climate Change Assessment Program (NARCCAP): expected around 1 PB The Cloud Feedback Model Intercomparison Project (CFMIP) archives: expected to be .3 PB
  • 6. ESG (Earth System Grid) / Climate Simulation Data Results from the Parallel Climate Model (PCM) depicting wind vectors, surface pressure, sea surface temperature, and sea ice concentration. Prepared from data published in the ESG using the FERRET analysis tool by Gary Strand, NCAR. Massive data collections: ● shared by thousands of researchers ● distributed among many data nodes around the world Replication of published core collection (for scalability and availability)
  • 7. Bulk Data Movement in ESG ● Move terabytes to petabytes (many thousands of files) ● Extreme variance in file sizes ● Reliability and Robustness ● Asynchronous long-lasting operation ● Recovery from transient failures and automatic restart ● Support for checksum verification ● On-demand transfer request status ● Estimation of request completion time
  • 8. Replication Use-case Compute Nodes Compute Nodes Data Flow Data Nodes Gateways Compute Nodes Clients Data Nodes Data Node Japan Gateway Germany Gateway Data Node Data Nodes Data Node Data Node Data Nodes Data Nodes Data Nodes Australia Gateway NCAR Gateway LLNL Gateway ORNL Gateway Canada Gateway LBNL Gateway UK Gateway Compute Nodes Compute Nodes WAN Distributed File System . . . : : : : : : BeStMan . . . Data Node Load Balancing Module Parallelism / Concurrency Module File Replication Module Transfer Servers . . . . . . . . . . . . . . . Hot File Replications Transfer Servers Transfer Servers Data Flow
  • 9. Faster Data Transfers End-to-end bulk data transfer (latency wall)  TCP based solutions  Fast TCP, Scalable TCP etc  UDP based solutions  RBUDP, UDT etc  Most of these solutions require kernel level changes  Not preferred by most domain scientists
  • 10. Application Level Tuning  Take an application-level transfer protocol (i.e. GridFTP) and tune-up for better performance:  Using Multiple (Parallel) streams  Tuning Buffer size (efficient utilization of available network capacity) Level of Parallelism in End-to-end Data Transfer  number of parallel data streams connected to a data transfer service for increasing the utilization of network bandwidth  number of concurrent data transfer operations that are initiated at the same time for better utilization of system resources.
  • 11. Parallel TCP Streams  Instead of a single connection at a time, multiple TCP streams are opened to a single data transfer service in the destination host.  We gain larger bandwidth in TCP especially in a network with less packet loss rate; parallel connections better utilize the TCP buffer available to the data transfer, such that N connections might be N times faster than a single connection  Multiple TCP streams puts extra system overhead
  • 13. Bulk Data Mover (BDM) ● Monitoring and statistics collection ● Support for multiple protocols (GridFTP, HTTP) ● Load balancing between multiple servers ● Data channel connection caching, pipelining ● Parallel TCP stream, concurrent data transfer operations ● Multi-threaded transfer queue management ● Adaptability to the available bandwidth
  • 14. Bulk Data Mover (BDM) DBFileFile File … FileFileFileFile File File File … WAN Network Connection and Concurrency Manager Transfer Queue Network Connections Transfer Control Manager Transfer Queue Monitor and Manager Concurrent Connections … DB/Storage Queue Manager Local Storage … File File … File File … File File …
  • 15. Transfer Queue Management * Plots generated from NetLogger time time timetime The number of concurrent transfers on the left column shows consistent over time in well-managed transfers shown at the bottom row, compared to the ill or non-managed data connections shown at the top row. It leads to the higher overall throughput performance on the lower- right column.
  • 17. Bulk Data Mover (BDM) ● number parallel TCP streams? ● number of concurrent data transfer operations? ● Adaptability to the available bandwidth
  • 18. Parallel Stream vs Concurrent Transfer 16x232x1 4x8 • Same number of total streams, but different number of concurrent connections
  • 19. Parameter Estimation  Can we predict this behavior?  Yes, we can come up with a good estimation for the parallelism level  Network statistics  Extra measurement  Historical data
  • 20. Parallel Stream Optimization single stream, theoretical calculation of throughput based on MSS, RTT and packet loss rate: n streams gains as much as total throughput of n single stream: (not correct) A better model: a relation is established between RTT, p and the number of streams n:
  • 21. Parameter Estimation  Might not reflect the best possible current settings (Dynamic Environment)  What if network condition changes?  Requires three sample transfers (curve fitting)  need to probe the system and make measurements with external profilers  Does require a complex model for parameter optimization
  • 22. Adaptive Tuning  Instead of predictive sampling, use data from actual transfer  transfer data by chunks (partial transfers) and also set control parameters on the fly.  measure throughput for every transferred data chunk  gradually increase the number of parallel streams till it comes to an equilibrium point
  • 23. Adaptive Tuning  No need to probe the system and make measurements with external profilers  Does not require any complex model for parameter optimization  Adapts to changing environment But, overhead in changing parallelism level Fast start (exponentially increase the number of parallel streams)
  • 24. Adaptive Tuning  Start with single stream (n=1)  Measure instant throughput for every data chunk transferred (fast start)  Increase the number of parallel streams (n=n*2),  transfer the data chunk  measure instant throughput  If current throughput value is better than previous one, continue  Otherwise, set n to the old value and gradually increase parallelism level (n=n+1)  If no throughput gain by increasing number of streams (found the equilibrium point)  Increase chunk size (delay measurement period)
  • 28. Parallel Streams (estimate starting point)
  • 29. Log-log scale Can we predict this behavior?
  • 30. Power-law model Achievable throughput in percentage over the number of streams with low/medium/high RTT; (a) RTT=1ms, (b) RTT=5ms, (c) RTT=10ms, (d) RTT=30ms, (e) RTT=70ms, (f) RTT=140ms (c=100 (n/c)<1 k =300 max RRT) Power-law model T = (n / c) (RTT / k) 80-20 rule-Pareto dist. 0.8 = (n / c) (RTT / k) n = (e ( k * ln 0.8 / RTT ) ) ∙ c
  • 31. Extend power-law model: Unlike other models in the literature (trying to find an approximation model for the multiple streams and throughput relationship), this model only focuses on the initial behavior of the transfer performance. When RTT is low, the achievable throughput starts high with the low number of streams and quickly approaches to the optimal throughput. When RTT is high, more number of streams is needed for higher achievable throughput. Get prepared to next-generation networks: -100Gbps -RDMA Future Work
  • 32. The 22nd IASTED International Conference on Parallel and Distributed Computing and Systems PDCS2010 - Nov 9th, 2010 Earth System Grid Bulk Data Mover http://sdm.lbl.gov/bdm/ Earth System Grid http://www.earthsystemgrid.org http://esg-pcmdi.llnl.gov/ Support emails esg-support@earthsystemgrid.org srm@lbl.gov