Experiences with 100Gbps
  Network Applications
Analyzing Data Movements and
  Identifying Techniques for
Next-generation High-bandwidth
           Networks
              Mehmet Balman
       Computational Research Division
     Lawrence Berkeley National Laboratory
Outline
Climate Data as a typical science scenario
 •  100Gbps Climate100 demo


MemzNet: Memory-mapped zero-copy Network
Channel

Advanced Networking Initiative (ANI) Testbed
 •  100Gbps Experiments


Future Research Plans
100Gbps networking has finally arrived!
Applications’ Perspective
Increasing the bandwidth is not sufficient by itself; we need
careful evaluation of high-bandwidth networks from the
applications’ perspective.


                           1Gbps to 10Gbps transition
                           (10 years ago)
                              Applications did not run 10 times
                              faster just because more
                              bandwidth was available
The need for 100Gbps
Modern science is Data driven and Collaborative in
nature
 •  The largest collaborations are most likely to
    depend on distributed architectures.
  •  LHC (distributed architecture) data generation,
     distribution, and analysis.
  •  The volume of data produced by genomic
     sequencers is rising exponentially.
  •  In climate science, researchers must analyze
     observational and simulation data located at facilities
     around the world
ANI
   100Gbps
   Demo
   Late 2011 – early 2012


•  100Gbps demo by ESnet and Internet2
•  Application design issues and host tuning strategies to scale to 100Gbps rates
•  Visualization of remotely located data (Cosmology)
•  Data movement of large datasets with many files (Climate analysis)
ESG (Earth Systems Grid)
•  Over 2,700 sites
•  25,000 users




                      •  IPCC Fifth Assessment
                         Report (AR5) 2PB
                      •  IPCC Fourth Assessment
                         Report (AR4) 35TB
Data distribution for climate science
 How can scientific data movement and analysis between
 geographically disparate supercomputing facilities
 benefit from high-bandwidth networks?




•  Local copies
  •  data files are copied into temporary storage in HPC
     centers for post-processing and further climate analysis.
Climate Data Analysis
Petabytes of data
•  Data is usually in a remote location
•  How can we access it efficiently?
Tuning Network Transfers
Adaptive Tuning




   Worked well with large files!
Climate Data over 100Gbps
•  Data volume in climate applications is increasing
   exponentially.
•  An important challenge in managing ever increasing data
   sizes in climate science is the large variance in file sizes.
  •  Climate simulation data consists of a mix of relatively small and
   large files with irregular file size distribution in each dataset.
       • Many small files
lots-of-small-files problem!
           file-centric tools?
[Diagram: a file-centric tool (FTP) repeats a "request a file / send file"
exchange for every file, while an RPC-style channel exchanges "request
data / send data" messages independent of file boundaries.]

                 •  Concurrent transfers
                 •  Parallel streams
Moving climate files efficiently
MemzNet: memory-mapped zero-copy
               network channel




[Diagram: front-end threads on each side read/write memory blocks, while
network threads stream the blocks between the two memory caches.]
memory caches are logically mapped between client and
server
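
As a rough illustration only (this is not MemzNet's actual code; the field
names and layout are hypothetical), a tagged block of the kind described
above might look like this in C:

    #include <stdint.h>

    #define BLOCK_DATA_SIZE (4UL << 20)   /* 4MB data section, as in the demo */

    /* Hypothetical block layout: a small tag header followed by a
       fixed-size data section. The tag carries the bookkeeping (which
       file, which offset), so no per-file control messages are needed. */
    struct block_tag {
        uint64_t file_id;   /* which file this block belongs to         */
        uint64_t offset;    /* byte offset of the data within that file */
        uint32_t length;    /* valid bytes in the data section          */
        uint32_t flags;     /* e.g. a last-block-of-file marker         */
    };

    struct mem_block {
        struct block_tag tag;
        char data[BLOCK_DATA_SIZE];
    };

Since the tag travels with every block, either side can treat its memory
cache as an array of such blocks and synchronize on tags alone.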
Advantages
•  Decoupling I/O and network operations
     •  front-end (I/O processing)
     •  back-end (networking layer)

•  Not limited by the characteristics of the file
   sizes
•    On-the-fly tar approach, bundling and sending many files
     together


•  Dynamic data channel management
      Can increase/decrease the parallelism level both in the network
     communication and I/O read/write operations, without closing and
     reopening the data channel connection (as is done in regular FTP
     variants).
Advantages
The synchronization of the memory cache is
accomplished based on the tag header.
•  Application processes interact with the memory
   blocks.
•  Enables out-of-order and asynchronous send/receive

MemzNet is not file-centric. Bookkeeping
information is embedded inside each block.
   •  Can increase/decrease the number of parallel
      streams without closing and reopening the
      data channel.
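
A minimal sketch of why tagging enables out-of-order, asynchronous
send/receive (again hypothetical: fd_for_file() is an assumed helper, and
the structs compactly restate the layout sketched above):

    #include <stdint.h>
    #include <unistd.h>

    struct block_tag { uint64_t file_id, offset; uint32_t length, flags; };
    struct mem_block { struct block_tag tag; char data[4UL << 20]; };

    /* Assumed helper mapping a file_id to an open file descriptor. */
    extern int fd_for_file(uint64_t file_id);

    /* Each block is self-describing, so blocks may arrive on any stream,
       in any order: pwrite() places the data at its final offset and no
       per-file sequencing is required. */
    void consume_block(const struct mem_block *b)
    {
        pwrite(fd_for_file(b->tag.file_id), b->data,
               b->tag.length, (off_t)b->tag.offset);
    }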
100Gbps demo environment




 RTT: Seattle – NERSC 16ms
      NERSC – ANL     50ms
      NERSC – ORNL 64ms
100Gbps Demo
•  CMIP3 data (35TB) from the GPFS filesystem at NERSC
 •  Block size 4MB
 •  Each block’s data section was aligned to the system
    page size (see the sketch after this list).
 •  1GB cache both at the client and the server


•  At NERSC, 8 front-end threads on each host for reading
  data files in parallel.
•  At ANL/ORNL, 4 front-end threads for processing
  received data blocks.
•  4 parallel TCP streams (four back-end threads) were
  used for each host-to-host connection.
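
A minimal sketch of the page-aligned cache allocation described in this
list, assuming a POSIX host with posix_memalign(); the constants mirror
the demo configuration (4MB blocks, 1GB cache):

    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK_SIZE (4UL << 20)   /* 4MB blocks         */
    #define CACHE_SIZE (1UL << 30)   /* 1GB cache per side */

    /* Aligning the whole cache to the page size also page-aligns each
       4MB block's data section, since 4MB is a multiple of the page size. */
    void *alloc_block_cache(void)
    {
        void *cache = NULL;
        size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
        if (posix_memalign(&cache, pagesize, CACHE_SIZE) != 0)
            return NULL;              /* 1GB / 4MB = 256 blocks */
        return cache;
    }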
83Gbps
throughput
Framework for the Memory-mapped
             Network Channel




memory caches are logically mapped between client and
server
MemzNet’s Features
Data files are aggregated and divided into simple blocks.
Blocks are tagged and streamed over the network. Each
data block’s tag includes information about the content
inside.

Decouples disk and network I/O operations, so read/write
threads can work independently.

Implements a memory cache management system that is
accessed in blocks. These memory blocks are logically
mapped to the memory cache that resides at the remote site.
Performance at SC11 demo




[Charts: throughput over time for GridFTP (left) vs. MemzNet (right) at
the SC11 demo]

TCP buffer size is set to 50MB
Advanced Networking Initiative




     ESnet's 100Gbps Testbed
Performance results in the 100Gbps ANI
               testbed
MemzNet’s Performance

[Charts: GridFTP vs. MemzNet throughput in the 100Gbps demo and in the
ANI Testbed]

TCP buffer size is set to 50MB
Challenge?
•  High-bandwidth brings new challenges!


  •  We need a substantial amount of processing power and the
    involvement of multiple cores to fill a 40Gbps or 100Gbps network


•  Fine-tuning, in both the network and application layers, is needed
 to take advantage of the higher network capacity.

•  Incremental improvement in current tools?
    •  We cannot expect every application to tune and improve every time
       we change the link technology or speed.
MemzNet
•  MemzNet: Memory-mapped Network Channel
   •  High-performance data movement


MemzNet is an initial effort to put a new layer between the
application and the transport layer.



•  Main goal is to define a network channel so applications
 can directly use it without the burden of managing/tuning
 the network communication.
Framework for the Memory-mapped
             Network Channel




memory caches are logically mapped between client and
server
Advanced Networking Initiative




      ESnet's 100Gbps Testbed
Initial Tests
•  Many TCP sockets oversubscribe the network
  and cause performance degradation.

•  Host system performance could easily be the
  bottleneck.

 •  TCP/UDP buffer tuning (see the sketch below), using
  jumbo frames, and interrupt coalescing.

 •  Multi-core systems: IRQ binding is now
  essential for maximizing host performance.
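
For the buffer-tuning item above, a minimal C sketch of per-socket TCP
buffer tuning with setsockopt(); 50MB mirrors the buffer size used in the
testbed runs, and the call only takes effect up to the kernel limits
(net.core.rmem_max / net.core.wmem_max):

    #include <sys/socket.h>

    int tune_socket_buffers(int sock)
    {
        int bufsize = 50 * 1024 * 1024;   /* 50MB, as in the experiments */
        if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
            return -1;
        return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));
    }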
Many Concurrent Streams




(a) Total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single
NIC with different numbers of concurrent transfers. Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs were used to saturate the 100Gbps
pipe in the ANI Testbed. 10 data movement jobs, each corresponding to a NIC pair, started simultaneously at source and destination. Each peak represents a
different test; 1, 2, 4, 8, 16, 32, 64 concurrent streams per job were initiated for 5min intervals (e.g. when the concurrency level is 4, there are 40 streams in total).
Effects of many concurrent streams




ANI testbed 100Gbps (10x10NICs, three hosts): Interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs - 5min
intervals], TCP buffer size is 50MB
Parallel Streams - 10Gbps




 ANI testbed 10Gbps connection: Interface traffic vs. the number of parallel
 streams [1, 2, 4, 8, 16, 32, 64 streams - 5min intervals], TCP buffer size is
 set to 50MB
Parallel Streams – 40Gbps




ANI testbed 40Gbps (4x10NICs, single host): Interface traffic vs. the
number of parallel streams [1, 2, 4, 8, 16, 32, 64 streams - 5min intervals],
TCP buffer size is set to 50MB
Parallel streams - 40Gbps




 ANI testbed 40Gbps (4x10NICs, single host): Throughput vs. the
 number of parallel streams [1, 2, 4, 8, 16, 32, 64 streams - 5min intervals],
 TCP buffer size is set to 50MB
Host tuning
 With proper tuning, we achieved 98Gbps using
 only 3 sending hosts, 3 receiving hosts, 10 10GE
 NICs, and 10 TCP flows




Image source: “Experiences with 100Gbps network applications” In Proceedings of
the fifth international workshop on Data-Intensive Distributed Computing, 2012
NIC/TCP Tuning
•  We are using Myricom 10G NICs (100Gbps testbed)
  •  Download the latest driver/firmware from the vendor site
     •  Version of the driver in RHEL/CentOS is fairly old
  •  Enable MSI-X
  •  Increase txqueuelen
        /sbin/ifconfig eth2 txqueuelen 10000
  •  Increase interrupt coalescing
        /usr/sbin/ethtool -C eth2 rx-usecs 100
•  TCP Tuning:
    net.core.rmem_max = 67108864
    net.core.wmem_max = 67108864
    net.core.netdev_max_backlog = 250000


From “Experiences with 100Gbps network applications” In Proceedings of the fifth
international workshop on Data-Intensive Distributed Computing, 2012
100Gbps = It’s full of frames!
•  Problem:
  •  Interrupts are very expensive
  •  Even with jumbo frames and driver optimization, there are still
    too many interrupts.

•  Solution:
  •  Turn off Linux irqbalance (chkconfig irqbalance off)
  •  Use /proc/interrupts to get the list of interrupts
  •  Dedicate an entire processor core to each 10G interface
  •  Use /proc/irq/<irq-number>/smp_affinity to bind rx/tx queues to
    a specific core (see the sketch after the citation below).



From “Experiences with 100Gbps network applications” In Proceedings of the fifth
international workshop on Data-Intensive Distributed Computing, 2012
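
A minimal sketch of that IRQ-binding step in C (equivalent to echoing a
mask into the smp_affinity file); it assumes root privileges, an IRQ
number already read from /proc/interrupts, and fewer than 32 cores:

    #include <stdio.h>

    /* Bind one interrupt to one core by writing a hex CPU bitmask to
       /proc/irq/<irq>/smp_affinity. */
    int bind_irq_to_core(int irq, unsigned core)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
        if ((f = fopen(path, "w")) == NULL)
            return -1;
        fprintf(f, "%x", 1u << core);   /* e.g. core 3 -> mask 0x8 */
        return fclose(f);
    }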
Host Tuning Results
[Bar chart: throughput (Gbps, 0-45) with tuning vs. without tuning, for
interrupt coalescing (TCP), interrupt coalescing (UDP), IRQ binding (TCP),
and IRQ binding (UDP)]

Image source: “Experiences with 100Gbps network applications” In Proceedings of
the fifth international workshop on Data-Intensive Distributed Computing, 2012
Initial Tests
•  Many TCP sockets oversubscribe the network
  and cause performance degradation.

•  Host system performance could easily be the
  bottleneck.

 •  TCP/UDP buffer tuning, using jumbo frames, and
    interrupt coalescing.
 •  Multi-core systems: IRQ binding is now essential for
    maximizing host performance.
End-to-end Data Movement
•  Tuning parameters


•  Host performance


•  Multiple streams on the host systems


•  Multiple NICs and multiple cores


•  Effect of the application design
MemzNet’s Architecture for data streaming
MemzNet
•  MemzNet: Memory-mapped Network Channel
   •  High-performance data movement


MemzNet is an initial effort to put a new layer between the
application and the transport layer.



•  Main goal is to define a network channel so applications
 can directly use it without the burden of managing/tuning
 the network communication.
Framework for the Memory-mapped
             Network Channel




memory caches are logically mapped between client and
server
Related Work
•  Luigi Rizzo’s netmap
 •  proposes a new API to send/receive data over
    the network
 •  http://info.iet.unipi.it/~luigi/netmap/


 •  RDMA programming model
    •  MemzNet can be used for RDMA-enabled transfers
     • memory-management middleware


  •  GridFTP popen + reordering?
Future Work

•  Integrate MemzNet with scientific data formats for
 remote data analysis

•  Provide a user library for filtering and subsetting
 of data.
Testbed Results
  http://www.es.net/RandD/100g-testbed/

    •  Proposal process
    •  Testbed Description, etc.

•  http://www.es.net/RandD/100g-testbed/results/

  •  RoCE (RDMA over Converged Ethernet) tests
  •  RDMA implementation (RFPT100)
  •  100Gbps demo paper
  •  Efficient Data Transfer Protocols
  •  MemzNet (Memory-mapped Zero-copy Network Channel)
Acknowledgements
Collaborators:
Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen
Loring, Prabhat, John Shalf, Alex Sim, Arie
Shoshani, and Brian L. Tierney

Special Thanks:
Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos
Chaniotakis, John Christman, Chin Guok, Chris Tracy,
Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck,
Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler,
Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark
Howison, Aaron Thomas, John Dugan, Gopal Vaswani
NDM 2013 Workshop (tentative)

                      The 3rd International Workshop on
                      Network-aware Data Management
                          to be held in conjunction with
                           the IEEE/ACM International
                        Conference for High Performance
                       Computing, Networking, Storage and
                                Analysis (SC’13)

                        http://sdm.lbl.gov/ndm


Editing a book (CRC press): NDM research
frontiers (tentative)
