Global Network Advancement Group Next Generation Network-Integrated Systems

Larry Smarr
Larry SmarrInstitute Director, Calit2 um California Institute for Telecommunications and Information Technology
Global Network Advancement Group
Next Generation Network-Integrated Systems
4th National Research Platform Workshop:
NRP, GRP, GNA-G Partnership
Harvey Newman, Caltech, GNA DIS WG Chair
February 9, 2023
Higgs Boson
Couples to Mass
Ten Year Anniversary of the Higgs Boson Discovery: July 4 2022
Search for Higgs
Boson Pairs
A portrait of the Higgs boson by the CMS experiment ten years after the discovery | Nature
3
10th Anniversary of the Higgs Boson Discovery
and Beyond; 75 Years of Exploration !
Advanced Networks Were Essential
to Higgs Discovery and Every Ph.D
Thesis; They will be Essential
to All Future Discoveries
•NOTE: ~95% of Data Still to be Taken
•Greater Intensity: Upgraded
detectors for more complex events
•To 5X Data Taking Rate in 2029-40
Englert
2013 Nobel Prize
Higgs
48 Year Search; 75 Year Exploration
Theory (1964): 1950s – 1970s;
LHC + Experiments Concept: 1984
Construction: 2001; Operation: 2009
Run1: Higgs Boson Discovery 2012
Run2 and Going Forward:
Precision Measurements and BSM
Exploration: 2013 - 2042

Global Data Flow: A Worldwide System Conceived
by Caltech and UCSD. First Tier2 in 1999-2001
Tier 1
Tier2 Center
Online System
CERN Center
100 PBs of Disk;
Tape Robot
Chicago
Institute
Institute
Institute
Institute
Workstations
~1000
MBytes/sec
10 to 100 Gbps
Physics data
cache
~PByte/sec
N X100 Gbps
Tier2 Center
Tier2 Center
Tier2 Center
10 to 100 Gbps
Tier 0
Tier 3
Tier 4
Tier 2
Experiment
London Paris Taipei
CACR
13 Tier1, 170 Tier2
+ 300 Tier3 Centers:
1k to N X 10k Cores
 Tier0: Real Time Data Processing
Tier1 and Tier2: Data and Simulation
Production Processing
Tier2 and Tier3: Data Analysis
Increased Use as a Cloud Resource (Any Job Anywhere)
Increased “Elastic” Use of Additional HPC and Cloud Resources
A Global Dynamic System: Fertile Ground for Control with ML
Bologna
Core of LHC Networking LHCOPN,
LHCONE, GEANT, ESnet, Internet2, CENIC…
+ NRENs in Europe, Asia, Latin America, Au/NZ; US State Networks
LHCOPN: Simple & Reliable
Tier0+1 Ops GEANT
Internet2
ESnet (with EEX) CENIC and NRP
LHCONE VRF: 170 Tier2s ++
+ NRENs in Europe, Asia, Latin America, Au/NZ; US State Networks
LHCOPN: Simple & Reliable
Tier0+1 Ops GEANT
Internet2
ESnet (with EEX) CENIC and NRP
LHCONE VRF: 170 Tier2s
CENIC and Pacific Wave: A “Regional” Network for California
with Global Reach and Vision East and West
WLCG Transfers Dashboard: 3/2017 to 3/2022
HL LHC “10%” Data Challenge in October 2021 exceeded 100 Gbytes/sec for >1 Day
The “30%” Challenge Scheduled for 2024 will be a real challenge
General WLCG Traffic is again at or above the pre-pandemic level
https://monit-grafana.cern.ch/d/AfdonIvGk/wlcg-transfers?orgId=20
Towards a Computing Model for the HL LHC Era
Challenges: Capacity in the Core and at the Edges
 Programs such as the LHC have experienced rapid exponential traffic growth,
at the level of 40-60% per year
 At the January 2020 LHCONE/LHCOPN meeting at CERN, CMS and ATLAS
expressed the need for Terabit/sec links on major routes,
by the start of the HL-LHC in ~2029
 This is projected to outstrip the affordable capacity
 The needs are further specified in “blueprint” Requirements documents
by US CMS and US ATLAS, submitted to the ESnet Requirements Review in
August 2020, and captured in a comprehensive DOE Requirements Review
report for HEP: https://escholarship.org/uc/item/78j3c9v4
 Three areas of particular capacity-concern by 2028-9 were identified:
(1) Exceeding the capacity across oceans, notably the Atlantic,
served by the Advanced North Atlantic (ANA) network consortium
(2) Tier2 centers at universities requiring 100G 24 X 7 X 365 average
throughput with sustained 400G bursts (a petabyte in a shift), and
(3) Terabit/sec links to labs and HPC centers (and edge systems)
to support multi-petabyte transactions in hours rather than days 8
Mission: To Support global research and education using the
technology, infrastructures and investments of its participants
The GNA-G exists to bring together researchers, National Research and Education
Networks (NRENs), Global eXchange Point (GXP) operators, regionals and other R&E
providers, in developing a common global infrastructure to support the needs.
https://www.gna-g.net/
GREN: Collaboration on the intercontinental transmission layer
Telemetry
Data Intensive
Science
AutoGOLE/SENSE
GREN Mapping
GXP Architectures & Services
Connecting Offshore
Students
Research & Development Operational
Link consortia
APR ANA …
AER AmLight
GNA
Architecture
v2.0
GNA-G Leadership Team
Routing Anomalies Network Automation
GNA-G Executive Liaison
GNA-G participant CEOs etc
Global NREN CEO Forum
Securing the GREN
Strategic Security Focus Group
GNA-G overview
and Vision
The GNA-G Data Intensive Sciences WG
 Principal aims of the GNA-G DIS WG:
(1) To meet the needs and address the challenges
faced by major data intensive science programs
 In a manner consistent and compatible with support for the needs of
individuals and smaller groups in the at large A&R communities
(2) To provide a forum for discussion, a framework and shared tools for short
and longer term developments meeting the program and group needs
 To develop a persistent global testbed as a platform, to foster
ongoing developments among the science and network communities
 While sharing and advancing the (new) concepts, tools & systems needed
 Members of the WG partner in joint deployments and/or developments of
generally useful tools and systems that help operate and manage R&E
networks with limited resources across national and regional boundaries
 A special focus of the group is to address the growing demand for
 Network-integrated workflows
 Comprehensive cross-institution data management
 Automation, and
 Federated infrastructures encompassing networking, compute, and storage
 Working Closely with the AutoGOLE/SENSE WG, NRP and GRP
1
1
Charter: https://www.dropbox.com/s/4my5mjl8xd8a3y9/GNA-G_DataIntensiveSciencesWGCharter.docx?dl=0
The GNA-G Data Intensive Sciences WG
 Mission: Meet the challenges of globally distributed data and computation
faced by the major science programs
 Coordinate provisioning the feasible capacity across a global footprint,
and enable best use of the infrastructure:
 While meeting the needs of the participating groups, large and small
 In a manner Compatible and Consistent with use by the at-large A&R communities
 Members:
Alberto Santoro, Azher Mughal, Bijan Jabbari, Brian Yang, Buseung Cho, Caio Costa, Carlos Antonio
Ruggiero, Carlyn Ann-Lee, Chin Guok, Ciprian Popoviciu, Dale Carder, David Lange, David Wilde, Dima
Mishin, Edoardo Martelli, Eduardo Revoredo, Eli Dart, Eoin Kenney, Frank Wuerthwein, Frederic Loui,
Harvey Newman, Heidi Morgan, Iara Machado, Inder Monga, Jeferson Souza, Jensen Zhang, Jeonghoon
Moon, Jeronimo Bezerra, Jerry Sobieski, Joao Eduardo Ferreira, Joe Mambretti, John Graham, John Hess,
John Macauley, Julio Ibarra, Justas Balcas, Kai Gao, Karl Newell, Kaushik De, Kevin Sale, Lars Fischer,
Liang Zhang, Mahdi Solemani, Maria Del Carmen Misa Moreira, Marcos Schwarz, Mariam Kiran, Matt
Zekauskas, Michael Stanton, Mike Hildreth, Mike Simpson, Ney Lemke, Phil Demar, Raimondas Sirvinskas,
Richard Hughes-Jones, Rogerio Iope, Sergio Novaes, Shawn McKee, Siju Mammen, Susanne Naegele-
Jackson, Tom de Fanti, Tom Hutton, Tom Lehman, William Johnston, Xi Yang, Y. Richard Yang
 Participating Organizations/Projects:
 ESnet, Nordunet, SURFnet, AARNet, AmLight, KISTI, SANReN, GEANT, RNP, CERN, Internet2,
CENIC/Pacific Wave, StarLight, NetherLight, Southern Light, Pacific Research Platform,
FABRIC, RENATER, ATLAS, CMS, VRO, SKAO, OSG, Caltech, UCSD, Yale, FIU, UERJ,
GridUNESP, Fermilab, Michigan, UT Arlington, George Mason, East Carolina, KAUST
 Working Closely with the AutoGOLE/SENSE WG
 Meets Weekly or Bi-weekly; All are welcome to join.
12
Charter: https://www.dropbox.com/s/4my5mjl8xd8a3y9/GNA-G_DataIntensiveSciencesWGCharter.docx?dl=0
AutoGOLE / SENSE Working Group
 Worldwide collaboration of open exchange points and R&E networks
interconnected to deliver network services end-to-end in a fully
automated way. NSI for network connections, SENSE for integration
of End Systems and Domain Science Workflow facing APIs.
 Key Objective:
 The AutoGOLE Infrastructure should be persistent and reliable,
to allow most of the time to be spent on experiments and research.
 Key Work areas:
 Control Plane Monitoring:
Prometheus based, deployment now underway
 Data Plane Verification and Troubleshooting Service:
Study and design group formed
 AutoGOLE related software: Ongoing enhancements to facilitate
deployment and maintenance (Kubernetes, Docker based systems)
 Experiment, Research, Multiple Activity, Use Case support:
Including NOTED, Gradient Graph, P4 Topologies, Named Data
Networking (NDN), Data Transfer Systems integration and testing.
 WG information
https://www.gna-g.net/join-working-group/autogole-sense
14
Towards a Next Generation
Network-Integrated System
For Data Intensive Sciences
The Global Network
Advancement Group
The AutoGOLE/SENSE and
Data Intensive Sciences WGs
Partnering with GRP and NRP
GNA-G AutoGOLE/SENSE WG Global Persistent Testbed
Software Driven Network OSes; Programmability at the Edges and in the Core
P4 + SONIC Programmable Global PersistentTestbed
19 Active GNA-G/RARE P4, SONIC Testbed Sites
• Caltech, Pasadena-US: 4 FreeRtr/P4 + SONIC
• CERN, Geneva-CH: FreeRtr/P4
• FIU, Miami-US: FreeRtr/P4
• GEANT, Amsterdam-NL: FreeRtr/P4
• GEANT, Budapest-HU: FreeRtr/P4
• GEANT, Frankfurt-DE: FreeRtr/P4
• GEANT, Paris-FR: FreeRtr/DPDK
• GEANT, Poznan-PL: FreeRtr/P4
• GEANT, Prague-CZ: FreeRtr/DPDK
• RENATER, Paris-FR: FreeRtr/P4
• SouthernLight (FIU/RedClara/Rednesp/RNP),
São Paulo-BR: FreeRtr/P4
• StarLight, Chicago-US: FreeRtr/P4
• SWITCH, Geneva-CH: FreeRtr/P4
• TCD, Dublin-IE: FreeRtr/P4
• Tennessee Tech: FreeRtr/P4
+ 9 Expected sites (by SC22):
• HEAnet, Dublin-IE: FreeRtr/P4
• JISC, London-UK: FreeRtr/P4
• KAUST, Saudi Arabia: FreeRtr/DPDK
• KISTI, South Korea: SONiC/P4
• RNP, Rio de Janeiro-BR: FreeRtr/P4,
SONiC/P4
• SC22 Caltech Booth, Dallas-US:
FreeRtr/P4
• UCSD, San Diego-US: SONiC/P4
• UFES, Vitória-BR: 2x FreeRtr/P4
• UMd, College Park, Maryland-US:
FreeRtr/P4
GEANT (Now Global) P4 Lab
and Dataplane of P4 Programmable Switches
A new worldwide platform for
 New agile and flexible feature
development
 New use case development
corresponding to research
programs’ requirements
 Multiple research network
overlay slices
A global playing field for
 Next generation network
monitoring tools and systems
 Development of fully automated
network deployment
 New network operations paradigm
development
VISION: Federate and integrate multiple testbeds and toolsets
such as AutoGOLE / SENSE
Frederic Loui,
Casaba Mate,
Marcos Schwarz
et al.
SC15-22: SDN Next Generation
Terabit/sec Ecosystem for Exascale Science
SC16+: Consistent
Operations with
Agile Feedback
Major Science
Flow Classes
Up to High Water
Marks
supercomputing.caltech.edu
Tbps Rings for SC18-22: Caltech, Ciena, Scinet,
StarLight + Many HEP, Network, Vendor Partners
45
SDN-driven flow
steering, load
balancing, site
orchestration
Over Terabit/sec
Global Networks
LHC at SC15: Asynchronous Stageout
(ASO) with Caltech’s SDN Controller
Preview PetaByte
Transfers to/
from Site Edges of
Exascale Facilities
With
100G -1000G DTNs
Global Petascale to Exascale Workflows for Data Intensive
Sciences Accelerated by Next Generation Programmable
Network Architectures and Machine Learning Applications
 A Vast Partnership of Science and Computer Science Teams, R&E Networks
and R&D Projects
 Convened by the GNA-G Data Intensive Sciences WG
 Mission
 Meeting the challenges faced by leading edge data intensive experimental
programs in high energy physics, astrophysics, genomics and other fields
of data intensive science
 Clearing the path to the next round of discoveries
 Demonstrating a wide range of the latest advances in:
 Software defined and Terabit/sec networks
 Intelligent global operations and monitoring systems
 Workflow optimization methodologies with real time analytics
 State of the art long distance data transfer methods and tools,
local and metro optical networks and server designs
 Emerging technologies and concepts in programmable networks
and global-scale distributed system
Network Research Exhibition NRE-19. Abstract:
https://www.dropbox.com/s/qcm41g7f7etjxvy/SC22_NRE_GlobalPetascaleWorkflows_V8072022.docx?dl=0
Global Petascale to Exascale Workflows
Cornerstone system components and concepts (I)
 Integrated operations and orchestrated management of resources:
interworking with and advancing the site (Site-RM) and network resource
managers (Network-RM) developed in the SENSE program.
 An ontological model-driven framework with integration of an analytics engine,
API and workflow orchestrator extending work in the SENSE project, enhanced
by efficient multi-domain resource state abstractions and discovery mechanisms.
 Fine-grained end-to-end monitoring, data collection and analytics, supporting
applications with path selection + load balancing mechanisms optimized
by machine learning.
 The integration of Qualcomm Technology’s GradientGraph (G2) with
5G / OpenALTO / SRv6. These will be used to demonstrate how applications
including XR, auto/V2X, and IoT can benefit from the intelligent routing,
rate limiting, and service placement decisions computed by G2.
 Demonstrating how the integration of G2 with science networks, through the
integration of the LHC Rucio / FTS data management system, AutoGOLE / SENSE
virtual circuit and orchestration services / and OpenALTO monitoring may be used
towards optimizing large-scale data transfers, sharing data among CERN and
scientists at sites across the globe.
Global Petascale to Exascale Workflows
for Data Intensive Sciences
 Advances Embedded and Interoperate within a ‘composable’
architecture of subsystems, components and interfaces,
organized into several areas:
 Visibility: Monitoring and information tracking and management including
IETF ALTO/OpenALTO, BGP-LS, sFlow/NetFlow, Perfsonar, Traceroute,
Qualcomm Gradient Graph congestion information, Kubernetes statistics,
Prometheus, P4/Inband telemetry
 Intelligence: Stateful decisions using composable metrics (policy, priority,
network- and site-state, SLA constraints, responses to ‘events’ at sites and
in the networks, ...), using NetPredict, Hecate, RL-G2, Yale Bilevel
optimization, Coral, Elastiflow/Elastic Stack
 Controllability: SENSE/OpenNSA/AutoGOLE, P4/PINS, segment routing
with SRv6 and/or PolKA, BGP/PCEP
 Network OSes and Tools: GEANT RARE/freeRtr, SONIC, Calico VPP,
Bstruct-Mininet environment, ...
 Orchestration: SENSE, Kubernetes (+k8s namespace), dedicated code and
APIs for interoperation and progressive integration
Caltech, StarLight, NRL and Partners at SC22
Microcosm: Creating the Future of SCinet and of Networks for Science
GNA-G and its Working Groups
Worldwide Partnerships at SC22 and Beyond
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (Cont’d)
 400 Gbps Next Generation Networks
 High-Performance Routing of Science Network Traffic
 Creation of an overlay network with PolKA tunnels forming virtual circuits
 5G/Edge Computing Application Performance Optimization
 Integration of OpenALTO, SRv6 and Qualcomm Technologies’
GradientGraph
 N-DISE: Named Data Networking for Data Intensive Experiments
 Coral: Scalable Data Plane Checking via Distributed, On-Device Verification
 High performance networking with the Bella Link and the Sao Paulo
Backbone
 AmLight Express and Protect (AmLight-EXP)
 Global Research Platform (GRP)
 Software Defined International Open Exchanges (SDXs)
For further Information on NRE-19 Elements and Goals: See the Next Slides
NSF CC* Program: Integration and Innovation
SCIENTIFIC and BROADER IMPACT
▪ Lay groundwork for an NDN-based
data distribution and access system
for data-intensive science fields
▪ Benefit user community through
lowered costs, faster data access
and standardized naming structures
▪ Engage next generation of scientists
in emerging concepts of future
Internet architectures for
data intensive applications
▪ Advance, extend and test the NDN
paradigm to encompass the most
data intensive research applications
of global extent
SOLUTIONS + Deliverables
CHALLENGES
▪ LHC program in HEP is world’s largest
data intensive application: handling
One Exabyte by 2019 at hundreds
of sites
▪ Global data distribution, processing,
access, analysis; large but limited
computing, storage, network
resources
APPROACH
▪ Use Named Data Networking (NDN)
to redesign LHC HEP network;
optimize workflow
N-DISE: NDN for Data Intensive Science Experiments
TEAM
▪ Northeastern (PI: Edmund Yeh),
Caltech (PI: Harvey Newman) UCLA
(PI: Jason Cong), Tennessee Tech
(PI: Susmit Shannigrahi)
▪ In partnership with NIST, other LHC
sites and the NDN project team
▪ Simultaneously optimize caching
(“hot” datasets), forwarding, and
congestion control in both the
network core and site edges
▪ Development of naming scheme and
attributes for fast access and efficient
communication in HEP and other fields
 Feasibility of NDN-based data distribution system for LHC first
demonstrated at SC 18 (Dallas)
 SANDIE demo at SC 19 (Denver) showed improved throughput
and delay performance
 N-DISE demo at SC 21 (done remotely) showed further improvements, ICN
‘22 paper introduced the project to the literature
 SC22
 Jointly optimized caching and forwarding algorithms developed by
Northeastern
 High-speed NDN-DPDK forwarder developed by NIST
 NDN-based filesystem plugin for XRootD developed by Caltech
 FPGA acceleration of NDN-DPDK forwarder by UCLA
 Deploying genomics data lakes using K8s over NDN and name-based
compute placement to K8 clusters by Tennessee Tech
 Goals: (i) rapid and reliable delivery of LHC data over transcontinental
layer-2 demo testbed, (ii) demonstration of Kubernetes integration
for genomics applications
N-DISE
Caltech
Campus
A New Generation Persistent 400G/100G Super-DMZ: CENIC,
Pacific Wave, ESnet, Internet2, Caltech, UCSD, StarLight ++
SEA
100GE
SNVL
SDSC
LA
Pacific Wave
Seattle 1-1
Pacific Wave
Sunnyvale 2-1
NCS/Ciena
Transponders
LAX Agg10
Riser Panel
NCS/Ciena
Transponders
Riser Panel
Caltech IMSS
Waveserver Ai
200G
200G
Caltech IMSS
Waveserver Ai
CENIC LA
818 W 7th
6th Floor
CENIC LA
818 W 7th
10th Floor
Cisco NCS 1K4
Arista 7060 DX4
400GE & 100GE
Internet2
818 W 7th
10th Floor
Starlight
CENIC/
Pacific
Wave
I2 Optical
I2 Packet
To StarLight To SC22 Dallas
400G
400G
ESnet6 Fabric
ESnet/Fabric
818 W 7th
10th Floor
Pending Connections
to CENIC/PacWave
Pacific Wave
LosA 2-1
and Beyond
To Juniper QFX
after SC22
      Global Network Advancement GroupNext Generation Network-Integrated Systems
Next Generation Network-Integrated System
 Top Line Message: In order to meet the needs, we need to transition
to a new dynamic and adaptive software-driven system, which
 Coordinates worldwide networks as a first class resource
along with computing and storage, across multiple domains
 Simultaneously supports the LHC experiments, other major DIS programs
and the larger worldwide academic and research community
 Systems design approach: A global dynamic fabric that flexibly
allocates, balances and conserves the available network resources
Negotiating with site systems that aim to accelerate workflow; Use of ML
Builds on ongoing R&D projects: from regional caches/data lakes to
intelligent control and data planes to ML-based optimization
[E.g. SENSE/AutoGOLE, NOTED, ESNet HT, GEANT/RARE, AmLight, Fabric,
Bridges; NetPredict, DeepRoute, ALTO, PolKA ...]
 A key milestone: integration of SENSE + network services with FTS & Rucio
 We are also leveraging the worldwide move towards a fully programmable
ecosystem of networks and end-systems (P4, PINS; SRv6; PolKA),
plus operations platforms (OSG, NRP; global SENSE and P4 Testbeds)
 The GNA-G and its Working Groups, The LHC experiments together with the
WLCG and the worldwide R&E network community, are the key players
 Together with leaders and teams from: LBNF/DUNE, VRO, SKA
Global Petascale to Exascale Workflows
for Data Intensive Sciences
 Development Trajectory: Parallel developments + mission-driven
progressive interfacing and system-level integration
 Architecture: Data Center Analogue
 Classes of “Work” (work = transfers, or overall workflow),
defined by task parameters and/or priority and policy
 Adjust rate of progress in each class to respond to network
or site state changes, and “events”
 Moderate/balance the rates among the classes to optimize
a multivariate objective function with constraints
 Overarching Concept: Consistent Network Operations:
 Stable load balanced high throughput workflows cross optimally
chosen network paths
 Provided by autonomous site-resident services dynamically
interacting with network-resident services
 Up to preset or fliexible high water marks to accommodate
other traffic
 Responding to (or negotiating with) site demands from the science
programs’ principal data distribution and management systems
31
Towards a Next Generation
Network-Integrated System
For Data Intensive Sciences
The Global Network
Advancement Group
The AutoGOLE/SENSE and
Data Intensive Sciences WGs
Partnering with GRP and NRP
Extra Slides Follow
32
Steps to Arrive at a Fully Functional System by 2028
the Data Challenge Perspective
 Three Types of Challenges
1. Functionality Challenge : Where we establish the functionality we
want in our software stack, and do so incrementally over time
2. Software Scalability Challenge: Where we take the products that
passed the previous challenge, and exercise them at full scale
but not on the final hardware infrastructure
3. End-to-end Systems Challenge: On the actual hardware; can only
be done once the actual hardware systems are in place.
 Targets are 2022-24 for 1 and 2 (or 2024-25 if not all
components are ready earlier); 2026-2027 for 3
 Remark: it’s conceivable, maybe even likely that it takes multiple
attempts to achieve sustained performance at scale with all of the
new software we need, with the functionality we want.
 + Scaling Challenges: Demonstrate capability to fill ~50% full bandwidth
required in the minimal scenario with production-like traffic: Storage to
storage, using third party copy protocols and data management services used
in production: 2021: 10%; 2023-4: 30%; 2025-6: 60%; 2028: 100%
33
F. Wuerthwein, UCSD/SDSC
Rednesp and RNP: Expanding Capacity
Among Latin America, US, Europe and Africa
 Rednesp (formerly ANSP) has 3 links to the
USA, connecting Sao Paulo to Florida.
 Atlantic 100G link: direct São Paulo- Miami
 Pacific 100G link: São Paulo to Santiago
(Chile),
from there to Panama, San Juan and Miami.
 200 Gbps link: São Paulo to Florida through
Fortaleza, on the Monet cable (Angola
cables).
 RNP also a 200 Gbps link using the Monet
cable:
 Total 600 Gbps capacity between Brazil and the
USA.
 Transatlantic Links: The Bella link between São
Paulo and Sines in Portugal, and a link
connecting São Paulo, Angola and South Africa.
 The deployment of the "Backbone SP" has just
started.
Its main goal is to interconnect most
universities in Sao Paulo (state) with 100 gbps
links.
• The first link connects SP4 to UNIFESP (Federal
University at São Paulo). Next, links should
connect UNESP (State University of São Paulo),
USP (University of São Paulo), UNICAMP
BRIDGES: “Binding Research Infrastructures for the Deployment
of Global Experimental Science” Across the Atlantic
 George Mason and East Carolina Universities (NSF IRNC Program)
 High Level View of the Infrastructure and Collaborators:
 Virtualization as an Architectural (systems-level) concept, targeting:
 Deterministic, predictable performance
 Agile, customizable, integrated virtual resource model
 Operates as a “fully virtualized” services environment: Users (and Operations personnel)
can interact with the BRIDGES services through an interactive web portal, or software agents
may interact via a programmatic API to automate resource provisioning and orchestration.
(APONet): Closing the Global Ring East and West
KREONet2: Closing the Global Ring East and West
BRIDGES: “Binding Research Infrastructures for the Deployment
of Global Experimental Science” Across the Atlantic
AER Update - Aug 2022
● Since the AER MoU, KAUST
is coordinating with REN
partners deployment of
sharing spare
capacity
● KAUST is supporting the
following partners by offering
point-to-point circuits for
submarine cable backup
paths:
○ AARnet
○ GÉANT
○ NetherLight
○ NII/SINET
○ SingAREN
● The SC22 NRE Demonstrations
will also be supported by KAUST
closing the ring from Amsterdam
to Singapore and back to the US
○ SC22 NRE
Seattle
Los
Angeles
Chicag
o
SC22 NRE Demos (Caltech)
KAUST and the AsiaPacific Oceana Network
(APONet): Closing the Global Ring East and West
39
Gua
m
SingAR
EN
Hong
Kong
Daejeo
n Los
Angeles
Seattl
e
Chicag
o
Amsterda
m
Londo
n
Hawa
ii
Jedda
h
Toky
o
KAU
ST
Singapo
re
Cenic
KISTI
StarLigh
t
NII/SINE
T
NetherLig
ht
GÉANT
Open
Exchange
SC22 Caltech NRE Demonstrations
* The dotted paths in the diagram are the ones to be used for
the SC22
TransP
AC
KAU
ST
SingARE
N
NII /
SINET
KISTI
Univ. of
Hawaii
KAUST SC22 Demonstrations (NRE-19)
SC22 NREs: AutoGOLE / SENSE and NOTED (Draft)
Bill Johnston 9/23/22
SC21: Global footprint. Multiple 400G Optical Wide Area Links
SC22: Global footprint. Terabit/sec Triangle Starlight – McLean – Dallas; 400G to LA;
4 X 100G to Caltech and UCSD/SDSC; 3 X 400G to the Caltech Booth
Partner NREs at SC22
NRE-001
Edmund
Yeh
eyeh@ece.neu.edu N-DISE: NDN for Data Intensive Science Experiments
NRE-004
Joe
Mambretti
j-mambretti@northwestern.edu
1.2 Tbps Services WAN Services: Architecture,
Technology and Control Systems
NRE-005
Joe
Mambretti
j-mambretti@northwestern.edu
400 Gbps E2E WAN Services: Architecture, Technology
and Control Systems
NRE-007
Edoardo
Martelli
edoardo.martelli@cern.ch LHC Networking And NOTED
NRE-008
Joe
Mambretti
j-mambretti@northwestern.edu
IRNC Software Defined Exchange (SDX) Multi-Services
for Petascale Science
NRE-009 Jim Chen jim-chen@northwestern.edu
High Speed Network with International P4 Experimental
Networks for The Global Research Platform
and Other Research Platforms
NRE-010
Magnos
Martinello
magnos.martinello@ufes.br
Demonstrating PolKA Routing Approach to Support
Traffic Engineering for Data-intensive Science
NRE-011 Qiao Xiang xiangq27@gmail.com
Coral: Fast Data Plane Verification for Large-Scale
Science Networks via Distributed, On-Device Verification
NRE-013
Tom
Lehman
tlehman@es.net
AutoGOLE/SENSE: End-to-End Network Services
and Workflow Integration
NRE-015
Tom
Lehman
tlehman@es.net SENSE and Rucio/FTS/XRootD Interoperation
NRE-016
Marcos
Schwarz
marcos.schwarz@rnp.br
Programmable Networking with P4, GEANT RARE/freeRtr and
SONIC/PINS
The NRE demonstrations hosted at or partnering with the Caltech Booth 2820:
Global Petascale to Exascale Workflows
Cornerstone system components and concepts (II)
 Adapting NDN for data intensive sciences including advanced cache design and
algorithms and parallel code development and methods for fast and efficient access
over a global testbed, leveraging the experience in the NDN for Data Intensive Science
Experiments (N-DISE) project
 Demonstrating a paragon network at many sites equipped with P4 programmable
devices, including Tofino and Tofino2-based switches, Smart NICs and Xilinx FPGA-
based network interfaces providing packet-by- packet inspection, agile flow
management and state tracking, and real-time decisions
● High throughput platform demonstrations in support of workflows for the science
programs. Including reference designs of NVMe server systems to match a 400G
network core, as well as servers with multi-GPUs and programmable smart NICs
with FPGAs.
● Integration of edge-focused extreme telemetry data (from P4 switches, smart edge
devices and end hosts) and end facility/application caching stats and other metrics,
to facilitate automated decision-making processes.
● Development of dynamic regional caches or “data lakes” that treat nearby as
a unified data resource, building on the successful petabyte cache currently in
operation between Caltech and UCSD and in ESNet based on the XRootD federated
access protocol; extension of the cache concept to more remote sites such as
Fermilab, Nebraska and Vanderbilt.
Global Petascale to Exascale Workflows
Cornerstone system components and concepts (III)
● Creation of an overlay network with PolKA tunnels forming virtual circuits,
integrating persistent resources from the GNA-G AutoGOLE/SENSE and
GEANT RARE testbeds, to validate the PolKA source routing mechanisms
using 100G+ transfers of science data.
● Underlay congestion will be detected by tunnel monitoring and signaled
to the overlay so that the traffic is steered away from congested tunnels
to other paths.
● Comparisons between segment routing and PolKA regarding controllability and
performance metrics are also planned. From the application's point
of view.
● PolKA full deployment enables the extreme traffic engineering demands of data-
intensive sciences to be met, by offering a new range of network functionalities
such as: multipath routing, in-network telemetry and
proof-of-transit with path attributes to support higher level stateful traffic
engineering decisions.
● Network traffic prediction and engineering optimizations using the latest
graph neural network and other emerging deep learning methods, developed
by ESnet’s Hecate /DeepRoute project.
 LHC:
 End to end workflows for large scale data distribution and analysis in support of the CMS
experiment’s LHC workflow among Caltech, UCSD, LBL + Fermilab, Nebraska, Vanderbilt,
and GridUNESP (Sao Paulo) including automated flow steering, negotiation and DTN
autoconfiguration; bursting of some of these workflows to the NERSC HPC facility and the
cloud; use of both edge and in-network caches to increase data access and processing
efficiency.
 An ontological model-driven framework with integration of an analytics engine, API and
workflow orchestrator extending work in the SENSE project, enhanced by efficient multi-
domain resource state abstractions and discovery mechanisms.
 SENSE/AutoGOLE:
 The GNA-G SENSE/AutoGOLE WG demonstration will present key technologies, methods
and a system of dynamic Layer 2 and Layer 3 virtual circuit services to meet the challenges
and address the requirements of the largest data intensive science programs, including the
Large Hadron Collider (LHC) the Vera Rubin Observatory and programs challenges and
programs in many other disciplines. The services are designed to support multiple petabyte
transactions across a global footprint, represented by a persistent testbed spanning the US,
Europe, Asia Pacific and Latin American regions.
 Global Ring and KAUST: These demonstrations will also showcase the power of
collaboration in the global research and education network (REN) community. The demo will
have KAUST as a collaborator in the Asia-Pacific Global Ring (AER [*]), closing the global ring
by interconnecting Amsterdam to Singapore, and then onto Los Angeles with support from
partners including SingaREN, JGN-X and APOnet.
 [*] See Fast and stable Network Ring between Asia-Pacific and Europe | AARNet
https://www.aarnet.edu.au/fast-and-stable-network-ring-between-asia-pacific-and-europe/
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (I)
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (II)
 400 Gbps Next Generation Networks
 Science research collaborations continuing to prepare for increased network capacity, e.g.,
transitioning from 100 Gbps paths to 400 Gbps, 800 Gbps and beyond. An international
collaboration led by the StarLight/ Caltech/ UCSD/ NRP/GRP/SCinet consortium, with many
will demonstrate the architecture, services and techniques to enable efficient management of
high volume science data streams within multi-hundred Gbps channels.
 Global Research Platform (GRP):
 An international collaboration will demonstrate services and capabilities of the prototype
Global Research Platform, an international Science DMZ, optimized for large scale world-wide
science projects, especially those based on instruments producing high volumes of data.
 Software Defined Exchanges (SDXs):
 Several international open exchanges that are developing capabilities for Software Defined
Exchanges (SDXs) will showcase the capabilities of these facilities for providing highly
programmable capabilities for international science data transport.
 AmLight Express and Protect (AmLight-EXP)
 In support of the SENSE/AutoGOLE demonstrations and LHC-related use cases will be
shown, in association with high-throughput low latency experiments, and demonstrations of
auto-recovery from network events, using optical spectrum on the Monet submarine cable,
and its 100G ring network that interconnects the research and education communities in the
U.S. and South America.
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (III)
 5G/Edge Computing Application Performance Optimization
 UCSD, the Pacific Research Platform, Caltech, Yale and Qualcomm Technologies will attempt to demonstrate
the first end-to-end integrated traffic optimization framework based on bottleneck structure analysis for 5G
applications. This intended demonstration will focus on showing the integration of Qualcomm technology’s
GradientGraph with the IETF ALTO open standard, to support the optimization of edge computing
applications such as XR, holographic telepresence, and vehicle networks. Holodecks at UCSD and NYU will
be interconnected across the CENIC and NYSERNet regional networks via a transcontinental link.
 High-Performance Routing of Science Network Traffic
 UCSD, the Pacific Research Platform, Caltech, Yale and Qualcomm Technologies will attempt to demonstrate
the first end-to-end integrated traffic optimization framework based on bottleneck structure analysis for 5G
applications. This intended demonstration will focus on showing the integration of Qualcomm technology’s
GradientGraph with the IETF ALTO open standard, to support the optimization of edge computing
applications such as XR, holographic telepresence, and vehicle networks. Holodecks at UCSD and NYU
will be interconnected across the CENIC and NYSERNet regional networks via a transcontinental link.
 Integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph
 We intend to demonstrate the integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph,
showing how applications including XR, Auto/V2X, and holographic telepresence can benefit from the
intelligent routing, rate limiting, and service placement decisions computed by the GradientGraph platform.
We also intend to demonstrate the integration of OpenALTO and GradientGraph with science networks,
Rucio, FTS, AutoGOLE, SENSE and OpenALTO towards optimizing large-scale data transfers from the
Large Hadron Collider at CERN to scientists across the globe.
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (IV)
 SENSE: The Software-defined network for End-to-end Networked Science at Exascale
 The SENSE project is building smart network services to accelerate scientific discovery in the
era of ‘big data’ driven by Exascale, cloud computing, machine learning and AI. The SENSE
SC22 demonstration showcases a comprehensive approach to request and provision end- to-
end network services across domains that combines deployment of infrastructure across
multiple labs/campuses, SC booths and WAN with a focus on usability, performance and
resilience through:
 Priority QoS for SENSE enabled flows
 Multi-point and point-to-point services
 Auto-provisioning of network devices and Data Transfer Nodes (DTNs);
 Intent-based, interactive, real time application interfaces providing intuitive access
to intelligent SDN services for Virtual Organization (VO) services and managers;
 Policy-guided end-to-end orchestration of network resources, coordinated with the
science programs' systems, to enable real time orchestration of computing and
storage resources.
 Real time network measurement, analytics and feedback to provide the foundation
for full lifecycle status, problem resolution, resilience and coordination between
the SENSE intelligent network services, and the science programs' system services.
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (V)
 N-DISE: Named Data Networking for Data Intensive Science Experiments
 The NDN for Data Intensive Science Experiments (N-DISE) project aims to accelerate the pace of
breakthroughs and innovations in data-intensive science fields such as the Large Hadron Collider
(LHC) high energy physics program and the BioGenome and human genome projects. Based on
Named Data Networking (NDN), a data-centric future Internet architecture, N-DISE will deploy and
commission a highly efficient and field-tested petascale data distribution, caching, access and
analysis system serving major science programs. The N-DISE project will build on recently
developed high-throughput NDN caching and forwarding methods, containerization techniques,
leverage the integration of NDN and SDN systems concepts and algorithms with the mainstream
data distribution, processing, and management systems of CMS, as well as the integration with
Field Programmable Gate Arrays (FPGA) acceleration subsystems, to produce a system capable
of delivering LHC and genomic data over a wide area network at throughputs approaching 100
Gbits per second, while dramatically decreasing download times. N-DISE will leverage existing
infrastructure and build an enhanced testbed with high performance NDN data cache servers at
participating institutions.
 The N-DISE demonstration is designed to exhibit improved performance of the N-DISE system for
workflow acceleration within large-scale data-intensive programs such as the LHC high energy
physics, BioGenome and human genome programs. To achieve high performance, the
demonstration will leverage the following key components: (1) the transparent integration of NDN
with the current CMS software stack via an NDN based XRootD Open Storage System plugin, (2)
joint caching and multipath forwarding capabilities of the VIP algorithm, (3) integration with FPGA
acceleration subsystems, (4) SDN support for NDN through the work of the Global Network
Advancement Group (GNA-G) and its AutoGOLE/SENSE and Data Intensive Sciences Working
Group.
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (VI)
 High performance networking with the Bella Link and the Sao Paulo Backbone
 With the recent availability of an L2 circuit linking the University of São Paulo State
(UNESP) to CERN in Geneva, through the Bella Link, Brazil (and São Paulo State) now has
three 100 gbps academic links to the USA and to Europe. This new circuit is supposed to
present low latency, and that should open new opportunities for collaboration among
Brazilian and European academic institutions. All links pass through the SP4 Equinix
datacenter in the greater Sao Paulo area. At the SP4 datacenter, rednesp (Research and
Education Network at Sao Paulo) has a node called SouthernLight which is part of the GNA
Autogole/SENSE testbed.
 At the same time, the Rednesp Sao Paulo regional network in finishing the "Backbone SP"
connecting 9 important universities in Sao Paulo state. In this demo, we intend to compare
properties such as effective bandwidth, latency to different continents and jitter using
different paths of the 3 intercontinental links as well as the integration of the "Backbone
SP" to international connections. We also expect to show results and metrics of parallel
computations using some of the national and international computing facilities connected
to this infrastructure.
Global Petascale to Exascale Workflows
Elements and Goals of the Demonstrations (VII)
 Coral: Scalable Data Plane Checking via Distributed, On-Device Verification
 Data plane verification (DPV) is important for finding network errors, and therefore a fundamental
pillar for achieving consistently operating, autonomous-driving, high-performance science
networks. Current DPV tools employ a centralized architecture, where a server collects the data
planes of all devices and verifies them. Despite substantial efforts on accelerating DPV, this
centralized architecture is inherently non-scalable. To tackle the scalability challenge of DPV, the
team of Xiamen University, China, circumvents the scalability bottleneck of centralized design
and design Coral, a distributed, on-device DPV framework.
 The key insight of Coral is that DPV can be transformed into a counting problem on a directed
acyclic graph, which can be naturally decomposed into lightweight tasks executed at network
devices, enabling scalability. Coral consists of (1) a declarative requirement specification
language, (2) a planner that employs a novel data structure DVNet to systematically decompose
global verification into on-device counting tasks, and (3) a distributed verification (DV) protocol
that specifies how on-device verifiers communicate task results efficiently to collaboratively
verify the requirements. During SC'22, we will demonstrate, through both a testbed
reconstruction of the Internet2 WAN environment and large-scale simulations of WAN/LAN/DC
with real-world datasets, that Coral consistently achieves scalable DPV under various scenarios.
A preliminary manuscript of Coral can be found at [1] and a set of small-scale demos can be
found at [2].
 [1] Qiao Xiang et al., Switch as a Verifier: Toward Scalable Data Plane Checking via Distributed, On-
Device Verification, https://arxiv.org/abs/2205.07808
 [2] Qiao Xiang et al., Coral System Functionality Demonstration, http://distributeddpvdemo.tech/
Network Traffic Evolution 2021-22 to and from CERN
LHCOPN LHCOPN Traffic CERN to Tier1s
Overall CERN Traffic: LHCOPN, LHCONE, Internet
Out
In
E. Martelli, CERN/IT
HL-LHC Network Needs and Data Challenges
Current Understanding: Q2 2021
 Export of Raw Data from CERN to the Tier1s (350 Pbytes/Year):
 400 Gbps Flat each for ATLAS and CMS Tier1s;
+100G each for other data formats; +100 G each for ALICE, LHCb
 “Minimal” Scenario [*]: Network Infrastructure from CERN to Tier1s Required
 4.8 Tbps Aggregate: Includes 1.2 Tbps Flat (24 X 7 X 365) from the above,
x2 to Accommodate Bursts, and x2 for overprovisioning, for operational
headroom: including both non-LHC use, and other LHC use.
 This includes 1.4 Tbps Across the Atlantic for ATLAS and CMS alone
 Note that the above Minimal scenario is where the network is treated as a
scarce resource, unlike LHC Run1 and Run2 experience in 2009-18.
 In a “Flexible Scenario” [**]: 9.6 Tbps, including 2.7 Tbps Across the Atlantic
Leveraging the Network to obtain more flexibility in workload scheduling,
increase efficiency, improve turnaround time for production & analysis
 In this scenario: Links to Larger Tier1s in the US and Europe: ~ 1 Tbps
(some more); Links to Other Tier1s: ~500 Gbps
 Tier2 provisioning: 400Gbps bursts, 100G Yearly Avg: ~Petabyte Import in a shift
 Need to work with campuses to accommodate this: it may take years
[*] NOTE: Matches numbers presented at ESnet Requirements Review (Summer 2020)
[**] NOTE: Matches numbers presented at the January 2020 LHCONE/LHCOPN Meeting
(Southern) California ((So)Cal) Cache
53
Roughly 30,000 cores across Caltech & UCSD … half typically used for analysis
A 2 Pbyte Working Example in Production
Plan to include Riverside and other SoCal Tier3s; ESnet plan to install additional in-
network caches near US Tier2s; Move to Ceph underway
Scaling to HL LHC: ~30 Pbytes Per Tier2, ~10 Pbyte Caches, 1 Pbyte Refresh in a Shift
Requires 400G Link. Still relies on use of compact event forms,
efficiently managed data transport
SENSE: SDN for End-to-end Networked
Science at the Exascale
ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland
54
 Use Cases: Inter-DTN Priority Flows, LHC File Transfer Service (FTS),
SENSE/AutoGole Integration (GNA-G), ExaFEL, Big Data Express
Internet2
AL2S
ESnet
AutoGOLE / SENSE Deployment
StarLight
NetherLight SURFNet
MOXY
CENIC
PacWave
RNP
AMPATH
Southern
Light
CERN
UCSD
LOSA
SEAT
SUNY
KREONET/
KRLight
KISTI
AMPATH
RNP
MANLAN
WIX
Site/End Systems Exchange Point Network
ESnet
Testbed
STAR
SENSE: SDN for End-to-end Networked
Science at the Exascale
ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland
56
SENSE: SDN for End-to-end Networked
Science at the Exascale
ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland
57
Self-driving Networks| BERKELEY LAB
Self-Driving Networks
“Networks should learn to drive themselves”*
58
[*] Why (and How) Networks Should Run Themselves,
Feamster and Rexford
https://doi.org/10.48550/arXiv.1710.11583
Example controllers:
Ryu, OpenDaylight,
OSCARS
Listener APIs
Supervised/Unsupervised
classification:
anomaly detection
Regression: forecasting
Action
plan
Network
Telemetry
Execute
Control
Reward function Inputs to AI agent
Try different
optimization
strategies and/or
objective functions
Can take simple actions to achieve key goals. Such as:
improving availability, attack resilience and dealing with scale.
Our argument is AI is needed for mission critical actions.
M. Kiran, C, Guok Talk at APAN2021
Intelligent Controller (HECATE)
Case Studies:
1. Model free: Path selection for large data transfers: better load balancing
2. Model Free: Forwarding decisions for complex network topologies:
Deep RL to learn optimal packet delivery policies vs. network load level
3. Model Based: Predicting network patterns with Netpredict
Self Driving Network
Adaptive Routing (e.g. Real
time data for routing decisions)
Learns to Avoid Congestion
Congestion Free  Loss Free
Towards 100% Utilization
Proactive Fault Repair
Mariam Kiran (ESnet) et al. Intelligent Networks DOE Project
Use Deep Reinforcement
Learning to Optimize network
traffic engineering
Self-Driving Network for Science
60
Model-based
learning
Model-free
learning
Predicting Future
Congestion
Study traffic
patterns
action
observations
e.g.Network
telemetry or
monitoring
reward e.g.
flow
completion
time
e.g.
Reconfigure
network e.g.
update flow
rules
Using Deep Reinforcement Learning for
optimizing traffic engineering over
networks
Kiran et al. Intelligent Networks DOE Project
 D-DCRNN: Spatiotemporal forecasting with applications in neuroscience, climate,
traffic flow, smart grid, logistics, supply chain: https://arxiv.org/pdf/1707.01926.pdf
 Future: Hooks for a Richer, More Stateful Objective Function
(Policy, Priority, Deadlines, Path Quality, Co-Flows...);
Event Response; Applications to 5G and Quantum Networks
Mariam Kiran (ESnet)
mkiran@lbl.gov et al
Self-driving Networks| BERKELEY LAB
HECATE Simulation
61
UR: link utilization rate
Moves incoming traffic
to less used paths
Science Networks are Expanding to Wireless 5G, 6G
Sensor Networks and Beyond
62
 Statically deployed, unattended network
sensors need full-time monitoring
 While sensors in mobile networks
(including unmanned aerial systems)
can self-manage and optimize data
collection
 For continual monitoring of the sites, data
is processed in real-time and transmitted
to a central location with 5G speed for
further processing
 Novel mesh network topologies for flexible
inclusion of new sources - much like new
cells in the body - are being developed
 Can use 5G latency to handle the tradeoff
between edge and cloud processing
SENSE: SDN Enabled Networks for Science at the Exascale
ESnet, Caltech, Fermilab, LBNL https://arxiv.org/abs/2004.05953
Creates Virtual Circuit Overlays. Site and Network RMs, Orchestrator
SC20 to 22+: AutoGOLE/SENSE Persistent Testbed:
ESnet, SURFnet, Internet2, StarLight, CENIC, Pacific Wave, AmLight, RNP, USP, UFES, GridUNESP,
Hawaii, Guam, KISTI, Tokyo,Caltech, UCSD, PRP, FIU, CERN, Fermilab, UMd, DE-KIT
Federation
with the StarLight
GEANT/RARE
and AmLight
P4 Testbeds
Underway
2022-23 Outlook:
ESnet6/High Touch
KAUST
FABRIC
BRIDGES
US CMS Tier2s
GridUNESP, UERJ
SKAO
AarNet
TIFR et al
APONet
Courtesy T. Lehman
Caltech/
UCSD/
LA 100G to 4 X
100 to 400G
with CENIC
Enhanced for
SC22
and Beyond
400G Link(s)
NetherLight-
CERN
Automation
Following
SENSE, NRP,
ALTO, Atlantic
Wave SDX
Persistent Operations: Beginning this Year
NRP
●Goal is to more flexibly control
how these are utilized on
a per flow, group, or use basis
●Do not want to manage "every"
flow in the network, however
should be able to manage
"any" flow in the network
●There are multiple transatlantic
and transpacific links, operated
by multiple organizations
transatlantic links
Important Link Management
● Two timescales: SENSE overlay network of virtual circuits
with BW guarantees is relatively stable; IPv6 subnets and
Directors provide more dynamic flow handling
● An equally important
goal is to understand
the load and leave room
for other traffic
● Compatible with other
network operations
● Result is to make better
use of the available
resources overall
SENSE Services for LHC and Other Open Science Grid
(OSG) Workflows: Rucio, FTS, XRootD Integration
 Goals center around SENSE providing key network-related
capabilities to LHC + 30 other science programs
 Primary Motivations:
 Developing mechanisms for an application workflow to obtain information
regarding the network services, capabilities, and options;
to a degree similar to what is possible for compute resources
 Giving applications the ability to interact with the network:
to exchange information, negotiate performance parameters,
discover expected performance metrics, and receive status/
troubleshooting information in real time, during long transactions.
 Key pathways:
(1) Interfacing and interaction with RUCIO, FTS and XRootD data federation
and storage systems, through Engagement:
with CMS, ATLAS and those development teams
(2) Enabled by the ESnet6 plan: with up to ~75% of link capacity
on key routes available for policy-driven dynamic provisioning
and management by ~2027-28
 4 Phase 1 Year Schedule, Completing in Q1 2023:
from system evaluation to design, to prototype deployments and tests,
to a plan for the transition to operations and the additional R&D needed 66
European Science
Data Center
OSG Data Federation
Vera Rubin Observatory
Interfacing to Multiple VOs With FTS/Rucio/XRootD
LHC, Dark Matter, n, Heavy Ions, VRO, SKAO, LIGO/Virgo/Kagra; Bioinformatics
Beyond Programmability Alone: A Systems Approach
Qualcomm Technologies GradientGraph Analytics
68
 System Wide
information to identify,
deal with root causes
Key Components
• Bottleneck precedence
+ flow gradient graphs
• Impactful flow and
flow group ID
• Alternate path
recommendations
www.reservoir.com/gradientgraph/
 Objective: Flow performance optimization in high
speed networks, with fairness
 Approach: Built on a new mathematical Theory of
Bottleneck Structures and an analytical framework
 Enables operators to understand and precisely control
flow(-group) and bottleneck performance
 Value: Improved capacity planning, traffic engineering
 Greater, more effective network throughput and stability
as a function of capacity and cost
 "On the Bottleneck Structure of Congestion-Controlled Networks," ACM
SIGMETRICS, Boston, June 2020 [https://bit.ly/3eGOPrb]
 "Designing Data Center Networks Using Bottleneck Structures,"
accepted for publication at ACM SIGCOMM 2021 [https://bit.ly/2UZCb1M]
 "Computing Bottleneck Structures at Scale for High-Precision Network
Performance Analysis," SC 2020 INDIS, November 2020
[https://bit.ly/3BriwaB]
 "A Quantitative Theory of Bottleneck Structures for Data Networks",
Technical Report, available upon request [https://bit.ly/38u8ARs]
Global P4 Lab and Production Grade
Programmable Switches
Marcos Schwarz
SC22
November 14-17 2022, Dallas - TX
Motivation
Can we increase the rate of evolution without interfering with production networks?
To develop and operate end-to-end / multi-domain orchestration services:
• Resource reservation (guaranteed bandwidth)
• Resource provisioning (Circuits, VRFs)
• Underlay observability
• Dynamic traffic steering/engineering
• Dynamic creation of L3 VPNs
• Closed loop multi-domain visibility/intelligence/controllability
How can we create/sustain an integration initiative/platform to propose and
validate next generation protocol and services?
• Proposition: Use programmable P4 devices to experiment on pre-production
networks leveraging industry/R&E open ecosystems
Global P4 Lab Goals
Current (SC22)
• Persistent global L3 overlay network based on P4 switches
• Intercontinental high capacity transfers (100G and over) exploring multiple
source routing solutions (Segment Routing and PolKA)
• Management infrastructure and tools used to operate this global network
• Core network based on RARE/FreeRtr and edge networks based also on SONiC
Future (2023)
• Capability to support multiple virtual networks, that implement on the same
devices different choices of routing stacks, traditional and SDN based
• Integration with initiatives for visibility, controllability and intelligence
• Work as a reference state of the art / next generation R&E network
Integration Path with other Initiatives
1 von 72

Recomendados

Session 33 - Production Grids von
Session 33 - Production GridsSession 33 - Production Grids
Session 33 - Production GridsISSGC Summer School
542 views51 Folien
UK e-Infrastructure: Widening Access, Increasing Participation von
UK e-Infrastructure: Widening Access, Increasing ParticipationUK e-Infrastructure: Widening Access, Increasing Participation
UK e-Infrastructure: Widening Access, Increasing ParticipationNeil Chue Hong
931 views44 Folien
Lesson1 esa summer_school_brovelli von
Lesson1 esa summer_school_brovelliLesson1 esa summer_school_brovelli
Lesson1 esa summer_school_brovelliMaria Antonia Brovelli
1.2K views42 Folien
2. the grid von
2. the grid2. the grid
2. the gridDr Sandeep Kumar Poonia
672 views30 Folien
Victoria A. White Head, Computing Division Fermilab von
Victoria A. White Head, Computing Division FermilabVictoria A. White Head, Computing Division Fermilab
Victoria A. White Head, Computing Division FermilabVideoguy
480 views40 Folien
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth... von
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...
NextGEOSS: The Next Generation European Data Hub and Cloud Platform for Earth...Wolfgang Ksoll
2K views20 Folien

Más contenido relacionado

Similar a Global Network Advancement Group Next Generation Network-Integrated Systems

Security Challenges and the Pacific Research Platform von
Security Challenges and the Pacific Research PlatformSecurity Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research PlatformLarry Smarr
883 views38 Folien
The Earth System Grid Federation: Origins, Current State, Evolution von
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
15 views30 Folien
International collaboration with Cloud v1 von
International collaboration with Cloud v1International collaboration with Cloud v1
International collaboration with Cloud v1GovCloud Network
650 views11 Folien
GLENNA: The Nordic cloud von
GLENNA: The Nordic cloud GLENNA: The Nordic cloud
GLENNA: The Nordic cloud EOSC-hub project
122 views9 Folien
GEOSHARE_GLP_pres.pdf von
GEOSHARE_GLP_pres.pdfGEOSHARE_GLP_pres.pdf
GEOSHARE_GLP_pres.pdfJose Lozano
3 views41 Folien
EGI Engage: Impact & Results von
EGI Engage: Impact & ResultsEGI Engage: Impact & Results
EGI Engage: Impact & ResultsEGI Federation
52 views17 Folien

Similar a Global Network Advancement Group Next Generation Network-Integrated Systems(20)

Security Challenges and the Pacific Research Platform von Larry Smarr
Security Challenges and the Pacific Research PlatformSecurity Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research Platform
Larry Smarr883 views
The Earth System Grid Federation: Origins, Current State, Evolution von Ian Foster
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
Ian Foster15 views
International collaboration with Cloud v1 von GovCloud Network
International collaboration with Cloud v1International collaboration with Cloud v1
International collaboration with Cloud v1
GovCloud Network650 views
OGF Introductory Overview - FAS* 2014 von Alan Sill
OGF Introductory Overview -  FAS* 2014OGF Introductory Overview -  FAS* 2014
OGF Introductory Overview - FAS* 2014
Alan Sill302 views
David Park APAN Slid.. von Videoguy
David Park APAN Slid..David Park APAN Slid..
David Park APAN Slid..
Videoguy230 views
Panel: NRP Science Impacts​ von Larry Smarr
Panel: NRP Science Impacts​Panel: NRP Science Impacts​
Panel: NRP Science Impacts​
Larry Smarr92 views
Data accessibility and the role of informatics in predicting the biosphere von Alex Hardisty
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphere
Alex Hardisty707 views
Gergely Sipos (EGI): Exploiting scientific data in the international context ... von Gergely Sipos
Gergely Sipos (EGI): Exploiting scientific data in the international context ...Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos (EGI): Exploiting scientific data in the international context ...
Gergely Sipos11 views
How to expand the Galaxy from genes to Earth in six simple steps (and live sm... von Raffaele Montella
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
Raffaele Montella760 views
Geant gn3plus fact sheet von GÉANT
Geant gn3plus fact sheetGeant gn3plus fact sheet
Geant gn3plus fact sheet
GÉANT235 views
Software Sustainability Institute von Neil Chue Hong
Software Sustainability InstituteSoftware Sustainability Institute
Software Sustainability Institute
Neil Chue Hong547 views
GÉANT GN3plus an introduction to europe's 500 gbps research and education net... von GÉANT
GÉANT GN3plus an introduction to europe's 500 gbps research and education net...GÉANT GN3plus an introduction to europe's 500 gbps research and education net...
GÉANT GN3plus an introduction to europe's 500 gbps research and education net...
GÉANT1.3K views
The Pacific Research Platform von Larry Smarr
The Pacific Research PlatformThe Pacific Research Platform
The Pacific Research Platform
Larry Smarr800 views
Open Based Systems Eng 010706 Titech von Farhan Helmy
Open Based Systems Eng 010706 TitechOpen Based Systems Eng 010706 Titech
Open Based Systems Eng 010706 Titech
Farhan Helmy1K views

Más de Larry Smarr

Panel: Reaching More Minority Serving Institutions von
Panel: Reaching More Minority Serving InstitutionsPanel: Reaching More Minority Serving Institutions
Panel: Reaching More Minority Serving InstitutionsLarry Smarr
80 views100 Folien
Global Network Advancement Group - Next Generation Network-Integrated Systems von
Global Network Advancement Group - Next Generation Network-Integrated SystemsGlobal Network Advancement Group - Next Generation Network-Integrated Systems
Global Network Advancement Group - Next Generation Network-Integrated SystemsLarry Smarr
109 views72 Folien
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... von
 Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...Larry Smarr
98 views13 Folien
Panel Discussion: Engaging underrepresented technologists, researchers, and e... von
Panel Discussion: Engaging underrepresented technologists, researchers, and e...Panel Discussion: Engaging underrepresented technologists, researchers, and e...
Panel Discussion: Engaging underrepresented technologists, researchers, and e...Larry Smarr
84 views12 Folien
The Asia Pacific and Korea Research Platforms: An Overview Jeonghoon Moon von
The Asia Pacific and Korea Research Platforms: An Overview Jeonghoon MoonThe Asia Pacific and Korea Research Platforms: An Overview Jeonghoon Moon
The Asia Pacific and Korea Research Platforms: An Overview Jeonghoon MoonLarry Smarr
93 views22 Folien
Panel: Reaching More Minority Serving Institutions von
Panel: Reaching More Minority Serving InstitutionsPanel: Reaching More Minority Serving Institutions
Panel: Reaching More Minority Serving InstitutionsLarry Smarr
8 views100 Folien

Más de Larry Smarr(20)

Panel: Reaching More Minority Serving Institutions von Larry Smarr
Panel: Reaching More Minority Serving InstitutionsPanel: Reaching More Minority Serving Institutions
Panel: Reaching More Minority Serving Institutions
Larry Smarr80 views
Global Network Advancement Group - Next Generation Network-Integrated Systems von Larry Smarr
Global Network Advancement Group - Next Generation Network-Integrated SystemsGlobal Network Advancement Group - Next Generation Network-Integrated Systems
Global Network Advancement Group - Next Generation Network-Integrated Systems
Larry Smarr109 views
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... von Larry Smarr
 Wireless FasterData and Distributed Open Compute Opportunities and (some) Us... Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Wireless FasterData and Distributed Open Compute Opportunities and (some) Us...
Larry Smarr98 views
Panel Discussion: Engaging underrepresented technologists, researchers, and e... von Larry Smarr
Panel Discussion: Engaging underrepresented technologists, researchers, and e...Panel Discussion: Engaging underrepresented technologists, researchers, and e...
Panel Discussion: Engaging underrepresented technologists, researchers, and e...
Larry Smarr84 views
The Asia Pacific and Korea Research Platforms: An Overview Jeonghoon Moon von Larry Smarr
The Asia Pacific and Korea Research Platforms: An Overview Jeonghoon MoonThe Asia Pacific and Korea Research Platforms: An Overview Jeonghoon Moon
The Asia Pacific and Korea Research Platforms: An Overview Jeonghoon Moon
Larry Smarr93 views
Panel: Reaching More Minority Serving Institutions von Larry Smarr
Panel: Reaching More Minority Serving InstitutionsPanel: Reaching More Minority Serving Institutions
Panel: Reaching More Minority Serving Institutions
Larry Smarr8 views
Panel: The Global Research Platform: An Overview von Larry Smarr
Panel: The Global Research Platform: An OverviewPanel: The Global Research Platform: An Overview
Panel: The Global Research Platform: An Overview
Larry Smarr94 views
Panel: Future Wireless Extensions of Regional Optical Networks von Larry Smarr
Panel: Future Wireless Extensions of Regional Optical NetworksPanel: Future Wireless Extensions of Regional Optical Networks
Panel: Future Wireless Extensions of Regional Optical Networks
Larry Smarr119 views
Global Research Platform Workshops - Maxine Brown von Larry Smarr
Global Research Platform Workshops - Maxine BrownGlobal Research Platform Workshops - Maxine Brown
Global Research Platform Workshops - Maxine Brown
Larry Smarr92 views
Built around answering questions von Larry Smarr
Built around answering questionsBuilt around answering questions
Built around answering questions
Larry Smarr101 views
Democratizing Science through Cyberinfrastructure - Manish Parashar von Larry Smarr
Democratizing Science through Cyberinfrastructure - Manish ParasharDemocratizing Science through Cyberinfrastructure - Manish Parashar
Democratizing Science through Cyberinfrastructure - Manish Parashar
Larry Smarr114 views
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses; von Larry Smarr
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
Panel: Building the NRP Ecosystem with the Regional Networks on their Campuses;
Larry Smarr92 views
Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Je... von Larry Smarr
Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Je...Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Je...
Open Force Field: Scavenging pre-emptible CPU hours* in the age of COVID - Je...
Larry Smarr101 views
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... von Larry Smarr
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Larry Smarr193 views
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... von Larry Smarr
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Larry Smarr7 views
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B... von Larry Smarr
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Larry Smarr6 views
Frank Würthwein - NRP and the Path forward von Larry Smarr
Frank Würthwein - NRP and the Path forwardFrank Würthwein - NRP and the Path forward
Frank Würthwein - NRP and the Path forward
Larry Smarr130 views
Robert Kwon: Panel - Future Wireless Extensions of Regional Optical Networks von Larry Smarr
Robert Kwon: Panel - Future Wireless Extensions of Regional Optical NetworksRobert Kwon: Panel - Future Wireless Extensions of Regional Optical Networks
Robert Kwon: Panel - Future Wireless Extensions of Regional Optical Networks
Larry Smarr5 views
Larry Smarr - NRP Application Drivers von Larry Smarr
Larry Smarr - NRP Application DriversLarry Smarr - NRP Application Drivers
Larry Smarr - NRP Application Drivers
Larry Smarr141 views
Richard Alo: Panel - Reaching More Minority-Serving Campuses von Larry Smarr
Richard Alo: Panel -  Reaching More Minority-Serving CampusesRichard Alo: Panel -  Reaching More Minority-Serving Campuses
Richard Alo: Panel - Reaching More Minority-Serving Campuses
Larry Smarr22 views

Último

Climate Change.pptx von
Climate Change.pptxClimate Change.pptx
Climate Change.pptxlaurenmortensen1
106 views1 Folie
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx von
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptxENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptxMN
6 views13 Folien
Workshop Chemical Robotics ChemAI 231116.pptx von
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptxMarco Tibaldi
90 views41 Folien
A training, certification and marketing scheme for informal dairy vendors in ... von
A training, certification and marketing scheme for informal dairy vendors in ...A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...ILRI
8 views13 Folien
Ecology von
Ecology Ecology
Ecology Abhijith Raj.R
6 views10 Folien
"How can I develop my learning path in bioinformatics? von
"How can I develop my learning path in bioinformatics?"How can I develop my learning path in bioinformatics?
"How can I develop my learning path in bioinformatics?Bioinformy
17 views13 Folien

Último(20)

ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx von MN
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptxENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
ENTOMOLOGY PPT ON BOMBYCIDAE AND SATURNIIDAE.pptx
MN6 views
Workshop Chemical Robotics ChemAI 231116.pptx von Marco Tibaldi
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptx
Marco Tibaldi90 views
A training, certification and marketing scheme for informal dairy vendors in ... von ILRI
A training, certification and marketing scheme for informal dairy vendors in ...A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...
ILRI8 views
"How can I develop my learning path in bioinformatics? von Bioinformy
"How can I develop my learning path in bioinformatics?"How can I develop my learning path in bioinformatics?
"How can I develop my learning path in bioinformatics?
Bioinformy17 views
Company Fashion Show ChemAI 231116.pptx von Marco Tibaldi
Company Fashion Show ChemAI 231116.pptxCompany Fashion Show ChemAI 231116.pptx
Company Fashion Show ChemAI 231116.pptx
Marco Tibaldi69 views
Types of Fluids - Newtonian and Non Newtonian Fluids in Continuous Culture Fe... von Pavithra B R
Types of Fluids - Newtonian and Non Newtonian Fluids in Continuous Culture Fe...Types of Fluids - Newtonian and Non Newtonian Fluids in Continuous Culture Fe...
Types of Fluids - Newtonian and Non Newtonian Fluids in Continuous Culture Fe...
Pavithra B R11 views
Gold Nanoparticle as novel Agent for Drug targeting (1).pptx von sakshijadhav9843
Gold Nanoparticle as novel Agent for Drug targeting (1).pptxGold Nanoparticle as novel Agent for Drug targeting (1).pptx
Gold Nanoparticle as novel Agent for Drug targeting (1).pptx
sakshijadhav984316 views
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl... von GIFT KIISI NKIN
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...
Synthesis and Characterization of Magnetite-Magnesium Sulphate-Sodium Dodecyl...
GIFT KIISI NKIN12 views
Ethical issues associated with Genetically Modified Crops and Genetically Mod... von PunithKumars6
Ethical issues associated with Genetically Modified Crops and Genetically Mod...Ethical issues associated with Genetically Modified Crops and Genetically Mod...
Ethical issues associated with Genetically Modified Crops and Genetically Mod...
PunithKumars618 views
PRINCIPLES-OF ASSESSMENT von rbalmagro
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENT
rbalmagro9 views
himalay baruah acid fast staining.pptx von HimalayBaruah
himalay baruah acid fast staining.pptxhimalay baruah acid fast staining.pptx
himalay baruah acid fast staining.pptx
HimalayBaruah5 views
Researching and Communicating Our Changing Climate von Zachary Labe
Researching and Communicating Our Changing ClimateResearching and Communicating Our Changing Climate
Researching and Communicating Our Changing Climate
Zachary Labe5 views
Matthias Beller ChemAI 231116.pptx von Marco Tibaldi
Matthias Beller ChemAI 231116.pptxMatthias Beller ChemAI 231116.pptx
Matthias Beller ChemAI 231116.pptx
Marco Tibaldi82 views
domestic waste_100013.pptx von padmasriv25
domestic waste_100013.pptxdomestic waste_100013.pptx
domestic waste_100013.pptx
padmasriv2510 views
Conventional and non-conventional methods for improvement of cucurbits.pptx von gandhi976
Conventional and non-conventional methods for improvement of cucurbits.pptxConventional and non-conventional methods for improvement of cucurbits.pptx
Conventional and non-conventional methods for improvement of cucurbits.pptx
gandhi97614 views

Global Network Advancement Group Next Generation Network-Integrated Systems

  • 1. Global Network Advancement Group Next Generation Network-Integrated Systems 4th National Research Platform Workshop: NRP, GRP, GNA-G Partnership Harvey Newman, Caltech, GNA DIS WG Chair February 9, 2023
  • 2. Higgs Boson Couples to Mass Ten Year Anniversary of the Higgs Boson Discovery: July 4 2022 Search for Higgs Boson Pairs A portrait of the Higgs boson by the CMS experiment ten years after the discovery | Nature
  • 3. 3 10th Anniversary of the Higgs Boson Discovery and Beyond; 75 Years of Exploration ! Advanced Networks Were Essential to Higgs Discovery and Every Ph.D Thesis; They will be Essential to All Future Discoveries •NOTE: ~95% of Data Still to be Taken •Greater Intensity: Upgraded detectors for more complex events •To 5X Data Taking Rate in 2029-40 Englert 2013 Nobel Prize Higgs 48 Year Search; 75 Year Exploration Theory (1964): 1950s – 1970s; LHC + Experiments Concept: 1984 Construction: 2001; Operation: 2009 Run1: Higgs Boson Discovery 2012 Run2 and Going Forward: Precision Measurements and BSM Exploration: 2013 - 2042 
  • 4. Global Data Flow: A Worldwide System Conceived by Caltech and UCSD. First Tier2 in 1999-2001 Tier 1 Tier2 Center Online System CERN Center 100 PBs of Disk; Tape Robot Chicago Institute Institute Institute Institute Workstations ~1000 MBytes/sec 10 to 100 Gbps Physics data cache ~PByte/sec N X100 Gbps Tier2 Center Tier2 Center Tier2 Center 10 to 100 Gbps Tier 0 Tier 3 Tier 4 Tier 2 Experiment London Paris Taipei CACR 13 Tier1, 170 Tier2 + 300 Tier3 Centers: 1k to N X 10k Cores  Tier0: Real Time Data Processing Tier1 and Tier2: Data and Simulation Production Processing Tier2 and Tier3: Data Analysis Increased Use as a Cloud Resource (Any Job Anywhere) Increased “Elastic” Use of Additional HPC and Cloud Resources A Global Dynamic System: Fertile Ground for Control with ML Bologna
  • 5. Core of LHC Networking LHCOPN, LHCONE, GEANT, ESnet, Internet2, CENIC… + NRENs in Europe, Asia, Latin America, Au/NZ; US State Networks LHCOPN: Simple & Reliable Tier0+1 Ops GEANT Internet2 ESnet (with EEX) CENIC and NRP LHCONE VRF: 170 Tier2s ++
  • 6. + NRENs in Europe, Asia, Latin America, Au/NZ; US State Networks LHCOPN: Simple & Reliable Tier0+1 Ops GEANT Internet2 ESnet (with EEX) CENIC and NRP LHCONE VRF: 170 Tier2s CENIC and Pacific Wave: A “Regional” Network for California with Global Reach and Vision East and West
  • 7. WLCG Transfers Dashboard: 3/2017 to 3/2022 HL LHC “10%” Data Challenge in October 2021 exceeded 100 Gbytes/sec for >1 Day The “30%” Challenge Scheduled for 2024 will be a real challenge General WLCG Traffic is again at or above the pre-pandemic level https://monit-grafana.cern.ch/d/AfdonIvGk/wlcg-transfers?orgId=20
  • 8. Towards a Computing Model for the HL LHC Era Challenges: Capacity in the Core and at the Edges  Programs such as the LHC have experienced rapid exponential traffic growth, at the level of 40-60% per year  At the January 2020 LHCONE/LHCOPN meeting at CERN, CMS and ATLAS expressed the need for Terabit/sec links on major routes, by the start of the HL-LHC in ~2029  This is projected to outstrip the affordable capacity  The needs are further specified in “blueprint” Requirements documents by US CMS and US ATLAS, submitted to the ESnet Requirements Review in August 2020, and captured in a comprehensive DOE Requirements Review report for HEP: https://escholarship.org/uc/item/78j3c9v4  Three areas of particular capacity-concern by 2028-9 were identified: (1) Exceeding the capacity across oceans, notably the Atlantic, served by the Advanced North Atlantic (ANA) network consortium (2) Tier2 centers at universities requiring 100G 24 X 7 X 365 average throughput with sustained 400G bursts (a petabyte in a shift), and (3) Terabit/sec links to labs and HPC centers (and edge systems) to support multi-petabyte transactions in hours rather than days 8
  • 9. Mission: To Support global research and education using the technology, infrastructures and investments of its participants The GNA-G exists to bring together researchers, National Research and Education Networks (NRENs), Global eXchange Point (GXP) operators, regionals and other R&E providers, in developing a common global infrastructure to support the needs. https://www.gna-g.net/
  • 10. GREN: Collaboration on the intercontinental transmission layer Telemetry Data Intensive Science AutoGOLE/SENSE GREN Mapping GXP Architectures & Services Connecting Offshore Students Research & Development Operational Link consortia APR ANA … AER AmLight GNA Architecture v2.0 GNA-G Leadership Team Routing Anomalies Network Automation GNA-G Executive Liaison GNA-G participant CEOs etc Global NREN CEO Forum Securing the GREN Strategic Security Focus Group GNA-G overview and Vision
  • 11. The GNA-G Data Intensive Sciences WG  Principal aims of the GNA-G DIS WG: (1) To meet the needs and address the challenges faced by major data intensive science programs  In a manner consistent and compatible with support for the needs of individuals and smaller groups in the at large A&R communities (2) To provide a forum for discussion, a framework and shared tools for short and longer term developments meeting the program and group needs  To develop a persistent global testbed as a platform, to foster ongoing developments among the science and network communities  While sharing and advancing the (new) concepts, tools & systems needed  Members of the WG partner in joint deployments and/or developments of generally useful tools and systems that help operate and manage R&E networks with limited resources across national and regional boundaries  A special focus of the group is to address the growing demand for  Network-integrated workflows  Comprehensive cross-institution data management  Automation, and  Federated infrastructures encompassing networking, compute, and storage  Working Closely with the AutoGOLE/SENSE WG, NRP and GRP 1 1 Charter: https://www.dropbox.com/s/4my5mjl8xd8a3y9/GNA-G_DataIntensiveSciencesWGCharter.docx?dl=0
  • 12. The GNA-G Data Intensive Sciences WG  Mission: Meet the challenges of globally distributed data and computation faced by the major science programs  Coordinate provisioning the feasible capacity across a global footprint, and enable best use of the infrastructure:  While meeting the needs of the participating groups, large and small  In a manner Compatible and Consistent with use by the at-large A&R communities  Members: Alberto Santoro, Azher Mughal, Bijan Jabbari, Brian Yang, Buseung Cho, Caio Costa, Carlos Antonio Ruggiero, Carlyn Ann-Lee, Chin Guok, Ciprian Popoviciu, Dale Carder, David Lange, David Wilde, Dima Mishin, Edoardo Martelli, Eduardo Revoredo, Eli Dart, Eoin Kenney, Frank Wuerthwein, Frederic Loui, Harvey Newman, Heidi Morgan, Iara Machado, Inder Monga, Jeferson Souza, Jensen Zhang, Jeonghoon Moon, Jeronimo Bezerra, Jerry Sobieski, Joao Eduardo Ferreira, Joe Mambretti, John Graham, John Hess, John Macauley, Julio Ibarra, Justas Balcas, Kai Gao, Karl Newell, Kaushik De, Kevin Sale, Lars Fischer, Liang Zhang, Mahdi Solemani, Maria Del Carmen Misa Moreira, Marcos Schwarz, Mariam Kiran, Matt Zekauskas, Michael Stanton, Mike Hildreth, Mike Simpson, Ney Lemke, Phil Demar, Raimondas Sirvinskas, Richard Hughes-Jones, Rogerio Iope, Sergio Novaes, Shawn McKee, Siju Mammen, Susanne Naegele- Jackson, Tom de Fanti, Tom Hutton, Tom Lehman, William Johnston, Xi Yang, Y. Richard Yang  Participating Organizations/Projects:  ESnet, Nordunet, SURFnet, AARNet, AmLight, KISTI, SANReN, GEANT, RNP, CERN, Internet2, CENIC/Pacific Wave, StarLight, NetherLight, Southern Light, Pacific Research Platform, FABRIC, RENATER, ATLAS, CMS, VRO, SKAO, OSG, Caltech, UCSD, Yale, FIU, UERJ, GridUNESP, Fermilab, Michigan, UT Arlington, George Mason, East Carolina, KAUST  Working Closely with the AutoGOLE/SENSE WG  Meets Weekly or Bi-weekly; All are welcome to join. 12 Charter: https://www.dropbox.com/s/4my5mjl8xd8a3y9/GNA-G_DataIntensiveSciencesWGCharter.docx?dl=0
  • 13. AutoGOLE / SENSE Working Group  Worldwide collaboration of open exchange points and R&E networks interconnected to deliver network services end-to-end in a fully automated way. NSI for network connections, SENSE for integration of End Systems and Domain Science Workflow facing APIs.  Key Objective:  The AutoGOLE Infrastructure should be persistent and reliable, to allow most of the time to be spent on experiments and research.  Key Work areas:  Control Plane Monitoring: Prometheus based, deployment now underway  Data Plane Verification and Troubleshooting Service: Study and design group formed  AutoGOLE related software: Ongoing enhancements to facilitate deployment and maintenance (Kubernetes, Docker based systems)  Experiment, Research, Multiple Activity, Use Case support: Including NOTED, Gradient Graph, P4 Topologies, Named Data Networking (NDN), Data Transfer Systems integration and testing.  WG information https://www.gna-g.net/join-working-group/autogole-sense
  • 14. 14 Towards a Next Generation Network-Integrated System For Data Intensive Sciences The Global Network Advancement Group The AutoGOLE/SENSE and Data Intensive Sciences WGs Partnering with GRP and NRP
  • 15. GNA-G AutoGOLE/SENSE WG Global Persistent Testbed Software Driven Network OSes; Programmability at the Edges and in the Core
  • 16. P4 + SONIC Programmable Global PersistentTestbed 19 Active GNA-G/RARE P4, SONIC Testbed Sites • Caltech, Pasadena-US: 4 FreeRtr/P4 + SONIC • CERN, Geneva-CH: FreeRtr/P4 • FIU, Miami-US: FreeRtr/P4 • GEANT, Amsterdam-NL: FreeRtr/P4 • GEANT, Budapest-HU: FreeRtr/P4 • GEANT, Frankfurt-DE: FreeRtr/P4 • GEANT, Paris-FR: FreeRtr/DPDK • GEANT, Poznan-PL: FreeRtr/P4 • GEANT, Prague-CZ: FreeRtr/DPDK • RENATER, Paris-FR: FreeRtr/P4 • SouthernLight (FIU/RedClara/Rednesp/RNP), São Paulo-BR: FreeRtr/P4 • StarLight, Chicago-US: FreeRtr/P4 • SWITCH, Geneva-CH: FreeRtr/P4 • TCD, Dublin-IE: FreeRtr/P4 • Tennessee Tech: FreeRtr/P4 + 9 Expected sites (by SC22): • HEAnet, Dublin-IE: FreeRtr/P4 • JISC, London-UK: FreeRtr/P4 • KAUST, Saudi Arabia: FreeRtr/DPDK • KISTI, South Korea: SONiC/P4 • RNP, Rio de Janeiro-BR: FreeRtr/P4, SONiC/P4 • SC22 Caltech Booth, Dallas-US: FreeRtr/P4 • UCSD, San Diego-US: SONiC/P4 • UFES, Vitória-BR: 2x FreeRtr/P4 • UMd, College Park, Maryland-US: FreeRtr/P4
  • 17. GEANT (Now Global) P4 Lab and Dataplane of P4 Programmable Switches A new worldwide platform for  New agile and flexible feature development  New use case development corresponding to research programs’ requirements  Multiple research network overlay slices A global playing field for  Next generation network monitoring tools and systems  Development of fully automated network deployment  New network operations paradigm development VISION: Federate and integrate multiple testbeds and toolsets such as AutoGOLE / SENSE Frederic Loui, Casaba Mate, Marcos Schwarz et al.
  • 18. SC15-22: SDN Next Generation Terabit/sec Ecosystem for Exascale Science SC16+: Consistent Operations with Agile Feedback Major Science Flow Classes Up to High Water Marks supercomputing.caltech.edu Tbps Rings for SC18-22: Caltech, Ciena, Scinet, StarLight + Many HEP, Network, Vendor Partners 45 SDN-driven flow steering, load balancing, site orchestration Over Terabit/sec Global Networks LHC at SC15: Asynchronous Stageout (ASO) with Caltech’s SDN Controller Preview PetaByte Transfers to/ from Site Edges of Exascale Facilities With 100G -1000G DTNs
  • 19. Global Petascale to Exascale Workflows for Data Intensive Sciences Accelerated by Next Generation Programmable Network Architectures and Machine Learning Applications  A Vast Partnership of Science and Computer Science Teams, R&E Networks and R&D Projects  Convened by the GNA-G Data Intensive Sciences WG  Mission  Meeting the challenges faced by leading edge data intensive experimental programs in high energy physics, astrophysics, genomics and other fields of data intensive science  Clearing the path to the next round of discoveries  Demonstrating a wide range of the latest advances in:  Software defined and Terabit/sec networks  Intelligent global operations and monitoring systems  Workflow optimization methodologies with real time analytics  State of the art long distance data transfer methods and tools, local and metro optical networks and server designs  Emerging technologies and concepts in programmable networks and global-scale distributed system Network Research Exhibition NRE-19. Abstract: https://www.dropbox.com/s/qcm41g7f7etjxvy/SC22_NRE_GlobalPetascaleWorkflows_V8072022.docx?dl=0
  • 20. Global Petascale to Exascale Workflows Cornerstone system components and concepts (I)  Integrated operations and orchestrated management of resources: interworking with and advancing the site (Site-RM) and network resource managers (Network-RM) developed in the SENSE program.  An ontological model-driven framework with integration of an analytics engine, API and workflow orchestrator extending work in the SENSE project, enhanced by efficient multi-domain resource state abstractions and discovery mechanisms.  Fine-grained end-to-end monitoring, data collection and analytics, supporting applications with path selection + load balancing mechanisms optimized by machine learning.  The integration of Qualcomm Technology’s GradientGraph (G2) with 5G / OpenALTO / SRv6. These will be used to demonstrate how applications including XR, auto/V2X, and IoT can benefit from the intelligent routing, rate limiting, and service placement decisions computed by G2.  Demonstrating how the integration of G2 with science networks, through the integration of the LHC Rucio / FTS data management system, AutoGOLE / SENSE virtual circuit and orchestration services / and OpenALTO monitoring may be used towards optimizing large-scale data transfers, sharing data among CERN and scientists at sites across the globe.
  • 21. Global Petascale to Exascale Workflows for Data Intensive Sciences  Advances Embedded and Interoperate within a ‘composable’ architecture of subsystems, components and interfaces, organized into several areas:  Visibility: Monitoring and information tracking and management including IETF ALTO/OpenALTO, BGP-LS, sFlow/NetFlow, Perfsonar, Traceroute, Qualcomm Gradient Graph congestion information, Kubernetes statistics, Prometheus, P4/Inband telemetry  Intelligence: Stateful decisions using composable metrics (policy, priority, network- and site-state, SLA constraints, responses to ‘events’ at sites and in the networks, ...), using NetPredict, Hecate, RL-G2, Yale Bilevel optimization, Coral, Elastiflow/Elastic Stack  Controllability: SENSE/OpenNSA/AutoGOLE, P4/PINS, segment routing with SRv6 and/or PolKA, BGP/PCEP  Network OSes and Tools: GEANT RARE/freeRtr, SONIC, Calico VPP, Bstruct-Mininet environment, ...  Orchestration: SENSE, Kubernetes (+k8s namespace), dedicated code and APIs for interoperation and progressive integration
  • 22. Caltech, StarLight, NRL and Partners at SC22 Microcosm: Creating the Future of SCinet and of Networks for Science
  • 23. GNA-G and its Working Groups Worldwide Partnerships at SC22 and Beyond
  • 24. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (Cont’d)  400 Gbps Next Generation Networks  High-Performance Routing of Science Network Traffic  Creation of an overlay network with PolKA tunnels forming virtual circuits  5G/Edge Computing Application Performance Optimization  Integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph  N-DISE: Named Data Networking for Data Intensive Experiments  Coral: Scalable Data Plane Checking via Distributed, On-Device Verification  High performance networking with the Bella Link and the Sao Paulo Backbone  AmLight Express and Protect (AmLight-EXP)  Global Research Platform (GRP)  Software Defined International Open Exchanges (SDXs) For further Information on NRE-19 Elements and Goals: See the Next Slides
  • 25. NSF CC* Program: Integration and Innovation SCIENTIFIC and BROADER IMPACT ▪ Lay groundwork for an NDN-based data distribution and access system for data-intensive science fields ▪ Benefit user community through lowered costs, faster data access and standardized naming structures ▪ Engage next generation of scientists in emerging concepts of future Internet architectures for data intensive applications ▪ Advance, extend and test the NDN paradigm to encompass the most data intensive research applications of global extent SOLUTIONS + Deliverables CHALLENGES ▪ LHC program in HEP is world’s largest data intensive application: handling One Exabyte by 2019 at hundreds of sites ▪ Global data distribution, processing, access, analysis; large but limited computing, storage, network resources APPROACH ▪ Use Named Data Networking (NDN) to redesign LHC HEP network; optimize workflow N-DISE: NDN for Data Intensive Science Experiments TEAM ▪ Northeastern (PI: Edmund Yeh), Caltech (PI: Harvey Newman) UCLA (PI: Jason Cong), Tennessee Tech (PI: Susmit Shannigrahi) ▪ In partnership with NIST, other LHC sites and the NDN project team ▪ Simultaneously optimize caching (“hot” datasets), forwarding, and congestion control in both the network core and site edges ▪ Development of naming scheme and attributes for fast access and efficient communication in HEP and other fields
  • 26.  Feasibility of NDN-based data distribution system for LHC first demonstrated at SC 18 (Dallas)  SANDIE demo at SC 19 (Denver) showed improved throughput and delay performance  N-DISE demo at SC 21 (done remotely) showed further improvements, ICN ‘22 paper introduced the project to the literature  SC22  Jointly optimized caching and forwarding algorithms developed by Northeastern  High-speed NDN-DPDK forwarder developed by NIST  NDN-based filesystem plugin for XRootD developed by Caltech  FPGA acceleration of NDN-DPDK forwarder by UCLA  Deploying genomics data lakes using K8s over NDN and name-based compute placement to K8 clusters by Tennessee Tech  Goals: (i) rapid and reliable delivery of LHC data over transcontinental layer-2 demo testbed, (ii) demonstration of Kubernetes integration for genomics applications N-DISE
  • 27. Caltech Campus A New Generation Persistent 400G/100G Super-DMZ: CENIC, Pacific Wave, ESnet, Internet2, Caltech, UCSD, StarLight ++ SEA 100GE SNVL SDSC LA Pacific Wave Seattle 1-1 Pacific Wave Sunnyvale 2-1 NCS/Ciena Transponders LAX Agg10 Riser Panel NCS/Ciena Transponders Riser Panel Caltech IMSS Waveserver Ai 200G 200G Caltech IMSS Waveserver Ai CENIC LA 818 W 7th 6th Floor CENIC LA 818 W 7th 10th Floor Cisco NCS 1K4 Arista 7060 DX4 400GE & 100GE Internet2 818 W 7th 10th Floor Starlight CENIC/ Pacific Wave I2 Optical I2 Packet To StarLight To SC22 Dallas 400G 400G ESnet6 Fabric ESnet/Fabric 818 W 7th 10th Floor Pending Connections to CENIC/PacWave Pacific Wave LosA 2-1 and Beyond To Juniper QFX after SC22
  • 29. Next Generation Network-Integrated System  Top Line Message: In order to meet the needs, we need to transition to a new dynamic and adaptive software-driven system, which  Coordinates worldwide networks as a first class resource along with computing and storage, across multiple domains  Simultaneously supports the LHC experiments, other major DIS programs and the larger worldwide academic and research community  Systems design approach: A global dynamic fabric that flexibly allocates, balances and conserves the available network resources Negotiating with site systems that aim to accelerate workflow; Use of ML Builds on ongoing R&D projects: from regional caches/data lakes to intelligent control and data planes to ML-based optimization [E.g. SENSE/AutoGOLE, NOTED, ESNet HT, GEANT/RARE, AmLight, Fabric, Bridges; NetPredict, DeepRoute, ALTO, PolKA ...]  A key milestone: integration of SENSE + network services with FTS & Rucio  We are also leveraging the worldwide move towards a fully programmable ecosystem of networks and end-systems (P4, PINS; SRv6; PolKA), plus operations platforms (OSG, NRP; global SENSE and P4 Testbeds)  The GNA-G and its Working Groups, The LHC experiments together with the WLCG and the worldwide R&E network community, are the key players  Together with leaders and teams from: LBNF/DUNE, VRO, SKA
  • 30. Global Petascale to Exascale Workflows for Data Intensive Sciences  Development Trajectory: Parallel developments + mission-driven progressive interfacing and system-level integration  Architecture: Data Center Analogue  Classes of “Work” (work = transfers, or overall workflow), defined by task parameters and/or priority and policy  Adjust rate of progress in each class to respond to network or site state changes, and “events”  Moderate/balance the rates among the classes to optimize a multivariate objective function with constraints  Overarching Concept: Consistent Network Operations:  Stable load balanced high throughput workflows cross optimally chosen network paths  Provided by autonomous site-resident services dynamically interacting with network-resident services  Up to preset or fliexible high water marks to accommodate other traffic  Responding to (or negotiating with) site demands from the science programs’ principal data distribution and management systems
  • 31. 31 Towards a Next Generation Network-Integrated System For Data Intensive Sciences The Global Network Advancement Group The AutoGOLE/SENSE and Data Intensive Sciences WGs Partnering with GRP and NRP
  • 33. Steps to Arrive at a Fully Functional System by 2028 the Data Challenge Perspective  Three Types of Challenges 1. Functionality Challenge : Where we establish the functionality we want in our software stack, and do so incrementally over time 2. Software Scalability Challenge: Where we take the products that passed the previous challenge, and exercise them at full scale but not on the final hardware infrastructure 3. End-to-end Systems Challenge: On the actual hardware; can only be done once the actual hardware systems are in place.  Targets are 2022-24 for 1 and 2 (or 2024-25 if not all components are ready earlier); 2026-2027 for 3  Remark: it’s conceivable, maybe even likely that it takes multiple attempts to achieve sustained performance at scale with all of the new software we need, with the functionality we want.  + Scaling Challenges: Demonstrate capability to fill ~50% full bandwidth required in the minimal scenario with production-like traffic: Storage to storage, using third party copy protocols and data management services used in production: 2021: 10%; 2023-4: 30%; 2025-6: 60%; 2028: 100% 33 F. Wuerthwein, UCSD/SDSC
  • 34. Rednesp and RNP: Expanding Capacity Among Latin America, US, Europe and Africa  Rednesp (formerly ANSP) has 3 links to the USA, connecting Sao Paulo to Florida.  Atlantic 100G link: direct São Paulo- Miami  Pacific 100G link: São Paulo to Santiago (Chile), from there to Panama, San Juan and Miami.  200 Gbps link: São Paulo to Florida through Fortaleza, on the Monet cable (Angola cables).  RNP also a 200 Gbps link using the Monet cable:  Total 600 Gbps capacity between Brazil and the USA.  Transatlantic Links: The Bella link between São Paulo and Sines in Portugal, and a link connecting São Paulo, Angola and South Africa.  The deployment of the "Backbone SP" has just started. Its main goal is to interconnect most universities in Sao Paulo (state) with 100 gbps links. • The first link connects SP4 to UNIFESP (Federal University at São Paulo). Next, links should connect UNESP (State University of São Paulo), USP (University of São Paulo), UNICAMP
  • 35. BRIDGES: “Binding Research Infrastructures for the Deployment of Global Experimental Science” Across the Atlantic  George Mason and East Carolina Universities (NSF IRNC Program)  High Level View of the Infrastructure and Collaborators:  Virtualization as an Architectural (systems-level) concept, targeting:  Deterministic, predictable performance  Agile, customizable, integrated virtual resource model  Operates as a “fully virtualized” services environment: Users (and Operations personnel) can interact with the BRIDGES services through an interactive web portal, or software agents may interact via a programmatic API to automate resource provisioning and orchestration.
  • 36. (APONet): Closing the Global Ring East and West KREONet2: Closing the Global Ring East and West
  • 37. BRIDGES: “Binding Research Infrastructures for the Deployment of Global Experimental Science” Across the Atlantic
  • 38. AER Update - Aug 2022 ● Since the AER MoU, KAUST is coordinating with REN partners deployment of sharing spare capacity ● KAUST is supporting the following partners by offering point-to-point circuits for submarine cable backup paths: ○ AARnet ○ GÉANT ○ NetherLight ○ NII/SINET ○ SingAREN ● The SC22 NRE Demonstrations will also be supported by KAUST closing the ring from Amsterdam to Singapore and back to the US ○ SC22 NRE Seattle Los Angeles Chicag o SC22 NRE Demos (Caltech) KAUST and the AsiaPacific Oceana Network (APONet): Closing the Global Ring East and West
  • 39. 39 Gua m SingAR EN Hong Kong Daejeo n Los Angeles Seattl e Chicag o Amsterda m Londo n Hawa ii Jedda h Toky o KAU ST Singapo re Cenic KISTI StarLigh t NII/SINE T NetherLig ht GÉANT Open Exchange SC22 Caltech NRE Demonstrations * The dotted paths in the diagram are the ones to be used for the SC22 TransP AC KAU ST SingARE N NII / SINET KISTI Univ. of Hawaii KAUST SC22 Demonstrations (NRE-19)
  • 40. SC22 NREs: AutoGOLE / SENSE and NOTED (Draft) Bill Johnston 9/23/22 SC21: Global footprint. Multiple 400G Optical Wide Area Links SC22: Global footprint. Terabit/sec Triangle Starlight – McLean – Dallas; 400G to LA; 4 X 100G to Caltech and UCSD/SDSC; 3 X 400G to the Caltech Booth
  • 41. Partner NREs at SC22 NRE-001 Edmund Yeh eyeh@ece.neu.edu N-DISE: NDN for Data Intensive Science Experiments NRE-004 Joe Mambretti j-mambretti@northwestern.edu 1.2 Tbps Services WAN Services: Architecture, Technology and Control Systems NRE-005 Joe Mambretti j-mambretti@northwestern.edu 400 Gbps E2E WAN Services: Architecture, Technology and Control Systems NRE-007 Edoardo Martelli edoardo.martelli@cern.ch LHC Networking And NOTED NRE-008 Joe Mambretti j-mambretti@northwestern.edu IRNC Software Defined Exchange (SDX) Multi-Services for Petascale Science NRE-009 Jim Chen jim-chen@northwestern.edu High Speed Network with International P4 Experimental Networks for The Global Research Platform and Other Research Platforms NRE-010 Magnos Martinello magnos.martinello@ufes.br Demonstrating PolKA Routing Approach to Support Traffic Engineering for Data-intensive Science NRE-011 Qiao Xiang xiangq27@gmail.com Coral: Fast Data Plane Verification for Large-Scale Science Networks via Distributed, On-Device Verification NRE-013 Tom Lehman tlehman@es.net AutoGOLE/SENSE: End-to-End Network Services and Workflow Integration NRE-015 Tom Lehman tlehman@es.net SENSE and Rucio/FTS/XRootD Interoperation NRE-016 Marcos Schwarz marcos.schwarz@rnp.br Programmable Networking with P4, GEANT RARE/freeRtr and SONIC/PINS The NRE demonstrations hosted at or partnering with the Caltech Booth 2820:
  • 42. Global Petascale to Exascale Workflows Cornerstone system components and concepts (II)  Adapting NDN for data intensive sciences including advanced cache design and algorithms and parallel code development and methods for fast and efficient access over a global testbed, leveraging the experience in the NDN for Data Intensive Science Experiments (N-DISE) project  Demonstrating a paragon network at many sites equipped with P4 programmable devices, including Tofino and Tofino2-based switches, Smart NICs and Xilinx FPGA- based network interfaces providing packet-by- packet inspection, agile flow management and state tracking, and real-time decisions ● High throughput platform demonstrations in support of workflows for the science programs. Including reference designs of NVMe server systems to match a 400G network core, as well as servers with multi-GPUs and programmable smart NICs with FPGAs. ● Integration of edge-focused extreme telemetry data (from P4 switches, smart edge devices and end hosts) and end facility/application caching stats and other metrics, to facilitate automated decision-making processes. ● Development of dynamic regional caches or “data lakes” that treat nearby as a unified data resource, building on the successful petabyte cache currently in operation between Caltech and UCSD and in ESNet based on the XRootD federated access protocol; extension of the cache concept to more remote sites such as Fermilab, Nebraska and Vanderbilt.
  • 43. Global Petascale to Exascale Workflows Cornerstone system components and concepts (III) ● Creation of an overlay network with PolKA tunnels forming virtual circuits, integrating persistent resources from the GNA-G AutoGOLE/SENSE and GEANT RARE testbeds, to validate the PolKA source routing mechanisms using 100G+ transfers of science data. ● Underlay congestion will be detected by tunnel monitoring and signaled to the overlay so that the traffic is steered away from congested tunnels to other paths. ● Comparisons between segment routing and PolKA regarding controllability and performance metrics are also planned. From the application's point of view. ● PolKA full deployment enables the extreme traffic engineering demands of data- intensive sciences to be met, by offering a new range of network functionalities such as: multipath routing, in-network telemetry and proof-of-transit with path attributes to support higher level stateful traffic engineering decisions. ● Network traffic prediction and engineering optimizations using the latest graph neural network and other emerging deep learning methods, developed by ESnet’s Hecate /DeepRoute project.
  • 44.  LHC:  End to end workflows for large scale data distribution and analysis in support of the CMS experiment’s LHC workflow among Caltech, UCSD, LBL + Fermilab, Nebraska, Vanderbilt, and GridUNESP (Sao Paulo) including automated flow steering, negotiation and DTN autoconfiguration; bursting of some of these workflows to the NERSC HPC facility and the cloud; use of both edge and in-network caches to increase data access and processing efficiency.  An ontological model-driven framework with integration of an analytics engine, API and workflow orchestrator extending work in the SENSE project, enhanced by efficient multi- domain resource state abstractions and discovery mechanisms.  SENSE/AutoGOLE:  The GNA-G SENSE/AutoGOLE WG demonstration will present key technologies, methods and a system of dynamic Layer 2 and Layer 3 virtual circuit services to meet the challenges and address the requirements of the largest data intensive science programs, including the Large Hadron Collider (LHC) the Vera Rubin Observatory and programs challenges and programs in many other disciplines. The services are designed to support multiple petabyte transactions across a global footprint, represented by a persistent testbed spanning the US, Europe, Asia Pacific and Latin American regions.  Global Ring and KAUST: These demonstrations will also showcase the power of collaboration in the global research and education network (REN) community. The demo will have KAUST as a collaborator in the Asia-Pacific Global Ring (AER [*]), closing the global ring by interconnecting Amsterdam to Singapore, and then onto Los Angeles with support from partners including SingaREN, JGN-X and APOnet.  [*] See Fast and stable Network Ring between Asia-Pacific and Europe | AARNet https://www.aarnet.edu.au/fast-and-stable-network-ring-between-asia-pacific-and-europe/ Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (I)
  • 45. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (II)  400 Gbps Next Generation Networks  Science research collaborations continuing to prepare for increased network capacity, e.g., transitioning from 100 Gbps paths to 400 Gbps, 800 Gbps and beyond. An international collaboration led by the StarLight/ Caltech/ UCSD/ NRP/GRP/SCinet consortium, with many will demonstrate the architecture, services and techniques to enable efficient management of high volume science data streams within multi-hundred Gbps channels.  Global Research Platform (GRP):  An international collaboration will demonstrate services and capabilities of the prototype Global Research Platform, an international Science DMZ, optimized for large scale world-wide science projects, especially those based on instruments producing high volumes of data.  Software Defined Exchanges (SDXs):  Several international open exchanges that are developing capabilities for Software Defined Exchanges (SDXs) will showcase the capabilities of these facilities for providing highly programmable capabilities for international science data transport.  AmLight Express and Protect (AmLight-EXP)  In support of the SENSE/AutoGOLE demonstrations and LHC-related use cases will be shown, in association with high-throughput low latency experiments, and demonstrations of auto-recovery from network events, using optical spectrum on the Monet submarine cable, and its 100G ring network that interconnects the research and education communities in the U.S. and South America.
  • 46. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (III)  5G/Edge Computing Application Performance Optimization  UCSD, the Pacific Research Platform, Caltech, Yale and Qualcomm Technologies will attempt to demonstrate the first end-to-end integrated traffic optimization framework based on bottleneck structure analysis for 5G applications. This intended demonstration will focus on showing the integration of Qualcomm technology’s GradientGraph with the IETF ALTO open standard, to support the optimization of edge computing applications such as XR, holographic telepresence, and vehicle networks. Holodecks at UCSD and NYU will be interconnected across the CENIC and NYSERNet regional networks via a transcontinental link.  High-Performance Routing of Science Network Traffic  UCSD, the Pacific Research Platform, Caltech, Yale and Qualcomm Technologies will attempt to demonstrate the first end-to-end integrated traffic optimization framework based on bottleneck structure analysis for 5G applications. This intended demonstration will focus on showing the integration of Qualcomm technology’s GradientGraph with the IETF ALTO open standard, to support the optimization of edge computing applications such as XR, holographic telepresence, and vehicle networks. Holodecks at UCSD and NYU will be interconnected across the CENIC and NYSERNet regional networks via a transcontinental link.  Integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph  We intend to demonstrate the integration of OpenALTO, SRv6 and Qualcomm Technologies’ GradientGraph, showing how applications including XR, Auto/V2X, and holographic telepresence can benefit from the intelligent routing, rate limiting, and service placement decisions computed by the GradientGraph platform. We also intend to demonstrate the integration of OpenALTO and GradientGraph with science networks, Rucio, FTS, AutoGOLE, SENSE and OpenALTO towards optimizing large-scale data transfers from the Large Hadron Collider at CERN to scientists across the globe.
  • 47. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (IV)  SENSE: The Software-defined network for End-to-end Networked Science at Exascale  The SENSE project is building smart network services to accelerate scientific discovery in the era of ‘big data’ driven by Exascale, cloud computing, machine learning and AI. The SENSE SC22 demonstration showcases a comprehensive approach to request and provision end- to- end network services across domains that combines deployment of infrastructure across multiple labs/campuses, SC booths and WAN with a focus on usability, performance and resilience through:  Priority QoS for SENSE enabled flows  Multi-point and point-to-point services  Auto-provisioning of network devices and Data Transfer Nodes (DTNs);  Intent-based, interactive, real time application interfaces providing intuitive access to intelligent SDN services for Virtual Organization (VO) services and managers;  Policy-guided end-to-end orchestration of network resources, coordinated with the science programs' systems, to enable real time orchestration of computing and storage resources.  Real time network measurement, analytics and feedback to provide the foundation for full lifecycle status, problem resolution, resilience and coordination between the SENSE intelligent network services, and the science programs' system services.
  • 48. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (V)  N-DISE: Named Data Networking for Data Intensive Science Experiments  The NDN for Data Intensive Science Experiments (N-DISE) project aims to accelerate the pace of breakthroughs and innovations in data-intensive science fields such as the Large Hadron Collider (LHC) high energy physics program and the BioGenome and human genome projects. Based on Named Data Networking (NDN), a data-centric future Internet architecture, N-DISE will deploy and commission a highly efficient and field-tested petascale data distribution, caching, access and analysis system serving major science programs. The N-DISE project will build on recently developed high-throughput NDN caching and forwarding methods, containerization techniques, leverage the integration of NDN and SDN systems concepts and algorithms with the mainstream data distribution, processing, and management systems of CMS, as well as the integration with Field Programmable Gate Arrays (FPGA) acceleration subsystems, to produce a system capable of delivering LHC and genomic data over a wide area network at throughputs approaching 100 Gbits per second, while dramatically decreasing download times. N-DISE will leverage existing infrastructure and build an enhanced testbed with high performance NDN data cache servers at participating institutions.  The N-DISE demonstration is designed to exhibit improved performance of the N-DISE system for workflow acceleration within large-scale data-intensive programs such as the LHC high energy physics, BioGenome and human genome programs. To achieve high performance, the demonstration will leverage the following key components: (1) the transparent integration of NDN with the current CMS software stack via an NDN based XRootD Open Storage System plugin, (2) joint caching and multipath forwarding capabilities of the VIP algorithm, (3) integration with FPGA acceleration subsystems, (4) SDN support for NDN through the work of the Global Network Advancement Group (GNA-G) and its AutoGOLE/SENSE and Data Intensive Sciences Working Group.
  • 49. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (VI)  High performance networking with the Bella Link and the Sao Paulo Backbone  With the recent availability of an L2 circuit linking the University of São Paulo State (UNESP) to CERN in Geneva, through the Bella Link, Brazil (and São Paulo State) now has three 100 gbps academic links to the USA and to Europe. This new circuit is supposed to present low latency, and that should open new opportunities for collaboration among Brazilian and European academic institutions. All links pass through the SP4 Equinix datacenter in the greater Sao Paulo area. At the SP4 datacenter, rednesp (Research and Education Network at Sao Paulo) has a node called SouthernLight which is part of the GNA Autogole/SENSE testbed.  At the same time, the Rednesp Sao Paulo regional network in finishing the "Backbone SP" connecting 9 important universities in Sao Paulo state. In this demo, we intend to compare properties such as effective bandwidth, latency to different continents and jitter using different paths of the 3 intercontinental links as well as the integration of the "Backbone SP" to international connections. We also expect to show results and metrics of parallel computations using some of the national and international computing facilities connected to this infrastructure.
  • 50. Global Petascale to Exascale Workflows Elements and Goals of the Demonstrations (VII)  Coral: Scalable Data Plane Checking via Distributed, On-Device Verification  Data plane verification (DPV) is important for finding network errors, and therefore a fundamental pillar for achieving consistently operating, autonomous-driving, high-performance science networks. Current DPV tools employ a centralized architecture, where a server collects the data planes of all devices and verifies them. Despite substantial efforts on accelerating DPV, this centralized architecture is inherently non-scalable. To tackle the scalability challenge of DPV, the team of Xiamen University, China, circumvents the scalability bottleneck of centralized design and design Coral, a distributed, on-device DPV framework.  The key insight of Coral is that DPV can be transformed into a counting problem on a directed acyclic graph, which can be naturally decomposed into lightweight tasks executed at network devices, enabling scalability. Coral consists of (1) a declarative requirement specification language, (2) a planner that employs a novel data structure DVNet to systematically decompose global verification into on-device counting tasks, and (3) a distributed verification (DV) protocol that specifies how on-device verifiers communicate task results efficiently to collaboratively verify the requirements. During SC'22, we will demonstrate, through both a testbed reconstruction of the Internet2 WAN environment and large-scale simulations of WAN/LAN/DC with real-world datasets, that Coral consistently achieves scalable DPV under various scenarios. A preliminary manuscript of Coral can be found at [1] and a set of small-scale demos can be found at [2].  [1] Qiao Xiang et al., Switch as a Verifier: Toward Scalable Data Plane Checking via Distributed, On- Device Verification, https://arxiv.org/abs/2205.07808  [2] Qiao Xiang et al., Coral System Functionality Demonstration, http://distributeddpvdemo.tech/
  • 51. Network Traffic Evolution 2021-22 to and from CERN LHCOPN LHCOPN Traffic CERN to Tier1s Overall CERN Traffic: LHCOPN, LHCONE, Internet Out In E. Martelli, CERN/IT
  • 52. HL-LHC Network Needs and Data Challenges Current Understanding: Q2 2021  Export of Raw Data from CERN to the Tier1s (350 Pbytes/Year):  400 Gbps Flat each for ATLAS and CMS Tier1s; +100G each for other data formats; +100 G each for ALICE, LHCb  “Minimal” Scenario [*]: Network Infrastructure from CERN to Tier1s Required  4.8 Tbps Aggregate: Includes 1.2 Tbps Flat (24 X 7 X 365) from the above, x2 to Accommodate Bursts, and x2 for overprovisioning, for operational headroom: including both non-LHC use, and other LHC use.  This includes 1.4 Tbps Across the Atlantic for ATLAS and CMS alone  Note that the above Minimal scenario is where the network is treated as a scarce resource, unlike LHC Run1 and Run2 experience in 2009-18.  In a “Flexible Scenario” [**]: 9.6 Tbps, including 2.7 Tbps Across the Atlantic Leveraging the Network to obtain more flexibility in workload scheduling, increase efficiency, improve turnaround time for production & analysis  In this scenario: Links to Larger Tier1s in the US and Europe: ~ 1 Tbps (some more); Links to Other Tier1s: ~500 Gbps  Tier2 provisioning: 400Gbps bursts, 100G Yearly Avg: ~Petabyte Import in a shift  Need to work with campuses to accommodate this: it may take years [*] NOTE: Matches numbers presented at ESnet Requirements Review (Summer 2020) [**] NOTE: Matches numbers presented at the January 2020 LHCONE/LHCOPN Meeting
  • 53. (Southern) California ((So)Cal) Cache 53 Roughly 30,000 cores across Caltech & UCSD … half typically used for analysis A 2 Pbyte Working Example in Production Plan to include Riverside and other SoCal Tier3s; ESnet plan to install additional in- network caches near US Tier2s; Move to Ceph underway Scaling to HL LHC: ~30 Pbytes Per Tier2, ~10 Pbyte Caches, 1 Pbyte Refresh in a Shift Requires 400G Link. Still relies on use of compact event forms, efficiently managed data transport
  • 54. SENSE: SDN for End-to-end Networked Science at the Exascale ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland 54  Use Cases: Inter-DTN Priority Flows, LHC File Transfer Service (FTS), SENSE/AutoGole Integration (GNA-G), ExaFEL, Big Data Express
  • 55. Internet2 AL2S ESnet AutoGOLE / SENSE Deployment StarLight NetherLight SURFNet MOXY CENIC PacWave RNP AMPATH Southern Light CERN UCSD LOSA SEAT SUNY KREONET/ KRLight KISTI AMPATH RNP MANLAN WIX Site/End Systems Exchange Point Network ESnet Testbed STAR
  • 56. SENSE: SDN for End-to-end Networked Science at the Exascale ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland 56
  • 57. SENSE: SDN for End-to-end Networked Science at the Exascale ESnet Caltech Fermilab LBNL/NERSC Argonne Maryland 57
  • 58. Self-driving Networks| BERKELEY LAB Self-Driving Networks “Networks should learn to drive themselves”* 58 [*] Why (and How) Networks Should Run Themselves, Feamster and Rexford https://doi.org/10.48550/arXiv.1710.11583 Example controllers: Ryu, OpenDaylight, OSCARS Listener APIs Supervised/Unsupervised classification: anomaly detection Regression: forecasting Action plan Network Telemetry Execute Control Reward function Inputs to AI agent Try different optimization strategies and/or objective functions Can take simple actions to achieve key goals. Such as: improving availability, attack resilience and dealing with scale. Our argument is AI is needed for mission critical actions.
  • 59. M. Kiran, C, Guok Talk at APAN2021 Intelligent Controller (HECATE) Case Studies: 1. Model free: Path selection for large data transfers: better load balancing 2. Model Free: Forwarding decisions for complex network topologies: Deep RL to learn optimal packet delivery policies vs. network load level 3. Model Based: Predicting network patterns with Netpredict Self Driving Network Adaptive Routing (e.g. Real time data for routing decisions) Learns to Avoid Congestion Congestion Free  Loss Free Towards 100% Utilization Proactive Fault Repair Mariam Kiran (ESnet) et al. Intelligent Networks DOE Project Use Deep Reinforcement Learning to Optimize network traffic engineering
  • 60. Self-Driving Network for Science 60 Model-based learning Model-free learning Predicting Future Congestion Study traffic patterns action observations e.g.Network telemetry or monitoring reward e.g. flow completion time e.g. Reconfigure network e.g. update flow rules Using Deep Reinforcement Learning for optimizing traffic engineering over networks Kiran et al. Intelligent Networks DOE Project  D-DCRNN: Spatiotemporal forecasting with applications in neuroscience, climate, traffic flow, smart grid, logistics, supply chain: https://arxiv.org/pdf/1707.01926.pdf  Future: Hooks for a Richer, More Stateful Objective Function (Policy, Priority, Deadlines, Path Quality, Co-Flows...); Event Response; Applications to 5G and Quantum Networks Mariam Kiran (ESnet) mkiran@lbl.gov et al
  • 61. Self-driving Networks| BERKELEY LAB HECATE Simulation 61 UR: link utilization rate Moves incoming traffic to less used paths
  • 62. Science Networks are Expanding to Wireless 5G, 6G Sensor Networks and Beyond 62  Statically deployed, unattended network sensors need full-time monitoring  While sensors in mobile networks (including unmanned aerial systems) can self-manage and optimize data collection  For continual monitoring of the sites, data is processed in real-time and transmitted to a central location with 5G speed for further processing  Novel mesh network topologies for flexible inclusion of new sources - much like new cells in the body - are being developed  Can use 5G latency to handle the tradeoff between edge and cloud processing
  • 63. SENSE: SDN Enabled Networks for Science at the Exascale ESnet, Caltech, Fermilab, LBNL https://arxiv.org/abs/2004.05953 Creates Virtual Circuit Overlays. Site and Network RMs, Orchestrator
  • 64. SC20 to 22+: AutoGOLE/SENSE Persistent Testbed: ESnet, SURFnet, Internet2, StarLight, CENIC, Pacific Wave, AmLight, RNP, USP, UFES, GridUNESP, Hawaii, Guam, KISTI, Tokyo,Caltech, UCSD, PRP, FIU, CERN, Fermilab, UMd, DE-KIT Federation with the StarLight GEANT/RARE and AmLight P4 Testbeds Underway 2022-23 Outlook: ESnet6/High Touch KAUST FABRIC BRIDGES US CMS Tier2s GridUNESP, UERJ SKAO AarNet TIFR et al APONet Courtesy T. Lehman Caltech/ UCSD/ LA 100G to 4 X 100 to 400G with CENIC Enhanced for SC22 and Beyond 400G Link(s) NetherLight- CERN Automation Following SENSE, NRP, ALTO, Atlantic Wave SDX Persistent Operations: Beginning this Year NRP
  • 65. ●Goal is to more flexibly control how these are utilized on a per flow, group, or use basis ●Do not want to manage "every" flow in the network, however should be able to manage "any" flow in the network ●There are multiple transatlantic and transpacific links, operated by multiple organizations transatlantic links Important Link Management ● Two timescales: SENSE overlay network of virtual circuits with BW guarantees is relatively stable; IPv6 subnets and Directors provide more dynamic flow handling ● An equally important goal is to understand the load and leave room for other traffic ● Compatible with other network operations ● Result is to make better use of the available resources overall
  • 66. SENSE Services for LHC and Other Open Science Grid (OSG) Workflows: Rucio, FTS, XRootD Integration  Goals center around SENSE providing key network-related capabilities to LHC + 30 other science programs  Primary Motivations:  Developing mechanisms for an application workflow to obtain information regarding the network services, capabilities, and options; to a degree similar to what is possible for compute resources  Giving applications the ability to interact with the network: to exchange information, negotiate performance parameters, discover expected performance metrics, and receive status/ troubleshooting information in real time, during long transactions.  Key pathways: (1) Interfacing and interaction with RUCIO, FTS and XRootD data federation and storage systems, through Engagement: with CMS, ATLAS and those development teams (2) Enabled by the ESnet6 plan: with up to ~75% of link capacity on key routes available for policy-driven dynamic provisioning and management by ~2027-28  4 Phase 1 Year Schedule, Completing in Q1 2023: from system evaluation to design, to prototype deployments and tests, to a plan for the transition to operations and the additional R&D needed 66
  • 67. European Science Data Center OSG Data Federation Vera Rubin Observatory Interfacing to Multiple VOs With FTS/Rucio/XRootD LHC, Dark Matter, n, Heavy Ions, VRO, SKAO, LIGO/Virgo/Kagra; Bioinformatics
  • 68. Beyond Programmability Alone: A Systems Approach Qualcomm Technologies GradientGraph Analytics 68  System Wide information to identify, deal with root causes Key Components • Bottleneck precedence + flow gradient graphs • Impactful flow and flow group ID • Alternate path recommendations www.reservoir.com/gradientgraph/  Objective: Flow performance optimization in high speed networks, with fairness  Approach: Built on a new mathematical Theory of Bottleneck Structures and an analytical framework  Enables operators to understand and precisely control flow(-group) and bottleneck performance  Value: Improved capacity planning, traffic engineering  Greater, more effective network throughput and stability as a function of capacity and cost  "On the Bottleneck Structure of Congestion-Controlled Networks," ACM SIGMETRICS, Boston, June 2020 [https://bit.ly/3eGOPrb]  "Designing Data Center Networks Using Bottleneck Structures," accepted for publication at ACM SIGCOMM 2021 [https://bit.ly/2UZCb1M]  "Computing Bottleneck Structures at Scale for High-Precision Network Performance Analysis," SC 2020 INDIS, November 2020 [https://bit.ly/3BriwaB]  "A Quantitative Theory of Bottleneck Structures for Data Networks", Technical Report, available upon request [https://bit.ly/38u8ARs]
  • 69. Global P4 Lab and Production Grade Programmable Switches Marcos Schwarz SC22 November 14-17 2022, Dallas - TX
  • 70. Motivation Can we increase the rate of evolution without interfering with production networks? To develop and operate end-to-end / multi-domain orchestration services: • Resource reservation (guaranteed bandwidth) • Resource provisioning (Circuits, VRFs) • Underlay observability • Dynamic traffic steering/engineering • Dynamic creation of L3 VPNs • Closed loop multi-domain visibility/intelligence/controllability How can we create/sustain an integration initiative/platform to propose and validate next generation protocol and services? • Proposition: Use programmable P4 devices to experiment on pre-production networks leveraging industry/R&E open ecosystems
  • 71. Global P4 Lab Goals Current (SC22) • Persistent global L3 overlay network based on P4 switches • Intercontinental high capacity transfers (100G and over) exploring multiple source routing solutions (Segment Routing and PolKA) • Management infrastructure and tools used to operate this global network • Core network based on RARE/FreeRtr and edge networks based also on SONiC Future (2023) • Capability to support multiple virtual networks, that implement on the same devices different choices of routing stacks, traditional and SDN based • Integration with initiatives for visibility, controllability and intelligence • Work as a reference state of the art / next generation R&E network
  • 72. Integration Path with other Initiatives