OPENFABRICS INTERFACES LIBFABRIC
Sean Hefty, OFIWG Co-Chair
January, 2019
Intel Corporation
libfabric.org
AGENDA – MOVING FORWARD WITH LIBFABRIC
Software Strategy
• Charter
Architecture
• Object Model
Provider Support
• Utility Providers
Getting Started
• Samples
API Bootstrap
• Calling fi_getinfo()
Developer Guide
• Design motivations, architecture, usage guidance
DESIGN GUIDELINES
Open
Fabrics
Interface
Application
Requirements
Deployment
Requirements
Fabric Vendor
Requirements
Charter: develop interfaces
aligned with user needs
• Protect software investment
• Enable new features without
application changes
• Find the ‘right’ abstraction
• Ensure high-performance
• Analyze impact of changes
under a variety of assumptions
• Build in extensibility
OFI = a portable low-level
network interface
Optimized SW path to HW
•Minimize cache and memory footprint
•Reduce instruction count
•Minimize memory accesses
Scalable
Implementation
Agnostic
Software interfaces aligned with
user requirements
•Careful requirement analysis
Inclusive development effort
•App and HW developers
Good impedance match with
multiple fabric hardware
•InfiniBand, iWarp, RoCE, raw Ethernet,
UDP offload, Omni-Path, GNI, BGQ, …
Open Source User-Centric
libfabric
User-centric interfaces will help foster fabric
innovation and accelerate their adoption
MPI CASE STUDY
• MPICH MPI implementation
• CH3 – internal portable network API
  • Lack of clear network abstractions led to poor performance-related choices
• CH4 – revised internal layering
  • Driven by work done for OFI
  • 6x+ reduction in software overhead!
Moral: in the absence of a good abstraction, each application
will invent its own, which may result in lower performance
Image source: K. Raffenetti et al. Why is MPI So Slow? Analyzing the
Fundamental Limits in Implementing MPI-3.1. SC ’17
MPI_Put: 1342->44
MPI_Isend: 253->59
OFI USER REQUIREMENTS
Give us a high-
level interface!
Give us a low-
level interface!
MPI developers
OFI strives to meet
both requirements
Middleware is primary user
Looking to expand
beyond HPC
GURU / EASY
OFI SOFTWARE DEVELOPMENT STRATEGIES
One Size Does Not Fit All
Fabric Services
User
OFI
Provider
User
OFI
Provider
Provider optimizes for
OFI features
Common optimization
for all apps/providers
Client uses OFI features
User
OFI
Provider
Client optimizes based
on supported features
Provider supports low-level features only
Linux, FreeBSD,
OS X, Windows
TCP and UDP
development support
GURU / EASY
libfabric
OFI LIBFABRIC COMMUNITY
Because of an OFI-provider gap, not
all apps work with all providers
Intel® MPI
Library
MPICH
Open MPI
SHMEM
Sandia
SHMEM
GASNet
Clang
UPC
libfabric Enabled Middleware
Sockets
TCP, UDP
Verbs
Cisco
usNIC
Intel
OPA, PSM
Cray
GNI
AWS EFA
IBM Blue
Gene
Shared
Memory
* Other names and brands may be claimed as the property of others
RxM, RxD, Multi-
Rail, Hooks, …
PMDK, Spark, ZeroMQ, TensorFlow,
MxNET, NetIO, Intel MLSL, rsockets …
Charm++ Chapel
HPE
Gen-Z
Advanced application oriented semantics
Tag
Matching
Scalable memory
registration
Triggered
Operations
Multi-Receive
buffers
Reliable Datagram
Endpoints
Remote Completion
Semantics
Streaming
Endpoints
Shared
Addressing
Unexpected
Message Buffering
Network
Direct
OFI insulates applications
from fabric diversity
Providers
Persistent
Memory
SmartNICs and
FPGAs Acceleration
ARCHITECTURE
Modes
Capabilities
OBJECT-MODEL
OFI only defines
semantic requirements
NIC
Network
Peer address
table
Listener
Command
queues
RDMA
buffers
Manage multiple
CQs and counters
Share wait
objects (fd’s)
example mappings
ENDPOINT TYPES
Unconnected
Connected
ENDPOINT CONTEXTS
Default
Scalable
Endpoints
Shared
Contexts
Tx/Rx completions
may go to the same
or different CQs
Tx/Rx command
‘queues’
Share underlying
command queues
Targets multi-thread
access to hardware
ADDRESS VECTORS
Address Vector (Table)
fi_addr   Fabric Address
0         100:3:50
1         100:3:51
2         101:3:83
3         102:3:64
…         …
Addresses are referenced by an index
- No application storage required
- O(n) memory in provider
- Lookup required on transfers

Address Vector (Map)
fi_addr
100003050
100003051
101003083
102003064
…
OFI returns 64-bit value for address
- n x 8 bytes application memory
- No provider storage required
- Direct addressing possible

User Address (IP:Port)
10.0.0.1:7000
10.0.0.1:7001
10.0.0.2:7000
10.0.0.3:7003
…
Converts portable addressing (e.g. hostname or sockaddr) to fabric specific address
Possible to share AV between processes
example mappings
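The difference between the table and map styles may be easier to see in code. The following is a minimal sketch, not libfabric's implementation: `struct fabric_addr`, `av_table_*`, and `av_map_*` are hypothetical names, and the decimal packing used for the map handle is only an assumption chosen to reproduce the slide's example values (100:3:50 becomes 100003050). It assumes the node and port fields stay below 1000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-field fabric address, like "100:3:50" on the slide. */
struct fabric_addr {
    uint32_t net;
    uint32_t node;
    uint32_t port;
};

/* Table style: the provider stores the addresses (O(n) memory),
 * and the application only holds a small index. */
#define AV_TABLE_MAX 16

struct av_table {
    struct fabric_addr entries[AV_TABLE_MAX];
    size_t count;
};

/* Insert an address; the returned index plays the role of fi_addr. */
uint64_t av_table_insert(struct av_table *av, struct fabric_addr addr)
{
    assert(av->count < AV_TABLE_MAX);
    av->entries[av->count] = addr;
    return av->count++;
}

/* Every transfer has to look the real address up by index. */
struct fabric_addr av_table_lookup(const struct av_table *av, uint64_t fi_addr)
{
    assert(fi_addr < av->count);
    return av->entries[fi_addr];
}

/* Map style: the 64-bit handle encodes the address itself, so the
 * provider stores nothing and the application keeps 8 bytes per peer. */
uint64_t av_map_insert(struct fabric_addr addr)
{
    assert(addr.node < 1000u && addr.port < 1000u);
    return (uint64_t)addr.net * 1000000u + (uint64_t)addr.node * 1000u + addr.port;
}

/* Direct addressing: decode the handle without any table lookup. */
struct fabric_addr av_map_decode(uint64_t fi_addr)
{
    struct fabric_addr addr;
    addr.net = (uint32_t)(fi_addr / 1000000u);
    addr.node = (uint32_t)((fi_addr / 1000u) % 1000u);
    addr.port = (uint32_t)(fi_addr % 1000u);
    return addr;
}
```

The trade-off matches the slide: the table costs a lookup per transfer but no application storage, while the map moves the storage to the application and allows direct addressing.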
DATA TRANSFER TYPES
[Figure: messages msg 1-3 and tagged messages tag 1-3, shown in send order and in receive order]
MSG
Maintain message boundaries, FIFO
TAGGED
Messages carry user ‘tag’ or id
Receiver selects which tag goes with each buffer
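Receiver-side tag selection can be sketched as a search over posted receive buffers. This is an illustrative sketch only: `struct posted_recv` and `match_tagged` are hypothetical names, and the ignore mask (bits that are skipped during comparison) is an assumption that is not on the slide, added because tagged interfaces commonly offer some form of wildcard matching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One receive buffer posted by the application (hypothetical layout). */
struct posted_recv {
    uint64_t tag;     /* tag this buffer is willing to accept */
    uint64_t ignore;  /* bits set here are not compared (wildcard) */
    int in_use;       /* nonzero while the buffer waits for a message */
};

/* Find the first posted receive whose tag matches msg_tag, mark it
 * consumed, and return its index. Returns -1 when nothing matches,
 * which is the "unexpected message" case. */
int match_tagged(struct posted_recv *recvs, size_t n, uint64_t msg_tag)
{
    assert(recvs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!recvs[i].in_use)
            continue;
        /* Compare only the bits that are not ignored. */
        if (((recvs[i].tag ^ msg_tag) & ~recvs[i].ignore) == 0) {
            recvs[i].in_use = 0;
            return (int)i;
        }
    }
    return -1;
}
```

Unlike plain MSG transfers, which fill buffers in FIFO order, the arrival order of tagged messages does not decide which buffer is used; the tag does.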
DATA TRANSFER TYPES
[Figure: data 1-3 sent and received as a stream; datagrams delivered to a multicast group]
MSG Stream
Data sent and received as ‘stream’ (no message boundaries)
Multicast MSG
Send to or receive from multicast group
Uses ‘MSG’ APIs but different endpoint capabilities
Synchronous completion semantics (application always owns buffer)
DATA TRANSFER TYPES
[Figure: RMA writes 1-3 placed into remote memory; atomic operations f(x,y) and g(x,y) applied to data at the target]
RMA
RDMA semantics
Direct reads or writes of remote memory from user perspective
ATOMIC
Specify operation to perform on selected datatype
Format of data at target is known to fabric services
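Because the fabric knows the datatype at the target, an atomic request can be thought of as "apply this operation to these typed elements". The sketch below illustrates only that dispatch on operation and datatype. All names (`enum atomic_op`, `enum atomic_type`, `apply_atomic`) are hypothetical, only two datatypes and three operations are covered, and the loop is sequential, so it does not provide real atomicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation and datatype selectors. */
enum atomic_op { OP_SUM, OP_MIN, OP_MAX };
enum atomic_type { TYPE_UINT32, TYPE_UINT64 };

/* Generate one typed helper per datatype: dst[i] = op(dst[i], src[i]). */
#define DEFINE_APPLY(NAME, T)                                           \
static int NAME(enum atomic_op op, T *dst, const T *src, size_t n)      \
{                                                                       \
    for (size_t i = 0; i < n; i++) {                                    \
        switch (op) {                                                   \
        case OP_SUM: dst[i] = (T)(dst[i] + src[i]); break;              \
        case OP_MIN: if (src[i] < dst[i]) dst[i] = src[i]; break;       \
        case OP_MAX: if (src[i] > dst[i]) dst[i] = src[i]; break;       \
        default: return -1;                                             \
        }                                                               \
    }                                                                   \
    return 0;                                                           \
}

DEFINE_APPLY(apply_u32, uint32_t)
DEFINE_APPLY(apply_u64, uint64_t)

/* Target-side view of an atomic request: the datatype selects how the
 * raw memory is interpreted, the operation selects what is computed.
 * Returns 0 on success, -1 for an unsupported type or operation. */
int apply_atomic(enum atomic_type type, enum atomic_op op,
                 void *target, const void *operand, size_t count)
{
    assert(target != NULL && operand != NULL);
    switch (type) {
    case TYPE_UINT32:
        return apply_u32(op, (uint32_t *)target, (const uint32_t *)operand, count);
    case TYPE_UINT64:
        return apply_u64(op, (uint64_t *)target, (const uint64_t *)operand, count);
    default:
        return -1;
    }
}
```

This is the contrast with RMA on the same slide: an RMA write moves opaque bytes, while an atomic needs the element type to compute a result.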
PROVIDER ARCHITECTURE
RXM PROVIDER
MPI / SHMEM
RxM RDM
MSG MSG MSG MSG
MSG
RDM
MSG
RDM
MSG
RDM
MSG
RDM
Verbs RC
NetworkDirect
TCP
OFI
OFI
High-priority
Broadly used
Primary path for HPC apps
accessing verbs hardware
Optimizes for
hardware features
• CQ data
• Inline data
• SRQ
• Dynamic connections
• Eager messages – small transfers
• Segmentation and reassembly – medium transfers
• Rendezvous – large transfers
• Memory registration cache
• Ideal: tighter provider coupling
Connection
multiplexing
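The three transfer protocols listed above are selected by message size. A minimal sketch of that decision follows. The names (`enum rxm_proto`, `struct proto_limits`, `choose_protocol`, `sar_segment_count`) are hypothetical, and the thresholds are left to the caller because the slide does not state RxM's actual limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three protocols named on the slide. */
enum rxm_proto { PROTO_EAGER, PROTO_SAR, PROTO_RENDEZVOUS };

/* Size thresholds; the values are assumptions supplied by the caller. */
struct proto_limits {
    size_t eager_max;  /* largest message sent eagerly */
    size_t sar_max;    /* largest message sent via segmentation/reassembly */
};

/* Small transfers go eagerly, medium ones are segmented and reassembled,
 * and large ones use rendezvous. */
enum rxm_proto choose_protocol(const struct proto_limits *lim, size_t len)
{
    assert(lim != NULL && lim->eager_max <= lim->sar_max);
    if (len <= lim->eager_max)
        return PROTO_EAGER;
    if (len <= lim->sar_max)
        return PROTO_SAR;
    return PROTO_RENDEZVOUS;
}

/* Number of segments a SAR transfer is split into (ceiling division). */
size_t sar_segment_count(size_t len, size_t seg_size)
{
    assert(seg_size > 0);
    return (len + seg_size - 1) / seg_size;
}
```

For example, with an assumed eager limit of 16 KiB and SAR limit of 128 KiB, a 40000-byte message would use SAR and be split into three 16 KiB segments.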
MPI / SHMEM
RxD RDM
DGRAM DGRAM
DGRAM
RDM
Verbs UD
usNIC
Raw Ethernet
OFI
OFI
Functional
Developing / optimizing
Path for HPC
scalability
• Re-designing for performance and scalability
• Analyzing provider specific optimizations
Reliability, segmentation, reassembly, flow control
UDP
Other..?
Fast development
path for hardware
support
DGRAM
RDM
DGRAM
RDM
DGRAM
RDM
Extend features
of simple RDM
provider
Offload large transfers
RXD PROVIDER
Version
Flags
PID
Region Size
Lock
Command Queue
Response Queue
Peer Address Map
Inject Buffers
SHM Provider
SMR
Shared
Memory Region
SMR SMR
Shared memory
primitives
One-sided and
two-sided transfers
CMA (cross-memory attach)
for large transfers
xpmem support
under consideration
Single command
queue
SHARED MEMORY PROVIDER
Available
HOOKING PROVIDER
Hook
Zero-impact
unless enabled
User
OFI
Core/Util Provider
OFI Core
Always available –
release and debug builds
Intercept calls
to any provider
Debugging, performance analysis,
feature enhancements, testing
PERFORMANCE MONITORING
Performance Data Set
Performance
Management Unit
CPU
Cache
NIC
Event Data
Count
Sum
Event Data
Count
Sum
Event Data
Count
Sum
Cycles
Instructions
Hits
Misses
Performance
‘domains’
?
Inline performance
tracking
Linux RDPMC
Ex: Sample CPU instructions for various code paths
MULTI-RAIL PROVIDER
User
mRail
EP
EP 1 EP 2
EP 1
RDM
OFI
OFI
Active
EP 2
Under
development
Standby
Application or
admin configured
Increase bandwidth
and message rate
Failover
Rail selection
‘plug-in’
Require variable
message support
EP 1
RDM
EP 2
Multiple EPs,
ports, NICs, fabrics
Isolate rail
selection algorithm
One fi_info
structure per rail
TBD: recovery
fallback
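The rail selection "plug-in" idea can be sketched as a function pointer that picks an endpoint index for each transfer. The sketch below is an assumption about how such a policy might look, not the mRail provider's code: `struct rail`, `struct mrail`, `rail_select_fn`, and `select_round_robin` are hypothetical names. It rotates over active rails and falls back to a standby rail only when no active rail is up.

```c
#include <assert.h>
#include <stddef.h>

enum rail_role { RAIL_ACTIVE, RAIL_STANDBY };

/* One underlying endpoint (EP 1, EP 2, ...) and its state. */
struct rail {
    enum rail_role role;
    int up;  /* nonzero while the rail is usable */
};

struct mrail {
    struct rail *rails;
    size_t count;
    size_t next;  /* round-robin cursor */
};

/* Plug-in signature: return the index of the rail to use, or -1. */
typedef int (*rail_select_fn)(struct mrail *mr);

/* Round-robin over active rails; use standby rails only for failover. */
int select_round_robin(struct mrail *mr)
{
    assert(mr != NULL);
    for (int pass = 0; pass < 2; pass++) {
        enum rail_role want = (pass == 0) ? RAIL_ACTIVE : RAIL_STANDBY;
        for (size_t n = 0; n < mr->count; n++) {
            size_t i = (mr->next + n) % mr->count;
            if (mr->rails[i].up && mr->rails[i].role == want) {
                mr->next = (i + 1) % mr->count;
                return (int)i;
            }
        }
    }
    return -1;  /* no usable rail */
}
```

Keeping the policy behind a function pointer isolates the selection algorithm from the rest of the provider, so a bandwidth-oriented or latency-oriented policy could be swapped in without touching the transfer path.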
ACCELERATORS AND HETEROGENEOUS MEMORY
CPU Memory PMEM
(Smart) NIC
Peer Device (GPU)
FPGA
Device
Memory
Device
Memory
APIs assume memory
mapped regions
May not want to
write data through
CPU caches
Memory regions
may not be mapped
Results may be cached by
NIC for long transactions
CPU load/stores
Same coherency
domain
Programmable
offload capabilities
and flow processing
May need to sync
results with CPU
Accelerations may be
available over the fabric
Analyzing
GETTING STARTED
DOWNLOAD LIBFABRIC AND FABTESTS
libfabric.org – redirects to github
Link to all releases
Link to validation tests (fabtests)
Man pages (scroll down)
INSTALLATION
$ tar xvf libfabric-1.7.0.tar.bz2
$ cd libfabric-1.7.0
$ ./configure
$ make
$ sudo make install
Follows common conventions
GitHub only has tar files
RPMs from distro or OFED
• By default will auto-detect which providers are usable
• Can enable/disable providers
  • e.g. --enable-verbs=yes
• --enable-debug
  • Development build
• --help
FI_INFO
fi_info: libfabric utility program
$ fi_info                  Condensed view of usable providers
$ fi_info -l (--list)      List all installed providers
$ fi_info -v (--verbose)   Detailed attributes of usable providers
$ fi_info --env            List all environment variables (description and effect, default values)
FI_LOG_LEVEL=[debug | info | warn | …]
FI_PROVIDER – enable/disable providers
FI_PROVIDER=[tcp,udp,…]
FI_PROVIDER=^[udp,…]
FI_HOOK=[perf]
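The two FI_PROVIDER forms shown above (an include list, and an exclude list introduced by `^`) can be modeled with a small filter function. This is a sketch of the filtering semantics only, under the assumption that the value is a plain comma-separated list; `list_contains` and `provider_enabled` are hypothetical helpers, not libfabric functions.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `name` is a whole entry of the comma-separated `list`. */
static int list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == name_len && strncmp(p, name, len) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}

/* Decide whether provider `name` passes the filter string.
 * "tcp,udp" keeps only the listed providers.
 * "^udp"    keeps every provider except the listed ones.
 * NULL or "" applies no filtering. */
int provider_enabled(const char *filter, const char *name)
{
    assert(name != NULL);
    if (filter == NULL || filter[0] == '\0')
        return 1;
    if (filter[0] == '^')
        return !list_contains(filter + 1, name);
    return list_contains(filter, name);
}
```

In a shell this corresponds to runs such as `FI_PROVIDER=tcp fi_info` or `FI_PROVIDER=^udp fi_info`.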
API BOOTSTRAP
FI_GETINFO
struct fi_info *fi_allocinfo(void);

int fi_getinfo(
    uint32_t version,             /* API version */
    const char *node,             /* node, service, flags: ~getaddrinfo */
    const char *service,
    uint64_t flags,
    const struct fi_info *hints,  /* app needs */
    struct fi_info **info);

void fi_freeinfo(struct fi_info *info);

struct fi_info {
    struct fi_info *next;
    uint64_t caps;                /* API semantics needed, and provider */
    uint64_t mode;                /* requirements for using them */
    uint32_t addr_format;
    size_t src_addrlen;
    size_t dest_addrlen;
    void *src_addr;
    void *dest_addr;
    fid_t handle;
    struct fi_tx_attr *tx_attr;   /* Detailed object attributes */
    struct fi_rx_attr *rx_attr;
    struct fi_ep_attr *ep_attr;
    struct fi_domain_attr *domain_attr;
    struct fi_fabric_attr *fabric_attr;
};
CAPABILITY AND MODE BITS
Capabilities
• Desired services requested by app
• Primary – app must request to use
  • E.g. FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC
• Secondary – provider can indicate availability
  • E.g. FI_SOURCE, FI_MULTI_RECV
Modes
• Requirements placed on the app
• Improves performance when implemented by application
• App indicates which modes it supports
• Provider clears modes not needed
• Sample: FI_CONTEXT, FI_LOCAL_MR
ATTRIBUTES
struct fi_fabric_attr {
    struct fid_fabric *fabric;            /* Already opened resource (if available) */
    char *name;
    char *prov_name;                      /* Provider details; can also use env var to filter */
    uint32_t prov_version;
    uint32_t api_version;
};

struct fi_domain_attr {
    struct fid_domain *domain;            /* Already opened resource (if available) */
    char *name;
    enum fi_threading threading;          /* How resources are allocated among threads for lockless access */
    enum fi_progress control_progress;    /* Do app threads drive transfers */
    enum fi_progress data_progress;
    enum fi_resource_mgmt resource_mgmt;  /* Provider protects against queue overruns */
    enum fi_av_type av_type;
    int mr_mode;
    /* provider limits – fields omitted */
    ...
    uint64_t caps;
    uint64_t mode;
    uint8_t *auth_key;                    /* Secure communication (job key) */
    ...
};
ATTRIBUTES
struct fi_ep_attr {
    enum fi_ep_type type;
    uint32_t protocol;           /* Indicates interoperability */
    uint32_t protocol_version;
    size_t max_msg_size;
    size_t msg_prefix_size;
    size_t max_order_raw_size;   /* Order of data placement between two messages */
    size_t max_order_war_size;
    size_t max_order_waw_size;
    uint64_t mem_tag_format;
    size_t tx_ctx_cnt;           /* Default, shared, or scalable */
    size_t rx_ctx_cnt;
    size_t auth_key_size;
    uint8_t *auth_key;
};
ATTRIBUTES
struct fi_tx_attr {
    uint64_t caps;
    uint64_t mode;
    uint64_t op_flags;
    uint64_t msg_order;    /* Can messages be sent and received out of order */
    uint64_t comp_order;   /* Are completions reported in order */
    size_t inject_size;    /* “Fast” message size */
    size_t size;
    size_t iov_limit;
    size_t rma_iov_limit;
};

struct fi_rx_attr {
    uint64_t caps;
    uint64_t mode;
    uint64_t op_flags;
    uint64_t msg_order;
    uint64_t comp_order;
    size_t total_buffered_recv;
    size_t size;
    size_t iov_limit;
};
THANK YOU
Sean Hefty
OFIWG Co-Chair
• All things libfabric:
• www.libfabric.org
• Meetings:
• https://www.openfabrics.org/my-calendar/
• https://www.openfabrics.org/ofiwg-webex/
• Mailing list
• https://lists.openfabrics.org/mailman/listinfo/ofiwg
LEGAL DISCLAIMER & OPTIMIZATION NOTICE
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for
use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of
that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
• INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE,
TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER
AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
• Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
MEMORY MONITOR AND REGISTRATION CACHE
Notification
Queue
Notification
Queue
Notification
Queue
Memory
Monitor Core
Monitor
‘Plug-in’
MR Map
MR MR MR
Registration Cache
LRU List
Custom Limits
Usage Stats
Provider
Merges overlapping
regions
events
subscribe
Driver notification, hook
alloc/free, provider specific
Tracks
active usage
Internal
API
Get/put MRs
Callbacks to
add/delete MRs
A generic solution
is desired here
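The "merges overlapping regions" behavior of the registration cache can be sketched with a small fixed-size array. This is a simplified illustration, not the libfabric cache: `struct mr_region`, `struct mr_cache`, and `cache_insert` are hypothetical names, regions are half-open ranges `[start, start + len)`, and LRU eviction, usage tracking, and monitor notifications are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A registered memory range: [start, start + len). */
struct mr_region {
    uintptr_t start;
    size_t len;
};

#define CACHE_MAX 8

struct mr_cache {
    struct mr_region regions[CACHE_MAX];
    size_t count;
};

static int regions_overlap(const struct mr_region *a, const struct mr_region *b)
{
    return a->start < b->start + b->len && b->start < a->start + a->len;
}

/* Smallest single region that covers both inputs. */
static struct mr_region merge_regions(const struct mr_region *a,
                                      const struct mr_region *b)
{
    uintptr_t start = a->start < b->start ? a->start : b->start;
    uintptr_t end_a = a->start + a->len;
    uintptr_t end_b = b->start + b->len;
    uintptr_t end = end_a > end_b ? end_a : end_b;
    struct mr_region merged = { start, (size_t)(end - start) };
    return merged;
}

/* Insert a region, absorbing every cached region it overlaps.
 * Returns the region that ends up covering the request. */
struct mr_region cache_insert(struct mr_cache *cache, struct mr_region r)
{
    size_t i = 0;

    while (i < cache->count) {
        if (regions_overlap(&cache->regions[i], &r)) {
            r = merge_regions(&cache->regions[i], &r);
            /* Remove the absorbed entry by moving the last one into its slot. */
            cache->regions[i] = cache->regions[--cache->count];
            i = 0;  /* the grown region may now overlap earlier entries */
        } else {
            i++;
        }
    }
    assert(cache->count < CACHE_MAX);
    cache->regions[cache->count++] = r;
    return r;
}
```

Merging keeps the number of registrations small when an application registers buffers that share pages or overlap, at the cost of holding larger regions pinned.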
PERSISTENT MEMORY
User
Commit complete
RMA
Write
Persistent
Memory
User
Register
PMEM
PMEM MR
• Keep implementation agnostic
  • Handle offload and on-load models
  • Support multi-rail
  • Minimize state footprint
High-availability
model (v1.6)
Evolve APIs to support
other usage models
Documentation
limits use case
New completion
semantic
• Exploration
  • Byte addressable or object aware
  • Single or multi-transfer commit
  • Advanced operations (e.g. atomics)
Work with SNIA (Storage
Networking Industry Association)

Hand gesture recognition PROJECT PPT.pptx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 

OFI Overview 2019 Webinar

  • 1. OPENFABRICS INTERFACES LIBFABRIC Sean Hefty, OFIWG Co-Chair January, 2019 Intel Corporation libfabric.org
  • 2. AGENDA – MOVING FORWARD WITH LIBFABRIC 2 • Charter Software Strategy • Object Model Architecture • Utility Providers Provider Support • Samples Getting Started • Calling fi_getinfo() API Bootstrap Design motivations, architecture, usage guidance Developer Guide libfabric.org
  • 3. DESIGN GUIDELINES 3 Open Fabrics Interface Application Requirements Deployment Requirements Fabric Vendor Requirements Charter: develop interfaces aligned with user needs • Protect software investment • Enable new features without application changes • Find the ‘right’ abstraction • Ensure high-performance • Analyze impact of changes under a variety of assumptions • Build in extensibility OFI = a portable low-level network interface
  • 4. Optimized SW path to HW •Minimize cache and memory footprint •Reduce instruction count •Minimize memory accesses Scalable Implementation Agnostic Software interfaces aligned with user requirements •Careful requirement analysis Inclusive development effort •App and HW developers Good impedance match with multiple fabric hardware •InfiniBand, iWarp, RoCE, raw Ethernet, UDP offload, Omni-Path, GNI, BGQ, … Open Source User-Centric libfabric User-centric interfaces will help foster fabric innovation and accelerate their adoption 4
  • 5. MPI CASE STUDY: MPICH MPI implementation. CH3: internal portable network API • Lack of clear network abstractions led to poor performance-related choices. CH4: revised internal layering • Driven by work done for OFI • 6x+ reduction in software overhead! (MPI_Put: 1342->44; MPI_Isend: 253->59.) Moral: in the absence of a good abstraction, each application will invent its own, which may result in lower performance. Image source: K. Raffenetti et al. Why is MPI So Slow? Analyzing the Fundamental Limits in Implementing MPI-3.1. SC ’17
  • 6. OFI USER REQUIREMENTS: "Give us a high-level interface!" "Give us a low-level interface!" (MPI developers). OFI strives to meet both requirements. Middleware is primary user. Looking to expand beyond HPC. (Scale: EASY to GURU)
  • 7. OFI SOFTWARE DEVELOPMENT STRATEGIES: One Size Does Not Fit All. Fabric Services (User / OFI / Provider layers, three strategies): (1) Provider optimizes for OFI features. (2) Common optimization for all apps/providers; client uses OFI features. (3) Client optimizes based on supported features; provider supports low-level features only. Linux, FreeBSD, OS X, Windows. TCP and UDP development support. (Scale: EASY to GURU)
  • 8. libfabric OFI LIBFABRIC COMMUNITY 8 Because of an OFI-provider gap, not all apps work with all providers Intel® MPI Library MPICH Open MPI SHMEM Sandia SHMEM GASNet Clang UPC libfabric Enabled Middleware Sockets TCP, UDP Verbs Cisco usNIC Intel OPA, PSM Cray GNI AWS EFA IBM Blue Gene Shared Memory * Other names and brands may be claimed as the property of others RxM, RxD, Multi- Rail, Hooks, … PMDK, Spark, ZeroMQ, TensorFlow, MxNET, NetIO, Intel MLSL, rsockets … Charm++ Chapel HPE Gen-Z Advanced application oriented semantics Tag Matching Scalable memory registration Triggered Operations Multi-Receive buffers Reliable Datagram Endpoints Remote Completion Semantics Streaming Endpoints Shared Addressing Unexpected Message Buffering Network Direct OFI insulates applications from fabric diversity Providers Persistent Memory SmartNICs and FPGAs Acceleration
  • 11. OBJECT-MODEL 11 OFI only defines semantic requirements NIC Network Peer address table Listener Command queues RDMA buffers Manage multiple CQs and counters Share wait objects (fd’s) example mappings
  • 13. ENDPOINT CONTEXTS 13 Default Scalable Endpoints Shared Contexts Tx/Rx completions may go to the same or different CQs Tx/Rx command ‘queues’ Share underlying command queues Targets multi-thread access to hardware
  • 14. ADDRESS VECTORS: Converts portable addressing (e.g. hostname or sockaddr) to fabric-specific address. Possible to share AV between processes. Example mappings (user address IP:Port -> fabric address): 10.0.0.1:7000 -> 100:3:50; 10.0.0.1:7001 -> 100:3:51; 10.0.0.2:7000 -> 101:3:83; 10.0.0.3:7003 -> 102:3:64; … Address Vector (Table): fi_addr is an index (0, 1, 2, 3, …). Addresses are referenced by an index - No application storage required - O(n) memory in provider - Lookup required on transfers. Address Vector (Map): fi_addr is a value such as 100003050, 100003051, 101003083, 102003064, … OFI returns 64-bit value for address - n x 8 bytes application memory - No provider storage required - Direct addressing possible
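The trade-off between the two address vector styles can be sketched in plain C. This is only an illustrative model, not libfabric code: `struct fabric_addr`, the `av_table_*` and `av_map_*` helpers, the fixed capacity, and the bit layout of the map handle are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fabric address (e.g. 100:3:50); not a libfabric type. */
struct fabric_addr { uint16_t net, sub, host; };

/* "Table" style: the provider stores addresses, the app holds an index. */
#define AV_TABLE_CAP 8
struct av_table {
    struct fabric_addr entries[AV_TABLE_CAP]; /* O(n) memory on the provider side */
    size_t count;
};

/* Insert an address and return its index, which serves as the handle. */
static uint64_t av_table_insert(struct av_table *av, struct fabric_addr a)
{
    assert(av->count < AV_TABLE_CAP);
    av->entries[av->count] = a;
    return av->count++;
}

/* A lookup is needed on every transfer to recover the address. */
static struct fabric_addr av_table_lookup(const struct av_table *av, uint64_t idx)
{
    assert(idx < av->count);
    return av->entries[idx];
}

/* "Map" style: the address is encoded in the 64-bit handle itself,
 * so nothing is stored here and the app keeps 8 bytes per peer. */
static uint64_t av_map_insert(struct fabric_addr a)
{
    return ((uint64_t)a.net << 32) | ((uint64_t)a.sub << 16) | a.host;
}

/* Decoding is pure arithmetic: no table lookup on the transfer path. */
static struct fabric_addr av_map_decode(uint64_t h)
{
    struct fabric_addr a = {
        (uint16_t)(h >> 32), (uint16_t)(h >> 16), (uint16_t)h
    };
    return a;
}
```

In this model the table handle is only meaningful together with the table, while the map handle can be used directly, which mirrors the "lookup required" versus "direct addressing possible" contrast on the slide.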
  • 15. DATA TRANSFER TYPES: MSG: maintain message boundaries, FIFO. TAGGED: messages carry user ‘tag’ or id; receiver selects which tag goes with each buffer. (Figure: flow of msg 1-3 and tag 1-3 between sender and receiver.)
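How a receiver might pair incoming tags with posted buffers can be sketched as follows. This is a simplified model under stated assumptions: `struct posted_recv`, `tag_matches`, and `match_message` are hypothetical names, and the `ignore` mask is modelled loosely on the `ignore` argument that libfabric's tagged receive call accepts. A real provider's matching rules (ordering, source address matching, unexpected-message buffering) are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical posted receive: a buffer id plus the tag it will accept. */
struct posted_recv {
    int      buf_id;
    uint64_t tag;
    uint64_t ignore;  /* bits set here are not compared */
    int      in_use;  /* nonzero once a message has matched */
};

/* A message matches when all non-ignored tag bits are equal. */
static int tag_matches(uint64_t msg_tag, const struct posted_recv *r)
{
    return ((msg_tag ^ r->tag) & ~r->ignore) == 0;
}

/* Return the buffer id of the first free matching receive, or -1. */
static int match_message(struct posted_recv *recvs, size_t n, uint64_t msg_tag)
{
    assert(recvs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!recvs[i].in_use && tag_matches(msg_tag, &recvs[i])) {
            recvs[i].in_use = 1;
            return recvs[i].buf_id;
        }
    }
    return -1; /* no receive is posted for this tag */
}
```

Unlike plain MSG transfers, which consume receive buffers in FIFO order, the arrival order of tagged messages does not decide which buffer each one lands in; the tag does.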
  • 16. DATA TRANSFER TYPES 16 data 2 data 1data 3 data 1 data 2 data 3 MSG Stream dgram dgram dgram dgram dgram Multicast MSG Send to or receive from multicast group Data sent and received as ‘stream’ (no message boundaries) Uses ‘MSG’ APIs but different endpoint capabilities Synchronous completion semantics (application always owns buffer)
  • 17. 17 DATA TRANSFER TYPES write 2 write 1write 3 write 1 write 3 write 2 RMA RDMA semantics Direct reads or writes of remote memory from user perspective Specify operation to perform on selected datatype f(x,y) ATOMIC f(x,y) f(x,y) f(x,y) f() y g() y x g(x,y) g(x,y) g(x,y) x x Format of data at target is known to fabric services
  • 19. RXM PROVIDER: RDM endpoints for MPI / SHMEM layered over MSG endpoints (Verbs RC, NetworkDirect, TCP). Status: high-priority, broadly used. Primary path for HPC apps accessing verbs hardware. Optimizes for hardware features • CQ data • Inline data • SRQ. Connection multiplexing • Dynamic connections • Eager messages: small transfers • Segmentation and reassembly: medium transfers • Rendezvous: large transfers • Memory registration cache • Ideal: tighter provider coupling
  • 20. RXD PROVIDER: RDM endpoints for MPI / SHMEM layered over DGRAM endpoints (Verbs UD, usNIC, raw Ethernet, UDP, other..?). Status: functional; developing / optimizing. Path for HPC scalability. Reliability, segmentation, reassembly, flow control. Fast development path for hardware support. Extend features of simple RDM provider. Offload large transfers. • Re-designing for performance and scalability • Analyzing provider-specific optimizations
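The size-based split between eager, segmentation-and-reassembly (SAR), and rendezvous transfers can be sketched as a simple selection function. The thresholds and the names `choose_proto` and `sar_segments` are assumptions chosen for illustration; the slide does not state where the boundaries lie, and a real provider would pick and tune its own limits.

```c
#include <assert.h>
#include <stddef.h>

enum xfer_proto { PROTO_EAGER, PROTO_SAR, PROTO_RENDEZVOUS };

/* Hypothetical thresholds; not taken from any provider. */
#define EAGER_LIMIT (16u * 1024u)   /* up to here: send in one message */
#define SAR_LIMIT   (128u * 1024u)  /* up to here: split into segments */

/* Pick a protocol from the message length alone. */
static enum xfer_proto choose_proto(size_t len)
{
    if (len <= EAGER_LIMIT)
        return PROTO_EAGER;       /* small transfers */
    if (len <= SAR_LIMIT)
        return PROTO_SAR;         /* medium transfers */
    return PROTO_RENDEZVOUS;      /* large transfers */
}

/* Number of eager-sized segments needed to carry a SAR transfer. */
static size_t sar_segments(size_t len)
{
    assert(len > 0);
    return (len + EAGER_LIMIT - 1) / EAGER_LIMIT;
}
```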
  • 21. Version Flags PID Region Size Lock Command Queue Response Queue Peer Address Map Inject Buffers SHM Provider SMR Shared Memory Region SMR SMR Shared memory primitives One-sided and two-sided transfers CMA (cross-memory attach) for large transfers xpmem support under consideration Single command queue SHARED MEMORY PROVIDER 21 Available
  • 22. HOOKING PROVIDER Hook Zero-impact unless enabled User OFI Core/Util Provider OFI Core Always available – release and debug builds Intercept calls to any provider Debugging, performance analysis, feature enhancements, testing 22
  • 23. PERFORMANCE MONITORING Performance Data Set Performance Management Unit CPU Cache NIC Event Data Count Sum Event Data Count Sum Event Data Count Sum Cycles Instructions Hits Misses Performance ‘domains’ ? Inline performance tracking Linux RDPMC Ex: Sample CPU instructions for various code paths
  • 24. MULTI-RAIL PROVIDER User mRail EP EP 1 EP 2 EP 1 RDM OFI OFI Active EP 2 Under development Standby Application or admin configured Increase bandwidth and message rate Failover Rail selection ‘plug-in’ Require variable message support EP 1 RDM EP 2 Multiple EPs, ports, NICs, fabrics Isolate rail selection algorithm One fi_info structure per rail TBD: recovery fallback 24
  • 25. ACCELERATORS AND HETEROGENEOUS MEMORY CPU Memory PMEM (Smart) NIC Peer Device (GPU) FPGA Device Memory Device Memory APIs assume memory mapped regions May not want to write data through CPU caches Memory regions may not be mapped Results may be cached by NIC for long transactions CPU load/stores Same coherency domain Programmable offload capabilities and flow processing May need to sync results with CPU Accelerations may be available over the fabric Analyzing
  • 27. DOWNLOAD LIBFABRIC AND FABTESTS 27 libfabric.org – redirects to github Link to all releases Link to validation tests (fabtests) Man pages (scroll down)
  • 28. INSTALLATION
      $ tar xvf libfabric-1.7.0.tar.bz2
      $ cd libfabric-1.7.0
      $ ./configure
      $ make
      $ sudo make install
    Follows common conventions. Github only has tar files; RPMs from distro or OFED. • By default will auto-detect which providers are usable • Can enable/disable providers • e.g. --enable-verbs=yes • --enable-debug • Development build • --help
  • 29. FI_INFO: libfabric utility program.
      $ fi_info                  Condensed view of usable providers
      $ fi_info -l (--list)      List all installed providers
      $ fi_info -v (--verbose)   Detailed attributes of usable providers
      $ fi_info --env            List all environment variables (description and effect, default values)
    FI_LOG_LEVEL=[debug | info | warn | …] FI_PROVIDER – enable/disable providers: FI_PROVIDER=[tcp,udp,…] FI_PROVIDER=^[udp,…] FI_HOOK=[perf]
  • 31. FI_GETINFO
      struct fi_info *fi_allocinfo(void);
      int fi_getinfo(uint32_t version, const char *node, const char *service, uint64_t flags, struct fi_info *hints, struct fi_info **info);
      void fi_freeinfo(struct fi_info *info);
      struct fi_info {
          struct fi_info *next;
          uint64_t caps;
          uint64_t mode;
          uint32_t addr_format;
          size_t src_addrlen;
          size_t dest_addrlen;
          void *src_addr;
          void *dest_addr;
          fid_t handle;
          struct fi_tx_attr *tx_attr;
          struct fi_rx_attr *rx_attr;
          struct fi_ep_attr *ep_attr;
          struct fi_domain_attr *domain_attr;
          struct fi_fabric_attr *fabric_attr;
      };
    Notes: version = API version; node, service, flags ~ getaddrinfo; hints = app needs; caps, mode = API semantics needed, and provider requirements for using them; *_attr = detailed object attributes
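Since the slide likens fi_getinfo() to getaddrinfo(), the calling pattern (fill in hints, receive a linked list of candidates, walk it, free it) can be illustrated with the standard POSIX call itself. This sketch deliberately does not call libfabric; `count_candidates` is a hypothetical helper, and the comments point out which step corresponds to which part of the fi_getinfo() flow.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Count the candidates returned for a numeric host and port.
 * Returns -1 if the lookup fails. */
static int count_candidates(const char *node, const char *service)
{
    struct addrinfo hints, *res = NULL, *cur;
    int n = 0;

    assert(node != NULL && service != NULL);

    /* Hints describe what the caller needs, like the fi_info hints. */
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* numeric address only, no DNS */

    if (getaddrinfo(node, service, &hints, &res) != 0)
        return -1;

    /* Results form a linked list, walked here via ai_next
     * (the fi_info list is walked via its next field). */
    for (cur = res; cur != NULL; cur = cur->ai_next)
        n++;

    freeaddrinfo(res);  /* counterpart of fi_freeinfo() */
    return n;
}
```

For example, `count_candidates("127.0.0.1", "7000")` performs the lookup without any name resolution. With libfabric the application would typically pick one entry from the returned list and use its attributes to open the fabric, domain, and endpoint.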
  • 32. CAPABILITY AND MODE BITS 32 • Desired services requested by app • Primary – app must request to use • E.g. FI_MSG, FI_RMA, FI_TAGGED, FI_ATOMIC • Secondary – provider can indicate availability • E.g. FI_SOURCE, FI_MULTI_RECV Capabilities • Requirements placed on the app • Improves performance when implemented by application • App indicates which modes it supports • Provider clears modes not needed • Sample: FI_CONTEXT, FI_LOCAL_MR Modes
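The negotiation described above (the app requests capabilities and declares the modes it supports; the provider clears the modes it does not need) can be modelled with bit masks. The bit values and the `negotiate` function below are hypothetical and exist only to illustrate the logic; they are not libfabric's definitions of FI_MSG, FI_CONTEXT, and so on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define CAP_MSG       (1ull << 0)
#define CAP_RMA       (1ull << 1)
#define CAP_TAGGED    (1ull << 2)
#define MODE_CONTEXT  (1ull << 0)
#define MODE_LOCAL_MR (1ull << 1)

/* Match an application's request against one provider's properties.
 * Returns 1 if the provider is usable and stores the modes the app
 * must then implement in *required_modes; returns 0 otherwise. */
static int negotiate(uint64_t app_caps, uint64_t app_modes,
                     uint64_t prov_caps, uint64_t prov_modes,
                     uint64_t *required_modes)
{
    assert(required_modes != NULL);
    *required_modes = 0;

    /* Every requested capability must be offered by the provider. */
    if ((app_caps & prov_caps) != app_caps)
        return 0;
    /* Every mode the provider requires must be supported by the app. */
    if ((prov_modes & app_modes) != prov_modes)
        return 0;

    /* Modes the app offered but the provider does not need are cleared. */
    *required_modes = prov_modes;
    return 1;
}
```

An app that supports both modes but is matched with a provider needing only the context mode gets back just that one bit, so it can skip the work tied to the other mode.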
  • 33. ATTRIBUTES struct fi_fabric_attr { struct fid_fabric *fabric; char *name; char *prov_name; uint32_t prov_version; uint32_t api_version; }; 33 struct fi_domain_attr { struct fid_domain *domain; char *name; enum fi_threading threading; enum fi_progress control_progress; enum fi_progress data_progress; enum fi_resource_mgmt resource_mgmt; enum fi_av_type av_type; int mr_mode; /* provider limits – fields omitted */ ... uint64_t caps; uint64_t mode; uint8_t *auth_key; ... }; Provider details Can also use env var to filter Already opened resource (if available) How resources are allocated among threads for lockless access Provider protects against queue overruns Do app threads drive transfers Secure communication (job key)
  • 34. ATTRIBUTES struct fi_ep_attr { enum fi_ep_type type; uint32_t protocol; uint32_t protocol_version; size_t max_msg_size; size_t msg_prefix_size; size_t max_order_raw_size; size_t max_order_war_size; size_t max_order_waw_size; uint64_t mem_tag_format; size_t tx_ctx_cnt; size_t rx_ctx_cnt; size_t auth_key_size; uint8_t *auth_key; }; 34 Indicates interoperability Order of data placement between two messages Default, shared, or scalable
  • 35. ATTRIBUTES struct fi_tx_attr { uint64_t caps; uint64_t mode; uint64_t op_flags; uint64_t msg_order; uint64_t comp_order; size_t inject_size; size_t size; size_t iov_limit; size_t rma_iov_limit; }; 35 struct fi_rx_attr { uint64_t caps; uint64_t mode; uint64_t op_flags; uint64_t msg_order; uint64_t comp_order; size_t total_buffered_recv; size_t size; size_t iov_limit; }; Are completions reported in order “Fast” message size Can messages be sent and received out of order
  • 36. THANK YOU Sean Hefty OFIWG Co-Chair • All things libfabric: • www.libfabric.org • Meetings: • https://www.openfabrics.org/my-calendar/ • https://www.openfabrics.org/ofiwg-webex/ • Mailing list • https://lists.openfabrics.org/mailman/listinfo/ofiwg
  • 37. LEGAL DISCLAIMER & OPTIMIZATION NOTICE. Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks. INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
  • 38. MEMORY MONITOR AND REGISTRATION CACHE Notification Queue Notification Queue Notification Queue Memory Monitor Core Monitor ‘Plug-in’ MR Map MR MR MR Registration Cache LRU List Custom Limits Usage Stats Provider Merges overlapping regions events subscribe Driver notification, hook alloc/free, provider specific Tracks active usage Internal API Get/put MRs Callbacks to add/delete MRs A generic solution is desired here 38
  • 39. PERSISTENT MEMORY (Diagram: User, Commit complete, RMA Write, Persistent Memory, User, Register PMEM, PMEM MR.) Keep implementation agnostic: • Handle offload and on-load models • Support multi-rail • Minimize state footprint. High-availability model (v1.6). Evolve APIs to support other usage models. Documentation limits use case. New completion semantic. Exploration: • Byte addressable or object aware • Single or multi-transfer commit • Advanced operations (e.g. atomics). Work with SNIA (Storage Networking Industry Association)