Mehr von Naoto MATSUMOTO (20)
Latest Trends and Concrete Examples in High-Speed Networking (ENOG58 Meeting)
- 3. For those who would rather not rely on the CPU for packet processing.
# mst start
# mlxconfig -d /dev/mst/mt4121_pciconf0 set SRIOV_EN=1
# mlxconfig -d /dev/mst/mt4121_pciconf0 set NUM_OF_VFS=32
# sync; sync; sync; reboot
# echo 1 > /sys/class/net/enp1s0f0/device/sriov_numvfs
# echo 0000:01:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
# devlink dev eswitch set pci/0000:01:00.0 mode switchdev
# echo 0000:01:00.1 > /sys/bus/pci/drivers/mlx5_core/bind
# apt install openvswitch-switch -y
# /etc/init.d/openvswitch-switch start
# ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# /etc/init.d/openvswitch-switch restart
# ovs-vsctl add-br ovs-sriov
# ovs-vsctl add-port ovs-sriov enp1s0f0
# ovs-vsctl add-port ovs-sriov enp1s0f0_0
# ifconfig enp1s0f0 up (*PF)
# ifconfig enp1s0f0_0 up (*VF representor)
# ip netns add TEST (*namespace TEST)
# ip link set enp1s0f1 netns TEST
# ip netns exec TEST ifconfig enp1s0f1 up (*VF)
# ip netns exec TEST dhclient enp1s0f1 (*VF is assigned IP address 1.2.3.4)
# ip netns exec TEST ping 8.8.8.8
Mellanox ASAP2 Direct / Full OVS Offload
Does not use the CPU
- 5. That said, these days it is this easy to try out an L4 load balancer (L4LB) on a H/W offload NIC.
# dpkg -i linux-headers-4.19.0-041900rc3_4.19.0-041900rc3.201809120832_all.deb
# dpkg -i linux-headers-4.19.0-041900rc3-generic_4.19.0-041900rc3.201809120832_amd64.deb
# dpkg -i linux-modules-4.19.0-041900rc3-generic_4.19.0-041900rc3.201809120832_amd64.deb
# dpkg -i linux-image-unsigned-4.19.0-041900rc3-generic_4.19.0-041900rc3.201809120832_amd64.deb
# sync; sync; sync; reboot
# dpkg -i agilio-bpf-firmware-2.0.6.121-1.deb
# dpkg -i bpftool-4.18_amd64.deb
# modprobe -r nfp; modprobe nfp
# cd /opt
# apt install elfutils libelf-dev libmnl-dev bison flex pkg-config
# git clone https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git
# cd iproute2-next; ./configure; make; make install
# ip link set dev enp101s0np0 xdpoffload obj l4lb_xdp.o sec xdp
# ./l4lb_map.py -i enp101s0np0 -f ./destination_samples/32_destinations.csv
# ./l4lb_stats.py -i enp101s0np0
== Load balancer outbound statistics [Offload] ==
1 10.0.0.57 0 pkts/s 0 bits/s
2 10.0.125.19 0 pkts/s 0 bits/s
3 10.0.129.60 0 pkts/s 0 bits/s
:
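The statistics output above is easy to post-process if you want totals per destination. The following is only a minimal sketch: `parse_l4lb_stats` and `DestStat` are hypothetical names, and the row format (index, address, integer `pkts/s`, integer `bits/s`) is assumed from the three rows shown on the slide, not taken from the `l4lb_stats.py` source.

```python
import re
from typing import List, NamedTuple


class DestStat(NamedTuple):
    """One row of the load balancer statistics (hypothetical record type)."""
    index: int
    address: str
    pkts_per_s: int
    bits_per_s: int


# Assumed row layout: "<index> <address> <n> pkts/s <n> bits/s"
_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+pkts/s\s+(\d+)\s+bits/s\s*$")


def parse_l4lb_stats(text: str) -> List[DestStat]:
    """Return one DestStat per matching row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            rows.append(DestStat(int(m[1]), m[2], int(m[3]), int(m[4])))
    return rows


SAMPLE = """\
== Load balancer outbound statistics [Offload] ==
1 10.0.0.57 0 pkts/s 0 bits/s
2 10.0.125.19 0 pkts/s 0 bits/s
3 10.0.129.60 0 pkts/s 0 bits/s
"""

stats = parse_l4lb_stats(SAMPLE)
total_pps = sum(s.pkts_per_s for s in stats)
print(f"{len(stats)} destinations, {total_pps} pkts/s in total")
```

If the real tool prints scaled units or fractional rates, the pattern would need to be adjusted accordingly.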
- 7. "Just use a GPU!" has long been a common refrain in discussions like this, but here is what it actually looks like.
a = cp.array(R, dtype=np.uint8) 2.27 sec
a = np.array(R, dtype=np.uint8) 0.46 sec
cp.sort(a) 0.54 sec
np.sort(a) 15.1 sec
# apt install python-pip
# pip install --upgrade pip
# pip install --upgrade setuptools
# pip install numpy cupy
# python
import time
import cupy as cp
import numpy as np
from numpy.random import *
R = randint(0,100,600000000)
a = cp.array(R, dtype=cp.uint8)
cp.sort(a)
numpy (CPU): 15.56 sec (processing time)
cupy (GPU): 2.81 sec (processing time)
↑ Transfer of the array data into GPU memory space
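The four timings on this slide can be combined to show where the GPU time actually goes. The sketch below only redoes the arithmetic on the numbers quoted above; the dictionary layout and the names `TIMINGS` and `total` are illustrative assumptions, and no GPU is needed to run it.

```python
# Timings in seconds, copied from the slide (array creation, then sort).
TIMINGS = {
    "numpy (CPU)": {"array": 0.46, "sort": 15.1},
    "cupy (GPU)": {"array": 2.27, "sort": 0.54},
}


def total(t):
    """End-to-end time: building the array plus sorting it."""
    return t["array"] + t["sort"]


cpu = TIMINGS["numpy (CPU)"]
gpu = TIMINGS["cupy (GPU)"]

# Speedup when only the sort itself is compared.
sort_speedup = cpu["sort"] / gpu["sort"]
# Speedup when array creation (host-to-GPU transfer for cupy) is included.
end_to_end_speedup = total(cpu) / total(gpu)
# Fraction of the GPU run spent in cp.array rather than in cp.sort.
gpu_array_share = gpu["array"] / total(gpu)

print(f"CPU total: {total(cpu):.2f} s, GPU total: {total(gpu):.2f} s")
print(f"sort-only speedup:  {sort_speedup:.1f}x")
print(f"end-to-end speedup: {end_to_end_speedup:.1f}x")
print(f"share of GPU time in cp.array: {gpu_array_share:.0%}")
```

Under these numbers the sort alone is far faster on the GPU, but most of the GPU run is spent getting the data into GPU memory, which is the point the slide makes.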
- 8. For those who still want to speed something up with a GPU, there are also tools like this.
# apt install -y curl apt-transport-https
# useradd -U mapd; ufw disable; ufw enable; ufw allow 9092/tcp; ufw allow 22/tcp
# curl https://releases.mapd.com/ce/mapd-ce-cuda.list | sudo tee /etc/apt/sources.list.d/mapd.list
# curl https://releases.mapd.com/GPG-KEY-mapd | sudo apt-key add -
# apt update
# apt install -y mapd
# vi ~/.bashrc
export MAPD_USER=mapd
export MAPD_GROUP=mapd
export MAPD_STORAGE=/var/lib/mapd
export MAPD_PATH=/opt/mapd
# source ~/.bashrc
# mkdir -p $MAPD_STORAGE; chown -R $MAPD_USER $MAPD_STORAGE
# cd $MAPD_PATH/system; ./install_mapd_systemd.sh; cd $MAPD_PATH
# systemctl start mapd_server; systemctl enable mapd_server
# systemctl start mapd_web_server; systemctl enable mapd_web_server
# $MAPD_PATH/insert_sample_data
# $MAPD_PATH/bin/mapdql -t
Password: HyperInteractive
mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination", AVG(airtime) AS "Average Airtime"
FROM flights_2008_10k WHERE distance <= 33 GROUP BY origin_city, dest_city;
Execution time: 1268 ms, Total time: 1269 ms
CUDA9.1 (MapD Community Edition 3.4.0)
- 10. With PCI Express 4.0, bus speeds have recently gotten faster as well, which makes things a lot of fun.
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
Vendor ID: AuthenticAMD
# lspci -vv
01:00.0 Non-Volatile memory controller: Phison Electronics Corporation Device 5016 ...
Capabilities: [80] Express (v2) Endpoint, MSI 00
LnkCap: Port #1, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited,
LnkSta: Speed 16GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
# lspci -vv (PCIe 2.0 to 4.0 on BIOS Config)
PCIe2.0: LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+
PCIe3.0: LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk+
PCIe4.0: LnkSta: Speed 16GT/s, Width x4, TrErr- Train- SlotClk+
# fio --directory=/root/ --rw=read --bs=4k --size=10G --numjobs=16 ...
PCIe2.0: read: IOPS=443k, BW=1732MiB/s (1816MB/s)(160GiB/94590msec)
PCIe3.0: read: IOPS=876k, BW=3421MiB/s (3587MB/s)(160GiB/47889msec)
PCIe4.0: read: IOPS=1209k, BW=4724MiB/s (4954MB/s)(160GiB/34681msec)
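To put the fio results in context, the measured bandwidth can be compared with the raw bandwidth of an x4 link at each generation. The sketch below is a rough estimate under stated assumptions: it accounts only for line encoding (8b/10b for PCIe 2.0, 128b/130b for PCIe 3.0 and 4.0) and ignores packet and protocol overhead. The names `GENERATIONS` and `link_mb_per_s` are hypothetical.

```python
LANES = 4  # "Width x4" in the lspci output

# generation -> (GT/s per lane, line-encoding efficiency, measured fio MB/s)
GENERATIONS = {
    "PCIe2.0": (5.0, 8 / 10, 1816),
    "PCIe3.0": (8.0, 128 / 130, 3587),
    "PCIe4.0": (16.0, 128 / 130, 4954),
}


def link_mb_per_s(gt_per_s, encoding, lanes=LANES):
    """Link bandwidth in MB/s after line encoding, before protocol overhead."""
    gbit_per_s = gt_per_s * encoding * lanes
    return gbit_per_s / 8 * 1000


utilization = {}
for name, (rate, encoding, measured) in GENERATIONS.items():
    limit = link_mb_per_s(rate, encoding)
    utilization[name] = measured / limit
    print(f"{name}: link ~{limit:.0f} MB/s, measured {measured} MB/s "
          f"({utilization[name]:.0%} of link)")
```

By this estimate the PCIe 2.0 and 3.0 runs sit close to the link limit, while the PCIe 4.0 run uses a noticeably smaller share of it. That may mean something other than the bus (for example the SSD itself) is the bottleneck at 16GT/s, though the slide does not say so.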