Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
ゼロから作る高速パケット転送用OS
東京大学大学院情報理工学系研究科
特任助教 浅井大史
<panda@hongo.wide.ad.jp>
2014年11月18日
Internet&Week&2014 &
S6&
!  SDN:%So'ware%Defined%Network%
"  Forwarding%Plane Control%Plane %
#  Control%Plane %
!  NFV:%Network%Func:on%Virtualiza:...
!  CPU %
=% OS%
"  Inexpensive)
"  Flexible)
"  Extensible)
X&as&a&Service&
Network&Func:on&Virtualiza:on&
Service&Func:on...
“Networked”%Opera:ng%System%manages%
% %“compu:ng”%resources%
“Networking”%Opera:ng%System%manages%
% %“compu:ng”%and%“net...
!  Not%good%for%networking%
"  VM %
%
#  Tick %
#  %
–  TLB etc.%
#  I/O %
–  I/O %
!  %
"  %
#  %
"  %
#  %
#  %
#  %
#  %
#  %
!  %
"  %
#  %
"  %
#  %
#  %
#  %
#  %
#  %
!  NIC %
%
"  %
#  CPU?%
#  Memory?%
#  PCIe%bus?%
#  or%something%else?%
!  Ethernet%
"  64SByte% = %
#  1GbE:%1.488Mpps%
=%672%ns/packet%
#  10GbE:%14.88Mpps%
=%67.2%ns/packet%
#  40GbE:%59.52Mp...
!  %
%
"  CPU CPU %
"  %
#  %
–  netmap%[Rizzo,%USENIX%ATC%2012]%
–  Intel®%DPDK%
"  %
#  Linux NAPI%
#  %
–  Intel®%DPDK%
PCIe
CPU
I/O Hub
Integrated
Memory
Controller
CPU
Memory Memory
Integrated
Memory
Controller
(a) (a)
(c)
(b)
I/O
Controlle...
!  Data%access%latency%(*)%
"  L1%cache:%4S5%cycles%~%1.2S1.5ns%
"  L2%cache:%12%cycles%~%3.6ns%
"  L3%cache:%27.85%cycles...
$
!  PCIe %
=%Memory%Mapped%I/O%(MMIO)%
–  %
"  1529.17%cycles%/%read%
#  392.1%ns%/%read%
"  282.621%cycles%/%write%
#  72....
Ring%buffer
Descriptors Buffer
Generic%NIC%architecture
Ring%buffer
Descriptors Buffer
Packet&recep:on&
1.  NIC%receives%a%packet%
2.  NIC%transfer%the%packet%data%to%
a%buffer%in%R...
Ring%buffer
Descriptors Buffer
Packet&transmission&
1.  So'ware%writes%a%packet%to%a%
buffer%in%RAM%
2.  So'ware%proceeds%the...
Ring%buffer
Descriptors Buffer
Packet&recep:on&
1.  NIC%receives%a%packet%
2.  NIC%transfer%the%packet%data%to%
a%buffer%in%R...
Ring%buffer
Descriptors Buffer
Packet&transmission&
1.  So'ware%writes%a%packet%to%a%
buffer%in%RAM%
2.  So'ware%proceeds%the...
!  %
"  UDP CPU %
"  Tx %
n% %
#  Descriptor %
#  n Tx%tail
txq_tail = 0;
for ( ;; ) {
txq_head = read_txq_head();
/* Avai...
0
2
4
6
8
10
12
14
16
1 2 3 4 5 6 7 8
Packetrate[Mpps]
Bulk transfer size [packets]
Frame = 64B
96B
128B
192B
256B
384B
51...
RX%queue%ring TX%queue%ring
Timehw sw sw hw
Strategy&
•  %
•  PCIe 	
 %
rxq_tail = txq_tail = 0;
blkcnt = 0;
/* # of packets to be routed in bulk transfer */
nr_blk = 256 /* can be another value...
Transmitter Router
RX TX
RX
untag
untag
untag
CPU: % %Intel(R)%Core(TM)%i7%4770K%(3.90GHz,%quad%core)%%
Memory: %32GiB,%DD...
0
1
2
3
4
5
6
7
8
9
10
0 200 400 600 800 1000 1200 1400 1600
Throughput[Gbps]
Frame size [byte]
My implementation
Linux
Li...
!  %
"  %
#  Spirent%Communica:ons Spirent%TestCenter%
–  Interop%Tokyo%2014	
 %
%
#  %
–  SPTSN4US110%
–  CVS10GSS8%%
"  ...
1%
10%
100%
1000%
1% 2% 3% 4% 5% 6% 7% 8% 9% 10%
Latency&[us]
Test&traffic&(64Obyte&frame)&[Gbps]
avg%
min%
max%
90% ~10us
0...
!  Networking%Opera:ng%System%
"  %
#  I/O %etc.%
"  %
OS %
!  40GbE%NIC %
!  %
"  Not%CPU%
#  %
"  Not%memory%
"  PCIe%MMIO%
!  OS%
"  %
#  10GbE %
#  10GbE%x4% Tx%
Nächste SlideShare
Wird geladen in …5
×

ゼロから作るパケット転送用OS (Internet Week 2014)

This is my slides presented in a session at Internet Week 2014.

Ähnliche Bücher

Kostenlos mit einer 30-tägigen Testversion von Scribd

Alle anzeigen

Ähnliche Hörbücher

Kostenlos mit einer 30-tägigen Testversion von Scribd

Alle anzeigen
  • Als Erste(r) kommentieren

ゼロから作るパケット転送用OS (Internet Week 2014)

  1. 1. ゼロから作る高速パケット転送用OS 東京大学大学院情報理工学系研究科 特任助教 浅井大史 <panda@hongo.wide.ad.jp> 2014年11月18日 Internet&Week&2014 & S6&
  2. 2. !  SDN:%So'ware%Defined%Network% "  Forwarding%Plane Control%Plane % #  Control%Plane % !  NFV:%Network%Func:on%Virtualiza:on% "  % #  CPU
  3. 3. !  CPU % =% OS% "  Inexpensive) "  Flexible) "  Extensible) X&as&a&Service& Network&Func:on&Virtualiza:on& Service&Func:on&Chaining&
  4. 4. “Networked”%Opera:ng%System%manages% % %“compu:ng”%resources% “Networking”%Opera:ng%System%manages% % %“compu:ng”%and%“network”%resources%
  5. 5. !  Not%good%for%networking% "  VM % % #  Tick % #  % –  TLB etc.% #  I/O % –  I/O %
  6. 6. !  % "  % #  % "  % #  % #  % #  % #  % #  %
  7. 7. !  % "  % #  % "  % #  % #  % #  % #  % #  %
  8. 8. !  NIC % % "  % #  CPU?% #  Memory?% #  PCIe%bus?% #  or%something%else?%
  9. 9. !  Ethernet% "  64SByte% = % #  1GbE:%1.488Mpps% =%672%ns/packet% #  10GbE:%14.88Mpps% =%67.2%ns/packet% #  40GbE:%59.52Mpps% =%16.8%ns/packet% #  100GbE:%148.8Mpps% =%6.72%ns/packet
  10. 10. !  % % "  CPU CPU % "  % #  % –  netmap%[Rizzo,%USENIX%ATC%2012]% –  Intel®%DPDK% "  % #  Linux NAPI% #  % –  Intel®%DPDK%
  11. 11. PCIe CPU I/O Hub Integrated Memory Controller CPU Memory Memory Integrated Memory Controller (a) (a) (c) (b) I/O Controller Hub On-board NIC Direct Media Interface (a)  3.3GHz%clock%CPU% •  0.3ns%per%cycle%(220%cycles%/%packet)% •  +% % (b)  CPUSMemory%bus%(N.B.,%64%bit%wide%access)% •  DDR3S1333%Dual%Channel:%21.333GB/s%(170.667Gbps)% •  DDR3S1600%Dual%Channel:%25.600GB/s%(204.800Gbps)% •  DDR3S1866%Dual%Channel:%29.867GB/s%(238.933Gbps)%% (c)  PCIe%bus% •  Gen2:%500MB/s%(x1)%=%4Gbps% •  usually%x8%for%a%twoSport%10GbE%NIC% •  x16%is%not%enough%for%a%twoSport%40GbE%NIC% •  Gen3:%985MB/s%(x1)%=%7.88Gbps% (d)  DMI%bus% •  v1.0:%2GB/s%(1GB/s%per%direc:on%=%8Gbps)% •  v2.0:%4GB/s%(2GB/s%per%direc:on%=%16Gbps) Bokleneck? Bokleneck?
  12. 12. !  Data%access%latency%(*)% "  L1%cache:%4S5%cycles%~%1.2S1.5ns% "  L2%cache:%12%cycles%~%3.6ns% "  L3%cache:%27.85%cycles%~%8.4ns% "  RAM:%28%cycles%+%49S56%ns%~%65ns% #  % (*)%hkp://www.7Scpu.com/cpu/SandyBridge.html
  13. 13. $
  14. 14. !  PCIe % =%Memory%Mapped%I/O%(MMIO)% –  % "  1529.17%cycles%/%read% #  392.1%ns%/%read% "  282.621%cycles%/%write% #  72.47%ns%/%write 1M % CPU Performance%Monitoring%Counter%(PMC) CPU:%Intel%Core%i7%4770K% Memory:%Corsair%DDR3S1866%8GB%x4% NIC:%Intel%X520SDA2%
  15. 15. Ring%buffer Descriptors Buffer Generic%NIC%architecture
  16. 16. Ring%buffer Descriptors Buffer Packet&recep:on& 1.  NIC%receives%a%packet% 2.  NIC%transfer%the%packet%data%to% a%buffer%in%RAM%via%DMA% 3.  NIC%proceeds%the%head%pointer% 4.  So'ware%processes%the%packet% 5.  So'ware%proceeds%the%tail% pointer%to%release%the%packet% (3)%head (2) (5)%tail Generic%NIC%architecture
  17. 17. Ring%buffer Descriptors Buffer Packet&transmission& 1.  So'ware%writes%a%packet%to%a% buffer%in%RAM% 2.  So'ware%proceeds%the%tail% pointer%to%commit%the%packet% 3.  NIC%transfer%the%packet%data% from%the%buffer%in%RAM%via% DMA% 4.  NIC%transmit%the%packet% 5.  NIC%proceeds%the%head%pointer% to%no:fy%the%packet%is% transmiked% (2)%tail (1) (5)%head Generic%NIC%architecture
  18. 18. Ring%buffer Descriptors Buffer Packet&recep:on& 1.  NIC%receives%a%packet% 2.  NIC%transfer%the%packet%data%to% a%buffer%in%RAM%via%DMA% 3.  NIC%proceeds%the%head%pointer% 4.  So'ware%processes%the%packet% 5.  So'ware%proceeds%the%tail% pointer%to%release%the%packet% (3)%head (2) (5)%tail
  19. 19. Ring%buffer Descriptors Buffer Packet&transmission& 1.  So'ware%writes%a%packet%to%a% buffer%in%RAM% 2.  So'ware%proceeds%the%tail% pointer%to%commit%the%packet% 3.  NIC%transfer%the%packet%data% from%the%buffer%in%RAM%via% DMA% 4.  NIC%transmit%the%packet% 5.  NIC%proceeds%the%head%pointer% to%no:fy%the%packet%is% transmiked% (2)%tail (1) (5)%head
  20. 20. !  % "  UDP CPU % "  Tx % n% % #  Descriptor % #  n Tx%tail txq_tail = 0; for ( ;; ) { txq_head = read_txq_head(); /* Available Tx queue length */ txq_len = txq_sz - (txq_sz - txq_head + txq_tail) % txq_sz; /* Check the available Tx queue length */ if ( txq_len < n ) continue; for ( i = 0; i < n; i++ ) { // Set packet to the ring buffer to txq_tail txq_ring[txq_tail].pkt = pkt_to_transmit; txq_tail = (txq_tail + 1) % txq_sz } /* Commit */ write_txq_tail(txq_tail); } ~392.1ns ~72.47ns
  21. 21. 0 2 4 6 8 10 12 14 16 1 2 3 4 5 6 7 8 Packetrate[Mpps] Bulk transfer size [packets] Frame = 64B 96B 128B 192B 256B 384B 512B 768B 1024B 1536B 14.88Mpps =%n ~500ns/packet ~250ns/packet ~125ns/packet
  22. 22. RX%queue%ring TX%queue%ring Timehw sw sw hw Strategy& •  % •  PCIe %
  23. 23. rxq_tail = txq_tail = 0; blkcnt = 0; /* # of packets to be routed in bulk transfer */ nr_blk = 256 /* can be another value */; for ( ;; ) { /* Rx queue head */ rx_desc = GET_RX_DESC_HEAD(netdev); if ( DMA_COMPLETED(rx_desc) ) { // Lookup routing table and copy from Rx to Tx // Rewrite destination MAC address, TTL--, // and calculate checksum blkcnt++; if ( blkcnt >= nr_blk ) { blkcnt = 0; write_rxq_tail(rxq_tail); write_txq_tail(txq_tail); } } else { blkcnt = 0; write_rxq_tail(rxq_tail); write_txq_tail(txq_tail); } }
  24. 24. Transmitter Router RX TX RX untag untag untag CPU: % %Intel(R)%Core(TM)%i7%4770K%(3.90GHz,%quad%core)%% Memory: %32GiB,%DDR3S1866% NIC: % %Intel(R)%X520SDA2%(2%ports)% % % 5 OS OS
  25. 25. 0 1 2 3 4 5 6 7 8 9 10 0 200 400 600 800 1000 1200 1400 1600 Throughput[Gbps] Frame size [byte] My implementation Linux Line rate 1 % TTL CPU
  26. 26. !  % "  % #  Spirent%Communica:ons Spirent%TestCenter% –  Interop%Tokyo%2014 % % #  % –  SPTSN4US110% –  CVS10GSS8%% "  PC % #  CPU:%Intel®%Core%i7%4770K%% #  Memory:%DDRS3S1866%(8GB%x4)% #  NIC:%Intel®%X520SDA2%(1 )%
  27. 27. 1% 10% 100% 1000% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% Latency&[us] Test&traffic&(64Obyte&frame)&[Gbps] avg% min% max% 90% ~10us 0.001Mpps &
  28. 28. !  Networking%Opera:ng%System% "  % #  I/O %etc.% "  % OS % !  40GbE%NIC %
  29. 29. !  % "  Not%CPU% #  % "  Not%memory% "  PCIe%MMIO% !  OS% "  % #  10GbE % #  10GbE%x4% Tx%

×