
ixgbe Internals


Presented at SUSE Labs Taipei Technology Sharing Day 2018



  1. ixgbe Internals, SUSE Labs Taipei Technology Sharing Day 2018. Gary Lin, Software Engineer, SUSE Labs, glin@suse.com
  2. ixgbe?
  3. Intel 10 Gigabit PCI Express Linux Driver
  4. Why ixgbe? ● The first driver supporting XDP_REDIRECT in the Linux kernel mainline ● Uses a different memory model for XDP ● Just For Fun™
  5. Overview
  6. From Driver to Application (diagram): packets flow between the NIC driver and the application through the network stack, with the QDISC on the TX path and NAPI on the RX path.
  7. TX – QDISC
  8. (diagram) Each CPU feeds a per-CPU TX queue: the QDISC layer picks a queue with netdev_pick_tx() and hands the packet to the driver's ring buffer via ndo_start_xmit().
  9. RX – NAPI
  10. (diagram) The network card DMAs incoming packets into RX queues in RAM and raises an IRQ; the driver then polls the queues from NAPI context.
  11. (diagram) napi_poll() drains the hardware RX queues, and enqueue_to_backlog() distributes packets to per-CPU queues for processing.
  12. Ring Buffers in ixgbe
  13. TX Ring, RX Ring, XDP Ring
  14. Legacy Interrupt (diagram): a single IRQ and one q_vector serve all queues, e.g. RX Q 1/2 with RX Ring 1/2 and TX Q 1/2 with TX Ring 1/2.
  15. MSI-X (diagram): each q_vector has its own IRQ and serves its own RX and/or TX ring.
  16. (diagram) A ring buffer of descriptors (buffer1, buffer2, buffer3, buffer4, ...) with a next_to_use index marking the next free slot.
  17. TX Ring: each ixgbe_tx_buffer carries the struct sk_buff *skb handed down from the QDISC, plus its unsigned int bytecount.
  18. XDP Ring: each ixgbe_tx_buffer carries a raw void *data pointer taken from the xdp_buff, plus its unsigned int bytecount.
  19. TX Ring + XDP Ring: the struct ixgbe_ring_container tx inside ixgbe_q_vector links both the TX rings (1..n) and the XDP rings (1..n).
  20. RX Ring: ixgbe_rx_buffer holds struct page *page, __u32 page_offset, and __u16 pagecnt_bias; the page is allocated with dev_alloc_pages().
  21. RX Ring: the page is DMA-mapped so the NIC can write incoming packets into it.
  22. RX Ring: the received half of the page is attached to an SKB.
  23. RX Ring: the buffer then flips to the other half of the page (Flip: page_offset ^= page_size / 2).
  24. RX Ring: the other half of the page now receives the next packet.
  25. RX Ring – Recycle: if the page can be reused, the buffer simply flips again and the same page serves the next descriptor.
  26. RX Ring – Replace: if the page cannot be recycled, a freshly allocated page takes its place.
  27. Page Ref Count ● Tracking the number of users of the page ● The possible page ref count (ideally): – 1: The whole page is available. – 2: One half of the page is in use. – 3: The whole page is in use.
  28. Page Count Operations:
      static inline void set_page_count(struct page *page, int v)
      {
              atomic_set(&page->_refcount, v);
              if (page_ref_tracepoint_active(__tracepoint_page_ref_set))
                      __page_ref_set(page, v);
      }
      static inline void page_ref_add(struct page *page, int nr)
      {
              atomic_add(nr, &page->_refcount);
              if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
                      __page_ref_mod(page, nr);
      }
      static inline void page_ref_sub(struct page *page, int nr)
      {
              atomic_sub(nr, &page->_refcount);
              if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
                      __page_ref_mod(page, -nr);
      }
  29. Atomic operations are expensive!
  30. Adjusted Ref Count (1/3) ● A locally maintained pagecnt_bias for the RX page ● Initial value of pagecnt_bias: 1 ● adj_pagecnt = pagecnt – pagecnt_bias – 0: The whole page is available. – 1: One half of the page is in use. – 2: The whole page is in use.
  31. Adjusted Ref Count (2/3) ● Harvesting a packet: pagecnt_bias-- ● Recycling the page: pagecnt_bias++ – The XDP program returns XDP_DROP or an error. – The packet is small enough to be copied into the allocated skb. – Packing the packet into an skb fails. ● If pagecnt_bias == 0, set pagecnt_bias to USHRT_MAX and add USHRT_MAX to pagecnt.
  32. Adjusted Ref Count (3/3) ● Packet consumed: pagecnt-- – The packet is consumed by the network stack. – XDP_TX or XDP_REDIRECT is completed. ● Releasing the page: pagecnt -= pagecnt_bias – With the help of __page_frag_cache_drain()
  33. XDP
  34. eXpress Data Path (diagram): an eBPF program runs in the NIC driver on each received packet and decides PASS (up the network stack), DROP, TX (back out through a TX queue), or REDIRECT (to another CPU or NIC).
  35. Memory Model Switch (diagram): a conventional RX buffer packs several packets into each page, while XDP expects one packet per page. NOTE: this applies to drivers other than ixgbe.
  36. For ixgbe, a memory model switch is not necessary!
  37. XDP in ixgbe (diagram): XDP_PASS hands the packet to the network stack; XDP_DROP recycles the buffer (pagecnt_bias++); XDP_TX sends it out through the XDP ring via ixgbe_xmit_xdp_ring(); XDP_REDIRECT goes through xdp_do_redirect() to another CPU or NIC, which for ixgbe ends up in ixgbe_xdp_xmit().
  38. Incoming New Features ● XDP for ixgbevf (linux-next) – ixgbe blocks XDP if SR-IOV is enabled. ● XDP redirect memory return API (net-next) – Managing pages across drivers – Adopted by ixgbe, i40e, mlx5, tuntap, and virtio_net – Preparing for the AF_XDP zero-copy patch set – ixgbe tweaked the page ref counting scheme for the new API.
  39. Questions?
  40. Thank You!
  41. References
       Linux kernel v4.15 https://github.com/torvalds/linux/tree/v4.15/drivers/net/ethernet/intel/ixgbe
       [0/5] Enable XDP for ixgbevf http://patchwork.ozlabs.org/cover/887197/
       [net-next V11 PATCH 00/17] XDP redirect memory return API https://www.spinics.net/lists/netdev/msg495995.html
       ixgbe: tweak page counting for XDP_REDIRECT https://patchwork.ozlabs.org/patch/889261/
       Monitoring and Tuning the Linux Networking Stack: Sending Data https://blog.packagecloud.io/eng/2017/02/06/monitoring-tuning-linux-networking-stack-sending-data/
       Monitoring and Tuning the Linux Networking Stack: Receiving Data https://blog.packagecloud.io/eng/2016/06/22/monitoring-tuning-linux-networking-stack-receiving-data/
