Session ID: HKG18-214
Session Name: HKG18-214 - Progress of WrapDrive
Speaker: Zaibo Xu
Track: Networking
★ Session Summary ★
A common accelerator framework for user-space applications (WrapDrive) was introduced by Kenneth Lee at Linaro Connect SFO 2017, and since then we have done a series of work to better support user applications. DMA mapping for accelerators across multiple processes, and SVM (Shared Virtual Memory) without page faults, are both running on our D06 board, enabled by new VFIO and IOMMU APIs built on the SVM patch set (RFC v3) from Jean-Philippe Brucker of ARM. We also present performance tests of these scenarios with our SoC device (ZIP) to show the advantages of WrapDrive. Next, SVM with page faults from devices will be supported by WrapDrive in a high-performance and cleanly coded way.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-214/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-214.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-214.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Networking
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
3. Looking back on Wrapdrive: SFO 2017
What is Wrapdrive (WD)?
1. An accelerator framework for user space, leveraging hardware accelerators with native performance.
2. Based on VFIO and VFIO Mdev, allowing direct access to hardware with improved security and efficiency.
3. A hardware accelerator is accessed via a ‘queue’, the minimal working unit for the user; the queue’s DMA priority is controlled by the user.
4. Looking back on Wrapdrive: SFO 2017
Why Wrapdrive?
● Achieve the native performance of the accelerator.
● Break the limit of one device serving only one process by using ‘queues’.
● Manage each ‘queue’ with stronger security.
Status back then
● Shared virtual memory (SVM, SVA) was in the phase of RFC.
● Substream-ID was not supported by the SMMU driver.
● Wrapdrive supported only one process.
5. Wrapdrive Progress
● Support SVA
● DMA mapping one device from multiple processes
● Keep compatibility (native, normal mdev)
6. Support SVA
● Extends the SVA patch set from Jean-Philippe Brucker.
○ https://www.spinics.net/lists/devicetree/msg214285.html
● Currently no I/O page fault (IOPF).
○ Uses ‘mlock’ to trigger any faults before DMA access and to prevent the pages from being
swapped.
○ In theory, WD can support SVA with IOPF, but no accelerator is yet available to test.
● Tested on a Hisilicon D06 board with a ZIP accelerator (see the example slide).
● One Wrapdrive device (Wdev) serves multiple processes, including the kernel.
○ VFIO can bind one Mdev (queue) to serve one process with its own IO page table, so an accelerator that supports PASID with multiple queues can serve multiple processes. The kernel’s default IO page table keeps PASID 0, as before.
7. DMA Mapping in multiple processes
● Add VFIO APIs for private DMA map, similar to SVA VFIO bind/unbind.
+#define VFIO_IOMMU_ATTACH _IO(VFIO_TYPE, VFIO_BASE + 24)
+#define VFIO_IOMMU_DETACH _IO(VFIO_TYPE, VFIO_BASE + 25)
● VFIO_IOMMU_ATTACH creates a PASID linked IOVA address space for the
VFIO container.
● This IOVA address space is retrieved using a DMA map operation.
● PASID-aware map/unmap versions are added to the iommu operations.
/* ‘io_mm’ denotes an address space and includes a PASID */
struct iommu_ops {
    ...
    int (*map)(struct iommu_domain *domain, unsigned long iova,
               phys_addr_t paddr, size_t size, int prot);
    size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
                    size_t size);
+   int (*sva_map)(struct iommu_domain *domain, struct io_mm *io_mm,
+                  unsigned long iova, phys_addr_t paddr, size_t size,
+                  int prot);
+   size_t (*sva_unmap)(struct iommu_domain *domain, struct io_mm *io_mm,
+                       unsigned long iova, size_t size);
    ...
};
● May be merged with normal map/unmap in future.
8. Extend standard interfaces
● A pointer to the parent device’s iommu_group is added to the vfio_group.
struct vfio_group {
    struct iommu_group *iommu_group;
+   /* iommu_group of the mdev’s parent device */
+   struct iommu_group *parent_group;
    struct list_head next;
};
● A Wdev enabling method is added in ‘vfio_iommu_type1_attach_group’.
mdev_bus = symbol_get(mdev_bus_type);
if (mdev_bus) {
    if ((bus == mdev_bus) && !iommu_present(bus)) {
        ...
+       /* If this is a wdev (wrapdrive device), take the parent
+          device’s group; otherwise fall back to the default logic. */
+       ret = iommu_group_for_each_dev(iommu_group, &pgroup,
+                                      vfio_wdev_type);
        ...
+       domain->domain = iommu_group_default_domain(pgroup);
+       group->parent_group = pgroup;
        ...
+       return 0;
+   }
if (!iommu->external_domain) {
● Normal mediated-device handling (vGPU etc.) is untouched.
9. Wrapdrive is moved into VFIO as VFIO_Wdev, and a hardware queue can be an Mdev if the device is a VFIO_Wdev. For an Mdev from a VFIO_Wdev, VFIO operations are ultimately applied to the Mdev’s parent device, so one device can serve multiple processes without unbinding its driver.
Relationship with VFIO / MDEV
[Diagram: the VFIO / VFIO_Mdev / VFIO_Wdev stack sits above the IOMMU. Each device (DEV) exposes multiple queues, and each queue serves one process, either through the process’s own page table (SVA processes) or through a per-process IO page table built by DMA map; each device has an iommu_domain, and the kernel’s default IO page table uses PASID = 0.]
10. Example
● zlib acceleration on a Hisilicon D06 board (using a ZIP accelerator)
Source code: https://github.com/zaiboxu/wrapdrive.git, branch: wrapdrive-4.15-rc9
[Diagram: test software stack — the ZIP kernel driver (ZIP K_DRV) registers with VFIO_WDEV via vfio_wdev_register; the user-space ZIP driver (ZIP U_DRV) and the test sample run on top of the user-space Wrapdrive library (U_WD), which talks to VFIO.]
Test sample:
1. Compares WD’s synchronous and asynchronous mode APIs.
2. Compares the multiple-DMA-map and SVA-without-IOPF scenarios.
3. Runs each scenario with 1 or 3 processes.
4. Uses a range of packet lengths.
11. Zlib acceleration performance
● zlib throughput on Hisilicon D06 board (using ZIP accelerator)
Source code: https://github.com/zaiboxu/wrapdrive.git, branch: wrapdrive-4.15-rc9
[Chart: throughput in Mpps vs. packet length in bytes, across 8 scenarios: the ZIP accelerator serving 1 or 3 processes, with SVA (no IOPF) or multiple DMA map, in synchronous or asynchronous mode.]
12. Current Challenges
● A security risk remains, since several users work on one device.
○ A ‘queue’ is not isolated as strongly as a task in the OS.
● Wrapdrive queue management on VFIO Mdev is still awkward.
○ Creating an Mdev for WD requires root permission.
○ Getting/putting a WD queue involves a series of sysfs operations.
○ An Mdev held by a process cannot be released automatically if the process exits unexpectedly.
● Coexistence with the Linux kernel Crypto API / AF_ALG (a controversial topic).
○ The ecosystem is ready: devices support PASID/PRI/ATS/ATC, and the corresponding software (VFIO, SVA) is in place.
○ User space should be able to leverage accelerators directly with high performance, now that devices can DMA to user space in a fast and secure way.
13. Plans
● SVA with I/O page fault (hardware dependent)
● Extend to other types of devices (e.g. NICs).
○ Benefits user-space data-plane applications (ODP, DPDK, etc.).
● Optimize the VFIO_Mdev framework to better support Wrapdrive’s requirements.