This document proposes two solutions to address issues with live migration of virtual machines (VMs) using DPDK for packet processing.
The first solution maintains separate memory pools for the host and VM. Buffers contain both a structure part stored on the host and a message content part stored in shared memory (SHM). Rings manipulate buffers using indexes instead of pointers. This allows hiding physical addresses from the VM. Memory pools are reinitialized after migration to prevent memory leaks.
The second, simpler solution initializes the receiving DPDK driver queue only after migration. This may introduce more downtime but avoids memory copies. It still has potential memory-leak issues. Both solutions require further testing.
Solution Memo
1. Original Mechanism
1.1 Memory Layout
1.1.1 Memory Pool and Buffer
DPDK manages huge-page memory in a way similar to OS memory management. It cuts the bulk of memory into segments that are contiguous in both virtual and physical address. When a user acquires memory, a memory zone structure is used, allocated from a suitable segment according to the size the user requires. That is, a memory zone is also contiguous in both virtual and physical address. The physical address is required by DMA, so every buffer used in this solution must know its physical address, which can be calculated from its index or virtual address.
The following figure shows the memory layout used by the application.
The idea is that DPDK constructs buffers by slicing the memory of a memory pool and calculates their physical addresses by offset.
As indicated in the figure above, the rte_ring stores each rte_mbuf by its virtual address.
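As an illustration of the offset calculation, the sketch below shows how a buffer's physical address could be derived from the pool's base addresses. The structure and field names are illustrative assumptions, not the actual DPDK definitions.

#include <stdint.h>
#include <stddef.h>

/* Illustrative pool descriptor: both addresses refer to the same
   physically contiguous memory zone, so any buffer's physical address
   can be derived from its offset inside the pool. */
struct pool_layout {
    void     *virt_base;   /* virtual base of the memory zone */
    uint64_t  phys_base;   /* physical base of the same zone (needed for DMA) */
    size_t    elt_size;    /* size of one buffer slice */
};

/* Physical address of buffer number idx. */
static inline uint64_t buf_phys_by_index(const struct pool_layout *p, unsigned idx)
{
    return p->phys_base + (uint64_t)idx * p->elt_size;
}

/* Physical address of a buffer given only its virtual address. */
static inline uint64_t buf_phys_by_virt(const struct pool_layout *p, const void *buf)
{
    return p->phys_base + (uint64_t)((uintptr_t)buf - (uintptr_t)p->virt_base);
}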
1.2 Scenario
All the numbers in this section are configurable.
1.2.1 Initiation
As the following figure shows, the memory pool manages its free buffers with a ring. At initiation, 512 buffers are allocated to the rx queue of the DPDK driver to prepare for reading.
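A minimal sketch of this initiation step using the standard DPDK API is shown below; the port id, pool name and pool sizing are assumptions for illustration.

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define NB_RX_DESC 512   /* matches the 512 buffers mentioned above */

/* Sketch of the initiation step: create the mbuf pool and let the PMD
   pull NB_RX_DESC buffers from it to pre-fill the rx descriptor ring.
   Assumes rte_eth_dev_configure() has already been called on the port. */
static int setup_rx(uint16_t port_id, unsigned socket_id)
{
    struct rte_mempool *mp = rte_pktmbuf_pool_create("rx_pool",
            8192,                       /* total buffers in the pool */
            256,                        /* per-core cache size */
            0,                          /* private area size */
            RTE_MBUF_DEFAULT_BUF_SIZE,  /* data room per mbuf */
            socket_id);
    if (mp == NULL)
        return -1;

    /* The driver allocates NB_RX_DESC mbufs from mp here and keeps
       them bound to the rx ring until packets arrive. */
    return rte_eth_rx_queue_setup(port_id, 0, NB_RX_DESC,
                                  socket_id, NULL, mp);
}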
1.2.2 Startup
Nothing happens in memory.
1.2.3 Sending
As the following figure shows, memory is managed by several kinds of rings. Data is transferred from guest to host by copy.
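The sketch below outlines one iteration of this sending path on the host; struct shm_msg, shm_ring_dequeue() and shm_ring_free() are hypothetical stand-ins for the actual SHM ring primitives.

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_memcpy.h>

/* Hypothetical SHM message descriptor and ring primitives. */
struct shm_msg {
    void    *data;
    uint16_t len;
};
struct shm_msg *shm_ring_dequeue(void);    /* hypothetical: fetch next message from the SHM ring */
void shm_ring_free(struct shm_msg *msg);   /* hypothetical: return the SHM buffer to its pool */

/* One iteration of the sending path: fetch a message the guest placed in
   the SHM ring, copy its payload into a freshly allocated mbuf and hand
   it to the DPDK driver. */
static void host_send_one(uint16_t port_id, struct rte_mempool *mp)
{
    struct shm_msg *msg = shm_ring_dequeue();
    if (msg == NULL)
        return;

    struct rte_mbuf *m = rte_pktmbuf_alloc(mp);
    if (m != NULL) {
        rte_memcpy(rte_pktmbuf_mtod(m, void *), msg->data, msg->len);
        m->data_len = msg->len;
        m->pkt_len  = msg->len;
        if (rte_eth_tx_burst(port_id, 0, &m, 1) == 0)
            rte_pktmbuf_free(m);           /* driver did not take the packet */
    }
    shm_ring_free(msg);                    /* release the SHM buffer after the copy */
}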
1.2.4 Receiving
Similar to Sending, with the difference that a new buffer is allocated immediately after the buffer holding the received content is taken from the DPDK driver by the DPDK poll mode engine. The new buffer is then placed in exactly the same position for the next round of reading.
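A conceptual sketch of this receive-and-refill pattern is given below; the rx_slot structure is illustrative and does not correspond to a real driver descriptor.

#include <rte_mbuf.h>

/* Conceptual receive-and-refill: every filled buffer handed over by the poll
   mode driver is immediately replaced by a fresh buffer in the same slot, so
   the rx queue always holds 512 buffers ready for DMA. */
struct rx_slot {
    struct rte_mbuf *mbuf;       /* buffer currently bound to this slot */
};

static struct rte_mbuf *rx_slot_take(struct rx_slot *slot, struct rte_mempool *mp)
{
    struct rte_mbuf *filled = slot->mbuf;            /* buffer holding received data */
    struct rte_mbuf *fresh  = rte_pktmbuf_alloc(mp); /* replacement for the next round */
    if (fresh == NULL)
        return NULL;                                 /* keep the old buffer until a new one exists */
    slot->mbuf = fresh;                              /* same position, new buffer */
    return filled;                                   /* caller copies its content into SHM, then frees it */
}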
1.2.5 Migration Source
Before migration completes on the source VM, message processing is untouched. When migration completes on the source VM, a message is sent to DPDK through an internal TCP connection to stop processing for that VM. The only issue is that there may be a very short gap between migration completion and DPDK stopping the processing; during that gap, message transmission still happens in SHM and DPDK. As the DPDK manager does not interrupt message processing, DPDK simply stops processing messages for the VM after the current message has been sent or received completely.
That is, the status of the rings in DPDK (host) stays correct. For sending, the buffer is released by the DPDK driver as the figure in 1.2.3 shows. For receiving, the buffer is released after its content has been copied into SHM. The SHM may contain some dirty data, but since the SHM is deleted, re-created, and initialized the next time the same SHM is used, the dirty data does no harm.
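The sketch below illustrates the intended stop behaviour, with the stop flag checked only at message boundaries; the flag, loop and names are illustrative, not the actual control-path code.

#include <stdatomic.h>
#include <stdbool.h>

/* Per-VM stop flag, set by the handler of the internal TCP control connection. */
static atomic_bool vm_stop_requested;

/* Polling loop for one VM: the flag is only checked between messages, so a
   message in flight is always sent or received completely before processing
   for this VM stops. */
static void poll_vm_until_stopped(void)
{
    while (!atomic_load(&vm_stop_requested)) {
        /* process exactly one complete message here:
           sending:   SHM ring -> copy -> DPDK driver
           receiving: DPDK driver -> copy -> SHM ring */
    }
}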
1.2.6 Migration Destination
When migration completes, the status of the SHM stays correct, as the SHM and the applications are copied from the source VM as a snapshot. But the host's state may not match that of the VM.
In sending, the host always begins by fetching a message from the SHM ring. That is, the host always assumes that migration completed at a point where one buffer had been sent completely. This causes a problem when a message had already been fetched from the SHM ring but was never released back into the SHM ring: DPDK (host) cannot detect the case, so the buffer leaks.
In receiving, a similar problem occurs – an SHM buffer may have been allocated in the source VM while the destination host does not know there is a buffer to be released.
There may also be corrupted data in messages, but we do not care about that currently.
1.3 OVS Solution
This section gives a brief description of the OVS solution and shows why OVS cannot support migration.
As the figure above shows, the host and the VMs share all huge-page memory and the memory management information. The advantage is that memory copies between processes are removed, both on the VM and on the host.
Theoretically, every VM has to have complete knowledge of all huge-page memory used by DPDK and the other VMs in order to work properly. That is why a VM cannot migrate, although the scheme works well in the switch-over case.
2. Issues
To adjust the current L1/L2 split solution to remove the memory copy, the following issues must be covered.
Allocate/Free: no copy means the memory pool for DPDK and the VM is the same one, so buffers and rings move into SHM.
SHM with physical addresses: the physical address should be known by the host but not by the VM.
Initiation/Migration: as the figure in section 1.2.1 shows, buffers must be bound to the receiving driver queue before the VM starts. The problem is that the destination host does not know the status of the source VM – which buffer is free to be used?
3. Proposed Solution 1
3.1 Memory Layout
3.1.1 Memory Pool and Buffer
As the following figure shows, the physical address is maintained on the host and hidden from the VM. Each buffer is composed of two parts: a structure part, including the address, stored on the host, and a message content part stored in SHM. Instead of pointers, rings now manipulate indexes. The rings are also allocated in SHM, as the figure in the following section shows.
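A minimal sketch of this split layout and the index-based lookup is shown below; all structure and field names are assumptions for illustration.

#include <stdint.h>
#include <stddef.h>

/* Structure part of a buffer: host-private, never mapped into the VM,
   so the physical address stays hidden from the guest. */
struct buf_struct {
    uint64_t phys_addr;   /* physical address of the content part (for DMA) */
    uint32_t index;       /* position of the content part in SHM */
    uint16_t len;         /* current message length */
};

/* Ring stored in SHM: it carries indexes only, never pointers. */
struct shm_index_ring {
    uint32_t head;
    uint32_t tail;
    uint32_t size;        /* number of slots, power of two */
    uint32_t slots[];     /* buffer indexes */
};

/* Host side: translate a dequeued index back to its structure part. */
static inline struct buf_struct *index_to_struct(struct buf_struct *table, uint32_t idx)
{
    return &table[idx];
}

/* VM side: translate the same index to the content part inside SHM. */
static inline void *index_to_content(void *shm_base, size_t content_size, uint32_t idx)
{
    return (char *)shm_base + (size_t)idx * content_size;
}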
3.1.2 Overview
The solution is based on the assumption of a one-cell application. It would be extended to three cells once verified.
As the following figure shows, two memory pools are provided for one VM. They are contiguous in both physical and virtual address on the host, and they are meant to be mapped to contiguous virtual addresses on the VM (open point).
The second memory pool exists for migration. The point is that the host memory pool does not migrate with the VM (open point), so it can be used for the initiation of the DPDK driver. When placed in application queues, these buffers are indexed as a continuation of the VM memory pool's data store. For example, if the VM memory pool contains 8192 buffers, the index of the first buffer of the host memory pool is 8193. That is, buffer users (DPDK and the application) do not know there are two memory pools; they have access to the whole bulk of memory of both pools.
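The sketch below shows how such a single index space could be resolved to the right pool (using 0-based indexes for simplicity); the pool sizes and names are illustrative.

#include <stdint.h>
#include <stddef.h>

#define VM_POOL_BUFS   8192u   /* buffers in the VM memory pool (example above) */
#define HOST_POOL_BUFS 512u    /* buffers in the host memory pool (illustrative) */

/* Users see one index space of VM_POOL_BUFS + HOST_POOL_BUFS buffers.
   Indexes below VM_POOL_BUFS fall into the VM pool, the rest into the
   host pool, so DPDK and the application never need to know that two
   pools exist. */
struct two_pools {
    void  *vm_base;      /* base of the VM memory pool */
    void  *host_base;    /* base of the host memory pool */
    size_t buf_size;     /* identical buffer size in both pools */
};

static inline void *index_to_buf(const struct two_pools *p, uint32_t idx)
{
    if (idx < VM_POOL_BUFS)
        return (char *)p->vm_base + (size_t)idx * p->buf_size;
    return (char *)p->host_base + (size_t)(idx - VM_POOL_BUFS) * p->buf_size;
}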
As a memory pool pointer is embedded in the buffer structure, the application can release each buffer correctly. After DPDK receives the start notification from the VM, the VM memory pool is re-initialized once it is no longer in use. Then the users of the memory pool switch from the host memory pool to the VM memory pool. As the pointer change is atomic, and a buffer carries all information about its memory pool and ring, the transfer can happen smoothly.
In this way, there is no memory leak problem. And since the status is re-initialized on every migration, cascading migration has no impact. How do we detect that a given memory pool has no more users? One proposal is to poll the status of its ring periodically, including the free counter, head position, tail position, etc. If it remains unchanged for a while, we can assert there is no user of the memory pool. The drawback is that a VM cannot be migrated again immediately after a migration.
3.2 Scenario
3.2.1 Initiation
3.2.2 Startup
As described in 3.1.2, when there is no change in the status of the VM memory pool, the VM memory pool is re-initialized, i.e. it is treated as a brand-new memory pool from which no buffer has ever been taken. Then all users switch to this pool. No synchronization is needed for the switch.
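The sketch below illustrates the atomic switch of the active pool pointer described above; the names are illustrative.

#include <stdatomic.h>

struct mempool;   /* opaque pool descriptor */

/* Single pointer through which all users reach the active memory pool. */
static _Atomic(struct mempool *) active_pool;

/* Flip from the host pool back to the freshly re-initialized VM pool.
   Buffers already in flight carry a pointer to their own pool, so they are
   still released to the correct pool after the switch; no further
   synchronization is required. */
static void switch_to_vm_pool(struct mempool *vm_pool)
{
    atomic_store(&active_pool, vm_pool);
}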
3.2.6 Migration Destination
4. Proposed Solution 2
The alternative is to initialize the receiving DPDK driver queue only after migration completes. This solution would be much simpler, but it may introduce more downtime, and a memory leak problem still exists. To be tested.
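A sketch of this deferred initialization is shown below, assuming the port has already been configured and only the rx queue setup and device start are postponed until the migration-complete notification arrives; the descriptor count and queue id are illustrative.

#include <rte_ethdev.h>

/* Called from the handler of the migration-complete notification.
   Assumes rte_eth_dev_configure() was already run on the port. */
static int start_rx_after_migration(uint16_t port_id, unsigned socket_id,
                                    struct rte_mempool *mp)
{
    int ret = rte_eth_rx_queue_setup(port_id, 0, 512, socket_id, NULL, mp);
    if (ret < 0)
        return ret;
    return rte_eth_dev_start(port_id);
}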