This document discusses AMD's DirectGMA technology, which allows other devices to access GPU memory directly. It introduces DirectGMA, explains how it enables peer-to-peer transfers between GPUs and between GPUs and FPGAs, and then details how to use DirectGMA from OpenGL, OpenCL, and DirectX 9, 10 and 11 to transfer data efficiently without CPU involvement.
2. 2 | DIRECTGMA ON AMD’S FIREPRO™ GPUS | SEPTEMBER 8, 2014 |
INTRODUCTION TO DIRECTGMA
Exposing the graphics memory of a GPU to other devices has long been a goal for
applications that need low-latency data exchange between those devices and the
GPU. This is why AMD introduced DirectGMA (Direct Graphics Memory Access),
which:
‒ Makes a portion of the GPU memory accessible to other devices
‒ Allows devices on the bus to write directly into this area of GPU memory
‒ Allows GPUs to write directly into the memory of remote devices on the bus
that support DirectGMA
‒ Provides a driver interface that allows third-party hardware vendors to support
data exchange with an AMD GPU using DirectGMA
Support:
‒ APIs: OpenGL, OpenCL™, DirectX®
‒ Operating systems: Windows® 7 64-bit and Linux® 64-bit
‒ Cards: AMD FirePro™ W5x00 and above, as well as all AMD FirePro™ S series
cards
INTRODUCTION TO DIRECTGMA
Peer-to-Peer Transfers between GPUs
Use high-speed DMA transfers to copy data between the memories of two
GPUs on the same system/PCIe bus.
Peer-to-Peer Transfers between GPUs and FPGAs
Use high-speed DMA transfers to copy data between GPU memory and FPGA
memory.
DirectGMA for Video
Optimized pipeline for frame-based devices such as frame grabbers, video
switchers, HD-SDI capture, and CameraLink devices. See our SDI webpage.
AMD’S DIRECTGMA P2P
Direct communication between PCI cards
Bidirectional DirectGMA P2P requires memory on both cards
[Diagram labels: CPU, PCI Bus]
DIRECTGMA IN OPENGL
The OpenGL extension AMD_BUS_ADDRESSABLE_MEMORY provides access to
DirectGMA.
The functions are:
void glMakeBuffersResidentAMD(sizei n, uint* buffers, uint64* baddr, uint64* maddr);
void glBufferBusAddressAMD(enum target, sizeiptr size, uint64 surfbusaddress, uint64 markerbusaddress);
void glWaitMarkerAMD(uint buf, uint value);
void glWriteMarkerAMD(uint buf, uint value, uint64 offset);
The new tokens are:
GL_BUS_ADDRESSABLE_MEMORY_AMD
GL_EXTERNAL_PHYSICAL_MEMORY_AMD
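Before calling any of these entry points, an application would normally confirm that the driver advertises the extension. The following is a minimal sketch of that check, assuming the extension is listed under the registry name string GL_AMD_bus_addressable_memory and that the list is the space-separated string returned by glGetString(GL_EXTENSIONS) in a compatibility context. The helper name hasExtension is hypothetical and not part of the slides.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not from the slides): returns true if `name` appears
// as a whole token in a space-separated extension list, such as the string
// returned by glGetString(GL_EXTENSIONS).
bool hasExtension(const std::string& extensionList, const std::string& name)
{
    assert(!name.empty());
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        // Compare whole tokens so that a longer extension name sharing the
        // same prefix is not reported as a match.
        if (token == name)
            return true;
    }
    return false;
}
```

Matching whole tokens rather than using a substring search avoids false positives when one extension name is a prefix of another.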
DIRECTGMA IN OPENGL | CREATING A BUFFER TO RECEIVE DATA
To receive data, a buffer must be created that other devices on the bus can
access.
The physical address of this buffer must be known so that a remote device can
write to it.
glGenBuffers(m_uiNumBuffers, m_pBuffer);
m_pBufferBusAddress = new unsigned long long[m_uiNumBuffers];
m_pMarkerBusAddress = new unsigned long long[m_uiNumBuffers];
for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_pBuffer[i]);
    glBufferData(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_uiBufferSize, 0, GL_DYNAMIC_DRAW);
}
// Call makeResident when all BufferData calls were submitted.
glMakeBuffersResidentAMD(m_uiNumBuffers, m_pBuffer, m_pBufferBusAddress, m_pMarkerBusAddress);
// Make sure that the buffer creation really succeeded
if (glGetError() != GL_NO_ERROR)
    return false;
glBindBuffer(GL_BUS_ADDRESSABLE_MEMORY_AMD, 0);
DIRECTGMA IN OPENGL | USING A BUFFER ON A REMOTE DEVICE
To write into a buffer on a remote device, create an OpenGL buffer and assign
it the physical addresses of the memory on the remote device.
glGenBuffers(m_uiNumBuffers, m_pBuffer);
for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_pBuffer[i]);
    glBufferBusAddressAMD(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_uiBufferSize, m_pBufferBusAddress[i], m_pMarkerBusAddress[i]);
    if (glGetError() != GL_NO_ERROR)
        return false;
}
glBindBuffer(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, 0);
DIRECTGMA IN OPENGL | GPU TO GPU COPY
Create one thread per GPU. Each thread creates its own context. One thread
acts as the data sink, the other as the source.
On the sink GPU, a GL_BUS_ADDRESSABLE_MEMORY_AMD buffer is created.
On the source GPU, a GL_EXTERNAL_PHYSICAL_MEMORY_AMD buffer is created.
GPU 0: Sink
glGenBuffers(m_uiNumBuffers, m_pSinkBuffer);
for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_pSinkBuffer[i]);
    glBufferData(GL_BUS_ADDRESSABLE_MEMORY_AMD, m_uiBufferSize, 0, GL_DYNAMIC_DRAW);
}
// Call makeResident when all BufferData calls were submitted.
glMakeBuffersResidentAMD(m_uiNumBuffers, m_pSinkBuffer, m_pBufferBusAddress,
                         m_pMarkerBusAddress);
GPU 1: Source
glGenBuffers(m_uiNumBuffers, m_pSourceBuffer);
for (unsigned int i = 0; i < m_uiNumBuffers; i++)
{
    glBindBuffer(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_pSourceBuffer[i]);
    glBufferBusAddressAMD(GL_EXTERNAL_PHYSICAL_MEMORY_AMD, m_uiBufferSize,
                          m_pBufferBusAddress[i], m_pMarkerBusAddress[i]);
}
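The source thread can only bind its buffers once the sink thread has obtained the bus addresses, so the two threads need a hand-off point. The slides do not show how the sample passes the addresses between threads. The sketch below shows one possible arrangement using std::promise and std::future, with the OpenGL calls reduced to comments. The names BusAddresses and handOffAddresses are hypothetical, and the address values are supplied by the caller in place of driver results.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pair of bus addresses for one sink buffer, standing in for
// the values that glMakeBuffersResidentAMD would report.
struct BusAddresses {
    std::uint64_t surface;
    std::uint64_t marker;
};

// Runs a sink and a source thread. The sink publishes its addresses once;
// the source blocks until they arrive and returns what it received.
std::vector<BusAddresses> handOffAddresses(std::vector<BusAddresses> sinkAddresses)
{
    assert(!sinkAddresses.empty());
    std::promise<std::vector<BusAddresses>> published;
    std::future<std::vector<BusAddresses>> pending = published.get_future();
    std::vector<BusAddresses> received;

    std::thread sink([&published, addrs = std::move(sinkAddresses)]() mutable {
        // Real code would create the context, call glBufferData for each
        // buffer and then glMakeBuffersResidentAMD before publishing.
        published.set_value(std::move(addrs));
    });
    std::thread source([&pending, &received] {
        // Real code would create the context, wait for the addresses and
        // then call glBufferBusAddressAMD for each buffer.
        received = pending.get();
    });

    sink.join();
    source.join();
    return received;
}
```

A promise/future pair fits here because the addresses are produced exactly once during setup. Per-frame synchronization is handled separately by the markers.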
DIRECTGMA IN OPENGL | GPU TO GPU COPY
The source creates data and copies it into the
GL_EXTERNAL_PHYSICAL_MEMORY_AMD buffer that has its data store on the sink
device.
The sink device receives the data and copies it into a texture to be displayed.
GPU 0: Sink
// Submit draw calls that do not require data sent by the source
…
glBindTexture(GL_TEXTURE_2D, m_uiTexture);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uiBufferIdx);
// Indicate that the following commands will need the data transferred by the source
glWaitMarkerAMD(uiBufferId, uiTransferId);
// Copy buffer into texture
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, m_uiTextureWidth, m_uiTextureHeight, m_nExtFormat,
                m_nType, NULL);
// Draw using received texture
GPU 1: Source
// Draw
…
++uiTransferId;
// Bind buffer that has its data store on the sink GPU
glBindBuffer(GL_PIXEL_PACK_BUFFER, uiBufferId);
// Copy local buffer into remote buffer
glReadPixels(0, 0, m_uiBufferWidth, m_uiBufferHeight, m_nExtFormat, m_nType, NULL);
// Write marker
glWriteMarkerAMD(uiBufferId, uiTransferId, ullMarkerBusAddress);
glFlush();
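The marker protocol can be illustrated without a GPU. The sketch below is a CPU-only analogy, not the driver's implementation: an atomic counter plays the role of the marker, a writer thread stands in for the source (data first, then marker), and a reader thread stands in for the sink (wait on marker, then use the data). All names are hypothetical. The slides do not say whether the wait compares for equality or for ordering, so the "marker has reached the transfer id" comparison is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// CPU-side stand-in for one bus-addressable buffer and its marker.
struct SharedBuffer {
    std::vector<std::uint32_t> data;
    std::atomic<std::uint32_t> marker{0};
};

// Source side: fill the buffer, then publish the transfer id. This mirrors
// glReadPixels into the remote buffer followed by glWriteMarkerAMD.
void writeFrame(SharedBuffer& buf, std::uint32_t transferId)
{
    for (auto& word : buf.data)
        word = transferId;
    buf.marker.store(transferId, std::memory_order_release);
}

// Sink side: wait until the marker has reached transferId (assumed
// semantics of glWaitMarkerAMD), then read the data.
std::uint32_t readFrame(SharedBuffer& buf, std::uint32_t transferId)
{
    while (buf.marker.load(std::memory_order_acquire) < transferId)
        std::this_thread::yield();
    return buf.data.front();
}

// Runs one transfer between a source thread and a sink thread and returns
// the value the sink observed in the buffer.
std::uint32_t runTransfer(std::uint32_t transferId)
{
    assert(transferId > 0);  // the marker starts at 0, so ids start at 1
    SharedBuffer buf;
    buf.data.assign(16, 0);
    std::uint32_t seen = 0;
    std::thread sink([&] { seen = readFrame(buf, transferId); });
    std::thread source([&] { writeFrame(buf, transferId); });
    source.join();
    sink.join();
    return seen;
}
```

Writing the marker only after the data is what lets the sink treat the marker as proof that the whole buffer has arrived. The release/acquire pair in the sketch expresses the same ordering on the CPU.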
DIRECTGMA IN OPENGL | OVERLAPPING EXECUTION
[Timeline diagram. GPU 1: render, transfer. GPU 0: render, wait, use buffer.]
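The benefit of overlapping can be estimated with a simple per-frame cost model. The sketch below is an illustration under stated assumptions, not a measurement: it assumes GPU 0 renders its data-independent work while GPU 1 renders and transfers, and that GPU 0 waits only if it finishes first. The struct and function names are hypothetical, and the durations are caller-supplied placeholders in arbitrary time units.

```cpp
#include <algorithm>
#include <cassert>

// Per-frame durations in arbitrary time units (hypothetical inputs).
struct FrameTimes {
    double sourceRender;  // GPU 1 render
    double transfer;      // GPU 1 transfer
    double sinkRender;    // GPU 0 render that does not need the data
    double sinkUse;       // GPU 0 use buffer
};

// All four stages run back to back with no overlap.
double serialTime(const FrameTimes& t)
{
    return t.sourceRender + t.transfer + t.sinkRender + t.sinkUse;
}

// Time GPU 0 spends waiting for the marker after its own render.
double sinkWait(const FrameTimes& t)
{
    return std::max(0.0, t.sourceRender + t.transfer - t.sinkRender);
}

// GPU 0 renders while GPU 1 renders and transfers, then uses the buffer.
double overlappedTime(const FrameTimes& t)
{
    assert(t.sourceRender >= 0.0 && t.transfer >= 0.0);
    assert(t.sinkRender >= 0.0 && t.sinkUse >= 0.0);
    return std::max(t.sourceRender + t.transfer, t.sinkRender) + t.sinkUse;
}
```

In this model the wait disappears entirely once the sink's independent rendering takes at least as long as the source's render plus transfer.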
DIRECTGMA IN OPENCL
The OpenCL extension CL_AMD_BUS_ADDRESSABLE_MEMORY provides access
to DirectGMA.
The functions are:
cl_int clEnqueueWaitSignalAMD(cl_command_queue command_queue, cl_mem mem_object, uint value, cl_uint num_events, …
cl_int clEnqueueWriteSignalAMD(cl_command_queue command_queue, cl_mem mem_object, uint value, cl_ulong offset, …
cl_int clEnqueueMakeBuffersResidentAMD(cl_command_queue command_queue, cl_uint num_mem_objects, cl_mem* mem_objects,
    cl_bool blocking_make_resident, cl_bus_address_amd * bus_addresses, cl_uint num_events, …
The new tokens are:
CL_BUS_ADDRESSABLE_MEMORY_AMD
CL_EXTERNAL_PHYSICAL_MEMORY_AMD
DIRECTGMA | DX9
The DirectGMA functionality in DX9 is made available through a so-called
communication surface.
The process for using it is as follows:
‒ Create a 1x1 offscreen plain surface of format FOURCC_SDIF.
‒ Lock the surface. On lock, the driver allocates and returns a pointer to an
AMDDX9SDICOMMPACKET structure. This structure is the communication surface.
‒ Assign and cast the pBits pointer to a locally created AMDDX9SDICOMMPACKET
pointer.
The most essential commands are:
AMD_SDI_CMD_GET_CAPS_DATA
AMD_SDI_CMD_CREATE_SURFACE_LOCAL_BEGIN
AMD_SDI_CMD_CREATE_SURFACE_LOCAL_END
AMD_SDI_CMD_CREATE_SURFACE_REMOTE_BEGIN
AMD_SDI_CMD_CREATE_SURFACE_REMOTE_END
AMD_SDI_CMD_QUERY_PHY_ADDRESS_LOCAL
AMD_SDI_CMD_SYNC_WAIT_MARKER
AMD_SDI_CMD_SYNC_WRITE_MARKER
DIRECTGMA | DX9
Create a local surface that can be accessed by a remote device
hr = RunSDICommand(pd3dDevice, AMD_SDI_CMD_CREATE_SURFACE_LOCAL_BEGIN, NULL, 0, NULL, 0);
if (SUCCEEDED(hr))
{
    // Create SDI_LOCAL resources here
    hr = pd3dDevice->CreateTexture(width, height, 1, usage, format, D3DPOOL_DEFAULT, ppTex, NULL);
    if (SUCCEEDED(hr))
    {
        hr = MakeAllocDoneViaDumpDraw(pd3dDevice, *ppTex);
        hr = RunSDICommand(pd3dDevice, AMD_SDI_CMD_CREATE_SURFACE_LOCAL_END, NULL, 0, (PBYTE)pAttrib, sizeof(AMDDX9SDISURFACEATTRIBUTES));
        if (SUCCEEDED(hr))
        {
            // pAttrib now describes the new surface:
            //   pAttrib->surfaceHandle
            //   pAttrib->surfaceAddr.surfaceBusAddr
            //   pAttrib->surfaceAddr.markerBusAddr
        }
    }
}
return hr;
DIRECTGMA | DX10 AND DX11
AMD's DirectGMA extension is accessed by way of the IAmdDxExt interface.
To create this interface, the extension client must do the following:
‒ Include the “AmdDxExtSDIApi.h” file
‒ Get the exported function AmdDxExtCreate() from the DXX driver using
GetProcAddress()
‒ Call AmdDxExtCreate() to create an IAmdDxExt interface
‒ Get and use the desired specific extension interfaces
‒ Close the AMD DirectX extension interface IAmdDxExt once it is no longer needed:
  ‒ Release the SDI interface IAmdDxExtSDI
  ‒ Release the extension interface IAmdDxExt