ISCA final presentation - Runtime
2. OUTLINE
Introduction
HSA Core Runtime API (Pre-release 1.0 provisional)
Initialization and Shut Down
Notifications (Synchronous/Asynchronous)
Agent Information
Signals and Synchronization (Memory-Based)
Queues and Architected Dispatch
Summary
© Copyright 2014 HSA Foundation. All Rights Reserved
3. INTRODUCTION (1)
The HSA core runtime is a thin, user-mode API that provides the interface necessary for
the host to launch compute kernels to the available HSA components.
The overall goal of the HSA core runtime design is to provide a high-performance dispatch
mechanism that is portable across multiple HSA vendor architectures.
The dispatch mechanism differentiates the HSA runtime from other language runtimes by
architected argument setting and kernel launching at the hardware and specification level.
The HSA core runtime API is standard across all HSA vendors, such that languages which use the
HSA runtime can run on different vendor’s platforms that support the API.
The implementation of the HSA runtime may include kernel-level components (required for
some hardware components, ex: AMD Kaveri) or may be entirely user-space (for example,
simulators or CPU implementations).
4. INTRODUCTION (2)
[Figure: The software architecture stack without HSA runtime. Layers: Programming Model (OpenCL App, Java App, OpenMP App, DSL App, …); Language Runtime (OpenCL Runtime, Java Runtime, OpenMP Runtime, DSL Runtime, …); Driver with Component 1 … Component N for each of Vendor 1 … Vendor m.]
[Figure: The software architecture stack with HSA runtime. The applications and language runtimes sit above an HSA Runtime and HSA Finalizer with Component 1 … Component N for each of HSA Vendor 1 … HSA Vendor m.]
5. INTRODUCTION (3)
[Figure: program flow compared for an OpenCL Runtime and the HSA Runtime, between Start Program and Exit Program. OpenCL Runtime labels: Platform, Device, and Context Initialization; Build Kernel; SVM Allocation and Kernel Arguments Setting; Command Queue; Resource Deallocation. HSA Runtime labels: HSA Runtime Initialization and Topology Discovery; HSAIL Finalization and Linking; HSA Memory Allocation; Enqueue Dispatch Packet; HSA Runtime Close. An Agent label also appears.]
6. INTRODUCTION (4)
HSA Platform System Architecture Specification support
Runtime initialization and shutdown
Notifications (synchronous/asynchronous)
Agent information
Signals and synchronization (memory-based)
Queues and Architected dispatch
Memory management
HSAIL support
Finalization, linking, and debugging
Image and Sampler support
[Figure: HSA Runtime flow labels: HSA Runtime Initialization and Topology Discovery; HSAIL Finalization and Linking; HSA Memory Allocation; Enqueue Dispatch Packet; HSA Runtime Close.]
9. HSA RUNTIME INITIALIZATION
When hsa_init is invoked for the first time in a given process, a runtime
instance is created.
A typical runtime instance may contain platform information, topology, a reference
count, queues, signals, etc.
Applications can call hsa_init multiple times.
Only a single runtime instance exists for a given process.
Each invocation increases the reference count by one.
10. HSA RUNTIME SHUT DOWN
When the shut-down API is invoked, the reference count is decreased by 1.
When the reference count drops below 1:
All the resources associated with the runtime instance (queues, signals, topology
information, etc.) are considered invalid, and any attempt to reference them in
subsequent API calls results in undefined behavior.
The user might call hsa_init to initialize the HSA runtime again.
The HSA runtime might release the resources associated with the instance.
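The reference-counting behavior described on the two slides above can be sketched as follows. This is a minimal, single-threaded illustration and not the actual runtime code: demo_runtime_t, demo_init, demo_shut_down, and demo_ref_count are hypothetical names, and the agent list merely stands in for the topology, queues, and signals a real instance would hold.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical runtime instance; field names are illustrative only. */
typedef struct {
    int ref_count;   /* number of outstanding init calls */
    int *agent_list; /* stand-in for topology, queues, signals, ... */
} demo_runtime_t;

static demo_runtime_t *g_runtime = NULL; /* at most one instance per process */

/* Models initialization: create the instance on first use, then count the call. */
int demo_init(void) {
    if (g_runtime == NULL) {
        g_runtime = calloc(1, sizeof *g_runtime);
        if (g_runtime == NULL) return -1;
        g_runtime->agent_list = calloc(2, sizeof(int));
        if (g_runtime->agent_list == NULL) {
            free(g_runtime); /* initialization failed: release resources */
            g_runtime = NULL;
            return -1;
        }
    }
    g_runtime->ref_count++;
    return 0;
}

/* Models shut down: decrement, and release everything once the count drops below 1. */
int demo_shut_down(void) {
    if (g_runtime == NULL) return -1; /* runtime is not initialized */
    assert(g_runtime->ref_count > 0);
    if (--g_runtime->ref_count < 1) {
        free(g_runtime->agent_list);
        free(g_runtime);
        g_runtime = NULL; /* a later demo_init starts from scratch */
    }
    return 0;
}

/* Returns the current reference count, or 0 when no instance exists. */
int demo_ref_count(void) { return g_runtime ? g_runtime->ref_count : 0; }
```

Calling demo_init twice and demo_shut_down twice leaves no instance behind; a further demo_init then creates a fresh one. A production runtime would also need to guard the counter against concurrent callers.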
11. EXAMPLE – RUNTIME INITIALIZATION (1)
[Code listing not preserved. Annotations:]
Data structure for the runtime instance
If hsa_init is called more than once, increase the ref_count by 1
12. EXAMPLE – RUNTIME INITIALIZATION (2)
[Code listing not preserved. Annotations:]
When hsa_init is called the first time, allocate resources and set the reference count
Get the number of HSA agents
Initialize agents
Create an empty agent list
If initialization failed, release resources
Create topology table
13. EXAMPLE - RUNTIME INSTANCE (1)
Platform Name: Generic
Agent: 2, Memory: 1, Cache: 1
Agent-0
  node_id             0
  id                  0
  type                CPU
  vendor              Generic
  name                Generic
  wavefront_size      0
  queue_size          200
  group_memory        0
  fbarrier_max_count  1
  is_pic_supported    0
  …
Agent-1
  node_id             0
  id                  0
  type                GPU
  vendor              Generic
  name                Generic
  wavefront_size      64
  queue_size          200
  group_memory        64
  fbarrier_max_count  1
  is_pic_supported    1
  …
Memory
  node_id             0
  id                  0
  segment_type        111111
  address_base        0x0001
  size                2048 MB
  peak_bandwidth      6553.6 mbps
Cache
  node_id             0
  id                  0
  levels              1
  associativity       1
  cache size          64KB
  cache line size     4
  is_inclusive        1
14. EXAMPLE - RUNTIME INSTANCE (2)
Platform Header File
  *base_address = 0x00001
  Size = 248
  system_timestamp_frequency_mhz = 200
  signal_maximum_wait = 1/200
  *node_id
  no_nodes = 1
  *agent_list
  no_agent = 2
  *memory_descriptor_list
  no_memory_descriptor = 1
  *cache_descriptor_list
  no_cache_descriptor = 1
  [NULL-terminated lists drawn beside the pointer fields, as extracted: 0 NULL; 45 165 NULL; 285 NULL; 325 NULL]
Agent-0
  node_id = 0
  id = 0
  agent_type = 1 (CPU)
  vendor[16] = Generic
  name[16] = Generic
  wavefront_size = 0
  queue_size = 200
  group_memory_size_bytes = 0
  fbarrier_max_count = 1
  is_pic_supported = 0
  …
Agent-1
  node_id = 0
  id = 0
  agent_type = 2 (GPU)
  vendor[16] = Generic
  name[16] = Generic
  wavefront_size = 64
  queue_size = 200
  group_memory_size_bytes = 64
  fbarrier_max_count = 1
  is_pic_supported = 1
  …
Memory
  node_id = 0
  id = 0
  supported_segment_type_mask = 111111
  virtual_address_base = 0x0001
  size_in_bytes = 2048MB
  peak_bandwidth_mbps = 6553.6
Cache
  node_id = 0
  id = 0
  levels = 1
  *associativity
  *cache_size
  *cache_line_size
  *is_inclusive
  [NULL-terminated lists drawn beside these pointer fields, as extracted: 1 NULL; 64KB NULL; 1 NULL; 4 NULL]
15. EXAMPLE – RUNTIME SHUT DOWN
[Code listing not preserved. Annotation:]
If ref_count < 1, then free the list; otherwise decrease the ref_count by 1.
18. SYNCHRONOUS NOTIFICATIONS
Notifications (errors, events, etc.) reported by the runtime can be synchronous or
asynchronous
The HSA runtime uses the return values of API functions to pass notifications
synchronously.
A status code is defined as an enumeration to capture the return value
of any API function that has been executed, except accessors/mutators.
The notification is a status code that indicates success or error.
Success is represented by HSA_STATUS_SUCCESS, which is equivalent to zero.
An error status is assigned a positive integer and its identifier starts with the
HSA_STATUS_ERROR prefix.
The status code can help to determine a cause of the unsuccessful execution.
19. STATUS CODE QUERY
Query additional information on status code
Parameters
status (input): Status code that the user is seeking more information on
status_string (output): An ISO/IEC 646 encoded English language string that potentially
describes the error status
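A small sketch of the synchronous-notification pattern: success is zero, errors are positive, and a query function maps a status code to a descriptive string. The enumeration members and the demo_status_string function are hypothetical stand-ins chosen for illustration; they are not the names or values defined by the HSA runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status enumeration: success is zero, errors are positive. */
typedef enum {
    DEMO_STATUS_SUCCESS = 0,
    DEMO_STATUS_ERROR_INVALID_ARGUMENT = 1,
    DEMO_STATUS_ERROR_OUT_OF_RESOURCES = 2
} demo_status_t;

/* Stores a descriptive English string for `status` in *status_string.
 * The query itself also reports its outcome through the return value. */
demo_status_t demo_status_string(demo_status_t status, const char **status_string) {
    if (status_string == NULL) return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
    switch (status) {
    case DEMO_STATUS_SUCCESS:
        *status_string = "The function executed successfully.";
        return DEMO_STATUS_SUCCESS;
    case DEMO_STATUS_ERROR_INVALID_ARGUMENT:
        *status_string = "One of the arguments is invalid.";
        return DEMO_STATUS_SUCCESS;
    case DEMO_STATUS_ERROR_OUT_OF_RESOURCES:
        *status_string = "The runtime failed to allocate resources.";
        return DEMO_STATUS_SUCCESS;
    }
    *status_string = NULL; /* unknown status code */
    return DEMO_STATUS_ERROR_INVALID_ARGUMENT;
}
```

A caller would typically compare each return value against the success code and only query the string on failure, for logging or diagnostics.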
20. ASYNCHRONOUS NOTIFICATIONS
The runtime passes asynchronous notifications by calling user-defined
callbacks.
For instance, queues are a common source of asynchronous events because the
tasks queued by an application are asynchronously consumed by the packet
processor. Callbacks are associated with queues when they are created. When the
runtime detects an error in a queue, it invokes the callback associated with that
queue and passes it an error flag (indicating what happened) and a pointer to the
erroneous queue.
The HSA runtime does not implement any default callbacks.
Take care with blocking functions inside a callback implementation: a callback that
does not return can leave the runtime in an undefined state.
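The callback mechanism can be sketched as below. This is an illustrative model only: demo_queue_t, demo_queue_create, and demo_report_queue_error are hypothetical names, and demo_report_queue_error plays the role of the runtime detecting a queue error. The sketch reflects two points from the slide: the callback is attached at queue creation, and no default callback is supplied.

```c
#include <assert.h>
#include <stddef.h>

typedef struct demo_queue demo_queue_t;

/* Hypothetical callback type: receives an error flag and the affected queue. */
typedef void (*demo_queue_callback_t)(int error_flag, demo_queue_t *queue);

struct demo_queue {
    int id;
    demo_queue_callback_t callback; /* registered at creation; may be NULL */
};

/* Associates the callback with the queue when the queue is created. */
void demo_queue_create(demo_queue_t *queue, int id, demo_queue_callback_t callback) {
    assert(queue != NULL);
    queue->id = id;
    queue->callback = callback;
}

/* Stand-in for the runtime noticing an error in a queue.
 * Returns 1 if a callback was invoked, 0 if none was registered. */
int demo_report_queue_error(demo_queue_t *queue, int error_flag) {
    assert(queue != NULL);
    if (queue->callback == NULL) return 0; /* no default callback exists */
    queue->callback(error_flag, queue);
    return 1;
}

/* Example application callback: records what happened and returns promptly. */
static int last_error = 0;
static int last_queue_id = -1;

void demo_on_queue_error(int error_flag, demo_queue_t *queue) {
    last_error = error_flag;
    last_queue_id = queue->id;
}
```

The example callback only records state and returns, which avoids the non-returning callback hazard noted above.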
21. EXAMPLE - CALLBACK
[Code listing not preserved. Annotations:]
Pass the callback function when creating the queue
If the queue is empty, set the event and invoke the callback
23. OUTLINE
Agent information
hsa_node_t
hsa_agent_t
hsa_agent_info_t
hsa_component_feature_t
Agent Information manipulation APIs
hsa_iterate_agents
hsa_agent_get_info
Example
24. INTRODUCTION
The runtime exposes a list of agents that are available in the system.
An HSA agent is a hardware component that participates in the HSA memory model.
An HSA agent can submit AQL packets for execution.
An HSA agent may also be, but is not required to be, an HSA component. It is possible for
a system to include HSA agents that are neither an HSA component nor a host CPU.
HSA agents are defined as opaque handles of type hsa_agent_t .
The HSA runtime provides APIs for applications to traverse the list of available
agents and query attributes of a particular agent.
25. AGENT INFORMATION (1)
Opaque agent handle
Opaque NUMA node handle
An HSA memory node is a node that delineates a set of
system components (host CPUs and HSA Components) with
“local” access to a set of memory resources attached to the
node's memory controller and appropriate HSA-compliant
access attributes.
26. AGENT INFORMATION (2)
Component features
An HSA component is a hardware or software component that can be a target of the AQL queries
and conforms to the memory model of the HSA.
Values
HSA_COMPONENT_FEATURE_NONE = 0
No component capabilities. The device is an agent, but not a component.
HSA_COMPONENT_FEATURE_BASIC = 1
The component supports the HSAIL instruction set and all the AQL packet types except Agent
dispatch.
HSA_COMPONENT_FEATURE_ALL = 2
The component supports the HSAIL instruction set and all the AQL packet types.
27. AGENT INFORMATION (3)
Agent attributes
Values
HSA_AGENT_INFO_MAX_GRID_DIM
HSA_AGENT_INFO_MAX_WORKGROUP_DIM
HSA_AGENT_INFO_QUEUE_MAX_PACKETS
HSA_AGENT_INFO_CLOCK
HSA_AGENT_INFO_CLOCK_FREQUENCY
HSA_AGENT_INFO_MAX_SIGNAL_WAIT
HSA_AGENT_INFO_NAME
HSA_AGENT_INFO_NODE
HSA_AGENT_INFO_COMPONENT_FEATURES
HSA_AGENT_INFO_VENDOR_NAME
HSA_AGENT_INFO_WAVEFRONT_SIZE
HSA_AGENT_INFO_CACHE_SIZE
28. AGENT INFORMATION MANIPULATION (1)
Iterate over the available agents, and invoke an application-defined callback on
every iteration
If callback returns a status other than HSA_STATUS_SUCCESS for a particular
iteration, the traversal stops and the function returns that status value.
Parameters
callback (input): Callback to be invoked once per agent
data (input): Application data that is passed to callback on every iteration. Can be
NULL.
29. AGENT INFORMATION MANIPULATION (2)
Get the current value of an attribute for a given agent
Parameters
agent (input): A valid agent
attribute (input): Attribute to query
value (output): Pointer to a user-allocated buffer where to store the value of the
attribute. If the buffer passed by the application is not large enough to hold the value
of attribute, the behavior is undefined.
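The iterate-then-query pattern from the two slides above can be sketched as follows. Everything here is a hypothetical model: the demo_ functions imitate the shape of hsa_iterate_agents and hsa_agent_get_info but are not their real signatures, agents are plain structs rather than opaque handles, and the two sample agents use illustrative ids with the wavefront sizes 0 and 64 from the example instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values: zero is success, anything else stops traversal. */
enum { DEMO_SUCCESS = 0, DEMO_ERROR = 1, DEMO_FOUND = 2 };

/* Hypothetical agent record; real agents are opaque handles. */
typedef struct {
    uint32_t id;
    uint32_t wavefront_size;
} demo_agent_t;

typedef enum { DEMO_INFO_ID, DEMO_INFO_WAVEFRONT_SIZE } demo_agent_info_t;
typedef int (*demo_agent_callback_t)(const demo_agent_t *agent, void *data);

static const demo_agent_t demo_agents[] = { {0, 0}, {1, 64} };

/* Invokes callback once per agent; stops at the first non-success status. */
int demo_iterate_agents(demo_agent_callback_t callback, void *data) {
    assert(callback != NULL);
    for (size_t i = 0; i < sizeof demo_agents / sizeof demo_agents[0]; i++) {
        int status = callback(&demo_agents[i], data);
        if (status != DEMO_SUCCESS) return status;
    }
    return DEMO_SUCCESS;
}

/* Copies one attribute into the caller's buffer, which must be large enough. */
int demo_agent_get_info(const demo_agent_t *agent, demo_agent_info_t attribute, void *value) {
    if (agent == NULL || value == NULL) return DEMO_ERROR;
    switch (attribute) {
    case DEMO_INFO_ID:
        *(uint32_t *)value = agent->id;
        return DEMO_SUCCESS;
    case DEMO_INFO_WAVEFRONT_SIZE:
        *(uint32_t *)value = agent->wavefront_size;
        return DEMO_SUCCESS;
    }
    return DEMO_ERROR;
}

/* Callback that counts agents; data points to an int counter. */
int demo_count_agents(const demo_agent_t *agent, void *data) {
    (void)agent;
    ++*(int *)data;
    return DEMO_SUCCESS;
}

/* Callback that stops at the first agent with a non-zero wavefront size;
 * data points to a `const demo_agent_t *` that receives the match. */
int demo_find_wavefront_agent(const demo_agent_t *agent, void *data) {
    uint32_t size = 0;
    if (demo_agent_get_info(agent, DEMO_INFO_WAVEFRONT_SIZE, &size) != DEMO_SUCCESS)
        return DEMO_ERROR;
    if (size == 0) return DEMO_SUCCESS; /* keep iterating */
    *(const demo_agent_t **)data = agent;
    return DEMO_FOUND; /* non-success status ends the traversal early */
}
```

Returning a distinct non-success status from the callback is one way for an application to end the traversal as soon as it has found the agent it wants.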
30. EXAMPLE - AGENT ATTRIBUTE QUERY
[Code listing not preserved. Annotations:]
Get the agent handle of Agent 0
Copy agent attribute information
32. OUTLINE
Signal
Signal manipulation API
Create/Destroy
Query
Send
Atomic Operations
Signal wait
Get time out
Signal Condition
Example
33. SIGNAL (1)
HSA agents can communicate with each other by using coherent global memory,
or by using signals.
A signal is represented by an opaque signal handle
A signal carries a value, which can be updated or conditionally waited upon via
an API call or HSAIL instruction.
The value occupies four or eight bytes depending on the machine model in use.
34. SIGNAL (2)
Updating the value of a signal is equivalent to sending the signal.
In addition to the update (store) of signals, the API for sending a signal must
support other atomic operations with specific memory order semantics.
Atomic operations: AND, OR, XOR, Add, Subtract, Exchange, and CAS
Memory order semantics : Release and Relaxed
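One way to picture a signal is as an atomic integer whose operations carry an explicit memory order. The sketch below uses C11 <stdatomic.h> and assumes a 64-bit value; the demo_signal_ names are hypothetical and only mirror the naming pattern (operation plus memory order), not the real runtime implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signal: an atomic 64-bit value. */
typedef struct {
    _Atomic int64_t value;
} demo_signal_t;

void demo_signal_init(demo_signal_t *s, int64_t initial_value) {
    assert(s != NULL);
    atomic_init(&s->value, initial_value);
}

/* Read the current value with acquire semantics. */
int64_t demo_signal_load_acquire(demo_signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}

/* "Send" the signal by storing a new value with release semantics. */
void demo_signal_store_release(demo_signal_t *s, int64_t v) {
    atomic_store_explicit(&s->value, v, memory_order_release);
}

/* Atomic add with relaxed semantics. */
void demo_signal_add_relaxed(demo_signal_t *s, int64_t v) {
    atomic_fetch_add_explicit(&s->value, v, memory_order_relaxed);
}

/* Atomic subtract with release semantics. */
void demo_signal_subtract_release(demo_signal_t *s, int64_t v) {
    atomic_fetch_sub_explicit(&s->value, v, memory_order_release);
}

/* Atomic exchange: stores v and returns the previous value. */
int64_t demo_signal_exchange_release(demo_signal_t *s, int64_t v) {
    return atomic_exchange_explicit(&s->value, v, memory_order_release);
}

/* Compare-and-swap: stores desired only if the value equals expected.
 * Returns the value observed before the operation in either case. */
int64_t demo_signal_cas_release(demo_signal_t *s, int64_t expected, int64_t desired) {
    atomic_compare_exchange_strong_explicit(&s->value, &expected, desired,
                                            memory_order_release,
                                            memory_order_relaxed);
    return expected; /* updated to the observed value when the CAS fails */
}
```

Release ordering on the sending side pairs with an acquire load on the waiting side, so data written before the signal update is visible to whoever observes the new value.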
35. SIGNAL CREATE/DESTROY
Create a signal
Parameters
initial_value (input): Initial value of the
signal.
signal_handle (output): Signal handle.
Destroy a signal previously created by
hsa_signal_create
Parameter
signal_handle (input): Signal handle.
36. SIGNAL LOAD/STORE
Atomically read the current signal value with acquire semantics
Atomically read the current signal value with relaxed semantics
Send and atomically set the value of a signal with release semantics
Send and atomically set the value of a signal with relaxed semantics
37. SIGNAL ADD/SUBTRACT
Send and atomically increment the value of a signal by a given amount with release semantics
Send and atomically increment the value of a signal by a given amount with relaxed semantics
Send and atomically decrement the value of a signal by a given amount with release semantics
Send and atomically decrement the value of a signal by a given amount with relaxed semantics
38. SIGNAL AND (OR, XOR)/EXCHANGE
Send and atomically perform a logical AND operation on the value of a signal and a given value with release semantics
Send and atomically perform a logical AND operation on the value of a signal and a given value with relaxed semantics
Send and atomically set the value of a signal and return its previous value with release semantics
Send and atomically set the value of a signal and return its previous value with relaxed semantics
39. SIGNAL WAIT (1)
The application may wait on a signal, with a condition specifying the terms of
wait.
Signal wait condition operator
Values
HSA_EQ: The two operands are equal.
HSA_NE: The two operands are not equal.
HSA_LT: The first operand is less than the second operand.
HSA_GTE: The first operand is greater than or equal to the second operand.
40. SIGNAL WAIT (2)
The wait can be done either in the HSA component via an HSAIL wait instruction
or via a runtime API defined here.
Waiting on a signal returns the current value at the opaque signal object;
The wait may have a runtime defined timeout which indicates the maximum amount of time that an
implementation can spend waiting.
The signal infrastructure allows for multiple senders/waiters on a single signal.
Wait reads the value, hence acquire synchronizations may be applied.
41. SIGNAL WAIT (3)
Signal wait
Parameters
signal_handle (input): A signal handle
condition (input): Condition used to compare the passed and signal values
compare_value (input): Value to compare with
return_value (output): A pointer where the current signal value must be read into
42. SIGNAL WAIT (4)
Signal wait with timeout
Parameters
signal_handle (input): A signal handle
timeout (input): Maximum wait duration (A value of zero indicates no maximum)
long_wait (input): Hint indicating that the signal value is not expected to meet the given condition in
a short period of time. The HSA runtime may use this hint to optimize the wait implementation.
condition (input): Condition used to compare the passed and signal values
compare_value (input): Value to compare with
return_value (output): A pointer where the current signal value must be read into
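A polling sketch of a conditional wait with a timeout, using the four condition operators listed earlier. The names are hypothetical, the long_wait hint is omitted, and the timeout is counted in polling iterations purely to keep the example deterministic; this is an assumption, since the slides do not state the unit. As on the slide, a timeout of zero means no maximum.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical condition operators mirroring EQ, NE, LT, and GTE. */
typedef enum { DEMO_EQ, DEMO_NE, DEMO_LT, DEMO_GTE } demo_condition_t;

/* Evaluates `value <condition> compare_value`. */
static bool demo_condition_met(demo_condition_t condition, int64_t value,
                               int64_t compare_value) {
    switch (condition) {
    case DEMO_EQ:  return value == compare_value;
    case DEMO_NE:  return value != compare_value;
    case DEMO_LT:  return value < compare_value;
    case DEMO_GTE: return value >= compare_value;
    }
    return false;
}

/* Polls the signal with acquire loads until the condition holds or the
 * polling budget runs out. `timeout` is a number of polls (an assumption);
 * zero means wait without a maximum. Returns true if the condition was met.
 * *return_value always receives the last value observed. */
bool demo_signal_wait_timeout_acquire(_Atomic int64_t *signal, uint64_t timeout,
                                      demo_condition_t condition,
                                      int64_t compare_value,
                                      int64_t *return_value) {
    assert(signal != NULL && return_value != NULL);
    for (uint64_t polls = 0; timeout == 0 || polls < timeout; polls++) {
        int64_t observed = atomic_load_explicit(signal, memory_order_acquire);
        *return_value = observed;
        if (demo_condition_met(condition, observed, compare_value)) return true;
    }
    return false; /* timed out */
}
```

A real implementation could block or yield instead of spinning, which is where a hint such as long_wait would come into play.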
43. EXAMPLE – SIGNAL WAIT (1)
[Figure: timelines of thread_1 and thread_2 acting on one signal.]
value = 0
thread_1: hsa_signal_wait_timeout_acquire (value == 2); thread_1 is blocked
thread_2: hsa_signal_add_relaxed (value = value + 3); value = 3
thread_2: hsa_signal_subtract_relaxed (value = value - 1); value = 2
Condition satisfied: the signal value is returned and the execution of thread_1 continues
44. EXAMPLE – SIGNAL WAIT (2)
[Code listing not preserved. Annotations:]
If signal_handle is invalid, then return the invalid-signal status
Compare tmp->value with compare_value to see whether the condition is satisfied
If timeout = 0, then return the signal time-out status
Signal wait condition function
If the condition is satisfied, then return the signal value and status
46. OUTLINE
Queues
Queue Types and Structure
HSA runtime API for Queue Manipulations
Architected Queuing Language (AQL) Support
Packet type
Packet header
Examples
Enqueue Packet
Packet Processor
47. INTRODUCTION (1)
An HSA-compliant platform supports the allocation of multiple user-level command queues.
A user-level command queue is characterized as runtime-allocated, user-level accessible virtual
memory of a certain size, containing packets defined in the Architected Queuing Language (AQL
packets).
Queues are allocated by HSA applications through the HSA runtime.
HSA software receives memory-based structures to configure the hardware queues to
allow for efficient software management of the hardware queues of the HSA agents.
This queue memory shall be processed by the HSA Packet Processor as a ring buffer.
Queues are read-only data structures.
Writing values directly to a queue structure results in undefined behavior.
However, HSA agents can directly modify the contents of the buffer pointed to by base_address, or use
runtime APIs to access the doorbell signal or the service queue.
48. INTRODUCTION (2)
Two queue types, AQL and Service Queues, are supported
AQL Queue consumes AQL packets that are used to specify the information of kernel functions
that will be executed on the HSA component
Service Queue consumes agent dispatch packets that are used to specify runtime-defined or user
registered functions that will be executed on the agent (typically, the host CPU)
50. INTRODUCTION (4)
In addition to the data held in the queue structure, the queue also defines two
properties (readIndex and writeIndex) that define the location of “head” and “tail”
of the queue.
readIndex: The read index is a 64-bit unsigned integer that specifies the packetID of
the next AQL packet to be consumed by the packet processor.
writeIndex: The write index is a 64-bit unsigned integer that specifies the packetID of
the next AQL packet slot to be allocated.
Neither index is directly exposed to the user, who can only access them by using
dedicated HSA core runtime APIs.
The available index functions differ on the index of interest (read or write), action to be
performed (addition, compare and swap, etc.), and memory consistency model
(relaxed, release, etc.).
51. INTRODUCTION (5)
The read index is automatically advanced when a packet is read by the packet
processor.
When the packet processor observes that
The read index matches the write index, the queue can be considered empty;
The write index is greater than or equal to the sum of the read index and the size of
the queue, then the queue is full.
The doorbell_signal field of a queue contains a signal that is used by the agent
to inform the packet processor to process the packets it writes.
The value written to the doorbell signal is the ID of the packet that is ready to be
launched.
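The empty and full conditions stated above translate directly into comparisons on the two 64-bit indices. The helper names below are hypothetical, and mapping a packet ID to a buffer slot with a modulo is an assumption that follows from the queue memory being processed as a ring buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The queue is empty when the read index matches the write index. */
bool demo_queue_is_empty(uint64_t read_index, uint64_t write_index) {
    return read_index == write_index;
}

/* The queue is full when the write index has reached read index + size. */
bool demo_queue_is_full(uint64_t read_index, uint64_t write_index, uint32_t size) {
    return write_index >= read_index + size;
}

/* Ring-buffer slot for a packet ID (assumption: slot = ID modulo queue size). */
uint32_t demo_slot_for_packet(uint64_t packet_id, uint32_t size) {
    assert(size != 0);
    return (uint32_t)(packet_id % size);
}
```

Because the indices only ever grow, the difference write_index - read_index is the number of packets currently in flight.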
52. INTRODUCTION (6)
The new task might be consumed by the packet processor even before the
doorbell signal has been signaled by the agent.
This is because the packet processor might be already processing some other
packets and observes that there is new work available, so it processes the new
packets.
In any case, the agent must ring the doorbell for every batch of packets it writes.
53. QUEUE CREATE/DESTROY
Create a user mode queue
When a queue is created, the runtime also
allocates the packet buffer and the completion
signal.
The application should only rely on the status
code returned to determine if the queue is valid
Destroy a user mode queue
A queue must not be accessed after it has been
destroyed.
When a queue is destroyed, the state of the AQL packets
that have not been yet fully processed becomes undefined.
54. GET READ/WRITE INDEX
Atomically retrieve read index of a queue with
acquire semantics
Atomically retrieve write index of a queue with
acquire semantics
Atomically retrieve read index of a queue with
relaxed semantics
Atomically retrieve write index of a queue with
relaxed semantics
55. SET READ/WRITE INDEX
Atomically set the read index of a queue with
release semantics
Atomically set the read index of a queue with
relaxed semantics
Atomically set the write index of a queue with
release semantics
Atomically set the write index of a queue with
relaxed semantics
56. COMPARE AND SWAP WRITE INDEX
Atomically compare and set the write index of a
queue with acquire/release/relaxed/acquire-release
semantics
Parameters
queue (input): A queue
expected (input): The expected index value
val (input): Value to copy to the write index if expected
matches the observed write index
Return value
Previous value of the write index
57. ADD WRITE INDEX
Atomically increment the write index of a
queue by an offset with
release/acquire/relaxed/acquire-release
semantics
Parameters
queue (input): A queue
val (input): The value to add to the write index
Return value
Previous value of the write index
58. ARCHITECTED QUEUING LANGUAGE (AQL)
An HSA-compliant system provides a command interface for the dispatch of
HSA agent commands.
This command interface is provided by the Architected Queuing Language (AQL).
AQL allows HSA agents to build and enqueue their own command packets,
enabling fast and low-power dispatch.
AQL also provides support for HSA component queue submissions
The HSA component kernel can write commands in AQL format.
59. AQL PACKET (1)
AQL packet format
Values
Always reserved packet (0): Packet format is set to always reserved when the queue is initialized.
Invalid packet (1): Packet format is set to invalid when the readIndex is incremented, making the
packet slot available to the HSA agents.
Dispatch packet (2): Dispatch packets contain jobs for the HSA component and are created by
HSA agents.
Barrier packet (3): Barrier packets can be inserted by HSA agents to delay processing subsequent
packets. All queues support barrier packets.
Agent dispatch packet (4): Agent dispatch packets contain jobs for the HSA agent and are created by
HSA agents.
60. AQL PACKET (2)
HSA signaling object handle used to indicate completion of the job
61. EXAMPLE - ENQUEUE AQL PACKET (1)
An HSA agent submits a task to a queue by performing the following steps:
Allocate a packet slot (by incrementing the writeIndex)
Initialize the packet and copy packet to a queue associated with the Packet Processor
Mark packet as valid
Notify the Packet Processor of the packet (With doorbell signal)
62. EXAMPLE - ENQUEUE AQL PACKET (2)
[Figure and code listing not preserved: a Dispatch Queue with its WriteIndex and ReadIndex. Annotations:]
Allocate an AQL packet slot
Initialize packet
Copy the packet into the queue. Note that a lock can be used here to prevent a race condition in a multithreaded environment
Send doorbell signal
63. EXAMPLE - PACKET PROCESSOR
[Figure and code listing not preserved: a Dispatch Queue with its WriteIndex and ReadIndex. Annotations:]
Receive doorbell
If there is any packet in queue, process the packet.
Get packet content
Check if barrier packet
Update readIndex, change packet state to invalid, and send completion signal.
65. OUTLINE
Memory registration and deregistration
Memory region and memory segment
APIs for memory region manipulation
APIs for memory registration and deregistration
66. INTRODUCTION
One of the key features of HSA is its ability to share global pointers between the
host application and code executing on the HSA component.
This ability means that an application can directly pass a pointer to memory allocated on the host
to a kernel function dispatched to a component without an intermediate copy
When a buffer created in the host is also accessed by a component,
programmers are encouraged to register the corresponding address range
beforehand.
Registering memory expresses an intention to access (read or write) the passed buffer from a
component other than the host. This is a performance hint that allows the runtime implementation
to know which buffers will be accessed by some of the components ahead of time.
When an HSA program no longer needs to access a registered buffer in a device,
the user should deregister that virtual address range.
67. MEMORY REGION/SEGMENT
A memory region represents a virtual memory interval that is visible to a particular agent,
and contains properties about how memory is accessed or allocated from that agent.
Memory segments
Values
HSA_SEGMENT_GLOBAL = 1
HSA_SEGMENT_PRIVATE = 2
HSA_SEGMENT_GROUP = 4
HSA_SEGMENT_KERNARG = 8
HSA_SEGMENT_READONLY = 16
HSA_SEGMENT_IMAGE = 32
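Because the segment values are distinct powers of two, they can be combined into a bit mask such as the supported-segment mask 111111 shown in the runtime instance example. The sketch below copies the numeric values from the slide; the DEMO_ names and the helper function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Segment flags with the numeric values listed on the slide. */
typedef enum {
    DEMO_SEGMENT_GLOBAL   = 1,
    DEMO_SEGMENT_PRIVATE  = 2,
    DEMO_SEGMENT_GROUP    = 4,
    DEMO_SEGMENT_KERNARG  = 8,
    DEMO_SEGMENT_READONLY = 16,
    DEMO_SEGMENT_IMAGE    = 32
} demo_segment_t;

/* Tests whether a supported-segment mask includes one given segment. */
bool demo_mask_supports(uint32_t mask, demo_segment_t segment) {
    uint32_t bit = (uint32_t)segment;
    assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one flag expected */
    return (mask & bit) != 0;
}
```

A mask with all six bits set (binary 111111, decimal 63) reports support for every segment, while a mask built as DEMO_SEGMENT_GLOBAL | DEMO_SEGMENT_KERNARG supports only those two.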
68. MEMORY REGION INFORMATION
Attributes of a memory region
Values
HSA_REGION_INFO_BASE_ADDRESS
HSA_REGION_INFO_SIZE
HSA_REGION_INFO_NODE
HSA_REGION_INFO_MAX_ALLOCATION_SIZE
HSA_REGION_INFO_SEGMENT
HSA_REGION_INFO_BANDWIDTH
HSA_REGION_INFO_CACHED
69. MEMORY REGION MANIPULATION (1)
Get the current value of an attribute of a region
Iterate over the memory regions that are visible to an agent, and invoke an
application-defined callback on every iteration
If callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the
traversal stops and the function returns that status value.
70. MEMORY REGION MANIPULATION (2)
Allocate a block of memory
Deallocate a block of memory previously allocated
using hsa_memory_allocate
Copy block of memory
Copying a number of bytes larger than the size of the
memory regions pointed by dst or src results in
undefined behavior.
71. MEMORY REGISTRATION/DEREGISTRATION
Register memory
Parameters
address (input): A pointer to the base of
the memory region to be registered. If a
NULL pointer is passed, no operation is
performed.
size (input): Requested registration size
in bytes. A size of zero is only allowed if
address is NULL.
Deregister memory previously registered
using hsa_memory_register
Parameter
address (input): A pointer to the base of the
memory region to be deregistered. If a NULL
pointer is passed, no operation is performed.
72. EXAMPLE
[Code listing not preserved. Annotations:]
Allocate a memory space
Use hsa_region_get_info to get the size in bytes of this memory space
Register this memory space as a performance hint
When finished, deregister and free this memory space
74. SUMMARY
Covered
HSA Core Runtime API (Pre-release 1.0 provisional)
Runtime Initialization and Shutdown (Open/Close)
Notifications (Synchronous/Asynchronous)
Agent Information
Signals and Synchronization (Memory-Based)
Queues and Architected Dispatch
Memory Management
Not covered
Extension of Core Runtime
HSAIL Finalization, Linking, and Debugging
Images and Samplers
Editor's notes
Queue type
HSA_QUEUE_TYPE_MULTI = 0, multiple producers are supported
HSA_QUEUE_TYPE_SINGLE = 1, only a single producer is supported
Queue features
HSA_QUEUE_FEATURE_DISPATCH = 1, queue supports dispatch packets.
HSA_QUEUE_FEATURE_AGENT_DISPATCH = 2, queue supports agent dispatch packets
service_queue
A pointer to another user mode queue that can be used by the HSAIL kernel to request system services.
Service_queue_type:
NONE (no service queue),
COMMON (runtime provided service queue that is shared),
NEW (requires the runtime to create a new queue).
acquire_fence_scope: Determines the scope and type of the memory fence operation applied before the packet enters the
active phase.
release_fence_scope: Determines the scope and type of the memory fence operation applied after kernel completion but
before the packet is completed.
HSA_FENCE_SCOPE_NONE = 0
No scope. Only valid for barrier packets.
HSA_FENCE_SCOPE_COMPONENT = 1
The fence is applied with component scope for the global segment.
HSA_FENCE_SCOPE_SYSTEM = 2
The fence is applied with system scope for the global segment.