HSA RUNTIME
YEN-CHING CHUNG, NATIONAL TSING HUA UNIVERSITY
OUTLINE
 Introduction
 HSA Core Runtime API (Pre-release 1.0 provisional)
 Initialization and Shut Down
 Notifications (Synchronous/Asynchronous)
 Agent Information
 Signals and Synchronization (Memory-Based)
 Queues and Architected Dispatch
 Summary
© Copyright 2014 HSA Foundation. All Rights Reserved
INTRODUCTION (1)
 The HSA core runtime is a thin, user-mode API that provides the interface necessary for
the host to launch compute kernels to the available HSA components.
 The overall goal of the HSA core runtime design is to provide a high-performance dispatch
mechanism that is portable across multiple HSA vendor architectures.
 The dispatch mechanism distinguishes the HSA runtime from other language runtimes: argument setting and kernel launching are architected at the hardware and specification level.
 The HSA core runtime API is standard across all HSA vendors, so languages that use the HSA runtime can run on any vendor's platform that supports the API.
 The implementation of the HSA runtime may include kernel-level components (required for some hardware components, e.g., AMD Kaveri) or may be entirely user-space (e.g., simulators or CPU implementations).
[Figure: without HSA, each vendor (Vendor 1 … Vendor m) provides its own driver for its components (Component 1 … Component N); with HSA, each HSA vendor (HSA Vendor 1 … HSA Vendor m) provides an HSA runtime and an HSA finalizer for its components.]
INTRODUCTION (2)
[Figure: the software architecture stack without and with the HSA runtime. Layers: applications (OpenCL, Java, OpenMP, DSL, …) on top of their programming-model language runtimes (OpenCL, Java, OpenMP, and DSL runtimes, …).]
INTRODUCTION (3)
[Figure: program flow with the OpenCL runtime compared with the HSA runtime on the agent. Start → Platform, Device, and Context Initialization (HSA: HSA Runtime Initialization and Topology Discovery) → SVM Allocation and Kernel Arguments Setting (HSA: HSA Memory Allocation) → Build Kernel (HSA: HSAIL Finalization and Linking) → Command Queue (HSA: Enqueue Dispatch Packet) → Program Resource Deallocation (HSA: HSA Runtime Close) → Exit.]
INTRODUCTION (4)
 HSA Platform System Architecture Specification support
 Runtime initialization and shutdown
 Notifications (synchronous/asynchronous)
 Agent information
 Signals and synchronization (memory-based)
 Queues and Architected dispatch
 Memory management
 HSAIL support
 Finalization, linking, and debugging
 Image and Sampler support
[Figure: the HSA runtime steps of the program flow: HSA Runtime Initialization and Topology Discovery, HSA Memory Allocation, HSAIL Finalization and Linking, Enqueue Dispatch Packet, and HSA Runtime Close.]
RUNTIME INITIALIZATION AND SHUTDOWN
OUTLINE
 Runtime Initialization API
 hsa_init
 Runtime Shut Down API
 hsa_shut_down
 Examples
HSA RUNTIME INITIALIZATION
 When hsa_init is invoked for the first time in a given process, a runtime instance is created.
 A typical runtime instance may contain information about the platform, topology, reference count, queues, signals, etc.
 Applications can call hsa_init multiple times.
 Only a single runtime instance exists for a given process.
 Each invocation increases the reference count by one.
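The reference-counting rule above can be sketched in C as follows. This is a minimal, single-threaded toy model, not the HSA runtime's code: toy_runtime_t, g_runtime, toy_hsa_init, and the fixed count of two agents are hypothetical stand-ins, and a real implementation would also have to make first-time creation thread-safe.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-process runtime instance. */
typedef struct {
    int ref_count;    /* number of outstanding init calls */
    int num_agents;   /* discovered agents (hypothetical fixed value) */
    int *agent_list;  /* placeholder for topology information */
} toy_runtime_t;

/* Only one instance exists per process. */
static toy_runtime_t *g_runtime = NULL;

/* Mirrors the described hsa_init behavior: create the instance on the
 * first call, otherwise only increment the reference count.
 * Returns 0 on success, -1 on failure. Not thread-safe. */
int toy_hsa_init(void) {
    if (g_runtime != NULL) {
        g_runtime->ref_count += 1;
        return 0;
    }
    toy_runtime_t *rt = malloc(sizeof *rt);
    if (rt == NULL) return -1;
    rt->num_agents = 2;  /* hypothetical: one CPU and one GPU agent */
    rt->agent_list = calloc((size_t)rt->num_agents, sizeof *rt->agent_list);
    if (rt->agent_list == NULL) {  /* initialization failed: release resources */
        free(rt);
        return -1;
    }
    rt->ref_count = 1;
    g_runtime = rt;
    return 0;
}
```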
HSA RUNTIME SHUT DOWN
 When hsa_shut_down is invoked, the reference count is decreased by one.
 When the reference count drops below one:
 All the resources associated with the runtime instance (queues, signals, topology information, etc.) are considered invalid, and any attempt to reference them in subsequent API calls results in undefined behavior.
 The user may call hsa_init to initialize the HSA runtime again.
 The HSA runtime may release the resources associated with the instance.
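The matching shutdown rule, again as a hypothetical toy (toy_hsa_shut_down and its fields are illustrative names, not the runtime's API): decrement first, and release the instance only once the count drops below one. This sketch always releases at that point, whereas the slide only says the runtime may do so.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-process runtime instance. */
typedef struct {
    int ref_count;
    int *agent_list;  /* placeholder for queues, signals, topology, ... */
} toy_runtime_t;

static toy_runtime_t *g_runtime = NULL;

/* Mirrors the described hsa_shut_down behavior.
 * Returns 0 on success, -1 if the runtime is not initialized. */
int toy_hsa_shut_down(void) {
    if (g_runtime == NULL) return -1;
    g_runtime->ref_count -= 1;
    if (g_runtime->ref_count < 1) {
        /* Every handle derived from this instance is now invalid. */
        free(g_runtime->agent_list);
        free(g_runtime);
        g_runtime = NULL;  /* a later init call starts from scratch */
    }
    return 0;
}
```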
EXAMPLE – RUNTIME INITIALIZATION (1)
[Code figure: the data structure for the runtime instance; if hsa_init is called more than once, ref_count is increased by 1.]
EXAMPLE – RUNTIME INITIALIZATION (2)
[Code figure annotations: when hsa_init is called for the first time, allocate resources and set the reference count; get the number of HSA agents; initialize the agents; create an empty agent list; if initialization fails, release the resources; create the topology table.]
EXAMPLE - RUNTIME INSTANCE (1)
Platform Name: Generic
 Agent-0: node_id 0, id 0, type CPU, vendor Generic, name Generic, wavefront_size 0, queue_size 200, group_memory 0, fbarrier_max_count 1, is_pic_supported 0, …
 Agent-1: node_id 0, id 0, type GPU, vendor Generic, name Generic, wavefront_size 64, queue_size 200, group_memory 64, fbarrier_max_count 1, is_pic_supported 1, …
 Memory: node_id 0, id 0, segment_type 111111, address_base 0x0001, size 2048 MB, peak_bandwidth 6553.6 Mbps
 Cache: node_id 0, id 0, levels 1, associativity 1, cache size 64KB, cache line size 4, is_inclusive 1
 Totals: Agent: 2, Memory: 1, Cache: 1
EXAMPLE - RUNTIME INSTANCE (2)
 Platform header file: *base_address = 0x00001, size = 248, system_timestamp_frequency_mhz = 200, signal_maximum_wait = 1/200, *node_id (no_nodes = 1), *agent_list (no_agent = 2), *memory_descriptor_list (no_memory_descriptor = 1), *cache_descriptor_list (no_cache_descriptor = 1)
 Agent-0: node_id = 0, id = 0, agent_type = 1 (CPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 0, queue_size = 200, group_memory_size_bytes = 0, fbarrier_max_count = 1, is_pic_supported = 0, …
 Agent-1: node_id = 0, id = 0, agent_type = 2 (GPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 64, queue_size = 200, group_memory_size_bytes = 64, fbarrier_max_count = 1, is_pic_supported = 1, …
 Memory: node_id = 0, id = 0, supported_segment_type_mask = 111111, virtual_address_base = 0x0001, size_in_bytes = 2048MB, peak_bandwidth_mbps = 6553.6
 Cache: node_id = 0, id = 0, levels = 1, *associativity (list: 1, NULL), *cache_size (list: 64KB, NULL), *cache_line_size (list: 4, NULL), *is_inclusive (list: 1, NULL)
EXAMPLE – RUNTIME SHUT DOWN
[Code figure: if ref_count < 1, free the list; otherwise, decrease ref_count by 1.]
NOTIFICATIONS (SYNCHRONOUS/ASYNCHRONOUS)
OUTLINE
 Synchronous Notifications
 hsa_status_t
 hsa_status_string
 Asynchronous Notifications
 Example
SYNCHRONOUS NOTIFICATIONS
 Notifications (errors, events, etc.) reported by the runtime can be synchronous or asynchronous.
 The HSA runtime uses the return values of API functions to pass notifications synchronously.
 A status code is defined as an enumeration, hsa_status_t, that captures the return value of every executed API function except accessors and mutators.
 The notification is a status code that indicates success or an error.
 Success is represented by HSA_STATUS_SUCCESS, which is equal to zero.
 An error status is assigned a positive integer, and its identifier starts with the HSA_STATUS_ERROR prefix.
 The status code helps determine the cause of an unsuccessful execution.
STATUS CODE QUERY
 Query additional information about a status code (hsa_status_string)
 Parameters
 status (input): Status code that the user is seeking more information on
 status_string (output): An ISO/IEC 646 encoded English language string that potentially
describes the error status
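A sketch of how a status enumeration and an hsa_status_string-style query fit together. The enumeration toy_status_t, its error names and values, and toy_status_string are hypothetical placeholders that only follow the stated convention (success is zero, errors are positive and carry an HSA_STATUS_ERROR-style prefix); they are not the values from the specification header.

```c
#include <stddef.h>

/* Hypothetical status codes following the convention described above. */
typedef enum {
    TOY_HSA_STATUS_SUCCESS = 0,
    TOY_HSA_STATUS_ERROR_INVALID_ARGUMENT = 1,
    TOY_HSA_STATUS_ERROR_OUT_OF_RESOURCES = 2
} toy_status_t;

/* Mirrors the shape of hsa_status_string: stores a pointer to a static
 * English description in *status_string and reports its own status. */
toy_status_t toy_status_string(toy_status_t status, const char **status_string) {
    if (status_string == NULL) return TOY_HSA_STATUS_ERROR_INVALID_ARGUMENT;
    switch (status) {
    case TOY_HSA_STATUS_SUCCESS:
        *status_string = "success";
        break;
    case TOY_HSA_STATUS_ERROR_INVALID_ARGUMENT:
        *status_string = "invalid argument";
        break;
    case TOY_HSA_STATUS_ERROR_OUT_OF_RESOURCES:
        *status_string = "out of resources";
        break;
    default:  /* unknown code */
        return TOY_HSA_STATUS_ERROR_INVALID_ARGUMENT;
    }
    return TOY_HSA_STATUS_SUCCESS;
}
```

A caller typically treats any nonzero return as an error and queries the string only for diagnostics.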
ASYNCHRONOUS NOTIFICATIONS
 The runtime passes asynchronous notifications by calling user-defined
callbacks.
 For instance, queues are a common source of asynchronous events because the
tasks queued by an application are asynchronously consumed by the packet
processor. Callbacks are associated with queues when they are created. When the
runtime detects an error in a queue, it invokes the callback associated with that
queue and passes it an error flag (indicating what happened) and a pointer to the
erroneous queue.
 The HSA runtime does not implement any default callbacks.
 If the callback implementation uses blocking functions and the callback never returns, the runtime state may become undefined.
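A minimal sketch of the asynchronous pattern: the callback is stored when the queue is created and is invoked with an error flag and the queue pointer when the (toy) runtime detects an error. All names here (toy_queue_t, toy_queue_create, toy_runtime_report_queue_error, on_queue_error) are hypothetical and do not reproduce the real queue-creation signature.

```c
#include <stddef.h>

typedef struct toy_queue toy_queue_t;
typedef void (*toy_queue_callback_t)(int error_flag, toy_queue_t *queue);

struct toy_queue {
    int id;
    toy_queue_callback_t callback;  /* NULL: the runtime has no default */
};

/* The callback is associated with the queue at creation time. */
void toy_queue_create(toy_queue_t *q, int id, toy_queue_callback_t cb) {
    q->id = id;
    q->callback = cb;
}

/* Called by the toy runtime when it detects an error in a queue. */
void toy_runtime_report_queue_error(toy_queue_t *q, int error_flag) {
    if (q->callback != NULL) {
        q->callback(error_flag, q);
    }
}

/* Example application callback: record what happened and return quickly. */
static int g_last_error = 0;
static int g_last_queue_id = -1;
void on_queue_error(int error_flag, toy_queue_t *queue) {
    g_last_error = error_flag;
    g_last_queue_id = queue->id;
}
```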
EXAMPLE - CALLBACK
[Code figure: the callback function is passed when the queue is created; if the queue is empty, the event is set and the callback is invoked.]
AGENT INFORMATION
OUTLINE
 Agent information
 hsa_node_t
 hsa_agent_t
 hsa_agent_info_t
 hsa_component_feature_t
 Agent Information manipulation APIs
 hsa_iterate_agents
 hsa_agent_get_info
 Example
INTRODUCTION
 The runtime exposes a list of agents that are available in the system.
 An HSA agent is a hardware component that participates in the HSA memory model.
 An HSA agent can submit AQL packets for execution.
 An HSA agent may also be, but is not required to be, an HSA component. A system can include HSA agents that are neither HSA components nor host CPUs.
 HSA agents are defined as opaque handles of type hsa_agent_t.
 The HSA runtime provides APIs for applications to traverse the list of available
agents and query attributes of a particular agent.
AGENT INFORMATION (1)
 Opaque agent handle (hsa_agent_t)
 Opaque NUMA node handle (hsa_node_t)
 An HSA memory node is a node that delineates a set of
system components (host CPUs and HSA Components) with
“local” access to a set of memory resources attached to the
node's memory controller and appropriate HSA-compliant
access attributes.
AGENT INFORMATION (2)
 Component features (hsa_component_feature_t)
 An HSA component is a hardware or software component that can be a target of AQL queues and conforms to the HSA memory model.
 Values
 HSA_COMPONENT_FEATURE_NONE = 0
 No component capabilities. The device is an agent, but not a component.
 HSA_COMPONENT_FEATURE_BASIC = 1
 The component supports the HSAIL instruction set and all the AQL packet types except Agent
dispatch.
 HSA_COMPONENT_FEATURE_ALL = 2
 The component supports the HSAIL instruction set and all the AQL packet types.
AGENT INFORMATION (3)
 Agent attributes
 Values
 HSA_AGENT_INFO_MAX_GRID_DIM
 HSA_AGENT_INFO_MAX_WORKGROUP_DIM
 HSA_AGENT_INFO_QUEUE_MAX_PACKETS
 HSA_AGENT_INFO_CLOCK
 HSA_AGENT_INFO_CLOCK_FREQUENCY
 HSA_AGENT_INFO_MAX_SIGNAL_WAIT
 HSA_AGENT_INFO_NAME
 HSA_AGENT_INFO_NODE
 HSA_AGENT_INFO_COMPONENT_FEATURES
 HSA_AGENT_INFO_VENDOR_NAME
 HSA_AGENT_INFO_WAVEFRONT_SIZE
 HSA_AGENT_INFO_CACHE_SIZE
AGENT INFORMATION MANIPULATION (1)
 Iterate over the available agents (hsa_iterate_agents) and invoke an application-defined callback on every iteration.
 If callback returns a status other than HSA_STATUS_SUCCESS for a particular
iteration, the traversal stops and the function returns that status value.
 Parameters
 callback (input): Callback to be invoked once per agent
 data (input): Application data that is passed to callback on every iteration. Can be
NULL.
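The early-termination contract of hsa_iterate_agents can be modeled as follows. The agent list, toy_iterate_agents, find_agent_1, and the TOY_STATUS_INFO_BREAK value are hypothetical; the point is only that any non-success status returned by the callback stops the traversal and is propagated to the caller.

```c
#include <stddef.h>

typedef enum { TOY_STATUS_SUCCESS = 0, TOY_STATUS_INFO_BREAK = 1 } toy_status_t;
typedef struct { int handle; } toy_agent_t;  /* opaque-handle stand-in */

/* Hypothetical fixed agent list standing in for discovered topology. */
static const toy_agent_t g_agents[] = { {0}, {1}, {2} };

/* Invoke callback once per agent; stop early and return the callback's
 * status as soon as it is not success. */
toy_status_t toy_iterate_agents(toy_status_t (*callback)(toy_agent_t agent, void *data),
                                void *data) {
    for (size_t i = 0; i < sizeof g_agents / sizeof g_agents[0]; ++i) {
        toy_status_t s = callback(g_agents[i], data);
        if (s != TOY_STATUS_SUCCESS) return s;
    }
    return TOY_STATUS_SUCCESS;
}

/* Example callback: count visited agents, stop once agent 1 is found. */
toy_status_t find_agent_1(toy_agent_t agent, void *data) {
    int *visited = data;
    *visited += 1;
    return agent.handle == 1 ? TOY_STATUS_INFO_BREAK : TOY_STATUS_SUCCESS;
}
```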
AGENT INFORMATION MANIPULATION (2)
 Get the current value of an attribute for a given agent (hsa_agent_get_info)
 Parameters
 agent (input): A valid agent
 attribute (input): Attribute to query
 value (output): Pointer to a user-allocated buffer in which to store the value of the attribute. If the buffer passed by the application is not large enough to hold the value of the attribute, the behavior is undefined.
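The caller-owned output buffer can be illustrated with a toy query over a small subset of the attributes listed earlier. toy_agent_t, toy_agent_info_t, and toy_agent_get_info are hypothetical; note that the callee cannot see the buffer's size, which is why an undersized buffer leads to undefined behavior.

```c
#include <stdint.h>
#include <string.h>

typedef enum { TOY_STATUS_SUCCESS = 0, TOY_STATUS_ERROR_INVALID_ARGUMENT = 1 } toy_status_t;

/* Hypothetical subset of agent attributes and their value types. */
typedef enum {
    TOY_AGENT_INFO_NAME,             /* char[16] */
    TOY_AGENT_INFO_WAVEFRONT_SIZE,   /* uint32_t */
    TOY_AGENT_INFO_QUEUE_MAX_PACKETS /* uint32_t */
} toy_agent_info_t;

typedef struct {
    char name[16];
    uint32_t wavefront_size;
    uint32_t queue_max_packets;
} toy_agent_t;

/* Copy the requested attribute into the caller's buffer. */
toy_status_t toy_agent_get_info(const toy_agent_t *agent,
                                toy_agent_info_t attribute, void *value) {
    switch (attribute) {
    case TOY_AGENT_INFO_NAME:
        memcpy(value, agent->name, sizeof agent->name);
        break;
    case TOY_AGENT_INFO_WAVEFRONT_SIZE:
        memcpy(value, &agent->wavefront_size, sizeof agent->wavefront_size);
        break;
    case TOY_AGENT_INFO_QUEUE_MAX_PACKETS:
        memcpy(value, &agent->queue_max_packets, sizeof agent->queue_max_packets);
        break;
    default:
        return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    }
    return TOY_STATUS_SUCCESS;
}
```

For example, querying TOY_AGENT_INFO_WAVEFRONT_SIZE on a GPU agent like Agent-1 above requires a uint32_t buffer and yields 64.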
EXAMPLE - AGENT ATTRIBUTE QUERY
[Code figure: get the agent handle of Agent 0, then copy the agent attribute information.]
SIGNALS AND SYNCHRONIZATION (MEMORY-BASED)
OUTLINE
 Signal
 Signal manipulation API
 Create/Destroy
 Query
 Send
 Atomic Operations
 Signal wait
 Get time out
 Signal Condition
 Example
SIGNAL (1)
 HSA agents can communicate with each other by using coherent global memory,
or by using signals.
 A signal is represented by an opaque signal handle.
 A signal carries a value, which can be updated or conditionally waited upon via
an API call or HSAIL instruction.
 The value occupies four or eight bytes depending on the machine model in use.
SIGNAL (2)
 Updating the value of a signal is equivalent to sending the signal.
 In addition to the update (store) of signals, the API for sending signals must support other atomic operations with specific memory-order semantics:
 Atomic operations: AND, OR, XOR, Add, Subtract, Exchange, and CAS
 Memory-order semantics: Release and Relaxed
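One way to picture these send operations is to map each onto a C11 atomic with the corresponding memory order. This is a sketch under assumptions: toy_signal_t is a hypothetical 8-byte (large machine model) signal, and the function names only imitate the hsa_signal_* naming pattern.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical toy signal holding an 8-byte value. */
typedef struct { _Atomic int64_t value; } toy_signal_t;

/* Sending a signal means atomically updating its value. */
void toy_signal_store_release(toy_signal_t *s, int64_t v) {
    atomic_store_explicit(&s->value, v, memory_order_release);
}
void toy_signal_add_relaxed(toy_signal_t *s, int64_t v) {
    atomic_fetch_add_explicit(&s->value, v, memory_order_relaxed);
}
void toy_signal_subtract_release(toy_signal_t *s, int64_t v) {
    atomic_fetch_sub_explicit(&s->value, v, memory_order_release);
}
void toy_signal_and_release(toy_signal_t *s, int64_t v) {
    atomic_fetch_and_explicit(&s->value, v, memory_order_release);
}
/* Exchange returns the previous value. */
int64_t toy_signal_exchange_relaxed(toy_signal_t *s, int64_t v) {
    return atomic_exchange_explicit(&s->value, v, memory_order_relaxed);
}
/* CAS returns the value observed before the operation; the swap
 * happened iff that value equals expected. */
int64_t toy_signal_cas_release(toy_signal_t *s, int64_t expected, int64_t v) {
    atomic_compare_exchange_strong_explicit(&s->value, &expected, v,
                                            memory_order_release,
                                            memory_order_relaxed);
    return expected;
}
/* Receivers read with acquire semantics. */
int64_t toy_signal_load_acquire(toy_signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}
```

OR and XOR follow the same pattern with atomic_fetch_or_explicit and atomic_fetch_xor_explicit. A release send pairs with an acquire load (or acquire wait) on the receiving side, so data written before the send is visible once the new value is observed.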
SIGNAL CREATE/DESTROY
 Create a signal
 Parameters
 initial_value (input): Initial value of the
signal.
 signal_handle (output): Signal handle.
 Destroy a signal previously created by hsa_signal_create
 Parameter
 signal_handle (input): Signal handle.
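A toy version of signal creation and destruction, assuming a signal is simply heap storage behind an opaque 64-bit handle. toy_signal_create, toy_signal_destroy, toy_signal_peek, and the status values are hypothetical; a real runtime may place signals in special memory visible to other agents.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum {
    TOY_STATUS_SUCCESS = 0,
    TOY_STATUS_ERROR_INVALID_ARGUMENT = 1,
    TOY_STATUS_ERROR_OUT_OF_RESOURCES = 2
} toy_status_t;

/* Opaque handle: callers never see what it points to. */
typedef struct { uint64_t handle; } toy_signal_t;

toy_status_t toy_signal_create(int64_t initial_value, toy_signal_t *signal_handle) {
    if (signal_handle == NULL) return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    int64_t *storage = malloc(sizeof *storage);
    if (storage == NULL) return TOY_STATUS_ERROR_OUT_OF_RESOURCES;
    *storage = initial_value;
    signal_handle->handle = (uint64_t)(uintptr_t)storage;
    return TOY_STATUS_SUCCESS;
}

toy_status_t toy_signal_destroy(toy_signal_t signal_handle) {
    if (signal_handle.handle == 0) return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    free((void *)(uintptr_t)signal_handle.handle);
    return TOY_STATUS_SUCCESS;
}

/* Inspection helper for the toy only. */
int64_t toy_signal_peek(toy_signal_t signal_handle) {
    return *(int64_t *)(uintptr_t)signal_handle.handle;
}
```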
SIGNAL LOAD/STORE
 Atomically read the current signal value with acquire semantics
 Atomically read the current signal value with relaxed semantics
 Send and atomically set the value of a signal with release semantics
 Send and atomically set the value of a signal with relaxed semantics
SIGNAL ADD/SUBTRACT
 Send and atomically increment the value of a signal by a given amount with release semantics
 Send and atomically increment the value of a signal by a given amount with relaxed semantics
 Send and atomically decrement the value of a signal by a given amount with release semantics
 Send and atomically decrement the value of a signal by a given amount with relaxed semantics
SIGNAL AND (OR, XOR)/EXCHANGE
 Send and atomically perform a bitwise AND operation on the value of a signal and a given value with release semantics
 Send and atomically perform a bitwise AND operation on the value of a signal and a given value with relaxed semantics
 Send and atomically set the value of a signal and return its previous value with release semantics
 Send and atomically set the value of a signal and return its previous value with relaxed semantics
SIGNAL WAIT (1)
 The application may wait on a signal, with a condition specifying the terms of
wait.
 Signal wait condition operator
 Values
 HSA_EQ: The two operands are equal.
 HSA_NE: The two operands are not equal.
 HSA_LT: The first operand is less than the second operand.
 HSA_GTE: The first operand is greater than or equal to the second operand.
SIGNAL WAIT (2)
 The wait can be done either in the HSA component via an HSAIL wait instruction
or via a runtime API defined here.
 Waiting on a signal returns the current value of the opaque signal object.
 The wait may have a runtime-defined timeout, which indicates the maximum amount of time that an implementation can spend waiting.
 The signal infrastructure allows multiple senders and waiters on a single signal.
 Because a wait reads the signal value, acquire semantics may be applied.
SIGNAL WAIT (3)
 Signal wait
 Parameters
 signal_handle (input): A signal handle
 condition (input): Condition used to compare the passed and signal values
 compare_value (input): Value to compare with
 return_value (output): Pointer to the location into which the current signal value is read
SIGNAL WAIT (4)
 Signal wait with timeout
 Parameters
 signal_handle (input): A signal handle
 timeout (input): Maximum wait duration (A value of zero indicates no maximum)
 long_wait (input): Hint indicating that the signal value is not expected to meet the given condition in
a short period of time. The HSA runtime may use this hint to optimize the wait implementation.
 condition (input): Condition used to compare the passed and signal values
 compare_value (input): Value to compare with
 return_value (output): Pointer to the location into which the current signal value is read
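The timed wait can be sketched as a spin loop that re-reads the value with acquire semantics until the condition holds or the timeout expires. Assumptions: the toy condition names mirror HSA_EQ, HSA_NE, HSA_LT, and HSA_GTE; the timeout is in seconds (the runtime's unit may differ); zero means no maximum, as stated above; and long_wait is accepted but ignored, although a real runtime could use it to sleep instead of spinning. Replaying the thread example (add 3, subtract 1, wait for value == 2) against this function returns immediately with 2.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

typedef enum { TOY_EQ, TOY_NE, TOY_LT, TOY_GTE } toy_condition_t;
typedef struct { _Atomic int64_t value; } toy_signal_t;

/* Evaluate "signal_value <condition> compare_value". */
static bool toy_condition_holds(toy_condition_t c, int64_t v, int64_t cmp) {
    switch (c) {
    case TOY_EQ:  return v == cmp;
    case TOY_NE:  return v != cmp;
    case TOY_LT:  return v < cmp;
    case TOY_GTE: return v >= cmp;
    }
    return false;
}

static double toy_now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Returns true once the condition holds, false on timeout; either way the
 * last observed value is stored in *return_value. */
bool toy_signal_wait_timeout_acquire(toy_signal_t *s, double timeout_seconds,
                                     bool long_wait, toy_condition_t condition,
                                     int64_t compare_value, int64_t *return_value) {
    (void)long_wait;  /* hint ignored in this busy-wait sketch */
    double start = toy_now_seconds();
    for (;;) {
        int64_t v = atomic_load_explicit(&s->value, memory_order_acquire);
        if (toy_condition_holds(condition, v, compare_value)) {
            *return_value = v;
            return true;
        }
        if (timeout_seconds > 0 && toy_now_seconds() - start >= timeout_seconds) {
            *return_value = v;
            return false;
        }
    }
}
```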
EXAMPLE – SIGNAL WAIT (1)
[Figure: timeline of two threads sharing a signal whose value starts at 0. thread_1 calls hsa_signal_wait_timeout_acquire (value == 2) and is blocked. thread_2 calls hsa_signal_add_relaxed (value = value + 3), so value = 3, and then hsa_signal_subtract_relaxed (value = value - 1), so value = 2. The condition is satisfied, the signal value is returned, and the execution of thread_1 continues.]
EXAMPLE – SIGNAL WAIT (2)
[Code figure annotations for the signal wait implementation: if signal_handle is invalid, return the invalid-signal status; the signal wait condition function compares tmp->value with compare_value to check whether the condition is satisfied; if the condition is satisfied, return the signal value and status; if timeout = 0, return the signal timeout status.]
QUEUES AND ARCHITECTED DISPATCH
OUTLINE
 Queues
 Queue Types and Structure
 HSA runtime API for Queue Manipulations
 Architected Queuing Language (AQL) Support
 Packet type
 Packet header
 Examples
 Enqueue Packet
 Packet Processor
INTRODUCTION (1)
 An HSA-compliant platform supports the allocation of multiple user-level command queues.
 A user-level command queue is runtime-allocated, user-level accessible virtual memory of a certain size that contains packets defined in the Architected Queuing Language (AQL packets).
 Queues are allocated by HSA applications through the HSA runtime.
 HSA software receives memory-based structures that configure the hardware queues, allowing efficient software management of the hardware queues of the HSA agents.
 This queue memory shall be processed by the HSA packet processor as a ring buffer.
 Queue structures are read-only.
 Writing values directly to a queue structure results in undefined behavior.
 However, HSA agents can directly modify the contents of the buffer pointed to by base_address, or use runtime APIs to access the doorbell signal or the service queue.
INTRODUCTION (2)
 Two queue types are supported: AQL queues and service queues.
 An AQL queue consumes AQL packets that specify the kernel functions to be executed on the HSA component.
 A service queue consumes agent dispatch packets that specify runtime-defined or user-registered functions to be executed on the agent (typically the host CPU).
INTRODUCTION (3)
 AQL queue structure
INTRODUCTION (4)
 In addition to the data held in the queue structure, the queue also defines two
properties (readIndex and writeIndex) that define the location of “head” and “tail”
of the queue.
 readIndex: The read index is a 64-bit unsigned integer that specifies the packetID of
the next AQL packet to be consumed by the packet processor.
 writeIndex: The write index is a 64-bit unsigned integer that specifies the packetID of
the next AQL packet slot to be allocated.
 Neither index is directly exposed to the user; both can be accessed only through dedicated HSA core runtime APIs.
 The available index functions differ in the index of interest (read or write), the action to be performed (addition, compare-and-swap, etc.), and the memory-ordering semantics (relaxed, release, etc.).
INTRODUCTION (5)
 The read index is automatically advanced when a packet is read by the packet
processor.
 When the packet processor observes that:
 the read index matches the write index, the queue can be considered empty;
 the write index is greater than or equal to the sum of the read index and the size of the queue, the queue is full.
 The doorbell_signal field of a queue contains a signal that the agent uses to inform the packet processor that there are new packets to process.
 The value written to the doorbell signal is the ID of the packet that is ready to be launched.
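The empty and full conditions, together with the mapping from a monotonically growing packet ID to a ring-buffer slot, can be written directly. toy_queue_indices_t exposes the indices as plain fields purely for illustration; in HSA they are reachable only through the runtime index APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a queue's indices. */
typedef struct {
    uint64_t read_index;   /* packetID of the next packet to consume */
    uint64_t write_index;  /* packetID of the next slot to allocate */
    uint32_t size;         /* number of packet slots in the ring */
} toy_queue_indices_t;

bool toy_queue_is_empty(const toy_queue_indices_t *q) {
    return q->read_index == q->write_index;
}

bool toy_queue_is_full(const toy_queue_indices_t *q) {
    return q->write_index >= q->read_index + q->size;
}

/* Packet IDs only grow; the slot index wraps around the ring buffer. */
uint32_t toy_queue_slot(const toy_queue_indices_t *q, uint64_t packet_id) {
    return (uint32_t)(packet_id % q->size);
}
```

For a 4-slot queue with read index 5 and write index 9, the queue is full, and packet 9 would reuse slot 1 once packet 5 is consumed.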
INTRODUCTION (6)
 The new task might be consumed by the packet processor even before the
doorbell signal has been signaled by the agent.
 This is because the packet processor might be already processing some other
packets and observes that there is new work available, so it processes the new
packets.
 In any case, the agent must ring the doorbell for every batch of packets it writes.
QUEUE CREATE/DESTROY
 Create a user mode queue
 When a queue is created, the runtime also
allocates the packet buffer and the completion
signal.
 The application should rely only on the returned status code to determine whether the queue is valid.
 Destroy a user mode queue
 A queue must not be accessed after it has been destroyed.
 When a queue is destroyed, the state of AQL packets that have not yet been fully processed becomes undefined.
GET READ/WRITE INDEX
 Atomically retrieve read index of a queue with
acquire semantics
 Atomically retrieve write index of a queue with
acquire semantics
 Atomically retrieve read index of a queue with
relaxed semantics
 Atomically retrieve write index of a queue with
relaxed semantics
SET READ/WRITE INDEX
 Atomically set the read index of a queue with
release semantics
 Atomically set the read index of a queue with
relaxed semantics
 Atomically set the write index of a queue with
release semantics
 Atomically set the write index of a queue with
relaxed semantics
COMPARE AND SWAP WRITE INDEX
 Atomically compare and set the write index of a
queue with acquire/release/relaxed/acquire-
release semantics
 Parameters
 queue (input): A queue
 expected (input): The expected index value
 val (input): Value to copy to the write index if expected
matches the observed write index
 Return value
 Previous value of the write index
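Compare-and-swap on the write index lets an agent reserve a slot only when the queue is not full, which a plain add cannot guarantee. The following sketch models this with C11 atomics; toy_queue_t, toy_queue_cas_write_index_relaxed, and toy_queue_try_reserve are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    _Atomic uint64_t write_index;
    _Atomic uint64_t read_index;
    uint64_t size;  /* number of packet slots */
} toy_queue_t;

/* Returns the previous write index; the swap happened iff it equals expected. */
uint64_t toy_queue_cas_write_index_relaxed(toy_queue_t *q, uint64_t expected, uint64_t val) {
    atomic_compare_exchange_strong_explicit(&q->write_index, &expected, val,
                                            memory_order_relaxed, memory_order_relaxed);
    return expected;
}

/* Reserve one slot unless the queue is full, retrying when another
 * producer wins the race. Returns false when the queue is full. */
bool toy_queue_try_reserve(toy_queue_t *q, uint64_t *packet_id) {
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    for (;;) {
        uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
        if (w >= r + q->size) return false;  /* full */
        uint64_t prev = toy_queue_cas_write_index_relaxed(q, w, w + 1);
        if (prev == w) {
            *packet_id = w;
            return true;
        }
        w = prev;  /* lost the race: retry with the observed value */
    }
}
```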
ADD WRITE INDEX
 Atomically increment the write index of a
queue by an offset with
release/acquire/relaxed/acquire-release
semantics
 Parameters
 queue (input): A queue
 val (input): The value to add to the write index
 Return value
 Previous value of the write index
ARCHITECTED QUEUING LANGUAGE (AQL)
 An HSA-compliant system provides a command interface for the dispatch of
HSA agent commands.
 This command interface is provided by the Architected Queuing Language (AQL).
 AQL allows HSA agents to build and enqueue their own command packets,
enabling fast and low-power dispatch.
 AQL also supports queue submissions from HSA components.
 An HSA component kernel can write commands in AQL format.
AQL PACKET (1)
 AQL packet format
 Values
 Always reserved packet (0): Packet format is set to always reserved when the queue is initialized.
 Invalid packet (1): Packet format is set to invalid when the readIndex is incremented, making the
packet slot available to the HSA agents.
 Dispatch packet (2): Dispatch packets contain jobs for the HSA component and are created by
HSA agents.
 Barrier packet (3): Barrier packets can be inserted by HSA agents to delay processing subsequent
packets. All queues support barrier packets.
 Agent dispatch packet (4): Agent dispatch packets contain jobs for the HSA agent and are created by HSA agents.
AQL PACKET (2)
[Figure: AQL packet fields, including the HSA signaling object handle used to indicate completion of the job.]
EXAMPLE - ENQUEUE AQL PACKET (1)
 An HSA agent submits a task to a queue by performing the following steps:
 Allocate a packet slot (by incrementing the writeIndex)
 Initialize the packet and copy it to a queue associated with the packet processor
 Mark packet as valid
 Notify the Packet Processor of the packet (With doorbell signal)
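The four steps map onto code roughly as follows. This is a hypothetical toy (toy_queue_t, toy_packet_t, toy_enqueue_dispatch, a 4-slot ring, one illustrative payload field) that assumes a free slot is available and does not model the real AQL packet layout. The key ordering point is that the format is stored last with release semantics, so the packet processor never observes a valid format with a half-written packet body.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Format values follow the numbering listed above. */
enum { TOY_PKT_ALWAYS_RESERVED = 0, TOY_PKT_INVALID = 1, TOY_PKT_DISPATCH = 2 };

typedef struct {
    _Atomic uint16_t format;  /* written last to publish the packet */
    uint32_t grid_size;       /* illustrative payload field */
} toy_packet_t;

#define TOY_QUEUE_SIZE 4
typedef struct {
    toy_packet_t packets[TOY_QUEUE_SIZE];  /* ring buffer (base_address) */
    _Atomic uint64_t write_index;
    _Atomic int64_t doorbell;              /* doorbell signal value */
} toy_queue_t;

/* Enqueue one dispatch packet; assumes the queue is not full. */
uint64_t toy_enqueue_dispatch(toy_queue_t *q, uint32_t grid_size) {
    /* 1. Allocate a packet slot by incrementing the write index. */
    uint64_t id = atomic_fetch_add_explicit(&q->write_index, 1, memory_order_relaxed);
    toy_packet_t *pkt = &q->packets[id % TOY_QUEUE_SIZE];
    /* 2. Initialize the packet body before it is marked valid. */
    pkt->grid_size = grid_size;
    /* 3. Mark it valid; release ordering publishes the body first. */
    atomic_store_explicit(&pkt->format, TOY_PKT_DISPATCH, memory_order_release);
    /* 4. Ring the doorbell with the ID of the ready packet. */
    atomic_store_explicit(&q->doorbell, (int64_t)id, memory_order_release);
    return id;
}
```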
EXAMPLE - ENQUEUE AQL PACKET (2)
[Code figure: a dispatch queue with WriteIndex and ReadIndex. Allocate an AQL packet slot; initialize the packet; copy the packet into the queue (a lock can be used here to prevent race conditions in a multithreaded environment); send the doorbell signal.]
EXAMPLE - PACKET PROCESSOR
[Code figure: a dispatch queue with WriteIndex and ReadIndex. On receiving the doorbell, if there is any packet in the queue, process it: get the packet content, check whether it is a barrier packet, then update readIndex, change the packet state to invalid, and send the completion signal.]
MEMORY MANAGEMENT
OUTLINE
 Memory registration and deregistration
 Memory region and memory segment
 APIs for memory region manipulation
 APIs for memory registration and deregistration
INTRODUCTION
 One of the key features of HSA is its ability to share global pointers between the
host application and code executing on the HSA component.
 This ability means that an application can directly pass a pointer to memory allocated on the host
to a kernel function dispatched to a component without an intermediate copy.
 When a buffer created in the host is also accessed by a component,
programmers are encouraged to register the corresponding address range
beforehand.
 Registering memory expresses an intention to access (read or write) the passed buffer from a
component other than the host. This is a performance hint that allows the runtime implementation
to know which buffers will be accessed by some of the components ahead of time.
 When an HSA program no longer needs to access a registered buffer from a component, the user should deregister that virtual address range.
MEMORY REGION/SEGMENT
 A memory region represents a virtual memory interval that is visible to a particular agent,
and contains properties about how memory is accessed or allocated from that agent.
 Memory segments
 Values
 HSA_SEGMENT_GLOBAL = 1
 HSA_SEGMENT_PRIVATE = 2
 HSA_SEGMENT_GROUP = 4
 HSA_SEGMENT_KERNARG = 8
 HSA_SEGMENT_READONLY = 16
 HSA_SEGMENT_IMAGE = 32
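Because each segment value above is a distinct power of two, a set of supported segments fits in one bit mask; read as binary, the supported_segment_type_mask of 111111 in the runtime-instance example would cover all six. The TOY_SEGMENT_* names and toy_segments_supported are hypothetical copies of the values listed on this slide, not a real header.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    TOY_SEGMENT_GLOBAL   = 1,
    TOY_SEGMENT_PRIVATE  = 2,
    TOY_SEGMENT_GROUP    = 4,
    TOY_SEGMENT_KERNARG  = 8,
    TOY_SEGMENT_READONLY = 16,
    TOY_SEGMENT_IMAGE    = 32
} toy_segment_t;

/* True if every segment bit in `wanted` is present in `mask`. */
bool toy_segments_supported(uint32_t mask, uint32_t wanted) {
    return (mask & wanted) == wanted;
}
```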
MEMORY REGION INFORMATION
 Attributes of a memory region
 Values
 HSA_REGION_INFO_BASE_ADDRESS
 HSA_REGION_INFO_SIZE
 HSA_REGION_INFO_NODE
 HSA_REGION_INFO_MAX_ALLOCATION_SIZE
 HSA_REGION_INFO_SEGMENT
 HSA_REGION_INFO_BANDWIDTH
 HSA_REGION_INFO_CACHED
MEMORY REGION MANIPULATION (1)
 Get the current value of an attribute of a region
 Iterate over the memory regions that are visible to an agent, and invoke an
application-defined callback on every iteration
 If callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the
traversal stops and the function returns that status value.
MEMORY REGION MANIPULATION (2)
 Allocate a block of memory
 Deallocate a block of memory previously allocated
using hsa_memory_allocate
 Copy a block of memory
 Copying a number of bytes larger than the size of the
memory regions pointed by dst or src results in
undefined behavior.
MEMORY REGISTRATION/DEREGISTRATION
 Register memory
 Parameters
 address (input): A pointer to the base of
the memory region to be registered. If a
NULL pointer is passed, no operation is
performed.
 size (input): Requested registration size
in bytes. A size of zero is only allowed if
address is NULL.
 Deregister memory previously registered
using hsa_memory_register
 Parameter
 address (input): A pointer to the base of the memory region to be deregistered. If a NULL pointer is passed, no operation is performed.
EXAMPLE
[Code figure: allocate a memory space; use hsa_region_get_info to get the size in bytes of this memory space; register this memory space as a performance hint; when finished, deregister and free this memory space.]
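The example's lifecycle (allocate, register, use, deregister, free) is sketched below with malloc standing in for hsa_memory_allocate. toy_memory_register, toy_memory_deregister, toy_example, and the single-entry registry are hypothetical; since registration is only a performance hint, the toy merely records the range and enforces the NULL and zero-size rules stated above.

```c
#include <stddef.h>
#include <stdlib.h>

typedef enum { TOY_STATUS_SUCCESS = 0, TOY_STATUS_ERROR_INVALID_ARGUMENT = 1 } toy_status_t;

/* Hypothetical single-entry registry of the registered range. */
static void *g_reg_addr = NULL;
static size_t g_reg_size = 0;

toy_status_t toy_memory_register(void *address, size_t size) {
    if (address == NULL) return TOY_STATUS_SUCCESS;           /* no-op */
    if (size == 0) return TOY_STATUS_ERROR_INVALID_ARGUMENT;  /* zero only with NULL */
    g_reg_addr = address;
    g_reg_size = size;
    return TOY_STATUS_SUCCESS;
}

toy_status_t toy_memory_deregister(void *address) {
    if (address == NULL) return TOY_STATUS_SUCCESS;           /* no-op */
    if (address != g_reg_addr) return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    g_reg_addr = NULL;
    g_reg_size = 0;
    return TOY_STATUS_SUCCESS;
}

/* Allocate, register, use, deregister, free. Returns 0 on success. */
int toy_example(size_t size) {
    char *buf = malloc(size);
    if (buf == NULL) return -1;
    if (toy_memory_register(buf, size) != TOY_STATUS_SUCCESS) {
        free(buf);
        return -1;
    }
    buf[0] = 42;  /* ... kernels dispatched here could read/write buf ... */
    int ok = (g_reg_addr == buf && g_reg_size == size);
    toy_memory_deregister(buf);
    free(buf);
    return ok ? 0 : -1;
}
```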
SUMMARY
 Covered
 HSA Core Runtime API (Pre-release 1.0 provisional)
 Runtime Initialization and Shutdown (Open/Close)
 Notifications (Synchronous/Asynchronous)
 Agent Information
 Signals and Synchronization (Memory-Based)
 Queues and Architected Dispatch
 Memory Management
 Not covered
 Extension of Core Runtime
 HSAIL Finalization, Linking, and Debugging
 Images and Samplers
QUESTIONS?

 
Intel and Amazon - Powering your innovation together.
Intel and Amazon - Powering your innovation together. Intel and Amazon - Powering your innovation together.
Intel and Amazon - Powering your innovation together.
 
Microsoft Really Loves Linux – a Virtual Love Story
Microsoft Really Loves Linux – a Virtual Love StoryMicrosoft Really Loves Linux – a Virtual Love Story
Microsoft Really Loves Linux – a Virtual Love Story
 
Valgrind overview: runtime memory checker and a bit more aka использование #v...
Valgrind overview: runtime memory checker and a bit more aka использование #v...Valgrind overview: runtime memory checker and a bit more aka использование #v...
Valgrind overview: runtime memory checker and a bit more aka использование #v...
 
OpenContrail, Real Speed: Offloading vRouter
OpenContrail, Real Speed: Offloading vRouterOpenContrail, Real Speed: Offloading vRouter
OpenContrail, Real Speed: Offloading vRouter
 
Using GPUs to Achieve Massive Parallelism in Java 8
Using GPUs to Achieve Massive Parallelism in Java 8Using GPUs to Achieve Massive Parallelism in Java 8
Using GPUs to Achieve Massive Parallelism in Java 8
 
Markus Tessmann, InnoGames
Markus Tessmann, InnoGames	Markus Tessmann, InnoGames
Markus Tessmann, InnoGames
 
Java garbage collection & GC friendly coding
Java garbage collection  & GC friendly codingJava garbage collection  & GC friendly coding
Java garbage collection & GC friendly coding
 
Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containersCilium - BPF & XDP for containers
Cilium - BPF & XDP for containers
 
20170329 container technight-第一回勉強会
20170329 container technight-第一回勉強会20170329 container technight-第一回勉強会
20170329 container technight-第一回勉強会
 
Serverless Application - Who the heck needs a Server?
Serverless Application - Who the heck needs a Server?Serverless Application - Who the heck needs a Server?
Serverless Application - Who the heck needs a Server?
 
Serverless Architecture
Serverless ArchitectureServerless Architecture
Serverless Architecture
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
 
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC TechnologiesAccelerating Hadoop, Spark, and Memcached with HPC Technologies
Accelerating Hadoop, Spark, and Memcached with HPC Technologies
 
BPF: Next Generation of Programmable Datapath
BPF: Next Generation of Programmable DatapathBPF: Next Generation of Programmable Datapath
BPF: Next Generation of Programmable Datapath
 
Introduction to OpenCV 3.x (with Java)
Introduction to OpenCV 3.x (with Java)Introduction to OpenCV 3.x (with Java)
Introduction to OpenCV 3.x (with Java)
 
Google ART (Android RunTime)
Google ART (Android RunTime)Google ART (Android RunTime)
Google ART (Android RunTime)
 
SOC Processors Used in SOC
SOC Processors Used in SOCSOC Processors Used in SOC
SOC Processors Used in SOC
 
BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In PythonBigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In Python
 
A Year of Innovation Using the DGX-1 AI Supercomputer
A Year of Innovation Using the DGX-1 AI SupercomputerA Year of Innovation Using the DGX-1 AI Supercomputer
A Year of Innovation Using the DGX-1 AI Supercomputer
 

Ähnlich wie ISCA final presentation - Runtime

Load Balancer Component Architecture - Apache Stratos 4.0.0
Load Balancer Component Architecture - Apache Stratos 4.0.0Load Balancer Component Architecture - Apache Stratos 4.0.0
Load Balancer Component Architecture - Apache Stratos 4.0.0
Imesh Gunaratne
 
Create Home Directories on Storage Using WFA and ServiceNow integration
Create Home Directories on Storage Using WFA and ServiceNow integrationCreate Home Directories on Storage Using WFA and ServiceNow integration
Create Home Directories on Storage Using WFA and ServiceNow integration
Rutul Shah
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
Jayesh Thakrar
 

Ähnlich wie ISCA final presentation - Runtime (20)

Why is my Hadoop* job slow?
Why is my Hadoop* job slow?Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian Bratt
 
Design Summit - RESTful API Overview - John Hardy
Design Summit - RESTful API Overview - John HardyDesign Summit - RESTful API Overview - John Hardy
Design Summit - RESTful API Overview - John Hardy
 
Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?
 
Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?Why is my Hadoop cluster slow?
Why is my Hadoop cluster slow?
 
Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?Why is My Hadoop Job Slow?
Why is My Hadoop Job Slow?
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
 
Securing Your Web Server
Securing Your Web ServerSecuring Your Web Server
Securing Your Web Server
 
Airavata_Architecture_xsede16
Airavata_Architecture_xsede16Airavata_Architecture_xsede16
Airavata_Architecture_xsede16
 
Load Balancer Component Architecture - Apache Stratos 4.0.0
Load Balancer Component Architecture - Apache Stratos 4.0.0Load Balancer Component Architecture - Apache Stratos 4.0.0
Load Balancer Component Architecture - Apache Stratos 4.0.0
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New Features
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
 
Passenger 6 generic language support presentation
Passenger 6 generic language support presentationPassenger 6 generic language support presentation
Passenger 6 generic language support presentation
 
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated SystemsPetapath HP Cast 12 - Programming for High Performance Accelerated Systems
Petapath HP Cast 12 - Programming for High Performance Accelerated Systems
 
An Introduction to Websphere sMash for PHP Programmers
An Introduction to Websphere sMash for PHP ProgrammersAn Introduction to Websphere sMash for PHP Programmers
An Introduction to Websphere sMash for PHP Programmers
 
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
 
Create Home Directories on Storage Using WFA and ServiceNow integration
Create Home Directories on Storage Using WFA and ServiceNow integrationCreate Home Directories on Storage Using WFA and ServiceNow integration
Create Home Directories on Storage Using WFA and ServiceNow integration
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot Architectures
 
Apache ppt
Apache pptApache ppt
Apache ppt
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 

Mehr von HSA Foundation

ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - Compilations
HSA Foundation
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
HSA Foundation
 

Mehr von HSA Foundation (13)

Hsa Runtime version 1.00 Provisional
Hsa Runtime version  1.00  ProvisionalHsa Runtime version  1.00  Provisional
Hsa Runtime version 1.00 Provisional
 
Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)Hsa programmers reference manual (version 1.0 provisional)
Hsa programmers reference manual (version 1.0 provisional)
 
ISCA Final Presentaiton - Compilations
ISCA Final Presentaiton -  CompilationsISCA Final Presentaiton -  Compilations
ISCA Final Presentaiton - Compilations
 
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed Hsa Platform System Architecture Specification Provisional  verl 1.0 ratifed
Hsa Platform System Architecture Specification Provisional verl 1.0 ratifed
 
Apu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshareApu13 cp lu-keynote-final-slideshare
Apu13 cp lu-keynote-final-slideshare
 
HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer HSA Foundation BoF -Siggraph 2013 Flyer
HSA Foundation BoF -Siggraph 2013 Flyer
 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, C...
 
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
ARM Techcon Keynote 2012: Sensor Integration and Improved User Experiences at...
 
Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012Phil Rogers IFA Keynote 2012
Phil Rogers IFA Keynote 2012
 
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA  by Ben Sanders, AMDBolt C++ Standard Template Libary for HSA  by Ben Sanders, AMD
Bolt C++ Standard Template Libary for HSA by Ben Sanders, AMD
 
Hsa2012 logo guidelines.
Hsa2012 logo guidelines.Hsa2012 logo guidelines.
Hsa2012 logo guidelines.
 
What Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSAWhat Fabric Engine Can Do With HSA
What Fabric Engine Can Do With HSA
 
Fabric Engine: Why HSA is Invaluable
Fabric Engine: Why HSA is  InvaluableFabric Engine: Why HSA is  Invaluable
Fabric Engine: Why HSA is Invaluable
 

Kürzlich hochgeladen

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

ISCA final presentation - Runtime

  • 1. HSA RUNTIME YEN-CHING CHUNG, NATIONAL TSING HUA UNIVERSITY
  • 2. OUTLINE
    - Introduction
    - HSA Core Runtime API (Pre-release 1.0 provisional)
      - Initialization and Shut Down
      - Notifications (Synchronous/Asynchronous)
      - Agent Information
      - Signals and Synchronization (Memory-Based)
      - Queues and Architected Dispatch
    - Summary
    © Copyright 2014 HSA Foundation. All Rights Reserved
  • 3. INTRODUCTION (1)
    - The HSA core runtime is a thin, user-mode API that provides the interface the host needs to launch compute kernels on the available HSA components.
    - The overall goal of the HSA core runtime design is to provide a high-performance dispatch mechanism that is portable across multiple HSA vendor architectures.
    - What distinguishes the HSA runtime from other language runtimes is its dispatch mechanism: argument setting and kernel launching are architected at the hardware and specification level.
    - The HSA core runtime API is standard across all HSA vendors, so languages that use the HSA runtime can run on any vendor's platform that supports the API.
    - An implementation of the HSA runtime may include kernel-level components (required for some hardware, e.g., AMD Kaveri) or may run entirely in user space (e.g., simulators or CPU implementations).
  • 4. INTRODUCTION (2)
    [Figure: the software architecture stack without and with the HSA runtime. Layers: programming model (OpenCL, Java, OpenMP, and DSL apps), language runtime (OpenCL, Java, OpenMP, and DSL runtimes), then the vendor layer. Without the HSA runtime, each of vendors 1…m supplies its own driver for components 1…N; with it, each HSA vendor supplies an HSA runtime and an HSA finalizer for its components 1…N.]
  • 5. INTRODUCTION (3)
    [Figure: program flow from Start Program to Exit Program under the OpenCL runtime compared with the HSA runtime and agent. OpenCL runtime steps: platform, device, and context initialization; build kernel; SVM allocation and kernel argument setting; command queue; resource deallocation. HSA runtime steps: HSA runtime initialization and topology discovery; HSAIL finalization and linking; HSA memory allocation; enqueue dispatch packet; HSA runtime close.]
  • 6. INTRODUCTION (4)
    - HSA Platform System Architecture Specification support
      - Runtime initialization and shutdown
      - Notifications (synchronous/asynchronous)
      - Agent information
      - Signals and synchronization (memory-based)
      - Queues and architected dispatch
      - Memory management
    - HSAIL support
      - Finalization, linking, and debugging
      - Image and sampler support
  • 8. OUTLINE
    - Runtime initialization API: hsa_init
    - Runtime shut down API: hsa_shut_down
    - Examples
  • 9. HSA RUNTIME INITIALIZATION
    - When hsa_init is invoked for the first time in a given process, a runtime instance is created.
    - A typical runtime instance may contain information about the platform and topology, a reference count, queues, signals, etc.
    - Applications may call the API multiple times:
      - Only a single runtime instance exists for a given process.
      - Each time the API is invoked, the reference count is increased by one.
  • 10. HSA RUNTIME SHUT DOWN
    - Each time hsa_shut_down is invoked, the reference count is decreased by one.
    - When the reference count drops below one:
      - All resources associated with the runtime instance (queues, signals, topology information, etc.) are considered invalid, and any attempt to reference them in subsequent API calls results in undefined behavior.
      - The user may call hsa_init to initialize the HSA runtime again.
      - The HSA runtime may release the resources associated with the instance.
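The reference-counting rules on slides 9 and 10 can be sketched as a process-wide singleton. This is a toy model under stated assumptions, not the HSA runtime itself: `toy_init`, `toy_shut_down`, `toy_ref_count`, and `toy_runtime_t` are hypothetical stand-ins for `hsa_init`, `hsa_shut_down`, and the runtime instance; the "resources" are just a heap-allocated agent table; and returning an error from an unmatched shut-down call is an assumption the slides do not specify. The sketch is not thread-safe; a real implementation would need to guard the counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes: success is zero, errors are positive. */
typedef enum {
    TOY_STATUS_SUCCESS = 0,
    TOY_STATUS_ERROR_NOT_INITIALIZED = 1,
    TOY_STATUS_ERROR_OUT_OF_RESOURCES = 2
} toy_status_t;

/* Hypothetical per-process runtime instance. */
typedef struct {
    int ref_count;  /* toy_init calls not yet matched by toy_shut_down */
    int num_agents; /* stands in for the topology information */
    int *agent_ids; /* stands in for queues, signals, tables, ... */
} toy_runtime_t;

static toy_runtime_t g_runtime = {0, 0, NULL};

toy_status_t toy_init(void) {
    if (g_runtime.ref_count > 0) {
        /* The instance already exists: only the reference count changes. */
        g_runtime.ref_count++;
        return TOY_STATUS_SUCCESS;
    }
    /* First call in this process: discover agents and allocate resources. */
    int n = 2; /* assumed topology: one CPU agent and one GPU agent */
    int *ids = malloc(sizeof *ids * (size_t)n);
    if (ids == NULL) {
        return TOY_STATUS_ERROR_OUT_OF_RESOURCES; /* nothing to release yet */
    }
    for (int i = 0; i < n; i++) {
        ids[i] = i;
    }
    g_runtime.num_agents = n;
    g_runtime.agent_ids = ids;
    g_runtime.ref_count = 1;
    return TOY_STATUS_SUCCESS;
}

toy_status_t toy_shut_down(void) {
    if (g_runtime.ref_count < 1) {
        return TOY_STATUS_ERROR_NOT_INITIALIZED; /* assumed behavior */
    }
    g_runtime.ref_count--;
    assert(g_runtime.ref_count >= 0);
    if (g_runtime.ref_count < 1) {
        /* Last reference released: every handle from this instance is now invalid. */
        free(g_runtime.agent_ids);
        g_runtime.agent_ids = NULL;
        g_runtime.num_agents = 0;
    }
    return TOY_STATUS_SUCCESS;
}

int toy_ref_count(void) { return g_runtime.ref_count; }
```

Calling `toy_init` twice and `toy_shut_down` once leaves the instance alive with a reference count of 1; only the second `toy_shut_down` frees the table, after which `toy_init` builds a fresh instance.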
  • 11. EXAMPLE – RUNTIME INITIALIZATION (1)
    [Code figure, callouts: data structure for the runtime instance; if hsa_init is called more than once, increase ref_count by 1.]
  • 12. EXAMPLE – RUNTIME INITIALIZATION (2)
    [Code figure, callouts: when hsa_init is called for the first time, allocate resources and set the reference count; get the number of HSA agents; initialize the agents; create an empty agent list; if initialization fails, release the resources; create the topology table.]
  • 13. EXAMPLE - RUNTIME INSTANCE (1)
    - Platform name: Generic (Agent: 2, Memory: 1, Cache: 1)
    - Agent-0: node_id 0, id 0, type CPU, vendor Generic, name Generic, wavefront_size 0, queue_size 200, group_memory 0, fbarrier_max_count 1, is_pic_supported 0
    - Agent-1: node_id 0, id 0, type GPU, vendor Generic, name Generic, wavefront_size 64, queue_size 200, group_memory 64, fbarrier_max_count 1, is_pic_supported 1
    - Memory: node_id 0, id 0, segment_type 111111, address_base 0x0001, size 2048 MB, peak_bandwidth 6553.6 Mbps
    - Cache: node_id 0, id 0, levels 1, associativity 1, cache size 64 KB, cache line size 4, is_inclusive 1
  • 14. EXAMPLE - RUNTIME INSTANCE (2)
    [Figure: in-memory layout of the same runtime instance.]
    - Platform header: base_address = 0x00001, size = 248, system_timestamp_frequency_mhz = 200, signal_maximum_wait = 1/200; node_id list (no_nodes = 1), agent_list (no_agent = 2), memory_descriptor_list (no_memory_descriptor = 1), cache_descriptor_list (no_cache_descriptor = 1)
    - Agent-0: node_id = 0, id = 0, agent_type = 1 (CPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 0, queue_size = 200, group_memory_size_bytes = 0, fbarrier_max_count = 1, is_pic_supported = 0
    - Agent-1: node_id = 0, id = 0, agent_type = 2 (GPU), vendor[16] = Generic, name[16] = Generic, wavefront_size = 64, queue_size = 200, group_memory_size_bytes = 64, fbarrier_max_count = 1, is_pic_supported = 1
    - Memory: node_id = 0, id = 0, supported_segment_type_mask = 111111, virtual_address_base = 0x0001, size_in_bytes = 2048MB, peak_bandwidth_mbps = 6553.6
    - Cache: node_id = 0, id = 0, levels = 1; associativity, cache_size, cache_line_size, and is_inclusive are pointers to NULL-terminated per-level arrays (values 1, 64KB, 4, and 1)
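The runtime instance on slides 13 and 14 can be modeled as plain C structs populated with the example values. The type and field names below are hypothetical (they follow the slide labels, not a published header), the segment mask `111111` is read as a 6-bit mask (`0x3F`), and the per-level cache arrays of slide 14 are collapsed into a single level for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TOY_AGENT_CPU = 1, TOY_AGENT_GPU = 2 } toy_agent_type_t;

/* Hypothetical agent descriptor mirroring the fields on the slide. */
typedef struct {
    uint32_t node_id, id;
    toy_agent_type_t agent_type;
    char vendor[16], name[16];
    uint32_t wavefront_size, queue_size, group_memory_size_bytes;
    uint32_t fbarrier_max_count;
    int is_pic_supported;
} toy_agent_desc_t;

typedef struct {
    uint32_t node_id, id;
    uint32_t supported_segment_type_mask; /* assumed: one bit per segment */
    uint64_t virtual_address_base;
    uint64_t size_in_bytes;
    double peak_bandwidth_mbps;
} toy_memory_desc_t;

/* Simplified to a single cache level instead of per-level arrays. */
typedef struct {
    uint32_t node_id, id, levels;
    uint32_t associativity, cache_size_bytes, cache_line_size;
    int is_inclusive;
} toy_cache_desc_t;

typedef struct {
    const char *name;
    uint32_t system_timestamp_frequency_mhz;
    size_t num_agents, num_memory, num_caches;
    const toy_agent_desc_t *agents;
    const toy_memory_desc_t *memory;
    const toy_cache_desc_t *caches;
} toy_platform_t;

static const toy_agent_desc_t k_agents[] = {
    {.node_id = 0, .id = 0, .agent_type = TOY_AGENT_CPU, .vendor = "Generic",
     .name = "Generic", .wavefront_size = 0, .queue_size = 200,
     .group_memory_size_bytes = 0, .fbarrier_max_count = 1, .is_pic_supported = 0},
    {.node_id = 0, .id = 0, .agent_type = TOY_AGENT_GPU, .vendor = "Generic",
     .name = "Generic", .wavefront_size = 64, .queue_size = 200,
     .group_memory_size_bytes = 64, .fbarrier_max_count = 1, .is_pic_supported = 1},
};

static const toy_memory_desc_t k_memory[] = {
    {.node_id = 0, .id = 0, .supported_segment_type_mask = 0x3F,
     .virtual_address_base = 0x0001, .size_in_bytes = 2048ull * 1024 * 1024,
     .peak_bandwidth_mbps = 6553.6},
};

static const toy_cache_desc_t k_caches[] = {
    {.node_id = 0, .id = 0, .levels = 1, .associativity = 1,
     .cache_size_bytes = 64 * 1024, .cache_line_size = 4, .is_inclusive = 1},
};

const toy_platform_t toy_example_platform = {
    .name = "Generic", .system_timestamp_frequency_mhz = 200,
    .num_agents = 2, .num_memory = 1, .num_caches = 1,
    .agents = k_agents, .memory = k_memory, .caches = k_caches,
};

/* Counts the agents of a given type in the platform's agent list. */
size_t toy_count_agents(const toy_platform_t *p, toy_agent_type_t type) {
    assert(p != NULL);
    size_t n = 0;
    for (size_t i = 0; i < p->num_agents; i++) {
        if (p->agents[i].agent_type == type) n++;
    }
    return n;
}
```

Keeping counts next to each descriptor list, as the platform header on slide 14 does, lets topology queries walk the lists without any sentinel values.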
  • 15. EXAMPLE – RUNTIME SHUT DOWN
    [Code figure, callout: if ref_count < 1, free the list; otherwise, decrease ref_count by 1.]
  • 17. OUTLINE
    - Synchronous notifications
      - hsa_status_t
      - hsa_status_string
    - Asynchronous notifications
    - Example
  • 18. SYNCHRONOUS NOTIFICATIONS
    - Notifications (errors, events, etc.) reported by the runtime can be synchronous or asynchronous.
    - The HSA runtime uses the return values of API functions to pass notifications synchronously.
      - A status code is defined as an enumeration, hsa_status_t, to capture the return value of any API function that has been executed, except accessors/mutators.
      - The notification is a status code that indicates success or an error:
        - Success is represented by HSA_STATUS_SUCCESS, which is equivalent to zero.
        - An error status is assigned a positive integer, and its identifier starts with the HSA_STATUS_ERROR prefix.
      - The status code can help determine the cause of an unsuccessful execution.
  • 19. STATUS CODE QUERY
    - hsa_status_string: query additional information about a status code
    - Parameters
      - status (input): Status code that the user is seeking more information on
      - status_string (output): An ISO/IEC 646 encoded English-language string that potentially describes the error status
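A minimal sketch of the synchronous convention from slides 18 and 19: success is zero, errors are positive, and a query function maps a status code to an ASCII (ISO/IEC 646) string through an output parameter. `toy_status_t`, `toy_status_string`, `toy_status_is_error`, the specific error codes, and the message texts are all hypothetical; the real `hsa_status_t` values are defined by the HSA runtime specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes: zero means success, positive values are errors. */
typedef enum {
    TOY_STATUS_SUCCESS = 0,
    TOY_STATUS_ERROR_INVALID_ARGUMENT = 1,
    TOY_STATUS_ERROR_OUT_OF_RESOURCES = 2,
    TOY_STATUS_ERROR_NOT_INITIALIZED = 3
} toy_status_t;

/* Stores a static, ASCII-only description of `status` in *status_string. */
toy_status_t toy_status_string(toy_status_t status, const char **status_string) {
    if (status_string == NULL) {
        return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    }
    switch (status) {
    case TOY_STATUS_SUCCESS:
        *status_string = "The function executed successfully.";
        break;
    case TOY_STATUS_ERROR_INVALID_ARGUMENT:
        *status_string = "One of the arguments is invalid.";
        break;
    case TOY_STATUS_ERROR_OUT_OF_RESOURCES:
        *status_string = "The runtime failed to allocate the required resources.";
        break;
    case TOY_STATUS_ERROR_NOT_INITIALIZED:
        *status_string = "The runtime has not been initialized.";
        break;
    default:
        return TOY_STATUS_ERROR_INVALID_ARGUMENT; /* unknown status code */
    }
    return TOY_STATUS_SUCCESS;
}

/* Errors are the positive codes; success is exactly zero. */
int toy_status_is_error(toy_status_t status) {
    return status > TOY_STATUS_SUCCESS;
}
```

Returning the string through an output parameter keeps the function's own return value free to report a status, so the query itself follows the same convention as every other call.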
  • 20. ASYNCHRONOUS NOTIFICATIONS
    - The runtime passes asynchronous notifications by calling user-defined callbacks.
      - For instance, queues are a common source of asynchronous events, because the tasks queued by an application are consumed asynchronously by the packet processor. Callbacks are associated with queues when the queues are created. When the runtime detects an error in a queue, it invokes the callback associated with that queue and passes it an error flag (indicating what happened) and a pointer to the erroneous queue.
    - The HSA runtime does not implement any default callbacks.
    - If a callback uses blocking functions and consequently does not return, the runtime state can become undefined.
  • 21. EXAMPLE - CALLBACK
    [Code figure, callouts: pass the callback function when creating the queue; if the queue is empty, set the event and invoke the callback.]
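The callback mechanism of slides 20 and 21 can be sketched as a queue that stores an application callback at creation time and invokes it with an error code and a pointer to the failing queue. Everything here is hypothetical (`toy_queue_create`, `toy_process_packet`, `app_queue_error_callback`), and the packet processor is simulated synchronously on the calling thread, whereas a real one consumes packets asynchronously. Because the runtime provides no default callback, a queue created without one reports nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum {
    TOY_STATUS_SUCCESS = 0,
    TOY_STATUS_ERROR_INVALID_PACKET = 1
} toy_status_t;

typedef struct toy_queue toy_queue_t;

/* Application-defined callback: receives the error and the erroneous queue. */
typedef void (*toy_queue_callback_t)(toy_status_t error, toy_queue_t *queue);

struct toy_queue {
    unsigned id;
    toy_queue_callback_t callback; /* fixed at creation; NULL means no callback */
};

toy_queue_t *toy_queue_create(unsigned id, toy_queue_callback_t callback) {
    toy_queue_t *q = malloc(sizeof *q);
    if (q != NULL) {
        q->id = id;
        q->callback = callback;
    }
    return q;
}

void toy_queue_destroy(toy_queue_t *q) { free(q); }

/* Simulated packet processor: on an invalid packet, notify the application.
   There is no default callback, so without one the error goes unreported. */
void toy_process_packet(toy_queue_t *q, int packet_is_valid) {
    if (!packet_is_valid && q->callback != NULL) {
        q->callback(TOY_STATUS_ERROR_INVALID_PACKET, q);
    }
}

/* Example application callback: records the error and returns promptly. */
static int g_errors_seen = 0;
static unsigned g_last_queue_id = 0;

void app_queue_error_callback(toy_status_t error, toy_queue_t *queue) {
    if (error != TOY_STATUS_SUCCESS) {
        g_errors_seen++;
        g_last_queue_id = queue->id;
    }
}
```

The example callback only records state and returns, which respects the slide's warning that a callback which never returns can leave the runtime undefined. The slide 21 example uses the same registration path for a queue-empty event instead of an error.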
  • 23. OUTLINE
    - Agent information
      - hsa_node_t
      - hsa_agent_t
      - hsa_agent_info_t
      - hsa_component_feature_t
    - Agent information manipulation APIs
      - hsa_iterate_agents
      - hsa_agent_get_info
    - Example
  • 24. INTRODUCTION
    - The runtime exposes a list of the agents that are available in the system.
      - An HSA agent is a hardware component that participates in the HSA memory model.
      - An HSA agent can submit AQL packets for execution.
      - An HSA agent may, but is not required to, be an HSA component. A system can include HSA agents that are neither HSA components nor host CPUs.
    - HSA agents are represented by opaque handles of type hsa_agent_t.
    - The HSA runtime provides APIs for applications to traverse the list of available agents and to query the attributes of a particular agent.
  • 25. AGENT INFORMATION (1)
    - hsa_agent_t: opaque agent handle
    - hsa_node_t: opaque NUMA node handle
      - An HSA memory node delineates a set of system components (host CPUs and HSA components) with “local” access to a set of memory resources attached to the node's memory controller, with the appropriate HSA-compliant access attributes.
  • 26. AGENT INFORMATION (2)
    - Component features (hsa_component_feature_t)
      - An HSA component is a hardware or software component that can be a target of AQL queries and conforms to the HSA memory model.
    - Values
      - HSA_COMPONENT_FEATURE_NONE = 0: No component capabilities. The device is an agent, but not a component.
      - HSA_COMPONENT_FEATURE_BASIC = 1: The component supports the HSAIL instruction set and all AQL packet types except agent dispatch.
      - HSA_COMPONENT_FEATURE_ALL = 2: The component supports the HSAIL instruction set and all AQL packet types.
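The three feature levels on slide 26 can be read as a rule about which AQL packet types a component supports. The sketch below encodes the slide's definitions literally; `toy_component_feature_t`, `toy_packet_type_t`, and the three packet kinds are illustrative assumptions rather than the runtime's actual enumerations.

```c
#include <assert.h>

typedef enum {
    TOY_COMPONENT_FEATURE_NONE = 0,  /* an agent, but not a component */
    TOY_COMPONENT_FEATURE_BASIC = 1, /* HSAIL + all AQL packets except agent dispatch */
    TOY_COMPONENT_FEATURE_ALL = 2    /* HSAIL + all AQL packet types */
} toy_component_feature_t;

/* Illustrative subset of AQL packet kinds. */
typedef enum {
    TOY_PACKET_DISPATCH,
    TOY_PACKET_BARRIER,
    TOY_PACKET_AGENT_DISPATCH
} toy_packet_type_t;

int toy_is_component(toy_component_feature_t f) {
    return f != TOY_COMPONENT_FEATURE_NONE;
}

/* Returns 1 if a component with feature level `f` supports packet type `p`. */
int toy_accepts_packet(toy_component_feature_t f, toy_packet_type_t p) {
    switch (f) {
    case TOY_COMPONENT_FEATURE_ALL:
        return 1;
    case TOY_COMPONENT_FEATURE_BASIC:
        return p != TOY_PACKET_AGENT_DISPATCH;
    case TOY_COMPONENT_FEATURE_NONE:
    default:
        return 0; /* no component capabilities */
    }
}
```

Querying this attribute before building a packet lets an application fall back to a plain dispatch when a BASIC component cannot accept an agent dispatch.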
  • 27. AGENT INFORMATION (3)
    - Agent attributes (hsa_agent_info_t)
    - Values
      - HSA_AGENT_INFO_MAX_GRID_DIM
      - HSA_AGENT_INFO_MAX_WORKGROUP_DIM
      - HSA_AGENT_INFO_QUEUE_MAX_PACKETS
      - HSA_AGENT_INFO_CLOCK
      - HSA_AGENT_INFO_CLOCK_FREQUENCY
      - HSA_AGENT_INFO_MAX_SIGNAL_WAIT
      - HSA_AGENT_INFO_NAME
      - HSA_AGENT_INFO_NODE
      - HSA_AGENT_INFO_COMPONENT_FEATURES
      - HSA_AGENT_INFO_VENDOR_NAME
      - HSA_AGENT_INFO_WAVEFRONT_SIZE
      - HSA_AGENT_INFO_CACHE_SIZE
  • 28. AGENT INFORMATION MANIPULATION (1)
    - hsa_iterate_agents: iterate over the available agents, invoking an application-defined callback on every iteration
      - If the callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and the function returns that status value.
    - Parameters
      - callback (input): Callback to be invoked once per agent
      - data (input): Application data that is passed to the callback on every iteration. Can be NULL.
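The early-termination rule of `hsa_iterate_agents` can be sketched as follows. `toy_iterate_agents`, the fixed three-agent table, the handle values, and the `TOY_STATUS_STOP` code are hypothetical; the point is that any non-success status returned by the callback ends the traversal and becomes the function's return value, while `data` carries application state into each call.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    TOY_STATUS_SUCCESS = 0,
    TOY_STATUS_STOP = 1 /* hypothetical non-success code used to end a traversal */
} toy_status_t;

typedef struct { unsigned long handle; } toy_agent_t; /* opaque to applications */

typedef toy_status_t (*toy_agent_callback_t)(toy_agent_t agent, void *data);

static const toy_agent_t k_agents[] = {{100}, {101}, {102}};
static const size_t k_num_agents = sizeof k_agents / sizeof k_agents[0];

/* Invokes callback once per agent; stops at the first non-success status. */
toy_status_t toy_iterate_agents(toy_agent_callback_t callback, void *data) {
    for (size_t i = 0; i < k_num_agents; i++) {
        toy_status_t s = callback(k_agents[i], data);
        if (s != TOY_STATUS_SUCCESS) {
            return s; /* traversal ends; status propagates to the caller */
        }
    }
    return TOY_STATUS_SUCCESS;
}

/* Example callback: counts the agents into *(size_t *)data. */
toy_status_t count_agents(toy_agent_t agent, void *data) {
    (void)agent;
    ++*(size_t *)data;
    return TOY_STATUS_SUCCESS;
}

/* Example callback: stops as soon as the wanted handle is seen. */
typedef struct {
    unsigned long wanted;
    int visited;
} find_ctx_t;

toy_status_t find_agent(toy_agent_t agent, void *data) {
    find_ctx_t *ctx = data;
    ctx->visited++;
    return agent.handle == ctx->wanted ? TOY_STATUS_STOP : TOY_STATUS_SUCCESS;
}
```

A search callback like `find_agent` visits only as many agents as needed: looking for handle 101 stops after the second agent and returns the callback's own status.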
  • 29. AGENT INFORMATION MANIPULATION (2)
    - hsa_agent_get_info: get the current value of an attribute for a given agent
    - Parameters
      - agent (input): A valid agent
      - attribute (input): Attribute to query
      - value (output): Pointer to a user-allocated buffer in which to store the value of the attribute. If the buffer passed by the application is not large enough to hold the value of the attribute, the behavior is undefined.
  • 30. EXAMPLE - AGENT ATTRIBUTE QUERY
    [Code figure, callouts: get the agent handle of Agent 0; copy the agent attribute information.]
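A sketch of an attribute query in the style of `hsa_agent_get_info`: the caller supplies a buffer sized for the attribute, and the function copies the value into it. The names, the two agent records (values taken from the runtime instance example), and the use of an array index as the "opaque" handle are assumptions made for illustration. As on slide 29, nothing here checks that the buffer is large enough; passing a too-small buffer is undefined behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum {
    TOY_STATUS_SUCCESS = 0,
    TOY_STATUS_ERROR_INVALID_ARGUMENT = 1
} toy_status_t;

/* Hypothetical attributes; the comment gives the buffer type each one needs. */
typedef enum {
    TOY_AGENT_INFO_NAME,              /* char[16] */
    TOY_AGENT_INFO_WAVEFRONT_SIZE,    /* uint32_t */
    TOY_AGENT_INFO_QUEUE_MAX_PACKETS  /* uint32_t */
} toy_agent_info_t;

typedef struct { uint64_t handle; } toy_agent_t;

typedef struct {
    char name[16];
    uint32_t wavefront_size;
    uint32_t queue_max_packets;
} toy_agent_record_t;

static const toy_agent_record_t k_records[] = {
    {"Generic CPU", 0, 200},
    {"Generic GPU", 64, 200},
};

/* Copies the requested attribute into the caller-allocated buffer `value`. */
toy_status_t toy_agent_get_info(toy_agent_t agent, toy_agent_info_t attribute, void *value) {
    if (value == NULL || agent.handle >= sizeof k_records / sizeof k_records[0]) {
        return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    }
    const toy_agent_record_t *r = &k_records[agent.handle];
    switch (attribute) {
    case TOY_AGENT_INFO_NAME:
        memcpy(value, r->name, sizeof r->name);
        break;
    case TOY_AGENT_INFO_WAVEFRONT_SIZE:
        memcpy(value, &r->wavefront_size, sizeof r->wavefront_size);
        break;
    case TOY_AGENT_INFO_QUEUE_MAX_PACKETS:
        memcpy(value, &r->queue_max_packets, sizeof r->queue_max_packets);
        break;
    default:
        return TOY_STATUS_ERROR_INVALID_ARGUMENT;
    }
    return TOY_STATUS_SUCCESS;
}
```

Using one function with an attribute selector, rather than one accessor per field, lets the attribute list grow without changing the API surface; the cost is that the caller must know each attribute's type and size.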
  • 32. OUTLINE
    - Signal
    - Signal manipulation API
      - Create/destroy
      - Query
      - Send
      - Atomic operations
      - Signal wait
      - Get timeout
    - Signal condition
    - Example
  • 33. SIGNAL (1)
    - HSA agents can communicate with each other by using coherent global memory or by using signals.
    - A signal is represented by an opaque signal handle.
    - A signal carries a value, which can be updated or conditionally waited upon via an API call or an HSAIL instruction.
    - The value occupies four or eight bytes, depending on the machine model in use.
  • 34. SIGNAL (2)
    - Updating the value of a signal is equivalent to sending the signal.
    - In addition to updating (storing) signal values, the API for sending signals must support other atomic operations with specific memory-order semantics:
      - Atomic operations: AND, OR, XOR, add, subtract, exchange, and CAS
      - Memory-order semantics: release and relaxed
  • 35. SIGNAL CREATE/DESTROY
    - Create a signal
      - Parameters
        - initial_value (input): Initial value of the signal.
        - signal_handle (output): Signal handle.
    - Destroy a signal previously created by hsa_signal_create
      - Parameter
        - signal_handle (input): Signal handle.
  • 36. SIGNAL LOAD/STORE
    - Atomically read the current signal value with acquire semantics
    - Atomically read the current signal value with relaxed semantics
    - Send and atomically set the value of a signal with release semantics
    - Send and atomically set the value of a signal with relaxed semantics
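The acquire, release and relaxed variants map naturally onto C11 atomics. The sketch below assumes a hypothetical `signal_t` that holds a 64-bit value, as in the large machine model. The function names are illustrative and are not the runtime's API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical signal object: a single 64-bit atomic value. */
typedef struct { _Atomic int64_t value; } signal_t;

void signal_init(signal_t *s, int64_t initial) {
    atomic_init(&s->value, initial);
}

/* Loads: acquire orders later reads after the load; relaxed gives atomicity only. */
int64_t signal_load_acquire(signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}
int64_t signal_load_relaxed(signal_t *s) {
    return atomic_load_explicit(&s->value, memory_order_relaxed);
}

/* Stores ("send"): release publishes earlier writes along with the new value. */
void signal_store_release(signal_t *s, int64_t v) {
    atomic_store_explicit(&s->value, v, memory_order_release);
}
void signal_store_relaxed(signal_t *s, int64_t v) {
    atomic_store_explicit(&s->value, v, memory_order_relaxed);
}
```

The release store pairs with the acquire load. Suppose a sender writes data and then performs a release store. A waiter whose acquire load observes the stored value is guaranteed to see that data. The relaxed variants guarantee only that the signal value itself is read or written atomically.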
  • 37. SIGNAL ADD/SUBTRACT
    - Send and atomically increment the value of a signal by a given amount with release semantics
    - Send and atomically decrement the value of a signal by a given amount with release semantics
    - Send and atomically increment the value of a signal by a given amount with relaxed semantics
    - Send and atomically decrement the value of a signal by a given amount with relaxed semantics
  • 38. SIGNAL AND (OR, XOR)/EXCHANGE
    - Send and atomically perform a bitwise AND operation on the value of a signal and a given value with release semantics
    - Send and atomically perform a bitwise AND operation on the value of a signal and a given value with relaxed semantics
    - Send and atomically set the value of a signal and return its previous value with release semantics
    - Send and atomically set the value of a signal and return its previous value with relaxed semantics
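The read-modify-write operations on slides 37 and 38 can be sketched the same way. `signal_t` and the function names are again hypothetical. Only a subset of the release/relaxed combinations is shown. The compare-and-swap helper assumes that CAS returns the value it observed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical signal object: a single 64-bit atomic value. */
typedef struct { _Atomic int64_t value; } signal_t;

void signal_add_release(signal_t *s, int64_t v) {
    atomic_fetch_add_explicit(&s->value, v, memory_order_release);
}
void signal_subtract_relaxed(signal_t *s, int64_t v) {
    atomic_fetch_sub_explicit(&s->value, v, memory_order_relaxed);
}
/* Bitwise AND / OR / XOR of the signal value with v. */
void signal_and_release(signal_t *s, int64_t v) {
    atomic_fetch_and_explicit(&s->value, v, memory_order_release);
}
void signal_or_relaxed(signal_t *s, int64_t v) {
    atomic_fetch_or_explicit(&s->value, v, memory_order_relaxed);
}
void signal_xor_release(signal_t *s, int64_t v) {
    atomic_fetch_xor_explicit(&s->value, v, memory_order_release);
}
/* Exchange: set the value and return the previous one. */
int64_t signal_exchange_release(signal_t *s, int64_t v) {
    return atomic_exchange_explicit(&s->value, v, memory_order_release);
}
/* CAS: store desired only if the current value equals expected.
 * Returns the value observed before the operation in either case. */
int64_t signal_cas_release(signal_t *s, int64_t expected, int64_t desired) {
    /* On failure, expected is overwritten with the observed value;
     * on success it already equals the observed value. */
    atomic_compare_exchange_strong_explicit(&s->value, &expected, desired,
                                            memory_order_release,
                                            memory_order_relaxed);
    return expected;
}
```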
  • 39. SIGNAL WAIT (1)
    - The application may wait on a signal, with a condition specifying the terms of the wait.
    - Signal wait condition operator
      - Values
        - HSA_EQ: The two operands are equal.
        - HSA_NE: The two operands are not equal.
        - HSA_LT: The first operand is less than the second operand.
        - HSA_GTE: The first operand is greater than or equal to the second operand.
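Evaluating a wait condition reduces to a single comparison. This sketch assumes that the signal value is the first operand and compare_value is the second. The enum and function names are illustrative and are not the runtime's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mirror of the four wait conditions listed above. */
typedef enum { COND_EQ, COND_NE, COND_LT, COND_GTE } wait_condition_t;

/* Assumption: signal_value is the first operand, compare_value the second. */
bool condition_satisfied(int64_t signal_value, wait_condition_t cond,
                         int64_t compare_value) {
    switch (cond) {
    case COND_EQ:  return signal_value == compare_value;
    case COND_NE:  return signal_value != compare_value;
    case COND_LT:  return signal_value <  compare_value;
    case COND_GTE: return signal_value >= compare_value;
    }
    return false;  /* unknown condition: never satisfied */
}
```

LT and GTE are complements of each other, as are EQ and NE, so a waiter can wait on either side of a threshold.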
  • 40. SIGNAL WAIT (2)
    - The wait can be performed either in the HSA component, via an HSAIL wait instruction, or via the runtime API defined here.
    - Waiting on a signal returns the current value of the opaque signal object.
    - The wait may have a runtime-defined timeout, which indicates the maximum amount of time that an implementation can spend waiting.
    - The signal infrastructure allows multiple senders and waiters on a single signal.
    - A wait reads the signal value, so acquire semantics may be applied.
  • 41. SIGNAL WAIT (3)
    - Signal wait
    - Parameters
      - signal_handle (input): A signal handle.
      - condition (input): Condition used to compare the signal value with compare_value.
      - compare_value (input): Value to compare with.
      - return_value (output): Pointer to which the current signal value is written.
  • 42. SIGNAL WAIT (4)
    - Signal wait with timeout
    - Parameters
      - signal_handle (input): A signal handle.
      - timeout (input): Maximum wait duration (a value of zero indicates no maximum).
      - long_wait (input): Hint indicating that the signal value is not expected to meet the given condition within a short period of time. The HSA runtime may use this hint to optimize the wait implementation.
      - condition (input): Condition used to compare the signal value with compare_value.
      - compare_value (input): Value to compare with.
      - return_value (output): Pointer to which the current signal value is written.
  • 43. EXAMPLE - SIGNAL WAIT (1)
    (Timeline figure for thread_1 and thread_2; the signal value starts at 0.)
    - thread_1 calls hsa_signal_wait_timeout_acquire (value == 2) and is blocked.
    - thread_2 calls hsa_signal_add_relaxed (value = value + 3): value = 3.
    - thread_2 calls hsa_signal_subtract_relaxed (value = value - 1): value = 2.
    - The condition is satisfied, the wait returns the signal value, and thread_1 continues execution.
  • 44. EXAMPLE - SIGNAL WAIT (2)
    (Code figure of a signal wait condition function; annotations, in order:)
    - If signal_handle is invalid, return the invalid-signal status.
    - Compare tmp->value with compare_value to see whether the condition is satisfied.
    - If the condition is satisfied, return the signal value and the status.
    - If timeout = 0, return the signal timeout status.
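The flowchart annotations above can be turned into a polling loop. In this single-threaded sketch, the hypothetical `max_polls` parameter counts polling attempts in place of a real time-based timeout. A value of 0 means no maximum, as on slide 42. A NULL pointer stands in for an invalid signal handle. All type and function names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { WAIT_OK, WAIT_INVALID_SIGNAL, WAIT_TIMEOUT } wait_status_t;
typedef enum { COND_EQ, COND_NE, COND_LT, COND_GTE } wait_condition_t;
typedef struct { _Atomic int64_t value; } signal_t;   /* hypothetical signal */

static bool satisfied(int64_t v, wait_condition_t c, int64_t cmp) {
    switch (c) {
    case COND_EQ:  return v == cmp;
    case COND_NE:  return v != cmp;
    case COND_LT:  return v <  cmp;
    case COND_GTE: return v >= cmp;
    }
    return false;
}

/* Poll the signal with acquire semantics until the condition holds or the
 * polling budget runs out. The last observed value goes to *return_value. */
wait_status_t signal_wait_acquire(signal_t *sig, wait_condition_t cond,
                                  int64_t compare_value, uint64_t max_polls,
                                  int64_t *return_value) {
    if (sig == NULL || return_value == NULL)
        return WAIT_INVALID_SIGNAL;          /* invalid handle check */
    int64_t v = 0;
    for (uint64_t polls = 0; max_polls == 0 || polls < max_polls; ++polls) {
        v = atomic_load_explicit(&sig->value, memory_order_acquire);
        if (satisfied(v, cond, compare_value)) {
            *return_value = v;               /* condition met */
            return WAIT_OK;
        }
        /* A real runtime would back off or sleep here rather than spin. */
    }
    *return_value = v;
    return WAIT_TIMEOUT;                     /* budget exhausted */
}
```

A real implementation would yield or sleep instead of spinning, especially when long_wait is set.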
  • 46. OUTLINE
    - Queues
      - Queue Types and Structure
      - HSA runtime API for Queue Manipulations
    - Architected Queuing Language (AQL) Support
      - Packet type
      - Packet header
    - Examples
      - Enqueue Packet
      - Packet Processor
  • 47. INTRODUCTION (1)
    - An HSA-compliant platform supports the allocation of multiple user-level command queues.
    - A user-level command queue is characterized as runtime-allocated, user-level-accessible virtual memory of a certain size, containing packets defined in the Architected Queuing Language (AQL packets).
    - Queues are allocated by HSA applications through the HSA runtime.
    - HSA software receives memory-based structures to configure the hardware queues, allowing efficient software management of the hardware queues of the HSA agents.
    - This queue memory is processed by the HSA packet processor as a ring buffer.
    - Queues are read-only data structures.
      - Writing values directly to a queue structure results in undefined behavior.
      - However, HSA agents can directly modify the contents of the buffer pointed to by base_address, or use runtime APIs to access the doorbell signal or the service queue.
  • 48. INTRODUCTION (2)
    - Two queue types are supported: AQL queues and service queues.
      - An AQL queue consumes AQL packets, which specify the kernel functions to be executed on the HSA component.
      - A service queue consumes agent dispatch packets, which specify runtime-defined or user-registered functions to be executed on the agent (typically the host CPU).
  • 49. INTRODUCTION (3)
    - AQL queue structure
  • 50. INTRODUCTION (4)
    - In addition to the data held in the queue structure, the queue defines two properties, readIndex and writeIndex, which give the locations of the "head" and "tail" of the queue.
      - readIndex: The read index is a 64-bit unsigned integer that specifies the packetID of the next AQL packet to be consumed by the packet processor.
      - writeIndex: The write index is a 64-bit unsigned integer that specifies the packetID of the next AQL packet slot to be allocated.
    - Neither index is directly exposed to the user; both can be accessed only through dedicated HSA core runtime APIs.
    - The available index functions differ in the index of interest (read or write), the action to be performed (addition, compare-and-swap, etc.), and the memory-ordering semantics (relaxed, release, etc.).
  • 51. INTRODUCTION (5)
    - The read index is automatically advanced when a packet is read by the packet processor.
    - When the packet processor observes that:
      - the read index matches the write index, the queue can be considered empty;
      - the write index is greater than or equal to the sum of the read index and the size of the queue, the queue is full.
    - The doorbell_signal field of a queue contains a signal that the agent uses to tell the packet processor to process the packets it has written.
    - The value written to the doorbell signal is the ID of the packet that is ready to be launched.
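The empty and full rules translate directly into index arithmetic. The sketch assumes that the queue size is a power of two, so a packet ID maps to a ring-buffer slot through a bit mask. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a 64-bit packet ID to its ring-buffer slot; size must be a power of two. */
static inline uint64_t slot_of(uint64_t packet_id, uint32_t size) {
    return packet_id & (uint64_t)(size - 1);
}

/* Empty: the packet processor has consumed everything that was allocated. */
static inline bool queue_empty(uint64_t read_index, uint64_t write_index) {
    return read_index == write_index;
}

/* Full: every slot between read_index and read_index + size is allocated. */
static inline bool queue_full(uint64_t read_index, uint64_t write_index,
                              uint32_t size) {
    return write_index >= read_index + size;
}
```

Both indices are 64-bit and only ever increase, so they are compared directly rather than modulo the queue size. Only the slot lookup wraps around.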
  • 52. INTRODUCTION (6)
    - The packet processor might consume a new packet even before the agent has rung the doorbell.
    - This can happen because the packet processor might already be processing other packets, observe that new work is available, and process the new packets as well.
    - In any case, the agent must ring the doorbell for every batch of packets it writes.
  • 53. QUEUE CREATE/DESTROY
    - Create a user-mode queue
      - When a queue is created, the runtime also allocates the packet buffer and the completion signal.
      - The application should rely only on the returned status code to determine whether the queue is valid.
    - Destroy a user-mode queue
      - A destroyed queue must not be accessed after being destroyed.
      - When a queue is destroyed, the state of the AQL packets that have not yet been fully processed becomes undefined.
  • 54. GET READ/WRITE INDEX
    - Atomically retrieve the read index of a queue with acquire semantics
    - Atomically retrieve the write index of a queue with acquire semantics
    - Atomically retrieve the read index of a queue with relaxed semantics
    - Atomically retrieve the write index of a queue with relaxed semantics
  • 55. SET READ/WRITE INDEX
    - Atomically set the read index of a queue with release semantics
    - Atomically set the read index of a queue with relaxed semantics
    - Atomically set the write index of a queue with release semantics
    - Atomically set the write index of a queue with relaxed semantics
  • 56. COMPARE AND SWAP WRITE INDEX
    - Atomically compare and set the write index of a queue, with acquire, release, relaxed, or acquire-release semantics
    - Parameters
      - queue (input): A queue.
      - expected (input): The expected index value.
      - val (input): Value to copy to the write index if expected matches the observed write index.
    - Return value
      - Previous value of the write index
  • 57. ADD WRITE INDEX
    - Atomically increment the write index of a queue by an offset, with release, acquire, relaxed, or acquire-release semantics
    - Parameters
      - queue (input): A queue.
      - val (input): The value to add to the write index.
    - Return value
      - Previous value of the write index
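The two write-index operations suggest two ways for multiple producers to reserve packet slots. The sketch below is a hypothetical model built on C11 atomics, not the runtime's implementation. `queue_indices_t`, `reserve_with_add` and `reserve_with_cas` are illustrative names, and the memory orders are one reasonable choice.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a queue's two indices and its slot count. */
typedef struct {
    _Atomic uint64_t read_index;
    _Atomic uint64_t write_index;
    uint32_t size;                  /* number of packet slots */
} queue_indices_t;

/* Add-based reservation: always succeeds and returns the previous write index.
 * The caller must still wait until id < read_index + size before writing. */
uint64_t reserve_with_add(queue_indices_t *q) {
    return atomic_fetch_add_explicit(&q->write_index, 1, memory_order_relaxed);
}

/* CAS-based reservation: advance write_index only while the queue is not full,
 * so a producer never claims a slot it cannot use yet. Returns false if full. */
bool reserve_with_cas(queue_indices_t *q, uint64_t *packet_id) {
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    for (;;) {
        uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
        if (w >= r + q->size)
            return false;                          /* queue full */
        if (atomic_compare_exchange_weak_explicit(&q->write_index, &w, w + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            *packet_id = w;                        /* previous write index */
            return true;
        }
        /* CAS failed: w now holds the observed write index; retry. */
    }
}
```

Adding to the write index always succeeds, so a producer may end up holding a slot it must then wait for. The compare-and-swap loop instead refuses to advance the index while the queue is full.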
  • 58. ARCHITECTED QUEUING LANGUAGE (AQL)
    - An HSA-compliant system provides a command interface for the dispatch of HSA agent commands.
    - This command interface is provided by the Architected Queuing Language (AQL).
    - AQL allows HSA agents to build and enqueue their own command packets, enabling fast, low-power dispatch.
    - AQL also supports HSA component queue submissions: an HSA component kernel can write commands in AQL format.
  • 59. AQL PACKET (1)
    - AQL packet format
      - Values
        - Always reserved packet (0): The packet format is set to always reserved when the queue is initialized.
        - Invalid packet (1): The packet format is set to invalid when the readIndex is incremented, making the packet slot available to the HSA agents.
        - Dispatch packet (2): Dispatch packets contain jobs for the HSA component and are created by HSA agents.
        - Barrier packet (3): Barrier packets can be inserted by HSA agents to delay the processing of subsequent packets. All queues support barrier packets.
        - Agent dispatch packet (4): Agent dispatch packets contain jobs for the HSA agent and are created by HSA agents.
  • 60. AQL PACKET (2)
    - (Packet structure figure) Completion signal: HSA signaling object handle used to indicate completion of the job
  • 61. EXAMPLE - ENQUEUE AQL PACKET (1)
    - An HSA agent submits a task to a queue by performing the following steps:
      1. Allocate a packet slot (by incrementing the writeIndex).
      2. Initialize the packet and copy it into a queue associated with the packet processor.
      3. Mark the packet as valid.
      4. Notify the packet processor of the packet (with the doorbell signal).
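The four enqueue steps can be modeled in plain C11. This is a simplified sketch under stated assumptions. The packet layout (`format`, `grid_size_x`, `kernel_object`), the 8-slot queue and the doorbell represented as a bare atomic integer are all hypothetical stand-ins. They do not reproduce the real AQL packet or the runtime's queue type.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Packet format values from slide 59. */
enum { PACKET_ALWAYS_RESERVED = 0, PACKET_INVALID = 1, PACKET_DISPATCH = 2,
       PACKET_BARRIER = 3, PACKET_AGENT_DISPATCH = 4 };

/* Hypothetical, heavily simplified packet. */
typedef struct {
    _Atomic uint16_t format;   /* written last, with release semantics */
    uint32_t grid_size_x;      /* illustrative payload field */
    uint64_t kernel_object;    /* illustrative payload field */
} packet_t;

#define QUEUE_SIZE 8u          /* power of two */

typedef struct {
    packet_t packets[QUEUE_SIZE];
    _Atomic uint64_t read_index;
    _Atomic uint64_t write_index;
    _Atomic int64_t doorbell;  /* stand-in for the doorbell signal */
} queue_t;

void queue_init(queue_t *q) {
    for (unsigned i = 0; i < QUEUE_SIZE; ++i)
        atomic_init(&q->packets[i].format, PACKET_ALWAYS_RESERVED);
    atomic_init(&q->read_index, 0);
    atomic_init(&q->write_index, 0);
    atomic_init(&q->doorbell, -1);
}

uint64_t enqueue_dispatch(queue_t *q, uint32_t grid_x, uint64_t kernel) {
    /* 1. Allocate a packet slot by incrementing the write index. */
    uint64_t id = atomic_fetch_add_explicit(&q->write_index, 1,
                                            memory_order_relaxed);
    /* Wait until the slot is free (queue not full for this packet ID). */
    while (id >= atomic_load_explicit(&q->read_index, memory_order_acquire)
                 + QUEUE_SIZE) {
        /* spin */
    }
    packet_t *p = &q->packets[id & (QUEUE_SIZE - 1)];
    /* 2. Initialize the packet body before it is marked valid. */
    p->grid_size_x = grid_x;
    p->kernel_object = kernel;
    /* 3. Mark the packet valid; the release store publishes the body. */
    atomic_store_explicit(&p->format, PACKET_DISPATCH, memory_order_release);
    /* 4. Ring the doorbell with the ID of the packet that is ready. */
    atomic_store_explicit(&q->doorbell, (int64_t)id, memory_order_release);
    return id;
}
```

Writing the format field last, with release semantics, is what makes step 3 safe. A packet processor that reads a valid format with acquire semantics is guaranteed to see the complete packet body.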
  • 62. EXAMPLE - ENQUEUE AQL PACKET (2)
    (Code figure showing the dispatch queue with its WriteIndex and ReadIndex; annotations, in order:)
    - Allocate an AQL packet slot.
    - Initialize the packet.
    - Copy the packet into the queue. A lock can be used here to prevent race conditions in a multithreaded environment.
    - Send the doorbell signal.
  • 63. EXAMPLE - PACKET PROCESSOR
    (Code figure showing the dispatch queue with its WriteIndex and ReadIndex; annotations, in order:)
    - Receive the doorbell.
    - If there is any packet in the queue, process the packet:
      - Get the packet content.
      - Check whether it is a barrier packet.
      - Update readIndex, change the packet state to invalid, and send the completion signal.
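The packet processor loop can be sketched in the same style. The packet layout and queue type are hypothetical, and barrier handling is reduced to a comment. The sketch also assumes that completion is reported by decrementing the completion signal by one.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { PACKET_ALWAYS_RESERVED = 0, PACKET_INVALID = 1, PACKET_DISPATCH = 2,
       PACKET_BARRIER = 3 };

/* Hypothetical, simplified packet and queue. */
typedef struct {
    _Atomic uint16_t format;
    uint32_t work;                        /* illustrative payload */
    _Atomic int64_t *completion_signal;   /* may be NULL */
} packet_t;

#define QUEUE_SIZE 4u                     /* power of two */

typedef struct {
    packet_t packets[QUEUE_SIZE];
    _Atomic uint64_t read_index;
    _Atomic uint64_t write_index;
} queue_t;

/* Drain every packet that is ready and return the total "work" executed. */
uint64_t process_packets(queue_t *q) {
    uint64_t done = 0;
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_relaxed);
    while (r < atomic_load_explicit(&q->write_index, memory_order_acquire)) {
        packet_t *p = &q->packets[r & (QUEUE_SIZE - 1)];
        uint16_t fmt = atomic_load_explicit(&p->format, memory_order_acquire);
        if (fmt != PACKET_DISPATCH && fmt != PACKET_BARRIER)
            break;   /* slot allocated but not yet marked valid */
        if (fmt == PACKET_DISPATCH)
            done += p->work;   /* stand-in for launching the kernel */
        /* A barrier packet would wait for its dependencies here. */
        if (p->completion_signal)
            atomic_fetch_sub_explicit(p->completion_signal, 1,
                                      memory_order_release);
        /* Free the slot, then advance the read index past it. */
        atomic_store_explicit(&p->format, PACKET_INVALID, memory_order_relaxed);
        atomic_store_explicit(&q->read_index, ++r, memory_order_release);
    }
    return done;
}
```

The loop stops at the first slot that has been allocated but not yet marked valid, even if the write index is further ahead. This is why the producer sets the format field last.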
  • 65. OUTLINE
    - Memory registration and deregistration
      - Memory region and memory segment
      - APIs for memory region manipulation
      - APIs for memory registration and deregistration
  • 66. INTRODUCTION
    - One of the key features of HSA is its ability to share global pointers between the host application and code executing on the HSA component.
    - This means an application can pass a pointer to memory allocated on the host directly to a kernel function dispatched to a component, without an intermediate copy.
    - When a buffer created on the host is also accessed by a component, programmers are encouraged to register the corresponding address range beforehand.
    - Registering memory expresses an intention to access (read or write) the buffer from a component other than the host. It is a performance hint that tells the runtime implementation ahead of time which buffers will be accessed by components.
    - When an HSA program no longer needs to access a registered buffer from a device, it should deregister that virtual address range.
  • 67. MEMORY REGION/SEGMENT
    - A memory region represents a virtual memory interval that is visible to a particular agent, and contains properties describing how memory is accessed or allocated from that agent.
    - Memory segments
      - Values
        - HSA_SEGMENT_GLOBAL = 1
        - HSA_SEGMENT_PRIVATE = 2
        - HSA_SEGMENT_GROUP = 4
        - HSA_SEGMENT_KERNARG = 8
        - HSA_SEGMENT_READONLY = 16
        - HSA_SEGMENT_IMAGE = 32
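The segment values are distinct powers of two, so they can be combined as bit flags. The slide does not say whether any runtime call reports such a combined mask. The `SEG_*` names and the mask below are therefore purely illustrative.

```c
#include <stdbool.h>

/* Illustrative copies of the segment values listed above. */
enum {
    SEG_GLOBAL = 1, SEG_PRIVATE = 2, SEG_GROUP = 4,
    SEG_KERNARG = 8, SEG_READONLY = 16, SEG_IMAGE = 32
};

/* True if the (hypothetical) mask includes the given segment bit. */
static bool has_segment(unsigned mask, unsigned segment) {
    return (mask & segment) != 0;
}
```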
  • 68. MEMORY REGION INFORMATION
    - Attributes of a memory region
      - Values
        - HSA_REGION_INFO_BASE_ADDRESS
        - HSA_REGION_INFO_SIZE
        - HSA_REGION_INFO_NODE
        - HSA_REGION_INFO_MAX_ALLOCATION_SIZE
        - HSA_REGION_INFO_SEGMENT
        - HSA_REGION_INFO_BANDWIDTH
        - HSA_REGION_INFO_CACHED
  • 69. MEMORY REGION MANIPULATION (1)
    - Get the current value of an attribute of a region
    - Iterate over the memory regions that are visible to an agent, and invoke an application-defined callback on every iteration
      - If the callback returns a status other than HSA_STATUS_SUCCESS for a particular iteration, the traversal stops and the function returns that status value.
  • 70. MEMORY REGION MANIPULATION (2)
    - Allocate a block of memory
    - Deallocate a block of memory previously allocated using hsa_memory_allocate
    - Copy a block of memory
      - Copying a number of bytes larger than the size of the memory regions pointed to by dst or src results in undefined behavior.
  • 71. MEMORY REGISTRATION/DEREGISTRATION
    - Register memory
      - Parameters
        - address (input): A pointer to the base of the memory region to be registered. If a NULL pointer is passed, no operation is performed.
        - size (input): Requested registration size in bytes. A size of zero is allowed only if address is NULL.
    - Deregister memory previously registered using hsa_memory_register
      - Parameter
        - address (input): A pointer to the base of the memory region to be deregistered. If a NULL pointer is passed, no operation is performed.
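The argument rules for registration can be captured in a small check. The status names are hypothetical. The slide does not say which status the runtime returns for a non-NULL address with a size of zero, so `REG_INVALID_ARGUMENT` is an assumption.

```c
#include <stddef.h>

/* Hypothetical outcomes of checking hsa_memory_register-style arguments. */
typedef enum { REG_OK, REG_NOOP, REG_INVALID_ARGUMENT } reg_status_t;

/* NULL address: no operation is performed, whatever the size.
 * Non-NULL address: a size of zero is not allowed. */
reg_status_t check_register_args(const void *address, size_t size) {
    if (address == NULL)
        return REG_NOOP;
    if (size == 0)
        return REG_INVALID_ARGUMENT;
    return REG_OK;
}
```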
  • 72. EXAMPLE
    (Code figure; annotations, in order:)
    - Allocate a memory space.
    - Use hsa_region_get_info to get the size, in bytes, of this memory space.
    - Register this memory space as a performance hint.
    - When finished, deregister and free this memory space.
  • 74. SUMMARY
    - Covered
      - HSA Core Runtime API (Pre-release 1.0 provisional)
        - Runtime Initialization and Shutdown (Open/Close)
        - Notifications (Synchronous/Asynchronous)
        - Agent Information
        - Signals and Synchronization (Memory-Based)
        - Queues and Architected Dispatch
        - Memory Management
    - Not covered
      - Extension of Core Runtime
      - HSAIL Finalization, Linking, and Debugging
      - Images and Samplers
  • 75. QUESTIONS?

Editor's Notes

  1. Queue type
     - HSA_QUEUE_TYPE_MULTI = 0: multiple producers are supported.
     - HSA_QUEUE_TYPE_SINGLE = 1: only a single producer is supported.
     Queue features
     - HSA_QUEUE_FEATURE_DISPATCH = 1: the queue supports dispatch packets.
     - HSA_QUEUE_FEATURE_AGENT_DISPATCH = 2: the queue supports agent dispatch packets.
     service_queue: A pointer to another user-mode queue that can be used by the HSAIL kernel to request system services.
  2. service_queue_type: NONE (no service queue), COMMON (a runtime-provided service queue that is shared), or NEW (requires the runtime to create a new queue).
  3. acquire_fence_scope: Determines the scope and type of the memory fence operation applied before the packet enters the active phase.
     release_fence_scope: Determines the scope and type of the memory fence operation applied after kernel completion but before the packet is completed.
     - HSA_FENCE_SCOPE_NONE = 0: No scope. Only valid for barrier packets.
     - HSA_FENCE_SCOPE_COMPONENT = 1: The fence is applied with component scope for the global segment.
     - HSA_FENCE_SCOPE_SYSTEM = 2: The fence is applied with system scope for the global segment.