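1. ITRI (Industrial Technology Research Institute)
2014 Heterogeneous System Architecture (HSA) Technology Forum - Industrial Technology Research Institute
HSA System Architecture Overview
Jay Wang (王振傑)
Embedded System and Chip Technology Division, System Architecture Design Department (D200)
Information and Communications Research Laboratories (ICL) ccwang.jay@itri.org.tw
2014-10-31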
2. HSA Platform Model
In an HSA system, a regular device is called an HSA agent; an HSA agent that can run kernels is also an HSA component.
Figure labels: Serial and Task Parallel Workloads; Data Parallel Workloads; SIMD
3. Three Eras of Processor Performance
Single-Core Era
Chart: Single-thread Performance vs. Time ("we are here"; "?")
Enabled by: Moore’s Observation, Voltage Scaling, Micro-Architecture
Constrained by: Power, Complexity
Programming: Assembly, C/C++, Java …
Multi-Core Era
Chart: Throughput Performance vs. Time (# of processors) ("we are here")
Enabled by: Moore’s Observation, Desire for Throughput, 20 years of SMP arch
Constrained by: Power, Parallel SW availability, Scalability
Programming: pthreads, OpenMP / TBB …
Heterogeneous Systems Era
Chart: Modern Application Performance vs. Time (Data-parallel exploitation) ("we are here")
Enabled by: Moore’s Observation, Abundant data parallelism, Power efficient data parallel processing (GPUs)
Constrained by: Programming models, Communication overheads
Programming: Shader, CUDA, OpenCL, C++ and Java
SOURCE: HSA INTRODUCTION, HSA FOUNDATION (PHIL ROGERS, AMD)
4. HSA Intermediate Language (HSAIL)
The HSA Foundation members are building a heterogeneous compute software ecosystem on open, royalty-free industry standards and open-source software. The HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC (https://github.com/HSAFoundation).
8. HSA Memory Consistency Model (Relaxed Model)
First Operation \ Second Operation | ld_rlx, st_rlx | atomic_rlx, atomicNoRet_rlx | atomic_acq, atomicNoRet_acq | fence_acq | atomic_rel, atomicNoRet_rel, fence_rel | atomic_ar, atomicNoRet_ar, fence_ar
ld_rlx or st_rlx | yes | yes | yes | yes | no | no
atomic_rlx, atomicNoRet_rlx | yes | yes | yes | no | no | no
atomic_acq, atomicNoRet_acq, fence_acq | no | no | no | no | no | no
atomic_rel, atomicNoRet_rel | yes | yes | no | no | no | no
fence_rel | yes | no | no | no | no | no
atomic_ar, atomicNoRet_ar, fence_ar | no | no | no | no | no | no
relaxed ;
…..
acquire ;
…..
release ;
…..
acq_rel ;
…..
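The yes/no matrix on this slide can be encoded as a small lookup. This is only a sketch: it assumes that "yes" means the first operation may be reordered with the second (the slide does not spell this out), and it assumes the six-column grouping as read from the flattened table. The names `COLUMNS`, `TABLE`, `ROW_OF`, `COL_OF`, and `may_reorder` are illustrative and not part of HSAIL.

```python
# Column classes for the second operation, in slide order (assumed grouping).
COLUMNS = ["rlx_mem", "atomic_rlx", "atomic_acq", "fence_acq", "rel", "ar"]

# One row per class of first operation; entries follow COLUMNS order.
TABLE = {
    "rlx_mem":    ["yes", "yes", "yes", "yes", "no", "no"],
    "atomic_rlx": ["yes", "yes", "yes", "no", "no", "no"],
    "acq":        ["no", "no", "no", "no", "no", "no"],
    "atomic_rel": ["yes", "yes", "no", "no", "no", "no"],
    "fence_rel":  ["yes", "no", "no", "no", "no", "no"],
    "ar":         ["no", "no", "no", "no", "no", "no"],
}

# Map each operation name to its row class (when it is the first operation).
ROW_OF = {
    "ld_rlx": "rlx_mem", "st_rlx": "rlx_mem",
    "atomic_rlx": "atomic_rlx", "atomicNoRet_rlx": "atomic_rlx",
    "atomic_acq": "acq", "atomicNoRet_acq": "acq", "fence_acq": "acq",
    "atomic_rel": "atomic_rel", "atomicNoRet_rel": "atomic_rel",
    "fence_rel": "fence_rel",
    "atomic_ar": "ar", "atomicNoRet_ar": "ar", "fence_ar": "ar",
}

# Map each operation name to its column class (when it is the second operation).
COL_OF = {
    "ld_rlx": "rlx_mem", "st_rlx": "rlx_mem",
    "atomic_rlx": "atomic_rlx", "atomicNoRet_rlx": "atomic_rlx",
    "atomic_acq": "atomic_acq", "atomicNoRet_acq": "atomic_acq",
    "fence_acq": "fence_acq",
    "atomic_rel": "rel", "atomicNoRet_rel": "rel", "fence_rel": "rel",
    "atomic_ar": "ar", "atomicNoRet_ar": "ar", "fence_ar": "ar",
}


def may_reorder(first, second):
    """Return True if the table entry for (first, second) is 'yes'."""
    row = TABLE[ROW_OF[first]]
    return row[COLUMNS.index(COL_OF[second])] == "yes"
```

For example, `may_reorder("ld_rlx", "fence_acq")` looks up the first row and fourth column, while `may_reorder("fence_rel", "atomic_rlx")` looks up the fifth row and second column.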
17. HSAIL Instructions for Signaling
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and Object Format (BRIG) (Version 1.0 Provisional)
6.8 Notification (signal) Operation
18. Atomic Memory Operations
HSA requires the following standard atomic memory operations to be supported by HSA Components (other HSA Agents only need to support the subset of these operations required by their role in the system):
Load from memory
Store to memory
Fetch from memory, apply a logic operation (bitwise AND/OR/XOR) with one additional operand, and store back.
Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store back.
Exchange memory location with operand.
Compare-and-swap (CAS): load memory location, compare with first operand, and if equal, store second operand back to memory location.
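The semantics of the operations listed above can be sketched with a lock-based model of a single memory location. This is an illustration in Python, not HSA code: the class `AtomicCell` and its method names are hypothetical, and a lock stands in for the hardware atomicity an HSA Component would provide.

```python
import operator
import threading


class AtomicCell:
    """Lock-based model of one memory location (illustrative only)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def store(self, value):
        with self._lock:
            self._value = value

    def fetch_op(self, op, operand):
        """Fetch, apply op(old, operand), store back; return the old value."""
        with self._lock:
            old = self._value
            self._value = op(old, operand)
            return old

    def exchange(self, operand):
        """Swap the stored value with operand; return the old value."""
        with self._lock:
            old, self._value = self._value, operand
            return old

    def compare_and_swap(self, expected, desired):
        """Store desired only if the current value equals expected."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = desired
            return old


def run_counter_demo(threads=4, increments=1000):
    """Several threads add 1 to a shared cell; no update is lost."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments):
            cell.fetch_op(operator.add, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()
```

The logic and arithmetic variants map onto `fetch_op` with `operator.and_`, `operator.or_`, `operator.xor`, `operator.add`, `operator.sub`, `min`, or `max`; increment and decrement are `fetch_op(operator.add, 1)` and `fetch_op(operator.sub, 1)`.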
19. HSA System Timestamp
The HSA system provides a low-overhead mechanism for determining the passage of time.
A system timestamp is required that can be read from HSAIL or through the HSA runtime.
It is also possible to determine the system timestamp frequency through the HSA runtime.
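Given two timestamp readings and the timestamp frequency, elapsed time follows by division. A minimal sketch in Python, assuming the timestamp is a monotonically increasing tick count and the frequency is reported in Hz; the function name and the sample values are hypothetical and do not come from the HSA runtime.

```python
def ticks_to_seconds(start_ticks, end_ticks, frequency_hz):
    """Convert a difference of two timestamp readings to seconds."""
    if frequency_hz <= 0:
        raise ValueError("timestamp frequency must be positive")
    return (end_ticks - start_ticks) / frequency_hz


# Hypothetical readings: 250,000 ticks elapsed at an assumed 100 MHz.
elapsed = ticks_to_seconds(1_000_000, 1_250_000, 100_000_000)
```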
24. AQL Packet Types
HSA signaling object handle used to indicate completion of the job.
Packet header fields: format (8-bit), barrier (1-bit), acquireFenceScope (2-bit), releaseFenceScope (2-bit), reserved (3-bit)
Format values:
0 Always_Reserved
1 Invalid
2 Kernel_Dispatch
3 Barrier_AND
4 Agent_Dispatch
5 Barrier_OR
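The header fields on this slide add up to 16 bits, so they can be packed into one integer. The sketch below assumes the fields are laid out from the least-significant bit in the order listed (format in bits 0-7, barrier in bit 8, and so on); the slide does not state the bit order, and the function names are illustrative.

```python
# Format values as listed on the slide.
FORMAT_NAMES = {
    0: "Always_Reserved",
    1: "Invalid",
    2: "Kernel_Dispatch",
    3: "Barrier_AND",
    4: "Agent_Dispatch",
    5: "Barrier_OR",
}


def pack_header(fmt, barrier, acquire_scope, release_scope):
    """Pack the header fields into a 16-bit integer (assumed LSB-first layout)."""
    if not 0 <= fmt <= 0xFF:
        raise ValueError("format must fit in 8 bits")
    if barrier not in (0, 1):
        raise ValueError("barrier must be 0 or 1")
    if not 0 <= acquire_scope <= 3 or not 0 <= release_scope <= 3:
        raise ValueError("fence scopes must fit in 2 bits")
    # The 3 reserved bits (13-15) are left as zero.
    return fmt | (barrier << 8) | (acquire_scope << 9) | (release_scope << 11)


def unpack_header(header):
    """Split a 16-bit header back into its named fields."""
    return {
        "format": header & 0xFF,
        "barrier": (header >> 8) & 0x1,
        "acquireFenceScope": (header >> 9) & 0x3,
        "releaseFenceScope": (header >> 11) & 0x3,
        "reserved": (header >> 13) & 0x7,
    }
```

Under this assumed layout, a Kernel_Dispatch packet with the barrier bit set and both fence scopes equal to 2 packs to `0x1502`.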
25. Kernel Dispatch Packet
Work-group Size
Grid Size
Segment Size
Pointer to the Kernel
Pointer to the arguments
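The fields named on this slide can be gathered into a small record. This sketch does not reproduce the packet's binary layout: the field names are hypothetical, and it assumes that the grid size is expressed in work-items, so the number of work-groups per dimension is the grid size divided by the work-group size, rounded up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KernelDispatch:
    """Illustrative record of the fields named on the slide."""
    workgroup_size: Tuple[int, int, int]  # work-items per work-group (x, y, z)
    grid_size: Tuple[int, int, int]       # total work-items (x, y, z), assumed
    segment_size_bytes: int               # segment size; meaning assumed
    kernel_address: int                   # pointer to the kernel
    kernarg_address: int                  # pointer to the arguments

    def workgroup_counts(self):
        """Work-groups per dimension, rounding up partial work-groups."""
        return tuple(
            (g + w - 1) // w
            for g, w in zip(self.grid_size, self.workgroup_size)
        )


# Hypothetical dispatch: 1000 work-items in x, 256 per work-group.
packet = KernelDispatch((256, 1, 1), (1000, 1, 1), 0, 0x1000, 0x2000)
counts = packet.workgroup_counts()
```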
26. Agent Dispatch Packet
64-bit direct or indirect arguments
Pointer to location to store the function return value(s) in
The function to be performed by the destination HSA Agent.
The type value is split into the following ranges:
0x0000 ~ 0x7FFF: Reserved
0x8000 ~ 0xFFFF: User registered function
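The two type ranges can be checked with a single comparison. A minimal sketch, assuming the type is a 16-bit unsigned value as the ranges imply; the function name is illustrative.

```python
def classify_agent_dispatch_type(type_value):
    """Return the range name for a 16-bit Agent Dispatch type value."""
    if not 0 <= type_value <= 0xFFFF:
        raise ValueError("type must fit in 16 bits")
    if type_value <= 0x7FFF:
        return "Reserved"
    return "User registered function"
```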
27. Barrier-AND / Barrier-OR Packet
The Barrier packet defines dependencies for the HSA Packet Processor to monitor.
The HSA Packet Processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.
Handles for dependent signaling objects to be evaluated by the packet processor.
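How the packet processor might evaluate the dependent signals can be sketched as follows. The slide does not state the condition, so this assumes that a dependent signal counts as satisfied once its value reaches 0, with Barrier-AND requiring all signals and Barrier-OR requiring any one. The function names are illustrative.

```python
def barrier_and_satisfied(signal_values):
    """Barrier-AND: every dependent signal has reached 0 (assumed condition)."""
    return all(value == 0 for value in signal_values)


def barrier_or_satisfied(signal_values):
    """Barrier-OR: at least one dependent signal has reached 0 (assumed)."""
    return any(value == 0 for value in signal_values)


# Hypothetical snapshot of three dependent signals.
snapshot = [0, 1, 0]
and_ready = barrier_and_satisfied(snapshot)
or_ready = barrier_or_satisfied(snapshot)
```

With this snapshot the Barrier-OR condition holds while the Barrier-AND condition does not, because one signal is still nonzero.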
28. Packet Process Flow
Launch Phase
All preceding packets in the queue must have completed their launch phase.
If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed.
In the launch phase, an acquire memory fence is applied before the packet enters the active phase.
Active Phase
Kernel Dispatch packets and Agent Dispatch packets execute on the HSA Component/Agent, and the active phase ends when the task completes.
Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.
Completion Phase
The first step in the completion phase is the memory release fence.
After the memory release fence completes, the signal specified by the completionSignal field in the AQL packet is signaled with a decrementing atomic operation.
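The launch conditions and the completion step described on this slide can be modeled in a few lines. This is a simplified sketch rather than the packet processor's actual algorithm: the `Phase` values, `can_launch`, and `complete` are hypothetical names, memory fences are represented only by comments, and the signal is a plain dictionary.

```python
from enum import IntEnum


class Phase(IntEnum):
    QUEUED = 0      # not yet launched
    LAUNCH = 1      # in the launch phase
    ACTIVE = 2      # launch phase finished, task running
    COMPLETION = 3  # release fence and signaling in progress
    DONE = 4        # packet has completed


def can_launch(index, phases, barrier_bit):
    """Check the launch conditions for the packet at position index."""
    preceding = phases[:index]
    if barrier_bit:
        # Barrier bit set: all preceding packets must have completed.
        return all(p == Phase.DONE for p in preceding)
    # Otherwise preceding packets only need to be past their launch phase.
    return all(p >= Phase.ACTIVE for p in preceding)


def complete(signal):
    """Completion phase: release fence first, then decrement the signal."""
    # A memory release fence would be applied here.
    signal["value"] -= 1  # stands in for the decrementing atomic operation
    return Phase.DONE


# Hypothetical queue state: packet 0 done, packet 1 still running.
phases = [Phase.DONE, Phase.ACTIVE, Phase.QUEUED]
launch_without_barrier = can_launch(2, phases, barrier_bit=False)
launch_with_barrier = can_launch(2, phases, barrier_bit=True)
```

In this state the third packet may launch if its barrier bit is clear, but must wait for packet 1 to complete if the bit is set.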
41. Images
A graphics feature that can sometimes be useful in data-parallel computing
Used to store one-, two-, or three-dimensional images
Uses predefined image formats
Image memory is accessed through a special kind of memory access
Dedicated hardware speeds up image operations
The OpenCL™ Specification Version 2.0: 5.3 Image Objects http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf