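ITRI (Industrial Technology Research Institute)
2014 Heterogeneous System Architecture (HSA) Technical Forum - Industrial Technology Research Institute
HSA System Architecture Overview
王振傑 (Jay Wang)
Embedded Systems and Chip Technology Division, System Architecture Design Dept. (D200)
Information and Communications Research Laboratories (ICL), ccwang.jay@itri.org.tw
2014-10-31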
2014 Heterogeneous System Architecture (HSA) Technical Forum - Industrial Technology Research Institute (2014-10-31)
HSA Platform Model 
2 
In an HSA system, a regular device is called an HSA agent; an HSA agent that can also run kernels is additionally an HSA component.
(Figure: serial and task-parallel workloads alongside data-parallel workloads (SIMD).)
Three Eras of Processor Performance 
3 
Single-Core Era (chart: single-thread performance vs. time, marked "we are here" and "?")
• Enabled by: Moore’s Observation, Voltage Scaling, Micro-Architecture
• Constrained by: Power, Complexity
• Programming: Assembly → C/C++ → Java …
Multi-Core Era (chart: throughput performance vs. time (# of processors), marked "we are here")
• Enabled by: Moore’s Observation, Desire for Throughput, 20 years of SMP architecture
• Constrained by: Power, Parallel SW availability, Scalability
• Programming: pthreads → OpenMP / TBB …
Heterogeneous Systems Era (chart: modern application performance vs. time (data-parallel exploitation), marked "we are here")
• Enabled by: Moore’s Observation, Abundant data parallelism, Power-efficient data-parallel processing (GPUs)
• Constrained by: Programming models, Communication overheads
• Programming: Shader → CUDA → OpenCL → C++ and Java
SOURCE : HSA INTRODUCTION, HSA FOUNDATION (PHIL ROGERS, AMD)
HSA Intermediate Language (HSAIL) 
4 
The HSA Foundation members are building a heterogeneous compute software ecosystem built on open, royalty-free industry standards and open-source software: the HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC. ( https://github.com/HSAFoundation )
HSAIL Programming Model 
5
HSA Runtime Stack 
6
Kernel Execution 
7
HSA Memory Consistency Model (Relaxed Model) 
Reordering table (yes = the first operation and a later second operation may be reordered)

Second-operation columns:
(1) ld_rlx, st_rlx
(2) atomic_rlx, atomicNoRet_rlx
(3) atomic_acq, atomicNoRet_acq
(4) fence_acq
(5) atomic_rel, atomicNoRet_rel, fence_rel
(6) atomic_ar, atomicNoRet_ar, fence_ar

First operation                          (1)  (2)  (3)  (4)  (5)  (6)
ld_rlx or st_rlx                         yes  yes  yes  yes  no   no
atomic_rlx, atomicNoRet_rlx              yes  yes  yes  no   no   no
atomic_acq, atomicNoRet_acq, fence_acq   no   no   no   no   no   no
atomic_rel, atomicNoRet_rel              yes  yes  no   no   no   no
fence_rel                                yes  no   no   no   no   no
atomic_ar, atomicNoRet_ar, fence_ar      no   no   no   no   no   no

(Figure: the relaxed, acquire, release, and acq_rel orderings.)
8
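To query the table programmatically, it can be encoded as a lookup. This is a minimal C sketch. `op_class`, `may_reorder_table`, and `may_reorder()` are hypothetical names and are not part of HSAIL or the HSA runtime. The entries are transcribed from the reordering table above. fence_acq and fence_rel get their own classes because the table treats them differently from the atomics they are grouped with.

```c
#include <stdbool.h>

/* Operation classes from the reordering table (hypothetical names).
   The NoRet atomic variants share a class with their returning forms. */
typedef enum {
    OP_LDST_RLX,   /* ld_rlx, st_rlx */
    OP_ATOMIC_RLX, /* atomic_rlx, atomicNoRet_rlx */
    OP_ATOMIC_ACQ, /* atomic_acq, atomicNoRet_acq */
    OP_FENCE_ACQ,  /* fence_acq */
    OP_ATOMIC_REL, /* atomic_rel, atomicNoRet_rel */
    OP_FENCE_REL,  /* fence_rel */
    OP_AR,         /* atomic_ar, atomicNoRet_ar, fence_ar */
    OP_CLASS_COUNT
} op_class;

/* may_reorder_table[first][second] is true when an operation of class
   'second' that follows one of class 'first' in program order may be
   reordered with it. */
static const bool may_reorder_table[OP_CLASS_COUNT][OP_CLASS_COUNT] = {
    /* second:     LDST   A_RLX  A_ACQ  F_ACQ  A_REL  F_REL  AR    */
    /* LDST  */ { true,  true,  true,  true,  false, false, false },
    /* A_RLX */ { true,  true,  true,  false, false, false, false },
    /* A_ACQ */ { false, false, false, false, false, false, false },
    /* F_ACQ */ { false, false, false, false, false, false, false },
    /* A_REL */ { true,  true,  false, false, false, false, false },
    /* F_REL */ { true,  false, false, false, false, false, false },
    /* AR    */ { false, false, false, false, false, false, false },
};

bool may_reorder(op_class first, op_class second) {
    /* Unknown classes: answer conservatively (no reordering). */
    if ((unsigned)first >= (unsigned)OP_CLASS_COUNT ||
        (unsigned)second >= (unsigned)OP_CLASS_COUNT)
        return false;
    return may_reorder_table[first][second];
}
```

For example, `may_reorder(OP_FENCE_REL, OP_ATOMIC_RLX)` is false. A relaxed atomic that follows a release fence stays after it, which is what lets the fence-plus-atomic pair act as a release.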
System Arch. Requirements 
1. Shared Virtual Memory 
2. Cache Coherency Domains 
3. Flat Addressing 
4. Signaling and Synchronization 
5. Atomic Memory Operations 
6. HSA System Timestamp 
7. User Mode Queuing 
8. Architected Queuing Language (AQL) 
9. HSA Agent Scheduling 
10. HSA Component Context Switching 
11. IEEE754-2008 Floating Point Exceptions 
12. HSA Component Hardware Debug Infrastructure 
13. HSA Platform Topology Discovery 
14. Images 
9 
@ HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION (2014-09-15)
Legacy GPU Compute 
Multiple memory pools and address spaces 
Data copies before/after GPU compute 
10 
(Figure: host CPUs (HSA agents) work in virtual memory #1 on system memory, while the GPU (HSA agent and HSA component) works in virtual memory #2 on GPU memory; numbered steps 1-3 show the flow between them. © 2014 JAY WANG)
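Shared Virtual Memory
(Figure: host CPUs (HSA agents) and the GPU (HSA agent and HSA component) share one virtual address space, "Shared Virtual Memory (HSA)", spanning system memory and GPU memory; the CPU and GPU MMUs both use the OS page table. © 2014 JAY WANG)
• 32-bit HSA system: 32-bit virtual addresses
• 64-bit HSA system: at least 48-bit virtual addresses
11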
HSA Memory Hierarchy 
12 
1) Global
2) Group
3) Private
4) Kernarg
5) Readonly
6) Spill
7) Arg
Virtual Address Range Reservation 
(System Memory or Device Local Memory)
Cache Coherency Domains 
13 
(Figure: several caches above system memory, kept coherent with one another within a coherency domain.)
Signaling and Synchronization 
The mechanisms required for HSAIL and the HSA runtime are:
• Allocate/destroy an HSA signal
• Read the current HSA signal value
• Wait on an HSA signal to meet a specified condition (with a maximum wait duration requested)
• Send an HSA signal value
• Atomically read-modify-write an HSA signal value
15 
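(Figure: an HSA signal consists of a signal handle (hsa_signal_t) and a signal value (hsa_signal_value_t), stored as implementation-defined data of type Sig32 or Sig64. The host CPU (HSA agent) accesses signals through the HSA runtime APIs; HSA components access them through HSAIL instructions. For comparison, the slide lists the POSIX primitives sem_init(), sem_wait(), sem_post(), sem_destroy(), pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), and pthread_mutex_destroy(). © 2014 JAY WANG)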
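The signal operations listed above can be illustrated with C11 atomics. This is a minimal single-process sketch, not the HSA runtime. `demo_signal` and the `demo_signal_*` functions are hypothetical. The memory orders mirror the `_acquire` / `_release` / `_acq_rel` suffixes of the runtime API. The bounded wait polls a fixed number of times rather than using a real timeout or blocking the thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an HSA signal: a 64-bit value updated with
   C11 atomics. The real hsa_signal_t is an opaque handle. */
typedef struct {
    _Atomic int64_t value;
} demo_signal;

/* "Allocate": initialize caller-owned storage with an initial value. */
void demo_signal_init(demo_signal *s, int64_t initial) {
    atomic_init(&s->value, initial);
}

/* Read the current value with acquire ordering. */
int64_t demo_signal_load_acquire(demo_signal *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}

/* Send (store) a new value with release ordering. */
void demo_signal_store_release(demo_signal *s, int64_t v) {
    atomic_store_explicit(&s->value, v, memory_order_release);
}

/* Atomic read-modify-write: subtract v and return the previous value. */
int64_t demo_signal_subtract_acq_rel(demo_signal *s, int64_t v) {
    return atomic_fetch_sub_explicit(&s->value, v, memory_order_acq_rel);
}

/* Wait until value < compare, polling at most max_polls times.
   Returns true if the condition was observed. */
bool demo_signal_wait_lt(demo_signal *s, int64_t compare, uint64_t max_polls) {
    for (uint64_t i = 0; i < max_polls; ++i) {
        if (demo_signal_load_acquire(s) < compare)
            return true;
    }
    return false;
}
```

A typical pattern is a completion signal. It is initialized to 1, the producer of the result decrements it, and the consumer waits for the value to drop below 1.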
HSA Runtime APIs for Signaling 
16 
HSA Runtime APIs ( for HSA application ) 
•hsa_signal_create ( ) 
•hsa_signal_destroy ( ) 
•hsa_signal_load_{acquire, relaxed} ( ) 
•hsa_signal_store_{relaxed, release} ( ) 
•hsa_signal_exchange_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_cas_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_add_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_subtract_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_and_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_or_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_xor_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_signal_wait_{acquire, relaxed} ( )
HSA Runtime Programmer’s Reference Manual (v1.00) 
2.4 Signals
HSAIL Instructions for Signaling 
17 
HSA Programmer’s Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and Object Format (BRIG) (Version 1.0 Provisional) 
6.8 Notification (signal) Operation
Atomic Memory Operations 
HSA requires the following standard atomic memory operations to be supported by HSA Components (other HSA Agents only need to support the subset of these operations required by their role in the system): 
Load from memory.
Store to memory.
Fetch from memory, apply a bitwise logic operation (AND/OR/XOR) with one additional operand, and store the result back.
Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store the result back.
Exchange a memory location with an operand.
Compare-and-swap (CAS): load a memory location, compare it with the first operand, and if they are equal, store the second operand back to the memory location.
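C11 `<stdatomic.h>` has direct counterparts for most of these operations: `atomic_load`, `atomic_store`, `atomic_fetch_and/or/xor`, `atomic_fetch_add/sub`, `atomic_exchange`, and `atomic_compare_exchange_strong`. Minimum and maximum have no C11 primitive, so this sketch builds them from a compare-and-swap loop. It is host-side C for illustration only, not HSAIL. `fetch_min`, `fetch_max`, and `cas` are hypothetical helper names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomic minimum: store min(*p, operand) and return the previous value. */
int32_t fetch_min(_Atomic int32_t *p, int32_t operand) {
    int32_t old = atomic_load(p);
    while (operand < old && !atomic_compare_exchange_weak(p, &old, operand)) {
        /* A failed CAS reloads 'old' with the current value; re-check. */
    }
    return old;
}

/* Atomic maximum: store max(*p, operand) and return the previous value. */
int32_t fetch_max(_Atomic int32_t *p, int32_t operand) {
    int32_t old = atomic_load(p);
    while (operand > old && !atomic_compare_exchange_weak(p, &old, operand)) {
        /* Same retry pattern as fetch_min. */
    }
    return old;
}

/* CAS as described on the slide: if *p == expected, store desired.
   Returns true if the store happened. */
bool cas(_Atomic int32_t *p, int32_t expected, int32_t desired) {
    return atomic_compare_exchange_strong(p, &expected, desired);
}
```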
18
HSA System Timestamp 
The HSA system provides a low-overhead mechanism for measuring the passage of time.
A system timestamp is required, readable both from HSAIL and through the HSA runtime.
It is also possible to determine the system timestamp frequency through the HSA runtime. 
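Two timestamp readings and the timestamp frequency are enough to compute elapsed time. In released HSA runtimes these values are queried with `hsa_system_get_info` (`HSA_SYSTEM_INFO_TIMESTAMP`, `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`); check the runtime version you target. The conversion below is a hypothetical helper. It splits the tick count into whole seconds and a remainder so the multiplication by 10^9 does not overflow 64 bits for long intervals.

```c
#include <stdint.h>

/* Convert the ticks between two timestamp readings into nanoseconds.
   freq_hz must be non-zero; the remainder term assumes freq_hz is below
   about 18 GHz so rem_ticks * 1e9 fits in 64 bits. */
uint64_t ticks_to_ns(uint64_t start, uint64_t end, uint64_t freq_hz) {
    uint64_t delta = end - start;            /* unsigned math tolerates wraparound */
    uint64_t whole_seconds = delta / freq_hz;
    uint64_t rem_ticks = delta % freq_hz;
    return whole_seconds * 1000000000ULL + (rem_ticks * 1000000000ULL) / freq_hz;
}
```

For example, 1500 ticks at 1 kHz are 1.5 s, so `ticks_to_ns(0, 1500, 1000)` yields 1,500,000,000 ns.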
19
User Mode Queuing
Multiple user-level command queues 
Runtime-allocated 
Architected Queuing Language (AQL) 
21
HSA Packet Processor 
22
User Mode Queue Operations 
HSA Runtime APIs ( for HSA application ) 
•hsa_queue_create ( ) 
•hsa_queue_destroy ( ) 
•hsa_queue_inactivate ( ) 
•hsa_queue_load_write_index_{acquire, relaxed} ( ) 
•hsa_queue_store_write_index_{relaxed, release} ( ) 
•hsa_queue_cas_write_index_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_queue_add_write_index_{acq_rel, acquire, relaxed, release} ( ) 
•hsa_queue_load_read_index_{acquire, relaxed} ( ) 
•hsa_queue_store_read_index_{relaxed, release} ( ) 
23 
HSAIL Instructions ( for HSA component ) 
•agentcount_u32 dest 
•agentid_u32 dest 
•ldk_uLength dest, kernelName 
•queueid_u32 dest 
•queueptr_uLength dest 
•ldqueuewriteindex_segment_order_u64 dest, address 
•stqueuewriteindex_segment_order_u64 address, src 
•casqueuewriteindex_segment_order_u64 dest, address, src0, src1 
•addqueuewriteindex_segment_order_u64 dest, address, src 
•ldqueuereadindex_segment_order_u64 dest, address 
•stqueuereadindex_segment_order_u64 address, src
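The write-index operations above are what let several producers share one user mode queue. This minimal sketch of slot reservation rests on four assumptions. The queue is a ring of `QUEUE_SIZE` packets, where `QUEUE_SIZE` is a power of two. The 64-bit indices only ever increase. The queue is full when `write_index - read_index` reaches the size. A compare-and-swap on the write index performs the reservation, analogous to `hsa_queue_cas_write_index_acq_rel` / `casqueuewriteindex`. All names are hypothetical. Writing and publishing the packet itself is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 8u /* number of packet slots; must be a power of two */

typedef struct {
    uint32_t payload; /* stand-in for a 64-byte AQL packet */
} demo_packet;

typedef struct {
    demo_packet slots[QUEUE_SIZE];
    _Atomic uint64_t write_index; /* next index a producer will claim */
    _Atomic uint64_t read_index;  /* next index the packet processor will consume */
} demo_queue;

void demo_queue_init(demo_queue *q) {
    atomic_init(&q->write_index, 0);
    atomic_init(&q->read_index, 0);
}

/* Claim one packet slot. Returns false without claiming if the ring is full. */
bool demo_queue_reserve(demo_queue *q, uint64_t *out_index) {
    uint64_t idx = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    do {
        uint64_t read = atomic_load_explicit(&q->read_index, memory_order_acquire);
        if (idx - read >= QUEUE_SIZE)
            return false; /* every slot holds an unconsumed packet */
    } while (!atomic_compare_exchange_weak_explicit(
                 &q->write_index, &idx, idx + 1,
                 memory_order_acq_rel, memory_order_relaxed));
    *out_index = idx;
    return true;
}

/* Map a monotonically increasing index onto a ring slot. */
demo_packet *demo_queue_slot(demo_queue *q, uint64_t index) {
    return &q->slots[index & (QUEUE_SIZE - 1)];
}
```

Using a CAS, rather than an unconditional add, keeps a producer from claiming a slot that the packet processor has not yet freed.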
AQL Packet Types 
24 
• HSA signaling object handle used to indicate completion of the job.
Packet header (16 bits):
• format (8 bits)
• barrier (1 bit)
• acquireFenceScope (2 bits)
• releaseFenceScope (2 bits)
• reserved (3 bits)
Format values:
0 Always_Reserved
1 Invalid
2 Kernel_Dispatch
3 Barrier_AND
4 Agent_Dispatch
5 Barrier_OR
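The header fields listed above add up to 16 bits. This C sketch packs and unpacks such a header. The bit positions assume the fields are laid out LSB-first in the listed order: format in bits 0-7, barrier in bit 8, acquireFenceScope in bits 9-10, and releaseFenceScope in bits 11-12. All names are hypothetical. The slide does not define the individual fence-scope values, so the example only round-trips raw values.

```c
#include <stdint.h>

/* Packet format codes as listed on the slide. */
enum aql_format {
    AQL_FORMAT_ALWAYS_RESERVED = 0,
    AQL_FORMAT_INVALID         = 1,
    AQL_FORMAT_KERNEL_DISPATCH = 2,
    AQL_FORMAT_BARRIER_AND     = 3,
    AQL_FORMAT_AGENT_DISPATCH  = 4,
    AQL_FORMAT_BARRIER_OR      = 5
};

/* Assumed layout, LSB first; bits 13-15 are reserved and left zero. */
#define HDR_FORMAT_SHIFT    0
#define HDR_BARRIER_SHIFT   8
#define HDR_ACQ_SCOPE_SHIFT 9
#define HDR_REL_SCOPE_SHIFT 11

uint16_t aql_header_pack(unsigned format, unsigned barrier,
                         unsigned acq_scope, unsigned rel_scope) {
    return (uint16_t)(((format    & 0xFFu) << HDR_FORMAT_SHIFT)    |
                      ((barrier   & 0x1u)  << HDR_BARRIER_SHIFT)   |
                      ((acq_scope & 0x3u)  << HDR_ACQ_SCOPE_SHIFT) |
                      ((rel_scope & 0x3u)  << HDR_REL_SCOPE_SHIFT));
}

unsigned aql_header_format(uint16_t h)    { return (h >> HDR_FORMAT_SHIFT)    & 0xFFu; }
unsigned aql_header_barrier(uint16_t h)   { return (h >> HDR_BARRIER_SHIFT)   & 0x1u;  }
unsigned aql_header_acq_scope(uint16_t h) { return (h >> HDR_ACQ_SCOPE_SHIFT) & 0x3u;  }
unsigned aql_header_rel_scope(uint16_t h) { return (h >> HDR_REL_SCOPE_SHIFT) & 0x3u;  }
```

Under this layout, a Kernel_Dispatch header with the barrier bit set and both scopes equal to 2 packs to 0x1502.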
Kernel Dispatch Packet 
25 
Work-group Size 
Grid Size 
Segment Size 
Pointer to the Kernel 
Pointer to the arguments
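The grid size and work-group size fields together determine how many work-groups a dispatch launches. This small sketch makes three assumptions. Work-group sizes are 16-bit and grid sizes 32-bit per dimension, as in the HSA 1.0 packet layout. Unused dimensions are set to 1. The grid size need not be a multiple of the work-group size, so the last work-group in a dimension may be partial. The function names are hypothetical.

```c
#include <stdint.h>

/* Work-groups along one dimension: ceil(grid_size / workgroup_size).
   workgroup_size must be non-zero. */
uint32_t workgroups_in_dim(uint32_t grid_size, uint16_t workgroup_size) {
    return (uint32_t)(((uint64_t)grid_size + workgroup_size - 1) / workgroup_size);
}

/* Total work-groups for a 3-D dispatch. */
uint64_t total_workgroups(const uint32_t grid[3], const uint16_t wg[3]) {
    uint64_t total = 1;
    for (int d = 0; d < 3; ++d)
        total *= workgroups_in_dim(grid[d], wg[d]);
    return total;
}
```

For a 1920 x 1080 grid with 16 x 16 work-groups, this gives 120 x 68 = 8160 work-groups. The bottom row of work-groups is partial, because 1080 is not a multiple of 16.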
Agent Dispatch Packet 
26 
64-bit direct or indirect arguments 
Pointer to location to store the function return value(s) in 
The function to be performed by the destination HSA Agent. 
The type value is split into the following ranges:
• 0x0000 to 0x7FFF: Reserved
• 0x8000 to 0xFFFF: User-registered function
Barrier-AND / Barrier-OR Packet 
The Barrier packet defines dependencies for the HSA Packet Processor to monitor. 
The HSA Packet Processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.
27 
Handles for dependent signaling objects to be evaluated by the packet processor.
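The completion condition of the two barrier packets can be sketched as a check over the dependent signal values. The sketch assumes that a dependency is satisfied once its signal value reaches 0. It also assumes that unused dependency slots are NULL and that there are five slots, the count used by the HSA 1.0 barrier packets. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BARRIER_DEP_SLOTS 5 /* assumed number of dependency slots */

/* Barrier-AND: complete when every used dependency has reached 0. */
bool barrier_and_done(const int64_t *const deps[BARRIER_DEP_SLOTS]) {
    for (size_t i = 0; i < BARRIER_DEP_SLOTS; ++i)
        if (deps[i] != NULL && *deps[i] != 0)
            return false;
    return true;
}

/* Barrier-OR: complete when any used dependency has reached 0. */
bool barrier_or_done(const int64_t *const deps[BARRIER_DEP_SLOTS]) {
    bool any_used = false;
    for (size_t i = 0; i < BARRIER_DEP_SLOTS; ++i) {
        if (deps[i] == NULL)
            continue;
        any_used = true;
        if (*deps[i] == 0)
            return true;
    }
    return !any_used; /* assumption: no dependencies means nothing to wait for */
}
```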
Packet Process Flow 
Launch Phase
• All preceding packets in the queue must have completed their launch phase.
• If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed.
• In the launch phase, an acquire memory fence is applied before the packet enters the active phase.
Active Phase
• Kernel Dispatch packets and Agent Dispatch packets execute on the HSA component/agent, and the active phase ends when the task completes.
• Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.
Completion Phase
• The first step in the completion phase is the memory release fence.
• After the memory release fence completes, the signal specified by the completionSignal field in the AQL packet is signaled with a decrementing atomic operation.
28
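The launch rules above can be simulated in a few lines of C. This single-threaded sketch uses hypothetical types and functions. It tracks each packet's phase and checks whether a packet may start its launch phase. It also performs the completion-phase decrement of the completion signal. Memory fences are not modeled, because a single-threaded model has nothing to order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    PKT_QUEUED,   /* waiting in the queue */
    PKT_ACTIVE,   /* launch phase finished, task running */
    PKT_COMPLETE  /* completion phase finished */
} pkt_state;

typedef struct {
    bool barrier;        /* barrier bit from the packet header */
    pkt_state state;
    int64_t *completion; /* completion signal value, or NULL */
} sim_packet;

/* May packet i (queue order) start its launch phase? */
bool can_launch(const sim_packet *pkts, size_t i) {
    for (size_t j = 0; j < i; ++j) {
        if (pkts[j].state == PKT_QUEUED)
            return false; /* a predecessor has not finished launching */
        if (pkts[i].barrier && pkts[j].state != PKT_COMPLETE)
            return false; /* barrier bit: predecessors must be complete */
    }
    return true;
}

/* Completion phase: a real packet processor applies a release fence
   first; here we only mark completion and decrement the signal. */
void sim_complete(sim_packet *p) {
    p->state = PKT_COMPLETE;
    if (p->completion != NULL)
        *p->completion -= 1;
}
```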
Barrier-bit Example 
29 
(Figure: AQL packets with barrier bit = 1 moving through enqueue, dequeue, launch phase, active phase, and completion phase, each with its completionSignal. © 2014 JAY WANG)
Barrier-AND Packet Example 
30
HSA Agent Scheduling 
32 
(Figure: applications #1 to #3 submit AQL packets (Agent Dispatch packets or Barrier-AND/OR packets) to agent dispatch queues; the HSA agent (Agt) schedules tasks from these queues and from a non-HSA task pool. Scheduling is triggered by AQL packet submission, task execution complete, and barrier packet complete. © 2014 JAY WANG)
HSA Component Context Switching 
33 
(Figure: applications 1 to 3 submit to agent dispatch queues scheduled by the HSA agent alongside a non-HSA task pool; on the HSA component, kernel programs run as work-groups (WG) on compute units (CU), and the HSA agent performs context switching. © 2014 JAY WANG)
Context switching requirements:
1. Switch (required)
2. Preempt (required, as soon as possible)
3. Terminate and context reset (terminated as fast as possible)
FP Exception Reporting 
An HSA Component shall report certain defined exceptions related to the execution of the HSAIL code to the HSA Runtime. 
35 
DETECT Policy 
BREAK Policy 
(Figure: a grid of work-groups, each split into wavefronts of N = 1 to 64 work-items that execute in SIMD (Single Instruction, Multiple Data) style on the lanes of a compute unit in the HSA component (HSA agent). Status bits and the exception policy (DETECT or BREAK) feed an exception module, which signals the exception handler in the HSA runtime on the host CPU (HSA agent). © 2014 JAY WANG)
HSAIL instructions: cleardetectexcept_u32, getdetectexcept_u32, setdetectexcept_u32
IEEE754-2008 exception codes:
0 Invalid operation
1 Divide-by-zero
2 Overflow
3 Underflow
4 Inexact
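The DETECT and BREAK policies can be modeled as two bit masks indexed by the exception codes listed above. This is a hypothetical sketch with illustrative names. Treating each code c as bit `1u << c` is an assumption about the status-bit layout. `fpx_get_detected` and `fpx_clear_detected` only loosely mirror `getdetectexcept_u32` and `cleardetectexcept_u32`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exception codes as listed on the slide. */
enum fp_exception {
    FPX_INVALID = 0, FPX_DIV_BY_ZERO = 1, FPX_OVERFLOW = 2,
    FPX_UNDERFLOW = 3, FPX_INEXACT = 4
};

typedef struct {
    uint32_t detected; /* DETECT policy: sticky status bits, one per code */
    uint32_t break_on; /* BREAK policy: exceptions that stop execution */
} fpx_state;

/* Record an exception. Returns true if the BREAK policy applies, i.e.
   execution should stop and the runtime's handler be notified. */
bool fpx_raise(fpx_state *s, enum fp_exception e) {
    uint32_t bit = 1u << e;
    s->detected |= bit;
    return (s->break_on & bit) != 0;
}

uint32_t fpx_get_detected(const fpx_state *s) { return s->detected; }

void fpx_clear_detected(fpx_state *s, uint32_t mask) { s->detected &= ~mask; }
```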
Debug Infrastructure 
The HSA Component shall provide mechanisms to allow system software and some select application software (for example, debuggers and profilers) to set breakpoints and collect throughput information for profiling. 
36 
(Figure: grid, work-groups, and wavefronts of up to 64 work-items executing in SIMD style on a compute unit of the HSA component (HSA agent), with a status module and a debug module that support instruction breakpoints, conditional breakpoints, and memory breakpoints; debuggers and profilers on the host CPU (HSA agent) reach it through the HSA component debug interface. © 2014 JAY WANG)
Execution Environment 
38 
You have 2 OpenCL platform(s) 
---------------------------------------------- 
Platform[0].Name = NVIDIA CUDA 
Platform[0].Vendor = NVIDIA Corporation 
Platform[0].Version = OpenCL 1.1 CUDA 4.2.1 
Platform[0].Profile = FULL_PROFILE 
---------------------------------------------- 
Platform[1].Name = Intel(R) OpenCL 
Platform[1].Vendor = Intel(R) Corporation 
Platform[1].Version = OpenCL 1.2 
Platform[1].Profile = FULL_PROFILE 
---------------------------------------------- 
Platform[0] has 1 device(s) 
---------------------------------------------- 
Device[0].Type = CL_DEVICE_TYPE_GPU 
Device[0].Name = GeForce GT 625 
Device[0].Vendor = NVIDIA Corporation 
Device[0].Version = OpenCL 1.1 CUDA 
Device[0].DriverVersion = 320.49 
Device[0].Profile = FULL_PROFILE 
Device[0].OpenCL_C = OpenCL C 1.1 
Device[0].MaxComputeUnits = 1 
Device[0].MaxWiDimensions = 3 
Device[0].MaxWiSize = (1024,1024,64) 
Device[0].MaxWgSize = 1024 
Device[0].MaxClkFrequency = 1747 MHz 
Device[0].AddrSpaceSize = 32 bits 
Platform[1] has 1 device(s)
----------------------------------------------
Device[0].Type = CL_DEVICE_TYPE_CPU
Device[0].Name = Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz
Device[0].Vendor = Intel(R) Corporation
Device[0].Version = OpenCL 1.2 (Build 80752)
Device[0].DriverVersion = 3.0.1.15216
Device[0].Profile = FULL_PROFILE
Device[0].OpenCL_C = OpenCL C 1.2
Device[0].MaxComputeUnits = 4
Device[0].MaxWiDimensions = 3
Device[0].MaxWiSize = (1024,1024,1024)
Device[0].MaxWgSize = 1024
Device[0].MaxClkFrequency = 3100 MHz
Device[0].AddrSpaceSize = 32 bits
OpenCL APIs
HSA Platform Topology Discovery 
• HSA platform resources: Agent, Memory, Compute Properties, Caches, and I/O
39 
(Figure: example HSA platform topology with five nodes, 0 to 4. Two HSA APUs, each with CPU cores, GPU H-CUs, an HSA MMU, and system memory divided into coherent (cacheable) and non-coherent (non-cacheable) regions, are joined by a socket interconnect; platform firmware is SBIOS/UEFI. Three HSA discrete GPUs, each with H-CUs and device-local memory divided into coherent and non-coherent regions, attach over PCIe (through a PCIe bridge, optionally on add-in boards) with VBIOS/UEFI GOP firmware.)
Images 
• A graphics feature that can sometimes be useful in data-parallel computing
• Used to store one-, two-, or three-dimensional images
• Predefined image formats
• Image memory uses a special kind of memory access
• Dedicated hardware speeds up image operations
41 
The OpenCL™ Specification Version 2.0: 5.3 Image Objects http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
Summary 
Programming model issues:
• HSA Intermediate Language (HSAIL) + HSA Runtime
• Architected Queuing Language (AQL) + Signaling
• Debug infrastructure
Communication overhead issues:
• Cache coherent shared virtual memory (CC-SVM)
• Architected Queuing Language (AQL) for user mode queuing
• Hardware-assisted signaling and atomic operations for synchronization
42 
(Figure: CPUs, GPU, DSP, ... share unified coherent memory and are programmed through HSAIL, the HSA runtime, and AQL. © 2014 JAY WANG)

Weitere ähnliche Inhalte

Was ist angesagt?

A whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerA whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerNikita Popov
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control InterfaceLinaro
 
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netReceive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netYan Vugenfirer
 
U-Boot Porting on New Hardware
U-Boot Porting on New HardwareU-Boot Porting on New Hardware
U-Boot Porting on New HardwareRuggedBoardGroup
 
Embedded Rust on ESP2 - Rust Linz
Embedded Rust on ESP2 - Rust LinzEmbedded Rust on ESP2 - Rust Linz
Embedded Rust on ESP2 - Rust LinzJuraj Michálek
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabMichelle Holley
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringScyllaDB
 
淺談 Live patching technology
淺談 Live patching technology淺談 Live patching technology
淺談 Live patching technologySZ Lin
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmicsDenys Haryachyy
 
LCA13: Power State Coordination Interface
LCA13: Power State Coordination InterfaceLCA13: Power State Coordination Interface
LCA13: Power State Coordination InterfaceLinaro
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival GuideKernel TLV
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
QEMU - Binary Translation
QEMU - Binary Translation QEMU - Binary Translation
QEMU - Binary Translation Jiann-Fuh Liaw
 
linux file sysytem& input and output
linux file sysytem& input and outputlinux file sysytem& input and output
linux file sysytem& input and outputMythiliA5
 

Was ist angesagt? (20)

A whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizerA whirlwind tour of the LLVM optimizer
A whirlwind tour of the LLVM optimizer
 
LAS16-200: SCMI - System Management and Control Interface
LAS16-200:  SCMI - System Management and Control InterfaceLAS16-200:  SCMI - System Management and Control Interface
LAS16-200: SCMI - System Management and Control Interface
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-netReceive side scaling (RSS) with eBPF in QEMU and virtio-net
Receive side scaling (RSS) with eBPF in QEMU and virtio-net
 
U-Boot Porting on New Hardware
U-Boot Porting on New HardwareU-Boot Porting on New Hardware
U-Boot Porting on New Hardware
 
Cuda
CudaCuda
Cuda
 
Embedded Rust on ESP2 - Rust Linz
Embedded Rust on ESP2 - Rust LinzEmbedded Rust on ESP2 - Rust Linz
Embedded Rust on ESP2 - Rust Linz
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on Lab
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Cuda tutorial
Cuda tutorialCuda tutorial
Cuda tutorial
 
淺談 Live patching technology
淺談 Live patching technology淺談 Live patching technology
淺談 Live patching technology
 
Understanding DPDK algorithmics
Understanding DPDK algorithmicsUnderstanding DPDK algorithmics
Understanding DPDK algorithmics
 
LCA13: Power State Coordination Interface
LCA13: Power State Coordination InterfaceLCA13: Power State Coordination Interface
LCA13: Power State Coordination Interface
 
What Can Compilers Do for Us?
What Can Compilers Do for Us?What Can Compilers Do for Us?
What Can Compilers Do for Us?
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
DMA Survival Guide
DMA Survival GuideDMA Survival Guide
DMA Survival Guide
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
QEMU - Binary Translation
QEMU - Binary Translation QEMU - Binary Translation
QEMU - Binary Translation
 
linux file sysytem& input and output
linux file sysytem& input and outputlinux file sysytem& input and output
linux file sysytem& input and output
 

Andere mochten auch

美國內閣人事結構 160428
美國內閣人事結構 160428美國內閣人事結構 160428
美國內閣人事結構 160428健正 林
 
地政研究所演講 160311v3.1
地政研究所演講 160311v3.1地政研究所演講 160311v3.1
地政研究所演講 160311v3.1健正 林
 
No Place Left Session One
No Place Left Session OneNo Place Left Session One
No Place Left Session OneGrace Canberra
 
Career & Student Services
Career & Student ServicesCareer & Student Services
Career & Student ServicesNicole Spivey
 
Book of Daniel - Session One
Book of Daniel - Session OneBook of Daniel - Session One
Book of Daniel - Session OneGrace Canberra
 
The Greatest Love Story Ever Told
The Greatest Love Story Ever ToldThe Greatest Love Story Ever Told
The Greatest Love Story Ever ToldGrace Canberra
 
Finding the Kingdom Trail When Trouble Comes
Finding the Kingdom Trail When Trouble ComesFinding the Kingdom Trail When Trouble Comes
Finding the Kingdom Trail When Trouble ComesGrace Canberra
 
地政研究所演講 160311v3.1
地政研究所演講 160311v3.1地政研究所演講 160311v3.1
地政研究所演講 160311v3.1健正 林
 
ABP Introduction
ABP IntroductionABP Introduction
ABP IntroductionJustin Yi
 
Wasze dzieci w naszym życiu
Wasze dzieci w naszym życiuWasze dzieci w naszym życiu
Wasze dzieci w naszym życiuTomasz Murias
 
External and Internal Reference Points
External and Internal Reference PointsExternal and Internal Reference Points
External and Internal Reference PointsGrace Canberra
 
頭前溪兩岸空間發展策略 20160311v4.1
頭前溪兩岸空間發展策略 20160311v4.1頭前溪兩岸空間發展策略 20160311v4.1
頭前溪兩岸空間發展策略 20160311v4.1健正 林
 
How to Stop Panic Attacks at Night
How to Stop Panic Attacks at NightHow to Stop Panic Attacks at Night
How to Stop Panic Attacks at NightHowToDealWithAnxiety
 

Andere mochten auch (20)

美國內閣人事結構 160428
美國內閣人事結構 160428美國內閣人事結構 160428
美國內閣人事結構 160428
 
地政研究所演講 160311v3.1
地政研究所演講 160311v3.1地政研究所演講 160311v3.1
地政研究所演講 160311v3.1
 
No Place Left Session One
No Place Left Session OneNo Place Left Session One
No Place Left Session One
 
Career & Student Services
Career & Student ServicesCareer & Student Services
Career & Student Services
 
Book of Daniel - Session One
Book of Daniel - Session OneBook of Daniel - Session One
Book of Daniel - Session One
 
The Greatest Love Story Ever Told
The Greatest Love Story Ever ToldThe Greatest Love Story Ever Told
The Greatest Love Story Ever Told
 
Finding the Kingdom Trail When Trouble Comes
Finding the Kingdom Trail When Trouble ComesFinding the Kingdom Trail When Trouble Comes
Finding the Kingdom Trail When Trouble Comes
 
地政研究所演講 160311v3.1
地政研究所演講 160311v3.1地政研究所演講 160311v3.1
地政研究所演講 160311v3.1
 
Easter Sunday Message
Easter Sunday  MessageEaster Sunday  Message
Easter Sunday Message
 
ABP Introduction
ABP IntroductionABP Introduction
ABP Introduction
 
Wasze dzieci w naszym życiu
Wasze dzieci w naszym życiuWasze dzieci w naszym życiu
Wasze dzieci w naszym życiu
 
SMTULSA Social Business Conference Sponsorship Kit
SMTULSA Social Business Conference Sponsorship KitSMTULSA Social Business Conference Sponsorship Kit
SMTULSA Social Business Conference Sponsorship Kit
 
External and Internal Reference Points
External and Internal Reference PointsExternal and Internal Reference Points
External and Internal Reference Points
 
The Chistmas Story
The Chistmas StoryThe Chistmas Story
The Chistmas Story
 
頭前溪兩岸空間發展策略 20160311v4.1
頭前溪兩岸空間發展策略 20160311v4.1頭前溪兩岸空間發展策略 20160311v4.1
頭前溪兩岸空間發展策略 20160311v4.1
 
Unbelievable
UnbelievableUnbelievable
Unbelievable
 
How to Stop Panic Attacks at Night
How to Stop Panic Attacks at NightHow to Stop Panic Attacks at Night
How to Stop Panic Attacks at Night
 
Something I Can Use
Something I Can UseSomething I Can Use
Something I Can Use
 
The Road Ahead
The Road AheadThe Road Ahead
The Road Ahead
 
1 John 2:12-17
1 John 2:12-171 John 2:12-17
1 John 2:12-17
 

Ähnlich wie HSA System Architecture Overview (2014-10-31)

HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...
HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Bli...AMD Developer Central
 
ISCA final presentation - Runtime
ISCA final presentation - RuntimeISCA final presentation - Runtime
ISCA final presentation - RuntimeHSA Foundation
 
HSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattHSA-4122, "HSA Queuing Mode," by Ian Bratt
HSA-4122, "HSA Queuing Mode," by Ian BrattAMD Developer Central
 
HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSAemu a Full System Emulator for HSA
HSAemu a Full System Emulator for HSA HSA Foundation
 
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati..."Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...
"Enabling Efficient Heterogeneous Processing Through Coherency," a Presentati...Edge AI and Vision Alliance
 

HSA System Architecture Overview (2014-10-31)

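  • 1. ITRI (Industrial Technology Research Institute), 2014 Heterogeneous System Architecture (HSA) Technology Forum, Industrial Technology Research Institute (2014-10-31)
    HSA System Architecture Overview
    王振傑 (Jay Wang), Embedded Systems and Chip Technology Division, System Architecture Design Department (D200), Information and Communications Research Laboratories (ICL), ccwang.jay@itri.org.tw
    2014-10-31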
  • 2. HSA Platform Model
    In an HSA system, a device that participates in the HSA system is called an HSA agent; if the HSA agent can also run kernels, it is an HSA component.
    (Figure: serial and task-parallel workloads; data-parallel workloads on SIMD units.)
  • 3. Three Eras of Processor Performance
    Single-Core Era (single-thread performance over time). Enabled by: Moore's Observation, voltage scaling, micro-architecture. Constrained by: power, complexity. Programming: Assembly → C/C++ → Java …
    Multi-Core Era (throughput performance over time, i.e. number of processors). Enabled by: Moore's Observation, desire for throughput, 20 years of SMP architecture. Constrained by: power, parallel SW availability, scalability. Programming: pthreads → OpenMP / TBB …
    Heterogeneous Systems Era (modern application performance over time, i.e. data-parallel exploitation). Enabled by: Moore's Observation, abundant data parallelism, power-efficient data-parallel processing (GPUs). Constrained by: programming models, communication overheads. Programming: Shader → CUDA → OpenCL → C++ and Java
    Each chart marks "we are here".
    Source: HSA Introduction, HSA Foundation (Phil Rogers, AMD)
  • 4. HSA Intermediate Language (HSAIL)
    The HSA Foundation members are building a heterogeneous compute software ecosystem on open, royalty-free industry standards and open-source software: the HSA runtimes and compilation tools are based on open-source technologies such as LLVM and GCC (https://github.com/HSAFoundation).
  • 5. HSAIL Programming Model (figure)
  • 6. HSA Runtime Stack (figure)
  • 7. Kernel Execution (figure)
  • 8. HSA Memory Consistency Model (Relaxed Model)
    Reordering of a first operation with a later second operation (yes = reordering permitted).
    Second operation (columns):
      A = ld_rlx
      B = st_rlx
      C = atomic_rlx, atomicNoRet_rlx
      D = atomic_acq, atomicNoRet_acq, fence_acq
      E = atomic_rel, atomicNoRet_rel, fence_rel
      F = atomic_ar, atomicNoRet_ar, fence_ar

    First operation                          A    B    C    D    E    F
    ld_rlx or st_rlx                         yes  yes  yes  yes  no   no
    atomic_rlx, atomicNoRet_rlx              yes  yes  yes  no   no   no
    atomic_acq, atomicNoRet_acq, fence_acq   no   no   no   no   no   no
    atomic_rel, atomicNoRet_rel              yes  yes  no   no   no   no
    fence_rel                                yes  no   no   no   no   no
    atomic_ar, atomicNoRet_ar, fence_ar      no   no   no   no   no   no

    Ordering annotations: relaxed (rlx), acquire (acq), release (rel), acq_rel (ar).
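    The reordering table can be encoded directly as a lookup table, for example inside a simulator or a litmus-test checker. The sketch below is a minimal C model under two assumptions: that "yes" means the pair may be reordered, and that the enum names and may_reorder() (both hypothetical, not part of any HSA API) are an acceptable way to split the table's merged row and column groups into seven classes.

```c
#include <assert.h>
#include <stdbool.h>

/* Operation classes: one per row or column group of the slide's table.
   Rows merge ld_rlx/st_rlx and columns merge atomic_rel/fence_rel, so this
   hypothetical encoding keeps all seven classes separate. */
typedef enum {
    OP_LD_RLX,      /* ld_rlx                                 */
    OP_ST_RLX,      /* st_rlx                                 */
    OP_ATOMIC_RLX,  /* atomic_rlx, atomicNoRet_rlx            */
    OP_ACQ,         /* atomic_acq, atomicNoRet_acq, fence_acq */
    OP_ATOMIC_REL,  /* atomic_rel, atomicNoRet_rel            */
    OP_FENCE_REL,   /* fence_rel                              */
    OP_AR,          /* atomic_ar, atomicNoRet_ar, fence_ar    */
    OP_COUNT
} op_class;

/* kMayReorder[first][second] is true where the slide says "yes". */
static const bool kMayReorder[OP_COUNT][OP_COUNT] = {
    /*                 LD     ST     ARLX   ACQ    AREL   FREL   AR    */
    /* LD_RLX     */ {true,  true,  true,  true,  false, false, false},
    /* ST_RLX     */ {true,  true,  true,  true,  false, false, false},
    /* ATOMIC_RLX */ {true,  true,  true,  false, false, false, false},
    /* ACQ        */ {false, false, false, false, false, false, false},
    /* ATOMIC_REL */ {true,  true,  false, false, false, false, false},
    /* FENCE_REL  */ {true,  false, false, false, false, false, false},
    /* AR         */ {false, false, false, false, false, false, false},
};

/* Returns true if a first operation may be reordered with a later second one. */
bool may_reorder(op_class first, op_class second) {
    return kMayReorder[first][second];
}
```

    For example, may_reorder(OP_LD_RLX, OP_ACQ) is true (a relaxed access may sink below a later acquire), while may_reorder(OP_FENCE_REL, OP_ST_RLX) is false.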
  • 9. System Arch. Requirements
    1. Shared Virtual Memory
    2. Cache Coherency Domains
    3. Flat Addressing
    4. Signaling and Synchronization
    5. Atomic Memory Operations
    6. HSA System Timestamp
    7. User Mode Queuing
    8. Architected Queuing Language (AQL)
    9. HSA Agent Scheduling
    10. HSA Component Context Switching
    11. IEEE754-2008 Floating Point Exceptions
    12. HSA Component Hardware Debug Infrastructure
    13. HSA Platform Topology Discovery
    14. Images
    Source: HSA Platform System Architecture Specification (2014-09-15)
  • 10. Legacy GPU Compute
    Multiple memory pools and address spaces; data copies before and after GPU compute.
    (Figure: host CPUs (HSA agent) use system memory in virtual memory #1; the GPU (HSA agent, HSA component) uses GPU memory in virtual memory #2; numbered steps 1–3 show the copies.) © 2014 Jay Wang
  • 11. Shared Virtual Memory (HSA)
    (Figure: host CPUs (HSA agent) and the GPU (HSA agent, HSA component), each with an MMU using the OS page table, share one virtual address space covering system memory and GPU memory.)
    32-bit HSA system: 32-bit VA. 64-bit HSA system: ≥ 48-bit VA. © 2014 Jay Wang
  • 12. HSA Memory Hierarchy
    Segments: 1) Global 2) Group 3) Private 4) Kernarg 5) Readonly 6) Spill 7) Arg
    (Figure: virtual address range reservation in system memory or device-local memory.)
  • 13. Cache Coherency Domains
    (Figure: caches over system memory, kept coherent.)
  • 14. System Arch. Requirements (same list as slide 9)
  • 15. Signaling and Synchronization
    The required mechanisms for HSAIL and the HSA runtime are:
    - Allocate/destroy an HSA signal
    - Read the current HSA signal value
    - Wait on an HSA signal until it meets a specified condition (with a requested maximum wait duration)
    - Send an HSA signal value
    - Atomically read-modify-write an HSA signal value
    (Figure: a signal handle (hsa_signal_t) refers to a signal value (hsa_signal_value_t, Sig32 or Sig64) plus implementation-defined data; the host CPU (HSA agent) uses HSA runtime APIs and the HSA component uses HSAIL instructions. POSIX analogues: sem_init(), sem_wait(), sem_post(), sem_destroy(); pthread_mutex_init(), pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_destroy().) © 2014 Jay Wang
  • 16. HSA Runtime APIs for Signaling
    HSA runtime APIs (for HSA applications):
    - hsa_signal_create()
    - hsa_signal_destroy()
    - hsa_signal_load_{acquire, relaxed}()
    - hsa_signal_store_{relaxed, release}()
    - hsa_signal_exchange_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_cas_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_add_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_subtract_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_and_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_or_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_xor_{acq_rel, acquire, relaxed, release}()
    - hsa_signal_wait_{acquire, relaxed}()
    Source: HSA Runtime Programmer's Reference Manual (v1.00), 2.4 Signals
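    To make the load/store/read-modify-write/wait semantics concrete, the following is a minimal single-process sketch in C11 atomics. It is a hypothetical stand-in, not the HSA runtime: sim_signal, SIM_EQ and the other names are invented, the signal is reduced to its 64-bit value, and the "maximum wait duration" is approximated by a spin count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an HSA signal: only the 64-bit value is modeled. */
typedef struct {
    _Atomic int64_t value;
} sim_signal;

typedef enum { SIM_EQ, SIM_NE, SIM_LT, SIM_GTE } sim_condition;

void sim_signal_init(sim_signal *s, int64_t initial) {
    atomic_init(&s->value, initial);
}

int64_t sim_signal_load_acquire(sim_signal *s) {
    return atomic_load_explicit(&s->value, memory_order_acquire);
}

void sim_signal_store_release(sim_signal *s, int64_t v) {
    atomic_store_explicit(&s->value, v, memory_order_release);
}

/* Atomic read-modify-write; returns the value before the subtraction. */
int64_t sim_signal_subtract_acq_rel(sim_signal *s, int64_t v) {
    return atomic_fetch_sub_explicit(&s->value, v, memory_order_acq_rel);
}

static bool sim_condition_met(int64_t cur, sim_condition c, int64_t cmp) {
    switch (c) {
    case SIM_EQ:  return cur == cmp;
    case SIM_NE:  return cur != cmp;
    case SIM_LT:  return cur < cmp;
    case SIM_GTE: return cur >= cmp;
    }
    return false;
}

/* Polls until the condition holds or max_spins polls have elapsed
   (a stand-in for a timeout); returns the last observed value. */
int64_t sim_signal_wait_acquire(sim_signal *s, sim_condition cond,
                                int64_t cmp, uint64_t max_spins) {
    int64_t cur = sim_signal_load_acquire(s);
    for (uint64_t i = 0; i < max_spins && !sim_condition_met(cur, cond, cmp); ++i)
        cur = sim_signal_load_acquire(s);
    return cur;
}

/* Single-threaded demo: decrement a "completion" signal, then wait for 0. */
int64_t sim_signal_demo(int64_t initial, int64_t dec) {
    sim_signal s;
    sim_signal_init(&s, initial);
    sim_signal_subtract_acq_rel(&s, dec);
    return sim_signal_wait_acquire(&s, SIM_EQ, 0, 1000);
}
```

    sim_signal_demo(1, 1) returns 0 immediately; sim_signal_demo(5, 1) exhausts its spin budget and returns the last observed value, 4. A real implementation would block or sleep rather than spin.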
  • 17. HSAIL Instructions for Signaling
    Source: HSA Programmer's Reference Manual: HSAIL Virtual ISA and Programming Model, Compiler Writer's Guide, and Object Format (BRIG) (Version 1.0 Provisional), 6.8 Notification (signal) Operation
  • 18. Atomic Memory Operations
    HSA requires the following standard atomic memory operations to be supported by HSA components (other HSA agents only need to support the subset of these operations required by their role in the system):
    - Load from memory.
    - Store to memory.
    - Fetch from memory, apply a logic operation (bitwise AND/OR/XOR) with one additional operand, and store back.
    - Fetch from memory, apply an integer arithmetic operation (add, subtract, increment, decrement, minimum, maximum) with one additional operand, and store back.
    - Exchange a memory location with an operand.
    - Compare-and-swap (CAS): load a memory location and compare it with the first operand; if equal, store the second operand back to the memory location.
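    Each operation class maps onto a C11 <stdatomic.h> primitive on the host side, which may help relate the list to familiar code. The sketch below is an illustration, not HSA code: C11 has no atomic minimum or maximum, so "fetch minimum" is emulated with a CAS loop in the hypothetical helper amo_fetch_min_u32(); increment and decrement are just add and subtract by 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Emulated atomic "fetch minimum": C11 provides no atomic min, so retry a
   CAS until the stored value is already <= operand or the swap succeeds.
   Returns the value observed before the operation. */
uint32_t amo_fetch_min_u32(_Atomic uint32_t *p, uint32_t operand) {
    uint32_t cur = atomic_load(p);
    while (operand < cur && !atomic_compare_exchange_weak(p, &cur, operand)) {
        /* A failed CAS reloads cur; loop and re-check. */
    }
    return cur;
}

/* Applies each operation class from the slide to one word; returns the result. */
uint32_t amo_demo(void) {
    _Atomic uint32_t word;
    atomic_init(&word, 0x0Fu);

    uint32_t v = atomic_load(&word);        /* load:            0x0F */
    atomic_store(&word, v + 1u);            /* store:           0x10 */
    atomic_fetch_or(&word, 0x01u);          /* OR:              0x11 */
    atomic_fetch_and(&word, 0xF1u);         /* AND:             0x11 */
    atomic_fetch_xor(&word, 0x10u);         /* XOR:             0x01 */
    atomic_fetch_add(&word, 9u);            /* add:             0x0A */
    atomic_fetch_sub(&word, 2u);            /* subtract:        0x08 */
    (void)atomic_exchange(&word, 100u);     /* exchange:        100  */
    uint32_t expected = 100u;               /* CAS: equal, so store 42 */
    atomic_compare_exchange_strong(&word, &expected, 42u);
    amo_fetch_min_u32(&word, 7u);           /* minimum:         7    */
    return atomic_load(&word);
}
```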
  • 19. HSA System Timestamp
    The HSA system provides a low-overhead mechanism for measuring the passage of time. A system timestamp is required that can be read from HSAIL or through the HSA runtime. The system timestamp frequency can also be queried through the HSA runtime.
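    Because the timestamp is a tick counter with a queryable frequency, turning two readings into elapsed time is a small piece of arithmetic. This C sketch assumes the start/end ticks and the frequency (in Hz) have already been obtained from HSAIL or the runtime; ticks_to_ns() is a hypothetical helper, not a runtime function.

```c
#include <assert.h>
#include <stdint.h>

/* Converts an elapsed tick count into nanoseconds given the timestamp
   frequency in Hz. Whole seconds and the remainder are handled separately
   so that ticks * 1e9 cannot overflow 64 bits on long intervals (the
   remainder product stays safe for frequencies below about 18 GHz). */
uint64_t ticks_to_ns(uint64_t start, uint64_t end, uint64_t freq_hz) {
    uint64_t ticks = end - start;                 /* unsigned, wrap-safe */
    uint64_t secs = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;
    return secs * 1000000000ull + (rem * 1000000000ull) / freq_hz;
}
```

    For example, 100,000,000 ticks at 100 MHz is 1,000,000,000 ns (one second), and 250 ticks at 1 GHz is 250 ns.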
  • 20. System Arch. Requirements (same list as slide 9)
  • 21. User Mode Queuing
    Multiple user-level command queues; runtime-allocated; Architected Queuing Language (AQL).
  • 22. HSA Packet Processor (figure)
  • 23. User Mode Queue Operations
    HSA runtime APIs (for HSA applications):
    - hsa_queue_create()
    - hsa_queue_destroy()
    - hsa_queue_inactivate()
    - hsa_queue_load_write_index_{acquire, relaxed}()
    - hsa_queue_store_write_index_{relaxed, release}()
    - hsa_queue_cas_write_index_{acq_rel, acquire, relaxed, release}()
    - hsa_queue_add_write_index_{acq_rel, acquire, relaxed, release}()
    - hsa_queue_load_read_index_{acquire, relaxed}()
    - hsa_queue_store_read_index_{relaxed, release}()
    HSAIL instructions (for HSA components):
    - agentcount_u32 dest
    - agentid_u32 dest
    - ldk_uLength dest, kernelName
    - queueid_u32 dest
    - queueptr_uLength dest
    - ldqueuewriteindex_segment_order_u64 dest, address
    - stqueuewriteindex_segment_order_u64 address, src
    - casqueuewriteindex_segment_order_u64 dest, address, src0, src1
    - addqueuewriteindex_segment_order_u64 dest, address, src
    - ldqueuereadindex_segment_order_u64 dest, address
    - stqueuereadindex_segment_order_u64 address, src
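    The write-index/read-index pair above describes a user-mode ring buffer: producers reserve a slot by advancing the write index, and the packet processor consumes packets and advances the read index. The C11 sketch below is a hypothetical model of that pattern, not the HSA runtime: the 8-slot size, the 64-bit payload standing in for an AQL packet, and the per-slot ready flag (published last with release) are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SIM_QUEUE_SIZE 8u                 /* must be a power of two */
#define SIM_QUEUE_MASK (SIM_QUEUE_SIZE - 1u)

typedef struct {
    _Atomic uint32_t ready;               /* 0 = empty/invalid, 1 = published */
    uint64_t payload;                     /* stand-in for an AQL packet body */
} sim_slot;

typedef struct {
    sim_slot ring[SIM_QUEUE_SIZE];
    _Atomic uint64_t write_index;         /* advanced by producers */
    _Atomic uint64_t read_index;          /* advanced by the consumer */
} sim_queue;

void sim_queue_init(sim_queue *q) {
    for (unsigned i = 0; i < SIM_QUEUE_SIZE; ++i) atomic_init(&q->ring[i].ready, 0u);
    atomic_init(&q->write_index, 0u);
    atomic_init(&q->read_index, 0u);
}

/* Producer: reserve a slot with a CAS on the write index, fill it, then
   publish it with a release store. Returns 0 on success, -1 if full. */
int sim_queue_submit(sim_queue *q, uint64_t payload) {
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    do {
        uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
        if (w - r >= SIM_QUEUE_SIZE) return -1;
    } while (!atomic_compare_exchange_weak_explicit(&q->write_index, &w, w + 1,
                 memory_order_acq_rel, memory_order_relaxed));
    sim_slot *slot = &q->ring[w & SIM_QUEUE_MASK];
    slot->payload = payload;
    atomic_store_explicit(&slot->ready, 1u, memory_order_release);
    return 0;
}

/* Single consumer (the packet processor's role): take the packet at the
   read index if it is published. Returns 0 and the payload, or -1. */
int sim_queue_consume(sim_queue *q, uint64_t *out) {
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_relaxed);
    sim_slot *slot = &q->ring[r & SIM_QUEUE_MASK];
    if (!atomic_load_explicit(&slot->ready, memory_order_acquire)) return -1;
    *out = slot->payload;
    atomic_store_explicit(&slot->ready, 0u, memory_order_relaxed);
    atomic_store_explicit(&q->read_index, r + 1, memory_order_release);
    return 0;
}

/* Single-threaded demo: submit payloads 1..n, drain, return their sum. */
uint64_t sim_queue_demo(unsigned n) {
    sim_queue q;
    sim_queue_init(&q);
    for (unsigned i = 1; i <= n; ++i) (void)sim_queue_submit(&q, i);
    uint64_t sum = 0, v;
    while (sim_queue_consume(&q, &v) == 0) sum += v;
    return sum;
}
```

    With n = 10 only the first eight submissions fit, so sim_queue_demo(10) returns 1 + 2 + … + 8 = 36.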
  • 24. AQL Packet Types
    Completion signal: HSA signaling object handle used to indicate completion of the job.
    Header fields: format (8-bit), barrier (1-bit), acquireFenceScope (2-bit), releaseFenceScope (2-bit), reserved (3-bit).
    Format values: 0 Always_Reserved, 1 Invalid, 2 Kernel_Dispatch, 3 Barrier_AND, 4 Agent_Dispatch, 5 Barrier_OR
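    The five header fields add up to exactly 16 bits, so a header can be packed and unpacked with shifts and masks. The C sketch below assumes the fields are laid out least-significant-bit first in the order the slide lists them (format in bits 0–7, barrier in bit 8, acquireFenceScope in bits 9–10, releaseFenceScope in bits 11–12); check the specification before relying on that layout. The helper names are hypothetical, and the meaning of individual scope values is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Format codes as listed on the slide. */
enum aql_format {
    AQL_FORMAT_ALWAYS_RESERVED = 0,
    AQL_FORMAT_INVALID         = 1,
    AQL_FORMAT_KERNEL_DISPATCH = 2,
    AQL_FORMAT_BARRIER_AND     = 3,
    AQL_FORMAT_AGENT_DISPATCH  = 4,
    AQL_FORMAT_BARRIER_OR      = 5
};

/* Assumed bit positions: LSB first, in the slide's field order. */
#define HDR_FORMAT_SHIFT    0
#define HDR_BARRIER_SHIFT   8
#define HDR_ACQ_SCOPE_SHIFT 9
#define HDR_REL_SCOPE_SHIFT 11

uint16_t aql_header_pack(unsigned format, unsigned barrier,
                         unsigned acq_scope, unsigned rel_scope) {
    return (uint16_t)(((format    & 0xFFu) << HDR_FORMAT_SHIFT)    |
                      ((barrier   & 0x1u)  << HDR_BARRIER_SHIFT)   |
                      ((acq_scope & 0x3u)  << HDR_ACQ_SCOPE_SHIFT) |
                      ((rel_scope & 0x3u)  << HDR_REL_SCOPE_SHIFT));
    /* bits 13-15 (reserved) are left as zero */
}

unsigned aql_header_format(uint16_t h)    { return (h >> HDR_FORMAT_SHIFT) & 0xFFu; }
unsigned aql_header_barrier(uint16_t h)   { return (h >> HDR_BARRIER_SHIFT) & 0x1u; }
unsigned aql_header_acq_scope(uint16_t h) { return (h >> HDR_ACQ_SCOPE_SHIFT) & 0x3u; }
unsigned aql_header_rel_scope(uint16_t h) { return (h >> HDR_REL_SCOPE_SHIFT) & 0x3u; }
```

    Under this layout, a Kernel_Dispatch header with the barrier bit set and both scopes equal to 2 packs to 0x1502.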
  • 25. Kernel Dispatch Packet
    Fields: work-group size, grid size, segment size, pointer to the kernel, pointer to the arguments.
  • 26. Agent Dispatch Packet
    - Type: the function to be performed by the destination HSA agent. The type value is split into the following ranges: 0x0000–0x7FFF reserved; 0x8000–0xFFFF user-registered function.
    - Arguments: 64-bit direct or indirect arguments.
    - Return address: pointer to the location where the function return value(s) are stored.
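    On the receiving agent, the type ranges suggest a simple dispatch step: reject reserved values, look up a user-registered function, call it with the packet's arguments, and write the result through the return-address pointer. The C sketch below is hypothetical throughout (the registry, its size of four entries, the two-argument signature, and all function names are assumptions), and it only illustrates the range check and the return-through-pointer convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a user-registered agent function. */
typedef uint64_t (*agent_fn)(uint64_t arg0, uint64_t arg1);

#define MAX_USER_FNS 4u
static agent_fn user_fns[MAX_USER_FNS];   /* type 0x8000 + i -> user_fns[i] */

/* 0x0000-0x7FFF is reserved; 0x8000-0xFFFF is user-registered. */
bool agent_dispatch_type_is_user(uint16_t type) {
    return (type & 0x8000u) != 0;
}

int register_agent_fn(uint16_t type, agent_fn fn) {
    if (!agent_dispatch_type_is_user(type)) return -1;
    uint16_t idx = (uint16_t)(type - 0x8000u);
    if (idx >= MAX_USER_FNS) return -1;
    user_fns[idx] = fn;
    return 0;
}

/* Executes a packet: the result goes through return_address, as the
   packet's return-address field describes. Returns 0 or -1. */
int run_agent_dispatch(uint16_t type, uint64_t a0, uint64_t a1,
                       uint64_t *return_address) {
    if (!agent_dispatch_type_is_user(type)) return -1;
    uint16_t idx = (uint16_t)(type - 0x8000u);
    if (idx >= MAX_USER_FNS || user_fns[idx] == NULL) return -1;
    *return_address = user_fns[idx](a0, a1);
    return 0;
}

/* Example user function. */
uint64_t add_fn(uint64_t a, uint64_t b) { return a + b; }

/* Demo: register add_fn as type 0x8001 and dispatch (2, 3). */
uint64_t agent_dispatch_demo(void) {
    uint64_t ret = 0;
    if (register_agent_fn(0x8001u, add_fn) != 0) return UINT64_MAX;
    if (run_agent_dispatch(0x8001u, 2u, 3u, &ret) != 0) return UINT64_MAX;
    return ret;
}
```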
  • 27. Barrier-AND / Barrier-OR Packet
    The Barrier packet defines dependencies for the HSA packet processor to monitor. The HSA packet processor will not launch any further packets until the Barrier-AND / Barrier-OR packet is complete.
    Dependency signals: handles for dependent signaling objects to be evaluated by the packet processor.
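    The difference between the two barrier packets is the combining rule over the dependency signals. The C sketch below assumes, as in the HSA 1.0 runtime specification, that a dependency counts as satisfied once its signal value reaches 0: Barrier-AND waits for all dependencies, Barrier-OR for any one. Null-handle rules are not modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Barrier-AND: satisfied when every dependency signal equals 0. */
bool barrier_and_satisfied(_Atomic int64_t *const deps[], size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (atomic_load_explicit(deps[i], memory_order_acquire) != 0) return false;
    return true;
}

/* Barrier-OR: satisfied when at least one dependency signal equals 0. */
bool barrier_or_satisfied(_Atomic int64_t *const deps[], size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (atomic_load_explicit(deps[i], memory_order_acquire) == 0) return true;
    return false;
}

/* Demo with two dependency signals holding values a and b.
   Returns (AND satisfied ? 2 : 0) | (OR satisfied ? 1 : 0). */
unsigned barrier_demo(int64_t a, int64_t b) {
    _Atomic int64_t s0, s1;
    atomic_init(&s0, a);
    atomic_init(&s1, b);
    _Atomic int64_t *const deps[] = {&s0, &s1};
    unsigned r = 0;
    if (barrier_and_satisfied(deps, 2)) r |= 2u;
    if (barrier_or_satisfied(deps, 2)) r |= 1u;
    return r;
}
```

    With one of two producers finished (signals 0 and 1), only Barrier-OR is satisfied, so barrier_demo(0, 1) returns 1; once both reach 0 it returns 3.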
  • 28. Packet Process Flow (Launch Phase → Active Phase → Completion Phase)
    Launch phase: all preceding packets in the queue must have completed their launch phase. If the barrier bit in the packet header is set, then all preceding packets in the queue must have completed. In the launch phase, an acquire memory fence is applied before the packet enters the active phase.
    Active phase: Kernel Dispatch packets and Agent Dispatch packets execute on the HSA component/agent, and the active phase ends when the task completes. Barrier-AND and Barrier-OR packets remain in the active phase until their condition is met.
    Completion phase: the first step is the release memory fence. After the release fence completes, the signal specified by the completionSignal field of the AQL packet is signaled with an atomic decrement.
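    The launch and completion rules can be expressed as a small state model. In this hypothetical C11 sketch, can_launch() checks the two launch conditions, launch() issues an acquire fence before entering the active phase, and complete() issues a release fence followed by a relaxed atomic decrement of the completion signal (the C11 fence-then-atomic pattern that gives the decrement release semantics). The sim_pkt layout and all names are assumptions, not AQL structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PH_QUEUED, PH_ACTIVE, PH_DONE } phase; /* ACTIVE = launch finished */

typedef struct {
    bool barrier;                  /* barrier bit from the packet header */
    phase ph;
    _Atomic int64_t *completion;   /* completionSignal stand-in; may be NULL */
} sim_pkt;

/* Launch rule: every predecessor has finished launching; with the barrier
   bit set, every predecessor must also have completed. */
bool can_launch(const sim_pkt *pkts, size_t i) {
    for (size_t j = 0; j < i; ++j) {
        if (pkts[j].ph == PH_QUEUED) return false;
        if (pkts[i].barrier && pkts[j].ph != PH_DONE) return false;
    }
    return true;
}

void launch(sim_pkt *p) {
    atomic_thread_fence(memory_order_acquire);  /* acquire fence, then active */
    p->ph = PH_ACTIVE;
}

void complete(sim_pkt *p) {
    atomic_thread_fence(memory_order_release);  /* release fence first */
    if (p->completion)                          /* then decrement the signal */
        atomic_fetch_sub_explicit(p->completion, 1, memory_order_relaxed);
    p->ph = PH_DONE;
}

/* Two kernels share one completion signal; a third packet has the barrier
   bit set. Bits: 1 = packet 1 launched while packet 0 was active,
   2 = packet 2 could launch early, 4 = packet 2 could launch after both
   completed, 8 = completion signal reached 0. */
unsigned barrier_bit_demo(void) {
    _Atomic int64_t sig;
    atomic_init(&sig, 2);
    sim_pkt q[3] = {
        {false, PH_QUEUED, &sig},
        {false, PH_QUEUED, &sig},
        {true,  PH_QUEUED, NULL},
    };
    unsigned bits = 0;
    launch(&q[0]);
    if (can_launch(q, 1)) { bits |= 1u; launch(&q[1]); }
    if (can_launch(q, 2)) bits |= 2u;           /* expected: blocked */
    complete(&q[0]);
    complete(&q[1]);
    if (can_launch(q, 2)) bits |= 4u;           /* expected: allowed */
    if (atomic_load(&sig) == 0) bits |= 8u;
    return bits;
}
```

    The demo returns 13 (bits 1, 4 and 8): the barrier packet is held back while its predecessors are active and may launch once both have completed and decremented the shared signal to 0.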
  • 29. Barrier-bit Example
    (Figure: AQL packets with barrier bit = 1 moving from enqueue to dequeue through the launch, active, and completion phases, with their completion signals.) © 2014 Jay Wang
  • 30. Barrier-AND Packet Example (figure)
  • 31. System Arch. Requirements (same list as slide 9)
  • 32. HSA Agent Scheduling
    (Figure: applications #1–#3 submit AQL packets (Agent Dispatch or Barrier-AND/OR packets) to agent dispatch queues; HSA agent scheduling serves these queues and a non-HSA task pool on the HSA agents. Scheduling triggers: task execution complete, AQL packet submission, barrier packet complete.) © 2014 Jay Wang
  • 33. HSA Component Context Switching
    (Figure: HSA agent scheduling over agent dispatch queues from applications 1–3 and a non-HSA task pool; context switching of kernel programs and work-groups (WG) on the compute units (CU) of an HSA component.)
    1. Switch (required)
    2. Preempt (required, as soon as possible)
    3. Terminate and context reset (terminated as fast as possible)
    © 2014 Jay Wang
  • 34. System Arch. Requirements (same list as slide 9)
  • 35. FP Exception Reporting
    An HSA component shall report certain defined exceptions related to the execution of HSAIL code to the HSA runtime. Exception policies: DETECT and BREAK.
    (Figure: a grid of work-groups, each split into wavefronts of N = 1–64 lanes (work-items) executing SIMD-style on a compute unit of the HSA component; per-wavefront status bits and the exception policy feed an exception module, which signals the exception handler in the HSA runtime on the host CPU (HSA agent).)
    HSAIL instructions: cleardetectexcept_u32, getdetectexcept_u32, setdetectexcept_u32
    IEEE754-2008 exception codes: 0 invalid operation, 1 divide-by-zero, 2 overflow, 3 underflow, 4 inexact
    © 2014 Jay Wang
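    The five exception codes fit in a 5-bit status mask. The C sketch below is a hypothetical model of the DETECT and BREAK policies, assuming bit i of the mask corresponds to exception code i: raised exceptions always set a sticky status bit (DETECT), and an exception whose BREAK bit is enabled additionally reports that execution would stop and the runtime would be signaled. fpx_get/fpx_set/fpx_clear are named after the HSAIL getdetectexcept/setdetectexcept/cleardetectexcept instructions but do not reproduce their exact semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Exception codes from the slide; bit i of a mask is assumed to be code i. */
enum fp_exc {
    FPX_INVALID = 0, FPX_DIVBYZERO = 1, FPX_OVERFLOW = 2,
    FPX_UNDERFLOW = 3, FPX_INEXACT = 4
};
#define FPX_ALL 0x1Fu

typedef struct {
    uint32_t detect_status;   /* sticky DETECT bits */
    uint32_t break_enable;    /* exceptions handled with the BREAK policy */
} fp_exc_state;

uint32_t fpx_get(const fp_exc_state *s) { return s->detect_status; }
void fpx_set(fp_exc_state *s, uint32_t mask) { s->detect_status |= (mask & FPX_ALL); }
void fpx_clear(fp_exc_state *s, uint32_t mask) { s->detect_status &= ~(mask & FPX_ALL); }

/* Records an exception; returns 1 if BREAK is enabled for it (the wavefront
   would stop and the runtime would be signaled), else 0. */
int fpx_raise(fp_exc_state *s, enum fp_exc e) {
    uint32_t bit = 1u << e;
    s->detect_status |= bit;
    return (s->break_enable & bit) != 0;
}

/* Demo: BREAK enabled only for overflow. Returns the status bits in the low
   byte and the BREAK result of the last raise in bit 8. */
uint32_t fpx_demo(void) {
    fp_exc_state s = {0u, 1u << FPX_OVERFLOW};
    fpx_raise(&s, FPX_DIVBYZERO);           /* status = 0x02 */
    fpx_raise(&s, FPX_INEXACT);             /* status = 0x12 */
    fpx_clear(&s, 1u << FPX_INEXACT);       /* status = 0x02 */
    int brk = fpx_raise(&s, FPX_OVERFLOW);  /* status = 0x06, BREAK */
    return fpx_get(&s) | ((uint32_t)brk << 8);
}
```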
  • 36. Debug Infrastructure
    The HSA component shall provide mechanisms that allow system software and selected application software (for example, debuggers and profilers) to set breakpoints and collect throughput information for profiling.
    (Figure: a debug module on the HSA component, supporting instruction breakpoints, conditional breakpoints, and memory breakpoints, accessed by debuggers and profilers on the host CPU (HSA agent) through the HSA component debug interface; work-groups and wavefronts of 64 lanes execute SIMD-style on a compute unit.) © 2014 Jay Wang
  • 37. System Arch. Requirements (same list as slide 9)
  • 38. Execution Environment
    Output of an OpenCL platform/device query (OpenCL APIs):
      You have 2 OpenCL platform(s)
      ----------------------------------------------
      Platform[0].Name = NVIDIA CUDA
      Platform[0].Vendor = NVIDIA Corporation
      Platform[0].Version = OpenCL 1.1 CUDA 4.2.1
      Platform[0].Profile = FULL_PROFILE
      ----------------------------------------------
      Platform[1].Name = Intel(R) OpenCL
      Platform[1].Vendor = Intel(R) Corporation
      Platform[1].Version = OpenCL 1.2
      Platform[1].Profile = FULL_PROFILE
      ----------------------------------------------
      Platform[0] has 1 device(s)
      ----------------------------------------------
      Device[0].Type = CL_DEVICE_TYPE_GPU
      Device[0].Name = GeForce GT 625
      Device[0].Vendor = NVIDIA Corporation
      Device[0].Version = OpenCL 1.1 CUDA
      Device[0].DriverVersion = 320.49
      Device[0].Profile = FULL_PROFILE
      Device[0].OpenCL_C = OpenCL C 1.1
      Device[0].MaxComputeUnits = 1
      Device[0].MaxWiDimensions = 3
      Device[0].MaxWiSize = (1024,1024,64)
      Device[0].MaxWgSize = 1024
      Device[0].MaxClkFrequency = 1747 MHz
      Device[0].AddrSpaceSize = 32 bits
      Platform[1] has 1 device(s)
      ----------------------------------------------
      Device[0].Type = CL_DEVICE_TYPE_CPU
      Device[0].Name = Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz
      Device[0].Vendor = Intel(R) Corporation
      Device[0].Version = OpenCL 1.2 (Build 80752)
      Device[0].DriverVersion = 3.0.1.15216
      Device[0].Profile = FULL_PROFILE
      Device[0].OpenCL_C = OpenCL C 1.2
      Device[0].MaxComputeUnits = 4
      Device[0].MaxWiDimensions = 3
      Device[0].MaxWiSize = (1024,1024,1024)
      Device[0].MaxWgSize = 1024
      Device[0].MaxClkFrequency = 3100 MHz
      Device[0].AddrSpaceSize = 32 bits
  • 39. HSA Platform Topology Discovery
    HSA platform resources: agents, memory, compute properties, caches, and I/O.
    (Figure: an example HSA platform with nodes 0–4. HSA APUs (CPU cores plus GPU H-CUs behind an HSA MMU) use system memory that is cacheable/coherent or non-cacheable/non-coherent and are joined by a socket interconnect; HSA discrete GPUs with coherent and non-coherent device-local memory sit on optional add-in boards attached via PCIe bridges. Firmware: SBIOS/UEFI on the host side, VBIOS/UEFI GOP on the add-in boards.)
  • 40. System Arch. Requirements (same list as slide 9)
  • 41. Images
    A graphics feature that can sometimes be useful in data-parallel computing:
    - Used to store one-, two-, or three-dimensional images in predefined image formats.
    - Image memory is accessed through a special kind of memory access.
    - Dedicated hardware speeds up image operations.
    Reference: The OpenCL™ Specification Version 2.0, 5.3 Image Objects, http://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
  • 42. Summary
    Programming model issues:
    - HSA Intermediate Language (HSAIL) + HSA runtime
    - Architected Queuing Language (AQL) + signaling
    - Debug infrastructure
    Communication overhead issues:
    - Cache-coherent shared virtual memory (CC-SVM)
    - Architected Queuing Language (AQL) for user-mode queuing
    - Hardware-assisted signaling and atomic operations for synchronization
    (Figure: CPUs, GPU, DSP, … sharing unified coherent memory, programmed through HSAIL, the HSA runtime, and AQL.) © 2014 Jay Wang