3. Introduction
◻ HSA: Heterogeneous System Architecture
◻ Promising future:
◻ Arm processor vendors
◻ GPU vendors: AMD, Imagination
◻ Fully utilizes computation resources
◻ Supporting HSA may connect our system to
a major application base
4. Goal of HSA
◻ Remove programmability barriers
◻ The memory-space barrier
◻ Access latency among devices
◻ Backward compatible
◻ Utilize existing programming models
6. Abstract
◻ Two kinds of compute units
◻ LCU: Latency Compute Unit (e.g. CPU)
◻ TCU: Throughput Compute Unit (e.g. GPU)
◻ Merged memory space
7. Memory Management (1/2)
◻ Shared page table
◻ Memory is shared by all devices
◻ No more host-to-device copies (or vice versa)
◻ Supports pointer-based data structures
(e.g. linked lists)
◻ Page faulting
◻ Virtual memory space for all devices
◻ e.g. the GPU can now use memory as if it
had the whole memory space
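The pointer-sharing point above can be sketched as follows. This is a hypothetical illustration, not HSA runtime code: the "device kernel" is an ordinary function standing in for TCU code, to show that with a shared page table the device can walk the host's linked list through the very same pointers, with no copy and no pointer rewriting.

```cpp
// Illustrative sketch only: with HSA's shared page table, host and device
// see the same virtual addresses, so pointer-based structures need no
// host-to-device copy. Node/build_list/device_sum are made-up names.
struct Node {
    int value;
    Node* next;
};

// Host side: build a linked list in ordinary (shared) memory.
Node* build_list(const int* values, int count) {
    Node* head = nullptr;
    for (int i = count - 1; i >= 0; --i)
        head = new Node{values[i], head};
    return head;
}

// "Device" side: traverse the very same pointers -- no serialization,
// no address translation, because the address space is unified.
int device_sum(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n; n = n->next)
        sum += n->value;
    return sum;
}
```

Without a shared address space, the host would have to flatten the list into a buffer, copy it, and fix up every `next` pointer on the device side.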
8. Memory Management (2/2)
◻ Coherent memory regions
◻ The memory is coherent
◻ Shared among all devices (CUs)
◻ Unified address space
◻ Memory type is determined by the address
◻ Private / local / global memory is decided by
the memory region
◻ No special instruction is required
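The "memory type decided by address" idea can be sketched as below. This is a minimal illustration, assuming made-up address ranges; real region layouts are platform defined. The point is that one flat address space is carved into regions, so an ordinary load or store suffices and the region is inferred from the address alone.

```cpp
#include <cstdint>

// Illustrative sketch: segments carved out of one flat virtual address
// space. No special instruction is needed -- the region is inferred from
// the address. The ranges below are invented for illustration.
enum class Region { Private, Group, Global };

struct Range {
    uint64_t base, size;
    Region region;
};

const Range kRanges[] = {
    {0x10000000ULL, 0x10000000ULL, Region::Private},  // work-item private
    {0x20000000ULL, 0x10000000ULL, Region::Group},    // work-group local
};

Region classify(uint64_t addr) {
    for (const Range& r : kRanges)
        if (addr >= r.base && addr - r.base < r.size)
            return r.region;
    return Region::Global;  // everything else is global memory
}
```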
9. User-Level Command Queue
◻ Queues for communication
◻ User to device
◻ Device to device
◻ The HSA runtime handles the queues
◻ Allocation & destruction
◻ One per application
◻ Vendor-dependent implementation
◻ Direct access to devices
◻ No OS syscalls
◻ No task management by the OS
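A user-level command queue can be sketched as a ring buffer that the application writes and the device reads, with no syscall on the fast path. This is a single-producer/single-consumer sketch under invented names (`Packet`, `UserQueue`); real HSA queue and packet layouts are vendor dependent, as noted above.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical command packet -- real HSA packet formats differ.
struct Packet {
    uint32_t kernel_id;   // which kernel to launch
    uint32_t arg_offset;  // where its arguments live
};

// Fixed-size ring buffer: the application bumps the write index, the
// device bumps the read index. No OS involvement on either side.
template <std::size_t N>
class UserQueue {
    Packet ring_[N];
    std::atomic<std::size_t> write_{0};  // advanced by the application
    std::atomic<std::size_t> read_{0};   // advanced by the device
public:
    bool submit(const Packet& p) {  // application side
        std::size_t w = write_.load(std::memory_order_relaxed);
        if (w - read_.load(std::memory_order_acquire) == N)
            return false;  // queue full
        ring_[w % N] = p;
        write_.store(w + 1, std::memory_order_release);
        return true;
    }
    bool poll(Packet& out) {  // device side
        std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return false;  // queue empty
        out = ring_[r % N];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }
};
```

The release/acquire pairing makes the packet contents visible to the device before the updated write index, which is what lets user code and the device share the queue without locks.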
10. Hardware Scheduler (1/3)
◻ No real scheduling on the TCU (GPU)
◻ No task scheduling
◻ No task preemption
◻ Current implementation
◻ Executing without a lock:
◻ All threads execute
◻ Multiple tasks produce incorrect results
11. Hardware Scheduler (2/3)
◻ Current implementation
◻ Executing with a lock:
◻ An exception in the code may leave the
resource locked up
◻ Long-running tasks prevent others from
executing
◻ We may fail to finish critical jobs
12. Hardware Scheduler (3/3)
The HSA runtime guarantees:
◻ Bounded execution time
◻ Any process ceases in a reasonable time
◻ Fast switching among applications
◻ Hardware support saves switching time
◻ Application level parallelism
13. HSAIL (1/2)
◻ HSA Intermediate Language
◻ The intermediate language for TCUs
◻ Similar to NVIDIA's "PTX" code
◻ No graphics-specific instructions
◻ Further translated to the HW ISA (by the Finalizer)
◻ The abstract platform is similar to OpenCL
◻ Work item (thread)
◻ Work group (block)
◻ NDRange (grid)
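The work item / work group / NDRange hierarchy above implies the usual index arithmetic, sketched here for the 1-D case (the names are OpenCL-style descriptions, not HSAIL syntax):

```cpp
// A work item's global id combines its work group's id with its
// position (local id) inside that group.
int global_id(int workgroup_id, int workgroup_size, int local_id) {
    return workgroup_id * workgroup_size + local_id;
}

// Number of work groups needed to cover an NDRange of grid_size items
// (rounding up, so the last group may be partially filled).
int num_groups(int grid_size, int workgroup_size) {
    return (grid_size + workgroup_size - 1) / workgroup_size;
}
```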
16. Virtual Memory Address
◻ All types of memory use the same address space
◻ Memory access behavior
◻ Not all regions are accessible by all devices
◻ The OS kernel should not be accessible
◻ Mapping to a region in kernel space is still
possible
◻ Accessing an identical address may give
different values
◻ Work-item private memory
◻ Work-group local memory
◻ Accessing another item's / group's memory is
not valid
17. Memory Region (1/2)
◻ Global
◻ The memory shared by all LCUs & TCUs
◻ Accessible by any work item / group
◻ Group
◻ The memory shared by all work items in the
same group
◻ Private
◻ The memory visible only to a single work item
18. Memory Region (2/2)
◻ Kernarg
◻ The memory for kernel arguments
◻ A kernel is the code fragment we ask a
device to run
◻ Readonly
◻ A read-only type of global memory
◻ Spill
◻ Memory for register spills
◻ Arg
◻ Memory for function-call arguments
19. Memory Consistency
◻ LCU
◻ LCU maintains its own consistency
◻ Shares global memory
◻ Work item
◻ Memory operations to the same address by a
single work item are in order
◻ Memory operations to different addresses
may be reordered
◻ Beyond that, nothing is guaranteed
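Because only same-address, same-work-item ordering is guaranteed, handing data from one work item to another needs explicit synchronization. The sketch below illustrates the idea with C++ atomics rather than HSAIL sync instructions: a release/acquire pair on a flag orders the plain data store before its publication.

```cpp
#include <atomic>

int data = 0;                    // plain shared variable
std::atomic<bool> ready{false};  // publication flag

// Producer: the release store orders the write to `data` before the
// flag becomes visible. Without it, the two stores could be reordered.
void producer() {
    data = 42;
    ready.store(true, std::memory_order_release);
}

// Consumer: the acquire load guarantees that once the flag is seen,
// the producer's earlier write to `data` is visible too.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {}
    return data;
}
```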
22. Compilation
◻ Frontend
◻ Compiles source to LLVM IR
◻ No data dependency
◻ Backend
◻ Converts the IR to HSAIL
◻ Optimization happens here
◻ Binary format
◻ ELF format
◻ An embedded container for HSAIL (BRIG)
23. Runtime
◻ HSA runtime
◻ Issues tasks to devices via the queue
protocol
◻ Device
◻ Converts HSAIL to the native ISA with the
Finalizer
24. HSAIL Program Features
◻ Backward Compatible
◻ A system without HSA support can still
run the executable
◻ Function Invocation
◻ LCU functions may call LCU functions
◻ TCU functions may call TCU functions, with
Finalizer support
◻ LCU-to-TCU / TCU-to-LCU calls are supported
via queues
◻ C++ compatible
25. Conclusion
◻ HSA is an open standard layer between
software and hardware
◻ The cardinal feature of HSA is the unified
virtual memory space
◻ It does not replace current programming
frameworks; no new language is required
26. Reference
◻ Heterogeneous System Architecture: A
Technical Review
◻ HSA Programmer’s Reference Manual
◻ HSAIL: Write-Once-Run-Everywhere for
Heterogeneous Systems