3. Introduction
◻ HSA: Heterogeneous System Architecture
◻ Promising future:
◻ Arm processor vendors
◻ GPU vendors: AMD, Imagination
◻ Fully utilizes computation resources
◻ Supporting HSA may connect our system to
a major application base
4. Goal of HSA
◻ Remove programmability barriers
◻ The memory-space barrier
◻ Access latency among devices
◻ Backward compatible
◻ Utilize existing programming models
6. Abstract
◻ Two kinds of compute units
◻ LCU: Latency Compute Unit (e.g. CPU)
◻ TCU: Throughput Compute Unit (e.g. GPU)
◻ Merged memory space
7. Memory Management (1/2)
◻ Shared page table
◻ Memory is shared by all devices
◻ No more host-to-device copies (or vice versa)
◻ Supports pointer-based data structures
(e.g. linked lists)
◻ Page faulting
◻ Virtual memory space for all devices
◻ e.g. the GPU can now use memory as if it
had the whole memory space
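The pointer-sharing point above can be sketched as follows. This is a hypothetical illustration, not HSA runtime code: the "device kernel" is an ordinary function standing in for TCU code, to show that with a shared page table the device can walk the host's linked list through the very same pointers, with no copy and no pointer rewriting.

```cpp
// Illustrative sketch only: with HSA's shared page table, host and device
// see the same virtual addresses, so pointer-based structures need no
// host-to-device copy. Node/build_list/device_sum are made-up names.
struct Node {
    int value;
    Node* next;
};

// Host side: build a linked list in ordinary (shared) memory.
Node* build_list(const int* values, int count) {
    Node* head = nullptr;
    for (int i = count - 1; i >= 0; --i)
        head = new Node{values[i], head};
    return head;
}

// "Device" side: traverse the very same pointers -- no serialization,
// no address translation, because the address space is unified.
int device_sum(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n; n = n->next)
        sum += n->value;
    return sum;
}
```

Without a shared address space, the host would have to flatten the list into a buffer, copy it, and fix up every `next` pointer on the device side.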
8. Memory Management (2/2)
◻ Coherent memory regions
◻ The memory is coherent
◻ Shared among all devices (CUs)
◻ Unified address space
◻ Memory type is determined by the address
◻ Private / local / global memory is decided by
the memory region
◻ No special instruction is required
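The "memory type decided by address" idea can be sketched as below. This is a minimal illustration, assuming made-up address ranges; real region layouts are platform defined. The point is that one flat address space is carved into regions, so an ordinary load or store suffices and the region is inferred from the address alone.

```cpp
#include <cstdint>

// Illustrative sketch: segments carved out of one flat virtual address
// space. No special instruction is needed -- the region is inferred from
// the address. The ranges below are invented for illustration.
enum class Region { Private, Group, Global };

struct Range {
    uint64_t base, size;
    Region region;
};

const Range kRanges[] = {
    {0x10000000ULL, 0x10000000ULL, Region::Private},  // work-item private
    {0x20000000ULL, 0x10000000ULL, Region::Group},    // work-group local
};

Region classify(uint64_t addr) {
    for (const Range& r : kRanges)
        if (addr >= r.base && addr - r.base < r.size)
            return r.region;
    return Region::Global;  // everything else is global memory
}
```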
9. User-Level Command Queue
◻ Queues for communication
◻ User to device
◻ Device to device
◻ The HSA runtime handles the queues
◻ Allocation & destruction
◻ One per application
◻ Vendor-dependent implementation
◻ Direct access to devices
◻ No OS syscalls
◻ No task management by the OS
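A user-level command queue can be sketched as a ring buffer that the application writes and the device reads, with no syscall on the fast path. This is a single-producer/single-consumer sketch under invented names (`Packet`, `UserQueue`); real HSA queue and packet layouts are vendor dependent, as noted above.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical command packet -- real HSA packet formats differ.
struct Packet {
    uint32_t kernel_id;   // which kernel to launch
    uint32_t arg_offset;  // where its arguments live
};

// Fixed-size ring buffer: the application bumps the write index, the
// device bumps the read index. No OS involvement on either side.
template <std::size_t N>
class UserQueue {
    Packet ring_[N];
    std::atomic<std::size_t> write_{0};  // advanced by the application
    std::atomic<std::size_t> read_{0};   // advanced by the device
public:
    bool submit(const Packet& p) {  // application side
        std::size_t w = write_.load(std::memory_order_relaxed);
        if (w - read_.load(std::memory_order_acquire) == N)
            return false;  // queue full
        ring_[w % N] = p;
        write_.store(w + 1, std::memory_order_release);
        return true;
    }
    bool poll(Packet& out) {  // device side
        std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire))
            return false;  // queue empty
        out = ring_[r % N];
        read_.store(r + 1, std::memory_order_release);
        return true;
    }
};
```

The release/acquire pairing makes the packet contents visible to the device before the updated write index, which is what lets user code and the device share the queue without locks.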
10. Hardware Scheduler (1/3)
◻ No real scheduling on the TCU (GPU)
◻ No task scheduling
◻ No task preemption
◻ Current implementation
◻ Executing without a lock:
◻ All threads execute
◻ Multiple tasks produce incorrect results
11. Hardware Scheduler (2/3)
◻ Current implementation
◻ Executing with a lock:
◻ An exception in the code may leave the
resource locked up
◻ Long-running tasks prevent others from
executing
◻ We may fail to finish critical jobs
12. Hardware Scheduler (3/3)
The HSA runtime guarantees:
◻ Bounded execution time
◻ Any process ceases in a reasonable time
◻ Fast switching among applications
◻ Hardware support saves switching time
◻ Application level parallelism
13. HSAIL (1/2)
◻ HSA Intermediate Language
◻ The intermediate language for TCUs
◻ Similar to NVIDIA's "PTX" code
◻ No graphics-specific instructions
◻ Further translated to the HW ISA (by the Finalizer)
◻ The abstract platform is similar to OpenCL
◻ Work item (thread)
◻ Work group (block)
◻ NDRange (grid)
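The work item / work group / NDRange hierarchy above implies the usual index arithmetic, sketched here for the 1-D case (the names are OpenCL-style descriptions, not HSAIL syntax):

```cpp
// A work item's global id combines its work group's id with its
// position (local id) inside that group.
int global_id(int workgroup_id, int workgroup_size, int local_id) {
    return workgroup_id * workgroup_size + local_id;
}

// Number of work groups needed to cover an NDRange of grid_size items
// (rounding up, so the last group may be partially filled).
int num_groups(int grid_size, int workgroup_size) {
    return (grid_size + workgroup_size - 1) / workgroup_size;
}
```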
16. Virtual Memory Address
◻ All types of memory use the same address space
◻ Memory access behavior
◻ Not all regions are accessible by all devices
◻ The OS kernel should not be accessible
◻ Mapping to a region in kernel space is still
possible
◻ Accessing an identical address may give
different values
◻ Work-item private memory
◻ Work-group local memory
◻ Accessing another item's / group's memory is
not valid
17. Memory Region (1/2)
◻ Global
◻ The memory shared by all LCUs & TCUs
◻ Accessible by any work item / group
◻ Group
◻ The memory shared by all work items in the
same group
◻ Private
◻ The memory visible only to a single work item
18. Memory Region (2/2)
◻ Kernarg
◻ The memory for kernel arguments
◻ A kernel is the code fragment we ask a
device to run
◻ Readonly
◻ A read-only type of global memory
◻ Spill
◻ Memory for register spills
◻ Arg
◻ Memory for function-call arguments
19. Memory Consistency
◻ LCU
◻ LCU maintains its own consistency
◻ Shares global memory
◻ Work item
◻ Memory operations to the same address by a
single work item are in order
◻ Memory operations to different addresses
may be reordered
◻ Beyond that, nothing is guaranteed
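Because only same-address, same-work-item ordering is guaranteed, handing data from one work item to another needs explicit synchronization. The sketch below illustrates the idea with C++ atomics rather than HSAIL sync instructions: a release/acquire pair on a flag orders the plain data store before its publication.

```cpp
#include <atomic>

int data = 0;                    // plain shared variable
std::atomic<bool> ready{false};  // publication flag

// Producer: the release store orders the write to `data` before the
// flag becomes visible. Without it, the two stores could be reordered.
void producer() {
    data = 42;
    ready.store(true, std::memory_order_release);
}

// Consumer: the acquire load guarantees that once the flag is seen,
// the producer's earlier write to `data` is visible too.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {}
    return data;
}
```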
22. Compilation
◻ Frontend
◻ Compiles source to LLVM IR
◻ No data dependency
◻ Backend
◻ Converts the IR to HSAIL
◻ Optimization happens here
◻ Binary format
◻ ELF format
◻ An embedded container for HSAIL (BRIG)
23. Runtime
◻ HSA runtime
◻ Issues tasks to devices via the queue
protocol
◻ Device
◻ Converts HSAIL to the native ISA with the
Finalizer
24. HSAIL Program Features
◻ Backward Compatible
◻ A system without HSA support can still
run the executable
◻ Function Invocation
◻ LCU functions may call LCU functions
◻ TCU functions may call TCU functions, with
Finalizer support
◻ LCU-to-TCU / TCU-to-LCU calls are supported
via queues
◻ C++ compatible
25. Conclusion
◻ HSA is an open standard layer between
software and hardware
◻ The cardinal feature of HSA is the unified
virtual memory space
◻ It does not replace current programming
frameworks; no new language is required
26. Reference
◻ Heterogeneous System Architecture: A
Technical Review
◻ HSA Programmer’s Reference Manual
◻ HSAIL: Write-Once-Run-Everywhere for
Heterogeneous Systems