Introduction to ARMv8 Aarch64
2014
issue.hsu@gmail.com
What is Aarch64?
• 64-bit instruction set introduced in ARMv8
Overview
• 64-bit pointers and registers
• Fixed-length (32-bit) instructions
• Load/store architecture
• Little endian (big endian possible)
• 31 general purpose registers and zero register
• Unaligned access ok
– Except for exclusive and ordered accesses
Traditional ARM features gone
• No conditional execution of most instructions
– No equivalent of T32 IT instruction
• No “free shifts” in arithmetic instructions
– Immediate shifts only
– No RRX shift, no ROR shift for ADD/SUB
• No open access to PC register
• No co-processor concept
– Now provide system instructions
• No load/store multiple instructions
– LDM, STM, PUSH, POP
Traditional ARM features still here
• Floating point support is now mandatory
• VFP: mostly the same
• AdvSIMD is based on NEON but with major changes
• Weakly ordered memory
• Basic arithmetic instructions are mostly the same
New features
• Load-acquire and store-release atomics
• Crypto (AES and SHA) instructions
• AdvSIMD usable for general purpose float math
• Larger PC-relative addressing and branching
• Literal pool access and most conditional branches are extended to ±1MB, unconditional branches and calls to ±128MB
• Non-temporal (cache skipping) load/store
• Load/store of a non-contiguous pair of registers
NEW FEATURES DETAILS
Advanced SIMD
• Not covered in these slides
Registers
• 64 Bit integer registers:
– X0 ~ X29, X30/LR, SP/ZERO
• The only register with special semantics is register 31, which acts as both the stack pointer and the zero register
– Zero register
• Reads as zero when used as a source register, and discards the result when used as a destination register
– Stack pointer
• When used as a load/store base register
• In some arithmetic instructions
• X30/LR, the procedure call link register, is unbanked; on an exception, the restart PC is saved to the target exception level’s ELR system register
Registers (cont)
• The bottom 32 bits of the registers are referred to as W0 .. W30
• Benefits
– Easier to do 64-bit arithmetic!
– Less need to spill to the stack
– Spare registers to keep more temporaries
Structure Layout
struct foo {
    int32_t a;
    void* p;
    int32_t x;
};
Layouts shown: 32-bit | 64-bit | 64-bit
struct foo {
    void* p;
    int32_t a;
    int32_t x;
};
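The effect of member order on structure size can be checked with `sizeof`. The sketch below is a minimal illustration, not taken from the slides: the names `foo_padded`, `foo_reordered` and `foo_bytes_saved` are hypothetical, and the sizes in the comments assume a typical ABI where a pointer is aligned to its own size.

```c
#include <stddef.h>
#include <stdint.h>

/* Member order from the first struct: the pointer sits between two 32-bit ints.
 * With 64-bit pointers this typically needs padding after 'a' and after 'x'. */
struct foo_padded {
    int32_t a;
    void   *p;
    int32_t x;
};

/* Reordered: the pointer comes first, so the two ints can share one
 * pointer-sized slot when pointers are 64-bit. */
struct foo_reordered {
    void   *p;
    int32_t a;
    int32_t x;
};

/* Bytes saved per object by the reordering.
 * Typically 8 with 64-bit pointers and 0 with 32-bit pointers. */
size_t foo_bytes_saved(void)
{
    return sizeof(struct foo_padded) - sizeof(struct foo_reordered);
}
```

Calling `foo_bytes_saved()` on each target is a quick way to see whether a structure is worth reordering when porting from 32-bit to 64-bit.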
Data models
• ARM targeted two data models for the 64-bit mode, to address the
key OS partners
– The first is LP64, where integers are 32-bit, and long integers are 64-bit, which is
used by Linux, most UNIXes and OS X
– The other is LLP64, where integers and long integers are 32-bit, while long long
integers are 64-bit, and favored by Microsoft Windows
• -mabi=name
– Generate code for the specified data model.
– Permissible values are ‘ilp32’ for SysV-like data model where int, long int and
pointer are 32-bit, and ‘lp64’ for SysV-like data model where int is 32-bit, but long
int and pointer are 64-bit.
– The default depends on the specific target configuration. Note that the LP64 and
ILP32 ABIs are not link-compatible; you must compile your entire program with the
same ABI, and link with a compatible set of libraries.
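Which data model a given toolchain uses can be checked from the type sizes themselves. The following is a small sketch under the assumption that only the three models named above matter; the function name `data_model_name` is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Classify the data model of the current build from the widths of
 * int, long and pointers. Anything else is reported as "other". */
const char *data_model_name(void)
{
    const size_t int_bits  = sizeof(int)    * CHAR_BIT;
    const size_t long_bits = sizeof(long)   * CHAR_BIT;
    const size_t ptr_bits  = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return "LP64";   /* Linux, most UNIXes, OS X */
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64)
        return "LLP64";  /* Microsoft Windows */
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return "ILP32";
    return "other";
}
```

Code that stores pointers in `long` works under LP64 but breaks under LLP64, which is one reason to prefer `intptr_t` or `uintptr_t` for that purpose.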
Reference
http://www.unix.org/version2/whatsnew/lp64_wp.html
http://www.realworldtech.com/arm64/2/
http://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
Data models (cont)
struct foo {
    int a;
    long l;
    int x;
};
Reference
http://www.linaro.org/assets/common/campus-party-presentation-Sept_2013.pdf
Banked registers
• AArch64 registers are banked by exception level
• Banking is used for exception return information and the stack pointer
• The EL0 stack pointer can be used by higher exception levels after an exception is taken
Exception model
• 4 exception levels: EL3-EL0
– Forms a privilege hierarchy, EL0 the least privileged
• Exceptions can be taken to the same or a higher exception level
Conditional instructions
• Instructions are unconditionally executed but use the condition flags
as an extra input to the instruction
– Conditional branch
• CBZ, B.cond
– Add/subtract with carry
• ADC, SBC
– Conditional compare
• CCMP
– Conditional select/set with increment, negate or invert
• Benchmarking shows these to be the most frequently used single conditional instructions
• CSEL, CSET
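In C, the conditional instructions listed above correspond to small branch-free expressions. The functions below are illustrative only (all names are hypothetical); the instruction named in each comment is what an optimizing AArch64 compiler may choose, which depends on the compiler and its flags.

```c
#include <stdint.h>

/* Select one of two values: a candidate for CMP followed by CSEL. */
int64_t select_max(int64_t a, int64_t b)
{
    return (a > b) ? a : b;
}

/* Produce 0 or 1 from a comparison: a candidate for CMP followed by CSET. */
int is_less(int64_t a, int64_t b)
{
    return a < b;
}

/* Conditionally add one: a candidate for the select-with-increment form. */
int64_t count_if_nonzero(int64_t count, int64_t value)
{
    return (value != 0) ? count + 1 : count;
}

/* Two chained comparisons: a candidate for CMP followed by CCMP. */
int both_positive(int64_t a, int64_t b)
{
    return a > 0 && b > 0;
}
```

Compiling with `-S -O2` and reading the output is the reliable way to see which of these forms a particular toolchain actually emits.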
Immediate shifts for ADD/SUB
• In ARMv7
• In ARMv8
Addressing features
• The virtual address (VA) space has a maximum address width of 48 bits, which gives a maximum VA space of 256TB with a VA range of 0x0000_0000_0000_0000 to 0x0000_FFFF_FFFF_FFFF
• For the EL1&0 translation stage the VA range is split into two subranges, one at the bottom of the
full 64-bit address range of the PC, and one at the top, as follows:
– The bottom VA range runs up from address 0x0000_0000_0000_0000. With the maximum
address width of 48 bits this gives a VA range of 0x0000_0000_0000_0000 to
0x0000_FFFF_FFFF_FFFF
– The top VA subrange runs up to address 0xFFFF_FFFF_FFFF_FFFF. With the maximum
address width of 48 bits this gives a VA range of 0xFFFF_0000_0000_0000 to
0xFFFF_FFFF_FFFF_FFFF
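The split into a bottom and a top subrange can be expressed as a check on the address bits above the configured width. This is a sketch for illustration: `classify_va` and the enum names are hypothetical, and `va_bits` is assumed to be between 1 and 63 (48 in the example above).

```c
#include <stdint.h>

enum va_range { VA_BOTTOM, VA_TOP, VA_INVALID };

/* Classify a 64-bit virtual address for a given address width.
 * Bottom range: all bits above the width are 0.
 * Top range:    all bits above the width are 1.
 * Anything else falls in the hole between the two subranges. */
enum va_range classify_va(uint64_t va, unsigned va_bits)
{
    const uint64_t upper    = va >> va_bits;          /* bits [63:va_bits] */
    const uint64_t all_ones = UINT64_MAX >> va_bits;  /* same bits, all set */

    if (upper == 0)
        return VA_BOTTOM;
    if (upper == all_ones)
        return VA_TOP;
    return VA_INVALID;
}
```

With `va_bits` set to 48, 0x0000_FFFF_FFFF_FFFF is the last address of the bottom range and 0xFFFF_0000_0000_0000 is the first address of the top range.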
Addressing features (cont)
• Register indexed addressing
– Allowing a 64-bit index register to be added to 64-bit base register
– Providing sign or zero extension of 32-bit value within an index register
• PC relative addressing
– PC-relative literal loads have an offset range of ±1MB. This permits fewer literal
pools, and more sharing of literal data between functions – reducing I-cache and
TLB pollution
– Most conditional branches have a range of ±1MiB, expected to be sufficient for
the majority of conditional branches which take place within a single function
– Unconditional branches, including branch and link, have a range of ±128MiB.
Expected to be sufficient to span the static code segment of most executable load
modules and shared objects, without needing linker-inserted trampolines or
“veneers”
– PC-relative load/store and address generation with a range of ±4GiB may be
performed inline using only two instructions, i.e. without the need to load an offset
from a literal pool
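The branch ranges quoted above follow from the width of the signed word offset in the instruction. The sketch below assumes the A64 encodings, which are not stated on the slide: a 19-bit word offset for most conditional branches and a 26-bit word offset for B and BL. The function name `branch_reachable` is hypothetical.

```c
#include <stdint.h>

/* Return 1 if 'target' can be reached from 'pc' by a branch whose
 * immediate is a signed 'imm_bits'-bit offset counted in 4-byte words.
 * imm_bits = 19 gives +/-1MiB, imm_bits = 26 gives +/-128MiB. */
int branch_reachable(uint64_t pc, uint64_t target, unsigned imm_bits)
{
    /* Wrap-around subtraction, reinterpreted as a signed byte distance. */
    const int64_t delta = (int64_t)(target - pc);
    /* 2^(imm_bits-1) words of 4 bytes each = 2^(imm_bits+1) bytes. */
    const int64_t limit = (int64_t)1 << (imm_bits + 1);

    if (delta & 3)
        return 0;  /* instructions are 4-byte aligned */
    return delta >= -limit && delta < limit;
}
```

A linker performs an equivalent check when deciding whether a call needs a veneer.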
An example for global variable access
extern int gVar;
int main(void)
{
    return gVar;
}

arm-marvell-eabi-gcc -S -O2 -march=armv5te global.c
    .arch armv5te
    .text
    .align 2
    .global main
    .type main, %function
main:
    ldr r3, .L3
    ldr r0, [r3]
    bx lr
.L4:
    .align 2
.L3:
    .word gVar

arm-marvell-eabi-gcc -S -O2 -march=armv7-a global.c
    .arch armv7-a
    .text
    .align 2
    .global main
    .type main, %function
main:
    movw r3, #:lower16:gVar
    movt r3, #:upper16:gVar
    ldr r0, [r3, #0]
    bx lr

aarch64-marvell-elf-gcc -S -O2 -march=armv8-a global.c
    .arch armv8-a+fp+simd
    .section .text.startup
    .align 2
    .global main
    .type main, %function
main:
    adrp x0, gVar
    ldr w0, [x0,#:lo12:gVar]
    ret
Address Generation
• ADRP Xd, label
– Address of Page
– Sign extends a 21-bit offset, shifts it left by 12 and adds it to the value of the PC
with its bottom 12 bits cleared, writing the result to register Xd
– This computes the base address of the 4KB aligned memory region containing
label, and is designed to be used in conjunction with a load, store or ADD
instruction which supplies the bottom 12 bits of the label’s address
– This permits position-independent addressing of any location within ±4GB of the
PC using two instructions, providing that dynamic relocation is done with a
minimum granularity of 4KB
– The term “page” is short-hand for the 4KB relocation granule, and is not
necessarily related to the virtual memory page size
Address Generation (cont)
• ADR Xd, label
– Address
– Adds a 21-bit signed byte offset to the program counter, writing the result to
register Xd
– Used to compute the effective address of any location within ±1MiB of the PC
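The arithmetic performed by ADRP and ADR can be modelled in a few lines of C. This is a sketch of the address computation as described above, not of the instruction encoding; the function names are hypothetical and `imm21` stands for the raw 21-bit immediate.

```c
#include <stdint.h>

/* Sign-extend the low 21 bits of 'imm21' to 64 bits. */
static int64_t sign_extend21(uint32_t imm21)
{
    int64_t value = (int64_t)(imm21 & 0x1FFFFF);
    if (value & 0x100000)      /* bit 20 is the sign bit */
        value -= 0x200000;
    return value;
}

/* ADRP: clear the bottom 12 bits of the PC, then add the
 * sign-extended offset shifted left by 12 (counted in 4KB units). */
uint64_t adrp_result(uint64_t pc, uint32_t imm21)
{
    const uint64_t page_base = pc & ~UINT64_C(0xFFF);
    const int64_t  offset    = sign_extend21(imm21) * 4096;
    return page_base + (uint64_t)offset;
}

/* ADR: add the sign-extended byte offset to the PC. */
uint64_t adr_result(uint64_t pc, uint32_t imm21)
{
    return pc + (uint64_t)sign_extend21(imm21);
}

/* The bottom 12 bits supplied by the following load, store or ADD. */
uint32_t lo12(uint64_t address)
{
    return (uint32_t)(address & 0xFFF);
}
```

For a label at 0x401ABC referenced from PC 0x400123, `adrp_result(0x400123, 1)` yields 0x401000, and adding `lo12(0x401ABC)` gives the full address.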
The program counter (PC)
• Cannot be used in arithmetic and load/store instructions
• Instructions that implicitly read PC
– PC relative address compute instructions
• ADR, ADRP, literal load, direct branch
• Its value is the address of the instruction, there is no implied offset of 4 or 8
bytes
– Branch-and-link instructions
• BL and BLR write the return address (the address of the next instruction) to the link register
• Instructions that implicitly modify PC
– Explicit control flow instructions
• [Un]conditional branch, exception generation, exception return instructions
Memory Load-Store
• Bulk transfers
– LDM, STM, PUSH, POP do not exist in Aarch64
– LDP and STP load and store a pair of independent registers from consecutive memory locations, and support unaligned addresses when accessing normal memory
– LDNP, STNP provide a streaming or non-temporal hint that data does not need to
be retained in caches
• LDNP is a special exception to the normal memory ordering rules: where an address dependency exists between two memory reads and the second read was generated by an LDNP then, in the absence of any other barrier mechanism to achieve order, those memory accesses can be observed in any order by other observers within the shareability domain of the memory addresses being accessed.
Memory Load-Store (cont)
• Exclusive accesses
– LDXR, LDXP, STXR, STXP
– Exclusive access to a pair of doublewords permits atomic updates of a pair of pointers
– Accesses must be naturally aligned; an exclusive pair access must be aligned to twice the data size
• Load-acquire, store-release
– LDAR, STLR, LDAXR, STLXR
– Explicitly synchronizing load and store instructions (release-consistency memory
model)
– Reducing the need for explicit memory barriers
– Require natural address alignment
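From C, load-acquire and store-release semantics are usually reached through C11 atomics rather than written by hand. The sketch below shows the common publish/consume pattern; the names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and the mapping to STLR and LDAR is what AArch64 compilers typically generate, not something the C standard promises.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int         payload;  /* ordinary data, written before publication */
static atomic_bool ready;    /* static storage: starts out false */

/* Producer: write the data, then publish it with a store-release.
 * Writes before the release cannot be reordered after it. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: a load-acquire that observes 'ready' also observes
 * every write the producer made before its store-release. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

No explicit DMB appears in this pattern; the ordering is carried by the acquire and release operations themselves.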
Memory Load-Store (cont)
• Prefetch Memory
– Support following addressing modes:
• Base plus a scaled 12-bit unsigned immediate offset or base plus an unscaled 9-bit signed
immediate offset
• Base plus a 64-bit register offset. This can be optionally scaled by 8, for example LSL #3.
• Base plus a 32-bit extended register offset. This can be optionally scaled by 8.
• PC-relative literal.
– PRFM <prfop>, addr | label
• <prfop> is defined as <type><target><policy>
• <type>: PLD (prefetch for load), PST (prefetch for store), PLI (preload instructions)
• <target>: L1 (level 1 cache), L2 (level 2 cache), L3 (level 3 cache)
• <policy>
– KEEP: Retained or temporal prefetch, allocated in the cache normally
– STRM: Streaming or non-temporal prefetch, for data that is used only once
– PLDL1KEEP, PSTL2STRM, PLIL3KEEP
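In C, prefetch hints are normally requested through a compiler builtin rather than by writing PRFM directly. The sketch below uses the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`; the function name and the prefetch distance are hypothetical, and whether the builtin becomes a PRFM with a particular `<prfop>` is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical distance in elements; the right value depends on the workload. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that data a few iterations ahead will be read. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0: prefetch for a load; locality = 3: keep in cache. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

A locality of 0 instead of 3 asks for a streaming, use-once prefetch, which is the intent behind the STRM policy.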
Floating Point
• There is no “soft-float” variant of the AArch64 Procedure Call Standard
• The deprecated small vector feature of VFP is removed
• Load/store addressing modes are identical to integer
load/store
• FCSEL/FCCMP are equivalent to the integer CSEL/CCMP
instructions
– Floating-point compares set the integer condition flags directly and do not modify the condition flags in FPSR
• All floating-point multiply-add and multiply-sub
instructions are “fused”
Scalar/SIMD Registers
• SIMD and Scalar share register bank
– 32 bit float registers: S0 ... S31
– 64 bit double registers: D0 ... D31
– 128 bit SIMD registers: V0 ... V31
• S0 is bottom 32 bits of D0 which is the bottom 64 bits of
V0
System instructions
• Exception generating instructions
– SVC, HVC, SMC, ERET
– BRK, HLT, DCPSn, DRPS
• System register access
– CPSR is not accessible as a single register; its fields are accessed with system instructions
– MRS, MSR
• System management
– Cache and TLB maintenance, address translation
• Architectural hints
– NOP, WFE, WFI, SEV
• Barriers and CLREX
– DMB, DSB, ISB, CLREX
Weakly ordered memory model
• On ARM multiprocessor systems, programmers writing multithreaded code also have to deal with a weak memory model
• Unlike on x86, but like AArch32 and PowerPC, the order of writes to memory isn't guaranteed. Deal with it:
– use mutexes!
– barrier instructions DMB, DSB, ISB
– ARMv8: Load-Acquire/Store-Release instructions: LDAR, STLR
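The "use mutexes" advice works because taking a lock is an acquire operation and releasing it is a release operation. The sketch below is a deliberately minimal spinlock built on C11 `atomic_flag` to make that visible; the names are hypothetical, and real code should normally use `pthread_mutex_t` or a similar library lock.

```c
#include <stdatomic.h>

static atomic_flag counter_lock = ATOMIC_FLAG_INIT;
static long        shared_counter;

/* Spin until the flag was previously clear. Acquire ordering keeps the
 * protected accesses from moving above the point where the lock is taken. */
static void counter_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&counter_lock, memory_order_acquire)) {
        /* busy-wait */
    }
}

/* Release ordering makes the protected writes visible before the unlock. */
static void counter_lock_release(void)
{
    atomic_flag_clear_explicit(&counter_lock, memory_order_release);
}

void counter_increment(void)
{
    counter_lock_acquire();
    shared_counter++;
    counter_lock_release();
}

long counter_read(void)
{
    counter_lock_acquire();
    const long value = shared_counter;
    counter_lock_release();
    return value;
}
```

Code that shares data without a lock or atomics may appear to work on x86 and then fail on a weakly ordered machine.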
GNU/LINUX PORTING ISSUES
Good News
• Most typical C/C++ OSS software compiles just fine - except:
– when code assumes endianness or struct sizes
– or makes kernel system calls directly
– or has assembler code or a JIT
– or uses autoconf ^_^
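Code that assumes endianness usually does so by casting a byte buffer to a wider integer type. A portable alternative is to assemble the value from individual bytes, as in this sketch (the function names are hypothetical):

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer.
 * Works the same on little-endian and big-endian hosts, and
 * does not require the buffer to be aligned. */
uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value to a byte buffer in little-endian order. */
void store_le32(unsigned char *p, uint32_t value)
{
    p[0] = (unsigned char)(value & 0xFF);
    p[1] = (unsigned char)((value >> 8) & 0xFF);
    p[2] = (unsigned char)((value >> 16) & 0xFF);
    p[3] = (unsigned char)((value >> 24) & 0xFF);
}
```

File formats and network protocols define their own byte order, so helpers like these keep the host's endianness out of the picture.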
Most common porting problem
– checking build system type... x86_64-pc-linux-gnu
– checking host system type... Invalid configuration `aarch64-oe-linux': machine `aarch64-oe' not recognized
– configure: error: /bin/sh config.sub aarch64-oe-linux failed
• Please run autoreconf against autotools-dev 20120210.1 or later, and
make a release of your software.
Available defines
• aarch64-oe-linux-cpp -dM -E - < /dev/null|sort
• ...
• #define __aarch64__ 1
• #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
• #define __CHAR_UNSIGNED__ 1
• #define __SIZEOF_POINTER__ 8
– ... but this is gcc specific!
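The `__CHAR_UNSIGNED__` define in the list above points at a common porting trap: plain `char` is unsigned on this target, while on x86 it is usually signed. The sketch below contrasts a fragile test with a portable one; the function names are hypothetical.

```c
#include <limits.h>

/* Portable query: CHAR_MIN is 0 exactly when plain char is unsigned. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Fragile: relies on plain char being signed.
 * Where char is unsigned this can never return 1. */
int is_high_bit_set_fragile(char c)
{
    return c < 0;
}

/* Portable: inspect the bit pattern through unsigned char. */
int is_high_bit_set(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

The same reasoning applies to storing the result of `getchar()` in a `char` and comparing it with `EOF`; the result should be kept in an `int`.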
Test features, not platform
• Works but not portable
– #if defined (__alpha__) || defined(__aarch64__)
– // assume 64-bit pointers
– #elif ...
• Instead
– #if __SIZEOF_POINTER__ == 8
– // assume 64-bit pointers
– #elif ...
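Since `__SIZEOF_POINTER__` is compiler specific, the same feature test can be written against the standard `<stdint.h>` limits. This is one possible sketch: `POINTER_BITS` and `pointer_bits` are hypothetical names, and `UINTPTR_MAX` is assumed to be provided, which holds on common hosted toolchains even though `uintptr_t` is formally optional.

```c
#include <stdint.h>

/* Derive the pointer width from the standard uintptr_t limit
 * instead of a platform or compiler specific macro. */
#if UINTPTR_MAX == UINT64_MAX
#  define POINTER_BITS 64
#elif UINTPTR_MAX == UINT32_MAX
#  define POINTER_BITS 32
#else
#  error "unsupported pointer width"
#endif

unsigned pointer_bits(void)
{
    return POINTER_BITS;
}
```

Code written this way needs no change when a new 64-bit architecture appears, which is the point of testing the feature rather than the platform.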
Aarch64 call convention
• Arguments and return values in registers
– X0 - X7 arguments and return value
– X8 indirect result (struct) location
– X9 - X15 temporary registers
– X16 - X17 intra-call-use registers (PLT, linker)
– X18 platform specific use (TLS)
– X19 - X28 callee-saved registers
– X29 frame pointer
– X30 link register
– SP stack pointer (shares register encoding 31 with XZR)
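The register assignments above can be related to ordinary C signatures. The functions below are hypothetical examples; the comments describe where the procedure call standard places each value for an LP64 build, as an illustration rather than a guarantee about any particular compiler's output.

```c
/* 16 bytes under LP64: small enough to be returned in X0 and X1. */
struct small_pair {
    long lo;
    long hi;
};

/* 32 bytes under LP64: too large for registers, so the caller passes the
 * address of a result buffer in X8 (the indirect result location). */
struct big_block {
    long v[4];
};

/* a0..a7 arrive in X0..X7; the ninth argument a8 is passed on the stack.
 * The result is returned in X0. */
long sum9(long a0, long a1, long a2, long a3, long a4,
          long a5, long a6, long a7, long a8)
{
    return a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}

struct small_pair make_pair(long lo, long hi)
{
    struct small_pair result = { lo, hi };
    return result;
}

struct big_block make_block(long seed)
{
    struct big_block result = { { seed, seed + 1, seed + 2, seed + 3 } };
    return result;
}
```

Hand-written assembly that calls or implements such functions has to follow the same assignments, including preserving X19 - X28.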
Reference
IHI0055B_aapcs64.pdf
Aarch64 call convention floats
• VFP/SIMD mandatory - no soft float ABI
– V0 - V7 arguments and return value
– D8 - D15 callee saved registers
– V16 - V31 temporary registers
• The upper 64 bits (bits 64-127) of V8 - V15 are not preserved
Reference
IHI0055B_aapcs64.pdf
System calls
• Since the architecture is new, some legacy support has been removed
– linux-3.10.18/include/uapi/asm-generic/unistd.h
System calls
• Deprecated system calls are not available:
– bdflush -> gone!
– fork -> clone
– getdents -> getdents64
– oldumount -> umount
– poll -> ppoll
– select -> pselect6
– sysctl -> use /proc/sys
– uselib -> gone!
– utime -> utimes
– alarm -> ualarm
– epoll_wait -> epoll_pwait
– futimesat -> utimensat
– getpgrp -> getpgid
– pause -> ?
– recv -> recvfrom
– send -> sendto
– time -> ?
– ustat -> statfs
System calls
• Pre-at system calls are not available:
– open -> openat
– unlink -> unlinkat
– chmod -> fchmodat
– mkdir -> mkdirat
– lchown -> fchownat (with AT_SYMLINK_NOFOLLOW)
– rename -> renameat
– symlink -> symlinkat
– link -> linkat
– mknod -> mknodat
– chown -> fchownat
– rmdir -> unlinkat (with AT_REMOVEDIR)
– access -> faccessat
– readlink -> readlinkat
– utimes -> utimensat
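Application code rarely notices these removals because the C library implements the old functions on top of the "at" variants. Code that issues system calls directly has to make the same substitution itself. The sketch below shows the equivalent calls through the POSIX wrappers; the function names `open_readonly` and `remove_empty_dir` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for openat(), unlinkat() and AT_FDCWD */
#include <fcntl.h>
#include <unistd.h>

/* Equivalent of open(path, O_RDONLY): AT_FDCWD makes a relative
 * path resolve against the current working directory. */
int open_readonly(const char *path)
{
    return openat(AT_FDCWD, path, O_RDONLY);
}

/* Equivalent of rmdir(path): unlinkat() with AT_REMOVEDIR. */
int remove_empty_dir(const char *path)
{
    return unlinkat(AT_FDCWD, path, AT_REMOVEDIR);
}
```

Both functions return -1 and set `errno` on failure, like the calls they stand in for.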
System calls
• System calls without flags parameter:
– pipe -> pipe2
– dup2 -> dup3
– epoll_create -> epoll_create1
– inotify_init -> inotify_init1
– eventfd -> eventfd2
– signalfd -> signalfd4
Reference
• 64-bit ARM - introduction to porting
• ARMv8 Instruction Set Overview
• ARM Architecture Reference Manual - ARMv8, for ARMv8-A
architecture profile
• ARMv8 Technology Preview
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Introduction to armv8 aarch64

  • 1. Introduction to ARMv8 Aarch64 2014 issue.hsu@gmail.com
  • 2. What is Aarch64? • 64 Bit Instruction set introduced in ARMv8 2
  • 3. Overview • 64-bit pointers and registers • Fixed-length (32-bit) instructions • Load/store architecture • Little endian (big endian possible) • 31 general purpose registers and zero register • Unaligned access ok – Except for exclusive and ordered accesses
  • 4. Traditional ARM features gone • No conditional execution of most instructions – No equivalent of T32 IT instruction • No “free shifts” in arithmetic instructions – Immediate shifts only – No RRX shift, no ROR shift for ADD/SUB • No open access to PC register • No co-processor concept – Now provide system instructions • No load/store multiple instructions – LDM, STM, PUSH, POP
  • 5. Traditional ARM features still here • Floating point support is now mandatory • VFP -mostly same • AdvSIMD is based on NEON but with major changes • Weakly ordered memory • Basic arithmetic instructions usually same 5
  • 6. New features • Load-acquire and store-release atomics • Crypto (AES and SHA) instructions • AdvSIMD usable for general purpose float math • Larger PC-relative addressing and branching • Literal pool access and most conditional branches are extended to ±1MB, unconditional branches and calls to ±128MB • Non-temporal (cache skipping) load/store • Load/store of a non-contiguous pair of registers
  • 8. Advanced SIMD • Not covered in this slide 8
  • 9. Registers • 64-bit integer registers: – X0 ~ X29, X30/LR, SP/ZERO • The only register with special semantics is register 31, which acts as both the stack pointer and a zero register – Zero register • Reads as zero when used as a source register, and discards the result when used as a destination register – Stack pointer • When used as a load/store base register • In some arithmetic instructions • X30/LR, the procedure call link register, is unbanked; when an exception is taken, the restart PC is saved to the target exception level’s ELR system register
  • 10. Registers (cont) • The bottom 32 bits of the registers are referred to as W0 .. W30 • Benefits – Easier to do 64-bit arithmetic! – Less need to spill to the stack – Spare registers to keep more temporaries
  • 11. Structure Layout • struct foo { int32_t a; void* p; int32_t x; }; • struct foo { void* p; int32_t a; int32_t x; }; • [Figure: layout diagrams labeled 32-bit, 64-bit, 64-bit]
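Member order affects padding once pointers grow to 64 bits. The following is a minimal sketch for checking this with sizeof; the names foo_padded, foo_packed, padded_size and packed_size are hypothetical, and the sizes in the comments assume a target with 8-byte pointers aligned to 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Original order: the pointer sits between two 32-bit members,
 * so with 8-byte pointers each int32_t is followed by 4 bytes of padding. */
struct foo_padded {
    int32_t a;
    void   *p;
    int32_t x;
};

/* Reordered: pointer first, then both 32-bit members share one 8-byte slot. */
struct foo_packed {
    void   *p;
    int32_t a;
    int32_t x;
};

/* Assumed result with 8-byte pointers: 24 */
size_t padded_size(void) { return sizeof(struct foo_padded); }

/* Assumed result with 8-byte pointers: 16 */
size_t packed_size(void) { return sizeof(struct foo_packed); }
```

With 4-byte pointers both layouts would normally come out the same size, which is why this kind of padding tends to show up only after a port to 64-bit.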
  • 12. Data models • ARM targeted two data models for the 64-bit mode, to address the key OS partners – The first is LP64, where integers are 32-bit, and long integers are 64-bit, which is used by Linux, most UNIXes and OS X – The other is LLP64, where integers and long integers are 32-bit, while long long integers are 64-bit, which is favored by Microsoft Windows • -mabi=name – Generate code for the specified data model. – Permissible values are ‘ilp32’ for SysV-like data model where int, long int and pointer are 32-bit, and ‘lp64’ for SysV-like data model where int is 32-bit, but long int and pointer are 64-bit. – The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries. Reference http://www.unix.org/version2/whatsnew/lp64_wp.html http://www.realworldtech.com/arm64/2/ http://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
  • 13. Data models (cont) struct foo { int a; long l; int x; }; Reference http://www.linaro.org/assets/common/campus-party-presentation-Sept_2013.pdf
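To see which data model a given toolchain targets, the widths of int, long and pointers can be inspected directly. This is an illustrative sketch only; the function names data_model_name and foo_size are hypothetical, and the classification covers just the three models named on the slides.

```c
#include <limits.h>
#include <stddef.h>

/* Returns a short name for the data model the compiler is targeting. */
const char *data_model_name(void)
{
    size_t int_bits  = sizeof(int)    * CHAR_BIT;
    size_t long_bits = sizeof(long)   * CHAR_BIT;
    size_t ptr_bits  = sizeof(void *) * CHAR_BIT;

    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64) return "LP64";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 64) return "LLP64";
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32) return "ILP32";
    return "other";
}

/* The struct from the slide: its size depends on the width of long. */
struct foo { int a; long l; int x; };

size_t foo_size(void) { return sizeof(struct foo); }
```

Under LP64 the 8-byte long forces padding after both int members, so the struct typically grows from 12 to 24 bytes; under ILP32 or LLP64 long stays at 4 bytes and no padding is needed.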
  • 14. Banked registers • AArch64 Banked registers are banked by exception level • Used for exception return information and stack pointer • EL0 Stack Pointer can be used by higher exception levels after exception taken 14
  • 15. Exception model • 4 exception levels: EL3-EL0 – Forms a privilege hierarchy, EL0 the least privileged • Exceptions can be taken to the same or a higher exception level 15
  • 16. Conditional instructions • Instructions are unconditionally executed but use the condition flags as an extra input to the instruction – Conditional branch • CBZ, B.cond – Add/subtract with carry • ADC, SBC – Conditional compare • CCMP – Conditional select/set with increment, negate or invert • CSEL, CSET • Benchmarking shows these to be the most frequently used single conditional instructions
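The C patterns below are the kind of code for which an AArch64 compiler can use conditional select instructions instead of branches. This is a sketch under assumptions: the function names are hypothetical, and whether CSEL, CSINC or CSET is actually emitted depends on the compiler and optimization level.

```c
#include <stdint.h>

/* Candidate for CMP + CSEL: pick one of two values without branching. */
int64_t select_max(int64_t a, int64_t b)
{
    return (a > b) ? a : b;
}

/* Candidate for CSINC/CINC: conditionally add one to a value. */
int64_t inc_if_nonzero(int64_t value, int64_t flag)
{
    return flag ? value + 1 : value;
}

/* Candidate for CMP + CSET: turn a condition into 0 or 1. */
int is_less(int64_t a, int64_t b)
{
    return a < b;
}
```

Compiling with -S -O2 and reading the generated assembly is the simplest way to check what a particular toolchain does with these.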
  • 17. Immediate shifts for ADD/SUB • In ARMv7 • In ARMv8 17
  • 18. Addressing features • VA address space has a maximum address width of 48 bits, gives a maximum VA space of 256TB, with VA range of 0x0000_0000_0000_0000 to 0x0000_FFFF_FFFF_FFFF • For the EL1&0 translation stage the VA range is split into two subranges, one at the bottom of the full 64-bit address range of the PC, and one at the top, as follows: – The bottom VA range runs up from address 0x0000_0000_0000_0000. With the maximum address width of 48 bits this gives a VA range of 0x0000_0000_0000_0000 to 0x0000_FFFF_FFFF_FFFF – The top VA subrange runs up to address 0xFFFF_FFFF_FFFF_FFFF. With the maximum address width of 48 bits this gives a VA range of 0xFFFF_0000_0000_0000 to 0xFFFF_FFFF_FFFF_FFFF 18
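The split into a bottom and a top subrange can be expressed as a check on the address bits above the VA width. The sketch below is illustrative; the enum va_range and the function classify_va are hypothetical names, and va_bits is assumed to be between 1 and 63 (48 for the maximum width described above).

```c
#include <stdint.h>

enum va_range { VA_BOTTOM, VA_TOP, VA_NEITHER };

/* Classify a 64-bit virtual address for a given VA width.
 * Bottom subrange: all bits above the VA width are 0.
 * Top subrange:    all bits above the VA width are 1. */
enum va_range classify_va(uint64_t va, unsigned va_bits)
{
    uint64_t upper    = va >> va_bits;          /* bits above the VA width */
    uint64_t all_ones = UINT64_MAX >> va_bits;  /* same bits, all set */

    if (upper == 0)        return VA_BOTTOM;
    if (upper == all_ones) return VA_TOP;
    return VA_NEITHER;
}
```

For example, with va_bits set to 48, 0x0000_FFFF_FFFF_FFFF falls in the bottom subrange, 0xFFFF_0000_0000_0000 in the top subrange, and 0x0001_0000_0000_0000 in neither.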
  • 19. Addressing features (cont) • Register indexed addressing – Allowing a 64-bit index register to be added to 64-bit base register – Providing sign or zero extension of 32-bit value within an index register • PC relative addressing – PC-relative literal loads have an offset range of ±1MB. This permits fewer literal pools, and more sharing of literal data between functions – reducing I-cache and TLB pollution – Most conditional branches have a range of ±1MiB, expected to be sufficient for the majority of conditional branches which take place within a single function – Unconditional branches, including branch and link, have a range of ±128MiB. Expected to be sufficient to span the static code segment of most executable load modules and shared objects, without needing linker-inserted trampolines or “veneers” – PC-relative load/store and address generation with a range of ±4GiB may be performed inline using only two instructions, i.e. without the need to load an offset from a literal pool 19
  • 20. An example for global variable access
      extern int gVar;
      int main(void) { return gVar; }

    arm-marvell-eabi-gcc -S -O2 -march=armv5te global.c
          .arch armv5te
          .text
          .align 2
          .global main
          .type main, %function
      main:
          ldr r3, .L3
          ldr r0, [r3]
          bx lr
      .L4:
          .align 2
      .L3:
          .word gVar

    arm-marvell-eabi-gcc -S -O2 -march=armv7-a global.c
          .arch armv7-a
          .text
          .align 2
          .global main
          .type main, %function
      main:
          movw r3, #:lower16:gVar
          movt r3, #:upper16:gVar
          ldr r0, [r3, #0]
          bx lr

    aarch64-marvell-elf-gcc -S -O2 -march=armv8-a global.c
          .arch armv8-a+fp+simd
          .section .text.startup
          .align 2
          .global main
          .type main, %function
      main:
          adrp x0, gVar
          ldr w0, [x0,#:lo12:gVar]
          ret
  • 21. Address Generation • ADRP Xd, label – Address of Page – Sign extends a 21-bit offset, shifts it left by 12 and adds it to the value of the PC with its bottom 12 bits cleared, writing the result to register Xd – This computes the base address of the 4KB aligned memory region containing label, and is designed to be used in conjunction with a load, store or ADD instruction which supplies the bottom 12 bits of the label’s address – This permits position-independent addressing of any location within ±4GB of the PC using two instructions, providing that dynamic relocation is done with a minimum granularity of 4KB – The term “page” is short-hand for the 4KB relocation granule, and is not necessarily related to the virtual memory page size 21
  • 22. Address Generation (cont) • ADR Xd, label – Address – Adds a 21-bit signed byte offset to the program counter, writing the result to register Xd – Used to compute the effective address of any location within ±1MiB of the PC 22
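The arithmetic performed by ADRP and ADR can be modelled in a few lines of C. This is a sketch of the address calculation as described on the two slides above, not an instruction decoder; the function names sext21, adrp_result, adr_result and lo12 are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 21 bits of imm to 64 bits. */
static int64_t sext21(uint32_t imm)
{
    int64_t v = (int64_t)(imm & 0x1FFFFF);
    return (v & 0x100000) ? v - 0x200000 : v;
}

/* ADRP: clear the bottom 12 bits of the PC, then add the
 * sign-extended 21-bit offset shifted left by 12. */
uint64_t adrp_result(uint64_t pc, uint32_t imm21)
{
    return (pc & ~UINT64_C(0xFFF)) + ((uint64_t)sext21(imm21) << 12);
}

/* ADR: add the sign-extended 21-bit byte offset to the PC. */
uint64_t adr_result(uint64_t pc, uint32_t imm21)
{
    return pc + (uint64_t)sext21(imm21);
}

/* The bottom 12 bits that the following load, store or ADD supplies. */
uint64_t lo12(uint64_t address)
{
    return address & 0xFFF;
}
```

For an instruction at 0x400123 and a label at 0x412345, the page offset is 0x12, so adrp_result yields 0x412000 and adding lo12 of the label (0x345) recovers the full address. This mirrors the adrp / ldr [x0, #:lo12:gVar] pair in the global variable example.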
  • 23. The program counter (PC) • Cannot be used in arithmetic and load/store instructions • Instructions that implicitly read PC – PC relative address compute instructions • ADR, ADRP, literal load, direct branch • Its value is the address of the instruction, there is no implied offset of 4 or 8 bytes – Branch-and-link instructions • BL and BLR store the return address (the address of the following instruction) to the link register • Instructions that implicitly modify PC – Explicit control flow instructions • [Un]conditional branch, exception generation, exception return instructions
  • 24. Memory Load-Store • Bulk transfers – LDM, STM, PUSH, POP do not exist in Aarch64 – LDP, STP that load and store a pair of independent registers from consecutive memory locations, which support unaligned addresses when accessing normal memory – LDNP, STNP provide a streaming or non-temporal hint that data does not need to be retained in caches • A special exception to the normal memory ordering rules, where an address dependency exists between two memory reads and the second read was generated by a LDNP then, in the absence of any other barrier mechanism to achieve order, those memory accesses can be observed in any order by other observers within the shareability domain of the memory addresses being accessed. 24
  • 25. Memory Load-Store (cont) • Exclusive accesses – LDXR, LDXP, STXR, STXP – Exclusive access to a pair of double words permit atomic updates of a pair of pointers – Must be naturally aligned, exclusive pair access must be aligned to twice the data size • Load-acquire, store-release – LDAR, STLR, LDAXR, STLXR – Explicitly synchronizing load and store instructions (release-consistency memory model) – Reducing the need for explicit memory barriers – Require natural address alignment 25
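From C, load-acquire and store-release semantics are usually reached through C11 atomics rather than written by hand. The sketch below shows the common publish/consume pattern; the names payload, ready, publish and try_consume are hypothetical, and the mapping of these operations to STLR and LDAR on AArch64 is typical compiler behaviour, not something the language guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data being handed over */
static atomic_bool ready;  /* flag that publishes the data */

/* Writer: store the data, then release the flag.
 * The release store keeps the payload write ordered before it. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: acquire the flag, then read the data.
 * The acquire load keeps the payload read ordered after it. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

No explicit DMB is written in this code; the ordering is carried by the acquire and release operations themselves, which is the reduction in explicit barriers that the slide refers to.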
  • 26. Memory Load-Store (cont) • Prefetch Memory – Supports the following addressing modes: • Base plus a scaled 12-bit unsigned immediate offset or base plus an unscaled 9-bit signed immediate offset • Base plus a 64-bit register offset. This can be optionally scaled by 8 bytes (a left shift of 3 bits, for example LSL #3). • Base plus a 32-bit extended register offset. This can be optionally scaled by 8 bytes. • PC-relative literal. – PRFM <prfop>, addr | label • <prfop> is defined as <type><target><policy> • <type>: PLD (prefetch for load), PST (prefetch for store), PLI (preload instructions) • <target>: L1 (level 1 cache), L2 (level 2 cache), L3 (level 3 cache) • <policy> – KEEP: Retained or temporal prefetch, allocated in the cache normally – STRM: Streaming or non-temporal prefetch, for data that is used only once – Examples: PLDL1KEEP, PSTL2STRM, PLIL3KEEP
  • 27. Floating Point • There is no “soft-float” variant of the AARCH64 Procedure Calling Standard • The deprecated small vector feature of VFP is removed • Load/store addressing modes are identical to integer load/store • FCSEL and FCCMP are the floating-point equivalents of the integer CSEL and CCMP instructions – Floating-point compares set the integer condition flags directly and do not modify the condition flags in the FPSR • All floating-point multiply-add and multiply-sub instructions are “fused”
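In C, prefetch hints are usually requested through the GCC/Clang builtin __builtin_prefetch(addr, rw, locality) rather than by writing PRFM directly. The following is a sketch only; the function name sum_with_prefetch and the distance of 16 elements are arbitrary assumptions, and which PRFM operation (if any) the compiler emits is up to the toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* How many elements ahead to prefetch; an arbitrary, untuned value. */
#define PREFETCH_DISTANCE 16

/* Sum an array that is read once, hinting upcoming elements as
 * read-only (rw = 0) with no temporal locality (locality = 0),
 * which corresponds in spirit to a streaming prefetch for load. */
int64_t sum_with_prefetch(const int32_t *data, size_t count)
{
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (i + PREFETCH_DISTANCE < count)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        total += data[i];
    }
    return total;
}
```

A prefetch is only a hint: the result of the function is the same with or without it, and any benefit has to be measured on the target hardware.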
  • 28. Scalar/SIMD Registers • SIMD and Scalar share register bank – 32 bit float registers: S0 ... S31 – 64 bit double registers: D0 ... D31 – 128 bit SIMD registers: V0 ... V31 • S0 is bottom 32 bits of D0 which is the bottom 64 bits of V0 28
  • 29. System instructions • Exception generating instructions – SVC, HVC, SMC, ERET – BRK, HLT, DCPSn, DRPS • System register access – CPSR is not accessible as a single register; its state is accessed with system instructions – MRS, MSR • System management – Cache and TLB maintenance, address translation • Architectural hints – NOP, WFE, WFI, SEV • Barriers and CLREX – DMB, DSB, ISB, CLREX
  • 30. Weakly ordered memory model • On ARM multiprocessor systems, programmers who use threads also have to deal with the weak memory model • Unlike on x86, but like Aarch32 and PowerPC, the order of writes to memory isn't guaranteed. Deal with it: – use mutexes! – barrier instructions DMB, DSB, ISB – ARMv8: Load-Acquire/Store-Release instructions: LDAR, STLR
  • 32. Good News • Most typical C/C++ OSS software compiles just fine - except: – when code assumes endianness or struct sizes – or calls kernel system call directly – or has assembler code or a JIT – or uses autoconf ^_^ 32
  • 33. Most common porting problem – checking build system type... x86_64-pc-linux-gnu – checking host system type... Invalid configuration `aarch64-oe-linux': machine `aarch64-oe' not recognized – configure: error: /bin/sh config.sub aarch64-oe-linux failed • Please run autoreconf against autotools-dev 20120210.1 or later, and make a release of your software.
  • 34. Available defines • aarch64-oe-linux-cpp -dM -E - < /dev/null|sort • ... • #define __aarch64__ 1 • #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__ • #define __CHAR_UNSIGNED__ 1 • #define __SIZEOF_POINTER__ 8 – ... but this is gcc specific! 34
  • 35. Test features, not platform • Works but not portable – #if defined (__alpha__) || defined(__aarch64__) – // assume 64-bit pointers – #elif ... • Instead – #if __SIZEOF_POINTER__ == 8 – // assume 64-bit pointers – #elif ... 35
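Since __SIZEOF_POINTER__ is a GCC-specific define, the same feature test can also be written against the standard UINTPTR_MAX macro from <stdint.h>. This is a sketch under the assumption that the toolchain provides uintptr_t (it is optional in the C standard but widely available); POINTER_BITS and pointer_bits are hypothetical names.

```c
#include <limits.h>
#include <stdint.h>

/* Test the feature (pointer width), not the platform name. */
#if UINTPTR_MAX == UINT64_MAX
#define POINTER_BITS 64   /* assume 64-bit pointers */
#elif UINTPTR_MAX == UINT32_MAX
#define POINTER_BITS 32   /* assume 32-bit pointers */
#else
#error "unsupported pointer width"
#endif

/* Expose the compile-time choice so it can be checked at run time. */
int pointer_bits(void)
{
    return POINTER_BITS;
}
```

Code written this way needs no change when a new 64-bit architecture appears, which is the point of the slide.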
  • 36. Aarch64 call convention • Arguments and return values in registers – X0 - X7 arguments and return value – X8 indirect result (struct) location – X9 - X15 temporary registers – X16 - X17 intra-call-use registers (PLT, linker) – X18 platform specific use (TLS) – X19 - X28 callee-saved registers – X29 frame pointer – X30 link register – SP stack pointer (XZR) 36 Reference IHI0055B_aapcs64.pdf
  • 37. Aarch64 call convention floats • VFP/SIMD mandatory - no soft float ABI – V0 - V7 arguments and return value – D8 - D15 callee saved registers – V16 - V31 temporary registers • Only the bottom 64 bits (D8 - D15) are preserved; bits 64-127 of V8 - V15 are not saved Reference IHI0055B_aapcs64.pdf
  • 38. System calls • Since the architectures are new, some legacy support has been removed – linux-3.10.18/include/uapi/asm-generic/unistd.h 38
  • 39. System calls • Deprecated system calls are not available: – alarm -> ualarm – epoll_wait -> epoll_pwait – futimesat -> utimensat – getpgrp -> getpgid – pause -> ? – recv -> recvfrom – send -> sendto – time -> ? – ustat -> statfs – bdflush -> gone! – fork -> clone – getdents -> getdents64 – oldumount -> umount – poll -> ppoll – select -> pselect6 – sysctl -> use /proc/sys – uselib -> gone! – utime -> utimes
  • 40. System calls • Pre-at system calls are not available: – open -> openat – unlink -> unlinkat – chmod -> fchmodat – mkdir -> mkdirat – lchown -> fchownat (with AT_SYMLINK_NOFOLLOW) – rename -> renameat – symlink -> symlinkat – link -> linkat – mknod -> mknodat – chown -> fchownat – rmdir -> unlinkat (with AT_REMOVEDIR) – access -> faccessat – readlink -> readlinkat – utimes -> utimensat
  • 41. System calls • System calls without flags parameter: – pipe -> pipe2 – dup2 -> dup3 – epoll_create -> epoll_create1 – inotify_init -> inotify_init1 – eventfd -> eventfd2 – signalfd -> signalfd4
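Code that invokes system calls directly can switch to the *at variants and pass AT_FDCWD to keep the old "relative to the current directory" behaviour. The sketch below uses the POSIX wrappers openat and unlinkat; the helper names create_file and remove_file are hypothetical, and programs that go through the C library's open() or unlink() normally need no change.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <unistd.h>

/* Equivalent of open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600):
 * AT_FDCWD makes a relative path resolve against the current directory. */
int create_file(const char *path)
{
    int fd = openat(AT_FDCWD, path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Equivalent of unlink(path); passing AT_REMOVEDIR as the last
 * argument instead of 0 would give the behaviour of rmdir(path). */
int remove_file(const char *path)
{
    return unlinkat(AT_FDCWD, path, 0);
}
```

Both helpers return 0 on success and -1 on failure, matching the convention of the calls they replace.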
  • 42. Reference • 64-bit ARM - introduction to porting • ARMv8 Instruction Set Overview • ARM Architecture Reference Manual - ARMv8, for ARMv8-A architecture profile • ARMv8 Technology Preview 42

Editor's notes

  1. Address dependency An address dependency exists when the value returned by a read access is used to compute the address of a subsequent read or write access. The address dependency exists even if the value read by the first read access does not change the address of the second read or write access.
  2. SVC: generates an exception targeting EL1; HVC: generates an exception targeting EL2; SMC: generates an exception targeting EL3; DCPSn: debug change processor state to ELn; DRPS: debug restore processor state; WFE: wait for event; WFI: wait for interrupt; SEV: send event