2. von Neumann/Princeton Architecture
Memory holds both data and instructions.
The central processing unit (CPU) fetches instructions from memory.
The separation of CPU and memory distinguishes the stored-program computer.
CPU registers: program counter (PC), instruction register (IR), general-
purpose registers, etc.
von Neumann machines are also known as stored-program computers.
Named after John von Neumann, who wrote the First Draft of a Report on the
EDVAC (Electronic Discrete Variable Automatic Computer), 1945.
2 Processor Architecure and ARM
3. CPU + Memory
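[Figure: the CPU sends the program counter value (PC = 200) to memory as an address; memory returns the instruction stored at address 200, ADD r5,r1,r3, which is placed in the instruction register (IR).]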
4. Harvard Architecture
[Figure: the CPU has two separate memories, each with its own address and data connections: a data memory, and a program memory addressed by the PC.]
5. Princeton Arch. vs. Harvard Arch.
From the programmer’s perspective, general-purpose computers appear to be
Princeton machines.
However, modern high-performance CPUs are, at their heart, frequently
designed as Harvard architectures, with added hardware outside the CPU to
create the appearance of a Princeton design.
A pure Harvard machine can’t use self-modifying code.
Harvard allows two simultaneous memory fetches (an instruction and a data word).
Most DSPs use Harvard architecture for streaming data:
greater memory bandwidth;
more predictable bandwidth.
6. Instruction Set Architecture (ISA)
Instruction set architecture (ISA)
interface between hardware and software
Expresses the operations visible to the programmer or compiler
writer
Portion of the computer visible to software
ISA is supported by:
Organization of programmable storage
Data types and data structures: encodings and representations
Addressing modes for data and instructions
Instruction formats
Instruction/opcode set
Exceptional conditions
7. Application Considerations in ISA Design
Desktops
Emphasis on performance with integer and floating-point (FP) data types; little
regard for program size or power consumption.
Servers
Primarily used for databases, file servers, web applications, and multi-user time-sharing.
Performance on integers and character strings is important; however, virtually
every server processor still includes FP instructions.
Embedded systems
Emphasis on low cost, low power, and small code size. Some instruction types,
e.g., FP, may be optional to reduce chip costs.
8. Processor Internal Storage vs. ISA
The type of internal storage in the processor is the most basic differentiator between ISAs.
Classes of ISAs based on operand storage type:
Stack architecture: operands implicitly on top of stack
Accumulator architecture: 1 operand implicitly in accumulator
General-purpose register (GPR) architecture: only explicit operands
Extended accumulator or special-purpose register architecture:
restrictions on using special registers
9. General-Purpose Register Architecture
Three possible choices for GPR arch.
Register-memory architecture: memory access can be part of
any instruction (memory operands)
Register-register (load-store) architecture: only load & store
instructions can access memory
Almost all designs after 1980
Memory-memory architecture: all operands in memory
Not normally available nowadays
10. ISA Classification
Based on internal storage in processor
(a) Stack (b) Accumulator (c) Register-memory (d) Register-register
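[Figure: the four organizations, each showing the processor's ALU and memory (Mem); in (a) the operands come from the top of stack (TOS).]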
11. Comparison of ISAs
Code sequences for “C=A+B”
Stack        Accumulator   Register-memory   Register-register
Push A       Load A        Load R1, A        Load R1, A
Push B       Add B         Add R3, R1, B     Load R2, B
Add          Store C       Store R3, C       Add R3, R1, R2
Pop C                                        Store R3, C
Implicit operands in Stack/Accumulator arch.
Less flexibility of execution order in Stack arch.
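As a minimal sketch of what "implicit operands" means in the stack column, the C code below interprets the four-instruction stack sequence for C=A+B. The opcode names, the 16-entry operand stack, and the placement of A, B, and C at memory indices 0, 1, and 2 are assumptions for illustration, not part of any real ISA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine (not a real ISA). */
typedef enum { PUSH, ADD, POP } Op;

typedef struct {
    Op op;
    int addr;   /* memory index for PUSH/POP; ignored by ADD */
} Insn;

/* Runs a program on a stack machine with a small operand stack.
   ADD has implicit operands: it pops the top two entries and pushes their sum. */
void run_stack_machine(const Insn *prog, int n, int32_t *mem) {
    int32_t stack[16];
    int top = 0;                      /* number of entries on the stack */
    for (int i = 0; i < n; i++) {
        switch (prog[i].op) {
        case PUSH:
            assert(top < 16);
            stack[top++] = mem[prog[i].addr];
            break;
        case ADD:
            assert(top >= 2);
            stack[top - 2] = stack[top - 2] + stack[top - 1];
            top--;
            break;
        case POP:
            assert(top >= 1);
            mem[prog[i].addr] = stack[--top];
            break;
        }
    }
}
```

ADD names no operands at all: it always consumes the top two stack entries, which is also why the stack version leaves little freedom to reorder instructions.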
12. Instruction Set Complexity
Depends on:
Number of instructions
Instruction formats
Data formats
Addressing modes
General-purpose registers (number and size)
Flow-control mechanisms (conditionals, exceptions)
Instruction set characteristics
Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.
13. CISC vs. RISC
Complex instruction set computer (CISC):
many addressing modes;
can directly operate on operands in memory
many operations.
variable instruction length
Examples: Intel x86 microprocessors and compatibles
Reduced instruction set computer (RISC):
load/store;
operands in memory must be first loaded into register before any
operation
fixed instruction length (in general)
pipelinable instructions.
Examples: ARM, MIPS, Sun SPARC, PowerPC, …
14. Exploit ILP: Superscalar vs. VLIW
A RISC pipeline usually executes one instruction per clock cycle.
Superscalar machines rely on complex hardware to issue/execute
multiple instructions per clock cycle:
Faster execution.
More variability in execution times.
More expensive CPU.
VLIW machines rely on a sophisticated compiler to identify instruction-level
parallelism (ILP) and statically schedule parallel instructions.
15. Finding Parallelism
Independent operations can be performed in parallel:
ADD r0, r0, r1
ADD r2, r2, r3
ADD r6, r4, r0
[Figure: dataflow graph. r0 + r1 → r0 and r2 + r3 → r2 are independent; r4 + r0 → r6 uses the new value of r0.]
Register renaming:
ADD r10, r0, r1
ADD r11, r2, r3
ADD r12, r4, r10
16. Order of Execution
In-order:
Instructions are issued/executed in the program order
Machine stops issuing instructions when the next instruction
can’t be dispatched.
Out-of-order:
Instructions are eligible for issue/execution once source
operands become available
Machine will change order of instructions to keep dispatching.
Substantially faster but also more complex.
17. What is VLIW?
VLIW: very long instruction word
A VLIW instruction consists of several operations to be
executed in parallel
Parallel function units with a shared register file:
[Figure: a register file shared by several function units, with an instruction decode and memory block beneath the function units.]
18. VLIW Cluster
Organized into clusters to accommodate available
register bandwidth:
[Figure: a row of clusters (cluster, cluster, ..., cluster).]
19. VLIW and Compilers
VLIW requires considerably more sophisticated compiler
technology than traditional architectures: the compiler must be able
to extract enough parallelism to keep the instruction slots full.
Many VLIWs have good compiler support.
Contemporary VLIW processors
TriMedia media processors by NXP (formerly Philips
Semiconductors),
SHARC DSP by Analog Devices,
C6000 DSP family by Texas Instruments, and
STMicroelectronics ST200 family based on the Lx architecture.
20. Static Scheduling
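[Figure: a graph of expression operations a to g (left) is packed by the compiler into three VLIW instructions of three slots each (right): a b e / f c nop / d g nop.]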
21. Limits in VLIW
VLIW (at least the original forms) has several
shortcomings that precluded it from becoming mainstream:
VLIW instruction sets are not backward compatible between
implementations. As wider implementations (more execution
units) are built, the instruction set for the wider machines is
not backward compatible with older, narrower
implementations.
Load responses from a memory hierarchy that includes CPU
caches and DRAM do not arrive after a deterministic delay.
This makes static scheduling of load instructions by the compiler very difficult.
22. EPIC
EPIC = Explicitly parallel instruction computing.
Used in the Intel/HP Merced (IA-64) machine.
Incorporates several features that allow the machine to find and
exploit increased parallelism.
Each group of multiple software instructions is called a bundle.
Each bundle has information indicating whether this set of
operations is depended upon by the subsequent bundle.
A speculative load instruction is used as a type of data prefetch.
A check load instruction also aids speculative loads by checking
that a load was not dependent on a previous store.
23. IA-64 Instruction Format
Instructions are bundled with tag to indicate which
instructions can be executed in parallel:
[Figure: a 128-bit bundle: tag | instruction 1 | instruction 2 | instruction 3.]
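In IA-64 the "tag" is called the template: 5 bits, followed by three 41-bit instruction slots (5 + 3 × 41 = 128). A C sketch of pulling the fields apart, assuming the bundle is held as two 64-bit halves with the template in the lowest bits; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit bundle held as two 64-bit halves (lo = bits 0..63, hi = bits 64..127). */
typedef struct {
    uint64_t lo, hi;
} Bundle;

/* The 5-bit template (the "tag") occupies bits 0..4. */
unsigned bundle_template(Bundle b) {
    return (unsigned)(b.lo & 0x1F);
}

/* Each of the three instruction slots is 41 bits: slot 0 = bits 5..45,
   slot 1 = bits 46..86, slot 2 = bits 87..127. */
uint64_t bundle_slot(Bundle b, int slot) {
    const uint64_t mask41 = (1ULL << 41) - 1;
    assert(slot >= 0 && slot <= 2);
    unsigned start = 5 + 41 * (unsigned)slot;   /* first bit of the slot */
    uint64_t bits;
    if (start + 41 <= 64) {
        bits = b.lo >> start;                    /* slot lies entirely in lo */
    } else if (start >= 64) {
        bits = b.hi >> (start - 64);             /* slot lies entirely in hi */
    } else {
        /* slot straddles the boundary: low part from lo, high part from hi */
        bits = (b.lo >> start) | (b.hi << (64 - start));
    }
    return bits & mask41;
}
```

Slot 1 straddles the 64-bit boundary, which is why it needs bits from both halves.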
24. Assembly Language
One-to-one with instructions (more or less).
Basic features:
One instruction per line.
Labels provide names for addresses (usually in first column).
Instructions often start in later columns.
Comments run to the end of the line.
25. ARM Instruction Set
ARM versions.
ARM assembly language.
ARM programming model.
ARM data operations.
ARM flow of control.
26. ARM Versions
ARM architecture has been extended over several
versions.
Latest version covered here: the ARM11 processor family (architecture version ARMv6).
We will concentrate on ARM7.
27. ARM Assembly Language Example
Fairly standard assembly language:
label1 ADR r4,c
LDR r0,[r4] ; a comment
ADR r4,d
LDR r1,[r4]
SUB r0,r0,r1 ; comment
The first register operand (r0 in SUB r0,r0,r1) is the destination.
28. Pseudo-ops
Some assembler directives don’t correspond directly to
instructions:
Define current address.
Reserve storage.
Constants.
29. ARM Instruction Set Format
[Figure: ARM instruction set formats, from the ARM710T datasheet.]
30. ARM Data Types
Word is 32 bits long.
Word can be divided into four 8-bit bytes.
ARM addresses can be 32 bits long.
Addresses refer to bytes: address 4 is the address of byte 4, the first byte of the second word.
Can be configured at power-up as either little- or big-endian mode.
31. Endianness
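Endianness: the ordering of bytes within a larger object, e.g., a
word, i.e., how a large object is stored in memory.
The Motorola 68000 is a big-endian processor.
[Figure: a word in a register, with bytes numbered 3 (most significant) to 0 (least significant), stored in memory at addresses 0x00..10 to 0x00..13 (memory spans 0x00..00 to 0xffffffff). Big endian puts byte 3 at the lowest address; little endian puts byte 0 there.]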
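A small C sketch of the two byte orders, assuming a 4-byte memory buffer whose index 0 stands for the lowest address; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns byte n (0 = least significant, 3 = most significant) of a register
   value, matching the 3 2 1 0 numbering in the figure. */
static uint8_t reg_byte(uint32_t word, int n) {
    assert(n >= 0 && n <= 3);
    return (uint8_t)(word >> (8 * n));
}

/* Stores a 32-bit word into mem[0..3], where mem[0] is the lowest address. */
void store_word(uint32_t word, uint8_t mem[4], int big_endian) {
    for (int addr = 0; addr < 4; addr++) {
        /* Big endian puts byte 3 at the lowest address; little endian puts byte 0 there. */
        mem[addr] = big_endian ? reg_byte(word, 3 - addr) : reg_byte(word, addr);
    }
}
```

Storing 0x11223344 big-endian puts 0x11 at the lowest address; little-endian puts 0x44 there.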
32. ARM Programming Model
Registers r0 to r15; r15 is the program counter (PC).
CPSR (bits 31 to 0), with the NZCV flags in bits 31 to 28.
33. The Program Status Registers (CPSR and SPSRs)
Bit layout: N Z C V in bits 31 to 28; I, F, T in bits 7, 6, 5; Mode in bits 4 to 0.
Condition code flags: copies of the ALU status flags (latched if the
instruction has the "S" bit set).
N = Negative result from ALU flag.
Z = Zero result from ALU flag.
C = ALU operation Carried out.
V = ALU operation oVerflowed.
Interrupt disable bits:
I = 1 disables the IRQ.
F = 1 disables the FIQ.
T bit (Architecture v4T only):
T = 0: processor in ARM state.
T = 1: processor in Thumb state.
Mode bits:
M[4:0] define the processor mode.
34. Processor Modes
The ARM has six operating modes (the number in parentheses is the value of the mode bits M[4:0]):
User (16): unprivileged mode under which most tasks run.
FIQ (17): entered when a high-priority (fast) interrupt is raised.
IRQ (18): entered when a low-priority (normal) interrupt is raised.
Supervisor (19): entered on reset and when a Software Interrupt instruction is executed.
Abort (23): used to handle memory access violations.
Undef (27): used to handle undefined instructions.
ARM Architecture Version 4 adds a seventh mode:
System (31): privileged mode using the same registers as User mode.
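The PSR layout and the mode numbers above can be decoded with plain shifts and masks. This C sketch assumes the bit positions of the PSR (flags in bits 31 to 28; I, F, T in bits 7, 6, 5; M[4:0] in bits 4 to 0); the helper and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts bits hi..lo (inclusive) of a 32-bit register. */
static uint32_t bits(uint32_t reg, unsigned hi, unsigned lo) {
    assert(lo <= hi && hi <= 31);
    uint32_t width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1);
    return (reg >> lo) & mask;
}

typedef struct {
    unsigned n, z, c, v;   /* condition code flags, bits 31..28 */
    unsigned i, f, t;      /* IRQ disable (7), FIQ disable (6), Thumb state (5) */
    unsigned mode;         /* M[4:0], bits 4..0 */
} Psr;

Psr decode_psr(uint32_t psr) {
    Psr p = {
        bits(psr, 31, 31), bits(psr, 30, 30), bits(psr, 29, 29), bits(psr, 28, 28),
        bits(psr, 7, 7), bits(psr, 6, 6), bits(psr, 5, 5),
        bits(psr, 4, 0)
    };
    return p;
}

/* Names for the M[4:0] values listed above; other values are reported as unknown. */
const char *mode_name(unsigned mode) {
    switch (mode) {
    case 16: return "User";
    case 17: return "FIQ";
    case 18: return "IRQ";
    case 19: return "Supervisor";
    case 23: return "Abort";
    case 27: return "Undef";
    case 31: return "System";
    default: return "unknown";
    }
}
```

For example, 0x600000D3 decodes as Z and C set, IRQ and FIQ disabled, ARM state, and Supervisor mode (19).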
35. Condition Flags
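Flag meanings for logical and arithmetic instructions:
Negative (N='1')
Logical instruction: no meaning.
Arithmetic instruction: bit 31 of the result has been set; indicates a negative number in signed operations.
Zero (Z='1')
Logical instruction: result is all zeroes.
Arithmetic instruction: result of operation was zero.
Carry (C='1')
Logical instruction: after a shift operation, '1' was left in the carry flag.
Arithmetic instruction: result was greater than 32 bits (carry out of bit 31; for subtraction, C='1' means no borrow).
oVerflow (V='1')
Logical instruction: no meaning.
Arithmetic instruction: result was greater than 31 bits; indicates a possible corruption of the sign bit in signed numbers.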
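To tie the flag meanings to actual arithmetic, here is a C sketch that computes NZCV for an addition with carry-in, in the style of ADDS/ADCS. It models subtraction as a + NOT b + 1, which is how ARM produces C = 1 for "no borrow"; the function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int n, z, c, v;
} Flags;

/* Returns a + b + carry_in and writes the resulting NZCV flags to *f. */
uint32_t add_with_carry(uint32_t a, uint32_t b, unsigned carry_in, Flags *f) {
    assert(carry_in <= 1);
    uint64_t unsigned_sum = (uint64_t)a + b + carry_in;
    uint32_t result = (uint32_t)unsigned_sum;
    f->n = (int)(result >> 31);                  /* bit 31 of the result */
    f->z = (result == 0);
    f->c = (unsigned_sum >> 32) != 0;            /* carry out of bit 31 */
    /* Overflow: both addends have the same sign and the result's sign differs. */
    f->v = (int)((~(a ^ b) & (a ^ result)) >> 31);
    return result;
}

/* SUBS computes a - b as a + NOT b + 1, so C = 1 means "no borrow". */
uint32_t sub_flags(uint32_t a, uint32_t b, Flags *f) {
    return add_with_carry(a, ~b, 1, f);
}
```

Adding 0x7FFFFFFF + 1 sets N and V but not C (signed overflow), while 0xFFFFFFFF + 1 sets Z and C but not V (unsigned carry).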
36. Conditional Execution
Most instruction sets only allow branches to be executed
conditionally.
However, by reusing the condition evaluation hardware, ARM
effectively increases the number of instructions.
All instructions contain a condition field which determines whether
the CPU will execute them.
Non-executed instructions still take 1 cycle.
They still have to complete the cycle so as to allow fetching and decoding of
the following instructions.
This removes the need for many branches, which stall the
pipeline (3 cycles to refill).
Allows very dense in-line code, without branches.
The time penalty of not executing several conditional instructions
is frequently less than the overhead of the branch
or subroutine call that would otherwise be needed.
37. The Condition Field
Cond occupies bits 31 to 28 of every instruction:
0000 = EQ - Z set (equal)
0001 = NE - Z clear (not equal)
0010 = HS / CS - C set (unsigned higher or same)
0011 = LO / CC - C clear (unsigned lower)
0100 = MI - N set (negative)
0101 = PL - N clear (positive or zero)
0110 = VS - V set (overflow)
0111 = VC - V clear (no overflow)
1000 = HI - C set and Z clear (unsigned higher)
1001 = LS - C clear or Z set (unsigned lower or same)
1010 = GE - N set and V set, or N clear and V clear (>=)
1011 = LT - N set and V clear, or N clear and V set (<)
1100 = GT - Z clear, and either N set and V set, or N clear and V clear (>)
1101 = LE - Z set, or N set and V clear, or N clear and V set (<=)
1110 = AL - always
1111 = NV - reserved.
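The condition table translates directly into a predicate on the four flags. This C sketch evaluates a 4-bit condition field; treating the reserved NV encoding as "never execute" is an assumption made here, since the table only marks it reserved. The function and type names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool n, z, c, v;
} Flags;

/* Returns true if an instruction with 4-bit condition field `cond`
   (bits 31..28) would execute given the current NZCV flags. */
bool condition_passed(unsigned cond, Flags f) {
    assert(cond <= 0xF);
    switch (cond) {
    case 0x0: return f.z;                  /* EQ */
    case 0x1: return !f.z;                 /* NE */
    case 0x2: return f.c;                  /* HS/CS */
    case 0x3: return !f.c;                 /* LO/CC */
    case 0x4: return f.n;                  /* MI */
    case 0x5: return !f.n;                 /* PL */
    case 0x6: return f.v;                  /* VS */
    case 0x7: return !f.v;                 /* VC */
    case 0x8: return f.c && !f.z;          /* HI */
    case 0x9: return !f.c || f.z;          /* LS */
    case 0xA: return f.n == f.v;           /* GE */
    case 0xB: return f.n != f.v;           /* LT */
    case 0xC: return !f.z && f.n == f.v;   /* GT */
    case 0xD: return f.z || f.n != f.v;    /* LE */
    case 0xE: return true;                 /* AL */
    default:  return false;                /* NV: reserved; assumed "never" here */
    }
}
```

After CMP r0,r1 with r0 = 3 and r1 = 5 the flags are N=1, Z=0, C=0, V=0, so LT and LO pass while GE and GT fail.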
38. Using and updating the Condition Field
To execute an instruction conditionally, simply postfix it with
the appropriate condition:
For example an add instruction takes the form:
ADD r0,r1,r2 ; r0 = r1 + r2 (ADDAL)
To execute this only if the zero flag is set:
ADDEQ r0,r1,r2 ; If zero flag set then…
; ... r0 = r1 + r2
By default, data processing operations do not affect the
condition flags (apart from the comparisons where this is the
only effect). To cause the condition flags to be updated, the S
bit of the instruction needs to be set by postfixing the
instruction (and any condition code) with an “S”.
For example to add two numbers and set the condition flags:
ADDS r0,r1,r2 ; r0 = r1 + r2
; ... and set flags
39. Data processing Instructions
Largest family of ARM instructions, all sharing the same
instruction format.
Contains:
Arithmetic operations
Comparisons (no results - just set condition codes)
Logical operations
Data movement between registers
Remember, this is a load/store architecture:
These instructions only work on registers, NOT memory.
They each perform a specific operation on one or two
operands.
First operand always a register - Rn
Second operand sent to the ALU via barrel shifter.
We will examine the barrel shifter shortly.
41. Multiplication Instructions
The basic ARM provides two multiplication instructions.
Multiply
MUL{<cond>}{S} Rd, Rm, Rs ; Rd = Rm * Rs
Multiply Accumulate - does addition for free
MLA{<cond>}{S} Rd, Rm, Rs, Rn ; Rd = (Rm * Rs) + Rn
Restrictions on use:
Rd and Rm cannot be the same register.
Can be avoided by swapping Rm and Rs around. This works because
multiplication is commutative.
Cannot use PC.
These will be picked up by the assembler if overlooked.
Operands can be considered signed or unsigned (the low 32 bits of the product are the same either way).
Up to user to interpret correctly.
42. Comparisons
The only effect of the comparisons is to
UPDATE THE CONDITION FLAGS. Thus there is no need to set the S bit.
Operations are:
CMP operand1 - operand2, but result not written
CMN operand1 + operand2, but result not written
TST operand1 AND operand2, but result not written
TEQ operand1 EOR operand2, but result not written
Syntax:
<Operation>{<cond>} Rn, Operand2
Examples:
CMP r0, r1
TSTEQ r2, #5
43. Logical Operations
Operations are:
AND operand1 AND operand2
EOR operand1 EOR operand2
ORR operand1 OR operand2
BIC operand1 AND NOT operand2 [i.e., bit clear]
Syntax:
<Operation>{<cond>}{S} Rd, Rn, Operand2
Examples:
AND r0, r1, r2
BICEQ r2, r3, #7
EORS r1,r3,r0
44. Data Movement
Operations are:
MOV operand2
MVN NOT operand2
Note that these make no use of operand1.
Syntax:
<Operation>{<cond>}{S} Rd, Operand2
Examples:
MOV r0, r1
MOVS r2, #10
MVNEQ r1,#0
45. The Barrel Shifter
The ARM doesn’t have actual shift instructions.
Instead it has a barrel shifter which provides a
mechanism to carry out shifts as part of other
instructions.
So what operations does the barrel shifter support?
46. Barrel Shifter - Left Shift
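Shifts left by the specified amount (multiplies by powers
of two), e.g.
LSL #5 = multiply by 32
[Figure: Logical Shift Left (LSL): bits move toward the MSB, zeros enter at the LSB, and the last bit shifted out goes to the carry flag (CF).]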
47. Barrel Shifter - Right Shifts
Logical Shift Right (LSR)
• Shifts right by the specified amount (divides by powers of two), e.g.
LSR #5 = divide by 32.
[Figure: zeros enter at the MSB; the last bit shifted out of the destination goes to CF.]
Arithmetic Shift Right (ASR)
• Shifts right (divides by powers of two) and preserves the sign bit, for
2's complement operations, e.g.
ASR #5 = divide by 32 (rounding toward minus infinity for negative values).
[Figure: the sign bit is shifted in at the MSB; the last bit shifted out goes to CF.]
48. Barrel Shifter - Rotations
Rotate Right (ROR)
• Similar to a logical shift right, but the bits wrap around as they leave the
LSB and appear as the MSB, e.g. ROR #5.
• Note the last bit rotated is also used as the Carry Out.
[Figure: Rotate Right: bits leaving the LSB of the destination re-enter at the MSB; the last one also goes to CF.]
Rotate Right Extended (RRX)
• This operation uses the CPSR C flag as a 33rd bit.
• Rotates right by 1 bit. Encoded as ROR #0.
[Figure: Rotate Right through Carry: the LSB goes to CF and the old CF enters at the MSB.]
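The five shifter operations, with their carry-out, can be modeled in C as below. This is a sketch for immediate shift amounts 1 to 31 only; the special encodings (LSR #32, ASR #32, register-specified amounts of 0 or above 31) are deliberately not modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Each function returns the shifted value and writes the shifter carry-out to *carry. */

uint32_t lsl(uint32_t x, unsigned n, unsigned *carry) {
    assert(n >= 1 && n <= 31);
    *carry = (x >> (32 - n)) & 1;          /* last bit shifted out of bit 31 */
    return x << n;                         /* zeros enter at the LSB */
}

uint32_t lsr(uint32_t x, unsigned n, unsigned *carry) {
    assert(n >= 1 && n <= 31);
    *carry = (x >> (n - 1)) & 1;           /* last bit shifted out of bit 0 */
    return x >> n;                         /* zeros enter at the MSB */
}

uint32_t asr(uint32_t x, unsigned n, unsigned *carry) {
    assert(n >= 1 && n <= 31);
    *carry = (x >> (n - 1)) & 1;
    uint32_t sign_fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0;  /* copies of the sign bit */
    return (x >> n) | sign_fill;
}

uint32_t ror(uint32_t x, unsigned n, unsigned *carry) {
    assert(n >= 1 && n <= 31);
    uint32_t result = (x >> n) | (x << (32 - n));
    *carry = result >> 31;                 /* last bit rotated out is now the MSB */
    return result;
}

/* *carry is both input (old C, enters at the MSB) and output (new C = old bit 0). */
uint32_t rrx(uint32_t x, unsigned *carry) {
    uint32_t result = (x >> 1) | ((*carry & 1u) << 31);
    *carry = x & 1;
    return result;
}
```

asr(0xFFFFFFFF, 5) returns 0xFFFFFFFF (-1), whereas C's -1 / 32 is 0: ASR rounds toward minus infinity.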
49. Barrel Shifter
Barrel shifter: a hardware device that can shift or rotate a data word by any number of bits in
a single operation. It is implemented like a multiplexer: each output can be connected to any
input, depending on the shift distance.
ECE 692 L02-ISA.49 Processor Architecure and
ARM
50. Using the Barrel Shifter: the Second Operand
[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces the Result.]
* Register, optionally with shift operation applied.
Shift value can be either:
5-bit unsigned integer
Specified in bottom byte of another register.
* Immediate value
• 8-bit number
• Can be rotated right through an even number of positions.
• Assembler will calculate rotate for you from constant.
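How the assembler "calculates the rotate for you" can be sketched in C: it searches the even rotations for one that brings the constant down to 8 bits. The 12-bit field layout (rotate/2 in bits 11 to 8, the 8-bit value in bits 7 to 0) is the usual ARM data-processing immediate format; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotates a 32-bit value right by r bits (r in 0..31). */
static uint32_t rotr32(uint32_t x, unsigned r) {
    assert(r < 32);
    return r == 0 ? x : (x >> r) | (x << (32 - r));
}

/* Tries to express `value` as an 8-bit constant rotated right by an even amount.
   On success stores the 12-bit operand field: rotate/2 in bits 11..8, imm8 in bits 7..0. */
bool encode_arm_immediate(uint32_t value, uint32_t *field) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* If value == ROR(imm8, rot), then rotating value left by rot gives imm8. */
        uint32_t imm8 = rotr32(value, (32 - rot) % 32);
        if (imm8 <= 0xFF) {
            *field = ((rot / 2) << 8) | imm8;
            return true;
        }
    }
    return false;  /* constant needs another instruction sequence or a literal load */
}

/* Expands a 12-bit operand field back into the 32-bit constant. */
uint32_t decode_arm_immediate(uint32_t field) {
    assert(field <= 0xFFF);
    return rotr32(field & 0xFF, 2 * (field >> 8));
}
```

0xFF000000 encodes as 0xFF rotated right by 8 (field 0x4FF), but 0x101 has no encoding because its set bits span 9 positions.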
51. Second Operand : Shifted Register
The amount by which the register is to be shifted is
contained in either:
the immediate 5-bit field in the instruction
NO OVERHEAD
Shift is done for free - executes in single cycle.
the bottom byte of a register (not PC)
Then takes an extra cycle to execute, because the
ARM doesn’t have enough read ports to read 3 registers at
once.
In that case it is the same as on other processors where the shift is a
separate instruction.
If no shift is specified, then a default shift is applied: LSL #0,
i.e. the barrel shifter has no effect on the value in the register.
52. Second Operand : Using a Shifted Register
Using a multiplication instruction to multiply by a constant means first
loading the constant into a register and then waiting a number of internal
cycles for the instruction to complete.
A better solution can often be found by using some combination
of MOVs, ADDs, SUBs and RSBs with shifts.
Multiplications by a constant equal to ((power of 2) ± 1) can be done in one
cycle.
Example: r0 = r1 * 5
Example: r0 = r1 + (r1 * 4)
ADD r0, r1, r1, LSL #2
Example: r2 = r3 * 105
Example: r2 = r3 * 15 * 7
Example: r2 = r3 * (16 - 1) * (8 - 1)
RSB r2, r3, r3, LSL #4 ; r2 = r3 * 15
RSB r2, r2, r2, LSL #3 ; r2 = r2 * 7
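The shift-and-add identities above can be checked in C. Each helper mirrors one RSB or ADD with an LSL-shifted second operand; unsigned 32-bit arithmetic wraps the same way the registers do. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models "RSB rd, rn, rn, LSL #sh": rd = (rn << sh) - rn = rn * (2^sh - 1). */
static uint32_t rsb_lsl(uint32_t rn, unsigned sh) {
    assert(sh < 32);
    return (rn << sh) - rn;
}

/* Models "ADD rd, rn, rn, LSL #sh": rd = rn + (rn << sh) = rn * (2^sh + 1). */
static uint32_t add_lsl(uint32_t rn, unsigned sh) {
    assert(sh < 32);
    return rn + (rn << sh);
}

uint32_t times5(uint32_t r1) {
    return add_lsl(r1, 2);                 /* ADD r0, r1, r1, LSL #2 */
}

uint32_t times105(uint32_t r3) {
    uint32_t r2 = rsb_lsl(r3, 4);          /* RSB r2, r3, r3, LSL #4 ; r3 * 15 */
    return rsb_lsl(r2, 3);                 /* RSB r2, r2, r2, LSL #3 ; r2 * 7 */
}
```

times105 uses two single-cycle instructions in place of loading 105 into a register and issuing a MUL.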
53. ARM Load/Store Instructions
LDR, LDRH, LDRB : load (half-word, byte)
STR, STRH, STRB : store (half-word, byte)
Addressing modes:
register indirect : LDR r0,[r1]
with second register : LDR r0,[r1,-r2]
with constant : LDR r0,[r1,#4]
54. ARM ADR Pseudo-op
Cannot refer to an address directly in an instruction.
Generate value by performing arithmetic on PC.
ADR pseudo-op generates instruction required to
calculate address:
ADR r1,FOO
55. Additional addressing modes
Base-plus-offset addressing:
LDR r0,[r1,#16]
Loads from location r1+16
Auto-indexing (pre-indexing with write-back) updates the base register:
LDR r0,[r1,#16]!
Loads r0 from r1+16, then writes r1+16 back into r1.
Post-indexing fetches, then does offset:
LDR r0,[r1],#16
Loads r0 from r1, then adds 16 to r1.
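The three addressing modes differ only in which address is used and whether the base register is updated. This C sketch models memory as a hypothetical 64-byte array addressed by byte, with word-aligned loads in host byte order; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_BYTES = 64 };   /* size of the hypothetical memory */

static uint32_t load_word(const uint8_t *mem, uint32_t addr) {
    assert(addr % 4 == 0 && addr + 4 <= MEM_BYTES);   /* aligned and in range */
    uint32_t w;
    memcpy(&w, mem + addr, sizeof w);                 /* host byte order */
    return w;
}

/* LDR r0,[r1,#off] : base-plus-offset, r1 unchanged */
uint32_t ldr_offset(const uint8_t *mem, uint32_t r1, int32_t off) {
    return load_word(mem, r1 + (uint32_t)off);
}

/* LDR r0,[r1,#off]! : loads from r1+off and writes r1+off back to r1 */
uint32_t ldr_pre_writeback(const uint8_t *mem, uint32_t *r1, int32_t off) {
    *r1 += (uint32_t)off;
    return load_word(mem, *r1);
}

/* LDR r0,[r1],#off : loads from r1, then adds off to r1 */
uint32_t ldr_post(const uint8_t *mem, uint32_t *r1, int32_t off) {
    uint32_t value = load_word(mem, *r1);
    *r1 += (uint32_t)off;
    return value;
}
```

Post-indexing is the natural fit for walking an array, since each load leaves the base register pointing at the next element.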
56. Example: C Assignments
C:
x = (a + b) - c;
Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b, reusing r4
LDR r1,[r4] ; get value of b
ADD r3,r0,r1 ; compute a+b
ADR r4,c ; get address for c
LDR r2,[r4] ; get value of c
SUB r3,r3,r2 ; complete computation of x
ADR r4,x ; get address for x
STR r3,[r4] ; store value of x
57. Example: C Assignment
C:
y = a*(b+c);
Assembler:
ADR r4,b ; get address for b
LDR r0,[r4] ; get value of b
ADR r4,c ; get address for c
LDR r1,[r4] ; get value of c
ADD r2,r0,r1 ; compute partial result
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MUL r2,r0,r2 ; compute final value for y (Rd must differ from Rm)
ADR r4,y ; get address for y
STR r2,[r4] ; store y
58. Example: C Assignment
C:
z = (a << 2) | (b & 15);
Assembler:
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
MOV r0,r0,LSL #2 ; perform shift
ADR r4,b ; get address for b
LDR r1,[r4] ; get value of b
AND r1,r1,#15 ; perform AND
ORR r1,r0,r1 ; perform OR
ADR r4,z ; get address for z
STR r1,[r4] ; store value for z
59. ARM Flow of Control
All operations can be performed conditionally, testing
CPSR:
EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS,
GE, LT, GT, LE
Branch operation:
B #100
Can be performed conditionally.
60. Example: if Statement
C:
if (a < b) { x = 5; y = c + d; } else x = c - d;
Assembler:
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
BGE fblock ; if a >= b, branch to false block
61. if Statement, cont’d.
; true block
MOV r0,#5 ; generate value for x
ADR r4,x ; get address for x
STR r0,[r4] ; store x
ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value of d
ADD r0,r0,r1 ; compute y
ADR r4,y ; get address for y
STR r0,[r4] ; store y
B after ; branch around false block
; false block
fblock ADR r4,c ; get address for c
LDR r0,[r4] ; get value of c
ADR r4,d ; get address for d
LDR r1,[r4] ; get value for d
SUB r0,r0,r1 ; compute c-d
ADR r4,x ; get address for x
STR r0,[r4] ; store value of x
after ...
62. Conditional Instruction Implementation
; compute and test condition
ADR r4,a ; get address for a
LDR r0,[r4] ; get value of a
ADR r4,b ; get address for b
LDR r1,[r4] ; get value for b
CMP r0,r1 ; compare a < b
; true block
MOVLT r0,#5 ; generate value for x
ADRLT r4,x ; get address for x
STRLT r0,[r4] ; store x
ADRLT r4,c ; get address for c
LDRLT r0,[r4] ; get value of c
ADRLT r4,d ; get address for d
LDRLT r1,[r4] ; get value of d
ADDLT r0,r0,r1 ; compute y
ADRLT r4,y ; get address for y
STRLT r0,[r4] ; store y
; false block
ADRGE r4,c ; get address for c
LDRGE r0,[r4] ; get value of c
ADRGE r4,d ; get address for d
LDRGE r1,[r4] ; get value for d
SUBGE r0,r0,r1 ; compute c-d
ADRGE r4,x ; get address for x
STRGE r0,[r4] ; store value of x
63. Example: switch Statement
C:
switch (test) { case 0: … break; case 1: … }
Assembler:
ADR r2,test ; get address for test
LDR r0,[r2] ; load value for test
ADR r1,switchtab ; load address for switch table
LDR pc,[r1,r0,LSL #2] ; index switch table and jump to the case
switchtab DCD case0
DCD case1
...
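In C terms, the switch table is an array of code addresses indexed by the case value, with LSL #2 scaling the index by the 4-byte size of each DCD entry. This sketch uses function pointers for hypothetical case bodies; unlike the assembly, it checks that test is in range before indexing.

```c
#include <assert.h>

/* Hypothetical case bodies; each returns a value so the dispatch is observable. */
static int case0(void) { return 100; }
static int case1(void) { return 101; }

/* Like "switchtab DCD case0 / DCD case1": a table of code addresses. */
static int (*const switchtab[])(void) = { case0, case1 };

int dispatch(unsigned test) {
    /* The assembly indexes the table without a range check; guard it here. */
    assert(test < sizeof switchtab / sizeof switchtab[0]);
    /* C scales the index by the entry size automatically, as LSL #2 does. */
    return switchtab[test]();
}
```

A compiler may emit exactly this kind of table for a dense switch, adding the bounds check and a jump to the default case when test is out of range.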
64. Example: FIR filter
C:
for (i=0, f=0; i<N; i++)
f = f + c[i]*x[i];
Assembler:
; loop initiation code
MOV r0,#0 ; use r0 for i
MOV r8,#0 ; use separate index for arrays
ADR r2,N ; get address for N
LDR r1,[r2] ; get value of N
MOV r2,#0 ; use r2 for f
ADR r3,c ; load r3 with base of c
ADR r5,x ; load r5 with base of x
65. FIR filter, cont’d.
; loop body
loop LDR r4,[r3,r8] ; get c[i]
LDR r6,[r5,r8] ; get x[i]
MUL r4,r6,r4 ; compute c[i]*x[i] (Rd must differ from Rm)
ADD r2,r2,r4 ; add into running sum
ADD r8,r8,#4 ; add one word offset to array index
ADD r0,r0,#1 ; add 1 to i
CMP r0,r1 ; exit?
BLT loop ; if i < N, continue
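For comparison, here is a C sketch that follows the register usage of the assembly loop (i in r0, a byte offset in r8, f in r2). Because the assembly tests i < N only at the bottom of the loop, it runs the body at least once; the sketch makes that assumption explicit with N >= 1. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the assembly loop; unsigned arithmetic wraps like 32-bit registers. */
int32_t fir(const int32_t *c, const int32_t *x, int32_t n) {
    assert(n >= 1);        /* the assembly executes the body before testing i < N */
    uint32_t f = 0;        /* r2: running sum */
    uint32_t r8 = 0;       /* byte offset into both arrays */
    int32_t i = 0;         /* r0 */
    do {
        uint32_t ci = (uint32_t)c[r8 / 4];   /* LDR r4,[r3,r8] */
        uint32_t xi = (uint32_t)x[r8 / 4];   /* LDR r6,[r5,r8] */
        f += ci * xi;                        /* MUL, then ADD r2,r2,r4 */
        r8 += 4;                             /* ADD r8,r8,#4 */
        i += 1;                              /* ADD r0,r0,#1 */
    } while (i < n);                         /* CMP r0,r1 ; BLT loop */
    return (int32_t)f;
}
```

The low 32 bits of a product are the same for signed and unsigned operands, so the unsigned accumulation gives the same result as the assembly for negative coefficients too.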
66. ARM Subroutine Linkage
Branch and link instruction:
BL foo
Copies the return address (the address of the instruction after the BL) to r14, the link register.
To return from subroutine:
MOV r15,r14
67. Summary
All instructions are 32 bits long.
Load/store architecture
Data processing instructions act only on registers
Specific memory access instructions with powerful auto-
indexing addressing modes.
Most instructions operate in single cycle.
Some multi-register operations take longer.
All instructions can be executed conditionally.