2. Introduction
● RISC-V is an open-source ISA designed at the University of
California, Berkeley.
● The ISA is designed to target a wide range of applications, from
HPC to embedded:
○ RISC-V is a variable-length ISA in which an instruction can be any
multiple of 16 bits long.
○ The RISC-V ISA leaves some encoding space vacant, which third parties
are allowed to use for their own extensions.
● RISC-V application code size is considerably larger than that of
alternative commercial ISAs.
3. Factors to consider:
● Compilers: GCC, Clang, others
● ABI: UABI, EABI
● ISA: RV32, RV64
● Language: C, C++, Rust
● Workload: IoT, ML
● Toolchain options: -Ox, -msave-restore, -ffunction-sections,
-fdata-sections, --gc-sections (linker)
● Short immediate fields
● Possible side effects:
○ Performance
○ Power
○ Implementation complexity
4. Analysis Script
● Input: ELF files
● Call objdump
● Parse the objdump output:
○ Symbol tables
○ Saves the disassembly
● Construct the instructions record
● Very limited CFG recovery
● Perform the given optimisations sequentially
● Report results either as a CSV or to standard output
● BFD and ctypes
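The first steps of the pipeline (call objdump, parse its output, build an instructions record) can be sketched as below. This is a minimal illustration, not the script from the talk: the function names, the record layout and the default tool name `riscv64-unknown-elf-objdump` are assumptions. Functions are keyed by (address, name) rather than by name alone.

```python
import re
import subprocess

# "20405458 <bt_rand>:" marks the start of a function in objdump -d output.
FUNC_RE = re.compile(r"^([0-9a-f]+) <(.+)>:$")
# "2040545a: c04a sw s2,0(sp)" is one disassembled instruction.
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(\S+)\s*(.*)$")


def disassemble(elf_path, objdump="riscv64-unknown-elf-objdump"):
    """Run objdump -d on an ELF file and return its text output.

    The tool name is an assumption; substitute the objdump of your toolchain.
    """
    result = subprocess.run(
        [objdump, "-d", elf_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_disassembly(text):
    """Build a hypothetical instructions record from objdump text.

    Returns {(function_address, function_name): [(address, size_in_bytes,
    mnemonic, operands), ...]}.
    """
    functions = {}
    current = None
    for line in text.splitlines():
        header = FUNC_RE.match(line.strip())
        if header:
            current = (int(header.group(1), 16), header.group(2))
            functions[current] = []
            continue
        insn = INSN_RE.match(line)
        if insn and current is not None:
            addr, encoding, mnemonic, operands = insn.groups()
            # Two hex digits per byte, so a 4-digit encoding is a 16-bit instruction.
            functions[current].append(
                (int(addr, 16), len(encoding) // 2, mnemonic, operands.strip())
            )
    return functions
```

Typical use would be `parse_disassembly(disassemble("app.elf"))`, where `app.elf` is a placeholder path.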
5. Finding optimisation opportunities
● Most new instruction proposals do one of the following:
○ Fuse a common instruction sequence into a single instruction.
○ Convert a single normal instruction into a compressed one.
● We can identify new optimisation opportunities with one of the following two
methods:
○ Track how instruction results get used in other instructions.
○ Track how instruction operands get generated.
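As an illustration of the second method, the sketch below records, for each source register an instruction reads, which instruction last wrote it, and counts the resulting producer/consumer mnemonic pairs. Frequent pairs are candidates for fusion. The tuple layout `(mnemonic, dest, sources)` and the function name are assumptions made for this example, and the sketch only handles straight-line code.

```python
from collections import Counter


def count_producer_consumer_pairs(insns):
    """Count (producer mnemonic, consumer mnemonic) pairs in straight-line code.

    insns: list of (mnemonic, dest, sources), where dest is a register name
    or None and sources is a tuple of register names.
    """
    last_def = {}      # register -> mnemonic of the instruction that last wrote it
    pairs = Counter()
    for mnemonic, dest, sources in insns:
        for reg in sources:
            if reg in last_def:
                pairs[(last_def[reg], mnemonic)] += 1
        if dest is not None:
            last_def[dest] = mnemonic
    return pairs


block = [
    ("li", "a5", ()),
    ("mul", "a0", ("a0", "a5")),
    ("add", "a0", ("a0", "a5")),
]
pairs = count_producer_consumer_pairs(block)
```

Here `pairs` contains one occurrence each of `("li", "mul")`, `("mul", "add")` and `("li", "add")`.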
6. A Couple of Issues:
● When tracking instructions, what do we do
when we reach a possible change of
control flow?
○ Unconditional calls outside the current
function: we save the tracking buffer.
○ Conditional calls and branch targets: do we stop
tracking and remove the tracking chain record,
or do we keep the instructions we have
tracked so far?
● Function names cannot be used as a
unique key!
UABI Calling Convention
9. TBLJAL
Rationale:
Function calls and jumps to fixed labels
typically take 32-bit or 64-bit instruction
sequences.
Proposed Solution:
• Create a table of X entries.
• Store jump addresses in the table.
• Distinguish entries in the table by link register
(x0, x1 or x5) using the lower two bits.
• Create a new compressed instruction
that jumps to addresses in the new jump table.
TBLJAL Table:
[Figure: MTBLJALVEC.base points to a table of entries Addr 0, Addr 1, ..., Addr N, ..., Addr 255, each XLEN bits wide; entry N refers to Code N.]
11. TBLJAL Analysis
● Get all function calls and count the number of times each is used.
● Go through the entries and eliminate all entries that won't gain
from substitution: (JAL, J) < 3.
● Change the weight of JALR and JR entries to be 3 * count.
● Get the most common X entries.
● Replace the entries in the instructions record.
● Calculate the new instructions record size.
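The selection steps above can be sketched as follows. This is a hedged illustration rather than the script used in the talk: the input format (pairs of mnemonic and resolved target), the function name and the constant names are assumptions. The threshold of 3 and the weight of 3 come from the slide.

```python
from collections import Counter

MIN_DIRECT_USES = 3   # from the slide: JAL/J targets used fewer than 3 times are dropped
INDIRECT_WEIGHT = 3   # from the slide: JALR/JR entries are weighted 3 * count


def select_tbljal_entries(calls, table_size):
    """Pick the targets that would go into a TBLJAL table of table_size entries.

    calls: iterable of (mnemonic, target) pairs, one per call or jump site.
    Targets of jalr/jr are assumed to be resolved already (an assumption of
    this sketch).
    """
    direct = Counter()
    indirect = Counter()
    for mnemonic, target in calls:
        if mnemonic in ("jal", "j"):
            direct[target] += 1
        elif mnemonic in ("jalr", "jr"):
            indirect[target] += 1

    weights = Counter()
    for target, count in direct.items():
        if count >= MIN_DIRECT_USES:
            weights[target] += count
    for target, count in indirect.items():
        weights[target] += INDIRECT_WEIGHT * count

    return [target for target, _ in weights.most_common(table_size)]


calls = (
    [("jal", "f")] * 3 + [("jal", "g")] * 2 + [("jalr", "h")] + [("j", "k")] * 4
)
selected = select_tbljal_entries(calls, table_size=10)
```

In this example `g` is dropped because it is reached by only two direct jumps, while `k` ranks first with a weight of 4.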
12. Determining the value of X
[Figure: Table Size vs Saving (IOT_Application). Horizontal axis: table size, 0 to 700. Vertical axis: saving, 0% to 12%.]
13. PUSH POP POPRET
<bt_rand>:
20405458: 1141 addi sp,sp,-16
2040545a: c04a sw s2,0(sp)
2040545c: 70000937 lui s2,0x70000
20405460: 62090613 addi a2,s2,1568
20405464: c422 sw s0,8(sp)
20405466: c226 sw s1,4(sp)
20405468: c606 sw ra,12(sp)
2040546a: 842a mv s0,a0
2040546c: 84ae mv s1,a1
<function body>
20405494: 4501 li a0,0
20405496: 40b2 lw ra,12(sp)
20405498: 4422 lw s0,8(sp)
2040549a: 4492 lw s1,4(sp)
2040549c: 4902 lw s2,0(sp)
2040549e: 0141 addi sp,sp,16
204054a0: 8082 ret
20405458 <bt_rand>:
20405458: <16-bit> push {ra,s0-s2},{a0-a1},-16
2040545c: 70000937 lui s2,0x70000
20405460: 62090613 addi a2,s2,1568
<function body>
20405496: <16-bit> popret {ra,s0-s2},{0} 16
Rationale:
Very often in function prologues and epilogues, we need to save
multiple registers to the stack and restore them from it.
Proposed Solution:
Instead of using multiple sw/lw instructions, we can introduce
a single instruction that performs the whole save or restore.
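A rough way to estimate the prologue saving is to look for a stack adjustment followed by `sw` instructions that store `ra` or `s` registers relative to `sp`, and to assume all of them collapse into one 16-bit push. The sketch below does this for the `bt_rand` prologue shown above. The function names, the 2-byte push size and the choice to ignore the `mv` instructions are assumptions of this example.

```python
import re

# Matches objdump lines such as "2040545a: c04a sw s2,0(sp)".
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(\S+)\s*(.*)$")
SAVED_REGS = {"ra"} | {f"s{i}" for i in range(12)}
CONTROL_FLOW = ("ret", "jal", "jalr", "j", "jr")


def parse(lines):
    """Return (size_in_bytes, mnemonic, operands) for each disassembly line."""
    insns = []
    for line in lines:
        match = INSN_RE.match(line)
        if match:
            _, encoding, mnemonic, operands = match.groups()
            insns.append((len(encoding) // 2, mnemonic, operands.strip()))
    return insns


def prologue_saving(insns, push_size=2):
    """Bytes saved if the stack adjustment and register saves became one push."""
    if not insns:
        return 0
    size, mnemonic, operands = insns[0]
    if mnemonic != "addi" or not operands.startswith("sp,sp,-"):
        return 0
    replaced = size
    for size, mnemonic, operands in insns[1:]:
        # Stop at the first possible change of control flow.
        if mnemonic in CONTROL_FLOW or mnemonic.startswith("b"):
            break
        reg = operands.split(",")[0]
        if mnemonic == "sw" and operands.endswith("(sp)") and reg in SAVED_REGS:
            replaced += size
    return max(replaced - push_size, 0)


prologue = [
    "20405458: 1141 addi sp,sp,-16",
    "2040545a: c04a sw s2,0(sp)",
    "2040545c: 70000937 lui s2,0x70000",
    "20405460: 62090613 addi a2,s2,1568",
    "20405464: c422 sw s0,8(sp)",
    "20405466: c226 sw s1,4(sp)",
    "20405468: c606 sw ra,12(sp)",
    "2040546a: 842a mv s0,a0",
    "2040546c: 84ae mv s1,a1",
]
saving = prologue_saving(parse(prologue))
```

For this prologue the sketch replaces one `addi` and four `sw` instructions (10 bytes) with a 2-byte push, so `saving` is 8 bytes.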
14. MULIADD, MULI and ADDIADD
uint32 get_element(uint8 index) {
    return array_base[index].element1.element2;
}
The code above gets compiled into
the following assembly code:
02002a96 <get_element>:
02002a96: 47d1 li a5,20
02002a98: 02f50533 mul a0,a0,a5
02002a9c: 010057b7 lui a5,0x1005
02002aa0: 74478793 addi a5,a5,1860
02002aa4: 953e add a0,a0,a5
02002aa6: 4548 lw a0,12(a0)
02002aa8: 8082 ret
Rationale:
Indexing arrays of structures in C often
requires 3 instructions:
• A load immediate to get the element size
• A multiplication by the index to get the location of
the required element
• An addition of the base address of the array
Proposed Solution:
• Create a new instruction (MULIADD)
to fuse the 3 instructions into
a single instruction.
• Similarly, we can fuse li and mul to create
MULI, and addi and add to create ADDIADD.
18. Bonus !
● Estimated by searching for double shifts or andi 255 for ZEXT.B.
● Estimated by searching for stack adjustments and sw after or lw before.
● Pseudo instruction fitting (dst==src and reg range).
● Normal mul fitting for encoding, and li followed by mul.
● Get all long addresses, hash from objdump, add to normalised list, create
a sliding window trying to maximise benefit.
● Normal instructions fitting for the compressed encoding.
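The ZEXT.B estimate, for example, could be approximated as below: count every `andi` with an immediate of 255, plus every `slli` immediately followed by an `srli` of the same register by the same amount. The exact matching rules used in the talk are not given, so the adjacency requirement, the operand layout and the function name are assumptions.

```python
def count_zext_candidates(insns):
    """Count instruction patterns that a zero-extend instruction could replace.

    insns: list of (mnemonic, operands), with operands as a list of strings
    such as ["a0", "a1", "24"].
    """
    count = 0
    prev = None
    for mnemonic, ops in insns:
        if mnemonic == "andi" and ops[-1] in ("255", "0xff"):
            count += 1
        elif (
            mnemonic == "srli"
            and prev is not None
            and prev[0] == "slli"
            and prev[1][0] == ops[1]      # srli reads the register slli wrote
            and prev[1][2] == ops[2]      # same shift amount in both directions
        ):
            count += 1
        prev = (mnemonic, ops)
    return count


sample = [
    ("andi", ["a0", "a0", "255"]),
    ("slli", ["a1", "a2", "24"]),
    ("srli", ["a1", "a1", "24"]),
    ("srli", ["a3", "a3", "8"]),
]
candidates = count_zext_candidates(sample)
```

In `sample` the `andi` and the `slli`/`srli` pair each count once; the final `srli` has no matching `slli` and is ignored.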