RISC-V Zce Extension
Ibrahim Abu Kharmeh
Huawei Bristol
19 April 2021
Introduction
● RISC-V is an open-source ISA designed at the University of
California, Berkeley.
● The ISA is designed to target a wide range of applications, from
HPC to embedded:
○ RISC-V is a variable-length ISA: an instruction can be any
multiple of 16 bits long.
○ The RISC-V ISA leaves some encoding space vacant, which third parties
are allowed to use to design their own extensions.
● RISC-V application code size is considerably larger than that of
alternative commercial ISAs.
Factors to consider:
● Compilers: GCC, Clang, others
● ABI: UABI, EABI
● ISA: RV32, RV64
● Language: C, C++, Rust
● Workload: IoT, ML
Possible side effects:
● Performance
● Power
● Implementation complexity
Toolchain options: -Ox, -msave-restore, -ffunction-sections,
-fdata-sections, -Wl,--gc-sections
Short Immediate Fields
Analysis Script
● Input ELF files and call objdump.
● Parse the objdump output, saving the symbol tables and the disassembly.
● Construct the instruction record (with very limited CFG recovery).
● Perform the given optimisations sequentially.
● Report the results either as a CSV file or to stdout.
Implementation: BFD and ctypes.
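The following is a minimal Python sketch of the parse and construct steps. It assumes textual GNU `objdump -d` output in the format of the listings in this deck: address, hex encoding, mnemonic, operands. The `Instr` record and the `parse_objdump` function are hypothetical names. The actual script works through BFD and ctypes, and its record layout may differ. Functions are keyed by (name, start address) rather than by name alone.

```python
import re
from dataclasses import dataclass


@dataclass
class Instr:
    """One disassembled instruction (hypothetical record layout)."""
    addr: int        # instruction address
    size: int        # encoding size in bytes: 2 (compressed) or 4
    mnemonic: str
    operands: list   # operand strings, e.g. ["t1", "0x1f8"]


# "  e084be:\t001f8317\tauipc\tt1,0x1f8" -> address, encoding, mnemonic, operands
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]{4,8})\s+(\S+)\s*(.*)$")
# "00e084be <vsprintf>:" -> start address, symbol name
SYM_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:$")


def parse_objdump(text):
    """Map (function name, start address) -> list of Instr.

    Names alone are not used as keys because they are not unique."""
    functions = {}
    current = None
    for line in text.splitlines():
        sym = SYM_RE.match(line.strip())
        if sym:
            current = (sym.group(2), int(sym.group(1), 16))
            functions[current] = []
            continue
        m = INSN_RE.match(line)
        if m and current is not None:
            addr, enc, mnem, ops = m.groups()
            operands = [o.strip() for o in ops.split(",")] if ops.strip() else []
            # Two hex digits per byte: "86b2" is 2 bytes, "001f8317" is 4.
            functions[current].append(Instr(int(addr, 16), len(enc) // 2, mnem, operands))
    return functions
```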
Finding optimisation opportunities
● Most new instruction proposals either:
○ fuse a common instruction sequence into a single instruction, or
○ convert a single normal instruction into a compressed one.
● New optimisation opportunities can be identified with one of two methods:
○ Track how instruction results are used by other instructions.
○ Track how instruction operands are generated.
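The first method, tracking how a result is used, can be sketched as a forward scan over straight-line code. This illustrative sketch makes three assumptions:
● the first operand is the destination for ALU and load instructions;
● stores and branches write no register;
● registers appear only under their ABI names.

`uses_of_result` and its helpers are hypothetical names.

```python
import re

# Partial list of instructions that write no destination register.
NO_DEST = {"sw", "sh", "sb", "sd", "beq", "bne", "blt", "bge", "bltu", "bgeu",
           "beqz", "bnez", "j", "jr", "ret"}
# ABI register names only (x0..x31 spellings are not handled).
REG_RE = re.compile(r"\b(zero|ra|sp|gp|tp|t[0-6]|s[0-9]|s1[01]|a[0-7])\b")


def regs_in(operand):
    """Registers named in one operand, e.g. '12(a0)' -> ['a0']."""
    return REG_RE.findall(operand)


def dest_and_sources(mnemonic, operands):
    """Split operands into (destination register or None, source registers)."""
    if mnemonic in NO_DEST or not operands:
        return None, [r for op in operands for r in regs_in(op)]
    dest = (regs_in(operands[0]) or [None])[0]
    return dest, [r for op in operands[1:] for r in regs_in(op)]


def uses_of_result(instrs, i):
    """Indices of later instructions that read the register written by
    instrs[i], stopping once that register is overwritten."""
    dest, _ = dest_and_sources(*instrs[i])
    users = []
    if dest is None:
        return users
    for j in range(i + 1, len(instrs)):
        d, srcs = dest_and_sources(*instrs[j])
        if dest in srcs:
            users.append(j)
        if d == dest:
            break   # redefined: later reads see a different value
    return users
```

On the `get_element` sequence later in this deck, the result of `li a5,20` is used only by the `mul`, because `lui a5,...` overwrites `a5` right after.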
A couple of issues:
● When tracking instructions, what do we do when we reach a possible change of control flow?
○ Unconditional calls outside the current function: save the tracking buffer.
○ Conditional branches and branch targets: stop tracking and remove the tracking-chain record, or keep the instructions tracked so far?
● Function names cannot be used as a unique key! For example, static functions in different translation units can share a name.
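The sketch below shows one possible reading of these rules:
● calls leave the tracking buffer intact;
● a conditional branch or a branch target ends the current chain but keeps the instructions already tracked (the second option above).

`tracking_chains` is a hypothetical helper and the branch list is partial. Unconditional jumps such as `j` or `ret` are not treated specially here.

```python
# Partial list of conditional branches (base ISA plus common pseudo-instructions).
BRANCHES = {"beq", "bne", "blt", "bge", "bltu", "bgeu", "beqz", "bnez"}


def tracking_chains(instrs, branch_targets):
    """instrs: list of (address, mnemonic) for one function;
    branch_targets: addresses that some branch can jump to.
    Returns the runs of addresses over which tracking may continue."""
    chains, current = [], []
    for addr, mnem in instrs:
        if addr in branch_targets and current:
            # Control can also arrive here from elsewhere: close the chain,
            # keeping what has been tracked so far.
            chains.append(current)
            current = []
        current.append(addr)
        if mnem in BRANCHES:
            # Conditional control flow: stop tracking at this point.
            chains.append(current)
            current = []
        # Calls (jal/call) return to the next instruction, so the buffer is kept.
    if current:
        chains.append(current)
    return chains
```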
Benchmark Suite
Calling convention: UABI
● RV64 (CPP test / generic RV64):
○ Top 30 Debian
○ V8
● RV32:
○ Opus, LC3plus (audio codecs)
○ Embench, CoreMark (collection of benchmarks)
○ Testfloat (FP tests)
○ Huawei IoT code, Zephyr (RTOS / IoT)
Zce Extension
TBLJAL
Figure: MTBLJALVEC.base points to a table of 256 entries (Addr 0 to Addr 255), each XLEN bits wide; TBLJAL code N selects entry Addr N.
Rationale:
Function calls and jumps to fixed labels typically take a 32-bit instruction (JAL/J) or a 64-bit AUIPC+JALR/JR sequence.
Proposed Solution:
• Create a table of X entries.
• Store the jump addresses in the table.
• Distinguish the entries by link register (x0, x1 or x5) using the lower two bits.
• Create a new compressed instruction that jumps to the addresses in the jump table.
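This sketch shows how a table entry could pack a jump target together with its link register. It rests on three assumptions:
● RV32, so each entry is 4 bytes;
● every target is 4-byte aligned, so its two low bits are free;
● the code mapping is 0 -> x0, 1 -> x1, 2 -> x5 (hypothetical).

None of these names or encodings come from the Zce draft.

```python
XLEN = 32                                         # assumption: RV32
LINK_REGS = {0b00: "x0", 0b01: "x1", 0b10: "x5"}  # hypothetical low-bit codes
LINK_CODES = {reg: code for code, reg in LINK_REGS.items()}


def encode_entry(target, link_reg):
    """Pack a jump target and its link register into one table entry."""
    if target & 0b11:
        raise ValueError("target must be 4-byte aligned to free the low two bits")
    return target | LINK_CODES[link_reg]


def decode_entry(entry):
    """Return (target, link register) for a table entry."""
    return entry & ~0b11, LINK_REGS[entry & 0b11]


def entry_address(table_base, index):
    """Address of entry `index`; every entry is XLEN bits wide."""
    return table_base + index * (XLEN // 8)
```

The JAL target e08432 in the example below has bit 1 set. Under this sketch's alignment assumption, it could not be stored without padding the target.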
TBLJAL Table:
TBLJAL (Example)
Before:
<vsprintf>:
#64-bit AUIPC/JALR sequence
e084be: 001f8317 auipc t1,0x1f8
e084c2: 18a302e7 jalr t0,394(t1)
e084c6: 86b2 mv a3,a2
e084c8: 862e mv a2,a1
e084ca: 800005b7 lui a1,0x80000
e084ce: fff5c593 not a1,a1
#32-bit JAL
e084d2: f61ff0ef jal ra,e08432
#64-bit AUIPC/JALR sequence
e084d6: 001f8317 auipc t1,0x1f8
e084da: 19630067 jr 406(t1)
After (using TBLJAL):
00e084be <vsprintf>:
e084be: xxxx tbljal #x
e084c6: 86b2 mv a3,a2
e084c8: 862e mv a2,a1
e084ca: 800005b7 lui a1,0x80000
e084ce: fff5c593 not a1,a1
e084d2: xxxx tbljal #y
e084da: xxxx tbljal #z
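The saving in this fragment can be tallied from the encodings in the original listing. This assumes TBLJAL is a 16-bit instruction. It ignores the table entries, which are shared by every call site of the same target.

```python
# Encodings copied from the original vsprintf listing above.
BEFORE = ["001f8317", "18a302e7",              # auipc/jalr call
          "86b2", "862e", "800005b7", "fff5c593",
          "f61ff0ef",                          # jal
          "001f8317", "19630067"]              # auipc/jr tail call
UNCHANGED = ["86b2", "862e", "800005b7", "fff5c593"]
TBLJAL_BYTES = 2   # assumption: TBLJAL is a 16-bit (compressed) instruction


def size_in_bytes(encodings):
    """Each pair of hex digits in an encoding is one byte."""
    return sum(len(e) // 2 for e in encodings)


before_bytes = size_in_bytes(BEFORE)                         # 32
after_bytes = size_in_bytes(UNCHANGED) + 3 * TBLJAL_BYTES    # 18
```

The fragment shrinks from 32 to 18 bytes, a 14-byte reduction before table costs.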
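TBLJAL Analysis
● Get all function calls and count how many times each target is used.
● Eliminate all entries that will not gain from substitution: (JAL, J) count < 3.
● Weight JALR and JR entries as 3 × count.
● Take the most common X entries.
● Replace those entries in the instruction record.
● Calculate the new instruction record size.
Why these thresholds hold:
● Replacing a 32-bit JAL/J with a 16-bit TBLJAL saves 2 bytes.
● Replacing a 64-bit AUIPC+JALR/JR pair saves 6 bytes, three times as much.
● Each table entry costs XLEN/8 bytes (4 bytes on RV32), so a target reached only by JAL/J needs at least 3 uses to gain.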
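The selection steps above can be sketched in Python. The sketch assumes:
● RV32, so table entries are 4 bytes;
● TBLJAL is a 16-bit instruction;
● call counts are already split per target into JAL/J uses and AUIPC+JALR/JR uses.

`select_table` is a hypothetical name, and the net-saving formula is this sketch's own estimate rather than the script's.

```python
ENTRY_BYTES = 4   # assumption: RV32, so each table entry is 4 bytes
JAL_SAVING = 2    # 32-bit JAL/J -> 16-bit TBLJAL


def select_table(calls, table_size):
    """calls: {target: (n_jal, n_jalr)}; returns (chosen targets, net bytes saved)."""
    # AUIPC+JALR/JR uses save three times as much as JAL/J uses: weight them by 3.
    weights = {t: n_jal + 3 * n_jalr for t, (n_jal, n_jalr) in calls.items()}
    # Drop targets whose savings cannot beat the cost of their own table entry.
    useful = {t: w for t, w in weights.items() if w >= 3}
    # Keep the most heavily used targets, up to the table size X (ties by name).
    chosen = sorted(useful, key=lambda t: (-useful[t], t))[:table_size]
    saved = sum(useful[t] * JAL_SAVING - ENTRY_BYTES for t in chosen)
    return chosen, saved
```

For example, a target with one JAL use and two AUIPC+JALR uses has weight 7. It saves 14 bytes of instructions, 10 bytes net of its entry.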
Determining the value of X
Figure: Table Size vs Saving (IOT_Application). x-axis: table size (0 to 700); y-axis: saving (0% to 12%).
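A curve like this can be produced by sweeping the table size and adding one entry's net gain at a time. This sketch assumes:
● RV32 table entries of 4 bytes;
● 2 bytes saved per weighted use, where a JAL/J use counts 1 and an AUIPC+JALR/JR use counts 3.

`saving_curve` is a hypothetical name. It returns bytes; dividing by the instruction record size would give a percentage like the one plotted.

```python
ENTRY_BYTES = 4    # assumption: RV32, so one table entry is 4 bytes
BYTES_PER_USE = 2  # assumption: each weighted use saves 2 bytes (JAL -> 16-bit TBLJAL)


def saving_curve(weights, max_entries):
    """weights: weighted use count per call target (JAL/J = 1, AUIPC+JALR/JR = 3).
    Returns the net bytes saved for every table size 0..max_entries."""
    gains = sorted((BYTES_PER_USE * w - ENTRY_BYTES for w in weights), reverse=True)
    curve, total = [0], 0
    for x in range(1, max_entries + 1):
        # Entry x holds the x-th most profitable target; add it only if it pays off.
        if x <= len(gains) and gains[x - 1] > 0:
            total += gains[x - 1]
        curve.append(total)
    return curve
```

The curve flattens once the remaining targets no longer pay for their entries. That knee is one way to pick X.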
PUSH POP POPRET
<bt_rand>:
20405458: 1141 addi sp,sp,-16
2040545a: c04a sw s2,0(sp)
2040545c: 70000937 lui s2,0x70000
20405460: 62090613 addi a2,s2,1568
20405464: c422 sw s0,8(sp)
20405466: c226 sw s1,4(sp)
20405468: c606 sw ra,12(sp)
2040546a: 842a mv s0,a0
2040546c: 84ae mv s1,a1
<function body>
20405494: 4501 li a0,0
20405496: 40b2 lw ra,12(sp)
20405498: 4422 lw s0,8(sp)
2040549a: 4492 lw s1,4(sp)
2040549c: 4902 lw s2,0(sp)
2040549e: 0141 addi sp,sp,16
204054a0: 8082 ret
20405458 <bt_rand>:
20405458: <16-bit> push {ra,s0-s2},{a0-a1},-16
2040545c: 70000937 lui s2,0x70000
20405460: 62090613 addi a2,s2,1568
<function body>
20405496: <16-bit> popret {ra,s0-s2},{0} 16
Rationale:
Function prologues and epilogues very often need to save and restore multiple registers to and from the stack.
Proposed Solution:
Instead of using multiple sw/lw instructions, introduce a single instruction that performs the whole save (PUSH) or restore (POP/POPRET).
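The following rough estimator follows the search described on the Bonus slide (stack adjustments with sw after them or lw before them). It assumes:
● PUSH and POPRET are 16-bit, as in the listing;
● records have the form (size in bytes, mnemonic, operands);
● a fixed search window.

Folding `mv sN,aN` and `li a0,0` into the instruction is not modelled. `push_saving` and `popret_saving` are hypothetical names.

```python
PUSH_BYTES = POPRET_BYTES = 2   # assumption: 16-bit PUSH/POPRET, as in the listing


def _sp_adjust(ins):
    """Return the immediate of 'addi sp,sp,imm', else None."""
    size, mnem, ops = ins
    if mnem == "addi" and ops[:2] == ["sp", "sp"]:
        return int(ops[2])
    return None


def _stack_access(ins, mnem):
    """True for 'sw/lw reg,off(sp)'."""
    return ins[1] == mnem and ins[2][1].endswith("(sp)")


def push_saving(instrs, window=8):
    """Prologue: 'addi sp,sp,-N' plus the sw ...(sp) stores that follow it."""
    for i, ins in enumerate(instrs):
        imm = _sp_adjust(ins)
        if imm is not None and imm < 0:
            stores = [s for s in instrs[i + 1:i + 1 + window] if _stack_access(s, "sw")]
            return ins[0] + sum(s[0] for s in stores) - PUSH_BYTES
    return 0


def popret_saving(instrs, window=8):
    """Epilogue: lw ...(sp) loads before 'addi sp,sp,N', the addi, and a following ret."""
    for i, ins in enumerate(instrs):
        imm = _sp_adjust(ins)
        if imm is not None and imm > 0:
            loads = [l for l in instrs[max(0, i - window):i] if _stack_access(l, "lw")]
            nxt = instrs[i + 1] if i + 1 < len(instrs) else None
            ret = nxt[0] if nxt is not None and nxt[1] == "ret" else 0
            return sum(l[0] for l in loads) + ins[0] + ret - POPRET_BYTES
    return 0
```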
MULIADD, MULI and ADDIADD
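#include <stdint.h>
/* Assumed layout matching the assembly: 20-byte elements, element2 at offset 12 */
struct inner { uint32_t element2; };
struct elem { uint32_t pad[3]; struct inner element1; uint32_t tail; };
extern struct elem array_base[];
uint32_t get_element(uint8_t index) {
    return array_base[index].element1.element2;
}
The code above is compiled into the following assembly code:
02002a96 <get_element>:
02002a96 47d1 li a5,20
02002a98 02f50533 mul a0,a0,a5
02002a9c 010057b7 lui a5,0x1005
02002aa0 74478793 addi a5,a5,1860
02002aa4 953e add a0,a0,a5
02002aa6 4548 lw a0,12(a0)
02002aa8 8082 ret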
Rationale:
Indexing an array of structures in C often requires 3 instructions:
• Load immediate to get the element size
• Multiplication by the index to get the offset of the required element
• Addition of the base address of the array
Proposed Solution:
• Create a new instruction (MULIADD) that fuses the 3 instructions into a single instruction.
• Similarly, fuse li and mul to create MULI, and add and addi to create ADDIADD.
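Below is an executable model of the proposed semantics, checked against the li/mul/add sequence from `get_element`. Three parts are assumptions for illustration, not a specification:
● the operand order (rd = rs1 + rs2 × imm for MULIADD);
● 32-bit wrap-around (RV32);
● the function names.

The base address 0x1005744 comes from `lui a5,0x1005` followed by `addi a5,a5,1860`.

```python
MASK = (1 << 32) - 1   # assumption: RV32, results wrap at 32 bits


# Hypothetical semantics; operand order is an assumption.
def muli(rs1, imm):            # replaces: li tmp,imm ; mul rd,rs1,tmp
    return (rs1 * imm) & MASK


def addiadd(rs1, rs2, imm):    # replaces: add rd,rs1,rs2 ; addi rd,rd,imm
    return (rs1 + rs2 + imm) & MASK


def muliadd(rs1, rs2, imm):    # replaces: li tmp,imm ; mul tmp,rs2,tmp ; add rd,rs1,tmp
    return (rs1 + rs2 * imm) & MASK


def original_sequence(index, base):
    """Register-level replay of the get_element code before the final lw."""
    a0 = index
    a5 = 20                    # li   a5,20
    a0 = (a0 * a5) & MASK      # mul  a0,a0,a5
    a5 = base                  # lui  a5,0x1005 ; addi a5,a5,1860
    a0 = (a0 + a5) & MASK      # add  a0,a0,a5
    return a0
```

A single `muliadd(base, index, 20)` then reproduces the address that the three-instruction sequence computes.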
Immediate Length Evaluation
Figure: MULI Immediate Length Evaluation. x-axis: immediate length (1 to 12 bits); y-axis: 0.00% to 1.00%. Benchmarks plotted: GCC10_audiocodec_fixed_LC3plus, GCC10_audiocodec_fixed_opus_demo, GCC10_audiocodec_float_LC3plus, GCC10_audiocodec_float_opus_demo, GCC10_coremark, GCC10_embench_aha-mont64, GCC10_embench_matmult-int, GCC10_embench_minver, GCC10_embench_picojpeg, GCC10_embench_st, GCC10_embench_ud, GCC10_fpmark_atan-1M, GCC10_fpmark_inner-product-mid-10k, GCC10_huawei_iot_application, GCC10_huawei_iot_protocol, GCC10_zephyr_central, GCC10_zephyr_peripheral.
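A related measure can be computed directly: for each immediate length, the fraction of li+mul pairs whose immediate would fit a MULI field of that width. This sketch assumes:
● immediates are unsigned;
● `li` is immediately followed by the `mul` that consumes it;
● records have the form (mnemonic, operands).

The function names are hypothetical. The output is coverage, not the code-size saving plotted above.

```python
from collections import Counter


def li_mul_immediates(instrs):
    """Immediates of 'li rX,imm' whose register is a source of the next 'mul'."""
    imms = []
    for (m1, ops1), (m2, ops2) in zip(instrs, instrs[1:]):
        if m1 == "li" and m2 == "mul" and ops1[0] in ops2[1:]:
            imms.append(int(ops1[1], 0))
    return imms


def coverage_by_length(imms, max_bits=12):
    """Fraction of immediates that fit an unsigned field of 1..max_bits bits."""
    need = Counter(max(1, v.bit_length()) for v in imms if v >= 0)
    total = len(imms)
    result, running = {}, 0
    for bits in range(1, max_bits + 1):
        running += need[bits]
        result[bits] = running / total if total else 0.0
    return result
```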
Results
Filename Instruction record size C.TBLJAL PUSHPOP MULI ADDIADD MULIADD
GCC10_huawei_iot_protocol 960678 7.00% 2.86% 0.34% 0.10% 0.30%
GCC10_huawei_iot_application 338824 9.36% 4.14% 0.18% 0.11% 0.18%
GCC10_zephyr_peripheral 76246 6.58% 0.17% 0.07% 0.05% 0.07%
GCC10_embench_cubic 45632 2.59% 4.69% 0.00% 0.40% 0.00%
GCC10_zephyr_central 43434 5.83% 0.21% 0.08% 0.12% 0.11%
GCC10_embench_nsichneu 15208 0.00% 1.42% 0.00% 0.00% 0.00%
GCC10_embench_wikisort 13776 1.32% 4.60% 0.00% 0.10% 0.00%
GCC10_embench_st 10228 1.29% 6.06% 0.04% 0.00% 0.00%
GCC10_embench_nbody 10026 1.38% 5.52% 0.00% 0.00% 0.00%
GCC10_embench_minver 8944 1.27% 5.73% 0.07% 0.00% 0.09%
GCC10_embench_picojpeg 8164 2.40% 3.28% 0.93% 0.59% 0.59%
GCC10_embench_qrduino 6314 0.25% 3.83% 0.03% 0.10% 0.06%
GCC10_embench_nettle-sha256 6120 0.07% 6.25% 0.07% 0.03% 0.10%
GCC10_embench_statemate 4312 0.05% 4.82% 0.00% 0.00% 0.00%
GCC10_embench_ud 3522 0.80% 8.52% 0.11% 0.06% 0.00%
GCC10_embench_nettle-aes 3290 0.49% 10.64% 0.06% 0.00% 0.00%
GCC10_embench_slre 2672 0.07% 8.38% 0.00% 0.22% 0.00%
GCC10_embench_sglib-combined 2542 0.31% 8.74% 0.00% 0.00% 0.00%
GCC10_embench_huffbench 1888 0.00% 10.81% 0.00% 0.21% 0.00%
GCC10_embench_edn 1696 0.00% 14.97% 1.06% 0.00% 0.00%
GCC10_embench_aha-mont64 1204 0.00% 17.11% 0.00% 0.00% 0.00%
GCC10_embench_matmult-int 652 0.00% 32.21% 0.61% 0.00% 0.00%
GCC10_embench_crc32 388 0.52% 51.54% 0.00% 0.52% 0.00%
Average (weighted by instruction record size) 6.93% 3.17% 0.26% 0.11% 0.23%
Snapshot of the complete results.
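The Average row agrees with a mean weighted by instruction record size, not a plain mean of the rows. Here is a short check in Python, with the data copied from the table; the function names are illustrative.

```python
# (instruction record size, C.TBLJAL %, PUSHPOP %, MULI %, ADDIADD %, MULIADD %)
RESULTS = {
    "GCC10_huawei_iot_protocol": (960678, 7.00, 2.86, 0.34, 0.10, 0.30),
    "GCC10_huawei_iot_application": (338824, 9.36, 4.14, 0.18, 0.11, 0.18),
    "GCC10_zephyr_peripheral": (76246, 6.58, 0.17, 0.07, 0.05, 0.07),
    "GCC10_embench_cubic": (45632, 2.59, 4.69, 0.00, 0.40, 0.00),
    "GCC10_zephyr_central": (43434, 5.83, 0.21, 0.08, 0.12, 0.11),
    "GCC10_embench_nsichneu": (15208, 0.00, 1.42, 0.00, 0.00, 0.00),
    "GCC10_embench_wikisort": (13776, 1.32, 4.60, 0.00, 0.10, 0.00),
    "GCC10_embench_st": (10228, 1.29, 6.06, 0.04, 0.00, 0.00),
    "GCC10_embench_nbody": (10026, 1.38, 5.52, 0.00, 0.00, 0.00),
    "GCC10_embench_minver": (8944, 1.27, 5.73, 0.07, 0.00, 0.09),
    "GCC10_embench_picojpeg": (8164, 2.40, 3.28, 0.93, 0.59, 0.59),
    "GCC10_embench_qrduino": (6314, 0.25, 3.83, 0.03, 0.10, 0.06),
    "GCC10_embench_nettle-sha256": (6120, 0.07, 6.25, 0.07, 0.03, 0.10),
    "GCC10_embench_statemate": (4312, 0.05, 4.82, 0.00, 0.00, 0.00),
    "GCC10_embench_ud": (3522, 0.80, 8.52, 0.11, 0.06, 0.00),
    "GCC10_embench_nettle-aes": (3290, 0.49, 10.64, 0.06, 0.00, 0.00),
    "GCC10_embench_slre": (2672, 0.07, 8.38, 0.00, 0.22, 0.00),
    "GCC10_embench_sglib-combined": (2542, 0.31, 8.74, 0.00, 0.00, 0.00),
    "GCC10_embench_huffbench": (1888, 0.00, 10.81, 0.00, 0.21, 0.00),
    "GCC10_embench_edn": (1696, 0.00, 14.97, 1.06, 0.00, 0.00),
    "GCC10_embench_aha-mont64": (1204, 0.00, 17.11, 0.00, 0.00, 0.00),
    "GCC10_embench_matmult-int": (652, 0.00, 32.21, 0.61, 0.00, 0.00),
    "GCC10_embench_crc32": (388, 0.52, 51.54, 0.00, 0.52, 0.00),
}
COLUMNS = ("C.TBLJAL", "PUSHPOP", "MULI", "ADDIADD", "MULIADD")


def weighted_average(results):
    """Average each saving column, weighting each benchmark by its record size."""
    total = sum(row[0] for row in results.values())
    return {name: sum(row[0] * row[k + 1] for row in results.values()) / total
            for k, name in enumerate(COLUMNS)}


def plain_average(results):
    """Unweighted mean of each saving column, for comparison."""
    n = len(results)
    return {name: sum(row[k + 1] for row in results.values()) / n
            for k, name in enumerate(COLUMNS)}
```

Because the two large Huawei IoT binaries dominate the weighting, the weighted C.TBLJAL figure is much higher than the plain per-benchmark mean.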
Bonus!
● ZEXT.B: estimated by searching for double shifts or andi 255.
● Estimated by searching for stack adjustments with sw after them or lw before them.
● Pseudo-instruction fitting (dst == src and register range).
● Normal mul fitting the encoding, and li followed by mul.
● Get all long addresses from objdump, hash them, add them to a normalised list, and slide a window over it to maximise the benefit.
● Normal instructions fitting the compressed encoding.
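The ZEXT.B search in the first bullet can be sketched as a scan for `andi rd,rs,255` and for the double-shift idiom `slli rd,rs,24` / `srli rd,rd,24`. The shift amount of 24 assumes RV32 (RV64 would use 56). `zext_b_candidates` is a hypothetical name, and records have the form (mnemonic, operands).

```python
def zext_b_candidates(instrs):
    """Indices where a ZEXT.B could replace existing code: 'andi rd,rs,255',
    or 'slli rd,rs,24' immediately followed by 'srli rd,rd,24' (RV32 idiom)."""
    hits = []
    for i, (mnem, ops) in enumerate(instrs):
        if mnem == "andi" and ops[2] == "255":
            hits.append(i)
        elif (mnem == "slli" and ops[2] == "24" and i + 1 < len(instrs)
              and instrs[i + 1] == ("srli", [ops[0], ops[0], "24"])):
            hits.append(i)   # the two shifts together zero-extend the low byte
    return hits
```

An arithmetic right shift (`srai`) after the `slli` is a sign extension instead, so it is deliberately not matched.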
Questions?

Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

  • 7. Benchmark Suite (grouped by XLEN)
      • RV64: Top 30 Debian (generic RV64), V8 (C++ test)
      • RV32: Opus and LC3 (audio codecs), Embench and Coremark (collections of benchmarks), Testfloat (FP tests), Huawei IOT code and Zephyr (RTOS / IOT)
  • 9. TBLJAL
    Rationale: Function calls and jumps to fixed labels typically take 32-bit or 64-bit instruction sequences.
    Proposed Solution:
      • Create a table of X entries.
      • Store the jump addresses in the table.
      • Use the lower two bits of each entry to distinguish the link register (x0, x1 or x5).
      • Create a new compressed instruction that jumps to addresses in the new jump table.
    TBLJAL table: [Figure: MTBLJALVEC.base points to a table of XLEN-bit entries, Addr 0 to Addr 255; entry N holds the address of Code N.]
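The slides describe the table but not the instruction encoding. The Python sketch below is a rough model only: it executes one TBLJAL against a table held in memory. Several details are assumptions for illustration and do not come from the proposal: the tag values (0 for x0, 1 for x1, 2 for x5), the RV32 4-byte entry width, the function name `tbljal`, and the toy table contents. The 256-entry limit is read from the figure.

```python
from typing import Dict, Tuple

# Hypothetical tag values for the two low bits of a table entry. The slide only
# says entries are distinguished by link register (x0, x1 or x5); these codes are assumed.
LINK_REG_BY_TAG = {0b00: "x0", 0b01: "x1", 0b10: "x5"}
TABLE_ENTRIES = 256  # the figure shows Addr 0 to Addr 255


def tbljal(mem: Dict[int, int], table_base: int, index: int,
           pc: int, xlen: int = 32) -> Tuple[int, str, int]:
    """Model one TBLJAL #index executed at pc.

    mem maps byte addresses to XLEN-bit table entries; table_base plays the
    role of MTBLJALVEC.base. Returns (jump target, link register, link value).
    Assumes every table target is 4-byte aligned so the two low bits are free.
    """
    if not 0 <= index < TABLE_ENTRIES:
        raise ValueError("TBLJAL index out of range")
    entry = mem[table_base + index * (xlen // 8)]  # each entry is XLEN bits wide
    tag = entry & 0b11
    if tag not in LINK_REG_BY_TAG:
        raise ValueError("reserved link-register tag")
    target = entry & ~0b11  # clear the tag bits to recover the address
    link = LINK_REG_BY_TAG[tag]
    # x0 discards the return address (plain jump or tail call); x1 (ra) and
    # x5 (t0) receive pc + 2 because TBLJAL is a 16-bit instruction.
    link_value = 0 if link == "x0" else pc + 2
    return target, link, link_value


# Usage: a toy two-entry table at a hypothetical base address 0x1000.
table_mem = {0x1000: 0x20405458 | 0b01, 0x1004: 0x00E08430 | 0b00}
target, link, ret = tbljal(table_mem, 0x1000, 0, pc=0xE084BE)
print(hex(target), link, hex(ret))  # prints 0x20405458 x1 0xe084c0
```

This reading has a consequence. With the C extension, call targets are only guaranteed to be 2-byte aligned. Using the two low bits as a tag would therefore force table targets to be 4-byte aligned, or require the tag to live elsewhere.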
  • 10. TBLJAL (Example)
    Before:
      <vsprintf>:
        # 64-bit AUIPC/JALR sequence
        e084be: 001f8317  auipc t1,0x1f8
        e084c2: 18a302e7  jalr  t0,394(t1)
        e084c6: 86b2      mv    a3,a2
        e084c8: 862e      mv    a2,a1
        e084ca: 800005b7  lui   a1,0x80000
        e084ce: fff5c593  not   a1,a1
        # 32-bit JAL
        e084d2: f61ff0ef  jal   ra,e08432
        # 64-bit AUIPC/JALR sequence
        e084d6: 001f8317  auipc t1,0x1f8
        e084da: 19630067  jr    406(t1)
    After:
      00e084be <vsprintf>:
        e084be: xxxx      tbljal #x
        e084c6: 86b2      mv    a3,a2
        e084c8: 862e      mv    a2,a1
        e084ca: 800005b7  lui   a1,0x80000
        e084ce: fff5c593  not   a1,a1
        e084d2: xxxx      tbljal #y
        e084da: xxxx      tbljal #z
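Here is a quick byte count for the `vsprintf` example. It assumes each TBLJAL is a 16-bit instruction and ignores the storage taken by the table entries themselves:

```latex
\underbrace{(4+4)}_{\texttt{auipc+jalr}} + \underbrace{4}_{\texttt{jal}} + \underbrace{(4+4)}_{\texttt{auipc+jr}} = 20\ \text{bytes}
\;\longrightarrow\; 3 \times 2 = 6\ \text{bytes},
\qquad 32 - (20 - 6) = 18\ \text{bytes for the whole function}
```

The three call sites shrink from 20 bytes to 6. The function (0xe084be to 0xe084dd) therefore drops from 32 bytes to 18.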
  • 11. TBLJAL Analysis
      • Get all function calls and count how many times each target is used.
      • Go through the entries and eliminate those that won't gain from substitution: JAL/J targets used fewer than 3 times.
      • Change the weight of JALR and JR entries to 3 × count.
      • Keep the X most common entries.
      • Replace the matching calls in the instruction record.
      • Calculate the new instruction record size.
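One reading of the two constants in these steps, assuming RV32 sizes:

- Replacing a 4-byte JAL/J with a 2-byte TBLJAL saves 2 bytes per use.
- Replacing an 8-byte AUIPC+JALR/JR pair saves 6 bytes, three times as much.
- A 4-byte table entry only pays for itself on a JAL/J target when 2n > 4, that is, n ≥ 3 uses.

The slides do not state this reasoning. The Python sketch below implements the listed steps. Keying entries by (target, kind), the byte sizes and the toy call list are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Call = Tuple[str, str]  # (target symbol, kind) with kind in {"jal", "j", "jalr", "jr"}

# Assumed RV32 sizes in bytes (a cost model, not figures taken from the slides).
SHORT_CALL_BYTES = 4   # JAL / J
LONG_CALL_BYTES = 8    # AUIPC + JALR / JR
TBLJAL_BYTES = 2       # proposed compressed TBLJAL


def select_table(calls: Sequence[Call], x: int) -> List[Call]:
    """Pick up to x (target, kind) entries following the steps on the slide."""
    weights: Counter = Counter()
    for key, count in Counter(calls).items():
        _, kind = key
        if kind in ("jal", "j"):
            if count < 3:          # drop JAL/J targets used fewer than 3 times
                continue
            weights[key] = count
        else:                      # JALR/JR entries weigh 3 * count
            weights[key] = 3 * count
    return [key for key, _ in weights.most_common(x)]


def bytes_saved(calls: Sequence[Call], table: Sequence[Call]) -> int:
    """Bytes removed from the instruction record by substituting TBLJAL."""
    chosen = set(table)
    saved = 0
    for call in calls:
        if call in chosen:
            size = SHORT_CALL_BYTES if call[1] in ("jal", "j") else LONG_CALL_BYTES
            saved += size - TBLJAL_BYTES
    return saved


# Usage with a toy call list (symbol names are hypothetical).
calls = [("memcpy", "jal")] * 5 + [("printf", "jalr")] * 2 + [("helper", "jal")] * 2
table = select_table(calls, x=2)
print(table, bytes_saved(calls, table))  # [('printf', 'jalr'), ('memcpy', 'jal')] 22
```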
  • 12. Determining the value of X
    [Figure: Table Size vs Saving (IOT_Application); x-axis: table size (0 to 700), y-axis: saving (0% to 12%)]
  • 13. PUSH, POP and POPRET
    Rationale: Function prologues and epilogues very often need to save and restore multiple registers to and from the stack.
    Proposed Solution: Instead of using multiple sw/lw instructions, introduce a single instruction that performs the whole save or restore.
    Before:
      <bt_rand>:
        20405458: 1141      addi sp,sp,-16
        2040545a: c04a      sw   s2,0(sp)
        2040545c: 70000937  lui  s2,0x70000
        20405460: 62090613  addi a2,s2,1568
        20405464: c422      sw   s0,8(sp)
        20405466: c226      sw   s1,4(sp)
        20405468: c606      sw   ra,12(sp)
        2040546a: 842a      mv   s0,a0
        2040546c: 84ae      mv   s1,a1
        <function body>
        20405494: 4501      li   a0,0
        20405496: 40b2      lw   ra,12(sp)
        20405498: 4422      lw   s0,8(sp)
        2040549a: 4492      lw   s1,4(sp)
        2040549c: 4902      lw   s2,0(sp)
        2040549e: 0141      addi sp,sp,16
        204054a0: 8082      ret
    After:
      20405458 <bt_rand>:
        20405458: <16-bit>  push   {ra,s0-s2},{a0-a1},-16
        2040545c: 70000937  lui    s2,0x70000
        20405460: 62090613  addi   a2,s2,1568
        <function body>
        20405496: <16-bit>  popret {ra,s0-s2},{0},16
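The PUSH saving for `bt_rand` can be estimated by searching for stack adjustments with sw instructions after them. The Python sketch below is a minimal, hypothetical version of that scan over a prologue window. It also counts `mv sN,aM` argument moves, because the PUSH in the example absorbs them through `{a0-a1}`. The `(size, text)` instruction format is an assumption. The matching epilogue scan (lw before `addi sp,sp,N`) is analogous and omitted.

```python
from typing import List, Tuple

Insn = Tuple[int, str]  # (size in bytes, "mnemonic operands") as printed by objdump

# Registers a PUSH could cover: ra and the callee-saved s0-s11.
SAVED_REGS = {"ra"} | {f"s{i}" for i in range(12)}


def prologue_bytes(insns: List[Insn]) -> int:
    """Bytes of prologue code that a single PUSH could replace.

    Counts a negative `addi sp,sp,-N`, then `sw`/`sd` of saved registers to the
    stack and `mv sN,aM` argument moves; unrelated instructions are skipped.
    Pass only the prologue window. Returns 0 if no stack adjustment is found.
    """
    total = 0
    adjusted = False
    for size, text in insns:
        op, _, args = text.partition(" ")
        ops = args.replace(" ", "").split(",")
        if op == "addi" and ops[:2] == ["sp", "sp"] and ops[2].startswith("-"):
            adjusted = True
            total += size
        elif adjusted and op in ("sw", "sd") and ops[0] in SAVED_REGS and ops[1].endswith("(sp)"):
            total += size
        elif adjusted and op == "mv" and ops[0] in SAVED_REGS and ops[1].startswith("a"):
            total += size
    return total if adjusted else 0


# Prologue of <bt_rand> from the slide.
BT_RAND_PROLOGUE = [
    (2, "addi sp,sp,-16"), (2, "sw s2,0(sp)"), (4, "lui s2,0x70000"),
    (4, "addi a2,s2,1568"), (2, "sw s0,8(sp)"), (2, "sw s1,4(sp)"),
    (2, "sw ra,12(sp)"), (2, "mv s0,a0"), (2, "mv s1,a1"),
]
PUSH_BYTES = 2
print(prologue_bytes(BT_RAND_PROLOGUE) - PUSH_BYTES)  # prints 12
```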
  • 14. MULIADD, MULI and ADDIADD
    Rationale: Indexing arrays of structures in C often requires 3 instructions:
      • Load an immediate to get the element size.
      • Multiply it by the index to get the offset of the required element.
      • Add the base address of the array.
    Proposed Solution:
      • Create a new instruction (MULIADD) that fuses the 3 instructions into a single instruction.
      • Similarly, fuse li and mul to create MULI, and addi and add to create ADDIADD.
    Example:
      uint32_t get_element(uint8_t index) { return array_base[index].element1.element2; }
    The code above gets compiled into the following assembly code:
      02002a96 <get_element>:
        02002a96: 47d1      li   a5,20
        02002a98: 02f50533  mul  a0,a0,a5
        02002a9c: 010057b7  lui  a5,0x1005
        02002aa0: 74478793  addi a5,a5,1860
        02002aa4: 953e      add  a0,a0,a5
        02002aa6: 4548      lw   a0,12(a0)
        02002aa8: 8082      ret
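The slides do not define MULIADD's operands. The Python sketch below assumes hypothetical semantics rs_base + rs_index × imm, truncated to 32 bits. It replays the RV32 `get_element` sequence and checks that one fused instruction yields the same load address as li + mul + add. The lui/addi pair that builds the array base is left unchanged.

```python
MASK32 = 0xFFFF_FFFF  # RV32 registers are 32 bits wide


def baseline_address(index: int) -> int:
    """Address read by `lw a0,12(a0)` in <get_element>, replayed step by step."""
    a0 = index
    a5 = 20                          # li   a5,20       element size
    a0 = (a0 * a5) & MASK32          # mul  a0,a0,a5
    a5 = (0x1005 << 12) & MASK32     # lui  a5,0x1005
    a5 = (a5 + 1860) & MASK32        # addi a5,a5,1860  array base
    a0 = (a0 + a5) & MASK32          # add  a0,a0,a5
    return (a0 + 12) & MASK32        # lw   a0,12(a0) reads from here


def muliadd(rs_base: int, rs_index: int, imm: int) -> int:
    """Hypothetical MULIADD: rs_base + rs_index * imm, truncated to 32 bits."""
    return (rs_base + rs_index * imm) & MASK32


def fused_address(index: int) -> int:
    """Same address with li/mul/add replaced by one MULIADD."""
    base = ((0x1005 << 12) + 1860) & MASK32  # lui + addi are unchanged
    return (muliadd(base, index, 20) + 12) & MASK32


print(hex(fused_address(3)))  # prints 0x100578c
```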
  • 15. Immediate Length Evaluation
    [Figure: MULI Immediate Length Evaluation; x-axis: immediate length (1 to 12), y-axis: percentage (0.00% to 1.00%); one series per benchmark: GCC10_audiocodec_fixed_LC3plus, GCC10_audiocodec_fixed_opus_demo, GCC10_audiocodec_float_LC3plus, GCC10_audiocodec_float_opus_demo, GCC10_coremark, GCC10_embench_aha-mont64, GCC10_embench_matmult-int, GCC10_embench_minver, GCC10_embench_picojpeg, GCC10_embench_st, GCC10_embench_ud, GCC10_fpmark_atan-1M, GCC10_fpmark_inner-product-mid-10k, GCC10_huawei_iot_application, GCC10_huawei_iot_protocol, GCC10_zephyr_central, GCC10_zephyr_peripheral]
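Here is a minimal sketch of how MULI immediate lengths could be collected, using the "li followed by mul" heuristic. It finds an `li` whose destination is read by the very next `mul`, then records how many bits the immediate needs. The slides do not say whether lengths are measured signed or unsigned. This sketch assumes unsigned width for non-negative values and two's-complement width for negative ones. It also assumes objdump-style input with no spaces after commas.

```python
from collections import Counter
from typing import List


def imm_bits(value: int) -> int:
    """Smallest field width holding value: unsigned if >= 0, two's complement if < 0."""
    if value >= 0:
        return max(1, value.bit_length())
    return (~value).bit_length() + 1


def muli_immediate_lengths(insns: List[str]) -> Counter:
    """Histogram of immediate widths for `li rd,imm` directly followed by a `mul` reading rd."""
    hist: Counter = Counter()
    for first, second in zip(insns, insns[1:]):
        op1, _, args1 = first.partition(" ")
        op2, _, args2 = second.partition(" ")
        if op1 != "li" or op2 != "mul":
            continue
        rd, imm = args1.split(",")
        _, src1, src2 = args2.split(",")
        if rd in (src1, src2):
            hist[imm_bits(int(imm, 0))] += 1
    return hist


print(muli_immediate_lengths(["li a5,20", "mul a0,a0,a5"]))  # Counter({5: 1})
```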
  • 16. Results!
    Filename                      Instruction record size  C.TBLJAL  PUSHPOP  MULI   ADDIADD  MULIADD
    GCC10_huawei_iot_protocol     960678                   7.00%     2.86%    0.34%  0.10%    0.30%
    GCC10_huawei_iot_application  338824                   9.36%     4.14%    0.18%  0.11%    0.18%
    GCC10_zephyr_peripheral       76246                    6.58%     0.17%    0.07%  0.05%    0.07%
    GCC10_embench_cubic           45632                    2.59%     4.69%    0.00%  0.40%    0.00%
    GCC10_zephyr_central          43434                    5.83%     0.21%    0.08%  0.12%    0.11%
    GCC10_embench_nsichneu        15208                    0.00%     1.42%    0.00%  0.00%    0.00%
    GCC10_embench_wikisort        13776                    1.32%     4.60%    0.00%  0.10%    0.00%
    GCC10_embench_st              10228                    1.29%     6.06%    0.04%  0.00%    0.00%
    GCC10_embench_nbody           10026                    1.38%     5.52%    0.00%  0.00%    0.00%
    GCC10_embench_minver          8944                     1.27%     5.73%    0.07%  0.00%    0.09%
    GCC10_embench_picojpeg        8164                     2.40%     3.28%    0.93%  0.59%    0.59%
    GCC10_embench_qrduino         6314                     0.25%     3.83%    0.03%  0.10%    0.06%
    GCC10_embench_nettle-sha256   6120                     0.07%     6.25%    0.07%  0.03%    0.10%
    GCC10_embench_statemate       4312                     0.05%     4.82%    0.00%  0.00%    0.00%
    GCC10_embench_ud              3522                     0.80%     8.52%    0.11%  0.06%    0.00%
    GCC10_embench_nettle-aes      3290                     0.49%     10.64%   0.06%  0.00%    0.00%
    GCC10_embench_slre            2672                     0.07%     8.38%    0.00%  0.22%    0.00%
    GCC10_embench_sglib-combined  2542                     0.31%     8.74%    0.00%  0.00%    0.00%
    GCC10_embench_huffbench       1888                     0.00%     10.81%   0.00%  0.21%    0.00%
    GCC10_embench_edn             1696                     0.00%     14.97%   1.06%  0.00%    0.00%
    GCC10_embench_aha-mont64      1204                     0.00%     17.11%   0.00%  0.00%    0.00%
    GCC10_embench_matmult-int     652                      0.00%     32.21%   0.61%  0.00%    0.00%
    GCC10_embench_crc32           388                      0.52%     51.54%   0.00%  0.52%    0.00%
    Average                                                6.93%     3.17%    0.26%  0.11%    0.23%
  • 17. Snapshot of complete results!
  • 18. Bonus!
      • Estimated by searching for double shifts, or andi with 255, for ZEXT.B.
      • Estimated by searching for stack adjustments with sw instructions after them (prologue) or lw instructions before them (epilogue).
      • Pseudo-instruction fitting (dst == src and register range).
      • Normal mul fitting for the encoding, and li followed by mul.
      • Get all long addresses, hash them from the objdump output, add them to a normalised list, and slide a window over it to maximise the benefit.
      • Normal instruction fitting for the compressed encoding.
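The first heuristic (double shifts, or `andi` with 255, for ZEXT.B) could be sketched as below. The sketch assumes objdump-style text with no spaces after commas and decimal shift amounts. On RV32 the double shift is `slli` then `srli` by 24 (56 on RV64). A `slli`/`srai` pair is a sign extension and is deliberately not counted.

```python
from typing import List


def count_zext_b_candidates(insns: List[str], xlen: int = 32) -> int:
    """Count sequences a single ZEXT.B could replace.

    Matches `andi rd,rs,255` and the pair `slli rd,rs,XLEN-8` + `srli rd2,rd,XLEN-8`.
    """
    shift = str(xlen - 8)
    count = 0
    i = 0
    while i < len(insns):
        op, _, args = insns[i].partition(" ")
        ops = args.split(",")
        if op == "andi" and ops[2] in ("255", "0xff"):
            count += 1
        elif op == "slli" and ops[2] == shift and i + 1 < len(insns):
            op2, _, args2 = insns[i + 1].partition(" ")
            ops2 = args2.split(",")
            if op2 == "srli" and ops2[1] == ops[0] and ops2[2] == shift:
                count += 1
                i += 1  # skip the srli that completes the pair
        i += 1
    return count


print(count_zext_b_candidates(["andi a0,a0,255", "slli a1,a2,24", "srli a1,a1,24"]))  # prints 2
```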