BitVisor on Aarch64
2022/12/07 @ BitVisor Summit 11
Ake Koomsin
Agenda
◼ Current requirements
◼ How VMM works on Aarch64
◼ BitVisor Aarch64 initialization
◼ Interrupt handling
◼ MMIO handling
◼ Multiple core support
◼ Current limitation
◼ Ongoing tasks
◼ QEMU bugs we found
◼ Demo
Copyright© 2022 IGEL Co., Ltd. All Rights Reserved.
Current requirements
◼ Armv8.1 or later
– Needs the Virtualization Host Extensions (VHE) for the process
implementation
◼ Generic Interrupt Controller v3 (GICv3)
– Guest interrupt injection
◼ EL3 and Power State Coordination Interface (PSCI)
– Firmware running in EL3
– For secondary core start-up
◼ UEFI environment and ACPI
– BitVisor currently relies on them
How VMM works on Aarch64
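[Figure: exception levels. EL0 runs processes P0 to P3, EL1 runs OS0 and OS1, EL2 runs the hypervisor, and EL3 runs the firmware. SVC enters EL1, HVC enters EL2, and SMC enters EL3.]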
How VMM works on Aarch64
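[Figure: standard configuration vs. VHE. Standard: the hypervisor runs at EL2, OS0 at EL1, and P0, P1 at EL0. With VHE: a host OS/hypervisor runs at EL2, OS0 at EL1, and P0 to P2 at EL0.]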
How VMM works on Aarch64
◼ Main system registers related to virtualization
– HCR_EL2
• Enable/Disable hypervisor
• Hypervisor behavior
• Register trapping
– VTTBR_EL2
• Stage-2 translation page table
– VTCR_EL2
• Stage-2 translation control
– VMPIDR_EL2
• Multiprocessor affinity value returned when the guest reads MPIDR_EL1
– VPIDR_EL2
• Processor ID value returned when the guest reads MIDR_EL1
How VMM works on Aarch64
◼ Page table basics on Aarch64
– Typically, an OS sets up TTBR0_EL1 for a process’s page
table and TTBR1_EL1 for the kernel page table
• Addresses with 0xF… prefix are mapped in TTBR1_EL1
– Normally, only TTBR0_EL2 is available at EL2
– With the VHE feature, we can make EL2 behave the same as
EL1
• TTBR1_EL2 becomes accessible
• System registers related to translation change their layouts
– Ex. The TCR_EL2 bit definition becomes like that of TCR_EL1
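The split between the two table base registers can be sketched as follows. This is only an illustration, assuming the same address width for both halves and ignoring address tagging; select_ttbr() and the enum are hypothetical names, not BitVisor code.

```c
#include <assert.h>
#include <stdint.h>

/* Which translation table base register serves a virtual address */
enum ttbr_sel { TTBR0, TTBR1, TTBR_FAULT };

/*
 * va_bits is the configured virtual address width (64 - TxSZ),
 * assumed here to be the same for both halves. Address tagging
 * (TBI) is ignored in this sketch.
 */
static enum ttbr_sel
select_ttbr (uint64_t va, unsigned int va_bits)
{
	uint64_t top;

	assert (va_bits >= 16 && va_bits < 64);
	top = va >> va_bits;	/* Bits above the translated range */
	if (top == 0)
		return TTBR0;	/* Lower range, e.g. process memory */
	if (top == (UINT64_MAX >> va_bits))
		return TTBR1;	/* Upper range, e.g. kernel or hypervisor */
	return TTBR_FAULT;	/* Neither all zeros nor all ones */
}
```

With a 48-bit width, select_ttbr (0xFFFF000000000000, 48) selects TTBR1, which is why a hypervisor that wants to live at such an address needs TTBR1_EL2.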
How VMM works on Aarch64
◼ Guest OS returns to EL2 from time to time through
exceptions
– Interrupt
• If the hypervisor chooses to route interrupts to EL2
– Trapping
• Register accesses
• Intermediate Physical Address translation fault
How VMM works on Aarch64
◼ When an exception occurs
– The entry point is one of the entries in the vector table
pointed to by VBAR_EL2
• Depending on the currently running EL/exception type/mode
– The first thing to do is to save the current context
• General-purpose registers x0-x30
• Floating-point registers if necessary
• Other system registers if necessary
– In BitVisor case (To switch between our processes and the guest)
» HCR_EL2
» ELR_EL2, SPSR_EL2, FAR_EL2, ESR_EL2
» SP_EL0, TPIDR_EL0
How VMM works on Aarch64
◼ Handling an exception
– Interrupt (Asynchronous)
• Interrupt controller handler
– Scheduling
– Forwarding to the guest
– Hand over to the appropriate device driver
– Trapping (Synchronous)
• Read ESR_EL2 for exception syndrome
• Handle them accordingly
◼ After handling the exception, return to the guest
– Restore the entry context
– eret instruction to return to either EL0 or EL1 depending on
SPSR_EL2
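Reading the exception syndrome for a synchronous trap could look roughly like this. The field positions and exception class values are from the Arm architecture; the enum and the function names are hypothetical and only illustrate the dispatch step.

```c
#include <assert.h>
#include <stdint.h>

/* Exception class (EC) values from the Arm architecture (subset) */
#define EC_HVC64	0x16	/* HVC executed in AArch64 state */
#define EC_SMC64	0x17	/* SMC executed in AArch64 state */
#define EC_SYSREG	0x18	/* Trapped MSR, MRS, or system instruction */
#define EC_DABT_LOWER	0x24	/* Data abort from a lower EL */

/* Hypothetical handler kinds, for illustration only */
enum sync_handler { H_HVC, H_SMC, H_SYSREG, H_DATA_ABORT, H_UNKNOWN };

static unsigned int
esr_ec (uint64_t esr)
{
	return (esr >> 26) & 0x3F;	/* EC is ESR_EL2[31:26] */
}

static uint32_t
esr_iss (uint64_t esr)
{
	return esr & 0x1FFFFFF;	/* ISS is ESR_EL2[24:0] */
}

static enum sync_handler
classify_sync (uint64_t esr)
{
	switch (esr_ec (esr)) {
	case EC_HVC64:
		return H_HVC;
	case EC_SMC64:
		return H_SMC;
	case EC_SYSREG:
		return H_SYSREG;
	case EC_DABT_LOWER:
		return H_DATA_ABORT;
	default:
		return H_UNKNOWN;
	}
}
```

The ISS field is interpreted differently for each exception class, so a real handler would decode it only after the class is known.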
Early Aarch64 boot
◼ Relocation
– To be able to run code at any address, we need a table
structure that tells us where and what to adjust to get the final
addresses
• Usually for global variables
– In the linker script, we have a special section for this table
named .rela.dyn
…
.rela.dyn : AT (phys + (_rela_start - head)) {
_rela_start = .;
*(.rela)
*(.rela.*)
_rela_end = .;
}
…
Early Aarch64 boot
◼ Relocation
– It contains an array of the following structure
– For BitVisor, we only deal with R_AARCH64_RELATIVE
operation currently
struct rela_entry {
	u64 r_offset; /* Location to apply relocation */
	u64 r_info; /* Determine operation to perform */
	u64 r_addend; /* Constant added when computing the value */
};
Early Aarch64 boot
◼ Relocation
– Resolve the R_AARCH64_RELATIVE type with the Delta(S) +
Addend operation according to the ELF for the Arm 64-bit
Architecture (AAELF64) document
• S is the static address of a symbol
• Delta(S) is the difference between the static link
address of S and the execution address of S
– In other words
• If head_linktime_addr is 0, diff is head_runtime_addr
– BitVisor head_linktime_addr is currently 0
diff = head_runtime_addr - head_linktime_addr;
*(u64 *)(diff + r_offset) = diff + r_addend;
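The two lines above can be tried on a host machine with a small buffer standing in for the loaded image. This is a sketch under the same assumption as the slides (link address 0); relocate_image() is a hypothetical name, and uint64_t replaces u64 so that it builds outside BitVisor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define R_AARCH64_RELATIVE 1027	/* Value from the AArch64 ELF ABI */

struct rela_entry {
	uint64_t r_offset;
	uint64_t r_info;
	uint64_t r_addend;
};

/*
 * Patch every R_AARCH64_RELATIVE slot of an image that was linked at
 * address 0 and now lives at `image`. Returns the number of patched
 * slots, or -1 on an unsupported relocation type.
 */
static int
relocate_image (uint8_t *image, const struct rela_entry *rela, size_t n)
{
	uint64_t diff = (uint64_t)(uintptr_t)image; /* Link address is 0 */
	uint64_t value;
	size_t i;
	int patched = 0;

	for (i = 0; i < n; i++) {
		if (rela[i].r_info != R_AARCH64_RELATIVE)
			return -1;
		value = diff + rela[i].r_addend;
		/* memcpy avoids alignment assumptions on the host */
		memcpy (image + rela[i].r_offset, &value, sizeof value);
		patched++;
	}
	return patched;
}
```

After the call, the slot at r_offset holds the runtime address of whatever was at link-time offset r_addend, which is what a global pointer variable needs.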
Early Aarch64 boot
◼ Relocation
int SECTION_ENTRY_TEXT
apply_reloc (phys_t base, struct rela_entry *start, struct rela_entry *end)
{
	struct rela_entry *entries = start;
	u64 *target;
	unsigned int i, n_entries = end - start;

	for (i = 0; i < n_entries; i++) {
		switch (entries[i].r_info) {
		case R_AARCH64_NONE:
			break; /* Do nothing */
		case R_AARCH64_RELATIVE:
			/*
			 * Static head address is 0. That means Delta(S) is
			 * the runtime address.
			 */
			target = (u64 *)(base + entries[i].r_offset);
			*target = base + entries[i].r_addend;
			break;
		default:
			/* Currently deal with only R_AARCH64_RELATIVE */
			return -1;
		}
	}
	return 0;
}
Early Aarch64 boot
◼ Cross-compiling UEFI loader
– Mingw currently has no toolchain for Aarch64
– Switch to clang for cross-compiling instead
– Most of the code for the UEFI loader remains the same
◼ UEFI loader and bitvisor.elf relation
– The UEFI loader looks for bitvisor.elf
– It then loads the first 64KB of bitvisor.elf for
bootstrapping
• Early initialization + loading the rest of BitVisor into memory
• The .entry section of BitVisor must be within the first 64KB
– Once bootstrapping is done, we can jump to the newly
loaded BitVisor and start the remaining initialization
Early Aarch64 boot
◼ Entering BitVisor code
– Firstly, save context at entry
entry:
	…
	adrp	x9, _uefi_entry_ctx
	add	x9, x9, :lo12:_uefi_entry_ctx
	stp	x19, x20, [x9], #16
	…
	stp	x29, x30, [x9], #16
	…
	mov	x10, sp
	…
	mrs	x10, TTBR0_EL2
	str	x10, [x9], #8
	mrs	x10, VBAR_EL2
	str	x10, [x9], #8
	…
Early Aarch64 boot
◼ Entering BitVisor code
– Apply relocations to correct the addresses listed in the .rela.*
sections
entry:
	…
	adrp	x0, head
	add	x0, x0, :lo12:head
	adrp	x1, _rela_start
	add	x1, x1, :lo12:_rela_start
	adrp	x2, _rela_end
	add	x2, x2, :lo12:_rela_end
	bl	apply_reloc64k
	cmp	x0, 0
	bne	.L1 /* Return if apply_reloc64k() fails */
	…
Early Aarch64 boot
◼ Entering BitVisor code
– Then, enter uefi_entry()
• Save some UEFI routine addresses
• Load the entire BitVisor to a newly allocated location
• Set up virtual addressing
– Enable HCR_E2H so that TTBR1_EL2 becomes effective
– Set up the TTBR1_EL2 table for hypervisor memory mapping
– Set up MAIR_EL2, TCR_EL2, and SCTLR_EL2
– 0xFFFF000000000000 is our current virtual base address
• Return the virtual address base to the assembly code so that we
can jump to the new location using virtual addresses
Early Aarch64 boot
◼ Entering BitVisor code
– Jump to asm_bitvisor_entry()
– Apply relocation again with the new virtual address base +
Additional setup for C code entry
/*
 * x0 now contains the new virtual memory base address.
 * Next, calculate the position of asm_bitvisor_entry()
 * relative to x0.
 */
	adrp	x21, head /* Old head */
	add	x21, x21, :lo12:head
	adrp	x11, asm_bitvisor_entry
	add	x11, x11, :lo12:asm_bitvisor_entry
	sub	x11, x11, x21
	add	x11, x11, x0
	br	x11 /* Jump to the newly located asm_bitvisor_entry */
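The address arithmetic in the assembly above is the same as the following C expression. relocated_addr() is a hypothetical helper and the numbers in the usage note are made-up examples, not real BitVisor addresses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep the symbol's offset from the old head and rebase it onto the
 * new base address (sub x11, x11, x21; add x11, x11, x0).
 */
static uint64_t
relocated_addr (uint64_t sym, uint64_t old_head, uint64_t new_base)
{
	assert (sym >= old_head);	/* The symbol lies inside the image */
	return new_base + (sym - old_head);
}
```

For example, a symbol at 0x40001230 in an image whose head is at 0x40000000 ends up at 0xFFFF000000001230 when the new base is 0xFFFF000000000000.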
Early Aarch64 boot
◼ Before calling vmm_main()
– Initialize exception handling
void
bitvisor_entry (void)
{
	uefi_booted = true;
	/* Save this for secondary core start */
	mair_host = mrs (MAIR_EL2);
	tcr_host = mrs (TCR_EL2);
	sctlr_host = mrs (SCTLR_EL2);
	serial_init ();
	disable_interrupt ();
	init_default_exception_handler ();
	init_exception ();
	vmm_main ();
}
BitVisor Aarch64 initialization
◼ The initialization flow is roughly the same as in the current
BitVisor
– Mainly done through call_initfunc()
– There is some Aarch64-specific initialization to take care of
• MMU/memory mapping/MMIO handling, GIC initialization, etc.
◼ The original code needs some adjustment
– Separate platform-specific code into separate files and
create interfaces for platform-independent code to call them
• Ex. in the process implementation
– x86 assembly in call_msgfunc0() is replaced by
process_exec()
– The actual implementation of process_exec() is in either
x86/process.c or aarch64/process.c
BitVisor Aarch64 initialization
◼ Entering guest
void
vm_start (void)
{
	u64 orig_tcr, val;
	…
	/* Setting up EL1 environment */
	msr (SP_EL1, _uefi_entry_ctx.sp);
	msr (ESR_EL12, _uefi_entry_ctx.esr_el2);
	msr (FAR_EL12, _uefi_entry_ctx.far_el2);
	msr (MAIR_EL12, _uefi_entry_ctx.mair_el2);
	…
	msr (TCR_EL12, val);
	msr (TPIDR_EL1, _uefi_entry_ctx.tpidr_el2);
	msr (TTBR0_EL12, _uefi_entry_ctx.ttbr0_el2);
	msr (VBAR_EL12, _uefi_entry_ctx.vbar_el2);
	msr (CPACR_EL12, CPACR_ZEN (3) | CPACR_FPEN (3) | CPACR_SMEN (3));
	val = (_uefi_entry_ctx.spsr_el2 & ~0xF) | 0x5; /* EL1h */
	msr (SPSR_EL2, val);
	msr (ELR_EL2, _uefi_entry_ctx.x30);
	msr (CPTR_EL2, CPTR_FLAGS);
	msr (HCR_EL2, HCR_FLAGS);
	start_guest (&_uefi_entry_ctx);
}
BitVisor Aarch64 initialization
◼ Entering guest
start_guest:
	ldp	x19, x20, [x0], #16
	ldp	x21, x22, [x0], #16
	ldp	x23, x24, [x0], #16
	ldp	x25, x26, [x0], #16
	ldp	x27, x28, [x0], #16
	ldp	x29, x30, [x0], #16
	/* Clear all caller-saved registers */
	eor	x15, x15, x15
	eor	x14, x14, x14
	eor	x13, x13, x13
	…
	mov	x0, #1 /* Return 1 as success upon entering the guest */
	dsb	ish
	isb
	eret
	/* Prevent speculative execution */
	dsb	nsh
	isb
Interrupt handling
◼ Physical interrupt and virtual interrupt
– The physical one is from an actual device
• The guest can receive physical interrupts directly if the hypervisor
chooses not to handle interrupts
– The virtual one is the one that the hypervisor injects into the
guest
• Cannot be trapped to EL2/3
– Interrupt types
• FIQ/vFIQ, high priority interrupt
• IRQ/vIRQ, low priority interrupt
• SError/vSError, erroneous memory accesses (Ex. Bus error)
– No non-maskable interrupt until Armv8.8-A/Armv9.3-A
• QEMU still does not support this
• No need to worry about this for now
Interrupt handling
◼ Injecting interrupts
– Via system registers
• We can set the following bits
– HCR_VF in HCR_EL2 to make vFIQ pending
– HCR_VI in HCR_EL2 to make vIRQ pending
– HCR_VSE in HCR_EL2 to make vSError pending
• With this method, we then need to emulate an interrupt controller
– Via GIC (our focus)
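The system-register method comes down to setting one bit in HCR_EL2. A minimal sketch follows; the bit positions are from the Arm architecture, while the macro definitions, the enum, and hcr_set_pending() are written here for illustration and are not taken from BitVisor. The caller would write the returned value back with msr.

```c
#include <assert.h>
#include <stdint.h>

/* HCR_EL2 bit positions from the Arm architecture */
#define HCR_FMO	(1ULL << 3)	/* Route physical FIQ to EL2 */
#define HCR_IMO	(1ULL << 4)	/* Route physical IRQ to EL2 */
#define HCR_AMO	(1ULL << 5)	/* Route physical SError to EL2 */
#define HCR_VF	(1ULL << 6)	/* Virtual FIQ pending */
#define HCR_VI	(1ULL << 7)	/* Virtual IRQ pending */
#define HCR_VSE	(1ULL << 8)	/* Virtual SError pending */

enum vint_type { VINT_FIQ, VINT_IRQ, VINT_SERROR };

/* Return the HCR_EL2 value with the chosen virtual interrupt pending */
static uint64_t
hcr_set_pending (uint64_t hcr, enum vint_type type)
{
	switch (type) {
	case VINT_FIQ:
		return hcr | HCR_VF;
	case VINT_IRQ:
		return hcr | HCR_VI;
	case VINT_SERROR:
		return hcr | HCR_VSE;
	}
	return hcr;
}
```

The pending bits take effect only while the matching routing bit (FMO, IMO, or AMO) is set, and they stay pending until software clears them, which is part of why this method requires emulating an interrupt controller.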
Interrupt handling
◼ Overview
[Figure: with IMO=1 and FMO=1, the GIC delivers interrupts to the hypervisor, which saves context, identifies the interrupt, forwards it, and returns to the guest. The hypervisor injects a virtual interrupt, which the GIC delivers to the guest.]
Interrupt handling
◼ BitVisor GIC initialization
– Set HCR_FMO, HCR_IMO, and HCR_AMO in HCR_EL2
– Set ICH_HCR_EN in ICH_HCR_EL2
– Configure ICH_VMCR_EL2 to initialize vGIC states
– Need to change how we end an interrupt
• Make an EOI write only drop the priority
• The guest ends (deactivates) the interrupt in its own interrupt handler
◼ Interrupt handling
– Read ICC_IAR0/1_EL1 to get the intid and acknowledge the
interrupt
– Do scheduling and other tasks
– Write ICC_EOIR0/1_EL1 with the intid to drop priority
– Inject the interrupt into the guest
Interrupt handling
◼ Injecting interrupts with GIC
– Each core has a set of List Registers (LRs) for injecting virtual
interrupts
• ICH_LR0_EL2 to (at most) ICH_LR15_EL2
– The actual number is platform specific
– To inject a virtual interrupt, simply write to one of the empty
ICH_LR registers
– The virtual interrupt is taken by the guest once we
return to the guest
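Composing a list register value and picking an empty slot can be sketched as below. The field positions follow the GICv3 architecture; the macro and function names are assumptions made for this example and are not BitVisor's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* ICH_LR<n>_EL2 fields per the GICv3 architecture */
#define LR_VINTID(id)	((uint64_t)(id) & 0xFFFFFFFFULL)	/* [31:0] */
#define LR_PINTID(id)	(((uint64_t)(id) & 0x1FFFULL) << 32)	/* [44:32] */
#define LR_PRIORITY(p)	(((uint64_t)(p) & 0xFFULL) << 48)	/* [55:48] */
#define LR_GROUP1	(1ULL << 60)
#define LR_HW		(1ULL << 61)	/* Linked to a physical interrupt */
#define LR_STATE_PENDING (1ULL << 62)	/* State [63:62] = 0b01 */

/* Build a pending, hardware-linked entry with vintid == pintid */
static uint64_t
make_pending_hw_lr (uint32_t intid, uint8_t priority, int group1)
{
	uint64_t val = LR_VINTID (intid) | LR_PINTID (intid) |
		LR_PRIORITY (priority) | LR_HW | LR_STATE_PENDING;

	if (group1)
		val |= LR_GROUP1;
	return val;
}

/*
 * ICH_ELRSR_EL2 has bit n set when list register n is empty.
 * Returns the first empty index below n_lrs, or -1 if all are in use.
 */
static int
find_empty_lr (uint64_t elrsr, unsigned int n_lrs)
{
	unsigned int i;

	for (i = 0; i < n_lrs; i++)
		if ((elrsr >> i) & 1)
			return (int)i;
	return -1;
}
```

When find_empty_lr() returns -1, the interrupt has to wait in a software queue until a list register frees up.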
Interrupt handling
◼ Injecting interrupts with GIC
static void
try_inject_vint (u64 intid, u64 rpr, uint group)
{
	…
	/* Currently vintid = pintid */
	g = !!group;
	val = ICH_LR_VINTID (intid) | ICH_LR_PINTID (intid) |
	      ICH_LR_PRIORITY (rpr) | ICH_LR_GROUP (g) | ICH_LR_HW |
	      ICH_LR_STATE (LR_STATE_PENDING);
	enqueue_lr (currentcpu, val);
	elrsr = mrs (ICH_ELRSR_EL2);
	for (i = 0; elrsr != 0 && i < currentcpu->max_int_slot; i++) {
		empty = !!(elrsr & 0x1);
		if (empty) {
			if (dequeue_lr (currentcpu, &lr_val))
				set_lr (i, lr_val);
			else
				break;
		}
		elrsr >>= 1;
	}
}
Interrupt handling
◼ Injecting interrupts with GIC
static void
set_lr (uint lr_idx, u64 val)
{
	switch (lr_idx) {
	case 0:
		msr (ICH_LR0_EL2, val);
		break;
	case 1:
		msr (ICH_LR1_EL2, val);
		break;
	case 2:
		msr (ICH_LR2_EL2, val);
		break;
	case 3:
		msr (ICH_LR3_EL2, val);
		break;
	…
	default:
		panic ("lr_idx out of bound");
		break;
	}
}
MMIO handling
◼ Stage-1 and Stage-2 memory translation
– Stage-1 is for translating a virtual address (VA) to a physical
address (PA) or an intermediate physical address (IPA)
• For EL1, IPA is PA if stage-2 translation is not enabled
– Stage-2 is for translating the IPA to an actual PA
• Need to set up
– VTTBR_EL2 for stage-2 page tables
– VTCR_EL2 for stage-2 translation control
• In our case, IPA and PA are identity mapped
◼ In general, MMIO handling is done through stage-2
translation faults
– Not limited to MMIO addresses; any PA can be monitored
MMIO handling
◼ Implementation concept
– During initialization, we create identity mapping for stage-2
address translation
• Does not need many page tables as we can utilize 1GB
block mappings
– mmio_register() provides the PA and size we want to monitor
• We unmap the address range from stage-2 translation
• From the MMU implementation point of view, we break the
big block mapping down into smaller blocks, leaving a hole at the address
– Exception handling is triggered once the guest accesses
monitored addresses
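To see which table entries are involved when a hole is punched, the index arithmetic for a 4KB granule can be sketched as follows. s2_index() and hole_pages() are hypothetical helpers, and the address in the usage note is just an example.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K 0x1000ULL

/*
 * Table index of an address at level 1 (1GB entries), level 2 (2MB
 * entries), or level 3 (4KB entries) with a 4KB granule.
 */
static unsigned int
s2_index (uint64_t ipa, int level)
{
	assert (level >= 1 && level <= 3);
	return (ipa >> (12 + 9 * (3 - level))) & 0x1FF;
}

/*
 * Number of 4KB pages that must be left unmapped to cover
 * [pa, pa + size): the hole is widened to page boundaries.
 */
static uint64_t
hole_pages (uint64_t pa, uint64_t size)
{
	uint64_t start = pa & ~(SZ_4K - 1);
	uint64_t end = (pa + size + SZ_4K - 1) & ~(SZ_4K - 1);

	return (end - start) / SZ_4K;
}
```

For a device page at 0x09000000, the 1GB block at level-1 index 0 is split into 2MB blocks, the 2MB block at level-2 index 72 is split into 4KB pages, and the page at level-3 index 0 is left invalid. Because the hole is page-granular in this sketch, accesses that hit the same page but fall outside the registered range would also trap.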
MMIO handling
◼ Implementation concept
– We need to emulate those accesses
• Get instruction address from ELR_EL2 register
• Get fault address from FAR_EL2 register
• Decode the instruction to get source/destination registers
• Get all necessary info together and pass them to a handler
– Once we finish access handling
• Skip the instruction by adding 4 to ELR_EL2
– An instruction is 4 bytes
• Update guest registers in saved context if necessary
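The decode step can be illustrated for one common instruction form, LDR/STR with an unsigned immediate offset. This is a sketch only: a real decoder has to cover many more forms, and struct mmio_access and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mmio_access {
	bool is_write;		/* Store (true) or load (false) */
	unsigned int size;	/* Access size in bytes: 1, 2, 4, or 8 */
	unsigned int rt;	/* Data register number (31 is the zero register) */
	unsigned int rn;	/* Base register number */
};

/*
 * Decode "LDR/STR (immediate, unsigned offset)" for general registers.
 * Returns false for any other instruction form.
 */
static bool
decode_ldst_uimm (uint32_t insn, struct mmio_access *out)
{
	unsigned int opc = (insn >> 22) & 0x3;

	/* Bits [29:27] = 111, V (bit 26) = 0, bits [25:24] = 01 */
	if ((insn & 0x3F000000) != 0x39000000)
		return false;
	if (opc > 1)
		return false;	/* Sign-extending loads, PRFM: not handled */
	out->is_write = (opc == 0);
	out->size = 1u << (insn >> 30);
	out->rt = insn & 0x1F;
	out->rn = (insn >> 5) & 0x1F;
	return true;
}

/* A64 instructions are 4 bytes, so skipping one is ELR_EL2 + 4 */
static uint64_t
skip_insn (uint64_t elr)
{
	return elr + 4;
}
```

For a load, the handler's result would be written to register rt in the saved guest context before returning; for a store, the value of rt is read from the saved context and passed to the handler.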
Multiple core support
◼ On platforms that support PSCI, multiple core support
is straightforward
◼ When the guest wants to start a secondary core
– It issues an SMC instruction
– The call follows the SMC Calling Convention (SMCCC)
• smc #0
• x0: Function ID, x1 onward: Parameters
◼ BitVisor simply needs to intercept SMC instructions
– Set HCR_TSC bit in HCR_EL2 register
– Check for CPU_ON Function ID
– Save entry_address and context_id information
• entry_address is physical address
• context_id appears at x0 on secondary core entry
Multiple core support
◼ BitVisor then issues SMC on behalf of the guest
– Copy guest’s CPU_ON command
– Replace entry_address and context_id with our values
◼ Secondary core entry
– Set up MMU and stack
– Jump to designated virtual address to continue per core
initialization
– Finally, we start the guest at its entry_address with its
context_id at x0
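The interception described on these two slides can be sketched as follows. The CPU_ON function IDs and argument registers follow the PSCI specification; the structures, intercept_cpu_on(), and the addresses in the usage note are assumptions for illustration, not BitVisor code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSCI_CPU_ON_32	0x84000003U	/* PSCI CPU_ON, SMC32 */
#define PSCI_CPU_ON_64	0xC4000003U	/* PSCI CPU_ON, SMC64 */

/* Guest registers at the trapped SMC (x1 is target_cpu) */
struct smc_regs {
	uint64_t x0, x1, x2, x3;
};

/* What the guest asked for, kept until the secondary core is ready */
struct guest_boot_info {
	uint64_t entry_address;
	uint64_t context_id;
};

/*
 * If the trapped call is CPU_ON, remember the guest's entry point and
 * context ID, then substitute the hypervisor's own values so that the
 * new core enters the hypervisor first. Returns true if rewritten.
 */
static bool
intercept_cpu_on (struct smc_regs *r, struct guest_boot_info *saved,
		  uint64_t vmm_entry_pa, uint64_t vmm_context)
{
	uint32_t fid = (uint32_t)r->x0;

	if (fid != PSCI_CPU_ON_32 && fid != PSCI_CPU_ON_64)
		return false;
	saved->entry_address = r->x2;
	saved->context_id = r->x3;
	r->x2 = vmm_entry_pa;
	r->x3 = vmm_context;
	return true;
}
```

After the rewrite, the hypervisor issues the SMC itself with the modified registers. Once per-core initialization finishes, it starts the guest at saved->entry_address with saved->context_id in x0.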
Current limitation
◼ No Aarch32 for now
– For simplicity
◼ No Suspend/Resume for now
– Going to implement later
– PSCI SMC handling
◼ No EL0 debug shell through hypercall
– hvc instruction is not available at EL0
– Need to find an alternative
• Virtual serial?
Current limitation
◼ No 52-bit address support for now
– Needs either 64KB page size or Armv8.7
– BitVisor itself does not need 52-bit addresses
– To allow the guest OS to use this, we need either
• 64KB page size
– Quite a waste of memory for our use cases
• Armv8.7 FEAT_LPA2 for 4KB and 16KB page sizes
– 4KB page size needs a 5-level page table
– We have seen no real hardware that supports this yet
– Not the current priority
Ongoing tasks
◼ Integrating Aarch64 implementation with the
mainstream
– Finalizing interfaces for platform specific implementation
– Cross-compiling implementation
QEMU patches
◼ e1000e: Fix possible interrupt loss when using MSI
– There was a logic error resulting in delaying MSI indefinitely
◼ target/arm: honor HCR_E2H and HCR_TGE in
arm_excp_unmasked()
– Found this problem when trying to run a process in EL0 with
interrupt masked
• This is valid according to the architecture manual
• It was impossible before this patch
◼ target/arm: Honor HCR_E2H and HCR_TGE in
ats_write64()
– AT instruction implementation forgot to honor HCR_E2H and
HCR_TGE
– Found this because there was a weird memory error panic
QEMU patches
◼ e1000e: Fix possible interrupt loss when using MSI
– https://github.com/qemu/qemu/commit/dd0ef128669c29734a197ca9195e7ab64e20ba2c
◼ target/arm: honor HCR_E2H and HCR_TGE in
arm_excp_unmasked()
– https://github.com/qemu/qemu/commit/c939a7c7b93ee44a4963fabe81454e1f956ecd4b
◼ target/arm: Honor HCR_E2H and HCR_TGE in
ats_write64()
– https://github.com/qemu/qemu/commit/638d5dbd78ea81c943959e2f2c65c109e5278a78
DEMO
THANK YOU
42
Copyright© 2022 IGEL Co., Ltd. All Rights Reserved.

More Related Content

What's hot

仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング
Takuya ASADA
 
冬のLock free祭り safe
冬のLock free祭り safe冬のLock free祭り safe
冬のLock free祭り safe
Kumazaki Hiroki
 

What's hot (20)

ARM LinuxのMMUはわかりにくい
ARM LinuxのMMUはわかりにくいARM LinuxのMMUはわかりにくい
ARM LinuxのMMUはわかりにくい
 
組み込みLinuxでのGolangのススメ
組み込みLinuxでのGolangのススメ組み込みLinuxでのGolangのススメ
組み込みLinuxでのGolangのススメ
 
仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング仮想化環境におけるパケットフォワーディング
仮想化環境におけるパケットフォワーディング
 
入門!Jenkins
入門!Jenkins入門!Jenkins
入門!Jenkins
 
Pwning in c++ (basic)
Pwning in c++ (basic)Pwning in c++ (basic)
Pwning in c++ (basic)
 
Glibc malloc internal
Glibc malloc internalGlibc malloc internal
Glibc malloc internal
 
エキスパートGo
エキスパートGoエキスパートGo
エキスパートGo
 
Linuxのsemaphoreとmutexを見る 
Linuxのsemaphoreとmutexを見る Linuxのsemaphoreとmutexを見る 
Linuxのsemaphoreとmutexを見る 
 
冬のLock free祭り safe
冬のLock free祭り safe冬のLock free祭り safe
冬のLock free祭り safe
 
DeathNote of Microsoft Windows Kernel
DeathNote of Microsoft Windows KernelDeathNote of Microsoft Windows Kernel
DeathNote of Microsoft Windows Kernel
 
(旧版) オープンソースライセンスの基礎と実務
(旧版) オープンソースライセンスの基礎と実務(旧版) オープンソースライセンスの基礎と実務
(旧版) オープンソースライセンスの基礎と実務
 
リクルートライフスタイルの考える ストリームデータの活かし方(Hadoop Spark Conference2016)
リクルートライフスタイルの考えるストリームデータの活かし方(Hadoop Spark Conference2016)リクルートライフスタイルの考えるストリームデータの活かし方(Hadoop Spark Conference2016)
リクルートライフスタイルの考える ストリームデータの活かし方(Hadoop Spark Conference2016)
 
GCを発生させないJVMとコーディングスタイル
GCを発生させないJVMとコーディングスタイルGCを発生させないJVMとコーディングスタイル
GCを発生させないJVMとコーディングスタイル
 
My sqlで2億件のシリアルデータと格闘した話
My sqlで2億件のシリアルデータと格闘した話My sqlで2億件のシリアルデータと格闘した話
My sqlで2億件のシリアルデータと格闘した話
 
Javaメモリ勉強会
Javaメモリ勉強会Javaメモリ勉強会
Javaメモリ勉強会
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
Basic of virtual memory of Linux
Basic of virtual memory of LinuxBasic of virtual memory of Linux
Basic of virtual memory of Linux
 
ドキュメントを作りたくなってしまう魔法のツールSphinx
ドキュメントを作りたくなってしまう魔法のツールSphinxドキュメントを作りたくなってしまう魔法のツールSphinx
ドキュメントを作りたくなってしまう魔法のツールSphinx
 
Post-quantum zk-SNARKs on Hyperledger Fabric​
Post-quantum zk-SNARKs on Hyperledger Fabric​Post-quantum zk-SNARKs on Hyperledger Fabric​
Post-quantum zk-SNARKs on Hyperledger Fabric​
 
Pythonによる黒魔術入門
Pythonによる黒魔術入門Pythonによる黒魔術入門
Pythonによる黒魔術入門
 

Similar to BitVisor Summit 11「2. BitVisor on Aarch64」

LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
Linaro
 
D1 t2 jonathan brossard - breaking virtualization by switching to virtual 8...
D1 t2   jonathan brossard - breaking virtualization by switching to virtual 8...D1 t2   jonathan brossard - breaking virtualization by switching to virtual 8...
D1 t2 jonathan brossard - breaking virtualization by switching to virtual 8...
kbour23
 
2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf
2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf
2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf
ssuser866937
 
Exploit access root to kernel 2.6.32 2.6.36 privilege escalation exploit
Exploit access root to kernel 2.6.32 2.6.36   privilege escalation exploitExploit access root to kernel 2.6.32 2.6.36   privilege escalation exploit
Exploit access root to kernel 2.6.32 2.6.36 privilege escalation exploit
Carlos Eduardo
 

Similar to BitVisor Summit 11「2. BitVisor on Aarch64」 (20)

I2C Drivers
I2C DriversI2C Drivers
I2C Drivers
 
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
XPDS16: Xen Live Patching - Updating Xen Without Rebooting - Konrad Wilk, Ora...
 
Preparing BitVisor for Supporting Multiple Architectures
Preparing BitVisor for Supporting Multiple ArchitecturesPreparing BitVisor for Supporting Multiple Architectures
Preparing BitVisor for Supporting Multiple Architectures
 
Armboot process zeelogic
Armboot process zeelogicArmboot process zeelogic
Armboot process zeelogic
 
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilitiesBlueHat v18 || A mitigation for kernel toctou vulnerabilities
BlueHat v18 || A mitigation for kernel toctou vulnerabilities
 
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from HellLAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
LAS16-500: The Rise and Fall of Assembler and the VGIC from Hell
 
D1 t2 jonathan brossard - breaking virtualization by switching to virtual 8...
D1 t2   jonathan brossard - breaking virtualization by switching to virtual 8...D1 t2   jonathan brossard - breaking virtualization by switching to virtual 8...
D1 t2 jonathan brossard - breaking virtualization by switching to virtual 8...
 
Java Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey KovalenkoJava Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey Kovalenko
 
Writing Metasploit Plugins
Writing Metasploit PluginsWriting Metasploit Plugins
Writing Metasploit Plugins
 
0x01 - Breaking into Linux VMs for Fun and Profit.pdf
0x01 - Breaking into Linux VMs for Fun and Profit.pdf0x01 - Breaking into Linux VMs for Fun and Profit.pdf
0x01 - Breaking into Linux VMs for Fun and Profit.pdf
 
C programming session10
C programming  session10C programming  session10
C programming session10
 
Embedded C programming session10
Embedded C programming  session10Embedded C programming  session10
Embedded C programming session10
 
2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf
2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf
2022-Cauldron-If-Conversion-for-a-Partially-Predicated-VLIW-Architecture.pdf
 
SR-IOV Introduce
SR-IOV IntroduceSR-IOV Introduce
SR-IOV Introduce
 
Malicious Hypervisor - Virtualization in Shellcodes by Adhokshaj Mishra
Malicious Hypervisor - Virtualization in Shellcodes by Adhokshaj MishraMalicious Hypervisor - Virtualization in Shellcodes by Adhokshaj Mishra
Malicious Hypervisor - Virtualization in Shellcodes by Adhokshaj Mishra
 
Advanced Root Cause Analysis
Advanced Root Cause AnalysisAdvanced Root Cause Analysis
Advanced Root Cause Analysis
 
Analisis_avanzado_vmware
Analisis_avanzado_vmwareAnalisis_avanzado_vmware
Analisis_avanzado_vmware
 
Sierraware ARM hypervisor
Sierraware ARM hypervisor Sierraware ARM hypervisor
Sierraware ARM hypervisor
 
Exploit access root to kernel 2.6.32 2.6.36 privilege escalation exploit
Exploit access root to kernel 2.6.32 2.6.36   privilege escalation exploitExploit access root to kernel 2.6.32 2.6.36   privilege escalation exploit
Exploit access root to kernel 2.6.32 2.6.36 privilege escalation exploit
 
I2c drivers
I2c driversI2c drivers
I2c drivers
 

More from BitVisor

More from BitVisor (10)

BitVisor Summit 11「1. BitVisor 2022年の主な変更点」
BitVisor Summit 11「1. BitVisor 2022年の主な変更点」BitVisor Summit 11「1. BitVisor 2022年の主な変更点」
BitVisor Summit 11「1. BitVisor 2022年の主な変更点」
 
BitVisor Summit 10「1. BitVisor 2021年の主な変更点」
BitVisor Summit 10「1. BitVisor 2021年の主な変更点」 BitVisor Summit 10「1. BitVisor 2021年の主な変更点」
BitVisor Summit 10「1. BitVisor 2021年の主な変更点」
 
BitVisor Summit 9「2. BitVisor 2020年の主な変更点」
BitVisor Summit 9「2. BitVisor 2020年の主な変更点」 BitVisor Summit 9「2. BitVisor 2020年の主な変更点」
BitVisor Summit 9「2. BitVisor 2020年の主な変更点」
 
BitVisor Summit 8「3. AQC107 Driver and Changes coming to network API」
BitVisor Summit 8「3. AQC107 Driver and Changes coming to network API」BitVisor Summit 8「3. AQC107 Driver and Changes coming to network API」
BitVisor Summit 8「3. AQC107 Driver and Changes coming to network API」
 
BitVisor Summit 8「2. BitVisor 2019年の主な変更点」
BitVisor Summit 8「2. BitVisor 2019年の主な変更点」 BitVisor Summit 8「2. BitVisor 2019年の主な変更点」
BitVisor Summit 8「2. BitVisor 2019年の主な変更点」
 
BitVisor Summit 7「8. ベアメタルクラウドにおけるハードウェア保護に関する研究 & Advent Calendar について」
BitVisor Summit 7「8. ベアメタルクラウドにおけるハードウェア保護に関する研究 & Advent Calendar について」BitVisor Summit 7「8. ベアメタルクラウドにおけるハードウェア保護に関する研究 & Advent Calendar について」
BitVisor Summit 7「8. ベアメタルクラウドにおけるハードウェア保護に関する研究 & Advent Calendar について」
 
BitVisor Summit 7「5. CTFVisor: BitVisorによるCTF作問・出題支援」
BitVisor Summit 7「5. CTFVisor: BitVisorによるCTF作問・出題支援」BitVisor Summit 7「5. CTFVisor: BitVisorによるCTF作問・出題支援」
BitVisor Summit 7「5. CTFVisor: BitVisorによるCTF作問・出題支援」
 
BitVisor Summit 7「4. BitVisorによるOSの見かけ上10倍速実行」
BitVisor Summit 7「4. BitVisorによるOSの見かけ上10倍速実行」BitVisor Summit 7「4. BitVisorによるOSの見かけ上10倍速実行」
BitVisor Summit 7「4. BitVisorによるOSの見かけ上10倍速実行」
 
BitVisor Summit 7「3. Interesting Issues During NVMe Driver Development」
BitVisor Summit 7「3. Interesting Issues During NVMe Driver Development」BitVisor Summit 7「3. Interesting Issues During NVMe Driver Development」
BitVisor Summit 7「3. Interesting Issues During NVMe Driver Development」
 
BitVisor Summit 7「2. BitVisor 2018年の主な変更点」
BitVisor Summit 7「2. BitVisor 2018年の主な変更点」BitVisor Summit 7「2. BitVisor 2018年の主な変更点」
BitVisor Summit 7「2. BitVisor 2018年の主な変更点」
 

Recently uploaded

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Recently uploaded (20)

Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 

BitVisor Summit 11「2. BitVisor on Aarch64」

  • 1. BitVisor on Aarch64 2022/12/07 @ BitVisor Summit 11 Ake Koomsin
  • 2. Agenda ◼ Current requirements ◼ How VMM works on Aarch64 ◼ BitVisor Aarch64 initialization ◼ Interrupt handling ◼ MMIO handling ◼ Multiple core support ◼ Current limitation ◼ Ongoing tasks ◼ QEMU bugs we found ◼ Demo Copyright© 2022 IGEL Co., Ltd. All Rights Reserved.
  • 3. Current requirements ◼ Armv8.1 or later – Needs the Virtualization Host Extensions (VHE) for the process implementation ◼ Generic Interrupt Controller v3 (GICv3) – Guest interrupt injection ◼ EL3 and Power State Coordination Interface (PSCI) – Firmware running in EL3 – For secondary core start-up ◼ UEFI environment and ACPI – BitVisor currently relies on them
  • 4. How VMM works on Aarch64 [Figure: exception levels. EL0: processes P0 to P3; EL1: OS0 and OS1; EL2: hypervisor; EL3: firmware. SVC enters EL1, HVC enters EL2, SMC enters EL3.]
  • 5. How VMM works on Aarch64 [Figure: two configurations across EL0 to EL2. Standard: Hypervisor, OS0, P0, P1. With VHE: Host OS/Hypervisor, OS0, P0, P1, P2.]
  • 6. How VMM works on Aarch64 ◼ Main system registers related to virtualization – HCR_EL2 • Enable/disable the hypervisor • Hypervisor behavior • Register trapping – VTTBR_EL2 • Stage-2 translation page table – VTCR_EL2 • Stage-2 translation control – VMPIDR_EL2 • Multiprocessor ID: the MPIDR_EL1 value read by the guest – VPIDR_EL2 • Processor ID: the MIDR_EL1 value read by the guest
  • 7. How VMM works on Aarch64 ◼ Page table basics on Aarch64 – Typically, an OS sets up TTBR0_EL1 for a process's page table and TTBR1_EL1 for the kernel page table • Addresses with the 0xF… prefix are mapped through TTBR1_EL1 – Normally, only TTBR0_EL2 is available at EL2 – With the VHE feature, we can make EL2 behave the same as EL1 • TTBR1_EL2 becomes accessible • System registers related to translation change their layout – Ex. the TCR_EL2 bit definition becomes like TCR_EL1
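The TTBR0/TTBR1 split described on this slide can be sketched in C as follows. This is only an illustration, assuming a 48-bit range for each half (T0SZ = T1SZ = 16) and ignoring top-byte-ignore; `select_ttbr` and the enum are hypothetical names, not BitVisor code.

```c
#include <stdint.h>

/* Which translation table base register covers a virtual address. */
enum ttbr_sel { TTBR_0, TTBR_1, TTBR_FAULT };

/*
 * va_bits is the size of each address range (assumed: 48). It must be
 * between 1 and 63. Address tagging (TBI) is ignored in this sketch.
 */
enum ttbr_sel
select_ttbr (uint64_t va, unsigned int va_bits)
{
	uint64_t top = va >> va_bits;		/* Bits above the range */
	uint64_t all_ones = UINT64_MAX >> va_bits;

	if (top == 0)
		return TTBR_0;			/* Low half: TTBR0_ELx */
	if (top == all_ones)
		return TTBR_1;			/* High half: TTBR1_ELx */
	return TTBR_FAULT;			/* Neither: translation fault */
}
```

With these assumptions, the virtual base address 0xFFFF000000000000 mentioned later in the deck falls in the TTBR1 half, which is why TTBR1_EL2 (and therefore VHE) is needed for it.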
  • 8. How VMM works on Aarch64 ◼ Guest OS returns to EL2 from time to time through exceptions – Interrupt • If the hypervisor chooses to route interrupts to EL2 – Trapping • Register accesses • Intermediate Physical Address translation fault
  • 9. How VMM works on Aarch64 ◼ When an exception occurs – The entry point is one of the locations in the vector table pointed to by VBAR_EL2 • Depending on the currently running EL/exception type/mode – The first thing to do is to save the current context • General registers x0-x30 • Floating-point registers if necessary • Other system registers if necessary – In the BitVisor case (to switch between our processes and the guest) » HCR_EL2 » ELR_EL2, SPSR_EL2, FAR_EL2, ESR_EL2 » SP_EL0, TPIDR_EL0
  • 10. How VMM works on Aarch64 ◼ Handling an exception – Interrupt (asynchronous) • Interrupt controller handler – Scheduling – Forwarding to the guest – Handing over to the appropriate device driver – Trapping (synchronous) • Read ESR_EL2 for the exception syndrome • Handle it accordingly ◼ After handling the exception, return to the guest – Restore the entry context – Use the eret instruction to return to either EL0 or EL1 depending on SPSR_EL2
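As a rough sketch of the "read ESR_EL2 and handle accordingly" step, the exception class can be extracted and dispatched as below. The EC values come from the Arm architecture; the function and enum names are hypothetical, and only a few classes relevant to this talk are listed.

```c
#include <stdint.h>

/* Exception class values (ESR_ELx.EC, bits [31:26]). */
#define ESR_EC_HVC64		0x16	/* HVC from AArch64 */
#define ESR_EC_SMC64		0x17	/* SMC from AArch64 (with HCR_EL2.TSC) */
#define ESR_EC_SYSREG		0x18	/* Trapped MSR/MRS/system instruction */
#define ESR_EC_IABT_LOWER	0x20	/* Instruction abort from a lower EL */
#define ESR_EC_DABT_LOWER	0x24	/* Data abort from a lower EL */

enum trap_action {
	TRAP_HVC, TRAP_SMC, TRAP_SYSREG, TRAP_MEM_FAULT, TRAP_UNHANDLED
};

/* Exception class field. */
uint32_t
esr_ec (uint64_t esr)
{
	return (esr >> 26) & 0x3F;
}

/* Instruction specific syndrome, bits [24:0]. */
uint32_t
esr_iss (uint64_t esr)
{
	return esr & 0x1FFFFFF;
}

/* Decide which handler a synchronous exception should go to. */
enum trap_action
classify_sync_exception (uint64_t esr)
{
	switch (esr_ec (esr)) {
	case ESR_EC_HVC64:
		return TRAP_HVC;
	case ESR_EC_SMC64:
		return TRAP_SMC;
	case ESR_EC_SYSREG:
		return TRAP_SYSREG;
	case ESR_EC_IABT_LOWER:
	case ESR_EC_DABT_LOWER:
		return TRAP_MEM_FAULT;	/* Includes stage-2 faults */
	default:
		return TRAP_UNHANDLED;
	}
}
```

A real handler would read the value with `mrs` from ESR_EL2 after saving the context; here it is passed in as a plain integer so the logic can be exercised on any host.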
  • 11. Early Aarch64 boot ◼ Relocation – To be able to run code at any address, we need a table structure that tells us where and what to adjust to get the final addresses • Usually for global variables – In the linker file, we have a special section for this table named .rela.dyn
      …
      .rela.dyn : AT (phys + (_rela_start - head)) {
              _rela_start = .;
              *(.rela)
              *(.rela.*)
              _rela_end = .;
      }
      …
  • 12. Early Aarch64 boot ◼ Relocation – It contains an array of the following structure – For BitVisor, we only deal with the R_AARCH64_RELATIVE operation currently
      struct rela_entry {
              u64 r_offset; /* Location to apply relocation */
              u64 r_info;   /* Determine operation to perform */
              u64 r_addend;
      };
  • 13. Early Aarch64 boot ◼ Relocation – The R_AARCH64_RELATIVE type is resolved with the Delta(S) + Addend operation according to the AAELF64 document (ELF for the Arm 64-bit Architecture) • S is the static address of a symbol • Delta(S) is the difference between the static link address of S and the execution address of S – In other words • If head_linktime_addr is 0, diff is head_runtime_addr – BitVisor head_linktime_addr is currently 0
      diff = head_runtime_addr - head_linktime_addr;
      *(u64 *)(diff + r_offset) = diff + r_addend;
  • 14. Early Aarch64 boot ◼ Relocation
      int SECTION_ENTRY_TEXT
      apply_reloc (phys_t base, struct rela_entry *start,
                   struct rela_entry *end)
      {
              struct rela_entry *entries = start;
              u64 *target;
              unsigned int i, n_entries = end - start;
              for (i = 0; i < n_entries; i++) {
                      switch (entries[i].r_info) {
                      case R_AARCH64_NONE:
                              break; /* Do nothing */
                      case R_AARCH64_RELATIVE:
                              /*
                               * Static head address is 0. That means Delta(S) is
                               * the runtime address.
                               */
                              target = (u64 *)(base + entries[i].r_offset);
                              *target = base + entries[i].r_addend;
                              break;
                      default:
                              /* Currently deals with only R_AARCH64_RELATIVE */
                              return -1;
                      }
              }
              return 0;
      }
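The relocation routine above runs on bare metal. To experiment with the same Delta(S) + Addend rule on a host machine, a sketch like the following can patch an image held in an ordinary buffer. It assumes the image is linked at address 0, as the slides state for BitVisor; `apply_rela_to_image` and its bounds check are hypothetical additions, and the relocation type is taken from the low 32 bits of r_info as in ELF64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_AARCH64_NONE		0
#define R_AARCH64_RELATIVE	1027

struct rela_entry {
	uint64_t r_offset;	/* Location to apply relocation */
	uint64_t r_info;	/* Symbol index (high 32), type (low 32) */
	uint64_t r_addend;
};

/*
 * image:        host buffer holding the image bytes
 * runtime_base: address the image will run at; with a link address of
 *               0 this is Delta(S)
 * Returns 0 on success, -1 on an unsupported type or bad offset.
 */
int
apply_rela_to_image (uint8_t *image, size_t image_size,
		     uint64_t runtime_base,
		     const struct rela_entry *rela, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t type = (uint32_t)rela[i].r_info;
		uint64_t value;

		if (type == R_AARCH64_NONE)
			continue;
		if (type != R_AARCH64_RELATIVE)
			return -1;
		if (image_size < sizeof value ||
		    rela[i].r_offset > image_size - sizeof value)
			return -1;
		value = runtime_base + rela[i].r_addend;
		/* memcpy avoids unaligned stores into the byte buffer */
		memcpy (image + rela[i].r_offset, &value, sizeof value);
	}
	return 0;
}
```

Separating the buffer pointer from `runtime_base` is a design choice for host-side testing only; in the boot path both are the same address.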
  • 15. Early Aarch64 boot ◼ Cross-compiling the UEFI loader – Mingw currently has no toolchain for Aarch64 – Switch to clang for cross-compiling instead – Most of the code for the UEFI loader remains the same ◼ UEFI loader and bitvisor.elf relation – The UEFI loader looks for bitvisor.elf – It then loads the first 64KB of bitvisor.elf for bootstrapping • Early initialization + loading the rest of BitVisor into memory • The .entry section of BitVisor must be within the first 64KB – Once bootstrapping is done, we can jump to the newly allocated BitVisor and start the remaining initialization
  • 16. Early Aarch64 boot ◼ Entering BitVisor code – Firstly, save context at entry
      entry:
              …
              adrp x9, _uefi_entry_ctx
              add x9, x9, :lo12:_uefi_entry_ctx
              stp x19, x20, [x9], #16
              …
              stp x29, x30, [x9], #16
              …
              mov x10, sp
              …
              mrs x10, TTBR0_EL2
              str x10, [x9], #8
              mrs x10, VBAR_EL2
              str x10, [x9], #8
              …
  • 17. Early Aarch64 boot ◼ Entering BitVisor code – Apply relocation to correct the addresses listed in the .rela.* sections
      entry:
              …
              adrp x0, head
              add x0, x0, :lo12:head
              adrp x1, _rela_start
              add x1, x1, :lo12:_rela_start
              adrp x2, _rela_end
              add x2, x2, :lo12:_rela_end
              bl apply_reloc64k
              cmp x0, 0
              bne .L1 /* Return if apply_reloc64() fails */
              …
  • 18. Early Aarch64 boot ◼ Entering BitVisor code – Then, enter uefi_entry() • Save some UEFI routine addresses • Load the entire BitVisor to a newly allocated location • Set up virtual addressing – Enable HCR_E2H so that TTBR1_EL2 becomes effective – Set up the TTBR1_EL2 table for hypervisor memory mapping – Set up MAIR_EL2, TCR_EL2, and SCTLR_EL2 – 0xFFFF000000000000 is our current virtual base address • Return the virtual address base to the assembly code so that we can jump to the new location with a virtual address
  • 19. Early Aarch64 boot ◼ Entering BitVisor code – Jump to asm_bitvisor_entry() – Apply relocation again with the new virtual address base + Additional setup for C code entry
      /*
       * x0 now contains the new virtual memory base address.
       * Next, calculate the position of asm_bitvisor_entry()
       * relative to x0.
       */
      adrp x21, head /* Old head */
      add x21, x21, :lo12:head
      adrp x11, asm_bitvisor_entry
      add x11, x11, :lo12:asm_bitvisor_entry
      sub x11, x11, x21
      add x11, x11, x0
      br x11 /* Jump to newly located asm_bitvisor_entry */
  • 20. Early Aarch64 boot ◼ Before calling vmm_main() – Initialize exception handling
      void
      bitvisor_entry (void)
      {
              uefi_booted = true;
              /* Save this for secondary core start */
              mair_host = mrs (MAIR_EL2);
              tcr_host = mrs (TCR_EL2);
              sctlr_host = mrs (SCTLR_EL2);
              serial_init ();
              disable_interrupt ();
              init_default_exception_handler ();
              init_exception ();
              vmm_main ();
      }
  • 21. BitVisor Aarch64 initialization ◼ The initialization flow is roughly the same as in current BitVisor – Mainly done through call_initfunc() – Some Aarch64-specific initialization needs to be taken care of • MMU/memory mapping/MMIO handling, GIC initialization, etc. ◼ The original code needs some adjustment – Separate platform-specific code into separate files and create interfaces for platform-independent code to call them • Ex. in the process implementation – x86 assembly in call_msgfunc0() is replaced by process_exec() – The actual implementation of process_exec() is in either x86/process.c or aarch64/process.c
  • 22. BitVisor Aarch64 initialization ◼ Entering guest
      void
      vm_start (void)
      {
              u64 orig_tcr, val;
              …
              /* Setting up EL1 environment */
              msr (SP_EL1, _uefi_entry_ctx.sp);
              msr (ESR_EL12, _uefi_entry_ctx.esr_el2);
              msr (FAR_EL12, _uefi_entry_ctx.far_el2);
              msr (MAIR_EL12, _uefi_entry_ctx.mair_el2);
              …
              msr (TCR_EL12, val);
              msr (TPIDR_EL1, _uefi_entry_ctx.tpidr_el2);
              msr (TTBR0_EL12, _uefi_entry_ctx.ttbr0_el2);
              msr (VBAR_EL12, _uefi_entry_ctx.vbar_el2);
              msr (CPACR_EL12, CPACR_ZEN (3) | CPACR_FPEN (3) |
                   CPACR_SMEN (3));
              val = (_uefi_entry_ctx.spsr_el2 & ~0xF) | 0x5; /* EL1h */
              msr (SPSR_EL2, val);
              msr (ELR_EL2, _uefi_entry_ctx.x30);
              msr (CPTR_EL2, CPTR_FLAGS);
              msr (HCR_EL2, HCR_FLAGS);
              start_guest (&_uefi_entry_ctx);
      }
  • 23. BitVisor Aarch64 initialization ◼ Entering guest
      start_guest:
              ldp x19, x20, [x0], #16
              ldp x21, x22, [x0], #16
              ldp x23, x24, [x0], #16
              ldp x25, x26, [x0], #16
              ldp x27, x28, [x0], #16
              ldp x29, x30, [x0], #16
              /* Clear all caller-saved registers */
              eor x15, x15, x15
              eor x14, x14, x14
              eor x13, x13, x13
              …
              mov x0, #1 /* Return 1 as success upon entering the guest */
              dsb ish
              isb
              eret
              /* Prevent speculative execution */
              dsb nsh
              isb
  • 24. Interrupt handling ◼ Physical interrupt and virtual interrupt – The physical one comes from an actual device • The guest can receive physical interrupts if the hypervisor chooses not to handle interrupts – The virtual one is the one that the hypervisor injects into the guest • Cannot be trapped to EL2/3 – Interrupt types • FIQ/vFIQ, high priority interrupt • IRQ/vIRQ, low priority interrupt • SError/vSError, erroneous memory accesses (Ex. bus error) – No non-maskable interrupt until Armv8.8-A/Armv9.3-A • QEMU still does not support this • No need to worry about this for now
  • 25. Interrupt handling ◼ Injecting interrupts – Via system registers • We can write to HCR_EL2 – Set HCR_VF in HCR_EL2 to make vFIQ pending – Set HCR_VI in HCR_EL2 to make vIRQ pending – Set HCR_VSE in HCR_EL2 to make vSError pending • Then, we need to emulate an interrupt controller – Via GIC (our focus)
  • 26. Interrupt handling ◼ Overview [Figure: with IMO=1 and FMO=1, the GIC delivers an interrupt to the hypervisor, which saves the context, identifies the interrupt, forwards it by injecting a virtual interrupt, and returns to the guest; the guest then receives the virtual interrupt.]
  • 27. Interrupt handling ◼ BitVisor GIC initialization – Set HCR_FMO, HCR_IMO, and HCR_AMO in HCR_EL2 – Set ICH_HCR_EN in ICH_HCR_EL2 – Configure ICH_VMCR_EL2 to initialize vGIC states – Need to change how we acknowledge an interrupt • Make writing EOI only drop the priority • The guest ends the interrupt in its own interrupt handling ◼ Interrupt handling – Read ICC_IAR0/1_EL1 to get intid and acknowledge the interrupt – Schedule and do tasks – Write ICC_EOIR0/1_EL1 with intid to drop priority – Inject the interrupt to the guest
  • 28. Interrupt handling ◼ Injecting interrupts with GIC – Each core has a set of List Registers (LR) for injecting virtual interrupts • ICH_LR0 – (max) ICH_LR15 – The max number is platform specific – To inject a virtual interrupt, simply write to one of the empty ICH_LR registers – The virtual interrupt is taken by the guest once we return to the guest
  • 29. Interrupt handling ◼ Injecting interrupts with GIC
      static void
      try_inject_vint (u64 intid, u64 rpr, uint group)
      {
              …
              /* Currently vintid = pintid */
              g = !!group;
              val = ICH_LR_VINTID (intid) | ICH_LR_PINTID (intid) |
                    ICH_LR_PRIORITY (rpr) | ICH_LR_GROUP (g) | ICH_LR_HW |
                    ICH_LR_STATE (LR_STATE_PENDING);
              enqueue_lr (currentcpu, val);
              elrsr = mrs (ICH_ELRSR_EL2);
              for (i = 0; elrsr != 0 && i < currentcpu->max_int_slot; i++) {
                      empty = !!(elrsr & 0x1);
                      if (empty) {
                              if (dequeue_lr (currentcpu, &lr_val))
                                      set_lr (i, lr_val);
                              else
                                      break;
                      }
                      elrsr >>= 1;
              }
      }
  • 30. Interrupt handling ◼ Injecting interrupts with GIC
      static void
      set_lr (uint lr_idx, u64 val)
      {
              switch (lr_idx) {
              case 0: msr (ICH_LR0_EL2, val); break;
              case 1: msr (ICH_LR1_EL2, val); break;
              case 2: msr (ICH_LR2_EL2, val); break;
              case 3: msr (ICH_LR3_EL2, val); break;
              …
              default: panic ("lr_idx out of bound"); break;
              }
      }
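The slides use ICH_LR_* macros without showing their definitions. One possible definition, following the ICH_LR<n>_EL2 field layout as I understand it from the GICv3 architecture specification, is sketched below together with a helper that finds an empty slot from an ICH_ELRSR_EL2 value. The macro bodies, `make_hw_pending_lr`, and `first_empty_lr` are assumptions for illustration and should be checked against the specification and the actual BitVisor source.

```c
#include <stdint.h>

/* ICH_LR<n>_EL2.State values */
#define LR_STATE_INVALID	0
#define LR_STATE_PENDING	1
#define LR_STATE_ACTIVE		2

/* Assumed field positions in ICH_LR<n>_EL2 */
#define ICH_LR_VINTID(id)	((uint64_t)(id) & 0xFFFFFFFFULL)      /* [31:0]  */
#define ICH_LR_PINTID(id)	(((uint64_t)(id) & 0x1FFFULL) << 32)  /* [44:32] */
#define ICH_LR_PRIORITY(p)	(((uint64_t)(p) & 0xFFULL) << 48)     /* [55:48] */
#define ICH_LR_GROUP(g)		(((uint64_t)(g) & 0x1ULL) << 60)      /* [60]    */
#define ICH_LR_HW		(1ULL << 61)                          /* [61]    */
#define ICH_LR_STATE(s)		(((uint64_t)(s) & 0x3ULL) << 62)      /* [63:62] */

/* Build an LR value for a pending, hardware-backed interrupt. */
uint64_t
make_hw_pending_lr (uint32_t intid, uint8_t priority, unsigned int group)
{
	return ICH_LR_VINTID (intid) | ICH_LR_PINTID (intid) |
	       ICH_LR_PRIORITY (priority) | ICH_LR_GROUP (!!group) |
	       ICH_LR_HW | ICH_LR_STATE (LR_STATE_PENDING);
}

/*
 * Bit n of ICH_ELRSR_EL2 is 1 when list register n is empty.
 * Returns the lowest empty index below n_lrs, or -1 if none.
 */
int
first_empty_lr (uint64_t elrsr, unsigned int n_lrs)
{
	unsigned int i;

	for (i = 0; i < n_lrs && i < 16; i++) {
		if (elrsr & (1ULL << i))
			return (int)i;
	}
	return -1;
}
```

Unlike `try_inject_vint()` above, this sketch has no software queue; it only shows how a single value is composed and where it could be placed.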
  • 31. MMIO handling ◼ Stage-1 and Stage-2 memory translation – Stage-1 translates a virtual address (VA) to a physical address (PA) or an intermediate physical address (IPA) • For EL1, IPA is PA if stage-2 translation is not enabled – Stage-2 translates the IPA to an actual PA • Need to set up – VTTBR_EL2 for stage-2 page tables – VTCR_EL2 for stage-2 translation control • In our case, IPA and PA are identity mapped ◼ In general, MMIO handling is done through stage-2 translation faults – Not limited to MMIO addresses but any PA
  • 32. MMIO handling ◼ Implementation concept – During initialization, we create an identity mapping for stage-2 address translation • Does not need too many page tables as we can utilize 1GB block mapping – mmio_register() provides the PA and size we want to monitor • We unmap the address from stage-2 translation • From the MMU implementation point of view, we break down the big mapping block into smaller blocks, leaving a hole at the address – Exception handling is triggered once the guest accesses monitored addresses
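To make the "break down the big block" step concrete, the arithmetic below shows which table entry covers an address at each stage-2 level, and how deep a hole has to go. It is a sketch assuming a 4KB granule (level 1 = 1GB, level 2 = 2MB, level 3 = 4KB); the function names are hypothetical and the address in the usage note is an arbitrary example.

```c
#include <stdint.h>

#define S2_PAGE_SHIFT		12	/* 4KB granule (assumption) */
#define S2_BITS_PER_LEVEL	9	/* 512 entries per table */

/* Size mapped by one entry at the given level (1, 2, or 3). */
uint64_t
s2_block_size (unsigned int level)
{
	return 1ULL << (S2_PAGE_SHIFT + S2_BITS_PER_LEVEL * (3 - level));
}

/* Table index used for ipa at the given level (1, 2, or 3). */
unsigned int
s2_index (uint64_t ipa, unsigned int level)
{
	unsigned int shift = S2_PAGE_SHIFT + S2_BITS_PER_LEVEL * (3 - level);

	return (unsigned int)((ipa >> shift) & 0x1FF);
}

/*
 * Shallowest level at which [pa, pa + size) can be unmapped exactly:
 * both the start and the size must be aligned to that level's block.
 */
unsigned int
s2_hole_level (uint64_t pa, uint64_t size)
{
	unsigned int level;

	for (level = 1; level < 3; level++) {
		uint64_t mask = s2_block_size (level) - 1;

		if (((pa | size) & mask) == 0)
			return level;
	}
	return 3;
}
```

For a 4KB register window at 0x09000000, `s2_hole_level()` returns 3, so the 1GB block containing it is split into one level-2 table of 2MB blocks, and one of those is split again into a level-3 table in which a single entry is left invalid.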
  • 33. MMIO handling ◼ Implementation concept – We need to emulate those accesses • Get instruction address from ELR_EL2 register • Get fault address from FAR_EL2 register • Decode the instruction to get source/destination registers • Get all necessary info together and pass them to a handler – Once we finish access handling • Skip the instruction by adding 4 to ELR_EL2 – An instruction is 4 bytes • Update guest registers in saved context if necessary
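The slide describes decoding the faulting instruction itself. A simpler variant, shown here as a sketch and not as BitVisor's method, uses the syndrome that the hardware reports in ESR_EL2 for data aborts when the ISV bit is set; instruction decoding is still needed as a fallback when ISV is 0. The struct and function names are hypothetical, and sign extension (SSE) is recorded but not applied.

```c
#include <stdbool.h>
#include <stdint.h>

struct mmio_access {
	bool valid;		/* ISV: the fields below are meaningful */
	bool is_write;		/* WnR */
	bool sign_extend;	/* SSE */
	unsigned int size;	/* Access size in bytes, from SAS */
	unsigned int reg;	/* SRT: transfer register, 31 = XZR */
};

struct guest_ctx {
	uint64_t x[31];		/* Saved x0-x30 */
	uint64_t elr_el2;	/* Saved return address */
};

/* Decode the data abort ISS fields of ESR_EL2. */
struct mmio_access
decode_dabt_iss (uint64_t esr)
{
	struct mmio_access a;

	a.valid = (esr >> 24) & 1;		/* ISV, bit 24 */
	a.size = 1u << ((esr >> 22) & 3);	/* SAS, bits [23:22] */
	a.sign_extend = (esr >> 21) & 1;	/* SSE, bit 21 */
	a.reg = (esr >> 16) & 0x1F;		/* SRT, bits [20:16] */
	a.is_write = (esr >> 6) & 1;		/* WnR, bit 6 */
	return a;
}

/* Value the guest is storing; register 31 reads as zero. */
uint64_t
mmio_write_value (const struct guest_ctx *ctx, const struct mmio_access *a)
{
	return a->reg < 31 ? ctx->x[a->reg] : 0;
}

/* Finish an emulated access: store a read result, skip the instruction. */
void
complete_mmio (struct guest_ctx *ctx, const struct mmio_access *a,
	       uint64_t read_value)
{
	if (!a->is_write && a->reg < 31) {
		uint64_t mask = a->size == 8 ? UINT64_MAX :
				(1ULL << (a->size * 8)) - 1;

		ctx->x[a->reg] = read_value & mask;
	}
	ctx->elr_el2 += 4;	/* An instruction is 4 bytes */
}
```

The final step matches the slide: after the handler runs, the saved ELR_EL2 is advanced by 4 and the guest register in the saved context is updated for reads.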
  • 34. Multiple core support ◼ On platforms that support PSCI, multiple core support is straightforward ◼ When the guest wants to start a secondary core – It issues an SMC instruction – The call follows the SMC Calling Convention (SMCCC) • smc #0 • x0: Function ID, x1~: Parameters ◼ BitVisor simply needs to intercept SMC instructions – Set the HCR_TSC bit in the HCR_EL2 register – Check for the CPU_ON Function ID – Save entry_address and context_id information • entry_address is a physical address • context_id appears in x0 on secondary core entry
  • 35. Multiple core support ◼ BitVisor then issues SMC on behalf of the guest – Copy the guest's CPU_ON command – Replace entry_address and context_id with our values ◼ Secondary core entry – Set up MMU and stack – Jump to the designated virtual address to continue per-core initialization – Finally, we start the guest at its entry_address with its context_id in x0
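The CPU_ON interception described on the last two slides can be sketched as a small register rewrite. The function IDs are the PSCI CPU_ON values for the SMC32 and SMC64 conventions, with x1 = target CPU, x2 = entry point address, and x3 = context ID; the structs and `rewrite_cpu_on` are hypothetical names, and issuing the actual SMC is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSCI_CPU_ON_SMC32	0x84000003u
#define PSCI_CPU_ON_SMC64	0xC4000003u

/* Guest registers at the trapped SMC (only the ones used here). */
struct smc_regs {
	uint64_t x0, x1, x2, x3;
};

/* What the guest asked for; replayed on secondary core entry. */
struct guest_boot_info {
	uint64_t target_cpu;
	uint64_t entry_address;	/* Physical address */
	uint64_t context_id;	/* Delivered to the guest in x0 */
};

/*
 * If the trapped call is CPU_ON, save the guest's entry point and
 * context and substitute the hypervisor's own. Returns false, leaving
 * the registers untouched, for any other function ID.
 */
bool
rewrite_cpu_on (struct smc_regs *r, struct guest_boot_info *saved,
		uint64_t vmm_entry_pa, uint64_t vmm_context)
{
	uint32_t fid = (uint32_t)r->x0;

	if (fid != PSCI_CPU_ON_SMC32 && fid != PSCI_CPU_ON_SMC64)
		return false;
	saved->target_cpu = r->x1;
	saved->entry_address = r->x2;
	saved->context_id = r->x3;
	r->x2 = vmm_entry_pa;
	r->x3 = vmm_context;
	return true;
}
```

After the rewritten call is forwarded to the firmware, the secondary core starts in the hypervisor, which later enters the guest at `saved->entry_address` with `saved->context_id` in x0.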
  • 36. Current limitation ◼ No Aarch32 for now – For simplicity ◼ No Suspend/Resume for now – Going to implement later – PSCI SMC handling ◼ No EL0 debug shell through hypercall – hvc instruction is not available at EL0 – Need to find an alternative • Virtual serial?
  • 37. Current limitation ◼ No 52-bit address support for now – Needs either 64KB page size or Armv8.7 – BitVisor itself does not need 52-bit addresses – To allow the guest OS to use this, we need either • 64KB page size – Quite a waste of memory for our use cases • Armv8.7 FEAT_LPA2 for 4KB and 16KB page size – 4KB page size needs a 5-level page table – We have seen no real hardware that supports this yet – Not the current priority
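The "5-level page table" remark follows from simple arithmetic: each table holds 8-byte descriptors, so one level resolves (page shift - 3) address bits. The helper below is a back-of-the-envelope sketch with a hypothetical name; it ignores architectural details such as concatenated tables.

```c
/*
 * Number of translation levels needed to resolve va_bits of address
 * with pages of 2^page_shift bytes and 8-byte descriptors.
 */
unsigned int
table_levels (unsigned int va_bits, unsigned int page_shift)
{
	unsigned int bits_per_level = page_shift - 3;
	unsigned int bits = va_bits - page_shift; /* Above page offset */

	return (bits + bits_per_level - 1) / bits_per_level;
}
```

With a 4KB page (shift 12), 48 bits need 4 levels and 52 bits need 5; with a 64KB page (shift 16), 52 bits fit in 3 levels, which is why 64KB pages can reach 52 bits without an extra level.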
  • 38. Ongoing tasks ◼ Integrating Aarch64 implementation with the mainstream – Finalizing interfaces for platform specific implementation – Cross-compiling implementation
  • 39. QEMU patches ◼ e1000e: Fix possible interrupt loss when using MSI – There was a logic error resulting in delaying MSI indefinitely ◼ target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked() – Found this problem when trying to run a process in EL0 with interrupt masked • This is valid according to the architecture manual • It was impossible before this patch ◼ target/arm: Honor HCR_E2H and HCR_TGE in ats_write64() – AT instruction implementation forgot to honor HCR_E2H and HCR_TGE – Found this because there was a weird memory error panic
  • 40. QEMU patches ◼ e1000e: Fix possible interrupt loss when using MSI – https://github.com/qemu/qemu/commit/dd0ef128669c29734a197ca9195e7ab64e20ba2c ◼ target/arm: honor HCR_E2H and HCR_TGE in arm_excp_unmasked() – https://github.com/qemu/qemu/commit/c939a7c7b93ee44a4963fabe81454e1f956ecd4b ◼ target/arm: Honor HCR_E2H and HCR_TGE in ats_write64() – https://github.com/qemu/qemu/commit/638d5dbd78ea81c943959e2f2c65c109e5278a78
  • 41. DEMO
  • 42. THANK YOU