Agenda:
This talk provides an in-depth review of how canaries are used in the kernel and how they interact with userspace. It also includes a short review of what canaries are and why they are needed, so don't worry if you have never heard of them.
Speaker:
Gili Yankovitch, CEO and Chief Security Researcher, Nyx Software Security Solutions
The Silence of the Canaries
1. The Silence of the Canaries
Gili Yankovitch, Nyx Software Security Solutions
2. Prerequisites
● A functioning brain
● Knowledge of the x86 and x86-64 architectures
● Process loading
● Security attacks
● Operating system basics
3. Calling Convention
● foo() has something to tell bar()
● Presenting, our stack
● And the Assembly for the code
[Thread stack diagram, higher to lower addresses: i = 42 | RetAddr = 0x080483b6 | EBP | Locals]
4. Buffer Overflow
● Spot the vulnerability
[Thread stack diagram, higher to lower addresses: argc, argv... | RetAddr | EBP | Locals]
● What happens now?
[Thread stack diagram: RetAddr | EBP | Locals]
“In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer,
overruns the buffer’s boundary and overwrites adjacent memory locations.”
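The overflow in the diagram can be sketched without invoking undefined behavior by modeling the stack frame as one flat byte array. This is only an illustration: the layout (16 bytes of locals, then a 4-byte saved EBP, then a 4-byte return address) and all names are assumptions, not the code shown on the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit frame layout, lowest address first. */
enum {
    LOCALS_OFF = 0,  LOCALS_LEN = 16, /* char buf[16]        */
    EBP_OFF    = 16,                  /* saved frame pointer */
    RET_OFF    = 20,                  /* return address      */
    FRAME_LEN  = 24
};

struct demo_frame { unsigned char bytes[FRAME_LEN]; };

void frame_init(struct demo_frame *f, uint32_t ebp, uint32_t ret)
{
    memset(f->bytes, 0, FRAME_LEN);
    memcpy(f->bytes + EBP_OFF, &ebp, sizeof ebp);
    memcpy(f->bytes + RET_OFF, &ret, sizeof ret);
}

uint32_t frame_ret(const struct demo_frame *f)
{
    uint32_t ret;
    memcpy(&ret, f->bytes + RET_OFF, sizeof ret);
    return ret;
}

/* Models the bug: the length is trusted and never compared to LOCALS_LEN. */
void unchecked_copy(struct demo_frame *f, const void *src, size_t len)
{
    assert(len <= FRAME_LEN); /* keeps the demo itself in bounds */
    memcpy(f->bytes + LOCALS_OFF, src, len);
}

/* The fix: never copy more than the buffer can hold. */
void bounded_copy(struct demo_frame *f, const void *src, size_t len)
{
    if (len > LOCALS_LEN)
        len = LOCALS_LEN;
    memcpy(f->bytes + LOCALS_OFF, src, len);
}
```

Copying 24 bytes of 0x41 with unchecked_copy() leaves the modeled return address as 0x41414141, while bounded_copy() leaves it untouched.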
5. Canaries
● A brief historical context
● Random value
○ Must be random so that an attacker cannot guess it.
● Stored before protected data
○ “Before” is relative to direction of overflow.
● Should be changed as often as possible
○ A heavy operation, depending on how many places hold a canary.
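The check a canary enables can be sketched as follows. The prologue places a value between the buffer and the protected data, and the epilogue compares it before the return address is trusted. The frame layout and every name here are hypothetical. A real compiler emits this logic itself and calls __stack_chk_fail() on a mismatch instead of returning a flag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout, lowest address first. */
enum {
    G_BUF_OFF    = 0,  G_BUF_LEN = 16, /* the overflowable buffer    */
    G_CANARY_OFF = 16,                 /* canary guards what follows */
    G_RET_OFF    = 20,                 /* protected: return address  */
    G_FRAME_LEN  = 24
};

struct guarded_frame { unsigned char bytes[G_FRAME_LEN]; };

/* Prologue: copy the canary into the frame, "before" the protected data. */
void guarded_prologue(struct guarded_frame *f, uint32_t canary, uint32_t ret)
{
    memset(f->bytes, 0, G_FRAME_LEN);
    memcpy(f->bytes + G_CANARY_OFF, &canary, sizeof canary);
    memcpy(f->bytes + G_RET_OFF, &ret, sizeof ret);
}

/* Function body: an unchecked write that starts at the buffer. */
void guarded_write(struct guarded_frame *f, const void *src, size_t len)
{
    assert(len <= G_FRAME_LEN); /* keeps the demo itself in bounds */
    memcpy(f->bytes + G_BUF_OFF, src, len);
}

/* Epilogue: returns 1 if the canary survived, 0 if the frame was smashed. */
int guarded_epilogue(const struct guarded_frame *f, uint32_t canary)
{
    uint32_t seen;
    memcpy(&seen, f->bytes + G_CANARY_OFF, sizeof seen);
    return seen == canary;
}
```

Because the overflow runs from the buffer towards the return address, it cannot reach the return address without first changing the canary, which is why "before" is relative to the direction of the overflow.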
7. What is %gs?
● Segment register
○ Once used to partition the memory
○ Memory accesses were SEGMENT:OFFSET
○ e.g. %cs:0x0040 and %ds:0x0040 refer to different memory regions.
● Now used for special data storage
● %gs segment register used differently across architectures
● Canary values are stored there
○ %gs:20 for 32 bit
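A usermode process can read its own canary through the segment register. The sketch below assumes Linux with GCC or Clang inline assembly, uses %gs:20 on 32-bit x86 as on the slide, and uses %fs:40 on x86_64 (the talk later says x86_64 TLS lives in %fs). The function name is hypothetical, and on any other target it simply reports failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores the calling thread's canary in *out.
 * Returns 1 on supported targets, 0 otherwise. */
int read_user_canary(uintptr_t *out)
{
    assert(out != NULL);
#if defined(__linux__) && defined(__x86_64__)
    uintptr_t v;
    __asm__ volatile ("mov %%fs:40, %0" : "=r"(v)); /* 64-bit userspace slot */
    *out = v;
    return 1;
#elif defined(__linux__) && defined(__i386__)
    uintptr_t v;
    __asm__ volatile ("mov %%gs:20, %0" : "=r"(v)); /* 32-bit slot */
    *out = v;
    return 1;
#else
    return 0; /* assumption: no known canary slot on this target */
#endif
}
```

Reading the value twice in the same thread should give the same result, since the usermode canary is set once at process startup.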
8. Random
● execve() loads binary
● Transfers Auxiliary Vector to usermode
○ binfmt_elf.c:load_elf_binary() -> create_elf_tables()
● “Good” random numbers
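From usermode, the random bytes handed over in the auxiliary vector can be fetched with getauxval(). This is a minimal sketch assuming Linux and a libc that provides <sys/auxv.h>. The AT_RANDOM entry points at 16 random bytes. The function name and the length constant are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

enum { KERNEL_RANDOM_LEN = 16 };

/* Copies the kernel-supplied random bytes into out; returns 1 on success. */
int copy_kernel_random(unsigned char out[KERNEL_RANDOM_LEN])
{
    const void *src;

    assert(out != NULL);
    src = (const void *)(uintptr_t)getauxval(AT_RANDOM);
    if (src == NULL)
        return 0; /* no AT_RANDOM entry in the auxiliary vector */
    memcpy(out, src, KERNEL_RANDOM_LEN);
    return 1;
}
```

The bytes are generated once per execve(), so repeated calls within one process return the same 16 bytes.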
9. ld.so init
● Every dynamically linked ELF binary has an “interpreter”
● Its path is named in the PT_INTERP program header
● The ELF binary interpreter is the dynamic loader
readelf -a <elf_binary>
...
● Initializing internal members at startup
● The described ld.so is GlibC
○ Too much code complexity
○ Very widespread
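What readelf reports as the interpreter can also be read directly from the file. The sketch below assumes a 64-bit ELF file whose byte order matches the host, and uses the structures from <elf.h>. The function name is hypothetical and error handling is minimal.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Copies the PT_INTERP path of a 64-bit ELF file into out.
 * Returns 1 if an interpreter was found, 0 otherwise. */
int elf64_interp(const char *path, char *out, size_t out_len)
{
    Elf64_Ehdr eh;
    int found = 0;
    unsigned i;
    FILE *fp;

    assert(path != NULL && out != NULL && out_len > 0);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    if (fread(&eh, sizeof eh, 1, fp) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64) {
        /* Walk the program headers looking for PT_INTERP. */
        for (i = 0; i < eh.e_phnum && !found; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

            if (fseek(fp, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, fp) != 1)
                break;
            if (ph.p_type != PT_INTERP || ph.p_filesz == 0 || ph.p_filesz > out_len)
                continue;
            if (fseek(fp, (long)ph.p_offset, SEEK_SET) == 0 &&
                fread(out, 1, (size_t)ph.p_filesz, fp) == ph.p_filesz) {
                out[ph.p_filesz - 1] = '\0'; /* the stored path is NUL-terminated */
                found = 1;
            }
        }
    }
    fclose(fp);
    return found;
}
```

A statically linked binary has no PT_INTERP entry, in which case the function returns 0.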
10. Using the random
● During init phase (dl_main), calls security_init
● Initializes TLS (Thread Local Storage)
○ in x86_64 stored in %fs segment register
[Table of TLS header field offsets: 0, 8, 16, 24, 28, 32, 40]
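The offsets above can be reproduced with a struct whose canary field lands at offset 40 on a 64-bit target. The sketch below is an assumption-laden illustration: the struct and its field names are hypothetical stand-ins for the TLS header, and the guard setup (copy kernel random bytes, then clear the least significant byte) is modeled loosely on what a loader may do, not quoted from GlibC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the x86_64 TLS header; names are illustrative. */
struct demo_tcb_head {
    void     *tcb;              /* offset 0  */
    void     *dtv;              /* offset 8  */
    void     *self;             /* offset 16 */
    int       multiple_threads; /* offset 24 */
    int       gscope_flag;      /* offset 28 */
    uintptr_t sysinfo;          /* offset 32 */
    uintptr_t stack_guard;      /* offset 40: the slot the compiler reads */
};

/* Builds a canary from kernel-supplied random bytes. */
uintptr_t demo_setup_stack_guard(const unsigned char *random_bytes)
{
    uintptr_t guard;

    assert(random_bytes != NULL);
    memcpy(&guard, random_bytes, sizeof guard);
    /* Assumption: clear the least significant byte so the canary holds a
     * zero byte, which string copies cannot write past. */
    guard &= ~(uintptr_t)0xff;
    return guard;
}
```

The offsets in the comments hold for a 64-bit target with 8-byte pointers. On a 32-bit target the same fields pack differently.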
12. Kernel canaries
● Compiling with CONFIG_CC_STACKPROTECTOR
○ General -> Stack Protector buffer overflow detection
○ Has existed for quite some time in Linux
○ Even 2.6.32.68 in kernel.org supports it.
● When rebuilding, needs a clean build
○ Adds snippets for every function prologue and epilogue
● Adds a performance overhead
○ Sorry Linus :(
13. Kernel canaries
● Let’s say there’s a stack based BOF vulnerability in a system call
● Kernel compiled with CC_STACKPROTECTOR
● However, the canary value is stored at %gs.
● Malicious program can read value and bypass kernel protection!
14. Kernel canaries
● We call a system call
● From Intel x86_64 Instruction set
● %gs holds percpu kernel data structures.
○ So we have a different canary for the Kernel.
18. 32 Bit canary placement
● In x86 32 bit, the kernel uses %gs only for canaries, so it sets up the GDT accordingly
● Reading stored canary from boot_init_stack_canary
● Reading GDT table
● Picking the GDT entry for stack canaries
● Writing to the specific GDT entry in its weird encoding
● Flushing the GDT to the register
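The "weird encoding" refers to the way an x86 segment descriptor scatters its base and limit across the 8-byte GDT entry. A minimal sketch of packing a descriptor and re-pointing its base at a canary location might look like this. The function names are hypothetical and this is not the kernel's own helper code.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a 32-bit base and a 20-bit limit into an 8-byte segment descriptor. */
uint64_t gdt_encode(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    assert(limit <= 0xFFFFFu);
    d |= (uint64_t)(limit & 0xFFFFu);            /* limit bits 15..0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* base bits 23..0   */
    d |= (uint64_t)access << 40;                 /* type, S, DPL, P   */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* limit bits 19..16 */
    d |= (uint64_t)(flags & 0xFu) << 52;         /* AVL, L, D/B, G    */
    d |= (uint64_t)(base >> 24) << 56;           /* base bits 31..24  */
    return d;
}

/* Reassembles the base from its two separate fields. */
uint32_t gdt_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu) | (uint32_t)((d >> 56) & 0xFFu) << 24;
}

/* Re-points a descriptor at a new base, keeping limit and attributes. */
uint64_t gdt_set_base(uint64_t d, uint32_t base)
{
    d &= ~(((uint64_t)0xFFFFFFu << 16) | ((uint64_t)0xFFu << 56));
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}
```

For example, gdt_encode(0x12345678, 0xFFFFF, 0x93, 0xC) yields 0x12CF93345678FFFF, with the base split between bytes 2 to 4 and byte 7.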
19. Kernel canary per process
● A single canary for the kernel is not enough
● A kernel canary per user process
○ During fork() in dup_task_struct()
● Randomizes a new canary for Kernel
20. You get a canary, and you get a canary, and...
● We want a different kernel canary for every process
● Need to swap the %gs segment register in context switch
● Load per-process kernel canary explicitly after task switch
● Kernel canary must be set explicitly so stack unwinding will succeed after context swapped in __switch_to()
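The per-process kernel canary can be modeled in plain C: a fresh canary is drawn when a task is created, and the context switch publishes the incoming task's canary in the per-CPU slot that the compiler-generated checks read. Everything below is a userspace simulation with hypothetical names. In particular, demo_random() is a non-secure placeholder for the kernel's random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_task {
    int       pid;
    uintptr_t stack_canary; /* kernel canary owned by this task */
};

/* Stand-in for one CPU's per-CPU canary slot. */
struct demo_cpu {
    uintptr_t percpu_canary;
};

/* NOT cryptographically secure: an xorshift placeholder for the kernel RNG. */
uintptr_t demo_random(void)
{
    static uint64_t s = 0x9E3779B97F4A7C15ull;

    s ^= s << 13;
    s ^= s >> 7;
    s ^= s << 17;
    return (uintptr_t)s;
}

/* Models task duplication on fork(): the child gets a fresh kernel canary. */
struct demo_task demo_fork(const struct demo_task *parent, int child_pid)
{
    struct demo_task child;

    assert(parent != NULL);
    child = *parent;
    child.pid = child_pid;
    child.stack_canary = demo_random();
    return child;
}

/* Models the context switch: load the incoming task's canary on this CPU. */
void demo_switch_to(struct demo_cpu *cpu, const struct demo_task *next)
{
    assert(cpu != NULL && next != NULL);
    cpu->percpu_canary = next->stack_canary;
}
```

If the switch skipped this step, a function that stored the old task's canary in its prologue would compare against the wrong value in its epilogue, which is why the canary must be loaded explicitly.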
21. LAZY_GS
● The top comment at
○ arch/x86/include/asm/stackprotector.h
22. LAZY_GS
● Returning to context switch.
○ This is __switch_to in arch/x86/kernel/process_32.c
○ 64 bit isn’t lazy and saves the segment
23. 32 bit System Call
● When we make a 32 bit system call, the kernel saves all the registers
24. LAZY_GS Macros
● We can see that if %gs is not lazy, the kernel changes the segment register upon syscall entry.
● But when it’s lazy, it does nothing?
● Problem, anyone?
● If this is true, then a hostile usermode process can overflow canaries with no apparent problem on x86 32 bit with CONFIG_X86_32_LAZY_GS!
25. Can it be?
● Remember this comment at stackprotector.h?
● It seems to be the only place it is done, when kernel is LAZY_GS.
26. Look closer
● It seems the kernel holds logic not only in code:
● in arch/x86/Kconfig
● So actually we cannot have stack protection and LAZY_GS after all.
● (Well, obviously!)
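The resolution can be mimicked with the C preprocessor: lazy %gs handling only becomes available when the stack protector is off, so a protected kernel always loads its own canary segment on entry. The DEMO_ macros below are hypothetical stand-ins for the Kconfig symbols, and the selector values are made up for illustration.

```c
#include <assert.h>

/* Stand-ins for the build configuration; remove the second define to
 * model a kernel built without the stack protector. */
#define DEMO_X86_32 1
#define DEMO_CC_STACKPROTECTOR 1

/* Mirrors the dependency: lazy GS requires the stack protector to be off. */
#if defined(DEMO_X86_32) && !defined(DEMO_CC_STACKPROTECTOR)
#define DEMO_X86_32_LAZY_GS 1
#endif

/* Hypothetical selector values, for illustration only. */
enum { DEMO_USER_GS = 0x33, DEMO_KERNEL_CANARY_GS = 0xe0 };

/* Models syscall entry: returns the %gs selector the kernel runs with. */
int demo_syscall_entry_gs(int user_gs)
{
    assert(user_gs != DEMO_KERNEL_CANARY_GS); /* usermode never holds it */
#ifdef DEMO_X86_32_LAZY_GS
    return user_gs;               /* lazy: %gs is left untouched */
#else
    return DEMO_KERNEL_CANARY_GS; /* not lazy: load the kernel canary segment */
#endif
}
```

With the stack protector enabled, the lazy branch is never compiled, so the scenario from the previous slides cannot occur.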
27. “Buffer overflows are the poster child of why problems aren't getting better. They were discovered in the 1960s and were first used to attack computers in the 1970s. The Morris worm in 1989 was a very public use of an overflow, which at the time knocked out 10 percent of the Internet--6000 computers. Here we are 40 years later, and buffer overflows are the most common security problem. And that's an easy problem to fix. If you are a software vendor, there is zero excuse for buffer overflows.”
- Bruce Schneier
End to the Overflows
Questions?
Editor's notes
Hi, My name is Gili Yankovitch, I’m the CEO and Chief Security researcher at my company, Nyx Software Security solutions.
Today we will talk about:
How Stack Smashing works
Why it is fun
What can we do about it
In Usermode
In Kernelmode
In order to understand the lecture you need:
To know a bit about Intel architecture
I will cover this anyway but it’s a plus to know about SSP (Stack Smashing, P is for Protection)
Basic terms in operating systems, like: Context switch, System calls etc...
Every program, anywhere, has a stack.
Every thread/task/whatever has its own stack.
This is the basic calling convention
A classic buffer overflow from network
Canaries are used in coal mines
If they stop tweeting, it means no air is coming in
Means danger
Random data before protected data
Simulation of canary addition to assembly code.
%gs is a segment register.
Once it was used to partition the memory into different regions comprised of BASE + OFFSET to access any memory address
It was used to separate code from data
Now it is used usually for program control flow with special data saved
In Windows, %fs:0x0 holds first exception handler in chain
In Linux, %gs:20/40 holds the canary value of each process/kernel cpu/etc
Process initialization generates random numbers at process startup
Sends it to the process for usage of various things
One of them is the process canary.
More or less...
A very brief explanation regarding the dynamic loader
Snippets given here are from GlibC.
I strongly discourage the use of GlibC.
Too complex, very (VERY!!) messy code.
TLS is used even in single threaded applications.
We can see here that the header described in the pthread structure fits exactly the offset needed by gcc (%fs:40 in x86 64 bit)
We can see we set the Thread Local Storage to the right, with the appropriate offset in the struct.
Review of canaries in x86 64 bit, user/kernel
Very easy to add to your kernel.
General -> Stack Protector buffer overflow detection
You should rebuild your kernel if you set this option with a precompiled kernel (make clean all)
Let’s imagine a possible attack on this mechanism
Let’s say an attacker reads the %gs:40 canary value. Can he now exploit a kernel stack based BOF?
Kernel should protect from such things, otherwise the protection is useless.
So the attacker tries to exploit the vulnerability.
But it seems that the kernel holds its own %gs segment register and swaps it in first thing on syscall entry.
%gs is an interesting register, as it is a percpu register and holds the pointer to percpu data structures including the kernel stack, kernel canaries etc.
So where is this canary initialized? It is initialized at kernel startup, and written percpu to remember the kernel canary.
The position of this function is very critical, as from this point on, any functions that installed a different value as a canary will fault upon return.
Percpu writes are comprised of tons of macros
Eventually, it comes down to something like
movl 0x00CANARY, %%gs:0x28
We write percpu to the previously declared variables irq_stack_union or stack_canary
This setup happens just for 32 bit, as we just need to remember the canary. We don’t use %gs for anything else.
This is set and swapped in context switch, as we will see in a minute.
Note that this is a KERNEL canary. usermode canaries are set by TLS from ld.so!
During context switch, the kernel takes the canary stored in the task_struct and sets it in the percpu relevant to gs
This is done for the usermode canaries, in order to have a different canary for every process and to ensure the integrity of the canaries in case someone changed them at runtime.
When we use 32 bit, things are a bit more complicated. Linux tries to optimize switching between kernel and user mode, or between processes, by not swapping gs.
Notice the lazy gs loading, if it is 0, loading is skipped.
When the kernel enters a system call, it saves all its registers on the stack
Then it loads the kernel GS register
Notice that with lazy GS, it does nothing!
This is in order to accelerate performance while switching from usermode to kernelmode.
Usually………….
It was really weird that Linux had such a major vulnerability like this.
This is a lesson for everyone that uses Linux: Do read the Kconfig files too.
References:
The Linux Kernel
Seriously. There’s no documentation of this at all.