The Linux Security and Isolation APIs have become the basis of some of the most useful server-side features, providing the isolation required for efficient containers.
However, these APIs also form the basis of the Chromium Sandbox on Linux, and we will study them in that context. This is the sandbox used in the Vivaldi, Brave, Chrome and Opera browsers, among others. The Chromium Sandbox has a very platform-specific implementation, built from the APIs each platform makes available. In this talk we will describe the requirements of the Chromium Sandbox in detail and go through how the Linux implementation fulfills them.
2. Linux Security
and How Web Browser Sandboxes Really Work
Patricia Aas
NDC Oslo 2017
@pati_gallardo
3. Patricia Aas
Programmer - mainly in C++ and Java
Currently : Vivaldi Technologies
Previously : Cisco Systems, Knowit, Opera Software
Master in Computer Science
Twitter : @pati_gallardo
Will put up link to slides on Twitter/LinkedIn
4. The Plan
Overview of the Browser
The Linux Security APIs Available
Building the Sandboxes
Some Problems
8. Sandboxing
External & Internal Threats
● System External Threat :
Protect the system from
the vulnerabilities in
the browser
● System Internal Threat :
Protect the browser from
malware on the system
10. The Executable Files
vivaldi
The bash script wrapper, launches
vivaldi-bin. Sets up environment.
vivaldi-bin
The browser binary
vivaldi-sandbox
A setuid binary, not in much use
today
14. ONE executable : vivaldi-bin
The first process, the Browser Process, forks into:
1. The Zygote(s), which fork into the Renderers
2. The GPU Process, which forks a GPU Broker
15. Browser Process
1 process : browser
Main: BrowserMain
Manages the other processes,
the IPC, the windows, GUI
and network
Trusted
16. Zygote Processes
2 processes: zygote
Main : ZygoteMain
Spawns all of the renderer
processes. Parent Zygote is
an init reaper process.
Has some sandboxing :
“Trusted”
17. Renderer Processes
Many processes : renderer
Main : RendererMain
One for each tab etc, holds
web content. Spawned by the
Zygote process.
Has full sandboxing:
Untrusted*
*Does seccomp sandbox construction
18. GPU Processes
2 processes: gpu + gpu-broker
Main : GpuMain
Does all interfacing directly
with the GPU driver. All
graphics work is piped to
them + HW decoding.
Has some sandboxing :
“Trusted”
30. YAMA LSM
Limits the access that other
processes have to this process
- especially ptracing
Status is checked by reading :
/proc/sys/kernel/yama/ptrace_scope
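A minimal sketch of that status check, in Python rather than Chromium's C++. The helper name and the `SCOPES` table are illustrative, not taken from Chromium; the value descriptions paraphrase the kernel's Yama documentation.

```python
from pathlib import Path

YAMA_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# What each value means, paraphrased from the kernel's Yama documentation.
SCOPES = {
    0: "classic ptrace permissions",
    1: "restricted: only descendants or declared tracers may attach",
    2: "admin-only attach (CAP_SYS_PTRACE required)",
    3: "no attach at all",
}


def yama_ptrace_scope(path=YAMA_PATH):
    """Return the Yama ptrace scope as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (Yama not built in or /proc not mounted) or unparsable.
        return None


if __name__ == "__main__":
    scope = yama_ptrace_scope()
    print(scope, SCOPES.get(scope, "Yama not available"))
```

A `None` result only says the file could not be read; it does not say anything about other LSMs on the system.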
31. Other APIs in Use
But not covered
cgroups (Chrome OS)
Setuid/Setgid (Legacy)
Process Groups (Chromedriver)
32. Setuid/setgid
(Legacy)
Increases what a process is
privileged enough to do, by
using the privilege of the
executable's owner
API : set*uid / set*gid
34. Process Groups
(Chromedriver)
Can be used to treat a group
of processes as one.
Used with the ‘detach’
property of Chromedriver
API : setpgid
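To make "treat a group of processes as one" concrete, here is a small Python sketch using `setpgid` and `killpg`. It is not Chromedriver's code; the function names are made up for illustration.

```python
import os
import signal


def spawn_in_own_group():
    """Fork a child that leads its own process group; return the child's pid."""
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            os.setpgid(0, 0)   # new group whose id equals the child's pid
            signal.pause()     # sleep until a signal arrives
        finally:
            os._exit(0)
    try:
        # The parent makes the same call, so the group exists no matter
        # which of the two processes runs first.
        os.setpgid(pid, pid)
    except OSError:
        pass
    return pid


def kill_group(pgid):
    """Signal every process in the group, then reap the group leader."""
    os.killpg(pgid, signal.SIGTERM)
    _, status = os.waitpid(pgid, 0)
    return status
```

Once the child is in its own group, a signal sent to the parent's group no longer reaches it, and the child's group can be signalled as a unit.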
39. Windows of Opportunity
[Diagram: the process tree (BROWSER, GPU, GPU BROKER, two ZYGOTE processes, RENDERER) with its FORK, CLONE and EXEC edges labelled by the restrictions applied at each one: USER/PID/NET and PID namespaces, NO_NEW_PRIVS, SYS_ADMIN, capset, chroot, rlimit and seccomp. Some steps are marked "Done post-fork".]
40. Fork / Exec
A typical start of a new
process on Linux is a
“fork/exec”
“Forking” is creating a new
process
“Clone” is a type of fork which
can restrict a process at
creation
“Exec” is executing a binary in
a process
41. Windows of Opportunity: fork/exec
1. Before clone/fork
2. At clone
3. Before Exec
4. At startup (input : argv)
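The four windows can be located in a plain fork/exec sequence. This is a Python sketch of the general pattern, not Chromium's launcher; `fork_exec` is a hypothetical helper, and since `os.fork` takes no flags the "at clone" window only appears as a comment.

```python
import os


def fork_exec(argv):
    """Run argv[0] in a child process and return its exit code."""
    # Window 1: before fork. Whatever the parent holds now is inherited.
    pid = os.fork()  # Window 2: with clone(), flags would be chosen here.
    if pid == 0:
        try:
            # Window 3: after fork, before exec. This is still our own code,
            # but it runs in the child, so the parent is unaffected.
            os.execv(argv[0], argv)
            # Window 4: the new binary starts and sees argv as its input.
        finally:
            os._exit(127)  # reached only if exec failed
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

For example, `fork_exec(["/bin/sh", "-c", "exit 7"])` returns the shell's exit code, and a path that cannot be executed yields 127.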
43. Initial Sandbox Construction
This initial sandbox
construction is done at startup
when the Browser process
creates the Gpu and Zygote
processes.
The most important part of the
sandboxes created here is the
Namespaces created for the
Zygote process
GPU + Zygote
44. Before Clone / Fork
As a rule the child process
will inherit the privileges and
limitations of its parent.
This is an opportunity to
release anything you don’t want
to pass on.
Remember the whole memory of
the parent is copied* including
all state
* Linux does copy on write
All Processes
45. At Clone : Create Namespaces
Clone flags define the process*
created and will create
namespaces (NS) for it
1. Test which NS are available
2. Fail if not sufficient
3. Construct the biggest
supported and applicable set
Emulates fork with longjmp
* Also used to create threads
Zygote + Renderer
46. Namespaces in use
CLONE_NEWUSER
No privilege is needed to create a
User NS, and in one we can create a
PID NS without global privilege.
CLONE_NEWPID
Same PID number can represent
different processes in different
PID namespaces. One init (PID 1)
process per PID NS
CLONE_NEWNET
Isolates a process from the network
Zygote + Renderer
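A rough Python sketch of the "test which NS are available" step. It is an approximation: it calls `unshare(2)` through `ctypes` in a throwaway child instead of passing the flags to `clone(2)` as the slides describe, and the helper names are hypothetical. The flag values are the ones defined in `<linux/sched.h>`.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000


def namespace_supported(flag):
    """Return True if a throwaway child can enter a new namespace of this kind."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            # The user namespace is always requested as well: inside it the
            # child holds the privilege needed to create the other kinds.
            if libc.unshare(CLONE_NEWUSER | flag) == 0:
                code = 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def supported_set():
    """Return the names of the namespaces that could be created here."""
    kinds = {"user": CLONE_NEWUSER, "pid": CLONE_NEWPID, "net": CLONE_NEWNET}
    return {name for name, flag in kinds.items() if namespace_supported(flag)}
```

A caller would compare `supported_set()` against its minimum requirement and fail if it is not met. Inside containers or on hardened kernels the set may well be empty.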
47. Before Exec
The process will be executed in
the current process.
Before Exec is the last chance
to clean up before the executed
binary takes over
GPU + Zygote
48. Before Exec : Launch Options
The process is prepared for the
upcoming exec (possibilities) :
1. Fix the environment
2. Fix file descriptors
3. Fix signal handling
4. Set up process group
5. Maximize resource limits
6. Set PR_SET_NO_NEW_PRIVS
7. Change current dir
8. Select executable path
9. Set up command line
GPU + Zygote
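Most of the preparation steps listed above map onto ordinary calls made between fork and exec. The sketch below is Python, not Chromium's launcher; `launch` and `run_and_capture` are hypothetical helpers, and step 6 (PR_SET_NO_NEW_PRIVS) is left out here.

```python
import os
import resource
import signal


def launch(argv, env, cwd, stdout_fd=None):
    """Fork, prepare the child, then exec argv[0]. Returns the child's pid."""
    pid = os.fork()
    if pid != 0:
        return pid
    try:
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)   # 3. signal handling
        os.setpgid(0, 0)                                # 4. process group
        try:                                            # 5. raise soft limit to hard
            _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass
        if stdout_fd is not None:                       # 2. file descriptors
            os.dup2(stdout_fd, 1)
        # Descriptors created by Python are close-on-exec by default, so
        # anything not explicitly dup2'ed is dropped at exec.
        os.chdir(cwd)                                   # 7. current dir
        os.execve(argv[0], argv, env)                   # 1, 8, 9: env, path, argv
    finally:
        os._exit(127)                                   # exec failed


def run_and_capture(argv, env, cwd):
    """Launch argv and return (wait status, bytes written to stdout)."""
    r, w = os.pipe()
    pid = launch(argv, env, cwd, stdout_fd=w)
    os.close(w)
    chunks = []
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return status, b"".join(chunks)
```

Passing a minimal `env` dictionary instead of `os.environ` is the "fix the environment" step: the new binary only sees the variables the launcher chose to hand over.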
51. PR_SET_NO_NEW_PRIVS
prctl(PR_SET_NO_NEW_PRIVS)
Preserved across fork, clone
and execve
“If no_new_privs is set, then
operations that grant new
privileges (i.e. execve) will
either fail or not grant them.
This affects suid/sgid, file
capabilities, and LSMs.”
/usr/include/linux/prctl.h
All except Browser
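The "preserved across fork" property can be observed directly. In this Python sketch a child sets the flag through `prctl` (called via `ctypes`), forks again, and the grandchild reports what it sees. The option numbers come from `<linux/prctl.h>`; the helper names are illustrative.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def _prctl(option, arg2=0):
    libc = ctypes.CDLL(None, use_errno=True)
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    libc.prctl.restype = ctypes.c_int
    return libc.prctl(option, arg2, 0, 0, 0)


def _exit_code(pid):
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 255


def flag_seen_by_grandchild():
    """Set no_new_privs in a child; return the flag its own child observes."""
    child = os.fork()
    if child == 0:
        code = 255
        try:
            # The flag is one-way, which is why it is set in a child and
            # not in the calling process.
            if _prctl(PR_SET_NO_NEW_PRIVS, 1) == 0:
                grandchild = os.fork()
                if grandchild == 0:
                    os._exit(_prctl(PR_GET_NO_NEW_PRIVS) & 0xFF)
                code = _exit_code(grandchild)
        finally:
            os._exit(code)
    return _exit_code(child)
```

A return value of 1 means the grandchild inherited the flag; 255 means one of the `prctl` calls failed.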
52. Seccomp BPF Program
Program written in an
assembly-like language to
filter system-calls.
Runs in a simple VM in kernel
space. All syscalls will be
filtered by this program
TSYNC : When a Seccomp Program
is installed with TSYNC it is
applied to all threads in the
process
Renderer + Gpu + Broker
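To show what such a program looks like, here is a deliberately tiny filter installed from Python via `ctypes`. It is a demo, not Chromium's policy: it denies a single syscall (`uname`) and allows the rest, which is the opposite of a whitelist. The syscall numbers and audit arch values are hardcoded for x86_64 and aarch64 only; other machines report "unsupported".

```python
import ctypes
import errno
import os
import platform

PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP, SECCOMP_MODE_FILTER = 38, 22, 2
SECCOMP_RET_KILL, SECCOMP_RET_ERRNO, SECCOMP_RET_ALLOW = 0x0, 0x00050000, 0x7FFF0000
LD_W_ABS, JEQ_K, RET_K = 0x20, 0x15, 0x06   # BPF opcodes
# machine -> (AUDIT_ARCH value, syscall number of uname)
ARCH = {"x86_64": (0xC000003E, 63), "aarch64": (0xC00000B7, 160)}


class SockFilter(ctypes.Structure):   # struct sock_filter
    _fields_ = [("code", ctypes.c_uint16), ("jt", ctypes.c_uint8),
                ("jf", ctypes.c_uint8), ("k", ctypes.c_uint32)]


class SockFprog(ctypes.Structure):    # struct sock_fprog
    _fields_ = [("len", ctypes.c_ushort), ("filter", ctypes.POINTER(SockFilter))]


def build_filter(audit_arch, denied_nr):
    return [
        SockFilter(LD_W_ABS, 0, 0, 4),             # A = seccomp_data.arch
        SockFilter(JEQ_K, 1, 0, audit_arch),       # expected arch: skip the kill
        SockFilter(RET_K, 0, 0, SECCOMP_RET_KILL),
        SockFilter(LD_W_ABS, 0, 0, 0),             # A = seccomp_data.nr
        SockFilter(JEQ_K, 0, 1, denied_nr),        # denied syscall: fall through
        SockFilter(RET_K, 0, 0, SECCOMP_RET_ERRNO | errno.EPERM),
        SockFilter(RET_K, 0, 0, SECCOMP_RET_ALLOW),
    ]


def uname_under_filter():
    """Return 'blocked', 'allowed' or 'unsupported' for uname() in a filtered child."""
    machine = platform.machine()   # itself calls uname, so do it before filtering
    if machine not in ARCH:
        return "unsupported"
    insns = build_filter(*ARCH[machine])
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
            array = (SockFilter * len(insns))(*insns)
            prog = SockFprog(len(insns), ctypes.cast(array, ctypes.POINTER(SockFilter)))
            installed = (libc.prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0 and
                         libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER,
                                    ctypes.addressof(prog), 0, 0) == 0)
            if not installed:
                os.write(w, b"unsupported")
            else:
                try:
                    os.uname()
                    os.write(w, b"allowed")
                except PermissionError:
                    os.write(w, b"blocked")
        finally:
            os._exit(0)
    os.close(w)
    result = os.read(r, 64).decode()
    os.close(r)
    os.waitpid(pid, 0)
    return result or "unsupported"
```

Setting no_new_privs first is what lets an unprivileged process install the filter. The filter is installed in a forked child because it cannot be removed again.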
53. Seccomp : BPF Policies
BPF Program defined in a Policy
Fundamentally a whitelist,
allows a set of syscalls and
has custom handling of others
An extended Policy is then
generally more permissive
1. BaselinePolicy
1.1 GpuProcessPolicy
1.1.1 GpuBrokerProcessPolicy
1.2 RendererProcessPolicy
Renderer + Gpu + Broker
55. Chroot : Drop Access to FS
A chroot is done in a
clone(CLONE_FS) child that does
chroot("/proc/self/fdinfo/") and
immediately does a chdir("/") and
_exit(0)
You can see this by looking at
ls -l /proc/<pid>/root
Of the Zygote or any descendant
Credentials::DropFileSystemAccess
Zygotes + Renderer
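The same inspection can be done programmatically by reading the `/proc/<pid>/root` symlink. A small Python sketch follows; `root_of` is a hypothetical helper, and reading the link for a process you are not allowed to inspect fails, which is reported as `None`.

```python
import os


def root_of(pid="self"):
    """Return the root directory a process sees, or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/root")
    except OSError:
        # No such process, /proc not mounted, or permission denied.
        return None


if __name__ == "__main__":
    print(root_of())
```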
56. Drop Capabilities
Uses capset() to drop all or some
capabilities
“Linux divides the privileges
traditionally associated with
superuser into distinct units,
known as capabilities, which can be
independently enabled and
disabled.”
Man page for capabilities
Credentials::DropAllCapabilities
Zygotes + Renderers
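A sketch of dropping every capability with `capset()` and confirming the result with `capget()`, written in Python with `ctypes`. This is not `Credentials::DropAllCapabilities` itself; the structures mirror `<linux/capability.h>` and the function names are illustrative.

```python
import ctypes
import os

LINUX_CAPABILITY_VERSION_3 = 0x20080522   # from <linux/capability.h>


class CapHeader(ctypes.Structure):        # struct __user_cap_header_struct
    _fields_ = [("version", ctypes.c_uint32), ("pid", ctypes.c_int)]


class CapData(ctypes.Structure):          # struct __user_cap_data_struct
    _fields_ = [("effective", ctypes.c_uint32), ("permitted", ctypes.c_uint32),
                ("inheritable", ctypes.c_uint32)]


def drop_all_capabilities(libc):
    """Clear the effective, permitted and inheritable sets of the caller."""
    header = CapHeader(LINUX_CAPABILITY_VERSION_3, 0)   # pid 0: calling thread
    data = (CapData * 2)()                              # zero-initialised sets
    return libc.capset(ctypes.byref(header), data)


def has_any_capability(libc):
    header = CapHeader(LINUX_CAPABILITY_VERSION_3, 0)
    data = (CapData * 2)()
    if libc.capget(ctypes.byref(header), data) != 0:
        raise OSError(ctypes.get_errno(), "capget failed")
    return any(d.effective or d.permitted or d.inheritable for d in data)


def child_drops_everything():
    """Return True if a forked child ends up with no capabilities at all."""
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if drop_all_capabilities(libc) == 0:
                code = 1 if has_any_capability(libc) else 0
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Dropping is always permitted, so this works for root and for ordinary users alike; for the latter the sets are usually empty to begin with.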
57. Resource Limits : setrlimit
Limits using setrlimit:
1. RLIMIT_AS : Maximum size of the
process’ virtual memory
(address space) in bytes
2. RLIMIT_DATA : Maximum size of
the process's data segment
LinuxSandbox::LimitAddressSpace
Renderer
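Both limits can be set with the standard `resource` module. The Python sketch below applies them in a forked child and reads the result back. The 4 GiB figure in the usage note is an arbitrary example, not the value Chromium uses, and the function names are hypothetical.

```python
import os
import resource


def limit_address_space(max_bytes):
    """Cap RLIMIT_AS and RLIMIT_DATA for the caller; return the new soft RLIMIT_AS."""
    for which in (resource.RLIMIT_AS, resource.RLIMIT_DATA):
        _, hard = resource.getrlimit(which)
        # Never ask for more than the current hard limit allows.
        limit = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(which, (limit, limit))
    return resource.getrlimit(resource.RLIMIT_AS)[0]


def child_address_space_limit(max_bytes):
    """Apply the limits in a forked child and return the soft limit it reports."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.write(w, str(limit_address_space(max_bytes)).encode())
        finally:
            os._exit(0)
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    return int(data.decode())
```

For example, `child_address_space_limit(4 << 30)` caps the child at 4 GiB or at the existing hard limit, whichever is lower. Because the hard limit is lowered too, the child cannot raise it again without privilege.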
66. Sources
Michael Kerrisk
Book: The Linux Programming Interface
Course: Linux Security and Isolation APIs
Chromium/Kernel source + Linux Man Pages + lwn.net
All Errors Are My Own
67. Linux Security
and How Web Browser Sandboxes Really Work
Patricia Aas, Vivaldi Technologies
@pati_gallardo
Photos from pixabay.com