Docker Container: Isolation and Security
Eric Fu
chroot
In UNIX, everything is a file.
Overview
Isolation ‑ Linux Namespaces
Isolation ‑ Control Groups
Container Security
Isolation ‑ Linux Namespaces
Process‑level Isolation
Linux Namespaces
Category            Clone Flag      Kernel version
Mount namespaces    CLONE_NEWNS     Linux 2.4.19
UTS namespaces      CLONE_NEWUTS    Linux 2.6.19
IPC namespaces      CLONE_NEWIPC    Linux 2.6.19
PID namespaces      CLONE_NEWPID    Linux 2.6.24
Network namespaces  CLONE_NEWNET    Linux 2.6.24, completed in 2.6.29
User namespaces     CLONE_NEWUSER   Linux 2.6.23, completed in 3.8
clone()
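#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];
char* const container_args[] = {"/bin/bash", NULL};
int container_main(void* arg)
{
    // Open a shell; execv() returns only on failure
    execv(container_args[0], container_args);
    perror("execv");
    return 1;
}
int main()
{
    // The stack grows downward, so pass the top of the buffer
    int container_pid = clone(container_main, container_stack + STACK_SIZE,
                              SIGCHLD, NULL);
    if (container_pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(container_pid, NULL, 0);
    return 0;
}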
UTS Namespace ( CLONE_NEWUTS )
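Isolates system identifiers: nodename and domainname.
int container_main(void* arg)
{
    // Visible only inside the new UTS namespace (clone with CLONE_NEWUTS).
    // The length excludes the terminating NUL.
    sethostname("container", 9);
    // Open a shell; execv() returns only on failure
    execv(container_args[0], container_args);
    return 1;
}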
IPC Namespace ( CLONE_NEWIPC )
Isolates IPC resources: System V IPC objects and POSIX message queues.
root@eric-vm:/home/eric/linux_namespace# ipcmk -Q
Message queue id: 0
root@eric-vm:/home/eric/linux_namespace# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xd5467105 0 root 644 0 0
root@eric-vm:/home/eric/linux_namespace# ./test_ipc_ns
Parent - start a container!
Container - inside the container!
root@container:/home/eric/linux_namespace# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
PID Namespace ( CLONE_NEWPID )
Isolates the process ID number space.
Processes in different PID namespaces can have the same PID.
eric@eric-vm:~/linux_namespace$ sudo ./test_pid_ns
Parent (2536) - start a container!
Container (1) - inside the container!
Why does ps aux still show all processes?
Mount Namespace ( CLONE_NEWNS )
Isolates the set of filesystem mount points seen by a group of processes.
Processes in different mount namespaces can have different views of the filesystem hierarchy.
ps reads /proc, so mounting a fresh procfs inside the new namespace gives it a /proc that lists only the container's processes:
mount("proc", "/proc", "proc", 0, NULL);
Inside the container:
/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh
    3 root       0:00 ps aux
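The PIDs in this listing are local to the container's PID namespace; the host sees the same processes under different PIDs. On Linux 4.1 and later, the NSpid line in /proc/<pid>/status lists a process's PID in each nested namespace, outermost first. A minimal sketch of parsing that line follows; parse_nspid and print_nspid are hypothetical helper names, not part of the original demo code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "NSpid:" line from /proc/<pid>/status (Linux 4.1+).
 * Stores up to max PIDs in out, outermost namespace first, and
 * returns how many were stored (0 if this is not an NSpid line). */
size_t parse_nspid(const char *line, long *out, size_t max)
{
    static const char prefix[] = "NSpid:";
    size_t count = 0;

    assert(line != NULL && out != NULL);
    if (strncmp(line, prefix, sizeof prefix - 1) != 0)
        return 0;

    const char *p = line + sizeof prefix - 1;
    while (count < max) {
        char *end;
        long pid = strtol(p, &end, 10);   /* skips tabs and spaces */
        if (end == p)                     /* no more numbers */
            break;
        out[count++] = pid;
        p = end;
    }
    return count;
}

/* Hypothetical helper: print every namespace level found in a status file. */
int print_nspid(const char *status_path)
{
    char line[256];
    long pids[8];
    FILE *f = fopen(status_path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        size_t n = parse_nspid(line, pids, 8);
        for (size_t i = 0; i < n; i++)
            printf("level %zu: pid %ld\n", i, pids[i]);
    }
    fclose(f);
    return 0;
}
```

For a container's init process, a line such as "NSpid: 2537 1" (the value 2537 is made up for illustration) would yield two entries: the PID on the host and PID 1 inside the container. Calling print_nspid("/proc/self/status") from main shows the levels for the current process.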
Mount a Real Docker Image
docker save alpine | undocker -i -o rootfs alpine
// System mount points
mount("proc", "rootfs/proc", "proc", 0, NULL);
mount("sysfs", "rootfs/sys", "sysfs", 0, NULL);
mount("none", "rootfs/tmp", "tmpfs", 0, NULL);
mount("udev", "rootfs/dev", "devtmpfs", 0, NULL);
// Config files
mount("conf/hosts", "rootfs/etc/hosts", "none", MS_BIND, NULL);
mount("conf/hostname", "rootfs/etc/hostname", "none", MS_BIND, NULL);
mount("conf/resolv.conf", "rootfs/etc/resolv.conf", "none", MS_BIND, NULL);
// Chroot
chdir("./rootfs");
chroot("./");
User namespace ( CLONE_NEWUSER )
Isolates the user and group ID spaces.
A process's UID and GID can be different inside and outside a user namespace.
#include <stdio.h>
#include <sys/types.h>
// Write one "inside outside length" mapping to a uid_map or gid_map file.
void set_map(const char* file, int inside_id, int outside_id, int len) {
    FILE *fp = fopen(file, "w");
    if (fp == NULL) {
        perror(file);
        return;
    }
    fprintf(fp, "%d %d %d", inside_id, outside_id, len);
    fclose(fp);
}
void set_uid_map(pid_t pid, int inside_id, int outside_id, int len) {
    char file[256];
    snprintf(file, sizeof(file), "/proc/%d/uid_map", (int)pid);
    set_map(file, inside_id, outside_id, len);
}
void set_gid_map(pid_t pid, int inside_id, int outside_id, int len) {
    char file[256];
    snprintf(file, sizeof(file), "/proc/%d/gid_map", (int)pid);
    set_map(file, inside_id, outside_id, len);
}
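To make the mapping format concrete: each line written to uid_map or gid_map has the form "inside outside length". The sketch below only builds the strings, so it can be tried without privileges; format_id_map and uid_map_path are hypothetical names, and the UID 1000 in the usage note is an assumed example value.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Format one mapping line, "<inside> <outside> <length>\n", in the form
 * the kernel expects in /proc/<pid>/uid_map or gid_map.
 * Returns the line length, or -1 if buf is too small. */
int format_id_map(char *buf, size_t size,
                  unsigned inside_id, unsigned outside_id, unsigned len)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "%u %u %u\n", inside_id, outside_id, len);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Build the path of a process's uid_map file.
 * Returns the path length, or -1 if buf is too small. */
int uid_map_path(char *buf, size_t size, pid_t pid)
{
    assert(buf != NULL);
    int n = snprintf(buf, size, "/proc/%ld/uid_map", (long)pid);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

format_id_map(buf, sizeof buf, 0, 1000, 1) produces "0 1000 1": UID 0 inside the namespace maps to UID 1000 outside, for a range of one ID. That is how a process can be root in the container while remaining an ordinary user on the host. Since Linux 3.19, an unprivileged process must also write "deny" to /proc/<pid>/setgroups before it may write gid_map.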
Network namespace ( CLONE_NEWNET )
Preparation
brctl addbr br0
ifconfig br0 192.168.10.1/24 up
Host
ip link add veth0 type veth peer name veth1
ip link set veth1 netns $PID
brctl addif br0 veth0
ip link set veth0 up
Container
ip link set dev veth1 name eth0
ip link set eth0 up
ip link set lo up
ip addr add 192.168.10.2/24 dev eth0
ip route add default via 192.168.10.1
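The default route works because the container address 192.168.10.2 and the bridge address 192.168.10.1 share the same /24 network, so the gateway is directly reachable over the veth pair. A small sketch of that check, using only inet_pton and bit masks; same_subnet is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Return 1 if two dotted-quad IPv4 addresses fall in the same subnet of
 * the given prefix length, 0 if they do not, and -1 on a parse error. */
int same_subnet(const char *a, const char *b, int prefix_len)
{
    struct in_addr ia, ib;

    assert(prefix_len >= 0 && prefix_len <= 32);
    if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
        return -1;

    /* Netmask in host byte order; a /0 prefix masks nothing. */
    uint32_t mask = prefix_len == 0 ? 0 : UINT32_MAX << (32 - prefix_len);
    return (ntohl(ia.s_addr) & mask) == (ntohl(ib.s_addr) & mask);
}
```

With the addresses from this slide, same_subnet("192.168.10.2", "192.168.10.1", 24) returns 1. An address such as 192.168.11.1 would return 0 and could not serve as the next hop without another route.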
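Network Topology
[Diagram: container eth0 (192.168.10.2) <-> veth pair (veth1/veth0) <-> host bridge br0 (192.168.10.1)]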
Isolation ‑ Control Groups
Resource Limiting
Linux Control Groups
blkio (disk I/O)
cpu (CPU quota)
cpuset (CPU cores)
devices (device access)
memory
net_cls (network packet class ID)
net_prio (network packet priority)
hugetlb (HugeTLB pages)
cpuacct (CPU usage accounting)
freezer (suspend and resume tasks)
Glance
root@eric-vm:/sys/fs/cgroup# ls
blkio cpuacct cpuset freezer memory net_cls,net_prio perf_event systemd
cpu cpu,cpuacct devices hugetlb net_cls net_prio pids
root@eric-vm:/sys/fs/cgroup/cpu$ sudo mkdir test
root@eric-vm:/sys/fs/cgroup/cpu/test$ ls
cgroup.clone_children cpuacct.stat cpuacct.usage_percpu cpu.cfs_quota_us cpu.stat
cgroup.procs cpuacct.usage cpu.cfs_period_us cpu.shares notify_o
We have a CPU killer
int main()
{
    // Unsigned, so the wraparound is well defined
    unsigned int i = 0;
    for (;;) i++;
    return 0;
}
top output:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3985 eric      20   0    4224    648    576 R  99.9  0.1   0:15.53 deadloop
Usage
Create a group. (Yes, just mkdir.)
sudo mkdir /sys/fs/cgroup/cpu/test
Set a limit (as root). With the default period of 100000 µs in cpu.cfs_period_us, a quota of 20000 µs means 20% CPU time.
echo 20000 > /sys/fs/cgroup/cpu,cpuacct/test/cpu.cfs_quota_us
Add a process to our group.
echo 3985 >> /sys/fs/cgroup/cpu,cpuacct/test/tasks
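The quota is a share of the scheduling period: quota = period * percent / 100. The sketch below computes that value and writes it to a cgroup v1 directory such as the one created above. cfs_quota_us and write_cfs_quota are hypothetical helper names, and the 100000 µs default period is an assumption that holds on typical kernels; check cpu.cfs_period_us on the target system.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed default CFS period: 100 ms, expressed in microseconds. */
#define DEFAULT_CFS_PERIOD_US 100000L

/* Quota in microseconds that grants `percent` of one CPU per period. */
long cfs_quota_us(long period_us, int percent)
{
    assert(period_us > 0 && percent > 0);
    return period_us * percent / 100;
}

/* Write a quota to <cgroup_dir>/cpu.cfs_quota_us (cgroup v1 layout).
 * Returns 0 on success and -1 on any error. */
int write_cfs_quota(const char *cgroup_dir, long quota_us)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/cpu.cfs_quota_us", cgroup_dir);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%ld\n", quota_us) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

cfs_quota_us(DEFAULT_CFS_PERIOD_US, 20) gives the 20000 used on this slide. A percentage above 100 is valid on multi-core hosts: 150 allows one and a half CPUs. On cgroup v2 the same limit is written to cpu.max as "quota period" instead.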
Container Security
"Container"
Linux kernel namespaces provide the isolation (hence “container”) in which we place one or more processes.
Linux kernel cgroups (“control groups”) provide resource limiting and accounting (CPU, memory, I/O bandwidth, etc.).
Container Properties
A shared kernel across all containers on a single host.
Unique filesystem, a layered model using CoW (copy‑on‑write) union filesystems.
Linux namespaces are shareable (Kubernetes “pod”)
One process per container
Linux Capabilities
Add capabilities a container needs, or drop the ones it does not.
$ docker run --rm -ti busybox sh
/ # hostname foo
hostname: sethostname: Operation not permitted
$ docker run --rm -ti --cap-add=SYS_ADMIN busybox sh
/ # hostname foo
<hostname changed>
$ docker run --rm -ti --cap-drop=NET_RAW busybox sh
/ # ping 8.8.8.8
ping: permission denied (are you root?)
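The effect of --cap-add and --cap-drop can be inspected from inside a container: the CapEff line in /proc/<pid>/status holds the effective capabilities as a hexadecimal bit mask. A minimal sketch of decoding it follows. The bit numbers 13 and 21 are the values of CAP_NET_RAW and CAP_SYS_ADMIN in <linux/capability.h>; the helper names are hypothetical, and the mask in the usage note is one commonly reported for default Docker containers, so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Capability bit numbers, matching <linux/capability.h>. */
enum { CAP_NET_RAW_BIT = 13, CAP_SYS_ADMIN_BIT = 21 };

/* Parse the hexadecimal value of a "CapEff:" line into a bit mask. */
uint64_t parse_cap_mask(const char *hex)
{
    assert(hex != NULL);
    return strtoull(hex, NULL, 16);
}

/* Return 1 if capability number cap is set in mask, else 0. */
int has_cap(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1u);
}
```

For the mask 00000000a80425fb, has_cap reports CAP_NET_RAW as set and CAP_SYS_ADMIN as clear. That matches the behavior on this slide: ping works until NET_RAW is dropped, and sethostname fails until SYS_ADMIN is added.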
Seccomp
Block specific syscalls from being used by container binaries.
$ cat policy.json
{
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {
            "name": "chmod",
            "action": "SCMP_ACT_ERRNO"
        }
    ]
}
$ docker run --rm -it --security-opt seccomp:policy.json busybox chmod 640 /etc/resolv.conf
chmod: /etc/resolv.conf: Operation not permitted
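The policy reads as a lookup: if a rule names the syscall, its action applies; otherwise defaultAction does. The sketch below is a toy model of that decision only. Real filtering is done in the kernel by a seccomp BPF program, and every type and function name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: mirrors the JSON policy's structure, not the kernel. */
typedef enum { ACT_ALLOW, ACT_ERRNO } action_t;

typedef struct {
    const char *name;    /* syscall name, e.g. "chmod" */
    action_t action;     /* action taken when that syscall is made */
} rule_t;

typedef struct {
    action_t default_action;
    const rule_t *rules;
    size_t n_rules;
} policy_t;

/* Return the action the policy assigns to a syscall name. */
action_t policy_lookup(const policy_t *p, const char *syscall_name)
{
    assert(p != NULL && syscall_name != NULL);
    for (size_t i = 0; i < p->n_rules; i++)
        if (strcmp(p->rules[i].name, syscall_name) == 0)
            return p->rules[i].action;
    return p->default_action;    /* no rule matched */
}

/* The policy from this slide: allow everything except chmod. */
static const rule_t slide_rules[] = { { "chmod", ACT_ERRNO } };
const policy_t slide_policy = { ACT_ALLOW, slide_rules, 1 };
```

policy_lookup(&slide_policy, "chmod") yields ACT_ERRNO, which is why chmod fails with "Operation not permitted" above, while any other syscall falls through to ACT_ALLOW. A default of allow with a short deny list is easy to demonstrate; a stricter setup would deny by default and list what is permitted.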
AppArmor/SELinux
Limit access to specific filesystem paths in a container.
https://raw.githubusercontent.com/jessfraz/bane/master/docker-nginx-sample
$ docker run --rm -ti --security-opt="apparmor:docker-nginx-sample" -p 80:80 nginx bash
root@6da5a2a930b9:/# top
bash: /usr/bin/top: Permission denied
root@6da5a2a930b9:/# touch ~/thing
touch: cannot touch 'thing': Permission denied
Attack a Container!
“attack surface”
Host <‑> Container
Container <‑> Container
External ‑> Container
Application Security
Host <‑> Container
Protecting the host from containers
Threat: DoS host (use up CPU, memory, disk), fork bomb
  Mitigation: Cgroup controls, disk quotas (Docker 1.12), kernel pids limit (Docker 1.11 + kernel 4.3)
Threat: Access host/private information
  Mitigation: Namespace configuration; AppArmor/SELinux profiles, seccomp (Docker 1.10)
Threat: Kernel modification/insert module
  Mitigation: Capabilities (already dropped); seccomp, LSMs; don’t run in --privileged mode
Threat: Docker administrative access (API socket access)
  Mitigation: Don’t share the Docker UNIX socket without Authz plugin limitations; use TLS certificates for TCP endpoint configurations
Container <‑> Container
Malicious or Multi‑tenant
Threat: DoS other containers (noisy neighbor using significant % of CPU, memory, disk)
  Mitigation: Cgroup controls, disk quotas (Docker 1.12), kernel pids limit (Docker 1.11 + kernel 4.3)
Threat: Access other container’s information (pids, files, etc.)
  Mitigation: Namespace configuration; AppArmor/SELinux profile for containers
Threat: Docker API access (full control over other containers)
  Mitigation: Don’t share the Docker UNIX socket without Authz plugin limitations (Docker 1.10); use TLS certificates for TCP endpoint configurations
External ‑> Container
The big, bad Internet
Threat: DDoS attacks
  Mitigation: Cgroup controls, disk quotas (Docker 1.12), kernel pids limit (Docker 1.11 + kernel 4.3); proactive monitoring infrastructure/operational readiness
Threat: Malicious (remote) access
  Mitigation: Appropriate application security model; no weak/default passwords; --read-only filesystem (limit blast radius)
Threat: Unpatched exploits (underlying OS layers)
  Mitigation: Vulnerability scanning (IBM Bluemix, Docker Data Center, CoreOS Clair, Red Hat “SmartState” CloudForms (w/Black Duck))
Application Security
Significant container benefit: provided protections are in place (seccomp, LSMs, dropped caps, user namespaces), the exploited application has greatly reduced ability to inflict harm beyond the container “walls”.
Proper handling of secrets through the dev/build/deploy process (no passwords in the Dockerfile, for example).
Unnecessary services not exposed externally (shared namespaces; internal/management networks).
Secure coding/design principles.
Thank You!