DockIR: Incident Response in Containerized, Immutable Environments

DockIR: Incident Response in a Containerized, Immutable, Continually Deployed Environment

Who Am I
Security guy at Coinbase
We protect $5+ billion in digital assets (Bitcoin, Ethereum, Litecoin) on a 100% container-based, continually deployed AWS infrastructure
Long time container enthusiast
Occasional open source software dev
https://github.com/Phillipmartin

What we're not covering
Secure Configuration of Docker
This is a deep, complex topic; there are some great resources online, including a CIS benchmark
(https://www.cisecurity.org/cis-benchmarks/)
AWS Security
Lots of great talks about this
SDLC impact of Containers and CD
This deserves its own talk (and there are several good ones)
Secure deployment concepts

What we are covering
Some Coinbase context
Preparation - things that we do because containers make it necessary or
because containers make it easier/possible
Detection - the different things we can or must do to detect evil in our
environment
Response - things that we can do once we do detect evil

Why should we care?
Docker
CI/CD

Preparation
Challenge: System and software inventories are on essentially every list of
critical security controls. How do we gain visibility into the container OS
and software, and how can we reconstruct historical inventory when it changes
so rapidly?
Response:
Establish managed base container OSs
Only allow deployment of whitelisted containers
Use Clair to scan *and log* container packages in the CD pipeline
Controls on Dockerfile contents (e.g. don’t allow RUN curl to download files straight onto the filesystem)
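The Dockerfile controls above can be enforced as a pipeline check. The Python sketch below is one hypothetical way to do it: it rejects any FROM line whose image is not on an approved list of managed base images, and any RUN line that calls curl or wget. The registry name in APPROVED_BASES and the rules themselves are assumptions for illustration, not Coinbase's actual policy, and this does not replace scanning the built image with Clair.

```python
import re

# Hypothetical allowlist of managed base images; a real pipeline would load
# this from its own configuration instead of hardcoding it.
APPROVED_BASES = {"registry.example.com/base/alpine", "scratch"}

# RUN commands that pull remote content straight into the image.
REMOTE_FETCH = re.compile(r"\b(curl|wget)\b")


def strip_tag(image: str) -> str:
    """Drop a :tag or @digest so only the repository name is compared."""
    image = image.split("@", 1)[0]
    # A colon after the last slash is a tag; one before it is a registry port.
    if image.rfind(":") > image.rfind("/"):
        image = image[: image.rfind(":")]
    return image


def check_dockerfile(text: str) -> list[str]:
    """Return policy violations found in one Dockerfile."""
    violations = []
    stages = set()  # names from "FROM ... AS name", valid as later bases
    # Fold backslash continuations so multi-line RUN commands are checked whole.
    logical = re.sub(r"\\\r?\n", " ", text)
    for line in logical.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, args = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "FROM":
            # Skip flags such as --platform=...; the first remaining token is the image.
            tokens = [t for t in args.split() if not t.startswith("--")]
            if not tokens:
                violations.append("FROM without an image")
                continue
            base = strip_tag(tokens[0])
            if base not in APPROVED_BASES and base not in stages:
                violations.append(f"unapproved base image: {tokens[0]}")
            if len(tokens) >= 3 and tokens[1].upper() == "AS":
                stages.add(tokens[2])
        elif instruction == "RUN" and REMOTE_FETCH.search(args):
            violations.append(f"remote fetch in RUN: {args.strip()}")
    return violations
```

The regex is deliberately crude: it also flags RUN lines that merely install curl, which a real policy might choose to allow.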

Preparation
Opportunity: Containers should be single-purpose and generally don’t need a
full userspace environment or the ability to call every syscall.
Response:
Make the managed base OSs as minimal as possible (Alpine is great for this)
If you can, abstract this away from developers entirely
We actually use scratch containers for some Go services
Docker already has sane defaults for capabilities and seccomp; don’t turn them off
Because we follow an immutable deployment concept, we can actually deploy many containers fully read-only
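One way to make sure the defaults are not quietly switched off is to audit what actually got deployed. This sketch reads the JSON that docker inspect prints and flags containers that run privileged, have a writable root filesystem, disable seccomp or AppArmor, or add broad capabilities. The field names (HostConfig.Privileged, ReadonlyRootfs, SecurityOpt, CapAdd) come from docker inspect output; which capabilities count as forbidden is a hypothetical policy choice.

```python
import json

# Security options that turn off Docker's default confinement.
UNCONFINED_OPTS = {
    "seccomp=unconfined", "seccomp:unconfined",
    "apparmor=unconfined", "apparmor:unconfined",
}
# Hypothetical policy: capabilities that should never be added.
FORBIDDEN_CAPS = {"ALL", "SYS_ADMIN", "CAP_SYS_ADMIN"}


def audit_container(info: dict) -> list[str]:
    """Check one element of `docker inspect` output against the policy."""
    host = info.get("HostConfig") or {}
    name = info.get("Name", "?")
    problems = []
    if host.get("Privileged"):
        problems.append(f"{name}: running privileged")
    if not host.get("ReadonlyRootfs"):
        problems.append(f"{name}: root filesystem is writable")
    # SecurityOpt and CapAdd are null in the JSON when nothing was set.
    for opt in host.get("SecurityOpt") or []:
        if opt in UNCONFINED_OPTS:
            problems.append(f"{name}: {opt}")
    for cap in host.get("CapAdd") or []:
        if cap.upper() in FORBIDDEN_CAPS:
            problems.append(f"{name}: adds capability {cap}")
    return problems


def audit_inspect_output(raw: str) -> list[str]:
    """`docker inspect` prints a JSON array; audit every container in it."""
    return [p for info in json.loads(raw) for p in audit_container(info)]
```

For example, the output of docker inspect $(docker ps -q) could be passed to audit_inspect_output after each deploy or on a schedule.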

Detection
Challenge: Containers, by design, should be single-purpose environments.
That leaves no room for agent-based security solutions. How do we get
telemetry?
Response:
There are a number of vendors out there that answer this question in a bunch of different ways.
I’m not going to talk about any of them
We answer this question with a combination of surveillance from the host OS based on auditd (and, one day soon, eBPF!), sidecar containers that inspect specific things (e.g. DNS logging, rolling pcap), and very verbose application logs
All of this is shoved into a Kinesis stream and sucked into our log pipeline
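Raw auditd output is line-oriented, and one event is usually spread across several records (SYSCALL, EXECVE, PATH and so on) that share the same timestamp and serial number in msg=audit(...). A minimal sketch, assuming the standard raw audit.log format, that groups those records into one JSON document per event before it is shipped to a stream such as Kinesis. The shipping step is left out, and hex-encoded values (auditd uses them for arguments containing spaces or special characters) are kept as-is rather than decoded.

```python
import json
import re

# Header of a raw audit.log record: type=XXX msg=audit(<seconds>.<millis>:<serial>):
HEADER = re.compile(r"^type=(\w+) msg=audit\((\d+\.\d+):(\d+)\):\s*")
# key=value or key="quoted value"
FIELD = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_record(line: str):
    """Return (record_type, (timestamp, serial), fields) or None for non-audit lines."""
    m = HEADER.match(line)
    if m is None:
        return None
    rtype, ts, serial = m.groups()
    fields = {}
    for fm in FIELD.finditer(line[m.end():]):
        key, raw, quoted = fm.groups()
        fields[key] = quoted if quoted is not None else raw
    return rtype, (ts, serial), fields


def group_events(lines) -> list[dict]:
    """Merge records that share a timestamp and serial into one event each."""
    events: dict = {}
    for line in lines:
        parsed = parse_record(line.strip())
        if parsed is None:
            continue
        rtype, event_id, fields = parsed
        records = events.setdefault(event_id, {})
        # A record type can repeat within one event (e.g. several PATH records).
        records.setdefault(rtype, []).append(fields)
    return [
        {"timestamp": float(ts), "serial": int(serial), "records": records}
        for (ts, serial), records in events.items()
    ]


def to_json_lines(events: list[dict]) -> list[str]:
    """One JSON document per event, ready to hand to a log shipper."""
    return [json.dumps(e, sort_keys=True) for e in events]
```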

Detection
Opportunity: Containers, by design, should be single-purpose environments.
Does that mean we can do real whitelisting?
Response: Mostly.
Whitelisting exec calls per process/container
Whitelisting connect calls per process/container
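Per-container whitelisting can be as simple as a profile of expected executables and network destinations for each image, checked against exec and connect events from the host telemetry. This sketch assumes events have already been parsed into dicts with the hypothetical fields image, type, exe, dest_ip and dest_port; the example-api profile is made up for illustration.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Profile:
    """Expected behaviour for every container built from one image."""
    execs: set[str] = field(default_factory=set)
    # (allowed network, allowed port) pairs for outbound connects
    connects: list[tuple[ipaddress.IPv4Network, int]] = field(default_factory=list)


# Hypothetical profile; real ones might be learned from a baseline and reviewed.
PROFILES = {
    "example-api": Profile(
        execs={"/app/example-api"},
        connects=[(ipaddress.ip_network("10.2.0.0/16"), 5432)],
    ),
}


def check_event(event: dict) -> Optional[str]:
    """Return an alert string if the event falls outside its image's profile."""
    profile = PROFILES.get(event["image"])
    if profile is None:
        return f"no profile for image {event['image']}"
    if event["type"] == "exec" and event["exe"] not in profile.execs:
        return f"{event['image']}: unexpected exec {event['exe']}"
    if event["type"] == "connect":
        addr = ipaddress.ip_address(event["dest_ip"])
        port = event["dest_port"]
        if not any(addr in net and port == p for net, p in profile.connects):
            return f"{event['image']}: unexpected connect {addr}:{port}"
    return None
```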

Response
Challenge: If attacker dwell time is measured in weeks or months and container
lifetimes are measured in hours or days, how do you effectively investigate
the full scope of a breach?
Response:
This is one of the core problems, to me, in an environment with broad adoption of CD.
Log everything (audit, docker logs, system logs, etc)
Enrich log lines when they are logged (e.g. IP 10.2.3.4 may be hosting app A today, but app B by the time a breach is detected)
Keep logs for years (even if they fall into cold storage after a while)
For highly vulnerable or critical services, consider saving some filesystem or memory artifacts
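Enrichment at log time means resolving an IP to whatever owned it at that moment and storing the answer with the record, rather than looking it up during the investigation. A minimal sketch, assuming a hypothetical inventory dict that maps IPs to app metadata (in practice it might be built from your orchestrator or the EC2 API) and hypothetical src_ip/dest_ip field names.

```python
# Hypothetical names of fields in your log schema that hold IP addresses.
IP_FIELDS = ("src_ip", "dest_ip")


def enrich(record: dict, inventory: dict) -> dict:
    """Return a copy of record with each IP's current owner attached."""
    enriched = dict(record)
    for name in IP_FIELDS:
        ip = record.get(name)
        if ip is None:
            continue
        owner = inventory.get(ip)
        # Copy the owner so later inventory edits cannot change stored records;
        # record "unknown" explicitly so gaps are visible during an investigation.
        enriched[name + "_owner"] = dict(owner) if owner else "unknown"
    return enriched
```

Because the owner is copied into the record, a later change in the inventory (app B taking over 10.2.3.4) does not rewrite what was logged while app A held the address.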

Response
Challenge: How do I isolate a container for forensics?
Response:
At the AWS level this is fairly well understood. An instance’s security group can be changed on the fly, which is an effective way to ensure that the potentially compromised host can only talk to IR tooling.
At the host level there are a few options:
docker pause will pause all processes in the container. This can be useful if you need to mitigate an incident but can’t investigate the host right now
Network isolation using docker network disconnect (unless you are using --network=host)
Network isolation using iptables and the DOCKER-USER chain (Docker 17.06+)
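The DOCKER-USER approach can be sketched as a function that builds the iptables commands for isolating one container while still letting it talk to IR hosts. The IR addresses are hypothetical inputs, the sketch only handles IPv4 (ip6tables would need equivalent rules), and it returns the commands rather than running them; on a real host you might run each one with subprocess.run(cmd, check=True) as root. Docker jumps to DOCKER-USER from the FORWARD chain, so these rules should govern traffic forwarded through the host rather than traffic between the container and the host itself, and they do nothing for containers using --network=host.

```python
import ipaddress


def isolation_rules(container_ip: str, ir_hosts: list[str]) -> list[list[str]]:
    """Build iptables commands that cut a container off from everything but IR hosts."""
    ip = str(ipaddress.ip_address(container_ip))  # raises ValueError on bad input
    insert = ["iptables", "-I", "DOCKER-USER", "1"]
    # Inserted first, so they end up below the ACCEPT rules added afterwards.
    cmds = [
        insert + ["-s", ip, "-j", "DROP"],
        insert + ["-d", ip, "-j", "DROP"],
    ]
    for host in ir_hosts:
        ir = str(ipaddress.ip_address(host))
        cmds.append(insert + ["-s", ip, "-d", ir, "-j", "ACCEPT"])
        cmds.append(insert + ["-s", ir, "-d", ip, "-j", "ACCEPT"])
    return cmds
```

Every rule is inserted at position 1, so the last one inserted ends up first in the chain; that is why the DROP rules are emitted before the ACCEPT rules for the IR hosts.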

Response
Challenge: How can I do live response on a container?
Response:
Standard docker commands provide a lot of insight (more in a sec)
inspect, diff, cp, export, pause, exec
But fundamentally, Docker containers appear on the host as ordinary processes with specific restrictions, so most of your normal tools (e.g. strace, gdb) will work
You can even grab process memory directly from the host using /proc/$pid/mem and /proc/$pid/maps, or with something like gcore
A full memory image of the host will also capture the running processes, but a bunch of details will be wacky because of namespaces (e.g. paths, PIDs, UIDs, maybe IPs and hostnames).
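Grabbing memory through /proc works the same for a containerized process as for any other: read the mappings from /proc/$pid/maps, then read each readable range from /proc/$pid/mem. The host PID of a container's main process can be found with docker inspect --format '{{.State.Pid}}' <container>. This is a minimal sketch, assuming root (or equivalent ptrace access) on the host; it writes every readable region, each preceded by a one-line header, into a single output file. Paths in the mappings are as seen from inside the container's mount namespace.

```python
def parse_maps(text: str) -> list[dict]:
    """Parse /proc/<pid>/maps: 'start-end perms offset dev inode [path]' per line."""
    regions = []
    for line in text.splitlines():
        parts = line.split(maxsplit=5)
        if len(parts) < 5:
            continue
        start, end = (int(x, 16) for x in parts[0].split("-"))
        regions.append({
            "start": start,
            "end": end,
            "perms": parts[1],
            "path": parts[5] if len(parts) > 5 else "",  # anonymous maps have no path
        })
    return regions


def dump_memory(pid: int, out_path: str) -> int:
    """Copy every readable mapping of pid into out_path; return bytes written."""
    with open(f"/proc/{pid}/maps") as f:
        regions = parse_maps(f.read())
    written = 0
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem, open(out_path, "wb") as out:
        for r in regions:
            if not r["perms"].startswith("r"):
                continue
            try:
                mem.seek(r["start"])
                chunk = mem.read(r["end"] - r["start"])
            except (OSError, OverflowError, ValueError):
                continue  # special mappings such as [vvar] may not be readable
            # Header so each chunk can be attributed to its mapping later.
            header = f"{r['start']:x}-{r['end']:x} {r['perms']} {len(chunk)} {r['path']}\n"
            out.write(header.encode())
            out.write(chunk)
            written += len(chunk)
    return written
```

Each region is read whole, so a very large heap is held in memory at once; chunked reads would be gentler on a busy host.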

Response
Challenge: How can I do live response on a container?
Response:
Standard docker commands provide a lot of insight
docker inspect - dump container metadata
docker diff - diff a running container against the base image
docker cp - move files in and out of a running container from the host
docker export - create a tar of the current state of a container’s filesystem
docker pause - pause the target container (using the cgroup freezer)
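The docker commands above are easy to wrap in a small collector so evidence lands in the same place every time. This sketch, with a hypothetical output directory, captures docker inspect and docker diff output and writes a docker export tarball, plus a helper that sorts docker diff lines into added (A), changed (C) and deleted (D) paths. Whether to docker pause first, and how each command behaves against a paused container on your Docker version, is worth testing before relying on it.

```python
import subprocess
from pathlib import Path
from typing import Optional


def collection_plan(container: str, outdir: str) -> list[tuple[list[str], Optional[str]]]:
    """(command, file to save stdout into) pairs; None means the command writes its own file."""
    out = Path(outdir)
    return [
        (["docker", "inspect", container], str(out / "inspect.json")),
        (["docker", "diff", container], str(out / "diff.txt")),
        (["docker", "export", "--output", str(out / "rootfs.tar"), container], None),
    ]


def run_plan(plan, outdir: str) -> None:
    """Run each command, stopping on the first failure."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd, capture in plan:
        result = subprocess.run(cmd, check=True, capture_output=True)
        if capture is not None:
            Path(capture).write_bytes(result.stdout)


def parse_diff(text: str) -> dict[str, list[str]]:
    """Sort `docker diff` lines into added (A), changed (C) and deleted (D) paths."""
    changes = {"A": [], "C": [], "D": []}
    for line in text.splitlines():
        kind, _, path = line.partition(" ")
        if kind in changes and path:
            changes[kind].append(path)
    return changes
```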

Response
Challenge: How do I respond if I have no access?
Response:
Automate your response actions, make it a single script that auto-fetches dependencies
It’s actually OK if this is loud in your logging/monitoring. You probably should alert when someone loads a new kernel module for memory dumping
Have a process poll SQS (or whatever) for a signal that it should kick off a response
I strongly suggest you follow the GRR model: make sure the commands in that channel are signed and that the authorized keys are hardcoded in the response script
Once your response runs, tar up the results, encrypt them to the key that signed the command (or to a central key, or whatever works for your setup), and upload them to an S3 bucket
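The signed-command check can be sketched as follows. The slide recommends GRR-style public-key signatures with the authorized keys hardcoded in the script; to stay within Python's standard library, this sketch swaps in a shared-secret HMAC-SHA256 instead, which is weaker because anyone holding the secret can both issue and verify commands. The key ID, secret, action names and five-minute freshness window are all hypothetical, and the SQS polling, collection, encryption and S3 upload steps are left out.

```python
import hashlib
import hmac
import json
import time

# Hypothetical values; a real script would bake in real keys at build time.
COMMAND_KEYS = {"ir-team-2018": b"replace-with-a-real-secret"}
ALLOWED_ACTIONS = {"collect_memory", "collect_disk", "isolate"}
MAX_AGE_SECONDS = 300


def sign(key_id: str, payload: dict) -> dict:
    """Issuer side: serialize the command and attach an HMAC over the exact bytes."""
    body = json.dumps(payload, sort_keys=True)
    mac = hmac.new(COMMAND_KEYS[key_id], body.encode(), hashlib.sha256).hexdigest()
    return {"key_id": key_id, "body": body, "mac": mac}


def verify(message: dict, now=None) -> dict:
    """Responder side: return the payload if authentic, fresh and allowed; raise otherwise."""
    key = COMMAND_KEYS.get(message.get("key_id"))
    if key is None:
        raise PermissionError("unknown key id")
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; check the MAC before trusting anything in the body.
    if not hmac.compare_digest(expected, message["mac"]):
        raise PermissionError("bad signature")
    payload = json.loads(message["body"])
    now = time.time() if now is None else now
    if abs(now - payload["issued_at"]) > MAX_AGE_SECONDS:
        raise PermissionError("stale command")
    if payload["action"] not in ALLOWED_ACTIONS:
        raise PermissionError("action not allowed")
    return payload
```

The freshness check limits replay but does not prevent it within the window; recording each command's ID or a nonce after it runs would close that gap.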
