This document discusses incident response strategies in a containerized, immutable infrastructure environment such as Docker. It addresses preparation and detection challenges: system and software inventories are hard to maintain when containers change rapidly, and single-purpose containers leave no room for agent-based security. It proposes solutions such as establishing managed base container OSs, whitelisting allowed containers and per-container behavior, and using host-level auditing, logs and sidecar containers for detection telemetry. Response challenges, such as investigating a breach whose dwell time exceeds container lifetimes and responding without direct access, are addressed with comprehensive logging, filesystem artifact preservation, and automated remote response capabilities.
3. Who Am I
Security guy at Coinbase
We protect $5+ Billion in Digital Assets (bitcoin, ethereum, litecoin) on a 100% container-based,
AWS, continually deployed infra
Long time container enthusiast
Occasional open source software dev
https://github.com/Phillipmartin
4. What we're not covering
Secure Configuration of Docker
This is a deep, complex topic; there are some great resources online, including a CIS benchmark
(https://www.cisecurity.org/cis-benchmarks/)
AWS Security
Lots of great talks about this
SDLC impact of Containers and CD
This deserves its own talk (and there are several good ones)
Secure deployment concepts
5. What we are covering
Some Coinbase context
Preparation - things that we do because containers make it necessary or
because containers make it easier/possible
Detection - the different things we can or must do to detect evil in our
environment
Response - things that we can do once we do detect evil
8. Preparation
Challenge: System and Software inventories are on essentially every list of
critical security controls everywhere. How do we gain visibility into the
container OS and software, and how can we reconstruct historical inventory
when it changes so rapidly?
Response:
Establish managed base container OSs
Only allow deployment of whitelisted containers
Use Clair to scan *and log* container packages in the CD pipeline
Controls on Dockerfile contents (e.g. don’t allow RUN curl > filesystem)
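A control on Dockerfile contents could be sketched as a small check run in the CD pipeline. This is only an illustration, not the tooling described in this talk: `ALLOWED_BASES`, the registry name and `check_dockerfile` are hypothetical, and the pattern only catches single-line RUN instructions that call curl or wget.

```python
import re

# Hypothetical allowlist of managed base images; real names would differ.
ALLOWED_BASES = {"registry.example.com/base/alpine", "scratch"}

# Flags single-line RUN instructions that fetch remote content.
FETCH_RE = re.compile(r"^\s*RUN\b.*\b(curl|wget)\b", re.IGNORECASE)


def check_dockerfile(text):
    """Return a list of (line_number, message) policy violations."""
    violations = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.upper().startswith("FROM "):
            # Simplification: ignores FROM flags and registries with a port.
            image = stripped.split()[1].split(":")[0]
            if image not in ALLOWED_BASES:
                violations.append((number, "base image not allowed: " + image))
        elif FETCH_RE.match(line):
            violations.append((number, "RUN fetches remote content"))
    return violations
```

A pipeline step could fail the build whenever the returned list is non-empty, and log the result alongside the package scan.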
9. Preparation
Opportunity: Containers should be single purpose and don’t generally need a
full userspace environment or the ability to call all syscalls.
Response:
Make the managed base OSs as minimal as possible (Alpine is great for this)
If you can, abstract this away from developers entirely
We actually use scratch containers for some Go services
Docker already has sane defaults for capabilities and seccomp; don’t turn them off
Because we follow an immutable deployment concept, we can actually deploy many containers
fully read-only
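As a rough sketch of what a read-only, least-privilege launch can look like, the helper below builds a `docker run` command line. The function name is hypothetical, and dropping all capabilities plus `no-new-privileges` goes beyond the defaults mentioned above; treat those as assumptions to tune per service.

```python
def hardened_run_args(image, writable_tmp=True):
    """Build a `docker run` argv for a read-only, least-privilege container."""
    args = [
        "docker", "run", "--detach",
        "--read-only",                          # root filesystem is read-only
        "--cap-drop", "ALL",                    # assumption: service needs no capabilities
        "--security-opt", "no-new-privileges",  # block privilege gain via setuid binaries
    ]
    if writable_tmp:
        args += ["--tmpfs", "/tmp"]             # in-memory scratch space, not persisted
    args.append(image)
    return args
```

The default seccomp profile stays in effect because nothing here overrides it.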
10. Detection
Challenge: Containers, by design, should be a single purpose environment.
That means no room for agent-based security solutions. How do we get
telemetry?
Response:
There are a number of vendors out there that answer this question in a bunch of different ways.
I’m not going to talk about any of them
We answer this question using a combo of surveillance from the host OS based on auditd (and
one day soon eBPF!), sidecar containers that inspect specific things (e.g. DNS logging, rolling
pcap, etc) and very verbose application logs
All of this is shoved into a Kinesis stream and sucked into our log pipeline
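Before host audit records go into a log stream, they usually need to be turned into structured events. The following is a minimal sketch of parsing one auditd line into a dict, assuming the common `type=... msg=audit(timestamp:id): key=value ...` layout; the function name is hypothetical and hex-encoded fields are left as-is.

```python
import re

# key=value pairs; values are either double-quoted or a run of non-spaces.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
HEADER_RE = re.compile(r"msg=audit\((\d+\.\d+):(\d+)\)")
TYPE_RE = re.compile(r"type=(\S+)")


def parse_audit_line(line):
    """Return a dict for one audit record, or None if the line does not match."""
    header = HEADER_RE.search(line)
    kind = TYPE_RE.match(line)
    if header is None or kind is None:
        return None
    record = {
        "type": kind.group(1),
        "timestamp": float(header.group(1)),
        "event_id": int(header.group(2)),
    }
    # Everything after the header is the record body.
    for key, value in PAIR_RE.findall(line[header.end():]):
        record[key] = value.strip('"')
    return record
```

The resulting dict can be serialized as JSON and written to whatever stream feeds the log pipeline.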
11. Detection
Opportunity: Containers, by design, should be a single purpose environment.
Does that mean we can do real whitelisting?
Response: Mostly.
Whitelisting exec calls per process/container
Whitelisting connect calls per process/container
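A per-container whitelist check over exec and connect events might look like the sketch below. The event shape, the container name and both allowlists are hypothetical; in practice they would be derived from observed behavior of each service.

```python
# Hypothetical per-container allowlists.
EXEC_ALLOWLIST = {"api": {"/usr/local/bin/api-server"}}
CONNECT_ALLOWLIST = {"api": {("10.0.0.5", 5432)}}


def check_event(event):
    """Return None when the event is allowed, else a short alert string."""
    container = event["container"]
    if event["kind"] == "exec":
        if event["exe"] not in EXEC_ALLOWLIST.get(container, set()):
            return "unexpected exec in %s: %s" % (container, event["exe"])
    elif event["kind"] == "connect":
        dest = (event["addr"], event["port"])
        if dest not in CONNECT_ALLOWLIST.get(container, set()):
            return "unexpected connect from %s: %s:%d" % (container, dest[0], dest[1])
    return None
```

Containers with no entry are treated as having an empty allowlist, so every exec or connect from them produces an alert.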
12. Response
Challenge: If attacker dwell time is measured in weeks or months and container
lifetimes are measured in hours or days, how do you effectively investigate
the full scope of a breach?
Response:
This is one of the core problems, to me, in an environment with broad adoption of CD.
Log everything (audit, docker logs, system logs, etc)
Enrich log lines when they are logged (e.g. IP 10.2.3.4 may be hosting app A today, but app B by
the time a breach is detected)
Keep logs for years (even if they fall into cold storage after a while)
For highly vulnerable or critical services, consider saving some filesystem or memory artifacts
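Enriching at log time can be as simple as resolving the IP to its current application before the line is written. A minimal sketch, assuming a hypothetical `ip_to_app` mapping maintained by the deployment system:

```python
import json
import time


def enrich(record, ip_to_app, now=None):
    """Attach the app currently behind the record's IP, then serialize."""
    enriched = dict(record)  # leave the caller's record untouched
    enriched["app"] = ip_to_app.get(record.get("ip"), "unknown")
    enriched["enriched_at"] = time.time() if now is None else now
    return json.dumps(enriched, sort_keys=True)
```

Because the app name is stored in the log line itself, the record stays meaningful after the IP has been reassigned.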
13. Response
Challenge: How do I isolate a container for forensics?
Response:
At the AWS level this is fairly well understood. Changing the security group for an instance is
possible to do on the fly and an effective way to ensure that the potentially compromised host
can only talk to IR tooling.
At the host level there are a few options:
docker pause will pause all processes in the container. This can be useful if you need to
mitigate an incident but can’t investigate this host right now
Network isolation using docker network disconnect (unless you are using --network=host)
Network isolation using iptables and the DOCKER-USER chain (Docker 17.06+)
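The host-level options above can be collected into one helper so responders do not have to remember the syntax under pressure. This is a sketch only: the function name is hypothetical, the commands are returned rather than executed, and the iptables rules assume the container has a single known bridge IP.

```python
def isolation_commands(container, network, container_ip):
    """Return candidate isolation commands, keyed by action, as argv lists."""
    return {
        # Freeze every process in the container.
        "pause": ["docker", "pause", container],
        # Detach from a user-defined or bridge network (not --network=host).
        "disconnect": ["docker", "network", "disconnect", network, container],
        # Drop forwarded traffic from and to the container's IP.
        "drop_outbound": ["iptables", "-I", "DOCKER-USER", "-s", container_ip, "-j", "DROP"],
        "drop_inbound": ["iptables", "-I", "DOCKER-USER", "-d", container_ip, "-j", "DROP"],
    }
```

Each list can be passed to `subprocess.run(..., check=True)` once the responder has chosen an action.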
14. Response
Challenge: How can I do live response on a container?
Response:
Standard docker commands provide a lot of insight (more in a sec)
inspect, diff, cp, export, pause, exec
But fundamentally, docker containers are reflected on the host as some processes with specific
restrictions, so most of your normal tools will mostly work (e.g. strace, gdb, etc)
You can even grab process memory directly using /proc/$pid/mem and /proc/$pid/maps or
something like gcore from the host
A full memory image of the host will also capture the running processes, but a bunch of details
will be wacky because of namespaces (e.g. paths, PIDs, UIDs, maybe IPs, hostnames, etc).
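Grabbing process memory from the host via /proc can be sketched as follows. The function names are hypothetical, the caller needs root (or ptrace permission) on the host, and the output is a raw concatenation of readable regions with no address metadata, so a tool like gcore is a better choice when it is available.

```python
CHUNK = 1024 * 1024  # read in 1 MiB pieces to bound memory use


def parse_maps(maps_text):
    """Return (start, end, perms, path) for each line of /proc/<pid>/maps."""
    regions = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append((start, end, fields[1], path))
    return regions


def dump_readable_regions(pid, out_path):
    """Copy every readable mapping of `pid` into out_path; return bytes written."""
    with open("/proc/%d/maps" % pid) as maps_file:
        regions = parse_maps(maps_file.read())
    dumped = 0
    with open("/proc/%d/mem" % pid, "rb", 0) as mem, open(out_path, "wb") as out:
        for start, end, perms, _path in regions:
            if not perms.startswith("r"):
                continue
            try:
                mem.seek(start)
                remaining = end - start
                while remaining > 0:
                    chunk = mem.read(min(remaining, CHUNK))
                    if not chunk:
                        break
                    out.write(chunk)
                    dumped += len(chunk)
                    remaining -= len(chunk)
            except (OSError, ValueError, OverflowError):
                continue  # some special regions cannot be read this way
    return dumped
```

The PID here is the host-side PID of the containerized process, not the PID seen inside the container's namespace.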
15. Response
Challenge: How can I do live response on a container?
Response:
Standard docker commands provide a lot of insight
docker inspect - dump container metadata
docker diff - diff a running container against the base image
docker cp - move files in and out of a running container from the host
docker export - create a tar of the current state of a running instance’s filesystem
docker pause - pause the target instance (using cgroup freezer)
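Several of these commands can be chained into a single collection step. The sketch below is an assumption about how that might be wired up, not a tested IR tool: `collect_artifacts` and the output file names are hypothetical, and the `run` parameter exists so the function can be exercised without a Docker daemon.

```python
import os
import subprocess


def collect_artifacts(container, out_dir, run=subprocess.run):
    """Save inspect and diff output plus a filesystem tar for one container."""
    os.makedirs(out_dir, exist_ok=True)
    for name in ("inspect", "diff"):
        # Capture the command's stdout and store it next to the tarball.
        result = run(["docker", name, container], capture_output=True, check=True)
        with open(os.path.join(out_dir, name + ".txt"), "wb") as handle:
            handle.write(result.stdout)
    tar_path = os.path.join(out_dir, "filesystem.tar")
    # docker export writes the container's current filesystem as a tar archive.
    run(["docker", "export", "--output", tar_path, container], check=True)
    return tar_path
```

Calling `collect_artifacts("<container id>", "/some/evidence/dir")` on the host leaves three files that can be hashed and shipped off the box.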
16. Response
Challenge: How do I respond if I have no access?
Response:
Automate your response actions, make it a single script that auto-fetches dependencies
It’s actually OK if this is loud in your logging/monitoring. You probably should alert when
someone loads a new kernel module for memory dumping
Have a process poll SQS (or whatever) for a signal that it should kick off a response
I strongly suggest you follow the GRR model and make sure the commands in that channel
are signed and authorized keys are hardcoded in the response script
Once your response runs, tar it up and encrypt it back to the key that signed the command (or
some central key or whatever works for your setup) and upload it to an S3 bucket
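The check a response script performs on a message pulled from the queue could look like the sketch below. Note the swap: the GRR-style model uses asymmetric signatures with hardcoded public keys, whereas this example uses HMAC-SHA256 with a shared secret purely so that it runs on the standard library alone. The key id, secret, action names and message fields are all hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical hardcoded keys; a real script would hold public keys instead.
AUTHORIZED_KEYS = {"ir-team": b"replace-with-real-secret"}
ALLOWED_ACTIONS = {"collect_artifacts", "isolate"}


def verify_command(message):
    """Return the parsed command if the signature checks out, else None."""
    key = AUTHORIZED_KEYS.get(message.get("key_id"))
    if key is None:
        return None
    payload = str(message.get("payload", ""))
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    supplied = str(message.get("signature", ""))
    # Constant-time comparison on bytes avoids timing leaks and type errors.
    if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
        return None
    try:
        command = json.loads(payload)
    except ValueError:
        return None
    if not isinstance(command, dict) or command.get("action") not in ALLOWED_ACTIONS:
        return None
    return command
```

Only commands that pass this check would be dispatched; everything else is dropped and, ideally, logged.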
3
Define containerized, immutable, CD
Take a quick poll:
Who has some experience with containers?
Who has used or dealt with containers in production at scale?
Immutable
CD
5
We’re talking about what we do in the context of Coinbase, so a brief intro to the environment is in order. (If you want a deeper look our infra team talks and writes about this stuff a lot.)
Codeflow
Geoengineer
Deployment concepts
Developer contract
5
Kubernetes has alpha support for whitelisting containers/sources