2. Ryan Hunter
● SRE Lead @ TrueMotion
○ Joined as the first backend engineer 3 years ago
○ Moved to operations in search of new challenges!
● I’m an automation fanatic!
● When I’m not working to make on-call a thing of the past, I enjoy:
○ Diving
○ Hiking
○ Building drones and other useless contraptions in my basement
3. Servers as... Pets, Cattle, A Herd
Infrastructure Evolves with a Company
4. Why did we Switch to Containers?
[Chart: November 2016, 3.05%]
5. Why did we Switch to Containers?
Where we started...
● Debian-based deploys
○ Great as long as all your dependencies were Debian packages too
● Ansible: build the server from scratch
○ External dependency hell
● Neither flexible nor reliable
● Minimum provisioning size was too large
Where we wanted to go...
● A more flexible build artifact
● Decouple instance size from application software
● A common, preloaded AMI could be used to run all (most) services
6. What did Docker give us?
● A flexible, portable runtime artifact
■ Describes runtime requirements
■ Memory/CPU requirements
● An ecosystem of tools to manage, version, and develop these containers
7. What Docker didn’t give us
● A really nice match for stateless services
● Stateful containers ARE possible, but they significantly complicate scheduling
8. What Docker didn’t give us
● How do you…
○ schedule
○ provision
○ discover (and monitor)
○ configure
...these containers?
14. Provision - Why Cloudformation
● Well integrated with AWS
● We can provision both Docker containers and infrastructure in one template (because we use ECS)
● Supported by AWS
● Parameter validation
15. Provision - Why Cloudformation
[Diagram: Develop → Build → Package → Deploy. Application Code and Dependencies become a Docker Container; Lambda Code becomes a Lambda Zip Package; the Cloudformation Template goes through Stamp Template to become a Versioned Cloudformation Template, which is deployed as a Deployed Cloudformation Stack.]
Each service is deployed via a Cloudformation stack
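The Stamp Template step is what pins a service's template to the exact artifacts produced by Build. The talk doesn't show how the stamping is done; one minimal way to sketch it, assuming the template is JSON and that artifact versions are exposed as template parameters (the names `ImageTag` and `LambdaS3Key` are hypothetical), is to write the versions into the parameter defaults:

```python
import copy
import json


def stamp_template(template: dict, pins: dict) -> dict:
    """Return a copy of a CloudFormation template with parameter defaults pinned.

    `pins` maps parameter names (hypothetical, e.g. "ImageTag") to the concrete
    artifact versions produced by the Build step.
    """
    stamped = copy.deepcopy(template)  # never mutate the source template
    params = stamped.get("Parameters", {})
    for name, value in pins.items():
        if name not in params:
            raise KeyError(f"template has no parameter {name!r}")
        # CloudFormation parameter defaults are strings for String parameters.
        params[name]["Default"] = str(value)
    return stamped


if __name__ == "__main__":
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ImageTag": {"Type": "String"},      # hypothetical parameter
            "LambdaS3Key": {"Type": "String"},   # hypothetical parameter
        },
        "Resources": {},
    }
    stamped = stamp_template(template, {"ImageTag": "92", "LambdaS3Key": "lambdas/etl-92.zip"})
    print(json.dumps(stamped["Parameters"], indent=2))
```

Writing defaults still lets a deploy.yaml override a pinned value; substituting the versions directly into resource properties would make the pinning strict.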
16. Provision - Why Cloudformation
stacks:
  - name: prod
    template: prod-env
    region: us-east-1
    version: prod
    parameters:
      EIPList: <redacted>
      EnvCIDR: 16
      EnvMaturity: prod
      PagerDutyKey: {{ pagerduty_key }}
      RDSPassword: {{ rds_password }}
  - name: prod-etl
    template: dw-etl
    region: us-east-1
    version: "92"
    parameters:
      DesiredInstanceCount: 6
      EnvironmentName: prod
      EnvMaturity: prod
...
● Each service pushes a template with a name and a version to S3
● That template has all the application dependencies hardcoded (Docker container version, Lambdas, etc.)
● Each environment has its own repo containing a deploy.yaml
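A deploy.yaml entry names a template and a version that the service has already pushed to S3. A minimal sketch of resolving each entry to a CloudFormation `TemplateURL`, assuming a hypothetical bucket and a `<template>/<version>.json` key layout (the talk gives neither), and taking the `stacks:` list as already-parsed data since Python's standard library has no YAML parser:

```python
from dataclasses import dataclass

# Hypothetical bucket; the talk only says templates are pushed to S3.
TEMPLATE_BUCKET = "example-cfn-templates"


@dataclass
class StackSpec:
    name: str
    template: str
    region: str
    version: str
    parameters: dict


def load_stacks(doc: dict) -> list[StackSpec]:
    """Convert the parsed `stacks:` list from a deploy.yaml into StackSpecs."""
    specs = []
    for entry in doc["stacks"]:
        specs.append(
            StackSpec(
                name=entry["name"],
                template=entry["template"],
                region=entry["region"],
                # Versions can be build numbers ("92") or channels ("prod"); keep them as strings.
                version=str(entry["version"]),
                parameters=dict(entry.get("parameters", {})),
            )
        )
    return specs


def template_url(spec: StackSpec, bucket: str = TEMPLATE_BUCKET) -> str:
    # Assumed key layout: <template name>/<version>.json
    return f"https://{bucket}.s3.amazonaws.com/{spec.template}/{spec.version}.json"
```

In practice the file would first be rendered (it contains `{{ ... }}` placeholders) and then parsed, for example with PyYAML's `yaml.safe_load`, before being handed to `load_stacks`.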
18. Discover (and monitor)
● We use Registrator to register new containers with Consul
● We run a custom version of Registrator that supports services without exposed ports
● Load balancers (internal and external) are configured via Consul to route traffic to the appropriate container
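Here the load balancers read the Consul catalog, but any client can query it too. A minimal sketch using Consul's HTTP health endpoint, where `?passing` limits results to instances whose checks pass; the agent address and the service name in the usage below are assumptions:

```python
import json
import urllib.parse
import urllib.request

CONSUL_ADDR = "http://127.0.0.1:8500"  # assumes a local Consul agent on the default HTTP port


def healthy_instances_url(service: str, consul: str = CONSUL_ADDR) -> str:
    # /v1/health/service/<name>?passing returns only instances whose checks pass
    return f"{consul}/v1/health/service/{urllib.parse.quote(service)}?passing"


def parse_instances(payload: list) -> list[tuple[str, int]]:
    """Extract (address, port) pairs from a /v1/health/service response."""
    instances = []
    for entry in payload:
        svc = entry["Service"]
        # Service.Address is empty when the service was registered without one;
        # fall back to the address of the node (the Docker host).
        address = svc.get("Address") or entry["Node"]["Address"]
        instances.append((address, svc["Port"]))
    return instances


def healthy_instances(service: str) -> list[tuple[str, int]]:
    with urllib.request.urlopen(healthy_instances_url(service), timeout=5) as resp:
        return parse_instances(json.load(resp))
```

For example, `healthy_instances("my-service")` (a hypothetical service name) would return the host/port pairs a load balancer should route to.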
19. Monitor (Is my service up?)
● Consul Docker exec health checks are very powerful
● Docker also has a new health check API!
● Configured via Registrator
[Diagram: on the Docker Host, the Consul Agent runs My Service Check, which executes health-check.py inside My Service Container]
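Consul judges a script or Docker check by its exit code: 0 is passing, 1 is warning, and anything else is critical. A minimal sketch of what a health-check.py like the one in the diagram might do, assuming the service exposes a hypothetical HTTP endpoint at `/health` on port 8080 inside its container:

```python
import sys
import urllib.error
import urllib.request

# Hypothetical endpoint served by the service inside its own container.
HEALTH_URL = "http://127.0.0.1:8080/health"

# Exit codes as interpreted by Consul script/Docker checks.
PASSING, WARNING, CRITICAL = 0, 1, 2


def status_to_exit_code(status: int) -> int:
    if 200 <= status < 300:
        return PASSING
    if status == 429:  # hypothetical choice: treat throttling as degraded, not down
        return WARNING
    return CRITICAL


def main() -> int:
    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=2) as resp:
            return status_to_exit_code(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status code is still meaningful.
        return status_to_exit_code(err.code)
    except OSError as err:
        # Connection refused, DNS failure, or timeout: the service is not answering.
        print(f"health check failed: {err}", file=sys.stderr)
        return CRITICAL


if __name__ == "__main__":
    sys.exit(main())
```

Anything printed by the check is kept by Consul as the check's output, which makes the stderr message useful when debugging a failing container.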
20. Monitor (Logging)
● Sumo Logic provides a Docker log collector
● We wrote a script that lists running containers and assigns each a source category based on its container type
● Runs as a container on each Docker host
_sourceCategory = <Environment name>/<Service Name>/<Environment Maturity>
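The talk doesn't say where the script reads each container's environment, service, and maturity from. Assuming they are exposed as Docker labels (the label keys below are hypothetical), building the `_sourceCategory` value is a small string join:

```python
# Hypothetical label keys; the talk does not say where these values come from.
ENV_LABEL = "com.example.environment"
SERVICE_LABEL = "com.example.service"
MATURITY_LABEL = "com.example.maturity"


def source_category(labels: dict[str, str]) -> str:
    """Build <Environment name>/<Service Name>/<Environment Maturity> for one container."""
    try:
        parts = [labels[ENV_LABEL], labels[SERVICE_LABEL], labels[MATURITY_LABEL]]
    except KeyError as missing:
        # Surface unlabeled containers instead of shipping logs under a bogus category.
        raise ValueError(f"container is missing label {missing}") from None
    return "/".join(parts)
```

With the Docker SDK for Python, a container's `labels` attribute is a dict that could be passed straight in.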
21. Monitor (Whitebox)
● Traffic - Requests per second, trips per second
● Errors - Rate of error status codes and error logs
● Latency - How long the service takes to do a unit of work
● Saturation - How do I know when I need to scale out?
● Consul check (is it up?)
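Traffic, errors, latency, and saturation are the "four golden signals" from Google's Site Reliability Engineering book. A minimal sketch of computing them over one window of request records, assuming 5xx responses count as errors and saturation is the busy fraction of a hypothetical worker pool (TruMonitor's actual metrics aren't described here):

```python
import math
from dataclasses import dataclass


@dataclass
class Request:  # hypothetical record of one handled request
    status: int
    latency_ms: float


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(requests: list[Request], window_s: float, busy: int, capacity: int) -> dict:
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: only 5xx are errors
    return {
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests) if requests else 0.0,
        "latency_p99_ms": percentile([r.latency_ms for r in requests], 99) if requests else 0.0,
        # Saturation as the fraction of a hypothetical worker pool currently in use.
        "saturation": busy / capacity,
    }
```

Tail latency (p99) is used rather than the mean because a few slow requests are exactly what averages hide.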
22. Monitor (Whitebox)
● Our services are very similar:
■ Web service (HTTP)
■ Data pipeline (ETL, trip processing)
● TruMonitor library
■ Shared library of common monitoring tools
■ UNVERSIONED (controversial)
24. Last Mile Configuration
● Cloudformation provides a parameter interface
■ Passed on to the container via environment variables
■ AWS infrastructure can be passed in directly
● Per-company configs
■ Consul K/V + consul-template
stacks:
  - name: prod
    template: prod-env
    region: us-east-1
    version: prod
    parameters:
      EIPList: <redacted>
      EnvCIDR: 16
      EnvMaturity: prod
      PagerDutyKey: {{ pagerduty_key }}
      RDSPassword: {{ rds_password }}
...
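On the CloudFormation side, a container definition in an `AWS::ECS::TaskDefinition` accepts an `Environment` list of `Name`/`Value` pairs, so stack parameters can be passed through with `Ref`. Inside the container the service then only has to read them. A minimal sketch, with hypothetical variable names mirroring the `EnvironmentName` and `EnvMaturity` parameters from the deploy.yaml example:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass
class ServiceConfig:
    environment_name: str
    env_maturity: str
    log_level: str


# Hypothetical variable names set by the task definition.
REQUIRED = ("ENVIRONMENT_NAME", "ENV_MATURITY")


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Build the service config from environment variables passed in by the stack."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail at startup rather than running a half-configured service.
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return ServiceConfig(
        environment_name=env["ENVIRONMENT_NAME"],
        env_maturity=env["ENV_MATURITY"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, hypothetical
    )
```

Accepting the mapping as a parameter keeps the function testable without touching the real process environment.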
25. Consul + Consul Template
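[Diagram: inside the Docker Container, Consul Template is the Entrypoint; it reads from the Consul Cluster, renders the Config File, and Execs the Application Process (arrow labels: Publish, Entrypoint, Exec)]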
● Great for configs too complex for params
● git2consul syncs configs stored in VCS to the cluster
● Parameter validation matters!
■ Wrote SOME test coverage using JSON Schema
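Validating a config before git2consul pushes it into Consul K/V catches typos before any container renders it. The talk uses JSON Schema; as a stdlib-only illustration of the idea, here is a minimal validator for a small subset of its keywords (a real project would use a full implementation), with a hypothetical per-company schema:

```python
from typing import Any

# Only a small subset of JSON Schema keywords is handled here:
# type, enum, minimum, required, properties.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # bool is a subclass of int in Python, but JSON Schema keeps them distinct.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]

    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")
    is_number = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if "minimum" in schema and is_number and instance < schema["minimum"]:
        errors.append(f"{path}: {instance} is below the minimum {schema['minimum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


if __name__ == "__main__":
    # Hypothetical schema for one per-company config value.
    schema = {
        "type": "object",
        "required": ["company_id"],
        "properties": {
            "company_id": {"type": "string"},
            "batch_size": {"type": "integer", "minimum": 1},
        },
    }
    print(validate({"batch_size": 0}, schema))
```

With the `jsonschema` package, `jsonschema.validate(instance, schema)` covers the full specification and raises on the first error instead of collecting a list.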
26. What about secrets storage?
● Initially used KMS-encrypted values decrypted with a consul-template plugin
● DO NOT write consul-template plugins that make blocking/high-latency calls
27. What we did instead
● Borrowed from the ansible-vault concept
● Encrypted “privates” file inside the environment repo
● Populate Cloudformation parameters using Jinja2
● Works well enough… but will not work for per-company config values
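The deploy.yaml placeholders such as `{{ pagerduty_key }}` are filled from the decrypted privates file before the stacks are deployed. The talk uses Jinja2 for this; as a stdlib-only stand-in that handles only plain `{{ name }}` substitution (decryption is out of scope and assumed already done), a sketch:

```python
import re

# Matches {{ name }} with optional inner whitespace; no filters or expressions.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(text: str, privates: dict[str, str]) -> str:
    """Replace {{ name }} placeholders with values from an already-decrypted privates mapping.

    Unknown names raise KeyError instead of rendering as empty strings, so a missing
    secret fails the deploy rather than shipping a blank password.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in privates:
            raise KeyError(f"no private value for {name!r}")
        return privates[name]

    return _PLACEHOLDER.sub(substitute, text)
```

With Jinja2 itself, `jinja2.Environment(undefined=jinja2.StrictUndefined)` gives the same fail-on-missing behavior. Secrets containing YAML special characters would also need quoting before the rendered file is parsed.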
28. Conclusions
● Developer training is hard: example repos work REALLY well
● Secrets management requires some forethought
● Jenkins Pipelines is very powerful…
● Spend time automating the creation and removal of ECS nodes
● Auto Scaling a Docker cluster is nuanced!
29. Want to Help? We’re Hiring!
● I’m looking for backend software engineers with a passion for automation
●Talk to me!
●… or https://gotruemotion.com/careers/
33. Today’s Pipeline
[Diagram components: Build Scripts, Debian, Pip, Gemfury, Ansible, EC2 Instance, provision.py]
● Inflexible
● Jobs managed through UI
● Restricted versioning convention
● Supports only a specific distro/version
● Pip doesn’t enforce dependencies for crap!
● Gemfury goes down!
● Instance config is in a separate repo from service code
● We can’t version configuration against services
● Lots of tight coupling between service roles
● Fails a LOT!
● Services tied to instance
● Instance type for a service defined globally
● Manual process to provision instances and other AWS resources
● AWS instance provisioning is entirely manual
● Difficult to automate
● Too easy to create and forget about instances
34. Cloudformation/Docker Pipeline
[Diagram components: Jenkins Pipelines, Docker, CF Template, CF Pipeline, ECS Cluster, EC2 Instance, Environment Config]
● Resources defined per service
● Configs validated per service
● Leverage Docker as a common runtime framework
● Build process definition lives in service repo
● Common processes can be defined via a global library
● Use Docker to provide build dependencies
● Cloudformation templates are used as the deployment artifact
● Environment updates via code review
● Tight coupling between resource requirements and resources provisioned
● Ability to use Spot Fleet/Spot Instances