As a product grows and its infrastructure becomes more complex, the Operations team traditionally shoulders the burden of maintaining that infrastructure while deploying code from Software Engineers. Code is sometimes handed to Operations with little to no information about how it should run or what the criteria for a successful deployment are. This is not due to a lack of caring; Software Engineers often lack the context themselves to provide production deployment instructions. To Software Engineers, production can be like a walled-off city, filled with pathways and rooms not to be explored, guarded by Operations.
This presentation aims to provide a solution to this problem. We will address how the traditional separation of Operations and Software Engineers slows innovation, and redefine their relationship by blending responsibilities. We will examine the transition of two real teams, an Operations team and an Engineering team, from complete isolation, to closer environments through virtual machines, to one cloud environment shared by all and managed with CloudFormation.
Tear It Down, Build It Back Up: Empowering Developers with Amazon CloudFormation
1. James Andrew Vaughn (Andy) @MindTouch
Tear It Down, Build It Back Up:
Empowering Developers with
Amazon CloudFormation
2. James Andrew Vaughn (Andy)
• Software Architect at MindTouch
• @modethirteen on Twitter & GitHub
• Interests
• Software Build and Testing Automation
• Frontend Web Performance
• Web Components & Polymer
• SSO and Identity Management
3. @modethirteen
Agenda
• What is Amazon CloudFormation? Why use it?
• Managing your release testing and production infrastructure code
• Give developers the power ('cause knowledge is power!)
6. @modethirteen
All of our customers host their brand on our
common, hosted infrastructure. One mistake
and all customer brands look bad #yousuck
7. @modethirteen
Before CloudFormation
• Infrastructure had grown organically over years
• Hand-rolled scripts with boto.py to create different EC2 instance types, and manual Puppet runs to configure them
• Non-EC2 AWS Resources managed by hand
• No infrastructure in different zones or fast, programmatic disaster recovery for entire infrastructure
• Developers were ignorant of production infrastructure
12. @modethirteen
CloudFormation: Define creation of AWS
resources (EC2 as well as Security
Groups, SQS, RDS, etc)
Puppet, Chef, SaltStack, Ansible: Define
actions that occur within EC2 instances
once they’ve been provisioned
13. @modethirteen
CloudFormation vs Terraform
CloudFormation
• Access to nearly every AWS resource. Better support for VPC, Security Groups, IAM, CloudFront, SQS
• Stable and mature
• JSON infrastructure templates can be generated by Troposphere (with Python logic)
Terraform
• Vendor neutrality: AWS, OpenStack, Heroku, etc.
• Can execute infrastructure plans as a dry run
• DSL for generating infrastructure templates (HCL)
• If one resource fails to build, a subsequent rebuild will only build the tainted resource and those dependent on it
• Open source, so AWS API coverage can be improved by the community
Google Docs: Terraform AWS Coverage
16. @modethirteen
CloudFormation Stacks
Resources are things that can be queried or configured through the AWS API (including CloudFormation sub stacks). Examples: listing S3 buckets, adding Route 53 DNS entries, taking DB snapshots
20. @modethirteen
[Diagram: a Main Stack with two App Server Pool stacks, a Database stack, and an ElasticSearch stack. Labels on the diagram:]
• MySQL Storage Engine
• ElasticSearch Version
• App Server Pool EC2 Group Name
• ElasticSearch EC2 Group Name
• RDS MySQL IP & Port
22. @modethirteen
[Diagram: the Main Stack with two App Server Pool stacks, a Database stack, and an ElasticSearch stack, annotated with "Template: {…}". Labels on the diagram:]
• MySQL Storage Engine
• ElasticSearch Version
• App Server Pool EC2 Group Name
• ElasticSearch EC2 Group Name
• RDS MySQL IP & Port
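The stack layout on these slides can be sketched as a main template that declares each sub stack as an AWS::CloudFormation::Stack resource and passes values between them. This is only an illustration, not the templates used in the talk: the template file names, the S3 location, and the parameter and output names (StorageEngine, Address, Port, GroupName) are all hypothetical. The template is built as a plain Python dictionary and serialized to JSON.

```python
import json

# Hypothetical S3 location of the child templates (not from the talk).
TEMPLATE_BASE = "https://s3.amazonaws.com/example-templates"


def nested_stack(template_name, parameters=None):
    """Return an AWS::CloudFormation::Stack resource for one sub stack."""
    properties = {"TemplateURL": f"{TEMPLATE_BASE}/{template_name}"}
    if parameters:
        properties["Parameters"] = parameters
    return {"Type": "AWS::CloudFormation::Stack", "Properties": properties}


def stack_output(logical_id, output_name):
    """Reference an output declared by a nested stack."""
    return {"Fn::GetAtt": [logical_id, f"Outputs.{output_name}"]}


main_stack = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "MySQLStorageEngine": {"Type": "String"},
        "ElasticSearchVersion": {"Type": "String"},
    },
    "Resources": {
        # Inputs flow down from the main stack's parameters...
        "DatabaseStack": nested_stack(
            "database.json", {"StorageEngine": {"Ref": "MySQLStorageEngine"}}
        ),
        "ElasticSearchStack": nested_stack(
            "elasticsearch.json", {"Version": {"Ref": "ElasticSearchVersion"}}
        ),
        # ...and outputs of one sub stack become inputs of another.
        "AppServerPoolStack": nested_stack(
            "app-server-pool.json",
            {
                "DatabaseAddress": stack_output("DatabaseStack", "Address"),
                "DatabasePort": stack_output("DatabaseStack", "Port"),
                "ElasticSearchGroupName": stack_output(
                    "ElasticSearchStack", "GroupName"
                ),
            },
        ),
    },
}

print(json.dumps(main_stack, indent=2))
```

Because the app server pool refers to outputs of the database and ElasticSearch stacks, CloudFormation creates those two stacks first.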
26. @modethirteen
Puppet / Chef / SaltStack / Ansible
• Stack includes an EC2 Instance or AutoScaling Group Resource
• Resource includes a “UserData” property for bootstrapping an instance or group of instances
• Include data that cloud-init uses to install the instance configuration tool of choice
• curl http://169.254.169.254/latest/user-data
• Example:
• cloud-init installs Puppet from UserData commands
• cloud-init runs Puppet (configures instance and installs cfn-signal)
• cfn-signal notifies CloudFormation whether the Puppet run succeeded or failed
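The bootstrapping flow above might look like the following in a template, built here as a Python dictionary. This is a minimal sketch under several assumptions that are not from the talk: a Debian-style image where apt-get is available, a Puppet manifest already present at the path shown, cfn-signal on the PATH once Puppet has run, and the logical ID AppServer. The CreationPolicy makes CloudFormation wait for the signal before marking the instance complete.

```python
import json

# Shell script that cloud-init executes on first boot. Package manager,
# manifest path, and the logical ID "AppServer" are illustrative assumptions.
USER_DATA_PARTS = [
    "#!/bin/bash\n",
    "apt-get update && apt-get install -y puppet\n",
    "puppet apply /etc/puppet/manifests/site.pp\n",
    # $? is the exit code of the puppet run; nonzero signals failure.
    "cfn-signal -e $? --stack ",
    {"Ref": "AWS::StackName"},
    " --resource AppServer --region ",
    {"Ref": "AWS::Region"},
    "\n",
]

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {"ImageId": {"Type": "AWS::EC2::Image::Id"}},
    "Resources": {
        "AppServer": {
            "Type": "AWS::EC2::Instance",
            # Wait up to 15 minutes for one signal from cfn-signal.
            "CreationPolicy": {
                "ResourceSignal": {"Count": 1, "Timeout": "PT15M"}
            },
            "Properties": {
                "ImageId": {"Ref": "ImageId"},
                "InstanceType": "t2.micro",
                # UserData must be base64 encoded; Fn::Join stitches the
                # script together with the stack name and region.
                "UserData": {
                    "Fn::Base64": {"Fn::Join": ["", USER_DATA_PARTS]}
                },
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

If Puppet fails before cfn-signal is installed, no signal is sent at all, and the resource fails when the timeout expires.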
28. @modethirteen
Lessons Learned
• Goal was to put the entire existing AWS infrastructure into CloudFormation; no immediate value was attained
• Difficult to get buy-in for incremental improvements to infrastructure management
• Existing resources cannot be migrated to CloudFormation
• Know the caveats of deleting AWS Resources; they can fail a stack teardown
• AWS Resources missing from the CloudFormation API can be mitigated with Custom Resources
• Must understand what a resource does when it updates
34. @modethirteen
The Teams
• Are developer teams responsible for their own container / infrastructure templates? Are operators part of these teams?
• Are developers just as responsible for troubleshooting when infrastructure goes down?
• What are operator obligations to developers?
• What are developer obligations to operations?
35. @modethirteen
TL;DR
• Your product is application code, data, services, and servers
• CloudFormation deploys your product to production
• CloudFormation deploys your product for development and testing
• Your developers can make better decisions
• Your operators can make better decisions
• Your customers / users are happy
devs have control over app knowledge, ops control production with free flow of ideas between teams
there may be biz blockers to free flow, not technical ones
What’s wrong with run scripts or just using puppet or ansible?
SaaS platform for hosting customer support sites for other companies and brands. Our customers feed the engine with support knowledge articles, product information, how-tos and they get this nice branded support site with all sorts of algorithms to rank content by usefulness, and guide their users to the most successful articles. so their users become more knowledgeable about their products and become smarter, and more successful with their products.
* product manager speak aside, our product is code, customer data, servers and services
our product exists to turn unhappy people into happy ones
weekly opportunity to make the company look bad
Downtime affects other brands, SLAs, lawsuits
Love Sam Ramji’s vision for Cloud Foundry, early advisor to MindTouch when it was a purely open source product, but
We committed to AWS, specializing on a platform to take advantage of all the features that AWS has, not hedging our bets with the common functionality OpenStack and AWS share
no one has ever considered AWS a bad decision
developers could not explore AWS features, devs had Virtual Machines with “production-like” AWS resources (mocks)
* audit trail is AWS billing record? wth?
code revs
stacks are like separation of concerns (modules)
stacks have parameter and resource limits
group stacks with resources that should move together, common lifecycle, common security
separate app from state
WaitCondition: we don’t have the same insights into when things are done
A is done then B (dependency management, and order enforcement)
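The "A is done then B" ordering can be expressed with DependsOn and a WaitCondition. The sketch below is an assumption-laden illustration: the resource names are hypothetical, and the small helper only walks explicit DependsOn edges to show the resulting order (CloudFormation also infers dependencies from Ref and Fn::GetAtt, which the helper ignores). It uses graphlib from the standard library, so it needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter

resources = {
    "AppServer": {
        "Type": "AWS::EC2::Instance",
        "Properties": {"ImageId": {"Ref": "ImageId"}, "InstanceType": "t2.micro"},
    },
    # The handle is a pre-signed URL that the instance calls when it is ready.
    "AppReadyHandle": {"Type": "AWS::CloudFormation::WaitConditionHandle"},
    "AppReady": {
        "Type": "AWS::CloudFormation::WaitCondition",
        "DependsOn": "AppServer",
        "Properties": {
            "Handle": {"Ref": "AppReadyHandle"},
            "Timeout": "900",  # seconds to wait for the signal
            "Count": 1,
        },
    },
    # B: not created until the wait condition (and therefore A) is done.
    "WorkQueue": {"Type": "AWS::SQS::Queue", "DependsOn": "AppReady"},
}


def explicit_creation_order(resources):
    """Order resources so each comes after everything in its DependsOn."""
    graph = {}
    for name, resource in resources.items():
        depends_on = resource.get("DependsOn", [])
        # DependsOn may be a single logical ID or a list of them.
        graph[name] = [depends_on] if isinstance(depends_on, str) else list(depends_on)
    return list(TopologicalSorter(graph).static_order())


order = explicit_creation_order(resources)
print(order)
```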
Atom: Template JSON
Atom: Troposphere
Maintained by Canonical
Available on AMI or any dist
S3 bucket will need to be secured for cloudformation
something needs to drive biz decision (disaster recovery)
CloudFormation can automate the creation of an S3 bucket, cannot automate the deletion of an S3 bucket with objects (Stack delete will fail!)
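One way to keep a non-empty bucket from failing a stack delete is the DeletionPolicy attribute: with Retain, CloudFormation leaves the bucket in place instead of trying to delete it. This is a suggested mitigation rather than something stated in the talk, and the logical ID AssetsBucket is hypothetical.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AssetsBucket": {
            "Type": "AWS::S3::Bucket",
            # On stack delete, keep the bucket (and its objects) rather than
            # attempting a delete that fails when the bucket is not empty.
            "DeletionPolicy": "Retain",
        }
    },
}

print(json.dumps(template, indent=2))
```

The trade-off is that retained buckets outlive the stack and have to be cleaned up separately.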
CloudFormation cannot handle the configuration of the CloudFront CDN OriginAccessIdentityUser (User that can talk to a private S3 bucket)
Remember another goal was to empower developers, to get them working closely with operators, using cloudformation as a mechanism to strengthen this relationship
not saying this is going to happen, but you may want to consider your team and that their creativity may lead to unintended consequences. operators need to be hands off from the stack creation process, so that continuous innovation isn’t blocked, so i have some approaches to mitigating risk
Turn off AWS Console access for developers
Turn on API access for a developer portal
Pros: all things that work in the production env should work in test/dev (same level of internal access to resources)
Cons: build, walled off
Consolidated billing
checkout app & infra code, run from cmd
Probably built CFN in God Mode
Some resources may not be accessible by dev accts, refactor
Pro and Con: Developers can mess with other developers’ stacks - practical jokes
Security when team member leaves
* the intent here is once devs and ops are working together, a number of new questions arise, which is good.
manage docker container, cfn templates for the services required by the things in that container
my feeling on the last bit is operators provide working infra templates to devs, and devs understand the repercussions of their decisions on infra: hammering a service, does their code make an ec2 process get cpu bound