Our long road to… continuous improvement (DevOps Days Boston 2014)
1. Our long road to…
continuous improvement
Kevin Amorin
2. BitSight Team
15 yrs in Enterprise & Startups
Enterprise Internal IT
IT Virtualization Software
SaaS low latency, high volume
SaaS Big Data, Analytics
team:
Issa Ashwash
Isaac Boehman
Sathya Ragavan
Pavel Sadikov
Kevin
3. starting the journey
it’s a journey not a destination
no silver bullet
“always make new mistakes”
do not re-invent wheel
continuous improvement
find and reduce the bottleneck (Lean)
10. infrastructure v1
what does this server do?
symptom: lost time on debugging/managing
products in data center
cause: organic growth, little planning
improvement: redesign infrastructure &
process for naming (host+subnet+app), access
(vpn), users, deployment
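
A naming process like this is easiest to keep consistent when names are generated and parsed by code rather than typed by hand. The sketch below is one possible convention, not the one used in the talk: the app-subnet-index ordering, the separator, and the function names are all assumptions.

```python
import re


def make_hostname(app: str, subnet: str, index: int) -> str:
    """Compose a hostname from app role, subnet label, and sequence number.

    Assumed convention (hypothetical): <app>-<subnet>-<two-digit index>.
    """
    for part in (app, subnet):
        # Restrict components so the name can be split back unambiguously.
        if not re.fullmatch(r"[a-z0-9]+", part):
            raise ValueError(f"invalid name component: {part!r}")
    return f"{app}-{subnet}-{index:02d}"


def parse_hostname(hostname: str) -> dict:
    """Answer "what does this server do?" from the name alone."""
    app, subnet, index = hostname.split("-")
    return {"app": app, "subnet": subnet, "index": int(index)}
```

With this convention, make_hostname("web", "dmz", 3) yields a name that parse_hostname can turn back into its app, subnet, and index.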
11. build v1
build is broken!
symptom: lost time on broken build debugging
cause: lack of uniform build environment,
committed code with minimal testing and review
improvement: centralized build & CI, branching,
pull requests, additional unit tests
12. deployment v1
that system doesn’t have my fix.
symptom: lost time debugging on wrong code
cause: lack of revisioned artifacts
improvement: central artifact repo, each
package tagged with branch, time, commit
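
One way to make every package traceable to its branch, build time, and commit is to encode all three in the artifact name. This is a minimal sketch; the file-name layout, the .tar.gz suffix, and the function name are assumptions, not the format used by the team.

```python
from datetime import datetime, timezone


def artifact_name(package: str, branch: str, commit: str, built_at: datetime) -> str:
    """Build an artifact file name carrying branch, UTC build time, and commit.

    Hypothetical layout: <package>-<branch>-<YYYYmmddHHMMSS>-<short sha>.tar.gz
    """
    # Slashes in branch names (e.g. feature/x) are not safe in file names.
    safe_branch = branch.replace("/", "-")
    stamp = built_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{package}-{safe_branch}-{stamp}-{commit[:8]}.tar.gz"
```

Given such a name on a deployed system, it is possible to tell at a glance whether a fix (a specific commit on a specific branch) is present.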
13. communication v1
what new feature?
symptom: lost time with misinformation
cause: siloing of information
improvement: chat, single issue tracking &
process, representatives in standup &
retrospective
14. infrastructure v2
that system is missing pip module.
symptom: lost time debugging misconfigured
application & systems
cause: lack of consistency of systems &
applications
improvement: config management,
monitoring & alerting
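
A config management tool would install and enforce the required modules; the sketch below only illustrates the kind of consistency check that monitoring could run on each system. The function name and the idea of a per-application required list are assumptions.

```python
import importlib.util


def missing_modules(required):
    """Return the required top-level module names that cannot be imported here.

    Only top-level names are supported in this sketch; dotted names whose
    parent package is absent would raise instead of returning None.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]
```

An alert can then fire whenever the returned list is non-empty, instead of the gap being discovered during debugging.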
15. build v2
ran in my local system…
symptom: code freeze branch would not run
end to end. lost time debugging which change
caused issue
cause: lack of regression/functional test
improvement: functional tests, required to run
before merge
16. deployment v2
db is not correct?
symptom: database schema/data did not match
code
cause: inconsistent process & manual steps on
schema/data updates
improvement: db schema management tool +
process
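
The core idea behind a schema management tool is an ordered list of versioned changes plus a record, in the database itself, of which ones have been applied. The sketch below shows that idea with SQLite; the table names, the example migrations, and the migrate function are all hypothetical and do not represent the tool the team adopted.

```python
import sqlite3

# Ordered, versioned schema changes (illustrative examples only).
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN email TEXT"),
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply any migrations newer than the recorded version; return the new version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() is NULL on an empty table
    for version, statement in MIGRATIONS:
        if version > current:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
            current = version
    conn.commit()
    return current
```

Running migrate as a deployment step, rather than applying changes by hand, keeps the schema in step with the code; running it twice is harmless because already-applied versions are skipped.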
17. deployment v3
new server didn’t come up right?
symptom: lost time with misconfigured or failed
provisioned nodes in AWS
cause: inconsistent semi-automated provisioning
steps did not have the flexibility needed for a
growing product line
improvement: knife-bs provisioning & deployment
http://github.com/CBitLabs/knife-bs
18. provisioning research & design
Netflix Asgard: Web interface for application
deployments and cloud management in Amazon
Web Services (AWS)
Infochimps Ironfan: Chef orchestration layer --
your system diagram come to life
Chef Knife-ec2: plugin gives knife the ability to
create, bootstrap, and manage EC2 instances.
19. knife-bs
cloud provisioning tool built on top of
opscode/knife-ec2. Using a description of
your infrastructure and stacks (in either
YAML or JSON), knife-bs will build the
correct stack in the correct environment and
bootstrap chef.
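
To illustrate what "a description of your infrastructure and stacks" can drive, the sketch below expands a small JSON description into a list of nodes to provision. The schema, field names, and naming scheme here are invented for illustration and are not knife-bs's actual format; see the knife-bs repository for the real one.

```python
import json

# Hypothetical stack description; NOT the knife-bs schema.
STACK_JSON = """
{
  "environment": "staging",
  "stacks": {
    "web": {"count": 2, "instance_type": "m3.medium", "run_list": ["role[web]"]},
    "db":  {"count": 1, "instance_type": "m3.large",  "run_list": ["role[db]"]}
  }
}
"""


def plan_nodes(description: dict) -> list:
    """Expand a stack description into one entry per node to provision."""
    env = description["environment"]
    nodes = []
    for stack_name in sorted(description["stacks"]):
        stack = description["stacks"][stack_name]
        for i in range(1, stack["count"] + 1):
            nodes.append({
                "name": f"{env}-{stack_name}-{i:02d}",
                "instance_type": stack["instance_type"],
                "run_list": list(stack["run_list"]),
            })
    return nodes


plan = plan_nodes(json.loads(STACK_JSON))
```

Keeping the plan step separate from the provisioning step means the same description can be reviewed, diffed, and reused across environments before any instance is created.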
22. infrastructure v3
can I grab a cluster?
symptom: who is using what, what state is it in?
cause: lack of visibility of ownership & state of
application
improvement: infrastructure web UI which
overlays org meta-data
http://github.com/CBitLabs/atlas
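
The overlay idea can be sketched as a join between an instance inventory and an organizational record of who owns each cluster. The data shapes, field names, and functions below are assumptions for illustration and are not taken from atlas.

```python
def overlay_metadata(instances, ownership):
    """Attach owner and purpose to each instance, keyed by cluster name."""
    view = []
    for inst in instances:
        meta = ownership.get(inst["cluster"], {})
        view.append({
            **inst,
            "owner": meta.get("owner", "unclaimed"),
            "purpose": meta.get("purpose", ""),
        })
    return view


def available_clusters(view):
    """Clusters with no recorded owner, i.e. candidates for "can I grab a cluster?"."""
    return sorted({row["cluster"] for row in view if row["owner"] == "unclaimed"})
```

For example, an inventory with clusters "alpha" and "beta" and an ownership record only for "alpha" would report "beta" as available.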