As IT goes through a tectonic shift from cost center to profit center, the importance of IT operations is also being redefined. If you are primarily worried about uptime and server-to-admin ratios, you are already behind the times. Today's top-performing operations professionals are obsessed with how they can enable the business to innovate faster, react faster to market and competitive demands, and scale up for the win. With a healthy dose of DevOps, Lean thinking, infrastructure as code, and open source tools like RunDeck, Puppet, and Chef, these top performers are turning their operations from a necessary cost center into a strategic weapon.
Speaker Bio
John Willis has worked in the IT management industry for more than 30 years. Prior to joining enStratus, Willis was the VP of Solutions for DTO Solutions, where he led the transition to a new suite of automated infrastructure and DevOps solutions. Prior to DTO Solutions, Willis was the VP of Training & Services at Opscode, where he formalized the training, evangelism, and professional services functions at the firm. Willis also founded Gulf Breeze Software, an award-winning IBM business partner that specializes in deploying Tivoli technology for the enterprise. Willis has authored six IBM Redbooks on enterprise systems management and was the founder and chief architect at Chain Bridge Systems.
--Tweet: best description to date... --DevOps, IMO, is a convergence of a few different paths --Here is my view of how we arrived here in this room...
--Andrew and Patrick did an Agile Infrastructure BOF --No one showed up... --2009 First Devops Days in Ghent --Google Devops
--Can you throw it out the window test....
--There are a lot of objects/components in Chef --Therein lies the problem --I like to focus on these primary three to get started
--Explain Chef Solo first. Installing the client on any machine installs Solo --Then explain Chef Server --Hosted Chef --Private Chef
"Lean" is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful, and thus a target for elimination. Lean manufacturing is a management philosophy derived mostly from the Toyota Production System (TPS). Examples of Lean "tools" are Value Stream Mapping, Five S, Kanban (pull systems), and poka-yoke (error-proofing). Cycle time: the time it takes to go through the whole process. If this can be shortened, the process can run much more smoothly. Quality: they have zero tolerance for defects, understanding that any defect that occurs during manufacturing is easier and less costly to fix as early as possible in the process. Synchronization: make the different processes work together in lockstep and at the speed requested by the customer (pull, not push).
--Customer Development methodology --Customer discovery --Eric was a founder and CTO at IMVU --50 to 100 deploys a day
-How many people are familiar with the metaphor used in software development called “TD” (technical debt)? -In software development you typically have two choices: get it done quickly and take the hit of future issues, or spend the time on a cleaner design that will take longer to put in place. -The quick choice typically leaves you with technical debt, which is similar to financial debt: it incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. -Most of you are probably familiar with “Software” TD -I am going to tell you a story about “Infrastructure TD”
-How many people know who these guys are? -The dudes who invented Facebook... right? -However, this was not their first venture. -The green vs. red widgets company story. -After many arguments they both decide to split up. Cameron decides to make green widgets and Tyler decides to make red widgets. -They both go to their father and each asks for 1M dollars, and the father asks how much money they are going to make. They both say 10 million. -Father calculates the ROR to be 900% and gives them each 1M: (10 - 1) / 1 = 900%. -However, one of them lied... The red widgets only return 233%: (10 - 1 - 2) / 3 = 7/3 = 233%.
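The father's arithmetic can be checked with a few lines of Python. This is a minimal sketch, not part of the original talk: the function name `rate_of_return` and the `hidden_cost` parameter are hypothetical, and reading the extra 2M in the red-widget calculation as an undisclosed cost is an assumption drawn from the 7/3 figure in the notes.

```python
def rate_of_return(revenue: float, investment: float, hidden_cost: float = 0.0) -> float:
    """Return profit as a percentage of the total money spent.

    hidden_cost is a hypothetical parameter modelling spend that was
    left out of the original pitch (an assumption, not from the talk).
    """
    total_cost = investment + hidden_cost
    return (revenue - total_cost) / total_cost * 100


# Figures in millions of dollars, taken from the widget story.
green = rate_of_return(revenue=10, investment=1)                 # (10 - 1) / 1
red = rate_of_return(revenue=10, investment=1, hidden_cost=2)    # (10 - 1 - 2) / 3

print(f"green widgets: {green:.0f}%")
print(f"red widgets:   {red:.0f}%")
```

The same revenue produces a very different return once the hidden cost is counted on both sides of the ratio: it shrinks the profit and grows the amount invested.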
It gets even worse. Israel Gat of the Cutter Consortium calls this the vicious cycle of TD. You wind up fixing a lot of things that you didn’t fix in the first place. This pulls more resources away from delivering good service, and the compound effect is that you are spending more and more resources on things that you should have gotten right the first time. Even worse, the effect on customer satisfaction starts losing you more business, and the “V” cycle is out of control. TD -> VTD -> Toxic operations -> Terminal. I call this: are you running a business or building a business? Toxic operations. Ameritrade/E*Trade story.
Jesse Robbins, my ex-boss and CEO of Opscode/Chef, did a great post on O’Reilly Radar a few years ago called the Tale of Two Startups. The chart looked like this for the first 4 weeks. First chart: legacy (I call it the non-DevOps startup/project). Second chart: the secret sauce startup (... #devops). I played around with this using “R” to be cool and I came up with a 140% ROR and a 700% ROR.
Alistair Croll has a great post called the meat-to-math ratio. Amazon: $12.95B in Q4 2010 revenues and 33,700 employees, meaning a revenue per employee of $384,273. Barnes & Noble: $1.91B in Q4 2010 revenues and 35,000 employees, meaning a revenue per employee of $54,571. Netflix: $444M in Q4 2009 revenues and 1,000 employees, meaning a revenue per employee of $444,000. Blockbuster: $400M in Q4 2009 revenues. The company peaked at 60,000 employees. Dropbox: in Q2 2011 Dropbox had $25M in revenues and 74 employees, for a revenue per employee of $338K.
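The revenue-per-employee figures are simple divisions, and a short script makes them easy to re-check. This is a rough sketch using only the numbers quoted in the notes; the names `QUOTED` and `revenue_per_employee` are made up for illustration. Blockbuster is left out because the notes give only a peak headcount, not a headcount for the quarter.

```python
# (quarterly revenue in dollars, employees) as quoted in the speaker notes.
QUOTED = {
    "Amazon": (12_950_000_000, 33_700),
    "Barnes & Noble": (1_910_000_000, 35_000),
    "Netflix": (444_000_000, 1_000),
    "Dropbox": (25_000_000, 74),
}


def revenue_per_employee(revenue: float, employees: int) -> float:
    """Quarterly revenue divided by headcount."""
    return revenue / employees


ratios = {name: revenue_per_employee(*figures) for name, figures in QUOTED.items()}

# Print the companies from highest to lowest revenue per employee.
for name, value in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} ${value:,.0f}")
```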
I got into a Twitter argument that went like this... See “Cloud Gone Wrong”
-So let’s start the story at the beginning... A biz guy got a great idea for a service. -They used a classic Web 2.0 app architecture: Apache/PHP, memcached & MySQL. Development was done on a single server -Production ran on EC2 and they used RightScale server templates -Releases were done by pushing code and assets to S3 buckets and then running parallel SSH scripts to distribute them -This approach seemed to work... They got up to a few hundred nodes pretty fast... business was cooking
-The first service was such a huge success that they decided to launch other services -So they “copied and pasted” the whole architecture and lifecycle to launch the new businesses -Each new group pushed assets to S3, scripted the distribution, and hacked the RightScripts and templates -Things were obviously getting more complicated, so they did what they were supposed to do and added centralized tooling like Puppet and yum -They thought they were doing things the cloud way and that all would be fine
First we got the team on the whiteboard to map out the “as is” picture. This is, believe it or not, a simplified version of that. Some of the highlights... -First you’ll notice that different groups had their own path to production... different methods of control, provisioning, and release. -Each group and role seemed to have a different way of editing or storing config. -There were differing ways of packaging software... sometimes it might be a .tar.gz, other times it might be an RPM. -Shockingly... some things were even being built directly on production servers. -No authoritative source of information was maintained about nodes, application topology, software versions, etc. -Everything was being stored in S3 buckets... which is great because it’s so easy to use... but it’s unversioned and people would just upload whatever was newest for them. That stuff was never really tested in unison... so old stuff wasn’t working with new stuff... and it was unclear what was different or why it was different. -We could go on... but you should get the point that they had the right cloud, the right tools, and lots of smart people, but it all got dangerously out of control very quickly. Other: -Changes hit all customers at once => Puppet configs in unversioned S3 buckets -Buggy node classification causing provisioning problems => complex/long node classifier script -“Dead boxes” after provisioning => RightScript/Puppet ordering problems -“My box got clobbered!” => Puppet, is it supposed to be on or off? -New environment setup was taking longer and longer => from days to weeks because of being “fooled by false horizons” -“Is the system ready yet?” => nobody knows what “ready” means -Scripts contained role-to-node lists to put things in the right places -Scripts crap out on nodes taken out of commission -“Software works differently” => RightScale-driven compile/installs
I won’t go into too much detail about the tooling that was put into place to support all of this, but here are some highlights... -They took a loosely coupled toolchain approach... using mostly open source tools. -This became their standard stack of “operations middleware”. Of course, we are all used to the notion of application middleware... but to an online service, the management infrastructure is just as much a part of the service as the application itself. -This operations middleware stack is a first-class citizen along with the application stack, and it all goes through the same SDLC... everything is versioned, built, deployed, and packaged via the same process. -Once in place, this middleware provides a single path for releasing, provisioning, and controlling anything that goes into an environment. Other: Management infrastructure based on “swappable” sets of integrated tools -Organized into three rough categories: Control, Provisioning, Monitoring -Control tools support routine and ad hoc procedures executed as commands/scripts -Provisioning tools support package delivery and post-install customization -Monitoring tools actively check health and collect log data. When you think of middleware, you think of where your app code works... but in the service world, you have an operations middleware that is just as important. All the provisioning and management stuff... it’s just as important. It’s one and the same. Solve the information problem... where is the system of record? It’s easy in the cloud to get basic node data... that comes from the compute service... but what about everything else you need to manage your infrastructure? Key integrations: -SVN drives everything in the toolchain! -Rundeck synchronizes to RS; it must be connected to the compute service to know what nodes are provisioned and ready -All packages come through yum. Infrastructure SDLC
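To make the "system of record" point concrete, here is a small sketch of resolving target nodes from one inventory instead of hard-coded role-to-node lists. It is purely illustrative and not the toolchain described in the talk: the `Node` class, the `state` values, the `nodes_for_role` helper, and the sample inventory are all hypothetical. In a real setup the inventory would be fed by the compute service rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    role: str
    state: str  # hypothetical lifecycle states: "ready", "provisioning", "decommissioned"


def nodes_for_role(inventory: list[Node], role: str) -> list[str]:
    """Return names of nodes with the given role that are ready for commands."""
    return [node.name for node in inventory if node.role == role and node.state == "ready"]


# Hand-written sample data standing in for what a compute service would report.
inventory = [
    Node("web01", "web", "ready"),
    Node("web02", "web", "decommissioned"),
    Node("web03", "web", "ready"),
    Node("db01", "db", "provisioning"),
]

targets = nodes_for_role(inventory, "web")
print(targets)
```

Because the target list is computed at run time, a node taken out of commission simply drops out of the result instead of breaking a script that still lists it.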