Changing wheels of a moving car
Replacing core technologies in a growing startup
Michael Neale 
CloudBees 
#DV14 @michaelneale
This talk 
• lucky early decisions 
• transitions and containers 
• lessons learned on changing continuously
• finally: monitoring, alerting, health - ops for devs (rarely talked about)
ABOUT ME 
• Co-founder CloudBees (the Jenkins company) 
• Developer with an interest in Ops 
• Built DEV@cloud and RUN@cloud
Working with Cloud Platforms 
• not as “friendly” as traditional hosting: 
• Awesome power at fingertips: try everything, try all hardware 
• Iterate rapidly 
• But: 
• APIs have lower QoS than hosts 
• Servers are cattle, not pets 
• Jenkins (and others) still need filesystems (not always easy on cloud) 
• multi tenancy for scale/cost 
Lucky decisions we made 
• Isolate EC2 APIs behind a fault-tolerant REST app for provisioning
• API can behave strangely: back off and retry, respect API limits, and more
• Build a pathological API simulator
• Enable replacement of servers via termination
• “chaos monkey” approach
• Reality: I didn’t understand Chef, so we replace an AMI by terminating; the latest image takes its place
• Done as a “hack” but a core platform value today
• i.e. we are always changing, always replacing “naturally”
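The "backoff and retry" point above could look roughly like this in Java. It is only a sketch: `Retry`, `withBackoff` and its parameters are hypothetical names, not the provisioning app described in the talk, and a production version would probably add jitter and retry only errors known to be transient.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a flaky API call with exponential backoff. */
final class Retry {
    private Retry() {}

    static <T> T withBackoff(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                Thread.sleep(delay); // wait before the next attempt
                delay *= 2;          // double the wait each time
            }
        }
        throw last;
    }
}
```

A caller would wrap each cloud API request, for example `Retry.withBackoff(() -> client.describeServer(id), 5, 200)`, where `client.describeServer` stands in for whatever call is being protected.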
NetflixOSS productised this! 
https://github.com/Netflix/Hystrix 
https://github.com/Netflix/SimianArmy 
http://netflix.github.io 
netflixoss.ci.cloudbees.com 
Chaos monkeying to upgrade 
• OS change: new AMI == terminate, let the system replace it
• (in EC2, autoscaling groups can do this for you)
• Security patch? == terminate.
• Server a bit sick? TERMINATE
• (we actually use Chef for minor config changes and some app-level upgrades… relax…)
• If in doubt… you get the idea…
A bad year for security 
• Heartbleed 
• Shellshock
• POODLE
• Xen guest flaw, AWS reboot-a-thon
But a great year for logos: 
Xen 
Upgrades… 
• In place or… TERMINATE? 
• Often easier and safer to swap out: 
• e.g. reverse proxy (nginx) cluster replacement process:
• warm new server, cut over IP and traffic, terminate old 
• No half-measures, half-upgrades, clean slate 
• (elastic IP helped in this case) 
More benefits of terminate … 
• “Retirement notices” from AWS - daily event! 
• Even “new” servers - 3 days until “retire” 
• No, you can’t visit the server in its retirement home.
• Reboot at some vague time - TERMINATE 
• Encourages immutable servers 
• predictable state 
• security advantages of being “locked down” in image 
But what about data… 
• Some say filesystem dependency is “legacy” 
• I say “you aren’t trying hard enough” 
• APIs such as EBS allow quick volume creation based on snapshots:
• Continuous (delta) snapshotting of data 
• Can quickly restore service in healthy data centers 
• Faster time to recovery, route around failing zones 
• Ideal: use distributed data in all forms if you can! 
Containment challenge: apps, Jenkins masters, build executors
Containment 
• Apps (paas) can do anything 
• Builds DO do anything 
• Need a clean slate for users 
• Process cleanup 
• Jenkins masters have plugins 
• Multi tenancy: cost effective, higher density, better elasticity (fine grained processes vs autoscale groups)
Containment Evolution 
• Unix user isolation + cgroups 
• LXC (builds on cgroups, namespaces) 
• Docker (builds on cgroups, namespaces, NOT lxc) 
• Natural current end point and so hot right now: 
Containment challenge: 
http://developer-blog.cloudbees.com/2013/05/inside-linux-containers-lxc-with.html 
Security benefits of containers? 
• Not complete 
• Not a replacement for current measures, but help 
• Lots of (changing) content online 
• Next: Linux user namespaces for a “fake root user”
• “coming real soon now??” already in LXC, not in Docker at this time.
Transition of a build service 
• Initial: discrete build nodes, “recycled” between use 
• Pools with “mark and sweep” garbage collection of unused build servers
• Unix user and cgroup/namespace isolation
• Attach build data from snapshots 
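One way to read “mark and sweep” for a server pool is sketched below: mark every server a running build still references, then sweep whatever was left unmarked. `BuildServerPool` and its methods are hypothetical names for illustration, not the CloudBees implementation, and actually terminating the swept servers is left to the caller.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pool that garbage-collects build servers nobody is using. */
final class BuildServerPool {
    private final Set<String> servers = new HashSet<>();
    private final Set<String> marked = new HashSet<>();

    void add(String serverId) {
        servers.add(serverId);
    }

    /** Mark phase: call once for each server a running build still uses. */
    void mark(String serverId) {
        if (servers.contains(serverId)) {
            marked.add(serverId);
        }
    }

    /** Sweep phase: drop every unmarked server and return it for termination. */
    List<String> sweep() {
        List<String> unused = new ArrayList<>();
        for (String server : servers) {
            if (!marked.contains(server)) {
                unused.add(server);
            }
        }
        servers.removeAll(unused);
        marked.clear(); // marks only live for one collection cycle
        return unused;
    }

    int size() {
        return servers.size();
    }
}
```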
Transition of a build service 
• Next: use LXC for containment isolation 
• Finally: Use multi-tenant pools with full container isolation 
• Pool disks for IO and EBS resilience (ZFS) 
• Use larger, more economical servers (more burst power)
• Consistent hashing to get a server with a warm “build cache”
• (sorry if your Maven re-downloads the world, hopefully not all the time)
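The consistent hashing idea can be sketched as a hash ring: the same job key keeps landing on the same server, so its build cache stays warm, and adding or removing a server only moves the jobs that mapped to it. Everything here (`BuildServerRing`, the virtual node count, MD5 as the hash) is an assumption for illustration; the talk does not say how the real service implements it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring mapping build jobs to build servers. */
final class BuildServerRing {
    private static final int VIRTUAL_NODES = 100; // points per server, smooths the spread
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(server + "#" + i), server);
        }
    }

    /** Walks clockwise from the job's hash to the next server point. */
    String serverFor(String jobKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no build servers in the pool");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(jobKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** First 8 bytes of the MD5 digest, packed into a long. */
    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```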
Transition of a build service 
• Done continually over a year 
• Limited user opt-in/out, majority do not notice 
• Strategy options: 
• roll out to 10%, 50% 
• roll out to tiered users (i.e. freemium users get new/unstable?)
• roll out to all - incremental uptake due to natural restarting/reprovisioning
• ALWAYS dog food 
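A percentage rollout such as “10%, then 50%” is often done by hashing each account into a stable bucket, so an account that got the change at 10% still has it at 50%. The sketch below assumes that approach; `Rollout`, `bucket` and `enabledFor` are hypothetical names, and the talk does not describe the actual mechanism used.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical percentage rollout based on a stable per-account bucket. */
final class Rollout {
    private Rollout() {}

    /** Maps an account id to a bucket from 0 to 99 that never changes. */
    static int bucket(String accountId) {
        CRC32 crc = new CRC32();
        crc.update(accountId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** True if this account is inside the first `percent` buckets. */
    static boolean enabledFor(String accountId, int percent) {
        return bucket(accountId) < percent;
    }
}
```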
Dog food 
• Always roll out to self first 
• (occasionally joyously discover bootstrapping problem if it goes bad!) 
• True indicator of confidence 
• We get used to change, from the users’ point of view
• How we apply Jenkins with CD:
[Diagram labels: upstream change; Chef recipe, master branch; test env; Chef recipe, production branch; prod env; rollout strategy; terminate at any time]
Wide feedback 
• Provide something the community wants to try:
• https://registry.hub.docker.com/_/jenkins/ 
• Helps them, helps us learn 
Lessons on continual change 
• Cost of change == F(gap between deployments) 
• CD etc etc (you will hear a lot elsewhere) 
• Keep MTTR (mean time to recovery) low 
• If short enough, people will blame internet connection (ssshhhh) 
Lessons on continual change 
• Always be doing DR 
• People ask about “DR” strategy 
• If you DR often, then it isn’t really DR - just BAU*, TMA*? 
• Normal service restoration and termination exercises “backups” 
Changes in a SaaS 
• If people use a SaaS, upgrades/change expected 
• Communicate to users on changes, let them know how much work you do for them! It isn’t easy!
• Some changes visible, some not (some you thought invisible, but were visible) - let people know.
• Even outages can create good will: 
• Explanations and understanding == appreciation, it happens 
• Proactive security patching this year 
• “we don’t want to run this ourselves” 
Monitoring and alerting 
• Not often talked about in classic dev circles 
• Increasingly passionate in “devops” circles (monitorama) 
• Alerting a staple of traditional ops and “on call” 
• These roles now smearing out amongst all devs 
Why monitoring? 
• SaaS always changing 
• The Question: 
• Are things better or worse than before? 
• Did the change make things better or worse?
• Not so much:
• Is everything perfect? (it won’t be)
Monitoring and alerting 
• Roughly split: 
• “check engines” (Nagios, Pingdom etc)
• receive events, work out whether a service is up or down
• “notifications” - PagerDuty and email, SMS
• tell people about things
• analytics and monitoring (Librato, Boundary, New Relic and more)
• DASHBOARDS AND GRAPHS EVERYWHERE 
[Figure panels: Analytics, Checks]
All exist to inform you 
• Graphic dashboards can overwhelm 
• Some people treat them as end goal 
• Too much information often - are things OK Y/N? 
• Aim is to get insight (e.g. New Relic works like an online profiler) WHEN problems are happening
• Aim is to tell people when problems are happening
• Reports/graphs can be useful, but not at the expense of “health” monitoring
If you must graph, a most important feature: 
Deploy happened here! 
Alert and information fatigue 
• A real (world) problem: 
• http://fractio.nl/2014/08/26/cardiac-alarms-and-ops/ 
• Eg: cardiac monitors: 
• Thresholds adjusted until only life critical 
• No “ACK” of noisy alerts (no “WARNING”) 
• Increased urgency, but reduced volume 
• reduced noise, reduced fatigue and fatalities! (counterintuitive?) 
Alert and information fatigue 
• Avoid “warnings” that interrupt people 
• (remember each interruption is > 1 hour really) 
• Push messages to chat rooms (“chat ops”)
• Allow people already distracted to act 
• Alerts/info as “streams” people can dip into and help out 
• Avoid escalation 
• Follow the sun support! (if your team has it! Great!) 
End to End test monitor 
• Why save testing for dev time only?
• Apply a kind of integration test to production
• Can be a “synthetic transaction”
• e.g. sign up, run some process, exit
• Run it continually 
• Increases confidence 
• “Out Of Band End To End Test” “oobetet” 
• technically monitoring, not testing! 
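A continually running synthetic transaction can be sketched with a scheduled task that records whether the last run passed. `EndToEndMonitor` is a hypothetical name, and the transaction itself (sign up, run some process, exit) is whatever `Callable` the caller supplies; this is not the monitor described in the talk, only an illustration of the shape.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical out-of-band monitor that keeps re-running one end-to-end check. */
final class EndToEndMonitor implements AutoCloseable {
    enum Status { UNKNOWN, HEALTHY, FAILING }

    private final Callable<Boolean> transaction;
    private final AtomicReference<Status> status = new AtomicReference<>(Status.UNKNOWN);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    EndToEndMonitor(Callable<Boolean> transaction) {
        this.transaction = transaction;
    }

    /** Runs the transaction once; any exception counts as a failure. */
    void runOnce() {
        try {
            status.set(transaction.call() ? Status.HEALTHY : Status.FAILING);
        } catch (Exception e) {
            status.set(Status.FAILING);
        }
    }

    /** Runs the transaction immediately, then again after each delay. */
    void start(long delay, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(this::runOnce, 0, delay, unit);
    }

    Status status() {
        return status.get();
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

The recorded status is what an alerting or health check would read, so a failing transaction shows up without anyone watching a dashboard.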
Codahale metrics 
• https://dropwizard.github.io/metrics/3.1.0/ 
• Simple metrics to your app: 
• Binary health checks “foo.widget.thing is OK” 
• Numerical metrics: 
• Gauges, meters, histograms and more 
• Lots of statistical goodness baked in (so you don’t have to) 
• Expose via a /health URL and JSON, push to metrics services and more (can use a servlet)
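To show the shape of a binary health check behind a /health URL, here is a dependency-free sketch. It deliberately does not use the Metrics library, which ships its own health check classes and servlet; `HealthReport` and its methods are hypothetical, and the JSON rendering assumes check names contain no characters that need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;
import java.util.stream.Collectors;

/** Hypothetical registry of named yes/no health checks. */
final class HealthReport {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    void register(String name, BooleanSupplier check) {
        checks.put(name, check);
    }

    /** Runs every check; a check that throws is reported as unhealthy. */
    Map<String, Boolean> run() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        checks.forEach((name, check) -> {
            boolean ok;
            try {
                ok = check.getAsBoolean();
            } catch (RuntimeException e) {
                ok = false;
            }
            results.put(name, ok);
        });
        return results;
    }

    /** 200 when every check passed, 500 as soon as one failed. */
    static int httpStatus(Map<String, Boolean> results) {
        return results.containsValue(false) ? 500 : 200;
    }

    /** Renders the results as a small JSON object for the response body. */
    static String toJson(Map<String, Boolean> results) {
        return results.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":{\"healthy\":" + e.getValue() + "}")
                .collect(Collectors.joining(",", "{", "}"));
    }
}
```

A servlet or any HTTP handler mapped to /health would call `run()`, set the status from `httpStatus`, and write `toJson` as the body.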
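Gauge measurement:
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

// metrics is a MetricRegistry; queue is the collection being measured.
// The gauge reports the current queue size each time it is read.
metrics.register(MetricRegistry.name("important thing", "size"),
    new Gauge<Integer>() {
        @Override
        public Integer getValue() {
            return queue.size();
        }
    });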
Trace percentiles of time spent in..
// A Timer records how long each call takes and how often it is called.
private final Timer responses = metrics.timer("important thing");

public String handleRequest(Request request, Response response) {
    final Timer.Context context = responses.time();
    try {
        // do some work;
        return "OK";
    } finally {
        context.stop(); // records the elapsed time even if the work throws
    }
}
Minimal points to take away 
• Give codahale/dropwizard stuff a good look! 
• Instrument at least a /health check that can be wired in later 
• *think* about monitoring 
• Replace/restore as a matter of “routine”
• Change becomes the norm
• Terminate and restart are often an OK way to recover!
Thank you! 
Questions? 
@michaelneale 
developer-blog.cloudbees.com 

DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Kürzlich hochgeladen (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Devoxx 2014 michael_neale

  • 1. Changing wheels of a moving car Replacing core technologies in a growing startup Michael Neale CloudBees #DV14 @michaelneale
  • 2. This talk • lucky early decisions • transitions and containers • lessons learned on changing continuously • finally: monitoring, alerting, health - ops for devs. (rarely talked about)
  • 3. ABOUT ME • Co-founder of CloudBees (the Jenkins company) • Developer with an interest in Ops • built DEV@cloud and RUN@cloud
  • 4. Working with Cloud Platforms • not as “friendly” as traditional hosting: • Awesome power at fingertips: try everything, try all hardware • Iterate rapidly • But: • APIs have lower QoS than hosts • Servers are cattle, not pets • Jenkins (and others) still need filesystems (not always easy on cloud) • multi-tenancy for scale/cost
  • 5. Lucky decisions we made • Isolate EC2 APIs with a fault-tolerant REST app for provisioning • API can behave strangely - backoff and retry, API limits and more • Build pathological API simulator • Enable replacement of servers via termination • “chaos monkey” approach • Reality: I didn’t understand Chef. So replace the AMI by terminating; the new latest takes its place • Done as a “hack” but core platform value today • i.e. we are always changing, always replacing “naturally”
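
The "backoff and retry" point on slide 5 can be sketched roughly as below. This is a minimal illustration in plain Java, not the CloudBees provisioning app: the class name `RetryWithBackoff`, the attempt limits and the simulated throttling failure are all assumptions, and a production version would probably add random jitter and only retry errors known to be transient.

```java
import java.util.concurrent.Callable;

public class RetryWithBackoff {

    /**
     * Calls op, retrying on any exception. The wait between attempts starts at
     * initialDelayMs and doubles each time, capped at maxDelayMs.
     * (Hypothetical helper; names and policy are illustrative only.)
     */
    public static <T> T call(Callable<T> op, int maxAttempts,
                             long initialDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break;                       // out of attempts, give up
                }
                Thread.sleep(delay);             // back off before the next try
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for a flaky cloud API call: fails twice, then succeeds.
        String result = call(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated API throttling");
            }
            return "provisioned";
        }, 5, 10, 100);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Wrapping every cloud API call in one helper like this is also what makes a "pathological API simulator" practical: the simulator only needs to throw from the wrapped call.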
  • 6. NetflixOSS productised this! https://github.com/Netflix/Hystrix https://github.com/Netflix/SimianArmy http://netflix.github.io netflixoss.ci.cloudbees.com
  • 7. Chaos monkeying to upgrade • OS change: new AMI == terminate, let system replace • (in EC2: autoscale groups can do this for you) • Security patch? == terminate. • Server a bit sick? TERMINATE • (we actually use Chef for minor config changes and some app level upgrades… relax…) • If in doubt… you get the idea…
  • 8. A bad year for security • Heartbleed • Shellshock • POODLE • Xen guest flaw, AWS reboot-a-thon
  • 9. But a great year for logos: Xen
  • 10. Upgrades… • In place or… TERMINATE? • Often easier and safer to swap out: • e.g. reverse proxy (nginx) cluster replacement process: • warm new server, cut over IP and traffic, terminate old • No half-measures, half-upgrades, clean slate • (elastic IP helped in this case)
  • 11. More benefits of terminate … • “Retirement notices” from AWS - daily event! • Even “new” servers - 3 days until “retire” • No, you can’t see the server in its retirement home. • Reboot at some vague time - TERMINATE • Encourages immutable servers • predictable state • security advantages of being “locked down” in image
  • 12. But what about data… • Some say filesystem dependency is “legacy” • I say “you aren’t trying hard enough” • APIs such as EBS allow quick volume creation based on snapshots: • Continuous (delta) snapshotting of data • Can quickly restore service in healthy data centers • Faster time to recovery, route around failing zones • Ideal: use distributed data in all forms if you can!
  • 13. Containment challenge: APPS, JENKINS MASTERS, BUILD EXECUTORS
  • 14. Containment • Apps (PaaS) can do anything • Builds DO do anything • Need a clean slate for users • Process cleanup • Jenkins masters have plugins • Multi-tenancy: cost effective, higher density, better elasticity (fine grained processes vs autoscale groups)
  • 15. Containment Evolution • Unix user isolation + cgroups • LXC (builds on cgroups, namespaces) • Docker (builds on cgroups, namespaces, NOT LXC) • Natural current end point and so hot right now:
  • 17. Security benefits of containers? • Not complete • Not a replacement for current measures, but they help • Lots of (changing) content online • Next: Linux user-namespace for “fake root user” • “coming real soon now??” already in LXC, not in Docker at this time.
  • 18. Transition of a build service • Initial: discrete build nodes, “recycled” between use • Pools with “mark and sweep” garbage collection of unused build servers • Unix user and cgroup/namespace isolation • Attach build data from snapshots
  • 19. Transition of a build service • Next: use LXC for containment isolation • Finally: Use multi-tenant pools with full container isolation • Pool disks for IO and EBS resilience (ZFS) • Use larger, more economical servers (more burst power) • Consistent hashing to get a server with a warm “build cache” • (sorry if your Maven re-downloads the world, hopefully not all the time)
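
The "consistent hashing to get a server with a warm build cache" idea on slide 19 could look something like the sketch below. The deck does not describe the actual implementation, so everything here is an assumption: the class name `BuildServerRing`, the use of virtual nodes, MD5 as the hash, and the job key format are all hypothetical choices made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class BuildServerRing {
    // Sorted ring of hash position -> server name.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public BuildServerRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    /** Places the server at several positions on the ring to even out load. */
    public void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    /** Removes all ring positions of a terminated server. */
    public void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    /** Picks the first server at or after the job's position, wrapping around. */
    public String serverFor(String jobKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers in ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(jobKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // First 8 bytes of an MD5 digest, used only to spread keys (not for security).
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BuildServerRing ring = new BuildServerRing(100);
        ring.addServer("build-1");
        ring.addServer("build-2");
        ring.addServer("build-3");
        System.out.println("acme/webapp builds on " + ring.serverFor("acme/webapp"));
    }
}
```

The property that matters for a terminate-and-replace platform is that removing one server only remaps the jobs that lived on it; every other job keeps landing on the server that already holds its cache.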
  • 20. Transition of a build service • Done continually over a year • Limited user opt-in/out, majority do not notice • Strategy options: • roll out to 10%, 50% • roll out to tiered users (i.e. freemium users get new/unstable?) • roll out to all - incremental uptake due to natural restarting/reprovisioning • ALWAYS dog food
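
One simple way to realise the "roll out to 10%, 50%" option from slide 20 is deterministic bucketing, sketched below. The deck does not say how users were selected, so this is an assumed technique: the class `Rollout` and the `accountId` key are hypothetical. Each account hashes to a fixed bucket from 0 to 99, so an account enabled at 10% stays enabled when the percentage is raised to 50%.

```java
public class Rollout {

    /** Maps an account to a stable bucket in [0, 100). String.hashCode is deterministic. */
    public static int bucket(String accountId) {
        return Math.floorMod(accountId.hashCode(), 100);
    }

    /** True if the account falls inside the first `percent` buckets. */
    public static boolean enabled(String accountId, int percent) {
        return bucket(accountId) < percent;
    }

    public static void main(String[] args) {
        String account = "example-account";   // hypothetical account id
        System.out.println("bucket=" + bucket(account)
                + " at10%=" + enabled(account, 10)
                + " at50%=" + enabled(account, 50));
    }
}
```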
  • 21. Dog food • Always roll out to self first • (occasionally joyously discover bootstrapping problem if it goes bad!) • True indicator of confidence • We get used to change, from the user’s point of view
  • 22. How we apply Jenkins with CD [diagram labels: UPSTREAM CHANGE; terminate at any time; CHEF RECIPE MASTER BRANCH; TEST ENV; CHEF RECIPE PRODUCTION BRANCH; PROD ENV; rollout strategy]
  • 24. Wide feedback • Provide something the community wants to try: • https://registry.hub.docker.com/_/jenkins/ • Helps them, helps us learn
  • 25. Lessons on continual change • Cost of change == F(gap between deployments) • CD etc etc (you will hear a lot elsewhere) • Keep MTTR (mean time to recovery) low • If short enough, people will blame their internet connection (ssshhhh)
  • 26. Lessons on continual change • Always be doing DR • People ask about “DR” strategy • If you DR often, then it isn’t really DR - just BAU*, TMA*? • Normal service restoration and termination exercises “backups”
  • 27. Changes in a SaaS • If people use a SaaS, upgrades/change are expected • Communicate to users on changes, let them know how much work you do for them! It isn’t easy! • Some changes visible, some not (some you thought invisible, but were visible) - let people know. • Even outages can create good will: • Explanations and understanding == appreciation, it happens • Proactive security patching this year • “we don’t want to run this ourselves”
  • 28. Monitoring and alerting • Not often talked about in classic dev circles • Increasingly passionate in “devops” circles (Monitorama) • Alerting a staple of traditional ops and “on call” • These roles now smearing out amongst all devs
  • 29. Why monitoring? • SaaS always changing • The Question: • Are things better or worse than before? • Did the change make things better or worse? • Not so much: • Is everything perfect? (it won’t be)
  • 30. Monitoring and alerting • Roughly split: • “check engines” (Nagios, Pingdom etc) • receive events, work out if service is up/down • “notifications” - PagerDuty and email, SMS • tell people about things • analytics and monitoring (Librato, Boundary, New Relic and more) • DASHBOARDS AND GRAPHS EVERYWHERE
  • 31. [Figure labels: Analytics; Checks]
  • 32. All exist to inform you • Graphic dashboards can overwhelm • Some people treat them as end goal • Too much information often - are things OK Y/N? • Aim is to get insight (e.g. New Relic is like an online profiler) WHEN problems are happening • Aim is to tell people when problems are happening • Reports/graphs can be useful, but not at the expense of “health” monitoring
  • 33. If you must graph, a most important feature: Deploy happened here!
  • 34. Alert and information fatigue • A real (world) problem: • http://fractio.nl/2014/08/26/cardiac-alarms-and-ops/ • E.g. cardiac monitors: • Thresholds adjusted until only life critical • No “ACK” of noisy alerts (no “WARNING”) • Increased urgency, but reduced volume • reduced noise, reduced fatigue and fatalities! (counterintuitive?)
  • 35. Alert and information fatigue • Avoid “warnings” that interrupt people • (remember each interruption is > 1 hour really) • Push messages to chat rooms “chat ops” • Allow people already distracted to act • Alerts/info as “streams” people can dip into and help out • Avoid escalation • Follow the sun support! (if your team has it! Great!)
  • 36. End to End test monitor • Why save testing for dev time only? • Apply a kind of integration test to production • Can be a “synthetic transaction” • e.g. signup, run some process, exit • Run it continually • Increases confidence • “Out Of Band End To End Test” “oobetet” • technically monitoring, not testing!
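
A "synthetic transaction" of the kind slide 36 describes (signup, run some process, exit) might be structured as in the sketch below. It is only an illustration: the class `SyntheticTransaction` and the step names are hypothetical, and the step bodies here are stubs where a real monitor would call the production service.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

public class SyntheticTransaction {
    // Steps run in insertion order, like a scripted user journey.
    private final Map<String, Callable<Boolean>> steps = new LinkedHashMap<>();

    public SyntheticTransaction step(String name, Callable<Boolean> action) {
        steps.put(name, action);
        return this;
    }

    /**
     * Runs every step in order and stops at the first failure.
     * Returns the name of the failing step, or null if the whole journey passed.
     * A step fails if it returns false or throws.
     */
    public String runOnce() {
        for (Map.Entry<String, Callable<Boolean>> s : steps.entrySet()) {
            try {
                if (!s.getValue().call()) {
                    return s.getKey();
                }
            } catch (Exception e) {
                return s.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SyntheticTransaction journey = new SyntheticTransaction()
                .step("signup", () -> true)        // stub: would create a test account
                .step("run-process", () -> true)   // stub: would trigger a small job
                .step("exit", () -> true);         // stub: would clean up the account
        String failed = journey.runOnce();
        System.out.println(failed == null ? "journey OK" : "FAILED at " + failed);
    }
}
```

To "run it continually", `runOnce()` could be scheduled with `Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(...)`, raising an alert only when it returns a non-null step name.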
  • 37. Codahale metrics • https://dropwizard.github.io/metrics/3.1.0/ • Add simple metrics to your app: • Binary health checks “foo.widget.thing is OK” • Numerical metrics: • Gauges, meters, histograms and more • Lots of statistical goodness baked in (so you don’t have to) • Expose via /health URL and JSON, push to metrics services and more (can use a servlet):
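
To make the idea of binary health checks rendered as JSON concrete, here is a dependency-free sketch. It is deliberately not the Dropwizard Metrics API (that library has its own health check classes and servlets); the class `HealthChecks`, its methods and the JSON shape are assumptions for illustration, and check names are assumed to be simple identifiers that need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

public class HealthChecks {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    public void register(String name, BooleanSupplier check) {
        checks.put(name, check);
    }

    /** Runs every check; a check that throws counts as unhealthy. */
    public Map<String, Boolean> run() {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, BooleanSupplier> c : checks.entrySet()) {
            boolean ok;
            try {
                ok = c.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                ok = false;
            }
            results.put(c.getKey(), ok);
        }
        return results;
    }

    public boolean allHealthy() {
        return !run().containsValue(false);
    }

    /** Renders results as a flat JSON object, e.g. for a /health endpoint body. */
    public String toJson() {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Boolean> r : run().entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(r.getKey()).append("\":").append(r.getValue());
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        HealthChecks health = new HealthChecks();
        health.register("foo.widget.thing", () -> true);   // stub check
        System.out.println(health.toJson());
    }
}
```

An HTTP handler would typically return this body with a 200 status when `allHealthy()` is true and a 5xx status otherwise, so that simple check engines can treat it as an up/down signal.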
  • 38. Gauge measurement:
        // Report the current queue size whenever the gauge is read.
        metrics.register(MetricRegistry.name("important thing", "size"), new Gauge<Integer>() {
            @Override
            public Integer getValue() {
                return queue.size();
            }
        });
  • 39. trace percentile of times spent in..
        private final Timer responses = metrics.timer("important thing");
        public String handleRequest(Request request, Response response) {
            final Timer.Context context = responses.time();  // start timing
            try {
                // do some work
                return "OK";
            } finally {
                context.stop();  // record the elapsed time, even on exceptions
            }
        }
  • 40. Minimal points to take away • Give codahale/dropwizard stuff a good look! • Instrument at least a /health check that can be wired in later • *think* about monitoring • Replace/restore as a matter of “routine” • Change becomes the norm • Terminate and restart are often an OK way to recover!
  • 41. Thank you! Questions? @michaelneale developer-blog.cloudbees.com