DevOps Transformations
at National Instruments
and Bazaarvoice
ERNEST MUELLER @ERNESTMUELLER
THEAGILEADMIN.COM ERNEST.MUELLER@GMAIL.COM
Who Am I?
YES
 B.S. Electrical Engineering, Rice
University
 Programmer, Webmaster at FedEx
 VP of IT at Towery Publishing
 Web Systems Manager, SaaS
Systems Architect at National
Instruments
 Release Manager, Engineering
Manager at Bazaarvoice
NO
National Instruments
 NATI – founded in 1976, $1.24bn
revenue, 7249 employees, 10
sites worldwide (HQ: Austin)
 Makes hardware and software for
data acquisition, virtual
instrumentation, automated test,
instrument control
 Frequently on Fortune’s 100 Best
Companies to Work For list
NI Web Systems (2002-2008)
 Managed the systems team supporting NI’s Web presence (ni.com) and other Web-technology-based assets.
 ni.com was the primary source of product information (site visits were a key KPI in the overall lead-to-sales conversion pipeline), half of all lead generation, and a primary sales channel – about 12% of revenue was direct e-commerce.
 Key goals were all growth – globalization, driving traffic, increasing conversion, and e-commerce sales.
 Team grew from 4 to 8 sysadmins, supporting 80-100 developers – fulfilling their needs and coordinating with the other IT Infrastructure teams to build out Web systems.
The Problems
 Previous team philosophy had been actively anti-automation. All systems
manually built and deployed.
 Stability was low; site uptime hovered around 97% even with constant work.
 Web Admin quality of life was poor and turnover was high. Some weeks we had more than 200 on-call pages (resulting in a total of two destroyed on-call pagers).
 Application teams were siloed by business initiative, infrastructure teams
siloed by technology.
 Large and diverse tech stack – ni.com had hundreds of applications built on Java, Oracle PL/SQL, Lotus Notes/Domino, and more.
 Monthly code releases were “all hands on deck,” beginning late Friday evening
and going well into Saturday.
What We Did
Immediate (3-6 mo):
 Stop the bleeding – triage issues, get QoL metrics, operations shield
Mid-term (1-2 years):
 Automate – “Monolith,” “Autodeploys,” “Redirect Manager,” APM
 Process – “Systems Development Framework”, operational rotation
Long-term (2+ years):
 Partner with Apps and Business
 Drive overall goal roadmap (performance, availability, budget, maintenance%,
agility)
What We Did, Part 2
Security Practice
 NI had no full-time security staff of any sort
 We developed expertise and implemented both infrastructure and application security
programs
 Hosted local OWASP user group
Application Performance Management Practice
 After basic improvements, the majority of stability and performance issues were application-side
 We implemented a wide range of monitoring tools (synthetic, RUM, log analysis, APM)
 Reduced our production issues by 30% and reduced troubleshooting time by 90%
Results
“The value of the Web Systems group to the Web
Team is that it understands what we’re trying to
achieve, not just what tasks need completing.”
-Christer Ljungdahl, Director, Web Marketing
But…
We were still “the bottleneck.”
NI R&D Cloud Architect (2009-2011)
 About this time R&D decided to branch out into SaaS
products.
 Formed a greenfield team with me as systems architect and five other key devs and ops from IT
 Experimental, had some initial product ideas but also wanted
to see what we could develop
 LabVIEW Cloud UI Builder (Silverlight client, save/compile/share in
cloud)
 LabVIEW FPGA Compile Cloud (FPGA compiles in cloud from LV
product)
 Technical Data Cloud (IoT data upload)
What We Did - Initial Decisions
Cloud Hosting
 Do not use existing IT systems or processes – go with an integrated, self-contained team
 Adapt R&D NPI (new product introduction) and design review processes to agile
 None of our old tools would work (dynamic infrastructure, REST APIs)
Security First
 We knew security would be one of the key barriers to adoption
 Doubled down on threat modeling, scanning, documentation
Automation First
 PIE, the Programmable Infrastructure Environment
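The slides don’t show what a PIE model looked like; the editor’s notes below describe an XML model of the systems, the services running on them, and their interconnections, with ZooKeeper as a runtime registry. A hypothetical fragment in that spirit – the element and attribute names here are invented for illustration, not PIE’s actual format:

```xml
<!-- Hypothetical PIE-style model (invented names): declares systems,
     the services on them, and the dependencies between services. -->
<environment name="prod">
  <system name="compile-worker" count="8" image="win-labview-base">
    <service name="fpga-compile" port="8080">
      <dependsOn service="job-queue"/>
      <dependsOn service="license-service"/>
    </service>
  </system>
  <system name="queue" count="2" image="linux-base">
    <service name="job-queue" port="5672"/>
  </system>
</environment>
```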
What We Did, Part 2
 Close dev/ops collaboration was initially rocky, but we powered
through it
 REST not-quite-microservices approach
 Educate desktop software devs on Web-scale operational needs
 Used “Operations as the Secret Sauce” mentality to add value:
 Transparent uptime/status page
 Rapid deployment anytime
 Incident handling, follow-the-sun ops
 Monitoring/logging metrics feedback to developers and business
Results
 Average NI new software product time to market –
3 years
 Average NI Cloud new product time to market –
1 year
Meanwhile… DevOps!
 Velocity 2009, June 2009
 DevOpsDays Ghent, Nov 2009
 OpsCamp Austin, Feb 2010
 Velocity 2010/DevOpsDays Mountain View, June
2010
 Helped launch CloudAustin July 2010
 We hosted DevOpsDays Austin 2012 at NI!
Bazaarvoice
 BV – founded in 2005, $191M revenue, 826 employees; an Austin Ventures-backed startup that went public in 2012
 SaaS provider of ratings and reviews, plus analytics, syndication, and media products
 Very large scale – customers include 30% of leading brands, retailers like Walmart, and services like OpenTable – 600M impressions/day, 450M unique users/month
BV Release Manager (2012)
 Bazaarvoice was making the transition into an engineering-heavy company,
adopting Agile and looking to rearchitect their aging core platform into a
microservice architecture.
 They were on 10-week release cycles that were delayed virtually 100% of
the time
 They moved to two-week sprints, but when they tried to release to production at the end of them, they had downtime and a variety of issues (44 customers reported breakages)
 I was brought on with the goal of “get us to two week releases in a month”
(I started Jan 30, we launched March 1)
 I had no direct reports, just broad cooperation and some key engineers
seconded to me
The Problems
 Slow testing (low percentage of automated regression testing)
 Lots of branching
 Lots of branch checkins after testing begins
 Creates vicious cycle of more to test/more delays
 Monolithic architecture (15,000 files, 4.9M lines of code, running on 1200
hosts in 4 datacenters)
 Unclear ownership of components
 Release pipeline very “throw over the wall”-y
 Concerns from Implementation, Support, and other teams about “not
knowing what’s going on”
What We Did
 With Product’s support, product teams took several (~3) sprints off to automate regression testing – “as many as you need to hit the bar”
 Core QA tooling investment (JUnit, TeamCity CIT, Selenium)
 Review of assets and determining ownership (start of a service catalog)
 New branch strategy – trunk and single release branch per release only, no
checkins to release branch except to fix critical bugs
 “If the build is broken and you had a change in, it’s your job to fix it”
 Release process requiring explicit signoffs in Confluence/JIRA (every svn checkin
was linked to a ticket) promoting ownership “through the pipeline”
 Testing process changed (feature test in dev, regress in QA, smoke in stage)
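To make the QA tooling and the “smoke in stage” step above concrete: a minimal JUnit/Selenium smoke test of the kind such a pipeline might run after a deploy. This is a sketch under stated assumptions – the stage URL and the assertion are invented, not Bazaarvoice’s actual suite:

```java
import static org.junit.Assert.assertTrue;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// Minimal smoke-test sketch: verify the staged app serves its landing page.
// The stage URL and title check are hypothetical.
public class StageSmokeTest {
    private WebDriver driver;

    @Before
    public void setUp() {
        driver = new FirefoxDriver(); // any WebDriver implementation works
    }

    @Test
    public void landingPageLoads() {
        driver.get("https://stage.example.com/");
        assertTrue("landing page should have a title",
                driver.getTitle() != null && !driver.getTitle().isEmpty());
    }

    @After
    public void tearDown() {
        driver.quit();
    }
}
```

In a TeamCity-style setup, a suite like this would gate promotion from stage to production.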
What We Did, Part 2
 We also added a feature flagging system, so new features a) could be included in builds immediately and b) would be launched dark (a minimal sketch of the pattern follows this list).
 “Just enough process” – go/no-go meetings and master signoffs
were important early
 Communication, communication, communication. Met with all
stakeholders (in a company whose only product is the service,
that’s everyone) and set up all-company service status and release
notification emails.
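The deck doesn’t describe how the flag system was built; below is a minimal sketch of the dark-launch pattern it enabled, with invented names. Flag state loads from configuration, so merged code ships in every build but stays dark until the flag flips:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Minimal feature-flag sketch (hypothetical, not Bazaarvoice's system):
// new-feature code is present in the build but only runs when enabled.
public class FeatureFlags {
    private final Properties flags = new Properties();

    public FeatureFlags(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            flags.load(in); // e.g. a line like: new.review.widget=false
        }
    }

    public boolean isEnabled(String name) {
        return Boolean.parseBoolean(flags.getProperty(name, "false"));
    }
}

// At a call site, the feature launches dark until the flag is flipped:
//   if (flags.isEnabled("new.review.widget")) { renderNewWidget(); }
//   else { renderLegacyWidget(); }
```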
Results
 First release: “no go” – delayed 5 days, 5 customer issues
 Second release: on time, 1 customer issue
 Third release: on time, 0 customer issues
 Kept detailed metrics on # checkins, delays, process faults (mainly branch
checkins), support tickets
 When we had issues (a big blowup in May from a major change), we’d change the process
 Running the releases was handed off to a rotation through the dev teams
 After all the heavy lifting to get to biweekly releases, we went to weekly releases with little fanfare in September – customer-reported issues averaged zero
BV Engineering Manager (2012-2014)
 We had a core team working on the legacy ratings & reviews system
while new microservice teams embarked on the rewrite
 We still had a centralized operations team serving them all, and we knew we wanted to distribute its engineers into the teams so that all those teams were operationally ready early in their cycle
 We declared “the death of the ops team” mid-2012 and disseminated
them into the dev teams, initially reporting to a DevOps manager
 I took over the various engineers working on the existing product – about 40 engineers, two-thirds of whom were offshore outsourcers from SoftServe
The Problems
 With the assumption that “the legacy stack” would only be around a short time, these teams had not been moved to agile and were very understaffed.
 Volume (hits and data volume) continued to grow at a blistering pace, necessitating continuous expansion and rearchitecture
 Black Friday peaks stressed our architecture, with us serving 2.6 billion review
impressions on Black Friday and Cyber Monday.
 Relationships with our outsourcers were strained
 Escalated support tickets were hitting SLA about 72% of the time
 Hundreds to thousands of alerts a day, 750-item product backlog
 Growing security and privacy compliance needs – SOX, ISO, TÜV, AFNOR
What We Did
 Moved the teams to agile, embedded DevOps
 Started onshore rotations and better communication with
outsourcers, empowered their engineers
 Breaking the team into 4 sprint teams failed initially because there weren’t enough trained leads/scrum masters; we had to regroup and try again before it succeeded
 Balanced four split product backlogs with a Kanban-style “triage process” to handle incidents and support tickets
 Metrics and custom visualizations across the board – from support
SLA percentage to current requests and performance per cluster
What We Did, Part 2
 Enhanced communication with Support, embedded Product Manager
 “Merit Badge” style cross-training program to bring up new SMEs and reduce load on “the one person who knows how to do that”
 “Dynamic Duo” day/night dev/ops operational rotation
 Joint old team/new team Black Friday planning, game days
 Worked with auditors and InfoSec team – everything automated and
auditable in source control, documentation in wiki. Integrated security
team findings into product backlog.
Results
 Customer support SLA went steadily up quarter by quarter until it
hit 100%
 Steadily increasing velocity across all 4 sprint teams
 Great value from our outsourcing partner
 We continued to weather Black Friday spikes year after year on
the old stack
 Incorporation of “new stack” services, while slower than initially
desired, happened in a more measured manner
 Some retention issues, however, with completely embedded DevOps on “new stack” teams
Questions?
ERNEST MUELLER @ERNESTMUELLER
THEAGILEADMIN.COM ERNEST.MUELLER@GMAIL.COM
Editor’s Notes
  1. Hello from Austin, TX! I’m here to share with you a series of four DevOps transformations I’ve been involved in leading in two very different companies. I don’t claim any of these ideas or techniques we used are unique, but I wanted to demonstrate how DevOps concepts can get traction to solve various business problems in a variety of environments.
  2. I spent the 1990s in Memphis, TN, first working in the giant IT shop at FedEx and then taking on technical leadership at a publishing company growing into the Internet, getting experience in coding, system administration, management, and running large scale Web properties. In 2001 I came to Austin and started in on a series of what would become DevOps transformations in four distinct roles, two at National Instruments and two at Bazaarvoice.
  3. NI products help scientists and engineers power everything from the CERN supercollider to LEGO robots. Has anyone used LabVIEW?
  4. My previous company went under in the post-9/11 bust. My passion for Web technology and technical management led to a role back home in Texas at National Instruments. IT at NI was organized along standard enterprise lines: reporting to the CFO, with separate applications and infrastructure organizations. The two primary parts of the applications organization were the groups working on manufacturing and ERP systems based on Oracle E-Business Suite and the Web Team that supported the Web site.
  5. So keep in mind this was “pre-DevOps” (and even pre-Agile, mostly). We had to dig ourselves out of the hole and then work forward while we were growing and adding new technologies and systems all the time. Like many folks, we were discovering DevOps best practices as we went, by the sweat of our brow.
  6. In fact some of my fondest achievements from this gig are that the people I had working for me have all gone on to blossom into the areas they wanted to – Peco was our APM COE and now he’s a PM for Riverbed, Josh and James both wanted to get into security and now Josh is head of security for NI and James is notable in the security world, creator of the open source security/CI integration tool gauntlt and working at a security startup. (see http://www.riverbed.com/customer-stories/National-Instruments.html for the source on the APM benefit stats.)
  7. We got to where we had measurable availability (three nines) and performance goals that we were hitting along with hitting the yearly site growth and sales goals. We had an operational rotation that balanced project work with reactive work for the team. Over 6 years we fixed a lot, started performance and security centers of excellence, had very skilled engineers.
  8. I knew in my heart something was wrong. We were throwing a lot of smart people at the work we had and had good process, good automation, good collaboration – but we had plateaued. Keep in mind, there hadn’t been the first DevOpsDays yet. Continuous Delivery hadn’t been published. “Dev and Ops Cooperation at Flickr” hadn’t been presented at Velocity yet. We had happened onto a lot of the DevOps approach. The dev teams were just starting to use Agile, and it was giving us heartache. Discussion of infrastructure test was largely met with blank looks. Infrastructure teams especially were not receptive to a faster cadence. We had gone to the first Velocity conference in 2008 and had been filled with all kinds of new ideas… Also, interaction within Infrastructure was a problem. As our approach and goals began to differ from those of the other teams, we experienced conflict – we needed OpsOps, you might say.
  9. We didn’t know exactly what to do, but we knew that the existing way we ran IT systems would never be fast or agile enough for product purposes. When we got VMware in IT, it turned a 6-week server procurement process into 4 weeks, only removing the Dell fulfillment time. So we went all in on cloud up front, even though it meant displacing every single tool we were familiar with (none were cloud friendly at the time). Security-wise, we got ahead of that by doing security work proactively and being transparent about doing so. The first question every LabVIEW FPGA customer asked was “wait, but what about the security of my IP?” And we had a prepared response assuring them “your data is yours,” with our CISSP talking about how we did threat modeling and regular security testing and had incident reporting in place. And that satisfied every asker!
  10. We also embarked on writing our own configuration management system, PIE. Why? Simply, Chef and Puppet existed at the time but didn’t do what we needed – Windows support in particular, but also we knew we wanted something that could instantiate and manage an entire system at runtime and understand service dependencies. We figured we’d need to mix Amazon cloud, Azure cloud, NI on-premise services, and other SaaS services, and a CM system that “puts bits on machines” doesn’t do that. We knew this would be a lot of work that a small team would not be putting towards direct product development, and we had to really all look at each other and decide that yes, this was a necessary part of achieving our goal.
  11. So we went in with a very simple but strict goal: “have that entire physical/logical diagram be code, and have it be dynamic as systems change, code is deployed, etc.” PIE had an XML model detailing the systems, the services running on those systems, and their interconnections. It used ZooKeeper as a runtime registry. Dev, test, and production systems were always identical. Changes were made exclusively through version control. We even used PIE for ad hoc command dispatch so that it could be logged. This approach is finally gaining currency, especially with Docker’s more explicit support for service dependencies, but nothing else did this well for a long time.
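As a rough illustration of the runtime-registry idea in the note above, assuming the standard Apache ZooKeeper Java client (the paths and host names are invented, and this is not PIE’s actual code): a service instance could announce itself with an ephemeral node that disappears if the instance dies, keeping the model dynamic.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of a PIE-style runtime registry entry: an ephemeral, sequential
// node per service instance, removed automatically when the session ends.
public class ServiceRegistrar {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 10000, event -> {});
        // Assumes the parent path /services/job-queue already exists.
        String node = zk.create("/services/job-queue/instance-",
                "host1.example.com:5672".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("Registered as " + node);
    }
}
```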
  12. We had some design reviews where the app architect came to me and said “these ops guys are asking questions we just don’t understand… Maybe we should have separate reviews.” I explained that we could learn each other’s language and set up a virtuous cycle of collaboration or continue to split everything and go down the vicious cycle of tickets and process and bottlenecks. So we kept at it.
  13. Even with building our own automation platform (and other plumbing like login and licensing services), we delivered a product to market each year (plus some other prototypes and internal services), both directly web-facing and integrated with NI’s desktop products. Besides this, our uptime was 100% (vs 98.59% for our online catalog during the same period), we had security we’d never been able to get done (DMZs for example)… It may not be “fair” to compare a greenfield app with our legacy site, but we’d been working on that site for 6 years, and in the end it’s the results that matter. We didn’t have to balance resiliency and innovation, we were getting both with our approach. When we got told “You need to put some of that on Microsoft Azure” it was pretty straightforward to extend PIE to do so. The primary hurdle was more in the sales and marketing of these new solutions – they were coming faster than they were prepared for!
  14. I think it’s important to note that during this period is when DevOps was coined as a term and took off. I was very, very lucky in that many of the principals happened to come to OpsCamp Austin right after the first DevOpsDays happened and were all abuzz about it. Getting plugged into this wave of sharing was extremely energizing – we kept wondering if we were completely crazy and off track until we found out that so many people were finding the same issues and headed in the same direction.
  15. Bazaarvoice provides hosted ratings and reviews, along with related services like authenticity, analytics, content syndication between retailers and brands, etc. They’re one of the big Austin tech startup success stories.
  16. We took a strong stance on dev empowerment and responsibility, and also each team setting their own specific metrics.
  17. A lot of the existing ops team went to the PRR team. I started with them, got them running on Scrum, and worked on on-call, incident command, and other processes.
  18. This “legacy stack” is still around now, of course.
  19. Pretty much it’s all about collaboration, and putting just enough process in place to aid collaboration.
  20. In the end we had large scale integrated DevOps doing a lot of sustaining work as well as continuing heavy infrastructure development successfully.