SlideShare ist ein Scribd-Unternehmen logo
1 von 111
Downloaden Sie, um offline zu lesen
DevOps Kaizen:
Practical Steps to Start & Sustain an Organization’s Transformation
Damon Edwards
@damonedwards
DevOps Consulting
Operational Improvement
Tools
@damonedwards
Community
High
Performing
CompaniesPractices & Behaviors
Gene Kim
High
Performing
CompaniesPractices & Behaviors
Gene Kim
… but WHY are
they different?
The ability to improve.
The unique trait of high-performing companies is
that they are good at getting better.
Improvement already has a well known recipe:
Plan - Do - Study - Act (PDSA)
Other variants:
PDCA
OODA
W. Edwards Deming - 1950
© The Deming Institute
Why are so many organizations unable to improve?
1. The work isn’t visible
Why are so many organizations unable to improve?
1. The work isn’t visible
2. People are working out of context
Why are so many organizations unable to improve?
1. The work isn’t visible
2. People are working out of context
3. Inertia is pulling your org out of alignment
Why are so many organizations unable to improve?
ThomasJefferson
CEO
DwightEisenhower
BusinessOps
FrankRoosevelt
TechOps
SusanAnthony
SoftwareDev.
RonaldReagan
Sales
GeorgeBush
MarCom
AbrahamLincoln
CFO
BarackObama
CIO
BarbaraBush
Finance
LyndonJohnson
BizDev
CalvinCoolridge
Operations
BillClinton
USSales
JackKennedy
EuropeanSales
JimmyCarter
AsiaPacSales
GeraldFord
Product
MichelleObama
Marketing
HarryTruman
HR
HowardTaft
CoreSystems
WarrenHarding
WebSystems
ChesterArthur
Arch&DBA
TedRoosevelt
Infrastructure
BenHarrison
Networks
WilMcKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Org Charts
Reference
Architecture
Strategy
&
Budget
Top Secret
Strategy
&
Budget
Top Secret
Documented Processes Project Plans
Traditional “visibility”
Release Trains
ThomasJefferson
CEO
DwightEisenhower
BusinessOps
FrankRoosevelt
TechOps
SusanAnthony
SoftwareDev.
RonaldReagan
Sales
GeorgeBush
MarCom
AbrahamLincoln
CFO
BarackObama
CIO
BarbaraBush
Finance
LyndonJohnson
BizDev
CalvinCoolridge
Operations
BillClinton
USSales
JackKennedy
EuropeanSales
JimmyCarter
AsiaPacSales
GeraldFord
Product
MichelleObama
Marketing
HarryTruman
HR
HowardTaft
CoreSystems
WarrenHarding
WebSystems
ChesterArthur
Arch&DBA
TedRoosevelt
Infrastructure
BenHarrison
Networks
WilMcKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Org Charts
Reference
Architecture
Strategy
&
Budget
Top Secret
Strategy
&
Budget
Top Secret
Documented Processes Project Plans
Traditional “visibility”
Release Trains
Meetings
Meetings Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
ThomasJefferson
CEO
DwightEisenhower
BusinessOps
FrankRoosevelt
TechOps
SusanAnthony
SoftwareDev.
RonaldReagan
Sales
GeorgeBush
MarCom
AbrahamLincoln
CFO
BarackObama
CIO
BarbaraBush
Finance
LyndonJohnson
BizDev
CalvinCoolridge
Operations
BillClinton
USSales
JackKennedy
EuropeanSales
JimmyCarter
AsiaPacSales
GeraldFord
Product
MichelleObama
Marketing
HarryTruman
HR
HowardTaft
CoreSystems
WarrenHarding
WebSystems
ChesterArthur
Arch&DBA
TedRoosevelt
Infrastructure
BenHarrison
Networks
WilMcKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Org Charts
Reference
Architecture
Strategy
&
Budget
Top Secret
Strategy
&
Budget
Top Secret
Documented Processes Project Plans
Traditional “visibility”
Release Trains
Meetings
Meetings Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings Meetings
Meetings
Meetings
MeetingsMeetings
ThomasJefferson
CEO
DwightEisenhower
BusinessOps
FrankRoosevelt
TechOps
SusanAnthony
SoftwareDev.
RonaldReagan
Sales
GeorgeBush
MarCom
AbrahamLincoln
CFO
BarackObama
CIO
BarbaraBush
Finance
LyndonJohnson
BizDev
CalvinCoolridge
Operations
BillClinton
USSales
JackKennedy
EuropeanSales
JimmyCarter
AsiaPacSales
GeraldFord
Product
MichelleObama
Marketing
HarryTruman
HR
HowardTaft
CoreSystems
WarrenHarding
WebSystems
ChesterArthur
Arch&DBA
TedRoosevelt
Infrastructure
BenHarrison
Networks
WilMcKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Thomas Jefferson
CEO
Dwight Eisenhower
Business Ops
Frank Roosevelt
TechOps
Susan Anthony
Software Dev.
Ronald Reagan
Sales
George Bush
MarCom
Abraham Lincoln
CFO
Barack Obama
CIO
Barbara Bush
Finance
Lyndon Johnson
BizDev
Calvin Coolridge
Operations
Bill Clinton
US Sales
Jack Kennedy
European Sales
Jimmy Carter
AsiaPac Sales
Gerald Ford
Product
Michelle Obama
Marketing
Harry Truman
HR
Howard Taft
Core Systems
Warren Harding
Web Systems
Chester Arthur
Arch & DBA
Ted Roosevelt
Infrastructure
Ben Harrison
Networks
Wil McKinley
SRE
Org Charts
Reference
Architecture
Strategy
&
Budget
Top Secret
Strategy
&
Budget
Top Secret
Documented Processes Project Plans
Traditional “visibility”
Release Trains
Meetings
Meetings Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings
Meetings Meetings
Meetings
Meetings
MeetingsMeetings
The Illusion of
Control
It’s a complex system2
Complex
System
It’s a complex system2
Complex
System
Complex
System
interacting with a
It’s a complex system2
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Handoff
!
Handoff
!
Handoff
!
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Application Knowledge
Handoff
!
Handoff
!
Handoff
!
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Application Knowledge
Operational Knowledge
Handoff
!
Handoff
!
Handoff
!
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Application Knowledge
Operational Knowledge
Business Intent
Handoff
!
Handoff
!
Handoff
!
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Application Knowledge
Operational Knowledge
Business Intent
Handoff
!
Handoff
!
Handoff
!
Ownership
but limited
Accountability
Enterprises naturally trend towards silos
Dev / Test Release OperatePlanning
Application Knowledge
Operational Knowledge
Business Intent
Handoff
!
Handoff
!
Handoff
!
Ownership
but limited
Accountability
Accountability
but no
Ownership
1. The work isn’t visible
2. People are working out of context
3. Inertia is pulling your org out of alignment
++
Why are so many organizations unable to improve?
The only way to fix a sufficiently complex system is
to create the conditions for the system to fix itself.
“I know the answer!…”
Too costly…
outsource more!
Finance
“I know the answer!…”
Too costly…
outsource more!
Finance
We need results…
re-org until we do!
Executive
Committee
“I know the answer!…”
Too costly…
outsource more!
Finance
More discipline…
tighter process and
more approvals
Change
Management
We need results…
re-org until we do!
Executive
Committee
“I know the answer!…”
Too costly…
outsource more!
Finance
More discipline…
tighter process and
more approvals
Change
Management
We need results…
re-org until we do!
Executive
Committee
Need better tools…
new automation and a
new network!!
Engineers
“I know the answer!…”
The “Big Bang” Transformation Dream
Start
Finish
The “Big Bang” Transformation Reality
Start
Goal
Fear
Panic
Abort
Maybe
The “Big Bang” Transformation Reality
Start
Goal
Fear
Panic
Abort
Maybe
People revert to legacy
behaviors
The “Big Bang” Transformation Reality
Start
Goal
Fear
Panic
Abort
Maybe
People revert to legacy
behaviors
Declare “Done”
The “Big Bang” Transformation Reality
Start
Goal
Fear
Panic
Abort
Maybe
People revert to legacy
behaviors
Declare “Done”
More discipline…
tighter process and
more approvals
Need Results…
Re-Org!
Need better tools…
cool automation and a
new network!!
Too costly…
outsource more!
Finance
Change
Management
Executive
Committee
Engineers
More discipline…
tighter process and
more approvals
Need Results…
Re-Org!
Need better tools…
cool automation and a
new network!!
Too costly…
outsource more!
Finance
Change
Management
Executive
Committee
Engineers
“Little J’s” instead of “Big J”
Start
Finish
Start
Finish
“Big Bang” Continuous Improvement
Fear
Panic
Abort
Maybe
“Little J’s” instead of “Big J”
Start
Finish
Start
Finish
“Big Bang” Continuous Improvement
Fear
Panic
Abort
Maybe
“Little J’s” instead of “Big J”
Start
Finish
Start
Finish
“Big Bang” Continuous Improvement
Fear
Panic
Abort
Maybe
Turn Continuous Improvement into an enterprise program
You are going to have to…
Turn Continuous Improvement into an enterprise program
•Keep improvement efforts aligned
You are going to have to…
Turn Continuous Improvement into an enterprise program
•Keep improvement efforts aligned
•Scale quickly
You are going to have to…
Turn Continuous Improvement into an enterprise program
•Keep improvement efforts aligned
•Scale quickly
•Span multiple organizational boundaries
You are going to have to…
Turn Continuous Improvement into an enterprise program
•Keep improvement efforts aligned
•Scale quickly
•Span multiple organizational boundaries
•Work with substantial numbers of legacy technologies
You are going to have to…
Turn Continuous Improvement into an enterprise program
•Keep improvement efforts aligned
•Scale quickly
•Span multiple organizational boundaries
•Work with substantial numbers of legacy technologies
•Develop your existing staff in mass
You are going to have to…
Turn Continuous Improvement into an enterprise program
•Keep improvement efforts aligned
•Scale quickly
•Span multiple organizational boundaries
•Work with substantial numbers of legacy technologies
•Develop your existing staff in mass
•Be self-funding after initial seed investment
You are going to have to…
1. The work isn’t visible
2. People are working out of context
3. Inertia is pulling your org out of alignment
But how do you do that when…
You need a systemic way to teach an organization to
find and fix what is getting in its own way.
“Kaizen”
“Kaizen”
• Kaizen: Japanese word for improvement
“Kaizen”
• Kaizen: Japanese word for improvement
• Modern business context:
• Continuous improvement
• Systematic, scientific-method approach
• Total engagement of the workforce
• Valuing small changes as much as large changes
(outcome is what matters)
“Kaizen”
• Kaizen: Japanese word for improvement
• Modern business context:
• Continuous improvement
• Systematic, scientific-method approach
• Total engagement of the workforce
• Valuing small changes as much as large changes
(outcome is what matters)
• Kaizen in a DevOps context:
• Continuously improve the flow of work through the full
value stream in order to improve customer outcomes
Proven Lean Techniques
+
DevOps Context
“If I have seen further, it is by standing on the shoulders of giants.”
-Sir Isaac Newton
“DevOps Kaizen”
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Elements of a DevOps Kaizen Program
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Elements of a DevOps Kaizen Program
Organization-wide focus on service delivery metrics
• Lead Time (Duration and Predictability)
• MTTD (Mean Time To Detect)

• MTTR (Mean Time to Repair, Mean Time to Fix)

• Quality at the Source (Scrap/Rework)

OpsDev
Customer
Outcome
Business
Goal
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Elements of a DevOps Kaizen Program
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Elements of a DevOps Kaizen Program
This is where the
work becomes
visible!
Retrospectives are a per value stream tool
Value
Stream
A
Value
Stream
B
Value
Stream
C
Key:
“horizontal
thinking”
Map end-to-end process1
Include key process metrics:
Lead Time
Processing Time
Scrap Rate
Head Count
DevOps Kaizen: Retrospective Technique
Map end-to-end process1
Include key process metrics:
Lead Time
Processing Time
Scrap Rate
Head Count
DevOps Kaizen: Retrospective Technique
Note: “going to the
gemba” requires
making it visible
together
Map end-to-end process1
Include key process metrics:
Lead Time
Processing Time
Scrap Rate
Head Count
DevOps Kaizen: Retrospective Technique
Key: graphical
facilitation above
all else!
Note: “going to the
gemba” requires
making it visible
together
Map end-to-end process1
Include key process metrics:
Lead Time
Processing Time
Scrap Rate
Head Count
DevOps Kaizen: Retrospective Technique
Key: graphical
facilitation above
all else!
Note: “going to the
gemba” requires
making it visible
together
Inspiration: value stream mapping
Identify wastes, inefficiencies, bottlenecks
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
HB - Heroics
Structured approach building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
2
DevOps Kaizen: Retrospective Technique
Identify wastes, inefficiencies, bottlenecks
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
HB - Heroics
Structured approach building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
2
DevOps Kaizen: Retrospective Technique
Key: focus on
flow of value…
not gripes
Identify wastes, inefficiencies, bottlenecks
PD - Partially Done
TS - Task Switching
W - Waiting
M - Motion / Manual
D - Defects
EP - Extra Process
EF - Extra Features
HB - Heroics
Structured approach building on DevOps
adaptation of “7 deadly wastes” from Lean / Agile:
2
DevOps Kaizen: Retrospective Technique
Key: focus on
flow of value…
not gripes
Inspiration: 7 Wastes of Software Development
DevOps Kaizen: Retrospective Technique
Identify countermeasures
Countermeasures must be actionable, backlog ready.
Focus on short-term “baby steps”. Note broader, strategic
recommendations.
3
DevOps Kaizen: Retrospective Technique
Identify countermeasures
Countermeasures must be actionable, backlog ready.
Focus on short-term “baby steps”. Note broader, strategic
recommendations.
3
Key: “small j’s,
not big j’s”
DevOps Kaizen: Retrospective Technique
Create Improvement Storyboards (Kata Style)4
DevOps Kaizen: Retrospective Technique
Create Improvement Storyboards (Kata Style)4
Key: actionable short-term “baby steps”…
“what are we going to do next?”
DevOps Kaizen: Retrospective Technique
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development D
In
D
fo
Scrum
Dev/QA
Business Trigger
Major Feature for Partner Markets
17 weeks 16 weeks
6 weeks H/C: 6 3 w
Requirements / Planning Dev / Test
Data Setup Inte
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
Service Verification test writing: shif
(test early)
Remove Bottleneck and Environme
(test more)
EP
D
TS
EP
M
D
PD M W
TS
1.
2.
3.
S/R - 90%
S/R - 71%
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
Leads
PMs
gn
ments
etail for their team
e
rd
Chief
chitect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
CA
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Jira
(if new arch)
16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks
Dev / Test Staging
Data Setup Integration Testing
4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
M
D
PD M W
TS
TS D
M TS
3.
4.
5.
6.
S/R - 90%
S/R - 55%
S/R - 32%
D
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
ire /
needed
a
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
(if new arch)
40 weeks total
16 weeks
weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Dev / Test Staging Release Prod Release
Data Setup Integration Testing
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
vice Verification test writing: shift left to Dev
(test early)
move Bottleneck and Environment Contention
(test more)
D
PD M W
TS
TS D
M TS
PD
M
M
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
+ Work in small batches
+ Early Ops Involvement
+ Standardized Catalog (with design
standards built-in)
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
+ Work in small batches
+ Early Ops Involvement
+ Standardized Catalog (with design
standards built-in)
+Dev provide service verification tests
+Ops provide environment verification
tests (used by Dev and QA)
+Service service test data setup
(including mainframe)
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
+ Work in small batches
+ Early Ops Involvement
+ Standardized Catalog (with design
standards built-in)
+Dev provide service verification tests
+Ops provide environment verification
tests (used by Dev and QA)
+Service service test data setup
(including mainframe)
+All deploys use Dev provided
service verification tests
+Deployment rehearsals using
service verification & environment
verification tests in all lower
environments
Customers
and Clients
Upcoming
Market Event
Chief
Strategy
Officer
Strategize
Product
Roadmap
Weekly Exec
Meeting
Ad-Hoc
P.M
"John"
Program
"Susan"
Product
Requirements
Defined
Jira
"Stories"
ARB
Queue
Track Ticket
Dependencies
Confluence
Pages
design
docs
Team Leads
and PMs
Assign
Requirements
more detail for their team
Architecture
Review Board
Chief
Architect
"Linda"
Working
Group
Ops
? (sometimes)
Devs, PM, Engr, QA
Development
Sprint
2 week c/t
Existing Dev
Environments
Acquire /
Prepare needed
data
DBA in
TechOps
Data Setup
"Jennifer"
Test Data
Configuration
Manager
Development Deploy to
Integration
Dev, QA
Integration &
Regression
Testing
focused on service
Scrum
Dev/QA
Integ03
Scrum
Dev/QA
Test
Link
Sprint
Review
Release
to Prod
Product Owners
(Using own
criteria)
Create
CAB ticket
or
Scrum Team Ops Team
(if legacy)
Push Deployment
to Stage
Stage
Email Notification
Jira
NewArch
Build
VMs
Jira
Ops
Service
Now
Legacy
QA Lead
PMs
QAs
End to end
testing in Prod
Prod Env
Prd
DB
Go-No Go
decision
meeting
Team Leads
Jira
Ops
By Cluster
"Remove
Feature Flag"
Business Trigger
Major Feature for Partner Markets
(if new arch)
40 weeks total
17 weeks 16 weeks
6 weeks H/C: 6 3 weeks H/C: 8
4 weeks H/C:8 3 weeks H/C: 14
Requirements / Planning Dev / Test Staging Release Prod Release
Data Setup Integration Testing
6. Service Aligned
Team Concept? 4. Write end-to-
end customer
func. tests
5. Resolve
interface to
legacy
1. Test data
setup
automation
3. Dev to Prod
for legacy
2. Unify change
management
tools
7. Tool
Service Verification test writing: shift left to Dev
(test early)
Remove Bottleneck and Environment Contention
(test more)
EP
D
TS
EP
M
D
PD M W
TS
TS D
M TS
PD
M
M
1.
2.
3.
4.
5.
6.
7.
8.
S/R - 90%
S/R - 55%
S/R - 32%
D
S/R - 43%
S/R - 71%
+ Work in small batches
+ Early Ops Involvement
+ Standardized Catalog (with design
standards built-in)
Key: “What can we do next?” NOT “what is nirvana?”
+Dev provide service verification tests
+Ops provide environment verification
tests (used by Dev and QA)
+Service service test data setup
(including mainframe)
+All deploys use Dev provided
service verification tests
+Deployment rehearsals using
service verification & environment
verification tests in all lower
environments
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
Request
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
~2p
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update
"Scribes"
Notes
.doc
Update
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people
H
M
PD
TS
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people
H
M
PD
TS
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Stre
Communication i
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
every 4 hours
Request
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
Request
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
nability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
nability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
Request
ProblemSubside
ProblemSpik
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubside
ProblemSpik
~2pm ~5am
1/7/16
"Scribes"
Notes
.doc Chat
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent
(assum
• Upgrade t
(~Q3 2016
• Rewrite D
to remove
procedure
s with any
"). "Fight
place!
s with any
"). "Fight
place!
3. Improved code
review
Also In-flight:
• Peer review for every chec
• Automation tooling and SD
DB changes (no more adho
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF IN
$1,012,000 ++
ProblemSubside
ProblemSpik
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubside
ProblemSpik
~2pm ~5am
1/7/16
"Scribes"
Notes
.doc Chat
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent
(assum
• Upgrade t
(~Q3 2016
• Rewrite D
to remove
procedure
s with any
"). "Fight
place!
s with any
"). "Fight
place!
3. Improved code
review
Also In-flight:
• Peer review for every chec
• Automation tooling and SD
DB changes (no more adho
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF IN
$1,012,000 ++
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
Request
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
Request
+ Automated service verification
and performance health checks
starting in Dev
+Ops added to code peer reviews
+Streamlined war rom and bridge
call communication
"MyAccount Perf Incident"
Customer
Call Contact
Center
!!!
Contact
Center
Call Service
Desk
Service
Desk
History:
12/20/15 - Started seeing problems
12/29/15 - Swarm of problems then everyone on vacation
9am Monday 1/4/16 - everyone back to work and problems spike
Try to
recreate
errors
Contact
Service Desk
Try to
recreate
errors
Check
Monitoring
BMC
"All green"
(up/down
monitoring)
TixTixTix
"stuff isn't
working!"
Many tickets
Many calls
Service
Now
Tickets
Start
Bridge
Callcall
someone
call
someone
else
Call
Vendor
Support
Start Biz
Management
Bridge
Assemble
War Room
"Terri"
Dir.
Account
Reps
Incident
Commander
NOC
2 hours
1/5/16
9 AM 11 AM
Web Dev
Fix
Test
Services
Fix
Test
Portal Ops
Fix
Test
DBA Ops
Fix
Test
Network Fix Test
Vendor
Partners
Fix
Test
Investigate
DDOS
(Security)
Test
Incident
Commander
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/6/16
Repeat
War
Room
Cycles
ProblemSubsidesforDay
ProblemSpikes!!
~2pm ~5am
1/7/16
Send Biz
Communication
Emails
Update BRM
Bridge Call
Update Update Update Update
"Scribes"
Notes
.doc Chat
Repeat
War
Room
Cycles
"Focused on
Stored
Procedures
but not
confident
why"
Repeat
War
Room
Cycles
"Vendor
consultant
provides
config
changes"
"Stored
Procedure s
used in way
never seen
before!"
• Vendor Consultant onsite
for 3 days
• QA tried to simulate load
• Tried adding capacity (3
app servers and web)
• Tried disabling
M-F
Every Day!
1/13/16
UpdateUpdate Update
QA Run Load
Tests to
Confirm Fix
(After Hours)
Prod
1/13/16-1/15/16
Additional
Fix from
Vendor
Consultant
Permanent Fix Options
(assumptions):
• Upgrade to latest version
(~Q3 2016 go live?)
• Rewrite DB layer of App
to remove stored
procedures
PD
No user experience monitoring in place
No standard application performance tools
available early and often
Limited database diagnostic tools
TS
Lots of teams spending
many cycles to "prove
it wasn't them"
TS
Too much noise on bridge.
Engineers can't focus.
Open up side bridges
because of difficulty of
noise / task switching on
main bridge
Vendor reps joining calls
aren't correct people
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
H
M
PD
TS
H
M
PD
TS
Can't trace user transactions through the stack
No controlled "prod-like" stage env for troubleshooting
Vendor provided "fix" is
really just a workaround
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
PD
TS
Inability to rollback recent changes with any
confidence ("probably impossible?"). "Fight
forward" only.
No service owner to manage biz
communication and determine if resolved
HBM PD
TS
Experimenting
directly in production
Documentation not
up to date
Lots of people coming in and out.
H
M
PD
TS
TS
H
M
PD
TS
H
M
PD
TS
No controlled "prod-like" stage env for troubleshooting
Resolution loop was
never really closed.
"Bandaid fix" is still in
place!
1. Service performance monitoring,
error detection, and automated
health checks starting Dev
6. Platform App Dev leading new
deployment strategy to improve
traceability and ease rollback
2. Improved vendor SLA
management
7. Improved / Streamlined
Communication in War Room / Bridge
3. Improved code
review
5. "Prod-Like" Pre-Prod
environments (with load
testing)
8. Follow through on
resolution
Also In-flight:
• Peer review for every check-in
• Automation tooling and SDLC for
DB changes (no more adhoc scripts)
War Room and Bridge Cost (direct labor):
M-F: 35 h/c per day for 7 days = $195,000
S,S: 6 h/c per day for 2 days = $11,400
Total: $206,400
Impact on Call Centers:
1500 Agents 30% degraded 8 hrs for
7 days = $806,000
Cost of impact/delay on other inflight projects:
Unknown ("feels large", estimated at 20% - 100%)
Cost of impact/delay on compliance issues:
Unknown
Cost of brand damage:
Unknown
TOTAL COST OF INCIDENT:
$1,012,000 ++
every 4 hours
Request
+ Automated service verification
and performance health checks
starting in Dev
+Ops added to code peer reviews
+Streamlined war rom and bridge
call communication
+ “Prod-like” testing environments
(with load testing)
+Automation and SDLC for
database changes
+Follow through on resolution to
achieve permanent fixes
Improvement Storyboards
Process Name Challenge/Key Pain
Target
Condition
Current
Condition
Improvement
Metrics
Work ToDo
(Baby Steps)
Blockers
Changes are being introduced
/ tested in production the first time
causing delays, rework, outages
Target Condition
Improvement Metrics
Work ToDo (Baby Steps)
Blockers
GTM/LTM (Traffic manager
configuration process)
• Apps are not developed in
production-like environments (not
testing F5 behavior)
• Ops teams cannot practice or learn
• App teams have no visibility into
constraints
• No remediation capabilities for app
support teams
• No repeatable pattern for GM health
activity
• 80% S/R with 2-3 rework cycles
• 50% cause outages
Current Condition
• GM/TLM functionality across all
SDLC environments (capex request
needed)
• change window reduction for non-
prod environments (turn those
around instantaneously less than 13
days)
• Provide read-only to all F5 consoles
• Standardize GM pattern
• Lead Time (post-dev to prod)
• Scrap Rate
• Acquire the F5 hardware or
software to support envs
throughout SDLC
• Make these changes L3 or 5
change requests
• Write automation scripts
• provide read only access to all
environments… can include API
access to facilitate automation
script writing
• Create design template with
customer pattern
◦ Financial approval (Jennifer)
◦ Segregation between environments
(Mark)
◦ Non-standard request types (Susan)
◦ Two network teams with different
rules (Mark)
Process Challenge/Key Pain
Template Example
Improvement Storyboards
Process Name Challenge/Key Pain
Target
Condition
Current
Condition
Improvement
Metrics
Work ToDo
(Baby Steps)
Blockers
Changes are being introduced
/ tested in production the first time
causing delays, rework, outages
Target Condition
Improvement Metrics
Work ToDo (Baby Steps)
Blockers
GTM/LTM (Traffic manager
configuration process)
• Apps are not developed in
production-like environments (not
testing F5 behavior)
• Ops teams cannot practice or learn
• App teams have no visibility into
constraints
• No remediation capabilities for app
support teams
• No repeatable pattern for GM health
activity
• 80% S/R with 2-3 rework cycles
• 50% cause outages
Current Condition
• GM/TLM functionality across all
SDLC environments (capex request
needed)
• change window reduction for non-
prod environments (turn those
around instantaneously less than 13
days)
• Provide read-only to all F5 consoles
• Standardize GM pattern
• Lead Time (post-dev to prod)
• Scrap Rate
• Acquire the F5 hardware or
software to support envs
throughout SDLC
• Make these changes L3 or 5
change requests
• Write automation scripts
• provide read only access to all
environments… can include API
access to facilitate automation
script writing
• Create design template with
customer pattern
◦ Financial approval (Jennifer)
◦ Segregation between environments
(Mark)
◦ Non-standard request types (Susan)
◦ Two network teams with different
rules (Mark)
Process Challenge/Key Pain
Inspiration: A3 management process
Template Example
Using Storyboards: Part “Sales”, Part Coaching
Coaching
(Not Solutions)
Facts and
Next Steps
Learner
Coach / Leader
What is the target condition?
What is the actual condition now?
What obstacles do you think are stopping
you from reaching target condition?
What is your next step?
When will we know what was learned from
the next step?
Asks
Questions
Maintain
Storyboard
Answer / Explain
Process Name Challenge/Key Pain
Target
Condition
Current
Condition
Improvement
Metrics
Work ToDo
(Baby Steps)
Blockers
Using Storyboards: Part “Sales”, Part Coaching
Coaching
(Not Solutions)
Facts and
Next Steps
Learner
Coach / Leader
What is the target condition?
What is the actual condition now?
What obstacles do you think are stopping
you from reaching target condition?
What is your next step?
When will we know what was learned from
the next step?
Asks
Questions
Maintain
Storyboard
Answer / Explain
Process Name Challenge/Key Pain
Target
Condition
Current
Condition
Improvement
Metrics
Work ToDo
(Baby Steps)
Blockers
Inspiration: Toyota Kata
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Elements of a DevOps Kaizen Program
1. The will to make change happen
2. The resources to make change happen
3. Drive follow-through / clear obstacles
Kaizen Program Oversight
1. The will to make change happen
2. The resources to make change happen
3. Drive follow-through / clear obstacles
Kaizen Program Oversight
This (and only this) is what the Kaizen
Program Oversight Group does!
1. The will to make change happen
2. The resources to make change happen
3. Drive follow-through / clear obstacles
Kaizen Program Oversight
1. The will to make change happen
2. The resources to make change happen
3. Drive follow-through / clear obstacles
Kaizen Program Oversight
Inspire Executives with:
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Elements of a DevOps Kaizen Program
DevOps Kaizen Program is an overlay for any delivery methodology
Project 1 Project 2 Project 3
Sprint 1
Improvement Program Oversight (Strategy)
Team
A
Full Retrospective
& Planning
Sprint 2 Sprint 3 Sprint 4 Sprint 5 Sprint 6 Sprint 7 Sprint 8
Team
B
Refresh
Retrospective
DevOps Kaizen: Let’s Recap!
Make the work visible Focus on Continuous Improvement
Service
Delivery
Metrics
Kaizen
Program
Oversight
Planning
&
Retrospectives
Informs Informs
Countermeasures &
Blockers
Establish program elements
Project 1 Project 2 Project 3
Sprint 1
Improvement Program Oversight (Strategy)
Team
A
Full Retrospective
& Planning
Sprint 2 Sprint 3 Sprint 4 Sprint 5 Sprint 6 Sprint 7 Sprint 8
Team
B
Refresh
Retrospective
Build into your operating model
Join me tomorrow! 10:15 in Victoria Suite
Helping Ops Help You: Development’s Role in Enabling Self-Service Operations
@damonedwards
Damon Edwards
damon@rundeck.com

Weitere ähnliche Inhalte

Was ist angesagt?

DevOps unraveled - Nyenrode masterclass on Agile Management
DevOps unraveled - Nyenrode masterclass on Agile ManagementDevOps unraveled - Nyenrode masterclass on Agile Management
DevOps unraveled - Nyenrode masterclass on Agile ManagementInspectie van het Onderwijs
 
Introducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and KanbanIntroducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and KanbanDimitri Ponomareff
 
Driving Change with Data: Getting Started with Continuous Improvement
Driving Change with Data: Getting Started with Continuous ImprovementDriving Change with Data: Getting Started with Continuous Improvement
Driving Change with Data: Getting Started with Continuous ImprovementLeanKit
 
Salesforce Agile Transformation - Agile 2007 Conference
Salesforce Agile Transformation - Agile 2007 ConferenceSalesforce Agile Transformation - Agile 2007 Conference
Salesforce Agile Transformation - Agile 2007 ConferenceSteve Greene
 
cPrime Agile Enterprise Transformation
cPrime Agile Enterprise TransformationcPrime Agile Enterprise Transformation
cPrime Agile Enterprise TransformationCprime
 
Understanding the Relationship Between Agile, Lean and DevOps
Understanding the Relationship Between Agile, Lean and DevOps Understanding the Relationship Between Agile, Lean and DevOps
Understanding the Relationship Between Agile, Lean and DevOps LeanKit
 
DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)Ravi Tadwalkar
 
Agile 2013 - Lean Change for Enabling Agile Transformations
Agile 2013 - Lean Change for Enabling Agile TransformationsAgile 2013 - Lean Change for Enabling Agile Transformations
Agile 2013 - Lean Change for Enabling Agile TransformationsAlexis Hui
 
Agile Transformations that Stick
Agile Transformations that StickAgile Transformations that Stick
Agile Transformations that StickNicola Dourambeis
 
Intro to Lean Software Development
Intro to Lean Software DevelopmentIntro to Lean Software Development
Intro to Lean Software Developmentgcaprio
 
Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014
Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014
Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014LeanKit
 
Does this FizzGood? Improve velocity, predictability & agility by asking a si...
Does this FizzGood? Improve velocity, predictability & agility by asking a si...Does this FizzGood? Improve velocity, predictability & agility by asking a si...
Does this FizzGood? Improve velocity, predictability & agility by asking a si...Jon Terry
 
Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone
Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone
Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone AgileSparks
 
Does this Fizz Good?
Does this Fizz Good?Does this Fizz Good?
Does this Fizz Good?LeanKit
 
Introduction to scrum at scale
Introduction to scrum at scaleIntroduction to scrum at scale
Introduction to scrum at scaleMahmoud Ghoz
 
Scrum of Scrums Patterns Library
Scrum of Scrums Patterns LibraryScrum of Scrums Patterns Library
Scrum of Scrums Patterns LibraryDavid Hanson
 
DevOps Primer : Presented by Uday Kumar
DevOps Primer : Presented by Uday KumarDevOps Primer : Presented by Uday Kumar
DevOps Primer : Presented by Uday KumaroGuild .
 
How to Adopt Agile at Your Organization
How to Adopt Agile at Your OrganizationHow to Adopt Agile at Your Organization
How to Adopt Agile at Your OrganizationRaimonds Simanovskis
 
Scrum. XP. Lean. Kanban - Be Agile
Scrum. XP. Lean. Kanban - Be Agile Scrum. XP. Lean. Kanban - Be Agile
Scrum. XP. Lean. Kanban - Be Agile Andreea Visanoiu
 

Was ist angesagt? (20)

DevOps unraveled - Nyenrode masterclass on Agile Management
DevOps unraveled - Nyenrode masterclass on Agile ManagementDevOps unraveled - Nyenrode masterclass on Agile Management
DevOps unraveled - Nyenrode masterclass on Agile Management
 
Introducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and KanbanIntroducing Agile Scrum XP and Kanban
Introducing Agile Scrum XP and Kanban
 
Driving Change with Data: Getting Started with Continuous Improvement
Driving Change with Data: Getting Started with Continuous ImprovementDriving Change with Data: Getting Started with Continuous Improvement
Driving Change with Data: Getting Started with Continuous Improvement
 
Salesforce Agile Transformation - Agile 2007 Conference
Salesforce Agile Transformation - Agile 2007 ConferenceSalesforce Agile Transformation - Agile 2007 Conference
Salesforce Agile Transformation - Agile 2007 Conference
 
cPrime Agile Enterprise Transformation
cPrime Agile Enterprise TransformationcPrime Agile Enterprise Transformation
cPrime Agile Enterprise Transformation
 
Understanding the Relationship Between Agile, Lean and DevOps
Understanding the Relationship Between Agile, Lean and DevOps Understanding the Relationship Between Agile, Lean and DevOps
Understanding the Relationship Between Agile, Lean and DevOps
 
DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)DevOps Approach (Point of View by Ravi Tadwalkar)
DevOps Approach (Point of View by Ravi Tadwalkar)
 
Agile 2013 - Lean Change for Enabling Agile Transformations
Agile 2013 - Lean Change for Enabling Agile TransformationsAgile 2013 - Lean Change for Enabling Agile Transformations
Agile 2013 - Lean Change for Enabling Agile Transformations
 
Agile Transformations that Stick
Agile Transformations that StickAgile Transformations that Stick
Agile Transformations that Stick
 
Agile and DevOps revealed
Agile and DevOps revealed Agile and DevOps revealed
Agile and DevOps revealed
 
Intro to Lean Software Development
Intro to Lean Software DevelopmentIntro to Lean Software Development
Intro to Lean Software Development
 
Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014
Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014
Lean and Kanban: An Alternative Path to Agility -Gartner PPM Summit 2014
 
Does this FizzGood? Improve velocity, predictability & agility by asking a si...
Does this FizzGood? Improve velocity, predictability & agility by asking a si...Does this FizzGood? Improve velocity, predictability & agility by asking a si...
Does this FizzGood? Improve velocity, predictability & agility by asking a si...
 
Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone
Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone
Beginning the Kanban journey at an Enterprise IT - Case study - Pelephone
 
Does this Fizz Good?
Does this Fizz Good?Does this Fizz Good?
Does this Fizz Good?
 
Introduction to scrum at scale
Introduction to scrum at scaleIntroduction to scrum at scale
Introduction to scrum at scale
 
Scrum of Scrums Patterns Library
Scrum of Scrums Patterns LibraryScrum of Scrums Patterns Library
Scrum of Scrums Patterns Library
 
DevOps Primer : Presented by Uday Kumar
DevOps Primer : Presented by Uday KumarDevOps Primer : Presented by Uday Kumar
DevOps Primer : Presented by Uday Kumar
 
How to Adopt Agile at Your Organization
How to Adopt Agile at Your OrganizationHow to Adopt Agile at Your Organization
How to Adopt Agile at Your Organization
 
Scrum. XP. Lean. Kanban - Be Agile
Scrum. XP. Lean. Kanban - Be Agile Scrum. XP. Lean. Kanban - Be Agile
Scrum. XP. Lean. Kanban - Be Agile
 

Ähnlich wie DevOps Kaizen: Practical Steps to Start & Sustain an Organization’s Transformation

William MacLeod Professional Profile
William MacLeod Professional ProfileWilliam MacLeod Professional Profile
William MacLeod Professional ProfileWilliam MacLeod
 
Randy Powerpoint1v.1
Randy Powerpoint1v.1Randy Powerpoint1v.1
Randy Powerpoint1v.1Randy Wald
 
Randy Powerpoint1v.1
Randy Powerpoint1v.1Randy Powerpoint1v.1
Randy Powerpoint1v.1Randy Wald
 
The Best Laid Incentive Plans
The Best Laid Incentive PlansThe Best Laid Incentive Plans
The Best Laid Incentive PlansSandy Harwell
 
Top tools for virtual organisation
Top tools for virtual organisationTop tools for virtual organisation
Top tools for virtual organisationJerome Kenneth
 
Ip Summit Thought Leadership
Ip Summit Thought LeadershipIp Summit Thought Leadership
Ip Summit Thought Leadershipcmdagency
 
Enterprise 2.0: The new face of CRM
Enterprise 2.0: The new face of CRMEnterprise 2.0: The new face of CRM
Enterprise 2.0: The new face of CRMDipock Das
 
Patternbuilders Founder Showcase Deck
Patternbuilders Founder Showcase DeckPatternbuilders Founder Showcase Deck
Patternbuilders Founder Showcase DeckMaryLudloff
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismUwe Friedrichsen
 
Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...
Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...
Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...Community IT Innovators
 
Selecting Nonprofit Software: Technology Comes Last
Selecting Nonprofit Software: Technology Comes LastSelecting Nonprofit Software: Technology Comes Last
Selecting Nonprofit Software: Technology Comes LastCommunity IT Innovators
 
Scale at Reddit: Triple Your Team Size Without Losing Control
Scale at Reddit: Triple Your Team Size Without Losing ControlScale at Reddit: Triple Your Team Size Without Losing Control
Scale at Reddit: Triple Your Team Size Without Losing ControlAtlassian
 
Power the Connected Enterprise with Cloud Integration and Master Data Managem...
Power the Connected Enterprise with Cloud Integration and Master Data Managem...Power the Connected Enterprise with Cloud Integration and Master Data Managem...
Power the Connected Enterprise with Cloud Integration and Master Data Managem...Darren Cunningham
 
Control System Operated by Computer.pptx
Control System Operated by Computer.pptxControl System Operated by Computer.pptx
Control System Operated by Computer.pptxSonetgin
 

Ähnlich wie DevOps Kaizen: Practical Steps to Start & Sustain an Organization’s Transformation (20)

William MacLeod Professional Profile
William MacLeod Professional ProfileWilliam MacLeod Professional Profile
William MacLeod Professional Profile
 
Ce Bit Sydney May2009 V1
Ce Bit Sydney May2009 V1Ce Bit Sydney May2009 V1
Ce Bit Sydney May2009 V1
 
Sfo
SfoSfo
Sfo
 
It Assessment
It AssessmentIt Assessment
It Assessment
 
Randy Powerpoint1v.1
Randy Powerpoint1v.1Randy Powerpoint1v.1
Randy Powerpoint1v.1
 
Randy Powerpoint1v.1
Randy Powerpoint1v.1Randy Powerpoint1v.1
Randy Powerpoint1v.1
 
The Best Laid Incentive Plans
The Best Laid Incentive PlansThe Best Laid Incentive Plans
The Best Laid Incentive Plans
 
Top tools for virtual organisation
Top tools for virtual organisationTop tools for virtual organisation
Top tools for virtual organisation
 
Ip Summit Thought Leadership
Ip Summit Thought LeadershipIp Summit Thought Leadership
Ip Summit Thought Leadership
 
Enterprise 2.0: The new face of CRM
Enterprise 2.0: The new face of CRMEnterprise 2.0: The new face of CRM
Enterprise 2.0: The new face of CRM
 
Patternbuilders Founder Showcase Deck
Patternbuilders Founder Showcase DeckPatternbuilders Founder Showcase Deck
Patternbuilders Founder Showcase Deck
 
HWZ-Darden Konferenz: Strategy Execution
HWZ-Darden Konferenz: Strategy ExecutionHWZ-Darden Konferenz: Strategy Execution
HWZ-Darden Konferenz: Strategy Execution
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinism
 
Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...
Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...
Improving Nonprofit CRM Data Management in 2019 - Build Consulting and Commun...
 
Selecting Nonprofit Software: Technology Comes Last
Selecting Nonprofit Software: Technology Comes LastSelecting Nonprofit Software: Technology Comes Last
Selecting Nonprofit Software: Technology Comes Last
 
Scale at Reddit: Triple Your Team Size Without Losing Control
Scale at Reddit: Triple Your Team Size Without Losing ControlScale at Reddit: Triple Your Team Size Without Losing Control
Scale at Reddit: Triple Your Team Size Without Losing Control
 
Accenture
AccentureAccenture
Accenture
 
Power the Connected Enterprise with Cloud Integration and Master Data Managem...
Power the Connected Enterprise with Cloud Integration and Master Data Managem...Power the Connected Enterprise with Cloud Integration and Master Data Managem...
Power the Connected Enterprise with Cloud Integration and Master Data Managem...
 
Sightix
SightixSightix
Sightix
 
Control System Operated by Computer.pptx
Control System Operated by Computer.pptxControl System Operated by Computer.pptx
Control System Operated by Computer.pptx
 

Mehr von Rundeck

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps Rundeck
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationRundeck
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckRundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & AnsibleRundeck
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...Rundeck
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control PoliciesRundeck
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckRundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4 Rundeck
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...Rundeck
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Rundeck
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuRundeck
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response Rundeck
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Rundeck
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Rundeck
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck OverviewRundeck
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationRundeck
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings Rundeck
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Rundeck
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
 

Mehr von Rundeck (20)

Rundeck Community Office Hours: Using Variables with Job Steps
Rundeck Community Office Hours:  Using Variables with Job Steps Rundeck Community Office Hours:  Using Variables with Job Steps
Rundeck Community Office Hours: Using Variables with Job Steps
 
Introducing PagerDuty Process Automation
Introducing PagerDuty Process AutomationIntroducing PagerDuty Process Automation
Introducing PagerDuty Process Automation
 
How to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in RundeckHow to Build a Custom Plugin in Rundeck
How to Build a Custom Plugin in Rundeck
 
Lunch and learn: Getting started with Rundeck & Ansible
Lunch and learn:  Getting started with Rundeck & AnsibleLunch and learn:  Getting started with Rundeck & Ansible
Lunch and learn: Getting started with Rundeck & Ansible
 
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...Self Service Cloud Operations:  Safely Delegate the Management of your Cloud ...
Self Service Cloud Operations: Safely Delegate the Management of your Cloud ...
 
Rundeck Office Hours: Best Practices Access Control Policies
Rundeck Office Hours:  Best Practices Access Control PoliciesRundeck Office Hours:  Best Practices Access Control Policies
Rundeck Office Hours: Best Practices Access Control Policies
 
Mastering Secrets Management in Rundeck
Mastering Secrets Management in RundeckMastering Secrets Management in Rundeck
Mastering Secrets Management in Rundeck
 
What's New in Rundeck 3.4
What's New in Rundeck 3.4   What's New in Rundeck 3.4
What's New in Rundeck 3.4
 
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...Automate Yourself Out of a Job:  Safely Delegate the Management of your Azure...
Automate Yourself Out of a Job: Safely Delegate the Management of your Azure...
 
Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation Super-Charge Your Site Reliability Practices with Runbook Automation
Super-Charge Your Site Reliability Practices with Runbook Automation
 
Introduction to Rundeck
Introduction to Rundeck Introduction to Rundeck
Introduction to Rundeck
 
Automated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + SensuAutomated Remediation with Rundeck + Sensu
Automated Remediation with Rundeck + Sensu
 
Modernizing Incident Response
Modernizing Incident Response Modernizing Incident Response
Modernizing Incident Response
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
 
Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020Datadog + Rundeck at DASH 2020
Datadog + Rundeck at DASH 2020
 
Rundeck Overview
Rundeck OverviewRundeck Overview
Rundeck Overview
 
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital TransformationEmpower Devs, Simplify Ops, and Accelerate your Digital Transformation
Empower Devs, Simplify Ops, and Accelerate your Digital Transformation
 
Advanced Cluster Settings
Advanced Cluster Settings Advanced Cluster Settings
Advanced Cluster Settings
 
Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration Maximizing Your Rundeck Migration
Maximizing Your Rundeck Migration
 
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...
 

Kürzlich hochgeladen

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

DevOps Kaizen: Practical Steps to Start & Sustain an Organization’s Transformation

  • 1. DevOps Kaizen: Practical Steps to Start & Sustain an Organization’s Transformation Damon Edwards @damonedwards
  • 4. High Performing CompaniesPractices & Behaviors Gene Kim … but WHY are they different?
  • 5. The ability to improve.
  • 6. The unique trait of high-performing companies is that they are good at getting better.
  • 7. Improvement already has a well known recipe: Plan - Do - Study - Act (PDSA) Other variants: PDCA OODA W. Edwards Deming - 1950 © The Deming Institute
  • 8. Why are so many organizations unable to improve?
  • 9. 1. The work isn’t visible Why are so many organizations unable to improve?
  • 10. 1. The work isn’t visible 2. People are working out of context Why are so many organizations unable to improve?
  • 11. 1. The work isn’t visible 2. People are working out of context 3. Inertia is pulling your org out of alignment Why are so many organizations unable to improve?
  • 12. ThomasJefferson CEO DwightEisenhower BusinessOps FrankRoosevelt TechOps SusanAnthony SoftwareDev. RonaldReagan Sales GeorgeBush MarCom AbrahamLincoln CFO BarackObama CIO BarbaraBush Finance LyndonJohnson BizDev CalvinCoolridge Operations BillClinton USSales JackKennedy EuropeanSales JimmyCarter AsiaPacSales GeraldFord Product MichelleObama Marketing HarryTruman HR HowardTaft CoreSystems WarrenHarding WebSystems ChesterArthur Arch&DBA TedRoosevelt Infrastructure BenHarrison Networks WilMcKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Org Charts Reference Architecture Strategy & Budget Top Secret Strategy & Budget Top Secret Documented Processes Project Plans Traditional “visibility” Release Trains
  • 13. ThomasJefferson CEO DwightEisenhower BusinessOps FrankRoosevelt TechOps SusanAnthony SoftwareDev. RonaldReagan Sales GeorgeBush MarCom AbrahamLincoln CFO BarackObama CIO BarbaraBush Finance LyndonJohnson BizDev CalvinCoolridge Operations BillClinton USSales JackKennedy EuropeanSales JimmyCarter AsiaPacSales GeraldFord Product MichelleObama Marketing HarryTruman HR HowardTaft CoreSystems WarrenHarding WebSystems ChesterArthur Arch&DBA TedRoosevelt Infrastructure BenHarrison Networks WilMcKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Org Charts Reference Architecture Strategy & Budget Top Secret Strategy & Budget Top Secret Documented Processes Project Plans Traditional “visibility” Release Trains Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings
  • 14. ThomasJefferson CEO DwightEisenhower BusinessOps FrankRoosevelt TechOps SusanAnthony SoftwareDev. RonaldReagan Sales GeorgeBush MarCom AbrahamLincoln CFO BarackObama CIO BarbaraBush Finance LyndonJohnson BizDev CalvinCoolridge Operations BillClinton USSales JackKennedy EuropeanSales JimmyCarter AsiaPacSales GeraldFord Product MichelleObama Marketing HarryTruman HR HowardTaft CoreSystems WarrenHarding WebSystems ChesterArthur Arch&DBA TedRoosevelt Infrastructure BenHarrison Networks WilMcKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Org Charts Reference Architecture Strategy & Budget Top Secret Strategy & Budget Top Secret Documented Processes Project Plans Traditional “visibility” Release Trains Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings MeetingsMeetings
  • 15. ThomasJefferson CEO DwightEisenhower BusinessOps FrankRoosevelt TechOps SusanAnthony SoftwareDev. RonaldReagan Sales GeorgeBush MarCom AbrahamLincoln CFO BarackObama CIO BarbaraBush Finance LyndonJohnson BizDev CalvinCoolridge Operations BillClinton USSales JackKennedy EuropeanSales JimmyCarter AsiaPacSales GeraldFord Product MichelleObama Marketing HarryTruman HR HowardTaft CoreSystems WarrenHarding WebSystems ChesterArthur Arch&DBA TedRoosevelt Infrastructure BenHarrison Networks WilMcKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Thomas Jefferson CEO Dwight Eisenhower Business Ops Frank Roosevelt TechOps Susan Anthony Software Dev. Ronald Reagan Sales George Bush MarCom Abraham Lincoln CFO Barack Obama CIO Barbara Bush Finance Lyndon Johnson BizDev Calvin Coolridge Operations Bill Clinton US Sales Jack Kennedy European Sales Jimmy Carter AsiaPac Sales Gerald Ford Product Michelle Obama Marketing Harry Truman HR Howard Taft Core Systems Warren Harding Web Systems Chester Arthur Arch & DBA Ted Roosevelt Infrastructure Ben Harrison Networks Wil McKinley SRE Org Charts Reference Architecture Strategy & Budget Top Secret Strategy & Budget Top Secret Documented Processes Project Plans Traditional “visibility” Release Trains Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings Meetings MeetingsMeetings The Illusion of Control
  • 16. It’s a complex system2
  • 19. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning
  • 20. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning
  • 21. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning Handoff ! Handoff ! Handoff !
  • 22. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning Application Knowledge Handoff ! Handoff ! Handoff !
  • 23. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning Application Knowledge Operational Knowledge Handoff ! Handoff ! Handoff !
  • 24. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning Application Knowledge Operational Knowledge Business Intent Handoff ! Handoff ! Handoff !
  • 25. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning Application Knowledge Operational Knowledge Business Intent Handoff ! Handoff ! Handoff ! Ownership but limited Accountability
  • 26. Enterprises naturally trend towards silos Dev / Test Release OperatePlanning Application Knowledge Operational Knowledge Business Intent Handoff ! Handoff ! Handoff ! Ownership but limited Accountability Accountability but no Ownership
  • 27. 1. The work isn’t visible 2. People are working out of context 3. Inertia is pulling your org out of alignment ++ Why are so many organizations unable to improve?
  • 28. The only way to fix a sufficiently complex system is to create the conditions for the system to fix itself.
  • 29.
  • 30. “I know the answer!…”
  • 32. Too costly… outsource more! Finance We need results… re-org until we do! Executive Committee “I know the answer!…”
  • 33. Too costly… outsource more! Finance More discipline… tighter process and more approvals Change Management We need results… re-org until we do! Executive Committee “I know the answer!…”
  • 34. Too costly… outsource more! Finance More discipline… tighter process and more approvals Change Management We need results… re-org until we do! Executive Committee Need better tools… new automation and a new network!! Engineers “I know the answer!…”
  • 35. The “Big Bang” Transformation Dream Start Finish
  • 36. The “Big Bang” Transformation Reality Start Goal Fear Panic Abort Maybe
  • 37. The “Big Bang” Transformation Reality Start Goal Fear Panic Abort Maybe People revert to legacy behaviors
  • 38. The “Big Bang” Transformation Reality Start Goal Fear Panic Abort Maybe People revert to legacy behaviors Declare “Done”
  • 39. The “Big Bang” Transformation Reality Start Goal Fear Panic Abort Maybe People revert to legacy behaviors Declare “Done”
  • 40. More discipline… tighter process and more approvals Need Results… Re-Org! Need better tools… cool automation and a new network!! Too costly… outsource more! Finance Change Management Executive Committee Engineers
  • 41. More discipline… tighter process and more approvals Need Results… Re-Org! Need better tools… cool automation and a new network!! Too costly… outsource more! Finance Change Management Executive Committee Engineers
  • 42. “Little J’s” instead of “Big J” Start Finish Start Finish “Big Bang” Continuous Improvement Fear Panic Abort Maybe
  • 43. “Little J’s” instead of “Big J” Start Finish Start Finish “Big Bang” Continuous Improvement Fear Panic Abort Maybe
  • 44. “Little J’s” instead of “Big J” Start Finish Start Finish “Big Bang” Continuous Improvement Fear Panic Abort Maybe
  • 45. Turn Continuous Improvement into an enterprise program You are going to have to…
  • 46. Turn Continuous Improvement into an enterprise program •Keep improvement efforts aligned You are going to have to…
  • 47. Turn Continuous Improvement into an enterprise program •Keep improvement efforts aligned •Scale quickly You are going to have to…
  • 48. Turn Continuous Improvement into an enterprise program •Keep improvement efforts aligned •Scale quickly •Span multiple organizational boundaries You are going to have to…
  • 49. Turn Continuous Improvement into an enterprise program •Keep improvement efforts aligned •Scale quickly •Span multiple organizational boundaries •Work with substantial numbers of legacy technologies You are going to have to…
  • 50. Turn Continuous Improvement into an enterprise program •Keep improvement efforts aligned •Scale quickly •Span multiple organizational boundaries •Work with substantial numbers of legacy technologies •Develop your existing staff in mass You are going to have to…
  • 51. Turn Continuous Improvement into an enterprise program •Keep improvement efforts aligned •Scale quickly •Span multiple organizational boundaries •Work with substantial numbers of legacy technologies •Develop your existing staff in mass •Be self-funding after initial seed investment You are going to have to…
  • 52. 1. The work isn’t visible 2. People are working out of context 3. Inertia is pulling your org out of alignment But how do you do that when…
  • 53. You need a systemic way to teach an organization to find and fix what is getting in its own way.
  • 55. “Kaizen” • Kaizen: Japanese word for improvement
  • 56. “Kaizen” • Kaizen: Japanese word for improvement • Modern business context: • Continuous improvement • Systematic, scientific-method approach • Total engagement of the workforce • Valuing small changes as much as large changes (outcome is what matters)
  • 57. “Kaizen” • Kaizen: Japanese word for improvement • Modern business context: • Continuous improvement • Systematic, scientific-method approach • Total engagement of the workforce • Valuing small changes as much as large changes (outcome is what matters) • Kaizen in a DevOps context: • Continuously improve the flow of work through the full value stream in order to improve customer outcomes
  • 58. Proven Lean Techniques + DevOps Context “If I have seen further, it is by standing on the shoulders of giants.” -Sir Isaac Newton “DevOps Kaizen”
  • 61. Organization-wide focus on service delivery metrics • Lead Time (Duration and Predictability) • MTTD (Mean Time To Detect) • MTTR (Mean Time to Repair, Mean Time to Fix) • Quality at the Source (Scrap/Rework) OpsDev Customer Outcome Business Goal
  • 64. Retrospectives are a per value stream tool Value Stream A Value Stream B Value Stream C Key: “horizontal thinking”
  • 65. Map end-to-end process1 Include key process metrics: Lead Time Processing Time Scrap Rate Head Count DevOps Kaizen: Retrospective Technique
  • 66. Map end-to-end process1 Include key process metrics: Lead Time Processing Time Scrap Rate Head Count DevOps Kaizen: Retrospective Technique Note: “going to the gemba” requires making it visible together
  • 67. Map end-to-end process1 Include key process metrics: Lead Time Processing Time Scrap Rate Head Count DevOps Kaizen: Retrospective Technique Key: graphical facilitation above all else! Note: “going to the gemba” requires making it visible together
  • 68. Map end-to-end process1 Include key process metrics: Lead Time Processing Time Scrap Rate Head Count DevOps Kaizen: Retrospective Technique Key: graphical facilitation above all else! Note: “going to the gemba” requires making it visible together Inspiration: value stream mapping
  • 69. Identify wastes, inefficiencies, bottlenecks PD - Partially Done TS - Task Switching W - Waiting M - Motion / Manual D - Defects EP - Extra Process EF - Extra Features HB - Heroics Structured approach building on DevOps adaptation of “7 deadly wastes” from Lean / Agile: 2 DevOps Kaizen: Retrospective Technique
  • 70. Identify wastes, inefficiencies, bottlenecks PD - Partially Done TS - Task Switching W - Waiting M - Motion / Manual D - Defects EP - Extra Process EF - Extra Features HB - Heroics Structured approach building on DevOps adaptation of “7 deadly wastes” from Lean / Agile: 2 DevOps Kaizen: Retrospective Technique Key: focus on flow of value… not gripes
  • 71. Identify wastes, inefficiencies, bottlenecks PD - Partially Done TS - Task Switching W - Waiting M - Motion / Manual D - Defects EP - Extra Process EF - Extra Features HB - Heroics Structured approach building on DevOps adaptation of “7 deadly wastes” from Lean / Agile: 2 DevOps Kaizen: Retrospective Technique Key: focus on flow of value… not gripes Inspiration: 7 Wastes of Software Development
  • 72. DevOps Kaizen: Retrospective Technique Identify countermeasures Countermeasures must be actionable, backlog ready. Focus on short-term “baby steps”. Note broader, strategic recommendations. 3
  • 73. DevOps Kaizen: Retrospective Technique Identify countermeasures Countermeasures must be actionable, backlog ready. Focus on short-term “baby steps”. Note broader, strategic recommendations. 3 Key: “small j’s, not big j’s”
  • 74. DevOps Kaizen: Retrospective Technique Create Improvement Storyboards (Kata Style)4
  • 75. DevOps Kaizen: Retrospective Technique Create Improvement Storyboards (Kata Style)4 Key: actionable short-term “baby steps”… “what are we going to do next?”
  • 77. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71%
  • 78. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development D In D fo Scrum Dev/QA Business Trigger Major Feature for Partner Markets 17 weeks 16 weeks 6 weeks H/C: 6 3 w Requirements / Planning Dev / Test Data Setup Inte 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation Service Verification test writing: shif (test early) Remove Bottleneck and Environme (test more) EP D TS EP M D PD M W TS 1. 2. 3. S/R - 90% S/R - 71%
  • 79. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71%
  • 80. Leads PMs gn ments etail for their team e rd Chief chitect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) CA or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Jira (if new arch) 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks Dev / Test Staging Data Setup Integration Testing 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP M D PD M W TS TS D M TS 3. 4. 5. 6. S/R - 90% S/R - 55% S/R - 32% D
  • 81. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71%
  • 82. ire / needed a Data Setup "Jennifer" Test Data Configuration Manager Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" (if new arch) 40 weeks total 16 weeks weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Dev / Test Staging Release Prod Release Data Setup Integration Testing 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool vice Verification test writing: shift left to Dev (test early) move Bottleneck and Environment Contention (test more) D PD M W TS TS D M TS PD M M 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43%
  • 83. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71%
  • 84. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71% + Work in small batches + Early Ops Involvement + Standardized Catalog (with design standards built-in)
  • 85. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71% + Work in small batches + Early Ops Involvement + Standardized Catalog (with design standards built-in) +Dev provide service verification tests +Ops provide environment verification tests (used by Dev and QA) +Service service test data setup (including mainframe)
  • 86. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71% + Work in small batches + Early Ops Involvement + Standardized Catalog (with design standards built-in) +Dev provide service verification tests +Ops provide environment verification tests (used by Dev and QA) +Service service test data setup (including mainframe) +All deploys use Dev provided service verification tests +Deployment rehearsals using service verification & environment verification tests in all lower environments
  • 87. Customers and Clients Upcoming Market Event Chief Strategy Officer Strategize Product Roadmap Weekly Exec Meeting Ad-Hoc P.M "John" Program "Susan" Product Requirements Defined Jira "Stories" ARB Queue Track Ticket Dependencies Confluence Pages design docs Team Leads and PMs Assign Requirements more detail for their team Architecture Review Board Chief Architect "Linda" Working Group Ops ? (sometimes) Devs, PM, Engr, QA Development Sprint 2 week c/t Existing Dev Environments Acquire / Prepare needed data DBA in TechOps Data Setup "Jennifer" Test Data Configuration Manager Development Deploy to Integration Dev, QA Integration & Regression Testing focused on service Scrum Dev/QA Integ03 Scrum Dev/QA Test Link Sprint Review Release to Prod Product Owners (Using own criteria) Create CAB ticket or Scrum Team Ops Team (if legacy) Push Deployment to Stage Stage Email Notification Jira NewArch Build VMs Jira Ops Service Now Legacy QA Lead PMs QAs End to end testing in Prod Prod Env Prd DB Go-No Go decision meeting Team Leads Jira Ops By Cluster "Remove Feature Flag" Business Trigger Major Feature for Partner Markets (if new arch) 40 weeks total 17 weeks 16 weeks 6 weeks H/C: 6 3 weeks H/C: 8 4 weeks H/C:8 3 weeks H/C: 14 Requirements / Planning Dev / Test Staging Release Prod Release Data Setup Integration Testing 6. Service Aligned Team Concept? 4. Write end-to- end customer func. tests 5. Resolve interface to legacy 1. Test data setup automation 3. Dev to Prod for legacy 2. Unify change management tools 7. Tool Service Verification test writing: shift left to Dev (test early) Remove Bottleneck and Environment Contention (test more) EP D TS EP M D PD M W TS TS D M TS PD M M 1. 2. 3. 4. 5. 6. 7. 8. S/R - 90% S/R - 55% S/R - 32% D S/R - 43% S/R - 71% + Work in small batches + Early Ops Involvement + Standardized Catalog (with design standards built-in) Key: “What can we do next?” NOT “what is nirvana?” +Dev provide service verification tests +Ops provide environment verification tests (used by Dev and QA) +Service service test data setup (including mainframe) +All deploys use Dev provided service verification tests +Deployment rehearsals using service verification & environment verification tests in all lower environments
  • 88. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours Request
  • 89. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ~2p Send Biz Communication Emails Update BRM Bridge Call Update Update Update "Scribes" Notes .doc Update PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people H M PD TS PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people H M PD TS 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Stre Communication i War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 every 4 hours Request
  • 90. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours Request
  • 91. Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures nability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! nability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours
  • 92. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours Request
  • 93. ProblemSubside ProblemSpik ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubside ProblemSpik ~2pm ~5am 1/7/16 "Scribes" Notes .doc Chat Room Cycles "Focused on Stored Procedures but not confident why" Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent (assum • Upgrade t (~Q3 2016 • Rewrite D to remove procedure s with any "). "Fight place! s with any "). "Fight place! 3. Improved code review Also In-flight: • Peer review for every chec • Automation tooling and SD DB changes (no more adho War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF IN $1,012,000 ++
  • 94. ProblemSubside ProblemSpik ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubside ProblemSpik ~2pm ~5am 1/7/16 "Scribes" Notes .doc Chat Room Cycles "Focused on Stored Procedures but not confident why" Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent (assum • Upgrade t (~Q3 2016 • Rewrite D to remove procedure s with any "). "Fight place! s with any "). "Fight place! 3. Improved code review Also In-flight: • Peer review for every chec • Automation tooling and SD DB changes (no more adho War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF IN $1,012,000 ++
  • 95. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours Request
  • 96. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours Request + Automated service verification and performance health checks starting in Dev +Ops added to code peer reviews +Streamlined war rom and bridge call communication
  • 97. "MyAccount Perf Incident" Customer Call Contact Center !!! Contact Center Call Service Desk Service Desk History: 12/20/15 - Started seeing problems 12/29/15 - Swarm of problems then everyone on vacation 9am Monday 1/4/16 - everyone back to work and problems spike Try to recreate errors Contact Service Desk Try to recreate errors Check Monitoring BMC "All green" (up/down monitoring) TixTixTix "stuff isn't working!" Many tickets Many calls Service Now Tickets Start Bridge Callcall someone call someone else Call Vendor Support Start Biz Management Bridge Assemble War Room "Terri" Dir. Account Reps Incident Commander NOC 2 hours 1/5/16 9 AM 11 AM Web Dev Fix Test Services Fix Test Portal Ops Fix Test DBA Ops Fix Test Network Fix Test Vendor Partners Fix Test Investigate DDOS (Security) Test Incident Commander ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/6/16 Repeat War Room Cycles ProblemSubsidesforDay ProblemSpikes!! ~2pm ~5am 1/7/16 Send Biz Communication Emails Update BRM Bridge Call Update Update Update Update "Scribes" Notes .doc Chat Repeat War Room Cycles "Focused on Stored Procedures but not confident why" Repeat War Room Cycles "Vendor consultant provides config changes" "Stored Procedure s used in way never seen before!" • Vendor Consultant onsite for 3 days • QA tried to simulate load • Tried adding capacity (3 app servers and web) • Tried disabling M-F Every Day! 1/13/16 UpdateUpdate Update QA Run Load Tests to Confirm Fix (After Hours) Prod 1/13/16-1/15/16 Additional Fix from Vendor Consultant Permanent Fix Options (assumptions): • Upgrade to latest version (~Q3 2016 go live?) • Rewrite DB layer of App to remove stored procedures PD No user experience monitoring in place No standard application performance tools available early and often Limited database diagnostic tools TS Lots of teams spending many cycles to "prove it wasn't them" TS Too much noise on bridge. Engineers can't focus. Open up side bridges because of difficulty of noise / task switching on main bridge Vendor reps joining calls aren't correct people Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS H M PD TS H M PD TS Can't trace user transactions through the stack No controlled "prod-like" stage env for troubleshooting Vendor provided "fix" is really just a workaround Resolution loop was never really closed. "Bandaid fix" is still in place! PD TS Inability to rollback recent changes with any confidence ("probably impossible?"). "Fight forward" only. No service owner to manage biz communication and determine if resolved HBM PD TS Experimenting directly in production Documentation not up to date Lots of people coming in and out. H M PD TS TS H M PD TS H M PD TS No controlled "prod-like" stage env for troubleshooting Resolution loop was never really closed. "Bandaid fix" is still in place! 1. Service performance monitoring, error detection, and automated health checks starting Dev 6. Platform App Dev leading new deployment strategy to improve traceability and ease rollback 2. Improved vendor SLA management 7. Improved / Streamlined Communication in War Room / Bridge 3. Improved code review 5. "Prod-Like" Pre-Prod environments (with load testing) 8. Follow through on resolution Also In-flight: • Peer review for every check-in • Automation tooling and SDLC for DB changes (no more adhoc scripts) War Room and Bridge Cost (direct labor): M-F: 35 h/c per day for 7 days = $195,000 S,S: 6 h/c per day for 2 days = $11,400 Total: $206,400 Impact on Call Centers: 1500 Agents 30% degraded 8 hrs for 7 days = $806,000 Cost of impact/delay on other inflight projects: Unknown ("feels large", estimated at 20% - 100%) Cost of impact/delay on compliance issues: Unknown Cost of brand damage: Unknown TOTAL COST OF INCIDENT: $1,012,000 ++ every 4 hours Request + Automated service verification and performance health checks starting in Dev +Ops added to code peer reviews +Streamlined war rom and bridge call communication + “Prod-like” testing environments (with load testing) +Automation and SDLC for database changes +Follow through on resolution to achieve permanent fixes
  • 98. Improvement Storyboards Process Name Challenge/Key Pain Target Condition Current Condition Improvement Metrics Work ToDo (Baby Steps) Blockers Changes are being introduced / tested in production the first time causing delays, rework, outages Target Condition Improvement Metrics Work ToDo (Baby Steps) Blockers GTM/LTM (Traffic manager configuration process) • Apps are not developed in production-like environments (not testing F5 behavior) • Ops teams cannot practice or learn • App teams have no visibility into constraints • No remediation capabilities for app support teams • No repeatable pattern for GM health activity • 80% S/R with 2-3 rework cycles • 50% cause outages Current Condition • GM/TLM functionality across all SDLC environments (capex request needed) • change window reduction for non- prod environments (turn those around instantaneously less than 13 days) • Provide read-only to all F5 consoles • Standardize GM pattern • Lead Time (post-dev to prod) • Scrap Rate • Acquire the F5 hardware or software to support envs throughout SDLC • Make these changes L3 or 5 change requests • Write automation scripts • provide read only access to all environments… can include API access to facilitate automation script writing • Create design template with customer pattern ◦ Financial approval (Jennifer) ◦ Segregation between environments (Mark) ◦ Non-standard request types (Susan) ◦ Two network teams with different rules (Mark) Process Challenge/Key Pain Template Example
  • 99. Improvement Storyboards Process Name Challenge/Key Pain Target Condition Current Condition Improvement Metrics Work ToDo (Baby Steps) Blockers Changes are being introduced / tested in production the first time causing delays, rework, outages Target Condition Improvement Metrics Work ToDo (Baby Steps) Blockers GTM/LTM (Traffic manager configuration process) • Apps are not developed in production-like environments (not testing F5 behavior) • Ops teams cannot practice or learn • App teams have no visibility into constraints • No remediation capabilities for app support teams • No repeatable pattern for GM health activity • 80% S/R with 2-3 rework cycles • 50% cause outages Current Condition • GM/TLM functionality across all SDLC environments (capex request needed) • change window reduction for non- prod environments (turn those around instantaneously less than 13 days) • Provide read-only to all F5 consoles • Standardize GM pattern • Lead Time (post-dev to prod) • Scrap Rate • Acquire the F5 hardware or software to support envs throughout SDLC • Make these changes L3 or 5 change requests • Write automation scripts • provide read only access to all environments… can include API access to facilitate automation script writing • Create design template with customer pattern ◦ Financial approval (Jennifer) ◦ Segregation between environments (Mark) ◦ Non-standard request types (Susan) ◦ Two network teams with different rules (Mark) Process Challenge/Key Pain Inspiration: A3 management process Template Example
  • 100. Using Storyboards: Part “Sales”, Part Coaching Coaching (Not Solutions) Facts and Next Steps Learner Coach / Leader What is the target condition? What is the actual condition now? What obstacles do you think are stopping you from reaching target condition? What is your next step? When will we know what was learned from the next step? Asks Questions Maintain Storyboard Answer / Explain Process Name Challenge/Key Pain Target Condition Current Condition Improvement Metrics Work ToDo (Baby Steps) Blockers
  • 101. Using Storyboards: Part “Sales”, Part Coaching Coaching (Not Solutions) Facts and Next Steps Learner Coach / Leader What is the target condition? What is the actual condition now? What obstacles do you think are stopping you from reaching target condition? What is your next step? When will we know what was learned from the next step? Asks Questions Maintain Storyboard Answer / Explain Process Name Challenge/Key Pain Target Condition Current Condition Improvement Metrics Work ToDo (Baby Steps) Blockers Inspiration: Toyota Kata
  • 103. 1. The will to make change happen 2. The resources to make change happen 3. Drive follow-through / clear obstacles Kaizen Program Oversight
  • 104. 1. The will to make change happen 2. The resources to make change happen 3. Drive follow-through / clear obstacles Kaizen Program Oversight This (and only this) is what the Kaizen Program Oversight Group does!
  • 105. 1. The will to make change happen 2. The resources to make change happen 3. Drive follow-through / clear obstacles Kaizen Program Oversight
  • 106. 1. The will to make change happen 2. The resources to make change happen 3. Drive follow-through / clear obstacles Kaizen Program Oversight Inspire Executives with:
  • 108. DevOps Kaizen Program is an overlay for any delivery methodology Project 1 Project 2 Project 3 Sprint 1 Improvement Program Oversight (Strategy) Team A Full Retrospective & Planning Sprint 2 Sprint 3 Sprint 4 Sprint 5 Sprint 6 Sprint 7 Sprint 8 Team B Refresh Retrospective
  • 109. DevOps Kaizen: Let’s Recap! Make the work visible Focus on Continuous Improvement Service Delivery Metrics Kaizen Program Oversight Planning & Retrospectives Informs Informs Countermeasures & Blockers Establish program elements Project 1 Project 2 Project 3 Sprint 1 Improvement Program Oversight (Strategy) Team A Full Retrospective & Planning Sprint 2 Sprint 3 Sprint 4 Sprint 5 Sprint 6 Sprint 7 Sprint 8 Team B Refresh Retrospective Build into your operating model
  • 110. Join me tomorrow! 10:15 in Victoria Suite Helping Ops Help You: Development’s Role in Enabling Self-Service Operations