3. Swarming involves removing the tiers of support and calling on the collective expertise of a “swarm” of analysts.
https://www.serviceinnovation.org/intelligent-swarming/
Swarming defined
@jonhall_
4. [Diagram: Local Product Line Support Teams; Severity 1 Swarm; Local Dispatch Swarm; Prioritise]
Swarming example: BMC’s Sev-1 and Dispatch Swarms
5. • Rapid responders
• Three agents on a scheduled one-week rotation
• Primary focus: Provide immediate response, and resolve as soon as possible
Swarm lead: Communications
Other members: Research, coordinate, test
Severity 1 Swarm
6. • “Cherry pickers”
• Meet every 60-90 minutes
• Primary focus: Can new tickets be resolved immediately?
• Also: Validation of ticket details before assignment to specialists
Experienced analyst / Less-experienced analyst
Dispatch Swarm
7. [Diagram: several Backlog Swarms alongside the Local Product Line Support Teams]
Swarming example: BMC’s “Backlog Swarms”
8. • Global fixers of troublesome tickets
• Meet regularly (often several times a day)
• Primary focus: Challenging 3rd-line tickets
• Replace reassignments and individual assignments
Experienced analysts / R&D Engineers
Backlog Swarms
9. Swarming Example: Drop-in SME support for Service Desk
[Diagram: customer chat sessions; Service Desk Agents; Subject Experts in separate chat channels]
• Regional chat-based service desk at a global Telco
• Agents can put a customer on hold for 3 minutes
• Subject experts wait in “always-on” chat channels
10. Swarming Example: Auto manufacturer’s connected cars team
[Diagram: First Responder; Engineering Team A; Engineering Team B; 3rd Party Suppliers]
• First responder initiates and coordinates swarms for big issues
• Other teams have 1 person on rotation for swarming
• Swarms may also involve 3rd parties (e.g. Amazon, Microsoft)
• Swarm grows and shrinks as needed
Challenge: Scaling from small beginnings to millions of vehicles
11. [Diagram: Application1; Application2; Developers; Support Specialists; Operations Team]
Scenario: Government agency with a growing DevOps initiative
Before transformation…
• Traditional tiered teams for Operations and Support
• Common pool of developers, assigned and reassigned to tasks across multiple projects
Swarming Example: “Always-on” Swarming
12. [Diagram: Application1; Application2; each with Developer, Operator and Support Specialist roles]
Swarming Example: “Always-on” Swarming
Scenario: Government agency with a growing DevOps initiative
After transformation…
• Product, not project thinking
• Team leaders have autonomy to create and change teams
• Support professionals embedded in full-stack teams
13. • Work-in-progress queues
• Asynchronous communication
• Single role teams
• Individual over-exposure
• Lack of knowledge sharing
How to annoy a DevOps practitioner
15. Elite performers compared with low performers:
Deployment frequency: 46x higher
Change lead time: 2555x faster
Mean time to recover: 2604x faster
Change failure rate: 7x lower
ITSM is under significant pressure from DevOps…
2018 State of DevOps Report
16. 2018 State of DevOps Report
But… Service Management has a lot to offer to DevOps
17. • New services and applications suddenly appear
• Lost visibility when issues go to developers
• Lack of knowledge sharing
• New kinds of customer, especially external
DevOps challenges Service Desk orthodoxies…
18. • Scaling customer support
• Understanding the context of an issue
• Adaptation to life “on call”
• What to prioritise? Fix bugs or build new stuff?
• How to process alerts, particularly noisy or low-quality ones
…but enterprise realities challenge DevOps
19. DevOps teams aren’t as ITSM-phobic as some think
“I need to understand drifts, timelines…”
“The person who is on call at 4am needs to know who has been doing what”
“Context is a trigger word for me... in a company of 4000 people, things can get out of hand really fast if you don't have context”
“What is actually running on an environment?”
“If you're dropped in the middle of something, how did you get here?”
(Real quotes from conversations at Configuration Management Camp, Ghent)
20. “The enterprise space doesn’t move slowly because they’re stupid, or they hate technology. It’s because they have users”
Luke Kanies, Puppet Founder, Configuration Management Camp 2015, Belgium.
21. Swarming aligns really well to DevOps
• Autonomy and self-organisation
• Knowledge transfer and skills development
• ChatOps, not email
• Prevention of accumulation of queued work
• Protection of individuals from burnout
22. We face an issue:
The tiered support system constrains ITSM’s
ability to adapt to new practices and thinking.
23. • Pronounced “kuh-nev-in”
• Developed by Dave Snowden while at IBM in 1999
• “A decision support framework which comes from a mixture of complexity theory and cognitive science… the opposite of a one-size fits all model”
Cynefin: An example of new thinking
24.
• Obvious and Complicated domains:
  • Repeating relationship between cause and effect
  • With Complicated you need to do analysis to find that relationship
• Complex domain:
  • Understanding the problem requires experimentation and analysis
  • May, over time, be able to move to Complicated
• Chaotic domain:
  • Dramatic and unconstrained
  • Focus on damage limitation, try to move to another domain
Cynefin “Domains” – an overview
26. “Complicated” Domain
• “Sense, Analyse, Respond”
• Good practice.
• Dispatch-type swarm – pair up agents with varied experience
• Capture detailed knowledge for organisational learning
• Suits a “Dispatch Swarm” type approach?
Swarm Lead / Swarm Assistant
27. • Not acting is not an option: act immediately, observe impact
• Try to move from Chaotic to Complex by introducing constraints
• Chaos may be an opportunity to innovate
[Diagram: Response Lead; Customer Liaison; Damage limitation/restoration; Innovation; Planned Response]
Swarming in response to a Chaotic situation
29. The impact of Complexity
Charity Majors - Observability for emerging infra
Config Management Camp, Ghent 2019
“Distributed systems have an infinite list of almost impossible failure scenarios”
30. Some Complexity theory…
• Complex systems contain mixtures of latent failures
• It’s impossible not to have multiple flaws
• The failures change constantly
• Complex systems run as broken
• Operating complex systems needs human expertise
• Issues have multiple causes, not a single root-cause
“How Complex Systems Fail” (1998) - Richard I. Cook, MD
Cognitive Technologies Laboratory, University of Chicago
31. Complex systems fail in complex ways
“All twenty app services have 10% of nodes enter a simultaneous crash loop cycle, about five times a day, at unpredictable intervals. It clears up before we can debug it, every time”
“We run a platform, and it’s hard to distinguish between problems that users are inflicting on themselves, and problems in our own code, since they all manifest as the same errors or timeouts”
“I have 20 microservices and three datastores across three regions, and everything seems to be getting a little slower over the past 2 weeks …but nothing has changed that we know of. Latency is usually back to the historical norm on Tuesdays”
Who would you assign it to?
Charity Majors - Observability for emerging infra
Config Management Camp, Ghent 2018
32. Identify “coherent” hypotheses
Cynefin approach to a Complex issue
• “Probe, Sense, Respond”
• Identify multiple hypotheses
• Gain understanding of the system by interacting with it
• Create predictability, increase constraints, try to move to Complicated
Convene “safe to fail” experiments
Observe and monitor impact
Amplify good patterns, dampen bad
34. The way forward
• ITSM must adapt to retain relevance and credibility
• Over-constrained, inflexible practices will stifle this adaptation
• ITIL® 4 is a good step forward, giving more room to develop new approaches to practices
• It’s a good time to be an innovative thinker
35. Swarming appearing in ITSM frameworks
ITIL® 4 Foundation (2019)
VeriSM – A service management approach for the digital age (2017)
Teams map to Applications.
Each team is fully responsible for an Application.
Teams form around applications.
Team Leaders determine who’s in the team and what they do.
No more assignment to individuals
Cynefin (a Welsh word) signifies the multiple factors in our environment and our experience that influence us in ways we can never understand
Linux, Apache, MySQL, PHP
Experiments should be parallel – otherwise, because you’re doing something novel, it is likely to be seen to be successful.
Experiments might be naïve
Enabling constraints: channel activity, focus it, enable people to do what they wouldn’t normally.
Dispositional, not causal. Can make statements about the present.