6. The Problem of Robust
• Robust is just Fragile with a thicker skin…
• Encourages a defensive, static mindset
• Resistant to change?
• Vulnerable to “Black Swan” events…
– Something we haven’t anticipated
– A failure mode we could not have foreseen
– A cascade of errors that we did not plan for
9. Anti-fragile
Some things benefit from
shocks…volatility, randomness, disorder,
and stressors and love adventure, risk,
and uncertainty… there is no word for
the exact opposite of fragile. Let’s call it
antifragile.
Nassim N. Taleb, “Antifragile: Things That Gain from
Disorder”
19. Cross IDC Active - Active
[Diagram. Components: GLSB; in each of DC 1 and DC 2, a DC Aware Gateway, an SOA Edge Service, an SOA Middle Tier Service, a DC Aware Client, and a Service Registry. Arrow labels: Invoke, Lookup, Register, and Peer Sync.]
20. Anti-fragile Continuous
Integration/Delivery Pattern
… to exert a constant stress on your
delivery and deployment process to
reduce its fragility so that releasing
becomes a boring, low-risk activity.
Jez Humble, “On Antifragility in Systems and
Organizational Architecture”
http://continuousdelivery.com/2013/01/on-antifragility-in-systems-and-organizational-architecture/
21. Building Distributed Systems Is
Extremely Hard
• Even Harder to Test Sufficiently
– Massive data sets and changing shape
– Internet-scale traffic
– Complex interaction and information flow
– Asynchronous nature
– 3rd party services
– All while innovating and building features
Prohibitively expensive, if not impossible, for
most large-scale systems.
22. There Is Another Way
• Assume everything will fail
• Cause failure to validate resiliency
• Test design assumptions by stressing them
• Don’t wait for random failure. Remove its
uncertainty by forcing it periodically.
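The bullets above can be illustrated with a toy failure drill. This is a minimal sketch, not any real tool: `Cluster`, `chaos_round`, and `run_drill` are hypothetical names, and the "service" is just an in-memory set of instance names. It only shows the idea of killing instances on purpose and checking that requests are still served.

```python
import random


class Cluster:
    """Toy model of a service running on interchangeable instances (hypothetical)."""

    def __init__(self, names):
        self.healthy = set(names)

    def kill(self, name):
        # Simulate an instance failure.
        self.healthy.discard(name)

    def serve(self, request):
        # Any healthy instance can answer; fail only when none is left.
        if not self.healthy:
            raise RuntimeError("no healthy instance available")
        instance = sorted(self.healthy)[0]
        return f"{request} handled by {instance}"


def chaos_round(cluster, rng):
    """Kill one randomly chosen healthy instance and return its name."""
    victim = rng.choice(sorted(cluster.healthy))
    cluster.kill(victim)
    return victim


def run_drill(instance_count=3, kills=2, seed=0):
    """Force failures on purpose and check that the service still answers.

    Returns the number of instances still healthy after the drill.
    Raises RuntimeError if the resiliency assumption does not hold.
    """
    rng = random.Random(seed)
    cluster = Cluster(f"i-{n}" for n in range(instance_count))
    for _ in range(kills):
        chaos_round(cluster, rng)
        cluster.serve("GET /health")  # raises if no instance survived
    return len(cluster.healthy)
```

Running `run_drill(3, 2)` leaves one healthy instance and passes, while `run_drill(3, 3)` raises, which is the kind of signal a forced, periodic failure is meant to surface before a real outage does.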
23. What Netflix has Done – Embrace
Chaos!
“One of the first systems our engineers
built in AWS is called the Chaos Monkey.
The Chaos Monkey’s job is to randomly kill
instances and services within our
architecture.
If we aren’t constantly testing our ability
to succeed despite failure, then it isn’t
likely to work when it matters most – in
the event of an unexpected outage.”
http://luckyrobot.com/netflix-chaos-monkey-keeps-movies-streaming/
http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html
25. Representative Anti-fragile
Organization
The Netflix cloud architecture is anti-fragile… The
Netflix culture is anti-fragile… Getting stronger
through failure is the basis of anti-fragility. Avoiding
failure at all costs… makes you brittle and
vulnerable…
Adrian Cockcroft, “Looking back at 2012 with
pointers to 2013”
http://perfcap.blogspot.com/2013/12/looking-back-at-2013-with-pointers-to.html
26. Architecture for Imperfection
A highly agile and highly available service constructed from ephemeral
and often broken components. It is a service-oriented architecture
built on micro-services, none of which are essential to the operation of
the whole.
The software is written to run across three Amazon datacenters, and
will tolerate the loss of any one. We can lose a third of our
infrastructure without our customers noticing and calling customer
services. It’s no idle claim: Netflix even tests this aspect of its
infrastructure. A few weeks ago the team deliberately killed one of the
three zones, knocking out 3000 servers in one fell swoop, just to prove
that we could do it.
By Adrian Cockcroft, from “Netflix, HANA and the meaning of cloud”
http://diginomica.com/2013/05/13/netflix-hana-and-the-meaning-of-cloud/
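The claim of tolerating the loss of any one of three datacenters implies a capacity rule, sketched below. The assumptions are mine, not the source's: zones are equally sized, load can be redistributed freely, and capacity is measured in whole servers. The function names and the peak-load figures in the usage note are hypothetical; only the 3000-servers-per-zone figure comes from the slide.

```python
def max_safe_utilization(zones, zones_lost=1):
    """Highest normal utilization that still leaves room to absorb lost zones.

    Assumes equally sized zones: the surviving zones must carry all the load.
    """
    if zones_lost >= zones:
        raise ValueError("cannot lose every zone and keep serving")
    return (zones - zones_lost) / zones


def survives_zone_loss(servers_per_zone, zones, peak_load_servers, zones_lost=1):
    """True if the surviving zones have enough servers for the peak load."""
    remaining = servers_per_zone * (zones - zones_lost)
    return remaining >= peak_load_servers
```

With three zones, `max_safe_utilization(3)` is two thirds, so each zone has to run with about a third of its capacity spare. For three zones of 3000 servers, `survives_zone_loss(3000, 3, 6000)` is true, while a hypothetical peak needing 6500 servers would not survive the loss of a zone.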
27. Netflix Global Active – Active
Cloud Architecture
http://awsmedia.s3.amazonaws.com/ARC305.pdf
28. What on Earth Is DevOps?
Devops means giving a sh*t about your job enough
to not pass the buck.
Devops means giving a sh*t about your job enough
to want to learn all the parts and not just your little
world.
Developers need to understand infrastructure.
Operations people need to understand code.
- John E. Vincent (@Lusis)
http://blog.lusis.org/blog/2013/06/04/devops-the-title-match/
29. “3 Ways” of DevOps
http://itrevolution.com/the-three-ways-principles-underpinning-devops/
30. The First Way
Silo vs. system thinking: focus on the end-to-end
value flow.
31. The Second Way
System improvement via visibility, feedback, and
data-driven decisions.
35. Take Away
1. Obsessive protection of a system against
extremely rare events makes it more fragile.
2. Monoculture is fragile, diversity is anti-fragile.
3. If it hurts, do it more often, and bring the
pain forward.
4. To create an anti-fragile system, stress it
continuously so we are forced to simplify and
automate.
36. Reading for System and Architectural
Thinking – recommended by Adrian
Cockcroft