Patterns of Resilience
A small pattern language

Uwe Friedrichsen – codecentric AG – 2014-2016
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
In this slide deck, I first describe what resilience is, why it is important, and how it differs from traditional stability approaches.

After that introduction, the main part is a "small" pattern language organized around isolation, the typical starting point of resilient software design. I put "small" in quotation marks because even this subset of a complete resilience pattern language still consists of around 20 patterns.

All the patterns are briefly described, and for some of them I added a bit of detail, but as this is a slide deck, the voice track is, as usual, missing. This pattern language is also still a work in progress, i.e., it has not yet settled and some details are still missing. Yet I think (or at least hope) that the slides contain a few useful insights for you.

Published in: Software

Patterns of resilience

  1. Patterns of Resilience
      A small pattern language
      Uwe Friedrichsen – codecentric AG – 2014-2016
  2. @ufried
      Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  3. Resilience? Never heard of it …
  4. re•sil•ience (rɪˈzɪl yəns) also re•sil′ien•cy, n.
      1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
      2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
      Random House Kernerman Webster's College Dictionary, © 2010 K Dictionaries Ltd. Copyright 2005, 1997, 1991 by Random House, Inc. All rights reserved.
      http://www.thefreedictionary.com/resilience
  5. What’s all the fuss about?
  6. It’s all about production!
  7. Business, Production, Availability
  8. Availability ≔ MTTF / (MTTF + MTTR)
      MTTF: Mean Time To Failure
      MTTR: Mean Time To Recovery
  9. How can I maximize availability?
  10. Traditional stability approach
      Availability ≔ MTTF / (MTTF + MTTR)
      Maximize MTTF
  11. reliability
      degree to which a system, product or component performs specified functions under specified conditions for a specified period of time
      ISO/IEC 25010:2011(en)
      https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en
      (the underlying assumption of the traditional approach)
  12. What’s the problem?
  13. (Almost) every system is a distributed system
      Chas Emerick
  14. The Eight Fallacies of Distributed Computing
      1. The network is reliable
      2. Latency is zero
      3. Bandwidth is infinite
      4. The network is secure
      5. Topology doesn't change
      6. There is one administrator
      7. Transport cost is zero
      8. The network is homogeneous
      Peter Deutsch
      https://blogs.oracle.com/jag/resource/Fallacies.html
  15. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
      Leslie Lamport
  16. Failures in today’s complex, distributed and interconnected systems are not the exception.
      •  They are the normal case
      •  They are not predictable
  17. … and it’s getting “worse”
      •  Cloud-based systems
      •  Microservices
      •  Zero Downtime
      •  IoT & Mobile
      •  Social
      → Ever-increasing complexity and connectivity
  18. Do not try to avoid failures. Embrace them.
  19. Resilience approach
      Availability ≔ MTTF / (MTTF + MTTR)
      Minimize MTTR
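
To see why minimizing MTTR is as strong a lever as maximizing MTTF, the availability formula can be rewritten so that only the ratio of the two values appears. The figures below (an MTTF of 1000 hours and an MTTR of 1 hour) are illustrative assumptions, not values from the deck.

```latex
\begin{align*}
A &= \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
   = \frac{1}{1 + \mathrm{MTTR}/\mathrm{MTTF}} \\
A_{\text{baseline}}    &= \frac{1000}{1000 + 1}   \approx 0.99900 \\
A_{\text{half MTTR}}   &= \frac{1000}{1000 + 0.5} \approx 0.99950 \\
A_{\text{double MTTF}} &= \frac{2000}{2000 + 1}   \approx 0.99950
\end{align*}
```

Because only the ratio MTTR/MTTF matters, halving the recovery time yields exactly the same availability as doubling the time between failures.
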
  20. resilience (IT)
      the ability of a system to handle unexpected situations
      -  without the user noticing it (best case)
      -  with a graceful degradation of service (worst case)
  21. Designing for resilience
      A small pattern language
  22. Isolation
  23. Isolation
      •  System must not fail as a whole
      •  Split system into parts and isolate the parts against each other
      •  Avoid cascading failures
      •  Requires a set of measures to implement
  25. Bulkheads
      •  Core isolation pattern
      •  a.k.a. “failure units” or “units of mitigation”
      •  Used as units of redundancy (and thus, also as units of scalability)
      •  Pure design issue
  27. Complete Parameter Checking
      •  As obvious as it sounds, yet often neglected
      •  Protection from broken/malicious calls (and return values)
      •  Pay attention to Postel’s law
      •  Consider specific data types
  28. Complete Parameter Checking
          // How to design request parameters

          // Worst variant – requires tons of checks
          String buySomething(Map<String, String> params);

          // Still a bad variant – still a lot of checks required
          String buySomething(String customerId, String productId, int count);

          // Much better – only null checks required
          PurchaseStatus buySomething(Customer buyer, Article product, Quantity count);
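
The "much better" variant works because a specific data type can validate itself once, at construction. A minimal sketch of what such a type might look like follows; the `Quantity` class body and its bounds (1 to 1000) are assumptions made for illustration and do not come from the deck.

```java
// Hypothetical value type: the permitted range is an assumption.
final class Quantity {
    static final int MAX = 1_000;

    private final int value;

    Quantity(int value) {
        // All range checks live here, so callers of buySomething(...)
        // only need to check the reference for null.
        if (value < 1 || value > MAX) {
            throw new IllegalArgumentException(
                "quantity must be in 1.." + MAX + ", got " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }
}
```

Any `Quantity` that exists is valid by construction, which is why the receiving method no longer needs its own range checks.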
  30. Loose Coupling
      •  Complements isolation
      •  Reduce coupling between failure units
      •  Avoid cascading failures
      •  Different approaches and patterns available
  32. Asynchronous Communication
      •  Decouples sender from receiver
      •  Sender does not need to wait for receiver’s response
      •  Useful to prevent cascading failures due to failing/latent resources
      •  Breaks up the call stack paradigm
  34. Location Transparency
      •  Decouples sender from receiver
      •  Sender does not need to know receiver’s concrete location
      •  Useful to implement redundancy and failover transparently
      •  Usually implemented using dispatchers or mappers
  36. Event-Driven
      •  Popular asynchronous communication style
      •  Without a broker, the location dependency is reversed
      •  With a broker, location transparency is easily achieved
      •  Very different from the request-response paradigm
  37. Request/response (sender depends on receiver)
      Diagram labels: Lookup, Sender, Receiver, Request/Response
          // from sender
          receiver = lookup()
          // from sender
          result = receiver.call()

      Event-driven without broker (receiver depends on sender)
      Diagram labels: Subscribe, Sender, Receiver, Send, Receive
          // from sender
          queue.send(msg)
          // from receiver
          queue = sender.subscribe()
          msg = queue.receive()

      Event-driven with broker (sender and receiver decoupled)
      Diagram labels: Subscribe, Sender, Receiver, Send, Broker, Receive, Lookup
          // from sender
          broker = lookup()
          broker.send(msg)
          // from receiver
          queue = broker.subscribe()
          msg = queue.receive()
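
The broker variant can be made concrete with a very small in-memory sketch. The `Broker` class below is hypothetical and only mirrors the `send`/`subscribe` calls from the pseudocode; a real broker would add lookup, persistence, and delivery guarantees, none of which are shown here.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical in-memory broker, for illustration only.
final class Broker<T> {
    private final List<BlockingQueue<T>> subscribers = new CopyOnWriteArrayList<>();

    // Receiver side: register interest and obtain a private queue.
    BlockingQueue<T> subscribe() {
        BlockingQueue<T> queue = new LinkedBlockingQueue<>();
        subscribers.add(queue);
        return queue;
    }

    // Sender side: hand the message to every current subscriber.
    // The sender never learns who (or where) the receivers are.
    void send(T msg) {
        for (BlockingQueue<T> queue : subscribers) {
            queue.offer(msg);
        }
    }
}
```

Sender and receiver both depend only on the broker, so either side can be moved or replaced without the other noticing.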
  39. Stateless
      •  Supports location transparency (amongst other patterns)
      •  Service relocation is hard with state
      •  Service failover is hard with state
      •  Very fundamental resilience and scalability pattern
  41. Relaxed Temporal Constraints
      •  Strict consistency requires tight coupling of the involved nodes
      •  Any single failure immediately compromises availability
      •  Use a more relaxed consistency model to reduce coupling
      •  The real world is not ACID, it is BASE (at best)!
  43. Idempotency
      •  Non-idempotency is complicated to handle in distributed systems
      •  (Usually) increases coupling between participating parties
      •  Use idempotent actions to reduce coupling between nodes
      •  Very fundamental resilience and scalability pattern
  44. Unique request token (schematic)
          // Client/Sender part
          // Create request with unique request token (e.g., via UUID)
          token = createUniqueToken()
          request = createRequest(token, payload)
          // Send request until successful
          while (!successful)
              send(request, timeout) // Do not forget failure handling

          // Server/Receiver part
          // Receive request
          request = receive()
          // Process request only if token is unknown
          if (!lookup(request.token)) // needs to be implemented in a CAS way to be safe
              process(request)
              store(request.token) // Store token for lookup (can be garbage collected eventually)
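
The receiver side of the schematic might look as follows in Java. This is a sketch under assumptions: the `IdempotentReceiver` class, the in-memory token set, and the counter standing in for real processing are all hypothetical. It also departs from the schematic in one respect: the token is stored before processing, so that check and store happen in one atomic step.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical receiver, for illustration only.
final class IdempotentReceiver {
    // Concurrent set: add() is atomic, so for any given token
    // exactly one caller sees 'true'.
    private final Set<String> seenTokens = ConcurrentHashMap.newKeySet();
    private final AtomicInteger processed = new AtomicInteger();

    // Returns true if the request was processed, false if it was a duplicate.
    boolean receive(String token, String payload) {
        if (!seenTokens.add(token)) {
            return false; // token already known: ignore the retry
        }
        process(payload);
        return true;
    }

    // Stand-in for the real work; only counts invocations.
    private void process(String payload) {
        processed.incrementAndGet();
    }

    int processedCount() {
        return processed.get();
    }
}
```

Storing the token first means a request whose processing fails is not retried automatically; a production design would have to decide how to handle that case and when stored tokens may be discarded.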
  46. Self-Containment
      •  Services are self-contained deployment units
      •  No dependencies on other runtime infrastructure components
      •  Reduces coupling at deployment time
      •  Improves isolation and flexibility
  47. Use a framework … or do it yourself
      Examples shown: Spring Boot, Dropwizard, Jackson, Metrics, …
  49. Latency Control
      •  Complements isolation
      •  Detection and handling of non-timely responses
      •  Avoid cascading temporal failures
      •  Different approaches and patterns available
  51. Timeouts
      •  Preserve responsiveness independent of downstream latency
      •  Measure response time of downstream calls
      •  Stop waiting after a pre-determined timeout
      •  Take alternate action if timeout was reached
  52. Timeouts with standard library means
          import java.util.concurrent.*;

          // Wrap blocking action in a Callable
          Callable<MyActionResult> myAction = <My Blocking Action>

          // Use a simple ExecutorService to run the action in its own thread
          ExecutorService executor = Executors.newSingleThreadExecutor();
          Future<MyActionResult> future = executor.submit(myAction);
          MyActionResult result = null;

          // Use Future.get() method to limit time to wait for completion
          try {
              result = future.get(TIMEOUT, TIMEUNIT);
              // Action completed in a timely manner – process results
          } catch (TimeoutException e) {
              // Handle timeout (e.g., schedule retry, escalate, alternate action, …)
          } catch (InterruptedException | ExecutionException e) {
              // Handle other exceptions that can be thrown by Future.get()
          } finally {
              // Make sure the callable is stopped even in case of a timeout
              future.cancel(true);
          }
  54. Circuit Breaker
      •  Probably most often cited resilience pattern
      •  Extension of the timeout pattern
      •  Takes downstream unit offline if calls fail multiple times
      •  Specific variant of the fail fast pattern
  55. Hystrix “Hello world”
          public class HelloCommand extends HystrixCommand<String> {
              private static final String COMMAND_GROUP = "Hello"; // Not important here
              private final String name;

              // Request parameters are passed in as constructor parameters
              public HelloCommand(String name) {
                  super(HystrixCommandGroupKey.Factory.asKey(COMMAND_GROUP));
                  this.name = name;
              }

              @Override
              protected String run() throws Exception {
                  // Usually here would be the resource call that needs to be guarded
                  return "Hello, " + name;
              }
          }

          // Usage of a Hystrix command – synchronous variant
          @Test
          public void shouldGreetWorld() {
              String result = new HelloCommand("World").execute();
              assertEquals("Hello, World", result);
          }
  56. Source: https://github.com/Netflix/Hystrix/wiki/How-it-Works
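
The Hystrix example shows how a call is wrapped, but not the state machine behind "takes downstream unit offline if calls fail multiple times". The hand-rolled sketch below illustrates that idea only; it is not how Hystrix is implemented. The class name, the consecutive-failure threshold, and the injected clock are assumptions chosen to keep the example short and testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker, for illustration only.
final class SimpleCircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures before opening
    private final long openMillis;        // how long to stay open
    private final LongSupplier clock;     // current time in milliseconds

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // Simplification: the lock is held during the call, so calls are serialized.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();     // fail fast, downstream is not called
            }
            state = State.HALF_OPEN;       // allow one trial call
        }
        try {
            T result = action.get();
            state = State.CLOSED;          // success closes the breaker
            failures = 0;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;        // take the downstream unit offline
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return state == State.OPEN;
    }
}
```

While the breaker is open, callers get the fallback immediately instead of waiting for another timeout, which is what makes it a variant of fail fast.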
  58. Fail Fast
      •  “If you know you’re going to fail, you better fail fast”
      •  Avoid foreseeable failures
      •  Usually implemented by adding checks in front of costly actions
      •  Enhances probability of not failing
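
"Checks in front of costly actions" could look like the sketch below. The `ReportService` class, its availability probe, and the counter standing in for the expensive work are hypothetical names introduced for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical service, for illustration only.
final class ReportService {
    private final BooleanSupplier databaseAvailable; // cheap availability probe
    private int expensiveRuns = 0;

    ReportService(BooleanSupplier databaseAvailable) {
        this.databaseAvailable = databaseAvailable;
    }

    String buildReport(String customerId) {
        // Cheap checks first: fail before any costly work has started.
        if (customerId == null || customerId.isEmpty()) {
            throw new IllegalArgumentException("customerId is required");
        }
        if (!databaseAvailable.getAsBoolean()) {
            throw new IllegalStateException("database unavailable; report not started");
        }
        return expensiveAggregation(customerId);
    }

    // Stand-in for the costly action; only counts invocations.
    private String expensiveAggregation(String customerId) {
        expensiveRuns++;
        return "report for " + customerId;
    }

    int expensiveRuns() {
        return expensiveRuns;
    }
}
```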
  60. Fan out & quickest reply
      •  Send request to multiple workers
      •  Use quickest reply and discard all other responses
      •  Reduces probability of latent responses
      •  Tradeoff is “waste” of resources
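
With the Java standard library, one way to sketch this pattern is `ExecutorService.invokeAny`, which returns the result of one task that completed successfully and cancels the tasks that have not finished. The `FanOut` helper below and the one-thread-per-worker pool sizing are assumptions made for illustration.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, for illustration only.
final class FanOut {
    // Sends the same request to all workers and returns the first
    // successful reply that invokeAny obtains.
    static <T> T quickestReply(List<Callable<T>> workers) throws Exception {
        // One thread per worker so that all of them can run concurrently.
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        try {
            return pool.invokeAny(workers);
        } finally {
            // Interrupt whatever is still running; its result is discarded.
            pool.shutdownNow();
        }
    }
}
```

The "waste" named on the slide is visible here: every worker is started, yet at most one result is used.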
  62. Bounded Queues
      •  Limit request queue sizes in front of highly utilized resources
      •  Avoids latency due to overloaded resources
      •  Introduces pushback on the callers
      •  Another variant of the fail fast pattern
  63. Bounded Queue Example
          import java.util.concurrent.*;

          // Executor service runs with up to 6 worker threads simultaneously
          // When thread pool is exhausted, up to 4 tasks will be queued -
          // additional tasks are rejected triggering the PushbackHandler
          final int POOL_SIZE = 6;
          final int QUEUE_SIZE = 4;

          // Set up a thread pool executor with a bounded queue and a PushbackHandler
          ExecutorService executor =
              new ThreadPoolExecutor(POOL_SIZE, POOL_SIZE,  // Core pool size, max pool size
                                     0, TimeUnit.SECONDS,   // Timeout for unused threads
                                     new ArrayBlockingQueue<>(QUEUE_SIZE),
                                     new PushbackHandler());

          // PushbackHandler - implements the desired pushback behavior
          public class PushbackHandler implements RejectedExecutionHandler {
              @Override
              public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
                  // Implement your pushback behavior here
              }
          }
  65. Shed Load
      •  Upstream isolation pattern
      •  Avoid becoming overloaded due to too many requests
      •  Install a gatekeeper in front of the resource
      •  Shed requests based on resource load
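
A gatekeeper can be sketched with a `Semaphore` whose `tryAcquire` never blocks. In this sketch the number of requests in flight serves as a stand-in for "resource load"; that proxy, the `LoadShedder` name, and the rejection callback are assumptions, and a real gatekeeper might look at CPU, queue depth, or response times instead.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical gatekeeper, for illustration only.
final class LoadShedder {
    private final Semaphore permits;

    LoadShedder(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> T handle(Supplier<T> request, Supplier<T> rejection) {
        // tryAcquire returns immediately: no permit means the request is shed.
        if (!permits.tryAcquire()) {
            return rejection.get();
        }
        try {
            return request.get();
        } finally {
            permits.release();
        }
    }
}
```

Shedding at the gate keeps the protected resource within its limits, at the price of rejecting some callers outright.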
  67. Supervision
      •  Provides failure handling beyond the means of a single failure unit
      •  Detect unit failures
      •  Provide means for error escalation
      •  Different approaches and patterns available
  69. Monitor
      •  Observe unit behavior and interactions from the outside
      •  Automatically respond to detected failures
      •  Part of the system – complex failure handling strategies possible
      •  Outside the system – more robust against system level failures
  71. Error Handler
      •  Units often don’t have enough time or information to handle errors
      •  Separate business logic and error handling
      •  Business logic just focuses on getting the task done (quickly)
      •  Error handler has sufficient time and information to handle errors
  73. Escalation
      •  Units often don’t have enough time or information to handle errors
      •  Escalation peer with more time and information needed
      •  Often multi-level hierarchies
      •  Pure design issue
  74. Escalation implementation using Worker/Supervisor
      [Diagram: workers (W) along a flow/process, a hierarchy of supervisors (S), and an arrow labeled “Escalation”]
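
A multi-level worker/supervisor hierarchy can be sketched in a few lines. Everything here is an assumption made for illustration: the `Supervisor` class, the restart-then-escalate policy, and the root's "degraded" answer are hypothetical and not taken from the deck or from any actor framework.

```java
import java.util.function.Supplier;

// Hypothetical supervisor, for illustration only.
final class Supervisor {
    private final Supervisor parent;  // null for the root supervisor
    private final int maxRestarts;    // restarts before escalating

    Supervisor(Supervisor parent, int maxRestarts) {
        if (maxRestarts < 0) {
            throw new IllegalArgumentException("maxRestarts must not be negative");
        }
        this.parent = parent;
        this.maxRestarts = maxRestarts;
    }

    // Runs the worker; on failure restarts it, and escalates once restarts are used up.
    String run(Supplier<String> worker) {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            try {
                return worker.get();   // worker only does the business logic
            } catch (RuntimeException e) {
                lastFailure = e;       // supervisor decides what happens next
            }
        }
        return escalate(lastFailure);
    }

    // Non-root supervisors pass the failure up; the root degrades gracefully.
    String escalate(RuntimeException failure) {
        if (parent != null) {
            return parent.escalate(failure);
        }
        return "degraded: " + failure.getMessage();
    }
}
```

The worker contains no error handling at all; the decision to restart, escalate, or degrade lives entirely in the supervisors.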
  75. Pattern map (complete; grouped in the order the slides introduce the patterns)
      Isolation
      •  Bulkheads
      •  Complete Parameter Checking
      •  Loose Coupling: Asynchronous Communication, Location Transparency, Event-Driven, Stateless, Relaxed Temporal Constraints, Idempotency, Self-Containment
      •  Latency Control: Timeouts, Circuit Breaker, Fail Fast, Fan out & quickest reply, Bounded Queues, Shed Load
      •  Supervision: Monitor, Error Handler, Escalation
  76. … and there is more
      •  Recovery & mitigation patterns
      •  More supervision patterns
      •  Architectural patterns
      •  Anti-fragility patterns
      •  Fault treatment & prevention patterns
      A rich pattern family
  77. Wrap-up
      •  Today’s systems are distributed ...
      •  … and it’s getting “worse”
      •  Failures are the normal case
      •  Failures are not predictable
      •  Resilient software design needed
      •  Rich pattern language
      •  Isolation is a good starting point
  78. Do not avoid failures. Embrace them!