
Going Resilient...

Why resilient design matters and how it fits with commands and tasks.

Published in: Technology

  1. Going Resilient... Building blocks of failure-tolerant systems. Anatole Tresch, Principal Consultant. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
  2. Agenda (Going Resilient…, 19.10.16)
     • Motivation
     • Complex Systems
     • Resilient Design
     • Resilience in Java
     • Summary
  3. Bio: Anatole Tresch
     • Principal Consultant
     • Star Spec Lead JSR 354 Money & Currency
     • Technical Architect, Lead Engineer
     • PPMC Member Apache Tamaya
     • Twitter/Google+: @atsticks
     • anatole@apache.org
     • anatole.tresch@trivadis.com
  4. Motivation
  5. Resilience/Resiliency is ... “But it ain't how hard you hit; it's about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That's how winning is done.” Rocky Balboa, https://www.youtube.com/watch?v=vJHkTtvnUqA
  6. Resilience/Resiliency is ...
     1) the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
     2) ability to recover readily from illness, depression, adversity, or the like; buoyancy.
     Source: Random House Kernerman Webster's College Dictionary, © 2010 K Dictionaries Ltd. Copyright 2005, 1997, 1991 by Random House, Inc. All rights reserved. http://www.thefreedictionary.com/resilience
  7. Why should I care?
  8. Well...
  9. It's all about successful production!
  10. Business, Software, Availability, Ops, Dev
  11. Availability = MTTF / (MTTF + MTTR), given MTTF = Mean Time To Failure and MTTR = Mean Time To Recovery
  12. How can availability be maximized?
  13. Availability = MTTF / (MTTF + MTTR). Traditional approach: maximize MTTF
  14. This worked for a long time… (a quick look at computing history...)
  15. We started relatively primitive...
  16. [image]
  17. We used mechanics...
  18. [image]
  19. And then electricity...
  20. [image]
  21. We started to connect machines...
  22. [image]
  23. ...until...
  24. ...today...
  25. We started with one single computer...
  26. Then we added more… ...and connected them...
  27. Number of computers increased...
  28. [image]
  29. But sometimes computers break...
  30. [image]
  31. ...sometimes they break many at a time...
  32. [image]
  33. ...sometimes you don't even know...
  34. [image]
  35. And on top of that...
  36. ...we virtualized machines...
  37. [image]
  38. ...we added microservices, IoT, ...
  39. [image]
  40. ...and connected everything together...
  41. [image]
  42. Moving targets!
  43. ...did I forget to mention...
  44. THE NETWORK!
  45. Complex Systems
  46. «Almost every system is a distributed system.» Chas Emerick (slide annotation: "complex")
  47. Complex systems... «We can model and understand in isolation. But, when released into competitive nominally regulated societies, their connections proliferate, their interactions and interdependencies multiply, their complexities mushroom. And we are caught short.» Sidney Dekker
  48. Do not try to avoid failures. Embrace them!
  49. ...
  50. Sounds easy?
  51. Unfortunately it's not...
  52. Resilient Design
  53. Availability = MTTF / (MTTF + MTTR). Resilient approach: minimize MTTR
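To make the availability formula concrete, here is a minimal Java sketch that evaluates it. The class name `Availability` and the numbers in the note below are illustrative assumptions, not taken from the talk.

```java
/** Steady-state availability from mean time to failure and mean time to recovery. */
final class Availability {
    private Availability() {}

    /** Returns MTTF / (MTTF + MTTR); both arguments must use the same time unit. */
    static double of(double mttf, double mttr) {
        if (mttf <= 0 || mttr < 0) {
            throw new IllegalArgumentException("mttf must be > 0 and mttr >= 0");
        }
        return mttf / (mttf + mttr);
    }
}
```

With a hypothetical MTTF of 1000 h and MTTR of 10 h, availability is 1000/1010 ≈ 0.9901. Doubling MTTF to 2000 h gives 2000/2010 ≈ 0.9950, while cutting MTTR to 1 h gives 1000/1001 ≈ 0.9990. In this example, faster recovery pays off more than rarer failures.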
  54. Let's start with some basics...
  55. A command is a task is a ... (diagram labels: Command, Input, Output, Error)
  56. Commands can be connected... (diagram labels: Command, Input, Output | Input, Error, Output, Error, Command)
  57. So where must "resilience" be added? (diagram labels: Input, Command, Command, Command)
     • Isolation
     • Decouple communications with events
     • Flow control
     • Manage state
  58. Asynchronous communications, the simple way... (diagram labels: two Commands, each with Input, Output and Error, separated by a Bulkhead)
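One common way to realize a bulkhead inside a single JVM is a semaphore that caps the number of concurrent calls into a command. The following is a minimal sketch under that assumption; the class name `Bulkhead` and its API are hypothetical and not part of the talk or of any library.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps concurrent calls so one slow dependency cannot use up all caller threads. */
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the task if a permit is free; otherwise rejects immediately instead of queueing. */
    <T> T call(Supplier<T> task) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("Bulkhead full, call rejected");
        }
        try {
            return task.get();
        } finally {
            // Always hand the permit back, even when the task throws.
            permits.release();
        }
    }
}
```

Rejecting instead of blocking is a design choice: the caller learns right away that the compartment is full and can fall back or retry later.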
  59. Now let's apply the model to distributed systems...
  60. Event-driven communications (diagram labels: two Components, each with Input, Output and Error, connected through Queues and separated by a Bulkhead)
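Within one process, a queue between two components can be sketched with a bounded `BlockingQueue` from the JDK. The wrapper class `BoundedMailbox` below is a hypothetical name used only for illustration; a distributed setup would use a message broker instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded mailbox that decouples a producing component from a consuming one. */
final class BoundedMailbox<E> {
    private final BlockingQueue<E> queue;

    BoundedMailbox(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking send: returns false when full so the producer can back off or shed load. */
    boolean offer(E event) {
        return queue.offer(event);
    }

    /** Non-blocking receive: returns null when no event is waiting. */
    E poll() {
        return queue.poll();
    }
}
```

Because the capacity is fixed, a slow consumer shows up as rejected sends rather than as unbounded memory growth.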
  61. Additional functionality needed... (diagram labels: Components with Input and Output, Queue, Error Queue, Error Handler, Latency Control, Monitoring, Supervision, Escalation, Location Transparency)
  62. Latency Control
     • Bounded queues
     • Fan out & quickest reply
     • Circuit breakers and fail fast
     • Timeouts
     • Throttling, semaphores
     • Failover
     • Degradation of service level
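Of the latency-control techniques listed, the circuit breaker is the least obvious, so here is a deliberately small sketch of the idea in plain Java. `SimpleCircuitBreaker`, its parameters and the injected clock are assumptions made for illustration; it is not the Hystrix or Vert.x implementation, and it serializes calls for simplicity.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Opens after N consecutive failures, then fails fast until a retry delay has passed. */
final class SimpleCircuitBreaker {
    private final int maxFailures;
    private final long openMillis;
    private final LongSupplier clock; // injected so behaviour can be checked without sleeping
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int maxFailures, long openMillis, LongSupplier clock) {
        this.maxFailures = maxFailures;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Simplification: synchronized, so only one call runs at a time. */
    synchronized <T> T call(Supplier<T> task, Supplier<T> fallback) {
        if (open && clock.getAsLong() - openedAt < openMillis) {
            return fallback.get(); // fail fast: do not touch the failing dependency
        }
        try {
            T result = task.get(); // closed, or a single trial call after the delay
            failures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (open || failures >= maxFailures) {
                open = true;
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

A caller would typically create it once per dependency, for example `new SimpleCircuitBreaker(5, 2000, System::currentTimeMillis)`, and route every call to that dependency through `call`.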
  63. Managing shared state: Quorums
     • Ensure a decision can be taken at any time
     • Odd number of voters, num >= 3
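A majority quorum needs strictly more than half of the voters, which is why an odd number of voters is the usual choice: adding one voter to an odd group raises the required majority without tolerating any extra failure. The helper class `Quorum` below is a hypothetical sketch of that arithmetic.

```java
/** Majority-quorum arithmetic for a fixed group of voters. */
final class Quorum {
    private Quorum() {}

    /** Smallest number of votes that forms a strict majority of the voters. */
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("voters must be >= 1");
        }
        return voters / 2 + 1;
    }

    /** Number of voters that may fail while a majority can still be formed. */
    static int toleratedFailures(int voters) {
        return voters - majority(voters);
    }
}
```

For 3 voters the majority is 2 and one failure is tolerated; for 4 voters the majority is 3 and still only one failure is tolerated; 5 voters tolerate two.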
  64. Kernel-Based Architecture. Structure systems in layers, like an onion:
     • State and failure management in layers
     • The "kernel" holds and protects the critical state
     • The kernel is always accessed through layers of protection
  65. Rounding up... Resilient systems in IT require
     • Asynchronous communications
     • Idempotent, self-contained events
     • Location transparency
     • Isolation & recursive restartability
     • Complete, unified input and output validation
     • Common error handling and monitoring
     • Supervision
     • Minimal shared state, redundancy
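Idempotent events matter because asynchronous delivery often means "at least once", so the same event can arrive twice. One simple way to get idempotent handling is to remember event ids that were already applied. The sketch below assumes each event carries a unique id; `IdempotentHandler` is a hypothetical name, and the unbounded in-memory set is a simplification.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Wraps a handler so that redelivered events (same id) are applied at most once. */
final class IdempotentHandler<E> {
    private final Set<String> seenIds = ConcurrentHashMap.newKeySet();
    private final Consumer<E> delegate;

    IdempotentHandler(Consumer<E> delegate) {
        this.delegate = delegate;
    }

    /** Returns true if the event was processed, false if it was a duplicate. */
    boolean handle(String eventId, E event) {
        if (!seenIds.add(eventId)) {
            return false; // already applied, ignore the redelivery
        }
        try {
            delegate.accept(event);
            return true;
        } catch (RuntimeException e) {
            // Forget the id again so a later redelivery can retry the event.
            seenIds.remove(eventId);
            throw e;
        }
    }
}
```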
  66. We require decoupling in time and space!
  67. Resilience in Java
  68. Resilience is a design task. Later improvements mostly focus on latency control.
  69. Example: Let's call a service...
  70. Simple Example

      import javax.inject.Inject;

      @Inject
      private AService service;

      public void myMethod(String input) {
          try {
              // Pass the input (not the service itself) to the remote call.
              String result = service.call(input);
              // do something with the result
          } catch (Exception e) {
              throw new IllegalStateException("Server error", e);
          }
      }
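  71. Executor Example, using a timeout

      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;

      public void myMethod(Input input) {
          // Submit as a Callable so the Future carries the service result.
          Future<String> resultFuture = executor.submit(() -> service.call(input));
          try {
              // Wait at most 4 seconds for the remote call.
              String result = resultFuture.get(4000L, TimeUnit.MILLISECONDS);
              // do something with the result
          } catch (Exception e) {
              // Cancel the still-running call so it does not keep a worker thread busy.
              resultFuture.cancel(true);
              throw new IllegalStateException("Server error", e);
          }
      }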
  72. https://github.com/Netflix/Hystrix
  73. Using Hystrix "standalone"...
  74. Hystrix – Basic Use
     • It wraps your code
     • Adds resilient features
     • Synchronous, asynchronous and reactive use
  75. Hystrix – Fallback
  76. Hystrix – Fallback, cascading
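The code on the two fallback slides did not survive in this transcript. As a library-independent illustration of the cascading idea (try the primary source, then a secondary one, then a static default), here is a plain-Java sketch. `FallbackChain` is a hypothetical helper and does not use or mirror the Hystrix API.

```java
import java.util.List;
import java.util.function.Supplier;

/** Tries a list of sources in order and returns the first result that does not fail. */
final class FallbackChain {
    private FallbackChain() {}

    /** The last failure is rethrown if every source fails. */
    static <T> T firstSuccessful(List<Supplier<T>> sources) {
        RuntimeException last = new IllegalArgumentException("no sources given");
        for (Supplier<T> source : sources) {
            try {
                return source.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next fallback
            }
        }
        throw last;
    }
}
```

The last entry in the chain is usually something that cannot fail, such as a constant or a cached value, so the caller always gets an answer.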
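  77. Hystrix – Timeout

      import java.util.concurrent.Callable;
      import com.netflix.hystrix.HystrixCommand;
      import com.netflix.hystrix.HystrixCommandGroupKey;
      import com.netflix.hystrix.HystrixCommandProperties;

      public class TimeoutCommand extends HystrixCommand<String> {

          // Assumed group name; the slide uses COMMAND_GROUP without defining it.
          private static final String COMMAND_GROUP = "timeout-commands";

          private final Callable<String> task;

          public TimeoutCommand(int millis, Callable<String> task) {
              super(Setter.withGroupKey(
                        HystrixCommandGroupKey.Factory.asKey(COMMAND_GROUP))
                    .andCommandPropertiesDefaults(
                        HystrixCommandProperties.Setter()
                            // Use the constructor argument as the execution timeout.
                            .withExecutionIsolationThreadTimeoutInMilliseconds(millis)));
              this.task = task;
          }

          @Override
          protected String run() throws Exception {
              return task.call();
          }

          // Returned when run() fails or exceeds the timeout.
          @Override
          protected String getFallback() {
              return "Resource timed out";
          }
      }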
  78. Hystrix – Dashboard
  79. Using Hystrix with Spring Boot… http://cloud.spring.io/spring-cloud-netflix
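  80. Spring Boot with Hystrix – CircuitBreaker

      import java.util.Map;
      import org.springframework.boot.autoconfigure.SpringBootApplication;
      import org.springframework.boot.builder.SpringApplicationBuilder;
      import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
      import org.springframework.stereotype.Component;
      import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;

      @SpringBootApplication
      @EnableCircuitBreaker
      public class Application {
          public static void main(String[] args) {
              new SpringApplicationBuilder(Application.class)
                  .web(true).run(args);
          }
      }

      @Component
      public class StoreIntegration {

          @HystrixCommand(fallbackMethod = "defaultStores")
          public Object getStores(Map<String, Object> parameters) {
              // do stuff that might fail
          }

          // The parameter list is cut off on the slide; it is completed here on the
          // assumption that the fallback mirrors the signature of getStores.
          public Object defaultStores(Map<String, Object> parameters) {
              return "something useful";
          }
      }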
  81. Using Akka… http://akka.io/
  82. Akka – A simple Actor

      import akka.actor.ActorRef;
      import akka.actor.Props;
      import akka.actor.UntypedActor;

      public class HelloWorld extends UntypedActor {

          @Override
          public void preStart() {
              // Create the child actor and send it the first message.
              final ActorRef greeter = getContext().actorOf(
                  Props.create(Greeter.class), "greeter");
              greeter.tell(Greeter.Msg.GREET, getSelf());
          }

          @Override
          public void onReceive(Object msg) {
              if (msg == Greeter.Msg.DONE) {
                  // when the greeter is done, stop this actor and with it the application
                  getContext().stop(getSelf());
              } else {
                  unhandled(msg);
              }
          }
      }
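  83. Akka – Supervision

      import java.util.concurrent.TimeUnit;
      import akka.actor.Props;
      import akka.pattern.Backoff;
      import akka.pattern.BackoffSupervisor;
      import scala.concurrent.duration.Duration;

      // The first statement is clipped on the slide; childProps is assumed to be
      // the Props of the supervised Greeter actor.
      Props childProps = Props.create(Greeter.class);

      Props supervisorProps = BackoffSupervisor.props(
          Backoff.onStop(
              childProps,
              "myEcho",
              Duration.create(3, TimeUnit.SECONDS),   // minimum back-off
              Duration.create(30, TimeUnit.SECONDS),  // maximum back-off
              0.2)); // adds 20% "noise" to vary the intervals slightly

      system.actorOf(supervisorProps, "echoSupervisor");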
  84. Akka – Fault Tolerance

      // resume(), restart(), stop() and escalate() are static imports
      // from akka.actor.SupervisorStrategy.
      private static SupervisorStrategy strategy =
          new OneForOneStrategy(10, Duration.create("1 minute"),
              DeciderBuilder.
                  match(ArithmeticException.class, e -> resume()).
                  match(NullPointerException.class, e -> restart()).
                  match(IllegalArgumentException.class, e -> stop()).
                  matchAny(o -> escalate()).build());

      @Override
      public SupervisorStrategy supervisorStrategy() {
          return strategy;
      }
  85. Vert.x http://vertx.io/
  86. Vert.x – Circuit Breaker

      CircuitBreaker breaker = CircuitBreaker.create("my-circuit-breaker", vertx,
          new CircuitBreakerOptions().setMaxFailures(5).setTimeout(2000));

      breaker.<String>execute(future -> {
          vertx.createHttpClient().getNow(8080, "localhost", "/", response -> {
              if (response.statusCode() != 200) {
                  // Report the failure to the circuit breaker.
                  future.fail("HTTP error");
              } else {
                  response
                      .exceptionHandler(future::fail)
                      .bodyHandler(buffer -> {
                          future.complete(buffer.toString());
                      });
              }
          });
      }).setHandler(ar -> {
          // Do something with the result
      });
  87. Summary
  88. Summary: Resilient Software Design
     • ...is a must!
     • ...is achievable
     • ...is well supported by frameworks such as Hystrix and Akka
     • ...uses patterns that are ubiquitous across all kinds of distributed systems
     • ...fits naturally with microservices
  89. THANK YOU!
  90. Going Resilient...
     • Hystrix Wiki, https://github.com/Netflix/Hystrix/wiki
     • Jonas Bonér: Resilience is by design, http://virtualjug.com/resilience-is-by-design/
     • R. Cook, J. Rasmussen: "Going solid": a model of system dynamics and consequences for patient safety
     • Reinette Biggs et al.: Applying Resilient Thinking: Toward Resilient Architectures
     • Michael Mehaffy, Nikos A. Salingaros: 1 - Biology Lessons:
     • Richard Cook: How complex systems fail
     • http://www.mindsetonline.com/
     • George Candea, Armando Fox: Crash Only Software
     • George Candea, Armando Fox: Turning the Crash-Only Pattern from a Slash-Hammer to a Scalpell
     • Michael T. Nygard: Release It!, Pragmatic Bookshelf, 2007
     • Robert S. Hanmer: Patterns for Fault Tolerant Software, Wiley, 2007
     • Andrew Tanenbaum, Maarten van Steen: Distributed Systems – Principles and Paradigms, Prentice Hall, 2nd Edition, 2006
     • Uwe Friedrichsen, Slideshare: http://de.slideshare.net/ufried
  91. Going Resilient... Anatole Tresch, Principal Consultant, Tel. +41 58 459 53 93, anatole.tresch@trivadis.com
