SlideShare ist ein Scribd-Unternehmen logo
1 von 53
Downloaden Sie, um offline zu lesen
Fault tolerance made easy
Patterns for fault tolerance implemented surprisingly easy

Uwe Friedrichsen, codecentric AG, 2013-2014
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
It‘s all about production!
Production
Availability
Resilience
Fault Tolerance
Your web server doesn‘t look good 

Pattern #1

Timeouts
Timeouts (1)
// Basics
myObject.wait(); // Do not use this by default
myObject.wait(TIMEOUT); // Better use this
// Some more basics
myThread.join(); // Do not use this by default
myThread.join(TIMEOUT); // Better use this
Timeouts (2)
// Using the Java concurrent library
Callable<MyActionResult> myAction = <My Blocking Action>
ExecutorService executor = Executors.newSingleThreadExecutor();
Future<MyActionResult> future = executor.submit(myAction);
MyActionResult result = null;
try {
result = future.get(); // Do not use this by default
result = future.get(TIMEOUT, TIMEUNIT); // Better use this
} catch (TimeoutException e) { // Only thrown if timeouts are used
...
} catch (...) {
...
}
Timeouts (3)
// Using Guava SimpleTimeLimiter
Callable<MyActionResult> myAction = <My Blocking Action>
SimpleTimeLimiter limiter = new SimpleTimeLimiter();
MyActionResult result = null;
try {
result =
limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false);
} catch (UncheckedTimeoutException e) {
...
} catch (...) {
...
}
Determining Timeout Duration

Configurable Timeouts

Self-Adapting Timeouts

Timeouts in JavaEE Containers
Pattern #2

Circuit Breaker
Circuit Breaker (1)
Client
 Resource
Circuit Breaker
Request
Resource unavailable
Resource available
Closed
 Open
Half-Open
Lifecycle
Circuit Breaker (2)
Closed

on call / pass through
call succeeds / reset count
call fails / count failure
threshold reached / trip breaker
Open

on call / fail
on timeout / attempt reset
trip breaker
Half-Open

on call / pass through
call succeeds / reset
call fails / trip breaker
trip breaker
 attempt reset
reset
Source: M. Nygard, „Release It!“
Circuit Breaker (3)
public class CircuitBreaker implements MyResource {
public enum State { CLOSED, OPEN, HALF_OPEN }
final MyResource resource;
State state;
int counter;
long tripTime;
public CircuitBreaker(MyResource r) {
resource = r;
state = CLOSED;
counter = 0;
tripTime = 0L;
}
...
Circuit Breaker (4)
...
public Result access(...) { // resource access
Result r = null;
if (state == OPEN) {
checkTimeout();
throw new ResourceUnavailableException();
}
try {
r = r.access(...); // should use timeout
} catch (Exception e) {
fail();
throw e;
}
success();
return r;
}
...
Circuit Breaker (5)
...
private void success() {
reset();
}
private void fail() {
counter++;
if (counter > THRESHOLD) {
tripBreaker();
}
}
private void reset() {
state = CLOSED;
counter = 0;
}
...
Circuit Breaker (6)
...
private void tripBreaker() {
state = OPEN;
tripTime = System.currentTimeMillis();
}
private void checkTimeout() {
if ((System.currentTimeMillis - tripTime) > TIMEOUT) {
state = HALF_OPEN;
counter = THRESHOLD;
}
}
public State getState()
return state;
}
}
Thread-Safe Circuit Breaker

Failure Types

Tuning Circuit Breakers

Available Implementations
Pattern #3

Fail Fast
Fail Fast (1)
Client
 Resources
Expensive Action
Request
Uses
Fail Fast (2)
Client
 Resources
Expensive Action
Request
Fail Fast Guard
Uses
Check availability
Forward
Fail Fast (3)
public class FailFastGuard {
private FailFastGuard() {}
public static void checkResources(Set<CircuitBreaker> resources) {
for (CircuitBreaker r : resources) {
if (r.getState() != CircuitBreaker.CLOSED) {
throw new ResourceUnavailableException(r);
}
}
}
}
Fail Fast (4)
public class MyService {
Set<CircuitBreaker> requiredResources;
// Initialize resources
...
public Result myExpensiveAction(...) {
FailFastGuard.checkResources(requiredResources);
// Execute core action
...
}
}
The dreaded SiteTooSuccessfulException 

Pattern #4

Shed Load
Shed Load (1)
Clients
 Server
Too many Requests
Shed Load (2)
Server
Too many Requests
Gate Keeper
Monitor
Requests
Request Load Data
 Monitor Load
Shedded Requests
Clients
Shed Load (3)
public class ShedLoadFilter implements Filter {
Random random;
public void init(FilterConfig fc) throws ServletException {
random = new Random(System.currentTimeMillis());
}
public void destroy() {
random = null;
}
...
Shed Load (4)
...
public void doFilter(ServletRequest request,
ServletResponse response,
FilterChain chain)
throws java.io.IOException, ServletException {
int load = getLoad();
if (shouldShed(load)) {
HttpServletResponse res = (HttpServletResponse)response;
res.setIntHeader("Retry-After", RECOMMENDATION);
res.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
return;
}
chain.doFilter(request, response);
}
...
Shed Load (5)
...
private boolean shouldShed(int load) { // Example implementation
if (load < THRESHOLD) {
return false;
}
double shedBoundary =
((double)(load - THRESHOLD))/
((double)(MAX_LOAD - THRESHOLD));
return random.nextDouble() < shedBoundary;
}
}
Shed Load (6)
Shed Load (7)
Shedding Strategy

Retrieving Load

Tuning Load Shedders

Alternative Strategies
Pattern #5

Deferrable Work
Deferrable Work (1)
Client
Requests
Request Processing
Resources
Use
Routine Work
Use
OVERLOAD
Deferrable Work (2)
Without‹
Deferrable Work
100%
OVERLOAD
With‹
Deferrable Work
100%
Request Processing
Routine Work
// Do or wait variant
ProcessingState state = initBatch();
while(!state.done()) {
int load = getLoad();
if (load > THRESHOLD) {
waitFixedDuration();
} else {
state = processNext(state);
}
}
void waitFixedDuration() {
Thread.sleep(DELAY); // try-catch left out for better readability
}
Deferrable Work (3)
// Adaptive load variant
ProcessingState state = initBatch();
while(!state.done()) {
waitLoadBased();
state = processNext(state);
}
void waitLoadBased() {
int load = getLoad();
long delay = calcDelay(load);
Thread.sleep(delay); // try-catch left out for better readability
}
long calcDelay(int load) { // Simple example implementation
if (load < THRESHOLD) {
return 0L;
}
return (load – THRESHOLD) * DELAY_FACTOR;
}
Deferrable Work (4)
Delay Strategy

Retrieving Load

Tuning Deferrable Work
I can hardly hear you 

Pattern #6

Leaky Bucket
Leaky Bucket (1)
Leaky Bucket
Fill
Problem
occured
Periodically
Leak
Error
Handling
Overflowed?
public class LeakyBucket { // Very simple implementation
final private int capacity;
private int level;
private boolean overflow;
public LeakyBucket(int capacity) {
this.capacity = capacity;
drain();
}
public void drain () {
this.level = 0;
this.overflow = false;
}
...
Leaky Bucket (2)
...
public void fill() {
level++;
if (level > capacity) {
overflow = true;
}
}
public void leak() {
level--;
if (level < 0) {
level = 0;
}
}
public boolean overflowed() {
return overflow;
}
}
Leaky Bucket (3)
Thread-Safe Leaky Bucket

Leaking strategies

Tuning Leaky Bucket

Available Implementations
Pattern #7

Limited Retries
// doAction returns true if successful, false otherwise
// General pattern
boolean success = false
int tries = 0;
while (!success && (tries < MAX_TRIES)) {
success = doAction(...);
tries++;
}
// Alternative one-retry-only variant
success = doAction(...) || doAction(...);
Limited Retries (1)
Idempotent Actions

Closures / Lambdas

Tuning Retries
More Patterns






‱  Complete Parameter Checking
‱  Marked Data
‱  Routine Audits
Further reading

1.  Michael T. Nygard, Release It!,
Pragmatic Bookshelf, 2007
2.  Robert S. Hanmer,‹
Patterns for Fault Tolerant Software,
Wiley, 2007
3.  James Hamilton, On Designing and
Deploying Internet-Scale Services,‹
21st LISA Conference 2007
4.  Andrew Tanenbaum, Marten van Steen,
Distributed Systems – Principles and
Paradigms,‹
Prentice Hall, 2nd Edition, 2006
It‘s all about production!
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
Fault tolerance made easy

Weitere Àhnliche Inhalte

Was ist angesagt?

Qunit Java script Un
Qunit Java script UnQunit Java script Un
Qunit Java script Un
akanksha arora
 
J unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…ł
J unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…łJ unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…ł
J unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…ł
ksain
 

Was ist angesagt? (20)

Load Testing with RedLine13: Or getting paid to DoS your own systems
Load Testing with RedLine13: Or getting paid to DoS your own systemsLoad Testing with RedLine13: Or getting paid to DoS your own systems
Load Testing with RedLine13: Or getting paid to DoS your own systems
 
Defcon_Oracle_The_Making_of_the_2nd_sql_injection_worm
Defcon_Oracle_The_Making_of_the_2nd_sql_injection_wormDefcon_Oracle_The_Making_of_the_2nd_sql_injection_worm
Defcon_Oracle_The_Making_of_the_2nd_sql_injection_worm
 
Oaktable World 2014 Toon Koppelaars: database constraints polite excuse
Oaktable World 2014 Toon Koppelaars: database constraints polite excuseOaktable World 2014 Toon Koppelaars: database constraints polite excuse
Oaktable World 2014 Toon Koppelaars: database constraints polite excuse
 
LA Cassandra Day 2015 - Testing Cassandra
LA Cassandra Day 2015  - Testing CassandraLA Cassandra Day 2015  - Testing Cassandra
LA Cassandra Day 2015 - Testing Cassandra
 
Unit testing patterns for concurrent code
Unit testing patterns for concurrent codeUnit testing patterns for concurrent code
Unit testing patterns for concurrent code
 
Unit testing with Spock Framework
Unit testing with Spock FrameworkUnit testing with Spock Framework
Unit testing with Spock Framework
 
Docker and jvm. A good idea?
Docker and jvm. A good idea?Docker and jvm. A good idea?
Docker and jvm. A good idea?
 
Painless JavaScript Testing with Jest
Painless JavaScript Testing with JestPainless JavaScript Testing with Jest
Painless JavaScript Testing with Jest
 
Performance tests with Gatling (extended)
Performance tests with Gatling (extended)Performance tests with Gatling (extended)
Performance tests with Gatling (extended)
 
Connect2017 DEV-1550 Why Java 8? Or, What's a Lambda?
Connect2017 DEV-1550 Why Java 8? Or, What's a Lambda?Connect2017 DEV-1550 Why Java 8? Or, What's a Lambda?
Connect2017 DEV-1550 Why Java 8? Or, What's a Lambda?
 
Real world functional reactive programming
Real world functional reactive programmingReal world functional reactive programming
Real world functional reactive programming
 
Rules With Drools
Rules With DroolsRules With Drools
Rules With Drools
 
Stress test your backend with Gatling
Stress test your backend with GatlingStress test your backend with Gatling
Stress test your backend with Gatling
 
ScalaSwarm 2017 Keynote: Tough this be madness yet theres method in't
ScalaSwarm 2017 Keynote: Tough this be madness yet theres method in'tScalaSwarm 2017 Keynote: Tough this be madness yet theres method in't
ScalaSwarm 2017 Keynote: Tough this be madness yet theres method in't
 
Qunit Java script Un
Qunit Java script UnQunit Java script Un
Qunit Java script Un
 
Spock Framework
Spock FrameworkSpock Framework
Spock Framework
 
Unit testing in JavaScript with Jasmine and Karma
Unit testing in JavaScript with Jasmine and KarmaUnit testing in JavaScript with Jasmine and Karma
Unit testing in JavaScript with Jasmine and Karma
 
C++ Unit Test with Google Testing Framework
C++ Unit Test with Google Testing FrameworkC++ Unit Test with Google Testing Framework
C++ Unit Test with Google Testing Framework
 
From Elixir to Akka (and back) - ElixirConf Mx 2017
From Elixir to Akka (and back) - ElixirConf Mx 2017From Elixir to Akka (and back) - ElixirConf Mx 2017
From Elixir to Akka (and back) - ElixirConf Mx 2017
 
J unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…ł
J unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…łJ unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…ł
J unitá„‰á…łá„á…„á„ƒá…”á„‰á…łá†Żá„…á…Ąá„‹á…”á„ƒá…ł
 

Andere mochten auch

The promises and perils of microservices
The promises and perils of microservicesThe promises and perils of microservices
The promises and perils of microservices
Uwe Friedrichsen
 

Andere mochten auch (20)

How to survive in a BASE world
How to survive in a BASE worldHow to survive in a BASE world
How to survive in a BASE world
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinism
 
Self healing data
Self healing dataSelf healing data
Self healing data
 
Devops for Developers
Devops for DevelopersDevops for Developers
Devops for Developers
 
Patterns of resilience
Patterns of resiliencePatterns of resilience
Patterns of resilience
 
Resilience reloaded - more resilience patterns
Resilience reloaded - more resilience patternsResilience reloaded - more resilience patterns
Resilience reloaded - more resilience patterns
 
The promises and perils of microservices
The promises and perils of microservicesThe promises and perils of microservices
The promises and perils of microservices
 
Cloud fuer entscheider
Cloud fuer entscheiderCloud fuer entscheider
Cloud fuer entscheider
 
KomplexitÀt - Na und?
KomplexitÀt - Na und?KomplexitÀt - Na und?
KomplexitÀt - Na und?
 
Cloud Compliance - Bestimmungen, Zertifizierungen und all das
Cloud Compliance - Bestimmungen, Zertifizierungen und all dasCloud Compliance - Bestimmungen, Zertifizierungen und all das
Cloud Compliance - Bestimmungen, Zertifizierungen und all das
 
Scalability patterns
Scalability patternsScalability patterns
Scalability patterns
 
Emergent architecture
Emergent architectureEmergent architecture
Emergent architecture
 
Der Business Case fĂŒr Architektur
Der Business Case fĂŒr ArchitekturDer Business Case fĂŒr Architektur
Der Business Case fĂŒr Architektur
 
The agile Architect - Craftsmanship on a new Level
The agile Architect - Craftsmanship on a new LevelThe agile Architect - Craftsmanship on a new Level
The agile Architect - Craftsmanship on a new Level
 
Down with the_sandals
Down with the_sandalsDown with the_sandals
Down with the_sandals
 
No crash allowed - Fault tolerance patterns
No crash allowed - Fault tolerance patternsNo crash allowed - Fault tolerance patterns
No crash allowed - Fault tolerance patterns
 
Hochskalierbare Cloud-Architekturen
Hochskalierbare Cloud-ArchitekturenHochskalierbare Cloud-Architekturen
Hochskalierbare Cloud-Architekturen
 
Kommunikation und Qualität - Java Forum Nord 2016
Kommunikation und Qualität - Java Forum Nord 2016Kommunikation und Qualität - Java Forum Nord 2016
Kommunikation und Qualität - Java Forum Nord 2016
 
OAuth 2.0
OAuth 2.0OAuth 2.0
OAuth 2.0
 
OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
OpenStack Summit :: Redundancy Doesn't Always Mean "HA" or "Cluster"
 

Ähnlich wie Fault tolerance made easy

Java Concurrency
Java ConcurrencyJava Concurrency
Java Concurrency
Carol McDonald
 
2012 JDays Bad Tests Good Tests
2012 JDays Bad Tests Good Tests2012 JDays Bad Tests Good Tests
2012 JDays Bad Tests Good Tests
Tomek Kaczanowski
 
33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests
Tomek Kaczanowski
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
Holden Karau
 
Java 5 concurrency
Java 5 concurrencyJava 5 concurrency
Java 5 concurrency
priyank09
 
Java Concurrency in Practice
Java Concurrency in PracticeJava Concurrency in Practice
Java Concurrency in Practice
ericbeyeler
 
Os Practical Assignment 1
Os Practical Assignment 1Os Practical Assignment 1
Os Practical Assignment 1
Emmanuel Garcia
 

Ähnlich wie Fault tolerance made easy (20)

Java Concurrency
Java ConcurrencyJava Concurrency
Java Concurrency
 
2012 JDays Bad Tests Good Tests
2012 JDays Bad Tests Good Tests2012 JDays Bad Tests Good Tests
2012 JDays Bad Tests Good Tests
 
33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests
 
Developer Test - Things to Know
Developer Test - Things to KnowDeveloper Test - Things to Know
Developer Test - Things to Know
 
How to Start Test-Driven Development in Legacy Code
How to Start Test-Driven Development in Legacy CodeHow to Start Test-Driven Development in Legacy Code
How to Start Test-Driven Development in Legacy Code
 
The Promised Land (in Angular)
The Promised Land (in Angular)The Promised Land (in Angular)
The Promised Land (in Angular)
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Sane Async Patterns
Sane Async PatternsSane Async Patterns
Sane Async Patterns
 
Java 5 concurrency
Java 5 concurrencyJava 5 concurrency
Java 5 concurrency
 
Resiliency & Security_Ballerina Day CMB 2018
Resiliency & Security_Ballerina Day CMB 2018  Resiliency & Security_Ballerina Day CMB 2018
Resiliency & Security_Ballerina Day CMB 2018
 
Ten useful JavaScript tips & best practices
Ten useful JavaScript tips & best practicesTen useful JavaScript tips & best practices
Ten useful JavaScript tips & best practices
 
Multithreading in Java
Multithreading in JavaMultithreading in Java
Multithreading in Java
 
Confitura 2012 Bad Tests, Good Tests
Confitura 2012 Bad Tests, Good TestsConfitura 2012 Bad Tests, Good Tests
Confitura 2012 Bad Tests, Good Tests
 
Java util concurrent
Java util concurrentJava util concurrent
Java util concurrent
 
GeeCON 2012 Bad Tests, Good Tests
GeeCON 2012 Bad Tests, Good TestsGeeCON 2012 Bad Tests, Good Tests
GeeCON 2012 Bad Tests, Good Tests
 
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
Tricks to Making a Realtime SurfaceView Actually Perform in Realtime - Maarte...
 
Nevyn — Promise, It's Async! Swift Language User Group Lightning Talk 2015-09-24
Nevyn — Promise, It's Async! Swift Language User Group Lightning Talk 2015-09-24Nevyn — Promise, It's Async! Swift Language User Group Lightning Talk 2015-09-24
Nevyn — Promise, It's Async! Swift Language User Group Lightning Talk 2015-09-24
 
Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016
 
Java Concurrency in Practice
Java Concurrency in PracticeJava Concurrency in Practice
Java Concurrency in Practice
 
Os Practical Assignment 1
Os Practical Assignment 1Os Practical Assignment 1
Os Practical Assignment 1
 

Mehr von Uwe Friedrichsen

Timeless design in a cloud-native world
Timeless design in a cloud-native worldTimeless design in a cloud-native world
Timeless design in a cloud-native world
Uwe Friedrichsen
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
Uwe Friedrichsen
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explained
Uwe Friedrichsen
 
The 7 quests of resilient software design
The 7 quests of resilient software designThe 7 quests of resilient software design
The 7 quests of resilient software design
Uwe Friedrichsen
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestors
Uwe Friedrichsen
 
The truth about "You build it, you run it!"
The truth about "You build it, you run it!"The truth about "You build it, you run it!"
The truth about "You build it, you run it!"
Uwe Friedrichsen
 
Resilient Functional Service Design
Resilient Functional Service DesignResilient Functional Service Design
Resilient Functional Service Design
Uwe Friedrichsen
 
DevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextDevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader context
Uwe Friedrichsen
 
Modern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITModern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of IT
Uwe Friedrichsen
 

Mehr von Uwe Friedrichsen (20)

Timeless design in a cloud-native world
Timeless design in a cloud-native worldTimeless design in a cloud-native world
Timeless design in a cloud-native world
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Life after microservices
Life after microservicesLife after microservices
Life after microservices
 
The hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerThe hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developer
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of software
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explained
 
The 7 quests of resilient software design
The 7 quests of resilient software designThe 7 quests of resilient software design
The 7 quests of resilient software design
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestors
 
The truth about "You build it, you run it!"
The truth about "You build it, you run it!"The truth about "You build it, you run it!"
The truth about "You build it, you run it!"
 
Resilient Functional Service Design
Resilient Functional Service DesignResilient Functional Service Design
Resilient Functional Service Design
 
Watch your communication
Watch your communicationWatch your communication
Watch your communication
 
Life, IT and everything
Life, IT and everythingLife, IT and everything
Life, IT and everything
 
DevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextDevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader context
 
Production-ready Software
Production-ready SoftwareProduction-ready Software
Production-ready Software
 
Towards complex adaptive architectures
Towards complex adaptive architecturesTowards complex adaptive architectures
Towards complex adaptive architectures
 
Conway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITConway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective IT
 
Microservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskMicroservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack risk
 
Modern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITModern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of IT
 
The Next Generation (of) IT
The Next Generation (of) ITThe Next Generation (of) IT
The Next Generation (of) IT
 
Why resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesWhy resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudes
 

KĂŒrzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

KĂŒrzlich hochgeladen (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Fault tolerance made easy

  • 1. Fault tolerance made easy Patterns for fault tolerance implemented surprisingly easy Uwe Friedrichsen, codecentric AG, 2013-2014
  • 2. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  • 3. It‘s all about production!
  • 5. Your web server doesn‘t look good 

  • 7. Timeouts (1) // Basics myObject.wait(); // Do not use this by default myObject.wait(TIMEOUT); // Better use this // Some more basics myThread.join(); // Do not use this by default myThread.join(TIMEOUT); // Better use this
  • 8. Timeouts (2) // Using the Java concurrent library Callable<MyActionResult> myAction = <My Blocking Action> ExecutorService executor = Executors.newSingleThreadExecutor(); Future<MyActionResult> future = executor.submit(myAction); MyActionResult result = null; try { result = future.get(); // Do not use this by default result = future.get(TIMEOUT, TIMEUNIT); // Better use this } catch (TimeoutException e) { // Only thrown if timeouts are used ... } catch (...) { ... }
  • 9. Timeouts (3) // Using Guava SimpleTimeLimiter Callable<MyActionResult> myAction = <My Blocking Action> SimpleTimeLimiter limiter = new SimpleTimeLimiter(); MyActionResult result = null; try { result = limiter.callWithTimeout(myAction, TIMEOUT, TIMEUNIT, false); } catch (UncheckedTimeoutException e) { ... } catch (...) { ... }
  • 10. Determining Timeout Duration Configurable Timeouts Self-Adapting Timeouts Timeouts in JavaEE Containers
  • 12. Circuit Breaker (1) Client Resource Circuit Breaker Request Resource unavailable Resource available Closed Open Half-Open Lifecycle
  • 13. Circuit Breaker (2) Closed on call / pass through call succeeds / reset count call fails / count failure threshold reached / trip breaker Open on call / fail on timeout / attempt reset trip breaker Half-Open on call / pass through call succeeds / reset call fails / trip breaker trip breaker attempt reset reset Source: M. Nygard, „Release It!“
  • 14. Circuit Breaker (3) public class CircuitBreaker implements MyResource { public enum State { CLOSED, OPEN, HALF_OPEN } final MyResource resource; State state; int counter; long tripTime; public CircuitBreaker(MyResource r) { resource = r; state = CLOSED; counter = 0; tripTime = 0L; } ...
  • 15. Circuit Breaker (4) ... public Result access(...) { // resource access Result r = null; if (state == OPEN) { checkTimeout(); throw new ResourceUnavailableException(); } try { r = r.access(...); // should use timeout } catch (Exception e) { fail(); throw e; } success(); return r; } ...
  • 16. Circuit Breaker (5) ... private void success() { reset(); } private void fail() { counter++; if (counter > THRESHOLD) { tripBreaker(); } } private void reset() { state = CLOSED; counter = 0; } ...
  • 17. Circuit Breaker (6) ... private void tripBreaker() { state = OPEN; tripTime = System.currentTimeMillis(); } private void checkTimeout() { if ((System.currentTimeMillis - tripTime) > TIMEOUT) { state = HALF_OPEN; counter = THRESHOLD; } } public State getState() return state; } }
  • 18. Thread-Safe Circuit Breaker Failure Types Tuning Circuit Breakers Available Implementations
  • 20. Fail Fast (1) Client Resources Expensive Action Request Uses
  • 21. Fail Fast (2) Client Resources Expensive Action Request Fail Fast Guard Uses Check availability Forward
  • 22. Fail Fast (3) public class FailFastGuard { private FailFastGuard() {} public static void checkResources(Set<CircuitBreaker> resources) { for (CircuitBreaker r : resources) { if (r.getState() != CircuitBreaker.CLOSED) { throw new ResourceUnavailableException(r); } } } }
  • 23. Fail Fast (4) public class MyService { Set<CircuitBreaker> requiredResources; // Initialize resources ... public Result myExpensiveAction(...) { FailFastGuard.checkResources(requiredResources); // Execute core action ... } }
  • 26. Shed Load (1) Clients Server Too many Requests
  • 27. Shed Load (2) Server Too many Requests Gate Keeper Monitor Requests Request Load Data Monitor Load Shedded Requests Clients
  • 28. Shed Load (3) public class ShedLoadFilter implements Filter { Random random; public void init(FilterConfig fc) throws ServletException { random = new Random(System.currentTimeMillis()); } public void destroy() { random = null; } ...
  • 29. Shed Load (4) ... public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws java.io.IOException, ServletException { int load = getLoad(); if (shouldShed(load)) { HttpServletResponse res = (HttpServletResponse)response; res.setIntHeader("Retry-After", RECOMMENDATION); res.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE); return; } chain.doFilter(request, response); } ...
  • 30. Shed Load (5) ... private boolean shouldShed(int load) { // Example implementation if (load < THRESHOLD) { return false; } double shedBoundary = ((double)(load - THRESHOLD))/ ((double)(MAX_LOAD - THRESHOLD)); return random.nextDouble() < shedBoundary; } }
  • 33. Shedding Strategy Retrieving Load Tuning Load Shedders Alternative Strategies
  • 35. Deferrable Work (1) Client Requests Request Processing Resources Use Routine Work Use
  • 36. OVERLOAD Deferrable Work (2) Without‹ Deferrable Work 100% OVERLOAD With‹ Deferrable Work 100% Request Processing Routine Work
  • 37. // Do or wait variant ProcessingState state = initBatch(); while(!state.done()) { int load = getLoad(); if (load > THRESHOLD) { waitFixedDuration(); } else { state = processNext(state); } } void waitFixedDuration() { Thread.sleep(DELAY); // try-catch left out for better readability } Deferrable Work (3)
  • 38. // Adaptive load variant ProcessingState state = initBatch(); while(!state.done()) { waitLoadBased(); state = processNext(state); } void waitLoadBased() { int load = getLoad(); long delay = calcDelay(load); Thread.sleep(delay); // try-catch left out for better readability } long calcDelay(int load) { // Simple example implementation if (load < THRESHOLD) { return 0L; } return (load – THRESHOLD) * DELAY_FACTOR; } Deferrable Work (4)
  • 40. I can hardly hear you 

  • 42. Leaky Bucket (1) Leaky Bucket Fill Problem occured Periodically Leak Error Handling Overflowed?
  • 43. public class LeakyBucket { // Very simple implementation final private int capacity; private int level; private boolean overflow; public LeakyBucket(int capacity) { this.capacity = capacity; drain(); } public void drain () { this.level = 0; this.overflow = false; } ... Leaky Bucket (2)
  • 44. ... public void fill() { level++; if (level > capacity) { overflow = true; } } public void leak() { level--; if (level < 0) { level = 0; } } public boolean overflowed() { return overflow; } } Leaky Bucket (3)
  • 45. Thread-Safe Leaky Bucket Leaking strategies Tuning Leaky Bucket Available Implementations
  • 47. // doAction returns true if successful, false otherwise // General pattern boolean success = false int tries = 0; while (!success && (tries < MAX_TRIES)) { success = doAction(...); tries++; } // Alternative one-retry-only variant success = doAction(...) || doAction(...); Limited Retries (1)
  • 48. Idempotent Actions Closures / Lambdas Tuning Retries
  • 49. More Patterns ‱  Complete Parameter Checking ‱  Marked Data ‱  Routine Audits
  • 50. Further reading 1.  Michael T. Nygard, Release It!, Pragmatic Bookshelf, 2007 2.  Robert S. Hanmer,‹ Patterns for Fault Tolerant Software, Wiley, 2007 3.  James Hamilton, On Designing and Deploying Internet-Scale Services,‹ 21st LISA Conference 2007 4.  Andrew Tanenbaum, Marten van Steen, Distributed Systems – Principles and Paradigms,‹ Prentice Hall, 2nd Edition, 2006
  • 51. It‘s all about production!
  • 52. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com