Fault tolerant microservices
BSkyB 
@chbatey
Who is this guy? 
● Enthusiastic nerd 
● Senior software engineer at BSkyB 
● Builds a lot of distributed applications 
● Apache Cassandra MVP
Agenda 
1. Setting the scene 
○ What do we mean by a fault? 
○ What is a microservice? 
○ Monolith application vs the micro(ish) service 
2. A worked example 
○ Identify an issue 
○ Reproduce/test it 
○ Show how to deal with the issue
So… what do applications look like? 
So... what do systems look like now?
But different things go wrong...
[Diagram labels: down, slow network, slow app, 2 second max, GC :(, missing packets]
Fault tolerance 
1. Don’t take forever - Timeouts 
2. Don’t try if you can’t succeed 
3. Fail gracefully 
4. Know if it’s your fault 
5. Don’t whack a dead horse 
6. Turn broken stuff off 
Time for an example... 
● All examples are on github 
● Technologies used: 
○ Dropwizard 
○ Spring Boot 
○ Wiremock 
○ Hystrix 
○ Graphite 
○ Saboteur
Example: Movie player service
[Diagram: several Shiny App instances, a User Service, a Device Service and a Pin Service; request: Play Movie]
Testing microservices 
You don’t know a service is fault tolerant if you don’t test faults
Isolated service tests
[Diagram: Acceptance Test, Shiny App, and Mocks of the User, Device and Pin services; arrows: Prime, Play Movie]
1 - Don’t take forever 
● If at first you don’t succeed, don’t take forever to tell someone
● Timeout and fail fast
Which timeouts? 
● Socket connection timeout 
● Socket read timeout 
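The two socket-level timeouts can be sketched with the plain JDK socket API. This is only an illustration, not code from the talk: the class name, the method name and the silent local server are assumptions made up so the example can run without a real dependency.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SocketTimeouts {

    /**
     * Connects to a local server that never sends a byte and reports
     * whether a socket timeout (connect or read) was hit.
     */
    static boolean timesOut(int connectTimeoutMillis, int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; the server never replies.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connection timeout: upper bound on establishing the connection.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMillis);
            // Read timeout: upper bound on each blocking read.
            client.setSoTimeout(readTimeoutMillis);
            client.getInputStream().read();
            return false; // data or end-of-stream arrived in time
        } catch (SocketTimeoutException e) {
            return true; // fail fast instead of hanging
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOut(500, 200));
    }
}
```

Note that the read timeout applies to each individual read, so a dependency that trickles data slowly can still keep a caller waiting far longer than the configured value.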
Your service hung for 30 seconds :( 
[Image labels: Customer, You :(]
Which timeouts? 
● Socket connection timeout 
● Socket read timeout 
● Resource acquisition 
Your service hung for 10 minutes :( 
Let’s think about this
A little more detail
Wiremock + Saboteur + Vagrant
● Vagrant - launches + provisions local VMs
● Saboteur - uses tc, iptables to simulate network issues
● Wiremock - used to mock HTTP dependencies
● Cucumber - acceptance tests
I can write an automated test for that?
[Diagram: Acceptance Test; Play Movie Service; Vagrant + VirtualBox VM containing Wiremock (User Service, Device Service, Pin Service) and Saboteur; arrows: prime to drop traffic, reset]
Implementing reliable timeouts
● Homemade: Worker Queue + Thread pool (executor)
● Hystrix
● Spring Cloud Netflix
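The homemade option could look roughly like the sketch below: the risky call runs on a worker thread and the caller waits for a bounded time only. The class, the method names and the fallback string are hypothetical, chosen for illustration; the pool is bounded to one thread and one queued task.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeLimitedCall {

    /** Runs the call on a worker thread and waits at most timeoutMillis for its result. */
    static String callWithTimeout(ExecutorService workers, Callable<String> call,
                                  long timeoutMillis, String fallback) {
        Future<String> future = workers.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not linger
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself failed
        }
    }

    /** Simulates a dependency that takes sleepMillis to answer. */
    static String demo(long sleepMillis, long timeoutMillis) {
        ExecutorService workers = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        try {
            return callWithTimeout(workers, () -> {
                Thread.sleep(sleepMillis);
                return "Scary String";
            }, timeoutMillis, "timed out");
        } finally {
            workers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(10, 1000));
        System.out.println(demo(2000, 100));
    }
}
```

Unlike a socket read timeout, the limit here covers the whole call, which is what an SLA is usually expressed in.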
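A simple Spring RestController
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {
    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        LOGGER.info("RestController: I wonder which thread I am on!");
        return scaryDependency.getScaryString();
    }
}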
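Scary dependency
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        }
        try {
            Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
        } catch (InterruptedException e) {
            // Thread.sleep throws a checked exception, so it has to be handled here
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "Really slow scary string";
    }
}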
All on the tomcat thread
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [http-nio-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
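Seriously this simple now?
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    @HystrixCommand
    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        }
        try {
            Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
        } catch (InterruptedException e) {
            // Thread.sleep throws a checked exception, so it has to be handled here
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "Really slow scary string";
    }
}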
What an annotation can do...
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on!
Timeouts take home
● You can’t use network level timeouts for SLAs
● Test your SLAs - if someone says you can’t, hit them with a stick
● Scary things happen without network issues
2 - Don’t try if you can’t succeed 
Complexity
● When an application grows in complexity it will eventually start sending emails
Complexity
● When an application grows in complexity it will eventually contain queues and thread pools
Don’t try if you can’t succeed
● Executors with unbounded queues or threads :(
○ newFixedThreadPool (unbounded queue)
○ newSingleThreadExecutor (unbounded queue)
○ newCachedThreadPool (unbounded number of threads)
● Bound your queues and threads
● Fail quickly when the queue size / maxPoolSize is reached
● Know your drivers
This is a functional requirement
● Set the timeout very high
● Use wiremock to add a large delay to the requests
● Set queue size and thread pool size to 1
● Send in 2 requests to use the thread and fill the queue
● What happens on the 3rd request?
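The scenario above can be reproduced in isolation with a plain ThreadPoolExecutor. The sketch below is an assumption about how such a check might look, not the talk's test: a latch stands in for the delayed Wiremock response, and the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {

    /** Submits three slow tasks to a pool of one thread with a queue of one; returns the number rejected. */
    static int rejectedOutOfThree() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),            // bounded queue
                new ThreadPoolExecutor.AbortPolicy());  // reject instead of queueing forever
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < 3; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // stands in for a slow dependency
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // fail quickly: no thread and no queue slot left
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + rejectedOutOfThree());
    }
}
```

The first task occupies the only thread, the second fills the queue, and the third is rejected immediately rather than waiting behind work that may never complete.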
3 - Fail gracefully 
Expect rubbish 
● Expect invalid HTTP 
● Expect malformed response bodies 
● Expect connection failures 
● Expect huge / tiny responses 
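As one illustration of expecting rubbish, the sketch below parses a response body strictly instead of trusting it. It assumes a hypothetical dependency that should answer with the text true or false; the class name and the length limit are made up for the example. A lenient parser such as Boolean.valueOf would silently turn any unrecognised body into false.

```java
import java.util.Optional;

class StrictBooleanParser {
    // Hypothetical limit: a valid body is tiny, so anything larger is treated as rubbish.
    static final int MAX_BODY_LENGTH = 16;

    /** Returns the parsed value, or empty when the body is missing, oversized or malformed. */
    static Optional<Boolean> parse(String body) {
        if (body == null || body.length() > MAX_BODY_LENGTH) {
            return Optional.empty();
        }
        String trimmed = body.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return Optional.of(true);
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return Optional.of(false);
        }
        return Optional.empty(); // the caller decides how to fail gracefully
    }

    public static void main(String[] args) {
        System.out.println(parse("true\n"));
        System.out.println(parse("<html>502 Bad Gateway</html>"));
    }
}
```

Returning an empty Optional keeps "the dependency said no" separate from "the dependency said something unintelligible", so each case can be handled and counted on its own.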
Testing with Wiremock
stubFor(get(urlEqualTo("/dependencyPath"))
        .willReturn(aResponse()
                .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "RANDOM_DATA_THEN_CLOSE"
    }
}

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "EMPTY_RESPONSE"
    }
}
4 - Know if it’s your fault 
What to record
● Metrics: Timings, errors, concurrent incoming requests, thread pool statistics, connection pool statistics
● Logging: Boundary logging, elasticsearch / logstash
● Request identifiers
Graphite + Codahale 
Response times
Separate resource pools
● Don’t flood your dependencies
● Be able to answer the questions:
○ How many connections will you make to dependency X?
○ Are you getting close to your max connections?
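One way to be able to answer both questions is to give each dependency its own fixed budget of concurrent calls. The sketch below uses a JDK Semaphore for that; the class and method names are hypothetical, and a real connection pool or Hystrix thread pool would play the same role.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class DependencyLimiter {
    private final int maxConcurrent;
    private final Semaphore permits;

    DependencyLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the call if a permit is free, otherwise returns whenSaturated immediately. */
    <T> T call(Supplier<T> dependencyCall, T whenSaturated) {
        if (!permits.tryAcquire()) {
            return whenSaturated; // do not flood the dependency
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release();
        }
    }

    /** How many calls are in progress right now; compare with maxConcurrent. */
    int inFlight() {
        return maxConcurrent - permits.availablePermits();
    }

    public static void main(String[] args) {
        DependencyLimiter userService = new DependencyLimiter(10);
        System.out.println(userService.call(() -> "user info", "busy"));
        System.out.println("in flight: " + userService.inFlight());
    }
}
```

With one limiter per dependency, a slow User Service can exhaust only its own permits, and inFlight() is a natural value to publish as a metric.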
So easy with Dropwizard + Hystrix
@Override
public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
    HystrixCodaHaleMetricsPublisher metricsPublisher
            = new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
    HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
}

metrics:
  reporters:
    - type: graphite
      host: 192.168.10.120
      port: 2003
      prefix: shiny_app
5 - Don’t whack a dead horse
[Diagram: several Shiny App instances, a User Service, a Device Service and a Pin Service; request: Play Movie]
What to do.. 
● Yes this will happen.. 
● Mandatory dependency - fail *really* fast 
● Throttling 
● Fallbacks 
Circuit breaker pattern 
Implementation with Hystrix
@GET
@Timed
public String integrate() {
    LOGGER.info("I best do some integration!");
    String user = new UserServiceDependency(userService).execute();
    String device = new DeviceServiceDependency(deviceService).execute();
    Boolean pinCheck = new PinCheckDependency(pinService).execute();
    return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device, pinCheck);
}
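Implementation with Hystrix
public class PinCheckDependency extends HystrixCommand<Boolean> {
    // The slide omits the field and constructor. This version is an assumption
    // that matches the call new PinCheckDependency(pinService).
    private final HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinService"));
        this.httpClient = httpClient;
    }

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }
}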
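Implementation with Hystrix
public class PinCheckDependency extends HystrixCommand<Boolean> {
    // The slide omits the field and constructor. This version is an assumption
    // that matches the call new PinCheckDependency(pinService).
    private final HttpClient httpClient;

    public PinCheckDependency(HttpClient httpClient) {
        super(HystrixCommandGroupKey.Factory.asKey("PinService"));
        this.httpClient = httpClient;
    }

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Used when run() fails, times out or the circuit is open
    @Override
    public Boolean getFallback() {
        return true;
    }
}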
Triggering the fallback 
● Error threshold percentage 
● Bucket of time for the percentage 
● Minimum number of requests to trigger 
● Time before trying a request again 
● Disable 
● Per instance statistics 
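To make these settings concrete, here is a deliberately small circuit breaker written from scratch. It is a sketch under stated assumptions, not Hystrix's implementation: all names are hypothetical, the time bucket is simplified to counters that reset when the circuit closes, and the clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // error threshold percentage
    private final int minimumRequests;       // minimum number of requests to trigger
    private final long sleepWindowMillis;    // time before trying a request again
    private final LongSupplier clock;        // current time in milliseconds

    private int requests;
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long sleepWindowMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** Runs the command unless the circuit is open; returns the fallback on failure or when open. */
    <T> T call(Supplier<T> command, T fallback) {
        if (!allowRequest()) {
            return fallback; // fail really fast without touching the dependency
        }
        try {
            T result = command.get();
            record(true);
            return result;
        } catch (RuntimeException e) {
            record(false);
            return fallback;
        }
    }

    synchronized boolean isOpen() {
        return open;
    }

    // Closed circuits always allow; open circuits allow a trial once the sleep window has passed.
    private synchronized boolean allowRequest() {
        return !open || clock.getAsLong() - openedAt >= sleepWindowMillis;
    }

    private synchronized void record(boolean success) {
        if (open) {
            if (success) {
                open = false; // the trial request worked: close and start counting afresh
                requests = 0;
                failures = 0;
            } else {
                openedAt = clock.getAsLong(); // the trial failed: wait another sleep window
            }
            return;
        }
        requests++;
        if (!success) {
            failures++;
        }
        if (requests >= minimumRequests && failures * 100 >= errorThresholdPercent * requests) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(50, 2, 1000, System::currentTimeMillis);
        System.out.println(breaker.call(() -> "pin ok", "fallback"));
    }
}
```

The command runs outside the lock, so a slow dependency cannot block other callers from reading or updating the breaker state.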
6 - Turn off broken stuff 
● The kill switch 
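A kill switch can be as small as a flag that is checked before every call to the broken feature. The sketch below is an assumption about one possible shape, with hypothetical names; how the flag gets flipped at runtime (admin endpoint, configuration reload, and so on) is left out.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

class KillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    void turnOff() {
        enabled.set(false);
    }

    void turnOn() {
        enabled.set(true);
    }

    /** Runs the feature only while the switch is on; otherwise returns whenOff without calling it. */
    <T> T call(Supplier<T> feature, T whenOff) {
        return enabled.get() ? feature.get() : whenOff;
    }

    public static void main(String[] args) {
        KillSwitch pinCheck = new KillSwitch();
        pinCheck.turnOff(); // operator decision: the dependency is known to be broken
        System.out.println(pinCheck.call(() -> "checked pin", "pin check skipped"));
    }
}
```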
To recap 
1. Don’t take forever - Timeouts 
2. Don’t try if you can’t succeed 
3. Fail gracefully 
4. Know if it’s your fault 
5. Don’t whack a dead horse 
6. Turn broken stuff off 
Links 
● Examples: 
○ https://github.com/chbatey/spring-cloud-example 
○ https://github.com/chbatey/dropwizard-hystrix 
○ https://github.com/chbatey/vagrant-wiremock-saboteur 
● Tech: 
○ https://github.com/Netflix/Hystrix 
○ https://www.vagrantup.com/ 
○ http://wiremock.org/ 
○ https://github.com/tomakehurst/saboteur
Questions? 
● Thanks for listening! 
● http://christopher-batey.blogspot.co.uk/ 
Developer takeaways 
● Learn about TCP 
● Love vagrant, docker etc to enable testing 
● Don’t trust libraries 
Hystrix cost - do this yourself 
Hystrix metrics
● Failure count
● Percentiles from Hystrix point of view
● Error percentages
How to test metric publishing?
● Stub out graphite and verify calls?
● Programmatically call graphite and verify numbers?
● Make metrics + logs part of the story demo

Fault tolerant microservices - LJC Skills Matter 4thNov2014

  • 2. @chbatey Who is this guy? ● Enthusiastic nerd ● Senior software engineer at BSkyB ● Builds a lot of distributed applications ● Apache Cassandra MVP
  • 3. @chbatey Agenda 1. Setting the scene ○ What do we mean by a fault? ○ What is a microservice? ○ Monolith application vs the micro(ish) service 2. A worked example ○ Identify an issue ○ Reproduce/test it ○ Show how to deal with the issue
  • 4. So… what do applications look like? @chbatey
  • 5. So... what do systems look like now? @chbatey
  • 6. But different things go wrong... @chbatey down slow network slow app 2 second max GC :( missing packets
  • 7. Fault tolerance 1. Don’t take forever - Timeouts 2. Don’t try if you can’t succeed 3. Fail gracefully 4. Know if it’s your fault 5. Don’t whack a dead horse 6. Turn broken stuff off @chbatey
  • 8. Time for an example... ● All examples are on github ● Technologies used: @chbatey ○ Dropwizard ○ Spring Boot ○ Wiremock ○ Hystrix ○ Graphite ○ Saboteur
  • 9. Example: Movie player service @chbatey Shiny App User Service Device Service Pin Service Shiny App Shiny App Shiny App User Se rUvisceer Service Device Service Play Movie
  • 10. Testing microservices You don’t know a service is fault tolerant if you don’t test faults @chbatey
  • 11. Isolated service tests Shiny App @chbatey Mocks User Device Pin service Acceptance Play Movie Test Prime
  • 12. 1 - Don’t take forever @chbatey ● If at first you don’t succeed, don’t take forever to tell someone ● Timeout and fail fast
  • 13. Which timeouts? ● Socket connection timeout ● Socket read timeout @chbatey
  • 14. Your service hung for 30 seconds :( @chbatey Customer You :(
  • 15. Which timeouts? ● Socket connection timeout ● Socket read timeout ● Resource acquisition @chbatey
  • 16. Your service hung for 10 minutes :( @chbatey
  • 17. Let’s think about this @chbatey
  • 18. A little more detail @chbatey
  • 19. Wiremock + Saboteur + Vagrant ● Vagrant - launches + provisions local VMs ● Saboteur - uses tc, iptables to simulate @chbatey network issues ● Wiremock - used to mock HTTP dependencies ● Cucumber - acceptance tests
  • 20. I can write an automated test for that? @chbatey Vagrant + Virtual box VM Wiremock User Service Device Service Pin Service Sabot eur Play Movie Service Acceptance Test prime to drop traffic reset
  • 21. Implementing reliable timeouts ● Homemade: Worker Queue + Thread pool @chbatey (executor)
  • 22. Implementing reliable timeouts ● Homemade: Worker Queue + Thread pool @chbatey (executor) ● Hystrix
  • 23. Implementing reliable timeouts ● Homemade: Worker Queue + Thread pool @chbatey (executor) ● Hystrix ● Spring Cloud Netflix
  • 24. A simple Spring RestController @chbatey @RestController public class Resource { private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class); @Autowired private ScaryDependency scaryDependency; @RequestMapping("/scary") public String callTheScaryDependency() { LOGGER.info("RestContoller: I wonder which thread I am on!"); return scaryDependency.getScaryString(); } }
  • 25. Scary dependency @chbatey @Component public class ScaryDependency { private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class); public String getScaryString() { LOGGER.info("Scary dependency: I wonder which thread I am on!"); if (System.currentTimeMillis() % 2 == 0) { return "Scary String"; } else { Thread.sleep(10000); return "Really slow scary string"; } } }
  • 26. All on the tomcat thread 13:07:32.814 [http-nio-8080-exec-1] INFO info.batey. examples.Resource - RestContoller: I wonder which thread I am on! 13:07:32.896 [http-nio-8080-exec-1] INFO info.batey. examples.ScaryDependency - Scary dependency: I wonder which thread I am on! @chbatey
  • 27. Seriously this simple now? @chbatey @Component public class ScaryDependency { private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class); @HystrixCommand public String getScaryString() { LOGGER.info("Scary dependency: I wonder which thread I am on!"); if (System.currentTimeMillis() % 2 == 0) { return "Scary String"; } else { Thread.sleep(10000); return "Really slow scary string"; } } }
  • 28. What an annotation can do... 13:07:32.814 [http-nio-8080-exec-1] INFO info.batey. examples.Resource - RestController: I wonder which thread I am on! 13:07:32.896 [hystrix-ScaryDependency-1] INFO info. batey.examples.ScaryDependency - Scary Dependency: I wonder which thread I am on! @chbatey
  • 29. Timeouts take home ● You can’t use network level timeouts for SLAs ● Test your SLAs - if someone says you can’t, hit them with a stick ● Scary things happen without network issues
  • 30. 2 - Don’t try if you can’t succeed
  • 31. Complexity ● When an application grows in complexity it will eventually start sending emails
  • 32. Complexity ● When an application grows in complexity it will eventually contain queues and thread pools (the slide strikes out "start sending emails")
  • 33. Don’t try if you can’t succeed ● Executor unbounded queues :( ○ newFixedThreadPool ○ newSingleThreadExecutor ○ newCachedThreadPool (unbounded number of threads) ● Bound your queues and threads ● Fail quickly when the queue / maxPoolSize is met ● Know your drivers
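To make the difference visible, the sketch below inspects the executors that the `Executors` factory methods hand back and compares them with an explicitly bounded `ThreadPoolExecutor`. The class name `ExecutorBounds` and the sizes are assumptions for illustration, and the casts rely on the factory methods returning `ThreadPoolExecutor` instances, which is how OpenJDK implements them rather than something the API promises.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class (not from the talk).
final class ExecutorBounds {
    // Bounded alternative: fixed threads, fixed queue, reject when both are full.
    static ThreadPoolExecutor bounded(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize), new ThreadPoolExecutor.AbortPolicy());
    }

    public static void main(String[] args) {
        // Casts assume the OpenJDK implementation of the factory methods.
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(2);
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        ThreadPoolExecutor safe = bounded(2, 10);

        System.out.println("fixed pool queue capacity:   " + fixed.getQueue().remainingCapacity());
        System.out.println("cached pool max threads:     " + cached.getMaximumPoolSize());
        System.out.println("bounded pool queue capacity: " + safe.getQueue().remainingCapacity());

        fixed.shutdown();
        cached.shutdown();
        safe.shutdown();
    }
}
```

With `AbortPolicy`, a submission that finds every thread busy and the queue full throws `RejectedExecutionException` straight away, which is the "fail quickly" behaviour the slide asks for.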
  • 34. This is a functional requirement ● Set the timeout very high ● Use wiremock to add a large delay to the requests ● Set queue size and thread pool size to 1 ● Send in 2 requests to use the thread and fill the queue ● What happens on the 3rd request?
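The same experiment can be tried in miniature without any HTTP at all. The sketch below is an assumption-laden stand-in for the real test: a `CountDownLatch` plays the part of the delayed Wiremock response, and the class name `ThirdRequestScenario` is invented for illustration. With one thread and a queue of one, the third submission has nowhere to go.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical scenario class (not from the talk).
final class ThirdRequestScenario {
    // Returns true if the third submission is rejected immediately.
    static boolean thirdRequestRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1); // stands in for the slow dependency
        Runnable slowCall = () -> {
            try {
                release.await(); // block until the scenario is over
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(slowCall); // request 1 occupies the only thread
            pool.execute(slowCall); // request 2 fills the queue
            try {
                pool.execute(slowCall); // request 3: thread busy, queue full
                return false;
            } catch (RejectedExecutionException expected) {
                return true;
            }
        } finally {
            release.countDown(); // let the blocked calls finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("third request rejected: " + thirdRequestRejected());
    }
}
```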
  • 35. 3 - Fail gracefully
  • 36. Expect rubbish ● Expect invalid HTTP ● Expect malformed response bodies ● Expect connection failures ● Expect huge / tiny responses
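One way to act on "expect rubbish" is to treat every response body as untrusted input. The sketch below assumes a dependency that answers with a plain `true` or `false` body; the class name `PinResponseParser` and the 16-character limit are hypothetical. Anything missing, oversized or unparseable becomes an empty result that the caller has to handle explicitly.

```java
import java.util.Optional;

// Hypothetical parser (not from the talk) for a dependency assumed to return "true" or "false".
final class PinResponseParser {
    private static final int MAX_BODY_CHARS = 16; // assumed upper bound for a sane body

    // Returns the parsed value, or empty if the body is missing, huge or malformed.
    static Optional<Boolean> parse(String body) {
        if (body == null || body.length() > MAX_BODY_CHARS) {
            return Optional.empty();
        }
        String trimmed = body.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return Optional.of(true);
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return Optional.of(false);
        }
        return Optional.empty(); // e.g. an HTML error page or random bytes
    }
}
```

The strict comparison matters because `Boolean.valueOf` quietly maps any unrecognised string to `false`, which would hide a malformed response behind a valid-looking answer.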
  • 37. Testing with Wiremock
      stubFor(get(urlEqualTo("/dependencyPath"))
              .willReturn(aResponse()
                      .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));
      { "request": { "method": "GET", "url": "/fault" }, "response": { "fault": "RANDOM_DATA_THEN_CLOSE" } }
      { "request": { "method": "GET", "url": "/fault" }, "response": { "fault": "EMPTY_RESPONSE" } }
  • 38. 4 - Know if it’s your fault
  • 39. What to record ● Metrics: Timings, errors, concurrent incoming requests, thread pool statistics, connection pool statistics ● Logging: Boundary logging, Elasticsearch / Logstash ● Request identifiers
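As a rough illustration of the kind of bookkeeping listed here, the sketch below wraps a call and records counts, errors, elapsed time and in-flight requests, tagging failures with a request identifier. The class `CallRecorder` and its fields are hypothetical; a real service would more likely publish these through a metrics library than keep them in plain counters.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical recorder (not from the talk): counts, errors, timings and concurrency.
final class CallRecorder {
    final LongAdder calls = new LongAdder();
    final LongAdder errors = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
    final AtomicInteger inFlight = new AtomicInteger();

    // Generates an identifier to carry through logs for one incoming request.
    static String newRequestId() {
        return UUID.randomUUID().toString();
    }

    // Runs the call, records the outcome and rethrows any failure unchanged.
    <T> T record(String requestId, Supplier<T> call) {
        long start = System.nanoTime();
        inFlight.incrementAndGet();
        try {
            return call.get();
        } catch (RuntimeException e) {
            errors.increment();
            System.err.println("requestId=" + requestId + " failed: " + e); // boundary log line
            throw e;
        } finally {
            inFlight.decrementAndGet();
            calls.increment();
            totalNanos.add(System.nanoTime() - start);
        }
    }
}
```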
  • 42. Separate resource pools ● Don’t flood your dependencies ● Be able to answer the questions: ○ How many connections will you make to dependency X? ○ Are you getting close to your max connections?
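A per-dependency limit can be sketched with a plain `Semaphore`, one instance per downstream service. The class name `DependencyLimiter` is an assumption for illustration and this is not how the talk's examples do it (they use Hystrix thread pools), but it answers both questions on the slide: the maximum is fixed up front and the current usage can be read at any time.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical limiter (not from the talk): caps concurrent calls to one dependency.
final class DependencyLimiter {
    private final int maxConcurrent;
    private final Semaphore permits;

    DependencyLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call if a permit is free; otherwise returns empty without waiting.
    <T> Optional<T> tryCall(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            return Optional.empty(); // at the limit: fail fast instead of queueing
        }
        try {
            return Optional.ofNullable(call.get());
        } finally {
            permits.release();
        }
    }

    // How many calls to this dependency are running right now.
    int inUse() {
        return maxConcurrent - permits.availablePermits();
    }
}
```

Creating one limiter per dependency keeps a slow User Service from consuming the capacity needed for the Pin Service.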
  • 43. So easy with Dropwizard + Hystrix
      @Override
      public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
          HystrixCodaHaleMetricsPublisher metricsPublisher =
                  new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
          HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
      }
      Configuration (YAML):
      metrics:
        reporters:
          - type: graphite
            host: 192.168.10.120
            port: 2003
            prefix: shiny_app
  • 44. 5 - Don’t whack a dead horse [Diagram: Shiny App instances, User Service, Device Service, Pin Service; request: Play Movie]
  • 45. What to do.. ● Yes this will happen.. ● Mandatory dependency - fail *really* fast ● Throttling ● Fallbacks
  • 47. Implementation with Hystrix
      @GET
      @Timed
      public String integrate() {
          LOGGER.info("I best do some integration!");
          String user = new UserServiceDependency(userService).execute();
          String device = new DeviceServiceDependency(deviceService).execute();
          Boolean pinCheck = new PinCheckDependency(pinService).execute();
          return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device, pinCheck);
      }
  • 48. Implementation with Hystrix
      public class PinCheckDependency extends HystrixCommand<Boolean> {
          @Override
          protected Boolean run() throws Exception {
              HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
              HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
              String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
              return Boolean.valueOf(pinCheckInfo);
          }
      }
  • 49. Implementation with Hystrix
      public class PinCheckDependency extends HystrixCommand<Boolean> {
          @Override
          protected Boolean run() throws Exception {
              HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
              HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
              String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
              return Boolean.valueOf(pinCheckInfo);
          }
          @Override
          public Boolean getFallback() {
              return true;
          }
      }
  • 50. Triggering the fallback ● Error threshold percentage ● Bucket of time for the percentage ● Minimum number of requests to trigger ● Time before trying a request again ● Disable ● Per instance statistics
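To show how these settings interact, here is a deliberately simplified circuit breaker. It is not Hystrix's implementation: the class `SimpleCircuitBreaker`, its constructor parameters and the injectable clock are assumptions for illustration, statistics live in a single fixed bucket rather than a rolling window, and calls are serialized with `synchronized` to keep the state handling easy to follow.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified breaker (not Hystrix): one statistics bucket, one lock.
final class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // error percentage that opens the circuit
    private final int minimumRequests;       // do not judge on fewer requests than this
    private final long windowMillis;         // length of the statistics bucket
    private final long sleepMillis;          // time before a trial request is allowed
    private final LongSupplier clock;        // injectable so tests can control time

    private long windowStart;
    private int requests;
    private int failures;
    private long openedAt = -1; // -1 means the circuit has not been opened

    SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                         long windowMillis, long sleepMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.windowMillis = windowMillis;
        this.sleepMillis = sleepMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        long now = clock.getAsLong();
        if (openedAt >= 0 && now - openedAt < sleepMillis) {
            return fallback.get(); // open: do not touch the dependency at all
        }
        if (now - windowStart >= windowMillis) { // start a fresh statistics bucket
            windowStart = now;
            requests = 0;
            failures = 0;
        }
        requests++;
        try {
            T result = primary.get();
            if (openedAt >= 0) { // a trial request succeeded: close and start clean
                openedAt = -1;
                windowStart = now;
                requests = 0;
                failures = 0;
            }
            return result;
        } catch (RuntimeException e) {
            failures++;
            boolean trialFailed = openedAt >= 0;
            boolean overThreshold = requests >= minimumRequests
                    && failures * 100 >= errorThresholdPercent * requests;
            if (trialFailed || overThreshold) {
                openedAt = now; // open (or re-open) and restart the sleep period
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < sleepMillis;
    }
}
```

While the circuit is open the primary supplier is never invoked, which is the point of "don't whack a dead horse": the struggling dependency gets breathing room and callers get an immediate answer.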
  • 51. 6 - Turn off broken stuff ● The kill switch
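A kill switch can be as small as a flag checked before the feature runs. The sketch below is a hypothetical minimal version (the class name `KillSwitch` is invented here); how the flag gets flipped at runtime, for example through configuration or an admin endpoint, is left open.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical kill switch (not from the talk): skips a feature while it is turned off.
final class KillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    void turnOff() {
        enabled.set(false);
    }

    void turnOn() {
        enabled.set(true);
    }

    // Runs the feature when enabled; otherwise returns the given default without calling it.
    <T> T run(Supplier<T> feature, T whenDisabled) {
        return enabled.get() ? feature.get() : whenDisabled;
    }
}
```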
  • 52. To recap 1. Don’t take forever - Timeouts 2. Don’t try if you can’t succeed 3. Fail gracefully 4. Know if it’s your fault 5. Don’t whack a dead horse 6. Turn broken stuff off
  • 53. Links ● Examples: ○ https://github.com/chbatey/spring-cloud-example ○ https://github.com/chbatey/dropwizard-hystrix ○ https://github.com/chbatey/vagrant-wiremock-saboteur ● Tech: ○ https://github.com/Netflix/Hystrix ○ https://www.vagrantup.com/ ○ http://wiremock.org/ ○ https://github.com/tomakehurst/saboteur
  • 54. Questions? ● Thanks for listening! ● http://christopher-batey.blogspot.co.uk/
  • 55. Developer takeaways ● Learn about TCP ● Love Vagrant, Docker etc. to enable testing ● Don’t trust libraries
  • 56. Hystrix cost - do this yourself
  • 57. Hystrix metrics ● Failure count ● Percentiles from Hystrix point of view ● Error percentages
  • 58. How to test metric publishing? ● Stub out graphite and verify calls? ● Programmatically call graphite and verify numbers? ● Make metrics + logs part of the story demo