The document discusses circuit breaker mechanisms for microservices architectures. It notes that applications should be designed to fail fast and recover fast from failures. A typical multi-threaded web application is shown where all threads would be blocked if a dependent service goes down. The document then introduces the circuit breaker pattern to address this issue by isolating failures and limiting the number of requests affected when a service is unavailable. It emphasizes testing and monitoring microservices to understand failure rates and the effectiveness of the circuit breaker.
10. Designing Microservices
!! DESIGN FOR FAILURES !!
- Don’t fail if your microservice goes down
- Application should not forever wait for microservice
- Contain and Isolate failures
- Respect the service when it is slow
- Fail fast - Recover fast
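
The circuit breaker pattern that this deck builds towards can be sketched in a few lines. This is a minimal, single-threaded illustration and not the speaker's implementation: the class name `CircuitBreaker`, the parameters `failure_threshold` and `reset_timeout`, and the `CircuitOpenError` exception are all assumptions made for the example. After enough consecutive failures the breaker rejects calls immediately (fail fast), and once the reset timeout has passed it lets a trial call through so that a healthy service is picked up again (recover fast).

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: do not touch the dependency at all.
                raise CircuitOpenError("dependency marked unavailable")
            # Reset timeout elapsed: fall through and allow a trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outgoing request, for example `breaker.call(fetch_user, user_id)` where `fetch_user` is a hypothetical client function, and serve a fallback when `CircuitOpenError` is raised. A production version would also need locking around the shared state.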
12. Our Application and Users service
• Service latency (avg) => 100 ms
• Timeout => 1 s
• RPS => 100
• Application threads => 100
13. Issues with the architecture
Our Application
• Service latency (avg) => 100 ms
• Timeout => 1 s
• RPS => 100
• Application threads => 100
Users service goes down
• Average latency?
• Requests received by dependent service?
• RPS served?
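
The three questions can be worked through with Little's law (threads in use = arrival rate × time each request holds a thread). The sketch below uses the numbers from the slide and assumes that every request calls the Users service exactly once and that there are no retries; the function and constant names are made up for the example.

```python
def busy_threads(rps, seconds_per_request):
    """Little's law: average in-flight requests = arrival rate x time in system."""
    return rps * seconds_per_request


RPS = 100          # incoming requests per second
POOL_SIZE = 100    # application threads
LATENCY_S = 0.1    # average Users service latency when healthy
TIMEOUT_S = 1.0    # how long a thread waits once the service is down

healthy = busy_threads(RPS, LATENCY_S)  # threads occupied in normal operation
outage = busy_threads(RPS, TIMEOUT_S)   # threads occupied when every call times out

print(f"healthy: {healthy:.0f} of {POOL_SIZE} threads busy")
print(f"outage:  {outage:.0f} of {POOL_SIZE} threads busy")
```

Under these assumptions the outage case needs the whole pool: every request waits the full 1 s timeout, the Users service still receives all 100 requests per second, and no thread is left for work that does not depend on it.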
14. Did we design for failures?
- Don’t fail if your microservice goes down
- Application should not forever wait for microservice
- Contain and Isolate failures
- Respect the service when it is slow
- Fail fast - Recover fast
17. Threads =>
100 (RPS) × 0.1 s (latency) = 10
Case: Users service is down.
- How many requests get affected?
- How fast do we recover?
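
One way to enforce the 10-thread budget is a bulkhead: a counter of in-flight calls to the Users service that rejects anything beyond the limit instead of queueing it. The sketch below is an assumption about how this could look, not code from the talk; `Bulkhead` and `BulkheadFullError` are hypothetical names.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when all slots for the dependency are already in use."""


class Bulkhead:
    def __init__(self, max_concurrent):
        # One slot per call that may be in flight to the dependency.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: reject immediately rather than wait for a slot.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot for this dependency")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


users_bulkhead = Bulkhead(max_concurrent=10)
```

With a limit of 10 and a 1 s timeout, at most 10 application threads can be stuck on a dead Users service at any moment, so roughly 10 calls per second still reach it and the rest are rejected straight away, leaving the other 90 threads free.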
18. Did we design for failures?
- Don’t fail if your microservice goes down
- Application should not forever wait for microservice
- Contain and Isolate failures
- Respect the service when it is slow
- Fail fast - Recover fast
20. Testing and monitoring!
• IPTables FTW!
Drop connections from an IP address: iptables -A INPUT -s 202.54.1.22 -j DROP
Drop connections to a port: iptables -A INPUT -p tcp --dport 1080 -j DROP
• What percentage of requests were rejected?
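
Measuring the rejection percentage only needs a count of call outcomes. The following is a small sketch assuming the application records one outcome label per call; the `CallStats` class and the labels `"ok"`, `"timeout"` and `"rejected"` are hypothetical and would be replaced by whatever metrics system is already in use.

```python
from collections import Counter


class CallStats:
    def __init__(self):
        self.counts = Counter()

    def record(self, outcome):
        """Record one call outcome, e.g. "ok", "timeout" or "rejected"."""
        self.counts[outcome] += 1

    def rejection_percentage(self):
        """Share of all recorded calls that were rejected, in percent."""
        total = sum(self.counts.values())
        if total == 0:
            return 0.0
        return 100.0 * self.counts["rejected"] / total
```

Running a test with packets to the dependency dropped and then reading `rejection_percentage()` shows how much traffic was turned away quickly rather than left to time out.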
21. Thank you
• Twitter: @kunalgrover05
• Email: kunalgrover05@gmail.com
• Slides: https://tinyurl.com/RootconfKunal
• Acknowledgements: Alex Koturanov (@alex_koturanov)
25. Pop Quiz
You have 20 microservices in your single-endpoint application. You monitor P99 latency numbers for each microservice.
What are you monitoring effectively?
(0.99)^20 ≈ 0.82
Neglected about 18% of your customers
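
The arithmetic behind the quiz can be checked with a short sketch. It assumes that each request touches all 20 microservices and that their latencies are independent, which is a simplification; the function name is made up for the example.

```python
def fraction_within_p99(num_services, quantile=0.99):
    """Probability that one request stays within the quantile on every service."""
    return quantile ** num_services


within = fraction_within_p99(20)
print(f"within P99 on all services: {within:.3f}")
print(f"outside on at least one:    {1 - within:.3f}")
```

Under those assumptions a little under a fifth of requests hit at least one service beyond its P99, so per-service P99 dashboards say nothing about them.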