Microservices tracing with
Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 24 June 2016
About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
An ordinary system...
UI calls backend
UI -> BACKEND
Everything is awesome
CLICK 200
Until it’s not
CLICK 500
Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
It doesn’t look like this
More like this
On which server / instance
was the exception thrown?
SSH and grep for ERROR to find it?
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Span
The basic unit of work (e.g. sending RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● A span has a parent (unless it is the root span) and can have multiple children
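The span lifecycle described above can be sketched in a few lines of plain Java. This is a minimal illustration only: `SimpleSpan` and its methods are hypothetical names made up for this sketch, not Spring Cloud Sleuth's own span class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified span; NOT the Spring Cloud Sleuth Span class.
final class SimpleSpan {
    final String name;
    final SimpleSpan parent;                          // null for the root span
    final List<SimpleSpan> children = new ArrayList<>();
    private final long startNanos;
    private long stopNanos;
    private boolean running = true;

    private SimpleSpan(String name, SimpleSpan parent) {
        this.name = name;
        this.parent = parent;
        this.startNanos = System.nanoTime();          // timing starts when the span starts
    }

    // Starting a span links it to its parent, if there is one.
    static SimpleSpan start(String name, SimpleSpan parent) {
        SimpleSpan span = new SimpleSpan(name, parent);
        if (parent != null) {
            parent.children.add(span);
        }
        return span;
    }

    // Every started span must be stopped; stopping twice has no effect.
    void stop() {
        if (running) {
            stopNanos = System.nanoTime();
            running = false;
        }
    }

    boolean isRunning() {
        return running;
    }

    long durationNanos() {
        if (running) {
            throw new IllegalStateException("span " + name + " was never stopped");
        }
        return stopNanos - startNanos;
    }

    public static void main(String[] args) {
        SimpleSpan root = SimpleSpan.start("get-books", null);
        SimpleSpan child = SimpleSpan.start("call-inventory", root);
        child.stop();
        root.stop();
        System.out.println(root.name + " took " + root.durationNanos()
                + " ns and has " + root.children.size() + " child span(s)");
    }
}
```

In real code the `stop()` call would usually sit in a `finally` block, so that a span is closed even when the work it measures throws.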
Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ A trace could be retrieving a list of available books
○ Assuming that to retrieve the books you have to send 3 requests to 3 services, you could have at least 3 spans (1 for each hop) forming 1 trace
[Diagram: a request with no Trace Id and no Span Id reaches SERVICE 1, which starts a trace with Trace Id = X and Span Id = A. SERVICE 1 calls SERVICE 2 in span B (Client Send, Server Received, Server Sent, Client Received), and SERVICE 2 does its work in span C. SERVICE 2 then calls SERVICE 3 in span D (work done in span E) and SERVICE 4 in span F (work done in span G). Every span carries Trace Id = X.]
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
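The parent IDs listed above are enough to rebuild the tree of the trace. The sketch below does that with plain collections; the class and method names (`SpanTree`, `render`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SpanTree {
    // Span ID -> parent span ID, copied from the list above (null marks the root span).
    static Map<String, String> parents() {
        Map<String, String> parents = new LinkedHashMap<>();
        parents.put("A", null);
        parents.put("B", "A");
        parents.put("C", "B");
        parents.put("D", "C");
        parents.put("E", "D");
        parents.put("F", "C");
        parents.put("G", "F");
        return parents;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Map<String, String> parents) {
        Map<String, List<String>> children = new LinkedHashMap<>();
        String root = null;
        for (Map.Entry<String, String> entry : parents.entrySet()) {
            if (entry.getValue() == null) {
                root = entry.getKey();
            } else {
                children.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                        .add(entry.getKey());
            }
        }
        if (root == null) {
            throw new IllegalArgumentException("no root span (a span without a parent) found");
        }
        StringBuilder out = new StringBuilder();
        append(root, 0, children, out);
        return out.toString();
    }

    private static void append(String id, int depth,
                               Map<String, List<String>> children, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(id).append('\n');
        for (String child : children.getOrDefault(id, new ArrayList<>())) {
            append(child, depth + 1, children, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(parents()));
    }
}
```

Span C ends up with two children (D and F), which is the point where the trace stops being a simple chain and becomes a tree.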
Is it that simple?
How do you pass tracing information (incl. Trace ID)
between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
What if you forget about a thread pool?
[Diagram: the same call flow across SERVICE 1 to SERVICE 4, but the tracing context is lost along the way, so parts of a single request end up labelled with different trace IDs (A, B, C, D) instead of one.]
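Why a thread pool loses the trace can be shown with a plain `ThreadLocal`. This is a sketch under assumptions: `TRACE_ID` and `withTrace` are hypothetical names, and it only illustrates the general capture-and-restore idea, not how Sleuth implements it internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TraceContextDemo {
    // Hypothetical holder for the current trace ID, one value per thread.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's trace ID now and restores it on the worker thread later.
    static <T> Callable<T> withTrace(Callable<T> task) {
        final String captured = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Leave the pooled thread the way we found it.
                if (previous == null) {
                    TRACE_ID.remove();
                } else {
                    TRACE_ID.set(previous);
                }
            }
        };
    }

    // Runs the same task without and with the wrapper; returns {plain, wrapped}.
    static String[] run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TRACE_ID.set("X");
            Callable<String> readTraceId = () -> String.valueOf(TRACE_ID.get());
            String plain = pool.submit(readTraceId).get();
            String wrapped = pool.submit(withTrace(readTraceId)).get();
            return new String[] {plain, wrapped};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("without wrapper: " + result[0]);
        System.out.println("with wrapper:    " + result[1]);
    }
}
```

The unwrapped task sees no trace ID on the worker thread, while the wrapped one sees the caller's value. Forgetting the wrapper for a single pool is enough to split one request into several unrelated traces.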
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● RestTemplate
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unexpected, there’s nothing you need to do to make Sleuth work. Check the docs for more info.
Now let’s aggregate the logs!
Instead of SSHing to the machines, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator, the logs from different instances are streamed into a single place
● You can harvest your logs with Logstash Forwarder / Filebeat
● You can use the ELK stack to stream and visualize the logs
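Once the logs are in one place, correlating them comes down to filtering on the trace ID. The sketch below assumes each log line carries a block of the form `[app,traceId,spanId,exportable]`; both that layout and the sample lines are assumptions made for this illustration, so adjust the pattern to your actual log format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TraceLogFilter {
    // Assumed layout of the tracing block in each line: [app,traceId,spanId,exportable]
    private static final Pattern TRACE_BLOCK =
            Pattern.compile("\\[([^,\\]]*),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    // Returns the lines that belong to the given trace, in their original order.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            Matcher matcher = TRACE_BLOCK.matcher(line);
            if (matcher.find() && matcher.group(2).equals(traceId)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample lines from three services.
        List<String> aggregated = Arrays.asList(
                "INFO  [service1,2485ec27856c56f4,2485ec27856c56f4,true] Hello from service1",
                "INFO  [service3,1111aaaa2222bbbb,1111aaaa2222bbbb,false] Unrelated request",
                "ERROR [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] Read timed out");
        for (String line : linesForTrace(aggregated, "2485ec27856c56f4")) {
            System.out.println(line);
        }
    }
}
```

In an ELK setup the same filter would typically be a query on a parsed trace ID field rather than a regular expression over raw lines.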
Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
</dependencies>
Spring Cloud Sleuth with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
    }
}
[Diagram: a request to /start on SERVICE 1 is passed to SERVICE 2, which calls SERVICE 3 and SERVICE 4. SERVICE 3 responds with “Hello from service3”, SERVICE 4 responds with “Hello from service4”, and SERVICE 2 responds with “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”.]
[Diagram: a request to /readtimeout on SERVICE 1 is passed to SERVICE 2; every request in the chain ends with BOOM! instead of a response.]
Log correlation with Spring Cloud Sleuth
DEMO
Great! We’ve found the exception!
But meanwhile....
The system is slow...
CLICK 200
One of the services is slow?
Which one?
How to measure that?
Let’s log events!
● Client Send (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing it
● Server Send (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from the server side
CS 0 ms -> SR 100 ms -> SS 300 ms -> CR 450 ms
Conclusions
CS 0 ms -> SR 100 ms -> SS 300 ms -> CR 450 ms
● The request started at T = 0 ms
● It took 450 ms for the client to receive a response
● Server side received the request at T = 100 ms
● The request got processed on the server side in 200 ms
Conclusions
CS 0 ms -> SR 100 ms -> SS 300 ms -> CR 450 ms
Why is there a delay between sending and receiving messages?!!11!one!?!1!
https://blogs.oracle.com/jag/resource/Fallacies.html
Logs
Represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● Event should be the stable name of some notable moment in the lifetime of a
span
● For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
Main logs
● Client Send (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
CS 0 ms SR 100 ms
Main logs
● Server Send (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
CS 0 ms -> SR 100 ms -> SS 300 ms -> CR 450 ms
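The four timestamps are all you need to split a call into network time and server time. Below is a small worked example using the numbers from the slides; `RpcTimings` is a hypothetical class written for this illustration.

```java
// Hypothetical value class holding the four main log timestamps of one RPC, in milliseconds.
final class RpcTimings {
    final long clientSend;
    final long serverReceived;
    final long serverSend;
    final long clientReceived;

    RpcTimings(long clientSend, long serverReceived, long serverSend, long clientReceived) {
        this.clientSend = clientSend;
        this.serverReceived = serverReceived;
        this.serverSend = serverSend;
        this.clientReceived = clientReceived;
    }

    // SR - CS: network latency of the request.
    long requestNetworkLatency() {
        return serverReceived - clientSend;
    }

    // SS - SR: server side processing time.
    long serverProcessingTime() {
        return serverSend - serverReceived;
    }

    // CR - SS: network latency of the response.
    long responseNetworkLatency() {
        return clientReceived - serverSend;
    }

    // CR - CS: time needed to receive the response.
    long totalTime() {
        return clientReceived - clientSend;
    }

    public static void main(String[] args) {
        RpcTimings call = new RpcTimings(0, 100, 300, 450);
        System.out.println("request latency:  " + call.requestNetworkLatency() + " ms");
        System.out.println("server time:      " + call.serverProcessingTime() + " ms");
        System.out.println("response latency: " + call.responseNetworkLatency() + " ms");
        System.out.println("total:            " + call.totalTime() + " ms");
    }
}
```

With CS = 0, SR = 100, SS = 300 and CR = 450 this gives 100 ms + 200 ms + 150 ms = 450 ms. Note that SR - CS and CR - SS compare timestamps taken on two different machines, so they are only as accurate as the clock synchronization between client and server.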
Tag
Key-value pair
● Every span may have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
How to visualise latency in
a distributed system?
The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot application)
● It helps gather the timing data needed to troubleshoot latency problems in microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations as horizontal bars
How does Zipkin work?
[Diagram: each APP sends its spans to the collectors, the collectors store them in the DB, and the UI queries for trace info via the API.]
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries /
contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ you can add the dependency to Zipkin UI!
Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
    </dependency>
</dependencies>
Spring Cloud Sleuth Zipkin with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
    }
}
HOLD IT!
● If I have a billion services that emit a gazillion spans, won’t I kill Zipkin?
Sampling to the rescue!
● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
● You can change that by changing the property
spring.sleuth.sampler.percentage (for 100% pass 1.0)
● Or register a custom org.springframework.cloud.sleuth.Sampler
implementation
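To make the idea of percentage sampling concrete, here is a self-contained sketch. `PercentageSampler` is a hypothetical stand-in written for this illustration: it does not implement the real org.springframework.cloud.sleuth.Sampler interface, and it picks requests by position so that the result is easy to verify, which is an assumption of this sketch rather than Sleuth's own strategy.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a sampler; NOT Sleuth's own implementation.
final class PercentageSampler {
    private final int keepPerHundred;
    private final AtomicLong seen = new AtomicLong();

    // percentage uses the same scale as the property above: 0.1 means 10%, 1.0 means 100%.
    PercentageSampler(double percentage) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.keepPerHundred = (int) Math.round(percentage * 100);
    }

    // Keeps the first keepPerHundred requests out of every block of 100.
    boolean isSampled() {
        long position = Math.floorMod(seen.getAndIncrement(), 100L);
        return position < keepPerHundred;
    }

    public static void main(String[] args) {
        PercentageSampler sampler = new PercentageSampler(0.1);
        int kept = 0;
        for (int i = 0; i < 1000; i++) {
            if (sampler.isSampled()) {
                kept++;
            }
        }
        System.out.println("kept " + kept + " of 1000 requests");
    }
}
```

Position-based selection is deterministic but biased toward the start of each block, so a production sampler would more likely choose at random or based on the trace ID. If sampling is not what you want, for example in a demo, setting spring.sleuth.sampler.percentage to 1.0 sends everything.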
[Diagram: a request to /start on SERVICE 1 is passed to /foo on SERVICE 2, which calls /bar on SERVICE 3 and /baz on SERVICE 4; a DEVOXX SERVICE exposing /devoxx also takes part in the flow.]
DEMO
Traced call
[Screenshot: the trace runs from START to END, which gives the TOTAL DURATION]
[Screenshot: the SERVICE 2 client span with its CLIENT SENT and CLIENT RECEIVED events]
[Screenshot: the SERVICE 4 server span with its SERVER RECEIVED and SERVER SENT events]
[Screenshot: the difference between CLIENT SENT and SERVER RECEIVED is the LATENCY]
Zipkin for Brewery
● A test app for Spring Cloud end to end tests
● Source code:
https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://docsbrewing-zipkin-server.cfapps.io - Zipkin deployed to PCF for Brewery Sample app

Microservices Tracing with Spring Cloud and Zipkin (devoxx)

  • 1. Microservices tracing with Spring Cloud and Zipkin Marcin Grzejszczak Marcin Grzejszczak @mgrzejszczak, 24 June 2016
  • 2. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 About me Developer at Pivotal Part of Spring Cloud Team Working with OSS: ● Accurest - Consumer Driven Contracts verifier for Java ● JSON Assert - fluent JSON assertions ● Spock Subjects Collaborators Extension ● Gradle Test Profiler ● Up To Date Gradle Plugin TWITTER: @MGrzejszczak BLOG: http://TOOMUCHCODING.COM
  • 4. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Agenda What is distributed tracing? How to correlate logs with Spring Cloud Sleuth? How to visualize latency with Spring Cloud Sleuth and Zipkin?
  • 5. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 An ordinary system...
  • 6. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 UI calls backend UI -> BACKEND
  • 7. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Everything is awesome CLICK 200
  • 8. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Until it’s not CLICK 500
  • 10. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Time to debug https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
  • 11. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 It doesn’t look like this
  • 12. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 More like this
  • 13. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 On which server / instance was the exception thrown?
  • 14. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 SSH and grep for ERROR to find it?
  • 15. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Distributed tracing - terminology Span Trace Logs (annotations) Tags (binary annotations)
  • 16. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Distributed tracing - terminology Span Trace Logs (annotations) Tags (binary annotations)
  • 17. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Span The basic unit of work (e.g. sending RPC) ● Spans are started and stopped ● They keep track of their timing information ● Once you create a span, you must stop it at some point in the future ● Has a parent and can have multiple children
  • 18. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Trace A set of spans forming a tree-like structure. ● For example, if you are running a book store then ○ Trace could be retriving a list of available books ○ Assuming that to retrive the books you have to send 3 requests to 3 services then you could have at least 3 spans (1 for each hop) forming 1 trace
  • 19. SERVICE 1 REQUEST No Trace Id No Span Id RESPONSE SERVICE 2 SERVICE 3 Trace Id = X Span Id = A Trace Id = X Span Id = A Trace Id = X Span Id = A REQUEST RESPONSE Trace Id = X Span Id = B Client Send Trace Id = X Span Id = B Client Received Trace Id = X Span Id = B Server Received Trace Id = X Span Id = C Trace Id = X Span Id = B Server Sent REQUEST RESPONSE Trace Id = X Span Id = D Client Send Trace Id = X Span Id = D Client Received Trace Id = X Span Id = D Server Received Trace Id = X Span Id = E Trace Id = X Span Id = D Server Sent Trace Id = X Span Id = E SERVICE 4 REQUEST RESPONSE Trace Id = X Span Id = F Client Send Trace Id = X Span Id = F Client Received Trace Id = X Span Id = F Server Received Trace Id = X Span Id = G Trace Id = X Span Id = F Server Sent Trace Id = X Span Id = G Trace Id = X Span Id = C
  • 20. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Span Id = A Parent Id = null Span Id = B Parent Id = A Span Id = C Parent Id = B Span Id = D Parent Id = C Span Id = E Parent Id = D Span Id = F Parent Id = C Span Id = G Parent Id = F
  • 21. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Is it that simple?
  • 22. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Is it that simple? How do you pass tracing information (incl. Trace ID) between: ● different libraries? ● thread pools? ● asynchronous communication? ● …?
  • 23. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 What if you forget about a thread pool? SERVICE 1 REQUEST NO TRACE RESPONSE SERVICE 2 SERVICE 3 A A A REQUEST RESPONSE A A A B A REQUEST RESPONSE B B C C C C SERVICE 4 REQUEST RESPONSE B B D D D D B
  • 24. Marcin Grzejszczak @mgrzejszczak, 24 June 2016 Log correlation with Spring Cloud Sleuth We take care of passing tracing information between threads / libraries / contexts for ● Hystrix ● RxJava ● Rest Template ● Feign ● Messaging with Spring Integration ● Zuul ● ... If you don’t do anything unexpected there’s nothing you need to do to make Sleuth work. Check the docs for more info.
  • 25. Now let’s aggregate the logs! Instead of SSHing into the machines, aggregate the logs!
      ● With Cloud Foundry’s (CF) Loggregator the logs from different instances are streamed into a single place
      ● You can harvest your logs with Logstash Forwarder / Filebeat
      ● You can use the ELK stack to stream and visualize the logs
  • 26. Spring Cloud Sleuth with Maven
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
      </dependency>
  • 27. Spring Cloud Sleuth with Gradle
      dependencies {
          compile "org.springframework.cloud:spring-cloud-starter-sleuth"
      }
      dependencyManagement {
          imports {
              mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
          }
      }
  • 28. [Diagram: a request to SERVICE 1 /start is passed to SERVICE 2, which calls SERVICE 3 and SERVICE 4. Responses:
      ● SERVICE 3: “Hello from service3”
      ● SERVICE 4: “Hello from service4”
      ● SERVICE 2: “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”]
  • 30. Log correlation with Spring Cloud Sleuth - DEMO
  • 34. Great! We’ve found the exception! But meanwhile...
  • 35. The system is slow... (CLICK -> 200)
  • 36. One of the services is slow?
  • 37. Which one? How to measure that?
  • 38. Let’s log events!
      ● Client Send (CS) - The client has made a request
      ● Server Received (SR) - The server side got the request and will start processing
      ● Server Send (SS) - Annotated upon completion of request processing
      ● Client Received (CR) - The client has successfully received the response from the server side
  • 39. CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
  • 40. Conclusions (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      ● The request started at T=0 ms
      ● It took 450 ms for the client to receive a response
      ● Server side received the request at T=100 ms
      ● The request got processed on the server side in 200 ms
  • 41. Conclusions (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms): Why is there a delay between sending and receiving messages?!!11!one!?!1!
  • 42. https://blogs.oracle.com/jag/resource/Fallacies.html
  • 43. Distributed tracing - terminology: Span, Trace, Logs (annotations), Tags (binary annotations)
  • 44. Logs: a log represents an event in time associated with a span
      ● Every span has zero or more logs
      ● Each log is a timestamped event name
      ● The event should be the stable name of some notable moment in the lifetime of a span
      ● For instance, a span representing a browser page load might add an event for each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
  • 46. Main logs (CS 0 ms, SR 100 ms)
      ● Client Send (CS)
          ○ The client has made a request - the span was started
      ● Server Received (SR)
          ○ The server side got the request and will start processing it
          ○ SR timestamp - CS timestamp = NETWORK LATENCY
  • 47. Main logs (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms)
      ● Server Send (SS)
          ○ Annotated upon completion of request processing
          ○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
      ● Client Received (CR)
          ○ The client has successfully received the response from the server side
          ○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
          ○ CR timestamp - SS timestamp = NETWORK LATENCY
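The four subtractions from the "Main logs" slides can be checked with the example timestamps (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms). This is a minimal sketch; `SpanTimings` is a hypothetical class, and the arithmetic assumes the client and server clocks are synchronized, which real systems only approximate.

```java
// Hypothetical value class for illustration; not part of Sleuth or Zipkin.
class SpanTimings {
    final long cs; // Client Send timestamp, ms
    final long sr; // Server Received timestamp, ms
    final long ss; // Server Send timestamp, ms
    final long cr; // Client Received timestamp, ms

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.cs = cs;
        this.sr = sr;
        this.ss = ss;
        this.cr = cr;
    }

    // SR - CS: network latency of the request.
    long requestNetworkLatency() { return sr - cs; }

    // SS - SR: server side processing time.
    long serverProcessingTime() { return ss - sr; }

    // CR - SS: network latency of the response.
    long responseNetworkLatency() { return cr - ss; }

    // CR - CS: time the client needed to receive the response.
    long totalClientTime() { return cr - cs; }

    public static void main(String[] args) {
        SpanTimings t = new SpanTimings(0, 100, 300, 450);
        System.out.println("request latency:  " + t.requestNetworkLatency() + " ms");
        System.out.println("server processing: " + t.serverProcessingTime() + " ms");
        System.out.println("response latency: " + t.responseNetworkLatency() + " ms");
        System.out.println("total:            " + t.totalClientTime() + " ms");
    }
}
```

With the slide's numbers, 250 of the 450 ms are spent on the network (100 ms out, 150 ms back) and 200 ms in the server.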
  • 48. Tag: a key-value pair
      ● Every span may have zero or more key/value Tags
      ● They do not have timestamps and simply annotate the spans.
      ● Examples of default tags in Sleuth
          ○ message/payload-size
          ○ http.method
          ○ commandKey for Hystrix
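The difference between logs (timestamped event names) and tags (key/value pairs without timestamps) can be sketched as a small data structure. `SketchSpan` below is a hypothetical, simplified model and not Sleuth's own span class; the span name and the tag value `GET` are made-up sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified span model for illustration; not Sleuth's Span class.
class SketchSpan {
    // A log: a timestamped event name.
    static final class Log {
        final long timestampMs;
        final String event;

        Log(long timestampMs, String event) {
            this.timestampMs = timestampMs;
            this.event = event;
        }
    }

    final String name;
    final List<Log> logs = new ArrayList<>();               // zero or more logs
    final Map<String, String> tags = new LinkedHashMap<>(); // zero or more tags, no timestamps

    SketchSpan(String name) {
        this.name = name;
    }

    // Records an event at a point in time.
    SketchSpan log(long timestampMs, String event) {
        logs.add(new Log(timestampMs, event));
        return this;
    }

    // Annotates the span with a key/value pair.
    SketchSpan tag(String key, String value) {
        tags.put(key, value);
        return this;
    }

    // Sample span: client send at 0 ms, client received at 450 ms, one tag.
    static SketchSpan example() {
        return new SketchSpan("http:/start")
                .log(0, "cs")
                .log(450, "cr")
                .tag("http.method", "GET");
    }
}
```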
  • 49. How to visualise latency in a distributed system?
  • 50. The answer is: Zipkin
      ● Zipkin is a distributed tracing system
      ● It runs as a separate process (you can run it as a Spring Boot application)
      ● It helps gather timing data needed to troubleshoot latency problems in microservice architectures
      ● The front end is a "waterfall" style graph of service calls showing call durations as horizontal bars
  • 51. How does Zipkin work? [Diagram: each APP sends its spans to the collectors; the collectors store them in a DB; the UI queries for trace info via the API.]
  • 52. Spring Cloud Sleuth and Zipkin integration
      ● We take care of passing tracing information between threads / libraries / contexts
      ● Upon closing of a Span we will send it to Zipkin
          ○ either via HTTP (spring-cloud-sleuth-zipkin)
          ○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
      ● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
          ○ you can add the dependency to Zipkin UI!
  • 53. Spring Cloud Sleuth Zipkin with Maven
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>
      <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-zipkin</artifactId>
      </dependency>
  • 54. Spring Cloud Sleuth Zipkin with Gradle
      dependencies {
          compile "org.springframework.cloud:spring-cloud-starter-zipkin"
      }
      dependencyManagement {
          imports {
              mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
          }
      }
  • 55. HOLD IT!
      ● If I have a billion services that emit a gazillion spans - won’t I kill Zipkin?
  • 56. Sampling to the rescue!
      ● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
      ● You can change that by setting the property spring.sleuth.sampler.percentage (for 100% pass 1.0)
      ● Or register a custom org.springframework.cloud.sleuth.Sampler implementation
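To illustrate what percentage-based sampling means, here is a stand-alone sketch. It does not implement `org.springframework.cloud.sleuth.Sampler` and is not necessarily how Sleuth's own sampler decides; `PercentageSamplerSketch` and its methods are hypothetical names, and the random decision per request is an assumption made for the example.

```java
import java.util.Random;

// Hypothetical stand-alone sampler for illustration; not Sleuth's Sampler implementation.
class PercentageSamplerSketch {
    private final double percentage; // 0.0 = sample nothing, 1.0 = sample everything
    private final Random random;

    PercentageSamplerSketch(double percentage, long seed) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.percentage = percentage;
        this.random = new Random(seed); // seeded so that runs are repeatable
    }

    // Decides whether the current request should be reported.
    // nextDouble() returns a value in [0.0, 1.0), so 1.0 always samples and 0.0 never does.
    boolean isSampled() {
        return random.nextDouble() < percentage;
    }

    // Counts how many of the given number of requests would be sampled.
    static int countSampled(double percentage, int requests, long seed) {
        PercentageSamplerSketch sampler = new PercentageSamplerSketch(percentage, seed);
        int sampled = 0;
        for (int i = 0; i < requests; i++) {
            if (sampler.isSampled()) {
                sampled++;
            }
        }
        return sampled;
    }

    public static void main(String[] args) {
        System.out.println("sampled at 0.1: " + countSampled(0.1, 10_000, 42L) + " of 10000");
        System.out.println("sampled at 1.0: " + countSampled(1.0, 10_000, 42L) + " of 10000");
    }
}
```

With a percentage of 0.1 roughly one request in ten is reported, which keeps the load on Zipkin low at the cost of not seeing every trace.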
  • 57. [Diagram: SERVICE 1 /start, SERVICE 2 /foo, SERVICE 3 /bar, SERVICE 4 /baz and DEVOXX SERVICE /devoxx, connected by REQUEST / RESPONSE arrows.]
  • 59. Traced call
  • 60. Traced call: START, END, TOTAL DURATION
  • 61. Traced call: CLIENT SENT, CLIENT RECEIVED (SERVICE 2, CLIENT)
  • 62. Traced call: SERVER RECEIVED, SERVER SENT (SERVICE 4, SERVER)
  • 63. Traced call: LATENCY between CLIENT SENT and SERVER RECEIVED
  • 64. Traced call: SERVER RECEIVED, CLIENT SENT - DIFF IS LATENCY
  • 65. Zipkin for Brewery
      ● A test app for Spring Cloud end to end tests
      ● Source code: https://github.com/spring-cloud-samples/brewery
      ● Around 10 applications involved
  • 68. Summary
      ● Log correlation allows you to match logs for a given trace
      ● Distributed tracing allows you to quickly see latency issues in your system
      ● Zipkin is a great tool to visualize the latency graph and system dependencies
      ● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
  • 70. THANK YOU
      ● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
      ● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
      ● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
      ● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
      ● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
      ● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
      ● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
      ● http://docsbrewing-zipkin-server.cfapps.io - Zipkin deployed to PCF for Brewery Sample app