The service mesh is an infrastructure component that helps manage services running within our clusters. Without any changes to service or application code, solutions like Istio and Linkerd provide features to manage container deployments at scale. With Istio we get traffic management, security, rate limiting, monitoring, and much more out of the box. We will discuss these solutions and some of their features at a high level, then roll in specific demonstrations of using a service mesh to route and shift service traffic, easily manage deployments, and test our services with micro-benchmarks and fault injection.
2. About Us
Jesse Butler
• Oracle via Sun Microsystems
• Responsible for Docker on Solaris,
later on Oracle Linux
• Some work with Open Containers and
CNCF WGs
• Now a Cloud Native Advocate @
Oracle Cloud
@jlb13
Mofizur Rahman (Mofi)
• Developer Advocate @IBM
• Works on container and cloud native
technologies
• Favorite programming language is
golang.
@moficodes
7. Microservices
• Microservices are the de facto
standard for cloud native software
• Microservices allow development
teams to deploy portable and scalable
applications
14. DevOps, Mother of Invention
• Microservices
• CI / CD
• Cloud Adoption
• Containers
15. Docker
• Docker changed the way we build and
ship software
• Application and host are decoupled,
making application services portable
• Containers are an implementation
detail, but a critical one
24. Kubernetes as a Platform
• Infrastructure resource abstraction
• Cluster software where one or more
masters control worker nodes
• Scheduler deploys work to the nodes
• Work is deployed in groups of containers
27. …to Cloud Native Kubernetes Hotness
• Microservices running in orchestrated
containers
• Everybody's happy
• What happens now?
[Diagram: a load balancer fronting a handful of services, backed by a database and a queue]
28. …to Cloud Native Kubernetes Hotness
• Microservices running in orchestrated
containers
• Everybody's happy
• What happens now?
[Diagram: the same system grown into many interconnected services behind the load balancer, still backed by the database and queue]
33. Table Stakes for Services at Cloud Scale
• We require a method to simply and repeatably deploy software,
and simply and recoverably modify deployments
• We require telemetry, observability, and diagnosability for our
software if we hope to run at cloud scale
34. Day 2 Solutions
• Ingress and Traffic Management
• Metrics and Analytics
• Tracing and Observability
• Identity and Security
37. These are Hard Problems™, and some software may address one of them well.
Service mesh addresses them all.
38. What Is a Service Mesh?
• Infrastructure layer for controlling and
monitoring service-to-service traffic
• A data plane deployed alongside
application services, and a control
plane used to manage the mesh
39. Service Mesh
• Provides DevOps teams a stable and
extensible platform to monitor and
maintain deployed services
• For the most part, invisible to
development teams
40. Service Mesh
• This is not a new solution which solves all the
world’s problems, but a different way to apply
existing solutions
• Enables integration of existing (as well as future)
best-in-class solutions for All The Things
41. Let’s Talk About Istio
Istio is a service mesh that allows us to connect,
secure, control, and observe services at scale,
often requiring no service code modification
42. Istio Components
• Envoy
– Sidecar proxy
• Pilot
– Propagates rules to sidecars
• Mixer
– Enforces access control, collects telemetry data
• Citadel
– Service-to-service and end-user AuthN and AuthZ
43. Istio Features
• Traffic Management
• Fine-grained control with rich routing rules, retries,
failovers, and fault injection
• Observability
• Automatic metrics, logs, and traces for all traffic within a
cluster, including cluster ingress and egress
44. Istio Features
• Security
• Strong identity-based AuthN and AuthZ layer, secure by
default for ingress, egress and service-to-service traffic
• Policy
• Extensible policy engine supporting access controls, rate
limits and quotas
57. Telemetry
• Istio’s Mixer is stateless and does not manage
any persistent storage of its own
• Capable of accumulating a large amount of
transient state
• Designed to be highly reliable, with a goal of
>99.999% uptime for any individual instance
• Many adapters available: Prometheus, Cloud
providers, Datadog, Solarwinds…
58. Performance and Scalability
• Code level micro-benchmarks
• Synthetic end-to-end benchmarks across various
scenarios
• Realistic complex app end-to-end benchmarks
across various settings
• Automation to ensure performance doesn’t
regress
59. Security
• Traffic encryption to defend against
man-in-the-middle attacks
• Mutual TLS and fine-grained access policies
to provide flexible access control
• Auditing tools to monitor all of it
MR – Let's talk about why we are really here today: the Bookinfo app. This is the next best thing since sliced bread. -> next
MR – So Jesse, should we build a nice little monolith for our app? I mean, it would be the easiest thing to build.
JB – But what if we have to change and scale the app differently for different functionality? And who are we kidding, this app will be so big that soon we will have to hire a full team to manage it.
MR – That makes sense. In a monolith, everything talks to the same database, and any change means redeploying the whole thing.
JB – Also, adding new features is not always easy. Moreover, you are stuck with a single language stack. I know Java and Ruby, but you probably want to write the app in something you feel more comfortable with.
MR – Right tool for the job, amirite?
Audience check
JB – Let's build this app using microservices instead. We can split the application by functionality, then connect it all up over HTTP.
MR – That makes sense. We decouple our functionality so we can build and grow each piece individually. Nothing looks different from our user's point of view, but it is much easier to make a change or improvement without breaking the entire application.
JB – I took the liberty of designing the application while you were talking. We will have four microservices.
MR – Why do we have three different reviews services?
JB – Well, our intensive market research shows some people may not like the stars in the reviews. Others have issues with the color of the stars. So we will experiment with different colors for now.
MR – By the way, we open sourced this code for all to take and learn from.
MR – So we are done. The app is written. We just get these to run. And…
JB – Not so fast. We actually have to go through DevOps first.
MR – We have our application now, but since the services are all independently developed and deployable, DevOps is a whole other issue.
JB – I ran the code on my computer and it worked. But Bob from the ops team said something is wrong with the deployment.
MR – But DID you tell him, though?
JB – Tell him what?
Microservices – iterative development, rapid release, super-fast pivots and time to market
CI / CD – if you're going to release fast, you need a system to keep up with it
Cloud adoption – while not strictly required, at this point… yeah.
Containers – just as with microservice architecture and CI / CD, this was a good fit for the methodology
These technologies became prolific and ubiquitous because the methodology needed them; we did not adapt the methodology to fit the tech.
JB – We can now dockerize our app into a container that has all the dependencies it needs to run. Any place with a Docker engine can then run the app the same way, for the most part.
MR – Docker changed the game in many ways. It is not a new technology; my Linux diehards know the tech to isolate namespaces in Linux has been around forever. But Docker gives us easy APIs and tooling that make the process much more streamlined.
MR – So Jesse, I dockerized all our microservices.
JB – How does one dockerize an app?
MR – Using these artifacts called Dockerfiles, of course. I define everything I need to run my app, and Docker builds it for me. Then I can push the images off to a registry.
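As a sketch of what one of those Dockerfiles might look like, assuming a Go service (the `reviews` name, port, and registry below are hypothetical, not the actual Bookinfo artifacts):

```dockerfile
# Build stage: compile a (hypothetical) Go reviews service
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /reviews .

# Runtime stage: a minimal image holding only the binary
FROM gcr.io/distroless/static
COPY --from=build /reviews /reviews
EXPOSE 9080
ENTRYPOINT ["/reviews"]
```

From there, `docker build -t registry.example.com/reviews:v1 .` followed by `docker push registry.example.com/reviews:v1` gets the image into a registry.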
JB – Hah, too slow. I already did.
JB – All we have to do now is run these Docker images.
MR – Nice. You know what this means?
MR – So our app is ready. Let's run it.
JB – Well, one container went down. No big deal. I can spin up more.
MR – Wait, more stuff is going down. And I can't really check why. Spin up more.
JB – I don't know what's happening. It's all on fire.
MR – We really need a way to manage these containers. I wish the platform could handle auto-recovery from failure and scaling.
JB – There might just be a way.
MR – Kubernetes to the rescue.
JB - Kubernetes provides abstractions for deploying software in containers at scale
Again, out of necessity: containers were everywhere, and various orchestration options arose. Mesos, Docker Swarm, others… Kubernetes won.
MR – Infrastructure resources are abstracted in a cluster of worker nodes, and the cluster has a scheduler which deploys work to those nodes.
So, everything we need…
JB – I know we have spent countless hours dockerizing our app. How do we now get it to run on Kubernetes?
MR – It's not that bad. We kubernetize our app (you heard it here first, folks; I am trying to get this word trending, someone tweet #kubernetize).
JB – I see you are using the image we built and pushed up earlier.
MR – Yeah, and we are opening up the same port for communication.
JB – <Talks about deployment>
MR - <Talks about service>
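The deployment and service being described might be sketched like this (hypothetical names and image; the container port matches the one the image exposes):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reviews-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: reviews
      version: v1
  template:
    metadata:
      labels:
        app: reviews
        version: v1
    spec:
      containers:
      - name: reviews
        image: registry.example.com/reviews:v1   # the image built and pushed earlier
        ports:
        - containerPort: 9080                    # same port the container exposes
---
apiVersion: v1
kind: Service
metadata:
  name: reviews
spec:
  selector:
    app: reviews
  ports:
  - port: 9080
```

Applying both with `kubectl apply -f` gives us replicated pods plus a stable in-cluster name (`reviews`) that other services can call.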
JB – So now, rather than a monolithic application running in bespoke compute environments…
JB - this is day one. (-> next)
JB – What do we pick up on day two? Or, put another way: what happens when we succeed, and our prototyped happy-path software needs to scale?
JB – But either way, our app is running, and we are ready for anything.
MR – You know what this means.
JB – But wait, didn't Kelsey warn against this?
MR – IDK, Jesse, our way seems pretty solid. Let's see where it goes.
JB – The app is running great. We are getting some traffic, but Kubernetes has autoscaling.
MR – Yeah. And we don't have time to debug a failed instance. Also, Kubernetes just spins up a new copy, so we can't even look at the logs.
JB – More stuff is going down. Why is that one service taking forever to respond? Wait, it's returning 500 now?
MR – Wait, didn't we just have something like this with our Docker setup?
JB – I don't know, Mofi. I think we just have to live with this.
MR – Maybe there is something more out there.
MR – Back to the drawing board with this one. We need a way to simply and repeatably deploy, and simply and recoverably modify. Kubernetes has our backs there.
JB – We need to re-establish telemetry, observability, and diagnosability as table stakes for computing at scale. Here, we need to bring some stuff to the party. I want the benefits of microservices with the observability of a monolith.
MR – No worries. Ingress and traffic management? We can use NGINX for that.
JB – Tracing and observability? No sweat, OpenTracing to the rescue.
MR – Metrics and analytics? Prometheus all day.
JB – I know we are forgetting something. Uhh. Wait, security. We never thought about security. I think we can use Vault for that.
MR – So it kind of sounds like what we want is something that gives us all of these things.
MR – These look pretty hard to get.
JB – Hard things are hard, Mofi.
JB - …
MR – Well, I know: we will use a service mesh. I saw it in a talk once.
JB – What's with you and throwing around big words randomly like that?
MR - The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them.
JB – A service mesh is an ideal component in a DevOps environment: it provides operators with a stable and extensible platform for all of the work needed to maintain and improve deployed services, while remaining, for the most part, invisible to developers.
MR - Service Mesh
This is not a new solution which solves all the world's problems
It allows for integration of all existing (and future) best in class solutions for All the Things
JB - First time I’ve used that gif unsarcastically
So, how does it do that? For that…
MR – Istio is a service mesh that allows us to connect, secure, control, and observe services at scale, often requiring no service source code modification.
Envoy – we've talked about this, the proxy
Pilot – converts high-level routing rules that control traffic behavior into Envoy-specific configurations, and propagates them to the sidecars at runtime
Mixer – enforces access control and usage policies across the service mesh, and collects telemetry data from the Envoy proxy and other services
Citadel – provides strong service-to-service and end-user authentication with built-in identity and credential management.
JB - fine-grained traffic control with rich routing rules, retries, failovers, and fault injection
Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress and egress
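As one concrete sketch of those routing rules, a VirtualService can inject a delay fault into a slice of traffic (the `reviews` host and `v1` subset here are hypothetical; retries and timeouts are configured with the same route structure on routes without faults):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - fault:
      delay:
        percentage:
          value: 10      # inject a delay into 10% of requests...
        fixedDelay: 5s   # ...holding each for five seconds
    route:
    - destination:
        host: reviews
        subset: v1
```

This lets us test how callers behave when `reviews` is slow, without touching the service's code.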
MR - security - a strong identity-based authentication and authorization layer which is secure by default for ingress, egress and service-to-service traffic
policy - layered over all of this is a pluggable policy engine supporting access controls, rate limits and quotas
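That secure-by-default posture can be tightened to mesh-wide mutual TLS with a single resource. A sketch, assuming a recent Istio release (older releases expressed the same intent with a MeshPolicy):

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applied here, the policy is mesh-wide
spec:
  mtls:
    mode: STRICT            # require mutual TLS for service-to-service traffic
```

With this in place, plaintext service-to-service traffic is rejected; the sidecars handle certificate exchange and rotation.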
JB - typically we get all of this with little to no application code changes
MR – I don't believe you.
MR – Yeah, I am with skeptical kid on this one. How would that even work?
What does this look like:
two services talking to each other – HTTP GET, simple
JB – iptables rules are automated to intercept all service traffic and reroute it to the proxy
MR - The proxy has rules and policies to follow, and after considering policy,
routing rules, etc, it forwards the traffic to the appropriate service
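That interception is set up automatically when the Envoy sidecar is injected. A sketch of enabling it, assuming a cluster with Istio installed and a hypothetical deployment named `reviews-v1`:

```shell
# Tell Istio to inject the Envoy sidecar (and the iptables
# init container that reroutes traffic) into new pods here
kubectl label namespace default istio-injection=enabled

# Restart the workload so its pods pick up the sidecar; each
# pod then runs two containers (the app plus istio-proxy)
kubectl rollout restart deployment/reviews-v1
kubectl get pods
```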
Let's talk about that little proxy box
MR – So Jesse thinks this is the work of Hackerman
JB – and Mofi thinks this is just magic
MR – But in reality, this is all done at the Envoy level.
JB - Definitely could be its own talk, and there are many out there to check out.
Envoy is a good example of Istio surfacing other features of a best-in-class component through its mesh
Ok so, back to this
HTTP and HTTP/2 supported, gRPC, or anything over TCP… with or without mTLS
Pilot – managing the proxies
Mixer – handling enforcement and telemetry pickup
Citadel – AuthN and AuthZ
Differentiate an API gateway (primarily north-south traffic) from a service mesh (east-west)
Istio has Gateways, which provide ingress for the mesh
Beyond that, a lot of day-to-day work becomes really simple: canary releases, traffic mixing for blue/green, A/B testing
We get most of this for free
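For instance, a canary shift reduces to weights on a VirtualService (a sketch; `reviews` and its `v1`/`v2` subsets are hypothetical and would be defined by a matching DestinationRule):

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90          # 90% of traffic stays on v1
    - destination:
        host: reviews
        subset: v2
      weight: 10          # 10% canaried to v2
```

Shifting more traffic to `v2`, or rolling it all back, is just an edit to the weights and a `kubectl apply`.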
More on testing
Integrated benchmarking is virtually free, making it incredibly easy to catch version-to-version regressions
And all of this is safe out of the box: secure by default, in depth, with multiple components