2. Agenda
• What are microservices?
• Benefits / Challenges
• Designing microservices
• Migrating to microservices
3. What are microservices?
Small autonomous services that work together,
modeled around a business domain
- Sam Newman -
Loosely coupled service oriented architecture
with bounded contexts
- Adrian Cockcroft -
5. Why microservices? Why not monolith?
[Diagram: Dev teams A through E all feed a single release pipeline into one production environment]
Common dependencies:
- Data schema
- Message schema
- Leaking service internals via API
- Framework/library version
- Shared components
6. Why microservices? Why not monolith?
[Diagram: Dev teams A through E each own a separate release pipeline into their own production environments]
How many changes should each release include? How do we handle integration and E2E testing?
7. DevOps practices
Source: 2016 DevOps pulse
How frequently do you deploy code?
Do you have continuous integration in place?
Do you have continuous deployment in place?
9. microservices - Benefits
• Continuous innovation
• Independent deployments
• Technology diversity
• Small focused team
• Separate scalability/availability
• Fault isolation
10. microservices - Challenges
• Complexity
• Network congestion
• Data integrity/consistency
• Integration and versioning
• Testing
• Reliability
• Service discovery and routing
• Monitoring and logging
11. microservices - Principles
• Model services around a business domain
• Make each service independently deployable
• Decentralize all things
• Hide implementation details
• Data is private to its service
• Automate DevOps tasks
• Isolate failure
12. Designing microservices
• Service boundary
• Granularity
• Gateway
• Offloading, Aggregation, Routing
• Inter service communication
• Sync/Async, Protocol/Serialization, Messaging
• Data management
• Integrity/Consistency
• Distributed transactions
• Dealing with partial failure
• Monitoring
• Sidecar, Distributed tracing
13. Finding service boundary
• Start with bounded context
• Further breakdown per non-functional requirements
• Vertical decomposition rather than horizontal (layers)
• Also consider
• Rate of change
• Technology used
• Communication overhead
• Splitting data is a challenge due to consistency issues
• Refactoring across a boundary is an extremely expensive operation
15. Inter service communication
[Diagram: a gateway (GW) receives north–south requests and fans them out to services A, B, D, and E, which also call each other east–west]
Challenges
- Endpoint proliferation
- East–west chattiness
- Serialization overhead
- Different service lifecycles require decoupling
- Versioning
- IP masquerading
16. Data integrity/consistency
[Diagram: two services with private data]
Survey table: Survey ID 1234 | Tenant ID 001 | Questions: "Do you know Surface?"
Tenant table: Tenant ID 001 | Tenant Name Fabrikam | Survey count 99
[The Survey service holds a replica of the Tenant DB and emits a Survey event via a message broker; annotations ask whether the write and the event can be atomic, how to enforce "N < 100" surveys per tenant, and where the single transaction scope lies]
17. Decoupling data by CQRS
[Diagram: the Survey service (write model) and the Survey Analysis service (read model) are connected through a message broker; the read model is eventually consistent]
Write model: Survey ID 1234 | Tenant ID 001 | Answers 2.0, 3.5, 4.0, …
Read model: Survey ID 1234 | Tenant Name Fabrikam | Answers 2.0, 3.5, 4.0, …
18. Reversible workflow by Sagas
• Sagas are long-running transactions that can be written as a sequence
of transactions that can be interleaved with other transactions.
- Hector Garcia-Molina, et al., 1987 -
[Diagram: an Order Mgmt service orchestrates Svc A (place an order), Svc B (decrease stock level), and Svc C (delegate shipping), with cancel and retry paths]
Managing state: retry, timeout, concurrency; see the Scheduler Agent Supervisor pattern
19. Monitoring microservices
[Diagram: a GW routes to Svc A, Svc B, and Svc D; each service runs a sidecar container, and a single logical activity (Activity #1) spans all of them]
- Correlating distributed transactions
- Sidecar pattern for cross-language environments
- APM tools
- Logging at the GW
- Monitor the system as a whole
20. Options to implement microservices on Azure
• Service Fabric
• Azure Container Service
• Azure Functions
• Docker Cloud (supports Azure)
• Docker on virtual machines
• App Service
21. Container and orchestration
[Diagram: logical components of a container platform. A DevOps administrator pushes Docker images to an image registry (e.g., Docker Hub, ACR); user requests pass through an application gateway (e.g., Nginx, HAProxy, Azure App GW) to application hosts (Docker engine on virtual machines); a master (e.g., Kubernetes, Marathon, Swarm) manages the cluster together with a cluster state store (e.g., etcd, Consul, ZooKeeper)]
Image registry: repository, validation
Master: node state tracking, discovery, leader election, deployment, cluster management
Gateway: routing, load balancing, offloading
Application host: runs services
23. Migrating monolith to microservices
• Extract one service at a time
• Add glue code that takes care of dirty work
• Strangler / Anti-corruption layer in transition period
[Diagram: migration stages from a monolith (big ball of mud), through coarse-grained and somewhat decomposed, to microservices]
26. Summary
• microservices are not just something running on containers
• Choose microservices for continuous innovation
• Independent deployment is the key
• Incrementally migrate from monolith to MSA
• Choose from several hosting options on Azure
27. Resources
• Microservices with Docker on Microsoft Azure (Trent Swanson, et al.)
• Building microservices (Sam Newman)
• Microservice architecture (Irakli Nadareishvili, et al.)
• https://www.nginx.com/blog/introduction-to-microservices/
• http://www.vinaysahni.com/best-practices-for-building-a-microservice-architecture
• http://www.grahamlea.com/2015/07/microservices-security-questions/
• Principles of Microservices by Sam Newman
• Adrian Cockcroft on InfoQ
• Service fabric training course on MVA
Editor's notes
Why do we need to consider moving to microservices, or do we not?
You don't necessarily always move to microservices. Does the benefit you get justify the considerable effort?
In order to take advantage of the benefits and address the challenges, you need to design it correctly.
What are the options for implementing microservices on Azure?
This is a quote from Sam Newman
Another quote from Adrian Cockcroft
Traditional 3-tier or N-tier architecture uses horizontal layering.
Most of the great MSA examples, Netflix, LinkedIn, Uber, etc., started in this design.
MSA takes vertical (functional) slices of the monolith; each function becomes a separate microservice.
One service represents one responsibility.
Instead of a single DB, each service may choose the right storage type, RDB or NoSQL: the polyglot persistence approach.
As part of refactoring, each microservice exposes an API, and the UI layer becomes an SPA or a mobile app that consumes that API.
Why do we need to move to MSA? There are reasons users are moving to MSA; this is the #1 reason among them.
This is what we do with a monolith: a unified single code base with a single release pipeline.
A bug found in one team will block the whole release process.
If the developer team grows large, it's no longer manageable.
Testing, fixing, and breaking dependents become the blockers.
Feature C depends on A in terms of:
- Data schema
- Leaking service internals via API
- Library version
- Service bus / API gateway (fat gateway)
With MSA, each team owns its service and deploys it separately into a separate production environment, most likely containers, which are isolated from each other.
So even if one of the services has a problem, it won't affect other services.
The obvious question is: how many changes should each release include?
Netflix allows deploying only one significant change at a time.
Another question: how can we do integration testing or E2E testing?
- Test against the contract using stubs; tools like Pact test both consumer and provider against the contract
- Canary release
- Test in production
- Roll back the deployment
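The contract-testing idea mentioned above can be sketched minimally. This is a hand-rolled illustration of the principle (tools like Pact generate the stub from a shared contract file instead); all names here (`SurveyClient`, `StubSurveyService`, `CONTRACT`) are made up for illustration.

```python
# Consumer-side contract test against a hand-written stub.
# The stub answers exactly what the contract promises -- nothing more --
# so the consumer is verified without the real provider running.

CONTRACT = {
    "request": {"method": "GET", "path": "/surveys/1234"},
    "response": {"status": 200, "body": {"surveyId": "1234", "tenantId": "001"}},
}

class StubSurveyService:
    """Stand-in for the provider, derived from the contract."""
    def handle(self, method, path):
        req = CONTRACT["request"]
        if (method, path) == (req["method"], req["path"]):
            return CONTRACT["response"]
        return {"status": 404, "body": None}

class SurveyClient:
    """The consumer under test; talks to whatever service it is given."""
    def __init__(self, service):
        self.service = service

    def get_survey(self, survey_id):
        resp = self.service.handle("GET", f"/surveys/{survey_id}")
        if resp["status"] != 200:
            raise LookupError(survey_id)
        return resp["body"]

client = SurveyClient(StubSurveyService())
survey = client.get_survey("1234")
print(survey["tenantId"])  # -> 001
```

The same contract file would drive a provider-side test, so both sides stay honest without an expensive E2E environment.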
Now we understand that the benefit of MSA is faster releases, but do we care about such a fast, frequent release cycle?
Let's see how often users are deploying new code.
According to the DevOps pulse, 20% of respondents deploy new code multiple times a day, and 60% do so more than a few times a week.
CI/CD adoption is also very high.
More than 70% of users already have continuous integration in place or are working on it.
The number is almost the same for continuous deployment.
Quite a lot of users have already adopted these DevOps practices, and the adoption rate is increasing.
Let’s take a look at a few case studies.
Netflix started refactoring their system in 2008. Back then their business was shipping DVDs. Now it has changed primarily to streaming, where user retention matters: the longer a user stays, the better. In fact, a 0.1% gain in retention means a lot to their business, so they constantly run A/B tests on UI, algorithms, etc., and deploy new features very quickly. They call this feature velocity, which has become one of their core competencies.
They started moving the on-prem monolith to MSA in the cloud, which took 7 years to complete. Now they're running 400+ microservices. They open-sourced components such as Hystrix.
Pokemon GO had 50x more traffic than expected on launch day. Their app runs on Kubernetes on Google Container Engine. Because of the influx of requests, the load balancer became a bottleneck, and they needed to optimize it to meet demand. It is the largest Kubernetes deployment on GCE.
Spotify serves 75 million users in 58 countries. They say their business rules are incredibly complex.
They had very common challenges: the UX depended on a shared library, which depended on the server and infrastructure, so it was super hard to change anything without breaking another part of the system. To solve this, they split 600+ devs into 90+ teams, each owning its own autonomous service. Now they have 800+ services. They open-sourced components such as Apollo.
LinkedIn also started as a monolith (Java, servlets, Oracle), then moved to a CQRS architecture to support the member-connection graph.
Then they went to 300+ fine-grained services with a monolithic build and release process, meaning they weren't yet exploiting the microservices benefit. Eventually they optimized the build and release process to take full advantage of MSA.
One interesting fact is that they brought the notion of tiers into MSA as a way to manage dependencies. Services in the front tier only talk to the middle tier, and the middle tier only talks to the backend tier. Services in the backend tier talk to each other but never go outside their boundary. It's a tier within the bounded context.
They rely on Deco for cross-domain data references and Rest.li for inter-service communication.
Well, you may think, "I'm not building Netflix or Pokemon GO."
However, MSA gives you benefits regardless of the size of your company or service.
Most of the benefits are from a DevOps point of view.
The biggest benefit of all is that it enables continuous innovation, as we saw with Netflix.
To support continuous innovation, a service needs to be deployable independently, without worrying about other services.
MSA enables this by isolating each service.
By isolating microservices, you can choose the right technology for the right purpose, such as library, language, etc. You're free from version hell too.
Decomposing the monolith lets each piece scale differently.
Ramp-up time will be short because of the small service size.
Fault isolation doesn't come for free. You need to explicitly isolate faults with bulkheads, circuit breakers, etc.
You won't get these benefits for free. You have to design the MSA in a way that gains them.
We'll discuss this a few slides later.
A new architecture style always comes with a new set of challenges. In the case of microservices, this is it.
Each service gets simpler, with a clear focus and boundary.
In exchange, the complexity shifts to the orchestration layer.
Since each service owns its private data, services need to communicate via APIs. When you have hundreds of services talking to each other, you'll notice how much congestion that creates.
For the same reason, you'll see lots of data integrity/consistency issues because of data fragmentation.
Since each service can be developed and deployed independently, integrating and versioning them becomes a challenge.
Testing an MSA is one of the biggest challenges because of service dependencies. Integration/E2E testing is hard: the rate of change differs per service, and there is a lack of tools. Stubs and mocks are required, and automation is essential.
More services means more surface area to break. Independent deployment doesn't mean there are no runtime dependencies.
In MSA, service location is dynamic, based on cluster state and resource usage. Relocation is common, so you need to route each request to the right instance.
Monitoring hundreds of services that potentially use different technologies and languages is a big challenge. We'll discuss this later in the session.
Density in the cluster makes it even more difficult.
http://mantl.io/technologies
Don’t do layering across business domains which causes dependency
No shared Library, configuration, framework. Everything needs to be self-contained so it can be deployed independently
Decentralize repository, build process, LB, data and governance
Don’t leak implementation details to other services which causes dependency
Data must be private to its service meaning anybody else can’t directly access other’s data
N times of deployment for N services. Automation is the key to solve that issue.
All techniques that we discussed in resiliency session can apply here. Circuit breaker, bulkhead, retry, timeout etc.
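The circuit breaker named above can be sketched in a few lines. This is a minimal illustration, not a production library; the class name, thresholds, and error types are assumptions for the example.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after `max_failures` consecutive
    failures, then fail fast until `reset_after` seconds have passed."""
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

# Two consecutive failures trip the breaker; the next call fails fast
# without ever touching the (illustrative) downstream service.
breaker = CircuitBreaker(max_failures=2, reset_after=60.0)

def flaky():
    raise ConnectionError("downstream service unavailable")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "ok")
except RuntimeError as e:
    print(e)  # -> circuit open: failing fast
```

Failing fast protects the caller's threads and gives the downstream service room to recover; a bulkhead would additionally cap the resources any one dependency can consume.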
We’re expanding the notion of high cohesion loosely coupling to DevOps area such as build, test, deploy for faster evolution
How can we design these principles in your system? Let’s take a look.
Now we understand the principles, how can we bring them to life?
Translate the principles into actionable design items.
If you scan through existing articles, videos, and training materials, these are the topics that they’re talking about.
How do we find the right granularity? Is smaller always better, as in lean startup, the agile manifesto, and CI/CD practice?
Smaller means more services, which means more orchestration among the services and the data they own.
The GW plays three key roles; we'll discuss them later.
With hundreds of services, inter-service communication has lots of challenges: performance overhead, decoupling, etc.
How can you maintain data integrity and consistency across partitions?
Since we have small, granular services, a transaction most likely spans multiple services. How do we deal with partial failure?
Monitoring one service may be easy, but monitoring the system as a whole is difficult.
Async I/O with REST, or async messaging.
Consul supports the health endpoint monitoring (HEM) pattern.
Understanding the domain is super important.
A bounded context is the boundary of the system within which the same model and language apply. It could map to team organization, data types, infrastructure, etc. It is a way to separate concerns.
Workloads with different scalability, availability, or security requirements can be split into different services.
Smaller is not always better.
Other boundaries:
Deployment, dev/test, communication, technology, resource allocation, scale, SLA, security, data to fetch, discovery, team, monitoring, versioning, etc.
Routing is based on IP + port number; it also has to consider node state.
If the majority of services are responsible for the same thing, such as logging, caching, or authentication, it makes sense to offload it to the GW.
There are commercial and OSS products that support this scenario.
Azure App GW, Nginx, HAProxy, and Traefik (https://docs.traefik.io/) are good examples.
OpenID Connect for consumer services, LDAP for the enterprise.
A fat gateway is an anti-pattern. Too much domain knowledge in the GW becomes a blocker for fast deployment; that's the mistake we made in SOA.
The gateway can be a SPOF or a performance bottleneck. That's what happened at the Pokemon GO launch event.
This slide has a list of challenges rather than practices. I want to emphasize how important networking is in MSA.
If there are hundreds of services each exposing an endpoint, it's hard to discover, load balance, and protect them.
If you rotate this picture 90 degrees clockwise, it'll be clear.
In particular, N-S requests turn into lots of E-W calls, so we'll see lots of east-west chattiness.
Serialization and deserialization become a performance overhead: Protobuf, Avro, JSON, etc.
Centralized LB vs. decentralized LB (Service Fabric): the central one has better knowledge about state, while the decentralized one spreads out the load-balancing work itself.
Hundreds of services have different lifecycles, so the destination of a service call may not be up and running. That's why using a message broker makes a lot of sense: it keeps requests as messages while the destination is down, and they are processed afterwards.
When you update an API, make sure it's backward compatible. Alternatively, you can run two versions side by side and gradually migrate from old to new. There are a few API versioning techniques, such as using the URL, a query string, or a header; choose the right one and use it consistently across all services.
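The header-based versioning technique described above can be sketched as a tiny routing layer that keeps two handler versions running side by side. All names here (`Api-Version`, the handler functions, the response shapes) are illustrative assumptions, not a specific framework's API.

```python
# Two versions of one endpoint coexist while consumers migrate.

def get_survey_v1(survey_id):
    return {"id": survey_id, "questions": []}

def get_survey_v2(survey_id):
    # v2 renames fields but v1 stays available for old clients
    return {"surveyId": survey_id, "items": [], "schema": "v2"}

HANDLERS = {"v1": get_survey_v1, "v2": get_survey_v2}

def route(headers, survey_id):
    # Default to the oldest supported version to stay backward compatible
    version = headers.get("Api-Version", "v1")
    handler = HANDLERS.get(version)
    if handler is None:
        return {"error": f"unsupported version {version}"}
    return handler(survey_id)

old_client = route({}, "1234")                      # gets the v1 shape
new_client = route({"Api-Version": "v2"}, "1234")   # gets the v2 shape
print(old_client)
print(new_client)
```

Once traffic to v1 drops to zero, its handler can be deleted without a coordinated release across consumers.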
Since the IP address per container is masqueraded by default, an NVA can't protect them.
Rest.li: a framework for RESTful APIs used by LinkedIn.
Thrift: a framework for cross-language RPC.
Let’s say a tenant can create survey up to 100.
When they create a new one, we need to check if it’s still < 100. But directly accessing data in different service is anti-pattern. How can we deal with this situation?
One way is to replicate data from other service but it’s anti-pattern fro the same reason.
So the right solution is to access tenant’s data through tenant service.
Another scenario is where we need atomic operation. Let’s say when you create a new survey, the survey count needs to be incremented as atomic operation. How can we do this?
In order to make them atomic, DB write and sending msg needs to be in a single trx.
Instead, update 2 DB (Order, Order Event) as a single trx, then you’re safe to send a msg.
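The two-table technique described here is often called the transactional outbox. Below is a minimal sketch using SQLite as a stand-in for the service's database; the table names, columns, and event format are illustrative assumptions, not the deck's actual schema.

```python
import sqlite3

# Write the business row and an event row in ONE local transaction,
# then a separate relay publishes pending events to the broker.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE survey (id TEXT, tenant_id TEXT)")
db.execute("CREATE TABLE survey_event "
           "(id INTEGER PRIMARY KEY, payload TEXT, published INTEGER)")

def create_survey(survey_id, tenant_id):
    with db:  # a single local transaction covers both inserts
        db.execute("INSERT INTO survey VALUES (?, ?)", (survey_id, tenant_id))
        db.execute(
            "INSERT INTO survey_event (payload, published) VALUES (?, 0)",
            (f"SurveyCreated:{survey_id}",),
        )

def relay_events(publish):
    """Runs separately; safe to retry because delivery is at-least-once."""
    rows = db.execute(
        "SELECT id, payload FROM survey_event WHERE published = 0"
    ).fetchall()
    for event_id, payload in rows:
        publish(payload)  # hand the event to the broker
        with db:
            db.execute("UPDATE survey_event SET published = 1 WHERE id = ?",
                       (event_id,))

sent = []  # stands in for the message broker
create_survey("1234", "001")
relay_events(sent.append)
print(sent)  # -> ['SurveyCreated:1234']
```

If the process crashes after the commit but before the publish, the unpublished event row survives and the relay picks it up on the next run, so nothing is lost; consumers just need to tolerate duplicates.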
Reporting and analysis are very difficult workloads in terms of data management.
Survey and Survey Analysis obviously need to access the same survey data, but sharing it directly is bad for dependency management. Instead, we should separate the read model from the write model. CQRS makes sense here because the analysis workload most likely needs only a subset of the survey schema, so we can optimize the read model for it.
Using a transaction log or event sourcing is another option; there are OSS components that support this scenario.
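The CQRS split from the slide can be sketched with two in-memory stores and an event log standing in for the message broker. The field names and the average-answer projection are illustrative assumptions.

```python
# Survey owns the write model and emits events; Survey Analysis folds
# those events into its own denormalized read model. Until the
# projection runs, the read model lags: eventual consistency.

write_model = {}   # owned by the Survey service
read_model = {}    # owned by the Analysis service
event_log = []     # stands in for the message broker

def record_answers(survey_id, tenant_name, answers):
    write_model[survey_id] = {"answers": answers}
    event_log.append({"surveyId": survey_id,
                      "tenantName": tenant_name,
                      "answers": answers})

def project():
    """Analysis-side consumer: builds a read model shaped for queries."""
    while event_log:
        e = event_log.pop(0)
        read_model[e["surveyId"]] = {
            "tenantName": e["tenantName"],
            "avgAnswer": sum(e["answers"]) / len(e["answers"]),
        }

record_answers("1234", "Fabrikam", [2.0, 3.5, 4.0])
project()  # before this runs, read_model is stale
print(read_model["1234"])
```

Note that the read model carries the tenant name (denormalized from the event) and a precomputed average, so the analysis side never queries the Survey service's private data.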
Since it’s highly distributed, most likely one transaction requires multiple operations across services w/o DTC.
Serealizability and ACID is not longer relevant in the cloud. I don’t want to talk about CAP theorem because we all know that.
How can we deal with partial failure?
Instead of ensure consistency at DB operation level, we make the system state consistent at business level.
So in this case, the DB still look inconsistent, because we don’t roll back but system state is fine.
It may look easy but it’s not. You need to manage retry/timeout of each operation and manage the state of entire workflow. You also have to make sure no more than one worker will process the same order
SAS pattern is a good example of implementing this scenario.
http://www.cs.cornell.edu/andru/cs711/2002fa/reading/sagas.pdf
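The order saga in the diagram can be sketched as a sequence of (action, compensation) pairs: on failure, the completed steps are undone in reverse order instead of rolled back at the DB level. The service names, log messages, and failure injection are illustrative assumptions.

```python
# Place order -> decrease stock -> delegate shipping; if any step
# fails, run the compensations for the completed steps in reverse.

def run_saga(steps):
    """Each step is (action, compensation)."""
    done = []
    for action, compensation in steps:
        try:
            action()
            done.append(compensation)
        except Exception:
            for undo in reversed(done):
                undo()  # compensations should be idempotent and retryable
            return "cancelled"
    return "completed"

log = []

def fail_shipping():
    raise IOError("shipping service down")

steps = [
    (lambda: log.append("order placed"),   lambda: log.append("order cancelled")),
    (lambda: log.append("stock reserved"), lambda: log.append("stock released")),
    (fail_shipping,                        lambda: None),
]

result = run_saga(steps)
print(result)  # -> cancelled
print(log)     # compensations ran in reverse order
```

As the notes warn, the hard part is everything this sketch omits: per-step retry and timeout, persisting the saga's state so it survives a crash, and ensuring only one worker drives a given order, which is exactly what the Scheduler Agent Supervisor pattern addresses.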
Correlating distributed transactions.
A sidecar is a special container collocated with the main container. It's deployed into a different cgroup but within the same namespace as the main container, so it can access the main service for monitoring.
Zipkin, OpenTracing.
APM tools like New Relic, Splunk, etc.
Using Zipkin or OpenTracing to correlate transactions across network boundaries is the key.
They analyze a sampling of calls.
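The correlation idea behind these tools can be sketched simply: the gateway mints an ID, and every hop logs it and forwards it downstream. Real tracers like Zipkin add spans, timing, and sampling on top of this principle; the header name and service names here are illustrative assumptions.

```python
import uuid

# One logical activity crosses GW -> Svc A -> Svc B; a shared
# correlation ID lets us stitch the per-service logs back together.
trace_log = []

def handle(service, headers, downstream=None):
    # Reuse the caller's correlation ID, or mint one at the edge (the GW)
    cid = headers.setdefault("X-Correlation-Id", str(uuid.uuid4()))
    trace_log.append((cid, service))  # every log line carries the ID
    if downstream:
        downstream({"X-Correlation-Id": cid})  # always forward it

handle("gateway", {},
       downstream=lambda h: handle("svc-a", h,
                                   downstream=lambda h: handle("svc-b", h)))

distinct_ids = {cid for cid, _ in trace_log}
print(len(distinct_ids))  # -> 1: all three hops share one correlation ID
```

Grepping any log store for that one ID reconstructs the whole distributed transaction, which is why propagating the header at every hop (often via a sidecar or middleware, so no service can forget) is the key.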
https://www.youtube.com/watch?v=Q4nniyAarbs
There’re a number of options that you can choose from to implement MSA on Azure.
ACS lets you provision a cluster using DC/OS, Docker Swarm or Kubernetes.
Service Fabric is another Microsoft offering for hosting MSA
If you want to go down the serverless path, then Azure Functions is the right one for you. It became generally available yesterday.
Docker Cloud is the service from Docker. You can choose a provider from Azure, AWS, DigitalOcean, etc.
You can provision VMs and install Docker yourself.
App Service is another PaaS from MSFT.
Docker streamlines deployment and isolation and standardizes the application API (environment variables, networking).
Dev/Test benefit
Container is the standard format to host apps, that’s it.
The critical part is the orchestration. It’s very competitive area right now.
There’re Kubernetes, DC/OS, Docker Swarm, Nomad, now Hyper.sh is getting popular.
When I started learning containers a couple of years ago, I was always wondering which service does what?
From logical point of view there’s a set of components in the container orchestration.
You register your container image in a public/private registry, which can also scan the image to validate security policy, etc.
The master node takes care of cluster management, such as scheduling, recycling, and deployment.
It works with the cluster state store, which knows what's running where and who the leader is.
Requests from users go through the gateway and are load-balanced and routed to the service.
The actual services run on the application hosts.
A minimum 5-node cluster.
Each node can run multiple services.
There's no dedicated master node; every node acts as master as well as application host, and possibly GW.
Advanced features such as reliable collections, actors, rolling upgrades, etc.
We started the SF guidance a month ago. I wanted to share something, but it's still premature, so please stay tuned.
When we migrate a monolith to microservices, a big-bang approach wouldn't work well.
But even incremental migration is a challenge.
How can the old monolith app talk to the new microservices, and vice versa?
How can users reach the right service? They don't know whether a feature has been migrated yet.
(Many folks recommend starting with a monolith and then migrating to MSA.)
A strangler is a vine that covers a tree so that you can't see what's inside.
The problem this pattern solves is the following:
users used to access a feature in the old app, but now it has moved into the new app.
We use the strangler as an analogy: from outside the system, users don't see what's going on inside.
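The strangler facade can be sketched as a routing layer in front of both systems: migrated paths go to the new microservice, everything else falls through to the monolith, and users never see the difference. The path prefixes are illustrative assumptions.

```python
# Routing table of the strangler facade; it grows one prefix at a
# time as features are extracted from the monolith.
MIGRATED_PREFIXES = ["/surveys", "/tenants"]

def strangler_route(path):
    """Decide which backend serves this request."""
    for prefix in MIGRATED_PREFIXES:
        if path.startswith(prefix):
            return "microservice"
    return "monolith"  # everything not yet extracted falls through

print(strangler_route("/surveys/1234"))  # -> microservice
print(strangler_route("/reports/q3"))    # -> monolith
```

When the last prefix is migrated, the monolith branch goes dead and can be retired; in practice this logic lives in the gateway, often alongside an anti-corruption layer that translates between the old and new models.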
This is a 60-minute summary of what MSA is and how to design it.
There are numerous great materials; these are among the best.