So you’ve decided to make the transition to a Microservice Architecture. You’ve spent time doing the research. You’ve designed out the responsibilities of each service with your team. You’ve read and memorized the entire article Martin Fowler wrote on the subject. Now, you’re running a team that’s tasked with building some Microservices. You’ve built or extracted your first services. You’ve been successfully transmitting data between these services. What next? What should you be aware of? What should keep you up at night?
In this talk we’ll begin with a brief introduction to the architecture pattern before covering some of the more advanced topics when developing Microservices, focusing primarily on team management and service design philosophy. We’ll discuss the CAP theorem and why it should be your obsession. We’ll look at why Conway’s Law should be taken seriously and how it can serve as a warning, prompting better communication between teams. Finally, we’ll examine some common pitfalls of Microservices architectures and how they can be mitigated.
Agenda
1. Service Design Philosophy (The Tech)
2. Team Organization (Less Tech)
3. What does it mean to be the CTO / Lead Tech person with Microservices?
CAP Theorem (Brewer’s Conjecture)
In a Distributed System, you can have at most two of:
• High (C)onsistency
• High (A)vailability
• High (P)artition Tolerance
CAP Theorem (Brewer’s Conjecture)
• High (C)onsistency, (A)vailability, or (P)artition Tolerance
• More of a Sliding Scale, Really
• Focus on Speed
• Embrace Eventual Consistency
Data Locality
• Spatial: how far away is the data?
• Temporal: how often is it accessed?
• Microservices Goal: keep data highly spatial and handle temporal data efficiently.
Conway’s Law
“organizations which design systems ... are constrained to
produce designs which are copies of the communication
structures of these organizations”
-Melvin Conway, 1968
http://en.wikipedia.org/wiki/Conway's_law
This talk could be called “Managing a Microservices Development Team”, “Advanced Microservices Concerns”, or “Steve spews a bunch of random Microservice facts at you”.
-I don’t have much time up here, so I’d be happy to talk later about what we do at 3C. Essentially, we’re doing some very cool things with predictive analytics in the Retail space.
-For the past six-plus months, we’ve been building up a decently sized Microservices system. Before that, I spent years as a Principal Consultant with a local agency called Cantina, where we were big proponents of Microservices.
- I’m hoping to share some of the advanced concerns we’ve run into at Thirdchannel while working with the architecture
In the next few minutes, I will hopefully discuss:
Serious concerns you should have when designing the services
advanced team organization practices
what role you, as technical leaders in your organization, should have with a Microservice dev team
-Who here has heard of Microservice Architecture?
-Is anyone actually using it at your company, or worked with it previously?
-I’m going to assume familiarity, but quickly cover one of the main reasons why I gravitated towards the architecture: fighting the Monolith.
A common approach to building web applications these days is to build a Monolithic application
-this particularly pops up when using a framework like Rails, Node, etc.
I consider this an anti-pattern and find it very dangerous; I saw it routinely as a consultant: it would invariably result in slow, brittle code, and the dev teams would miss every release deadline.
A Monolith is a Single logical unit, everything is contained within one code base.
This diagram shows an example commerce app: the cloud is the internet, the box is the single code base, orange is the database, and the blue icons are individual components.
Monoliths are dangerous because they feel natural. It’s so much easier to simply add one more controller, one more model, one more database migration script than to think about your application in the longer term.
It feels like the right thing to do: a quick, easy way to get what you need working. It seems very enticing.
Soon enough though, you’ll start running into problems:
highly complex code
development speed will grind to a halt as the complexity of the application grows
I could go on, but not a lot of time
Microservices are an alternative architecture which effectively combats the monolith.
They’re really just a distributed application… a more focused version of Service Oriented Architecture.
Taking our example, converting this to microservices would involve:
breaking up each component into its own individually deployed logical unit, with its own code base, etc.
Borrowed this slide from Martin Fowler’s website.
All the greatness of Microservices can be derived from this image
As demand on the application grows, I can scale just the components that are needed, rather than replicating the whole monolith.
I could go on at length about why that approach is a huge win, but there’s not enough time and you all probably know anyway. Be happy to talk about it later.
Let’s move on to Service Design, or… things you should be focused on when designing and building your individual services.
A service should be an authoritative source on one thing.
A service should be small. It is ‘micro’, after all. Some people will argue about lines of code or file size or something, but I think a better metric is ‘developer head space’.
In other words, your service should “bound the context” of its responsibility to the degree that a developer can hold the whole thing in his or her head. No more.
Next up is the CAP Theorem.
The CAP Theorem was originally called “Brewer’s Conjecture”.
The theorem states that a Distributed System can have at most two of…
- high consistency: when a data change occurs in the system, everything knows about it at once
high availability: a system should efficiently and quickly process every request. No dropped or ignored requests
High partition tolerance: Services should still operate even if others are down. Independent things fail independently.
For a web Application, we most likely want our services to be highly available and partition tolerant. Which means that data consistency will happen later
Users get real mad when things are slow or appear broken
Giving a little bit on consistency is ok. Updates should eventually propagate. If I place an order, I tell the user ‘Hey, thanks’, they go on about their day, and then I send the data over - eventually - to the order system, which - eventually - processes the order and eventually deducts from inventory, etc. We’re talking, of course, about fractions of a second. This is called ‘eventual consistency’.
A banking system, however, is something that should probably be highly consistent
The CAP Theorem was proved in this paper by Gilbert and Lynch in 2002; before that it was simply a ‘Conjecture’.
Interestingly, a bit of trivia: this paper also introduced the term ‘Eventual Consistency’
The paper also states that CAP is a bit of a sliding scale… for example we can sacrifice a bit on Availability to have more Consistency
That being said, if you’re building a web app, focus on speed. Your customers will never forgive a slow application.
-Besides, the faster you are, the more money and data you can collect.
-Build your services so that they follow asynchronous, non-blocking patterns in order to avoid resource contention.
-Offload tasks to message queues.
-Try out the Reactor pattern (there’s a great library for JVM languages called ‘Reactor’, btw).
I could go on and on about this particular topic
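To make that concrete, here is a minimal sketch of the “thank the user now, do the real work later” shape described above. It uses only the JDK (a BlockingQueue as an in-process stand-in for a real message broker like RabbitMQ or Kafka, rather than the actual Reactor library), and all class and method names are illustrative, not anything we run in production:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: the queue is an in-process stand-in for a real broker.
public class OrderIntake {
    private final BlockingQueue<String> orderQueue = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public OrderIntake() {
        // Drain the queue off the request thread; the slow work (inventory,
        // billing, order history) happens here, eventually.
        worker.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    processOrder(orderQueue.take());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // Request path: enqueue and respond immediately. The user never waits
    // on downstream services.
    public String placeOrder(String orderJson) {
        orderQueue.offer(orderJson);
        return "Hey, thanks! Your order is on its way.";
    }

    private void processOrder(String orderJson) {
        // A call to the downstream order service would go here.
    }
}
```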
Embrace the idea of Eventual Consistency. It’s ok to let data changes propagate asynchronously through your system, so long as they get there eventually.
-Users will forgive small mistakes.
-I can talk about some examples of this afterwards if anyone is interested
Getting back to Service Design, let’s talk about a concept called ‘Data Locality’.
I think that this is perhaps the most important point in this whole presentation.
It’s a bit of a complicated subject, and I’m not sure I fully understand it myself.
There are two aspects of data locality:
1. Spatial: how close together is our data? In a single database, data can be considered highly spatial if it’s in the same table. Data which is only reachable by joins is less spatial and perhaps less efficient to reach.
2. Temporal: being highly temporal means that data is read frequently. Highly temporal data is an excellent candidate for caching.
These terms take on slightly different meanings in a distributed system, though.
In an ideal distributed environment, each system would be completely separate.
-However, there will always be some conceptual overlap. For example, let’s say we have an ecommerce system where <describe system>.
-Red lines are anchors between data localities
Here, the user’s UUID is a shared natural key which acts as each service’s anchor to the concept of a User.
Our user service is the authoritative source on a User
Our order history system also understands the User, but simply maintains the order history along with the user uuid of the user placing the order
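As a rough sketch of those two views (field names are made up for illustration, not ThirdChannel’s actual schema), each service holds only its own slice of the user, anchored by the shared UUID:

```java
import java.time.Instant;
import java.util.UUID;

// Illustrative only: two services' views of the same User, tied together by
// the shared natural key (the user's UUID).
record UserRecord(UUID userUuid, String name, String email) {}                 // user service: authoritative source
record OrderHistoryEntry(UUID userUuid, UUID orderUuid, Instant placedAt) {}   // order-history service: the anchor plus its own data
```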
Same chart as before, but with another service: communication via email. This service also conceptualizes the user, but with a different subset of information, namely the email address.
Email address is stored in the user information service, but is *also* used extensively by the communication service. Which one should be the authoritative source? The communication service uses it more (and thus has higher Temporal locality), but email is a key piece of information which should be managed by the user auth system.
Tough Choice
One Compromise is to (unfortunately) synchronize the email address across multiple services
Or, the communication service has no concept of the user’s email address and simply sends emails out blindly; the email addresses involved must be placed into each message the communication service receives.
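A quick sketch of what that second option implies for the message shape (a hypothetical request type, not a real API):

```java
// Option two sketched as a message type: the sender resolves the address up
// front, so the communication service never needs to know about Users at all.
record SendEmailRequest(String recipientEmail, String subject, String body) {}
```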
Or, third option: perhaps we cache the email address on the communication service and wait for an event or notification from the user service when it changes (eventual consistency, eh?). When the user changes their email address, the communication service eventually - with ‘eventually’ being perhaps milliseconds - busts the cache and resets the email address. The user may miss an email until that happens, and that may be ok.
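Here is a minimal sketch of that third option, assuming the user service publishes some kind of “email changed” event the communication service can subscribe to (the event wiring and all names are hypothetical):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the cache-plus-invalidation compromise: the communication service
// keeps an eventually consistent local copy of each user's email address.
public class RecipientEmailCache {
    private final ConcurrentHashMap<UUID, String> emailByUserUuid = new ConcurrentHashMap<>();

    // Seed or refresh an entry, e.g. after asking the user service directly.
    public void put(UUID userUuid, String email) {
        emailByUserUuid.put(userUuid, email);
    }

    public String emailFor(UUID userUuid) {
        return emailByUserUuid.get(userUuid);
    }

    // Called by whatever subscribes to the user service's "email changed"
    // event. Until this runs (perhaps milliseconds later), mail may still go
    // to the old address; that is the eventual-consistency trade-off.
    public void onUserEmailChanged(UUID userUuid, String newEmail) {
        emailByUserUuid.put(userUuid, newEmail);
    }
}
```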
I think the goal of a Microservices Architecture with regards to Data Locality is to keep high spatial locality, and yet efficiently handle situations like the one I just described when dealing with temporal data.
Next let’s talk about team organization
Your engineering department should be broken up into multidisciplinary teams.
Each service should have a team assigned to it, and that team should be responsible for it.
Teams should be built from members with different skill sets
In fact, there should be no more teams broken up by discipline. That is, there should be no ‘QA team’, or ‘UX Team’, or … the remote corner of your office where the DBAs sit.
Each service’s team should be comprised of different disciplines that all work together.
That being said, there may be some team overlap, especially for startups which have a small developer base anyway.
One of the best working environments I’ve had was working directly with a QA person sitting next to me.
Next, Conway’s law!
Somewhat tongue in cheek, but it basically says that the structure of any given computer system will begin to reflect the social structure of the company that built it.
While yes, teams should be organized around services, these services still need to communicate with one another.
Thus, encourage your teams to discuss the global picture. Have Team Leads meet periodically, particularly when discussing new service queries or events.
Do your absolute best to encourage team communication. The last thing you want is for a service to be down / broken, and have the rest of the staff laying blame on the beleaguered service’s team, rather than trying to help fix issues.
Also consider rotating individual team members into new teams periodically.
Third, empower your teams to be highly autonomous in how they operate. Let them choose technologies (within reason; we may not be ready for Go or Haskell), let them run deployments, let them choose the approach.
-But they also need to be aligned well with the business. Essentially, provide the end goal, and let the team figure out how best to get there.
-I borrowed this drawing from a Spotify blog post on their engineering culture, and it shows the concept of a highly Aligned, highly Autonomous team.
-As leaders in your businesses, it’s your responsibility to keep the teams aligned with the business goals.
Which leads me to my last topic: your responsibilities as leaders.
Your engineering team will need someone (e.g. YOU) or a team of folks to ensure that the Microservice vision is being preserved. That is, keep individual services from getting too monolithic.
Lead discussions on microservices and distributed computing. Build up the communication libraries and messaging approaches, e.g. design a common messaging format that every service communicates with.
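For example, a shared message envelope might look something like this (the field names are one possible shape I am assuming for illustration, not a standard; the point is only that every service agrees on the envelope while the payload stays service-specific):

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;

// Hypothetical common envelope every service agrees to send and receive.
record ServiceMessage(
        UUID messageId,       // for tracing and de-duplication
        String eventType,     // e.g. "order.placed", "user.email_changed"
        String sourceService, // which service emitted the message
        Instant emittedAt,
        Map<String, Object> payload) {}
```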
You are going to run into developers who don’t see the distributed approach as the right idea. They will question you. They will complain. They will say how much easier it is to simply have all of the data in one database….
don’t give in!
Part of your job is to reassure them. To educate. To prove that this is a viable architecture pattern!
It is tough, though. I’ve found that some people gravitate towards this approach right away, and others need convincing.
Anyway, I’m nearly out of time.
In summary:
Empower your service teams to be autonomous with their decision making