The document discusses lies that architects sometimes tell and truths they avoid. It provides examples of six common lies: 1) saying a system is real-time or has big data when it really has specific requirements, 2) claiming a microservices architecture exists when the goal is still to migrate, 3) saying hybrid/multi-cloud architectures don't exist when the architecture is just copy-pasted, 4) using "best of breed" when really using only one of everything, 5) claiming something can't be done at an organization due to its nature when other similar organizations succeeded, and 6) avoiding risk or change by safely interpreting things in a non-threatening way. The document advocates defining responsibilities clearly, embracing change, taking measured
10. Actual Requirements
• Throughput: X MB/sec
• Latency: X ms from A to B
• Storage: TB
• Size of working set: GB
• Availability
• Durability
• Data access patterns
11. Soft Requirements
• Tolerance for failure
• Tolerance for "cutting edge"
• Operational maturity
• Size of engineering team
• Culture
12. Buzzwords are Suspicious
• Cloud Native == Resilient & Elastic
• Serverless == Less ops & Pay per use
• Service mesh == Flexible infra & Central control
13. What I say / What they hear / What I mean
What I say: "Oh, 15 minute produce-to-consume latency and 1 MB/sec throughput is easy!"
What they hear: "You are working on a crappy system"
What I mean: "We can build something cheap, stable and easy to maintain. YAY!"
19. Have an escape plan
"Once the data is in Kafka, it's so much more portable… Portable data means you don't need to worry about vendor lock-in."
- Chris Riccomini, WePay
25. Distributed Monolith?
• How many services do you need to modify when making "a tiny change"?
• One of your microservices crashed. How many microservices do you need to restart?
• Can you deploy a new version of any microservice at any time and in any order?
39. Bad Architect | Good Architect
Uses buzzwords to justify decisions | Uses data to drive decisions
Forces rapid adoption of trendy choices | Looks at goals, costs and benefits
Copy-pastes from vendors | Does own research and POC
Protects "play with tech" turf | Helps engineers adopt technologies
Says "we've always done it this way" | Is a change agent
Avoids risk / ignores risk | Takes risks that make a difference
40. "I have learned much from my teachers, more from my colleagues, and the most from my students"
- Rabbi Hanina bar Hama
Editor's Notes
Hey, I'm Gwen Shapira - I'm a committer on the Apache Kafka project and I work for Confluent, which is the company building a streaming event platform out of Kafka. I left my title out, because if I included it, you wouldn't believe a single word I'm about to say.
Before I start, I want to find out what percentage of the audience I'll accidentally insult this morning. So… how many of you have the title of "enterprise architect"?
Those of you working as enterprise architects probably know what you do. So it may surprise you to discover that to a large extent the rest of the world doesn't.
Apache Kafka is often used for what some people call "fast data" systems. Which means that the first thing I often hear when I meet a new customer is…
This is not a lie. Regardless of what you may think, we have real work to do.
Architects are there to force software engineers across the company to collaborate.
Software engineers want whatever they want.
And their management is incentivized to let them do it, because hiring engineers is hard and keeping them productive is harder.
You want to use Go? Fantastic! Want to use Airflow? Go for it. Love Spark? You do you! Love REST APIs? We'll do REST APIs. You prefer gRPC? Let's do gRPC!
But we all need to work together, and this is where it gets challenging. So enterprise architects are supposed to pick "standards" and get the entire company to use them. Standard languages, frameworks, tools, methods, architectures, design styles, governance, etc.
Engineers have tons of leverage. So you can't force them to collaborate. You need to convince them to do things your way. Which is why no one knows what architects do: When we do a great job, it looks like we do nothing at all.
Convincing smart and opinionated people ain't easy. You need to build trust, you need to have good arguments, and you may need proof, which means building convincing proofs of concept. You need to do research, you need to educate, you need to win debates.
This is difficult. So, sometimes we take shortcuts.
Not always intentionally. Usually the first victim is ourselves. We want something to be true, so we convince ourselves it is true. And then we start repeating it to other people.
Our lies are a shortcut. A way to convince ourselves and everyone else that we made a good decision, instead of doing the hard work of research, proof and consensus building.
Let's look at some examples of lies that I've heard again and again and again, in 10 years of advising enterprise architects, in companies large and small, on how to build their data infrastructure. Let's look at the lies and the truth behind them, because knowing the truth will free you to do a better job.
Since I work with Apache Kafka, which is loosely connected to something called "fast data" or "speed layer", I get this one a lot.
If you naively imagine they are building something like…
you are wrong.
I learned to ask: Which part of the system needs to be how fast?
Because most of the time it is "15 minutes until the dashboard reflects changes to the DB".
Except when it is "15 ms until the mobile app shows that the action was accepted".
Or even "15 microseconds until we trigger a trade".
That lie has an older cousin. I heard this one non-stop from 2012 to 2016.
It was usually a reason for adopting an unusual, poorly understood and poorly tested data storage system.
"Why are you re-building your system to use System-Z?"
"Well, we have big data, you know."
Big data in that context was around 45 GB. Not that it matters.
Buzzwords like "real time" and "big data" are nearly meaningless. All those buzzwords are a shortcut for specific requirements. The things you actually need your system to do.
We take the shortcut because collecting requirements is difficult, and we can't be really sure we nailed them. To make things worse, storage isn't always as agile as other components - if we make a mistake, migration is difficult.
But: Picking a system based on a buzzword is worse - because the probability it is an actual fit is pretty low.
Talk to me in requirements. What's your throughput? What is your latency tolerance? How long do you need to store your data? What's the size of the working set? What are your data access patterns? What are your reporting requirements?
I've seen an email from an engineer who decided to adopt a relatively new technology, and use one of the most cutting edge features in it. The email basically said "I've now lost all my credibility and I need to switch tracks or I'll lose my job too". This is heartbreaking to me.
Like lies, buzzwords are used as shortcuts, and they often lose all meaning. You need to learn to translate them.
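As a rough sketch of what "talking in requirements" can look like, the questions above can be written down as numbers and then used for simple arithmetic. The `Requirements` class, its field names and the example values are all hypothetical and chosen only for illustration; the storage figure ignores replication, compression and indexing overhead.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Requirements:
    """Hypothetical container for the numbers asked about above."""
    throughput_mb_per_sec: float  # sustained write rate
    max_latency_ms: float         # tolerated end-to-end latency
    retention_days: int           # how long the data must be kept
    working_set_gb: float         # data that is actively read

    def storage_gb(self) -> float:
        """Raw storage before replication or compression (1 GB = 1000 MB)."""
        total_mb = self.throughput_mb_per_sec * SECONDS_PER_DAY * self.retention_days
        return total_mb / 1000


# Illustrative values only: 1 MB/sec, 15 minutes of latency tolerance, 30 days retention.
reqs = Requirements(
    throughput_mb_per_sec=1.0,
    max_latency_ms=15 * 60 * 1000,
    retention_days=30,
    working_set_gb=45.0,
)
print(f"{reqs.storage_gb():.0f} GB of raw storage")
```

With these assumed numbers, 1 MB/sec is 86.4 GB per day, so 30 days of retention is roughly 2.6 TB: a concrete figure to size a system against, which a buzzword cannot give you.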
Always look for the costs and benefits.
I learned that I need to be very gentle when I give my opinion about those requirements. I definitely can't laugh out loud. Architects can be sensitive. I say "Well, 15 minutes is trivial" and they hear "You are a crappy architect working on a crappy system". What I mean is "Yay! We can build a system that is cheap, easy to maintain and low risk!"
Lots of architects act like they are allergic to simplicity. Simple is good.
Collect your requirements and if they lead you to a simple system - celebrate!
This lie has a close sibling. Now, you'd think that if "we have big data" is a lie, then "we don't have big data" must be the truth. But you'd be wrong. Just like its sibling, this is a way of explaining a choice of data storage system made without proper consideration of requirements, capabilities, performance tests, any tests at all or even any data at all. An example would be:
"Why are you using Oracle to search text documents?"
"Well, we don't need Elastic, right? We don't have big data or anything."
If you are using Oracle to do something that Elastic is really good at, there's a high probability that you are paying a few million dollars too much for your text search. Same for using Oracle as a work queue.
On a side note, it is really funny: I show large prospects how they can use Kafka for their work queue and save a few million dollars, and suddenly 20,000 dollars becomes "why should I pay so much for open source?". I'll never understand that.
Anyway, what's the truth here?
"Big Data" and "Not Big Data" are nearly meaningless. Talk to me in requirements. What's your throughput? What is your latency tolerance? How long do you need to store your data? What's the size of the working set? What are your data access patterns? What are your reporting requirements?
Let's talk about non-functional requirements too: What is your appetite for risk? Do you prefer a well-understood and stable system or a presentation with "lessons learned implementing System-Z" at a conference? What's the outcome of failure like in your organization? How many other databases do you have? What is your operational culture like? Are you comfortable contributing to open source?
Given all the functional and non-functional factors, you should be able to make a strong, well-justified choice of a database. Without a single buzzword.
This is a particularly annoying lie, because it is so easy to disprove. I have maybe 20 counter-examples. From a very diverse set of companies. Early on, I only ever heard this from "Developer Advocates" working for a certain cloud provider. But now I also hear it from customers.
By now, if an architect tells me "Multi-cloud architectures are not a thing", you know that they are very, very attached to a specific cloud provider. You can tell for sure if the next thing they say is:
You really can't argue with someone who didn't really think about the architecture in the first place.
But! If you get there first, you can get them to copy-paste YOUR architecture!
Seriously though. There are lots of benefits to using lots of cloud managed services. You just need to have an escape hatch. A plan on what to do if things don't turn out as expected. And this is true for most architecture decisions.
https://riccomini.name/kafka-escape-hatch
I have to admit that this is a lie that I fell for many many times before I discovered the truth.
Actually, there are a few versions of the truth.
Sometimes it isn't a lie as much as it is an aspiration. They want to do microservices. They are moving in that direction. They are just not there yet. This is great.
It includes an implicit admission of what is really a grand universal truth: "Software architectures, like anything else, only exist in a state of change". Like Buddhist mandalas, they are great works of art, built in sand, to be swept away. In a few months there will be somewhere else for you to migrate to. Maybe serverless.
We have 50,000 different microservices that we can't keep track of. Changing a simple config requires weeks of stitching together a workflow across at least 6 different services. Troubleshooting is impossible.
How do you know if you have a distributed monolith? You try to do one of the things that microservices are supposed to make easy. Was it easy?
The worst offender is leaking responsibilities. Each microservice is supposed to contain an entire context. But you see cases where the customer profile service needs to call the insurance quotes service whenever someone changes address. Some of the logic around quotes leaked from the quotes service into the customer profile service, where it does not belong.
Now, if we had an event driven architecture, the customer profile service would publish changes in profile and the insurance quote service can decide how to use them.
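The event-driven alternative can be sketched in a few lines. This is a minimal in-memory illustration, not a Kafka API: `EventBus`, the `profile-changes` topic name and both service classes are hypothetical names invented for the example. The point it shows is that the profile service only publishes what happened, and the quote service owns the decision about what to do with it.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """In-memory stand-in for an event log or broker (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers[topic]:
            handler(event)


class CustomerProfileService:
    """Owns profile data and knows nothing about insurance quotes."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def change_address(self, customer_id: str, new_address: str) -> None:
        self._bus.publish(
            "profile-changes",
            {"customer_id": customer_id, "address": new_address},
        )


class InsuranceQuoteService:
    """Owns all quote logic, including how to react to profile changes."""

    def __init__(self, bus: EventBus) -> None:
        self.requoted: List[str] = []
        bus.subscribe("profile-changes", self.on_profile_change)

    def on_profile_change(self, event: Event) -> None:
        # The decision to re-quote lives here, not in the profile service.
        if "address" in event:
            self.requoted.append(event["customer_id"])


bus = EventBus()
quotes = InsuranceQuoteService(bus)
profiles = CustomerProfileService(bus)
profiles.change_address("c-42", "1 Main St")
print(quotes.requoted)
```

In this sketch, adding another consumer of profile changes means adding a subscriber; the profile service itself does not change.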
Now, don't copy-paste my architecture… but you may want to look into this.
As I said earlier, as a software architect… you have one job. Make sure your organization is standardized on a set of technologies and design principles. "Best of breed" can be a nice way to say "I'm not doing my job".
There is a cost to having a technology. What makes an additional technology costly? Learning, deployment, monitoring, finding all the "unexpected behaviors".
If a technology does not serve a purpose or "spark joy", say "thank you" and figure out a plan to get rid of it.
Carefully assess the technical and cultural fit of new technology vs the costs of adding another technology to the stack.
Remember that the cost of adding new software is much higher if there are lots of unique integration points. Each integration adds its own risk, so definitely try to minimize those.
We used to call it the "integration tax" - it ain't bad at first, but system #20 has to integrate with the 19 older things. Unless you take steps to control it with something like Kafka as a central integration point.
There is a tricky anti-pattern here though…
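The arithmetic behind the integration tax is easy to sketch. The functions below are illustrative names, and they assume the worst case in which every system integrates directly with every other system, compared with each system integrating once with a central hub.

```python
def point_to_point_links(n: int) -> int:
    """Integrations needed when each of n systems talks to every other one."""
    return n * (n - 1) // 2


def hub_links(n: int) -> int:
    """Integrations needed when each of n systems talks only to a central hub."""
    return n


for n in (5, 10, 20):
    print(f"{n} systems: {point_to_point_links(n)} point-to-point, {hub_links(n)} via hub")
```

Under that assumption, 20 systems need 190 pairwise integrations but only 20 hub connections, and adding system #21 costs 20 new integrations in the first model and 1 in the second.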
Sometimes architects think that just because their job is to get the organization to standardize, they are the only ones who can use new technologies. This is terrible and if you work in such a place, leave.
As a developer, it robs you of your growth and joy. As an architect, it robs you of your chance to make an impact… because developers are unlikely to actually do what you tell them.
I learned this from my customers. The most successful enterprise architects know how to detect great bets and work with the engineers and managers to help the rest of the organization adopt the new methods. You do it with workgroups, hackathons, office-hours, tech-talks, etc, etc.
It is an immensely satisfying way to work - you get to see many engineers grow and your decisions take root and take the organization to the next level.
Lyft had a great talk at QCon NYC a couple of years back on how they migrated from REST to gRPC. The main challenge was getting company-wide participation, and they wrote many tools, including a legendary proxy, to help gradually increase the comfort level with the new technology.
And please don't tell me "Oh, this is Lyft. We can never do it here" because…
From "We know almost nothing and call a vendor for everything" through "We know lots, do almost everything alone and also have a deep expert on retainer" all the way to "We employ 2+ committers on the project".
Remember that being an architect is all about changing how you think and approach problems, changing how others work and changing entire organizations.
This is your job. Assess capacity for risk. Small organizations can feel bolder. Large organizations sometimes prefer to play it safe, and other times want to use their resources to do very bold things. Assess what matters to the business. Find the best solutions from large teams of engineers - and with a sure hand, steer the organization in the right direction at the right speed.
This can be a very benign lie. The Agile Manifesto has very reasonable ideas like "value people" and "value working software". Why would anyone lie about something so reasonable?
There is a wonderful document called "the half-arsed agile manifesto". https://www.halfarsedagilemanifesto.org/
I learned a lot preparing this presentation. Thank you for the opportunity to learn and share.