Distributed systems are not strictly an engineering problem. It’s far too easy to assume they are a backend development concern, but in reality there are implications at every point in the stack. The trade-offs we make lower in the stack to buy responsiveness often bubble up to the top, so much so that they almost always affect the application in some way.
Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.
Tyler Treat looks at distributed systems through the lens of user experience, observing how architecture, design patterns, and business problems all coalesce into UX. Tyler also shares system design anti-patterns and alternative patterns for building reliable and scalable systems with respect to business outcomes.
Topics include:
- The “truth” can be prohibitively expensive: When does strong consistency make sense, and when does it not? How do we reconcile this with application UX?
- Failure as an inevitability: If we can’t build perfect systems, what is “good enough”?
- Dealing with partial knowledge: Systems usually operate in the real world (e.g., an inventory application for a widget warehouse). How do we design for the “disconnect” between the real world and the system?
37. @tyler_treat
A Study of Transparency and Adaptability of Heterogeneous Computer Networks with TCP/IP and IPv6 Protocols
Das, 2012
“Any change in a computing system, such as a new feature or new component, is transparent if the system after change adheres to previous external interface as much as possible while changing its internal behavior.”
77. @tyler_treat
Problems with 2PC
• Chatty protocol: beholden to network latency
• Limited throughput
• Transaction coordinator: single point of failure
• Blocking protocol: susceptible to deadlock
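To make these problems concrete, here is a minimal in-memory sketch of two-phase commit. It is not a production protocol: the `Participant` class, the `two_phase_commit` function, and the `crash_after_prepare` flag are hypothetical names introduced only for illustration, and network messages are simply counted rather than sent.

```python
from typing import List, Tuple


class Participant:
    """Hypothetical resource manager taking part in a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote means holding locks until the outcome arrives.
        if self.can_commit:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(
    participants: List[Participant], crash_after_prepare: bool = False
) -> Tuple[str, int]:
    """Run 2PC and return (outcome, simulated message count)."""
    messages = 0
    votes = []
    for p in participants:
        votes.append(p.prepare())
        messages += 2  # prepare request + vote reply

    if crash_after_prepare:
        # The coordinator is a single point of failure: no decision is ever sent.
        return "coordinator crashed", messages

    decision = all(votes)
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
        messages += 2  # decision + acknowledgement
    return ("committed" if decision else "aborted"), messages


# Happy path: three participants cost four messages each.
nodes = [Participant("a"), Participant("b"), Participant("c")]
outcome, message_count = two_phase_commit(nodes)

# Coordinator failure: participants stay "prepared", still holding their locks.
stuck = [Participant("a"), Participant("b")]
crash_outcome, _ = two_phase_commit(stuck, crash_after_prepare=True)
stuck_states = [p.state for p in stuck]
```

In this sketch every participant costs two round trips, which is the "chatty" part, and a coordinator crash between the phases leaves every participant blocked in `prepared` with no way to decide on its own.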
94. @tyler_treat
Challenges to Adopting Stronger Consistency at Scale
Ajoux et al., 2015
“The biggest barrier to providing stronger consistency guarantees…is that the consistency mechanism must integrate consistency across many stateful services.”
127. @tyler_treat
Sagas
Garcia-Molina & Salem, 1987
“A long-lived transaction is a saga if it can be written as a sequence of transactions that can be interleaved with other transactions…Either all the transactions in a saga are successfully completed or compensating transactions are run to amend a partial execution.”
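The definition above can be sketched as a small saga runner. This is an illustrative, in-memory example rather than the paper's implementation: the `Step` type, the `run_saga` function, and the order-processing steps are all hypothetical, and the compensations are assumed to always succeed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One local transaction plus the compensation that semantically undoes it."""
    name: str
    action: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
    return True


log: List[str] = []


def fail_shipping() -> None:
    # Simulated failure of the third transaction.
    raise RuntimeError("carrier unavailable")


order_saga = [
    Step("reserve", lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    Step("charge", lambda: log.append("charge card"), lambda: log.append("refund card")),
    Step("ship", fail_shipping, lambda: log.append("cancel shipment")),
]

ok = run_saga(order_saga)
# log is now: reserve stock, charge card, refund card, release stock
```

Note that the compensations do not restore the earlier state invisibly; the refund and the stock release are new transactions, and other work may have observed the intermediate state in between. That visibility is where the saga becomes a UX concern.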
145. @tyler_treat
CAP Theorem
• Consistency, Availability, Partition Tolerance
• When a partition occurs, do we:
  • Choose availability and give up consistency?
  - or -
  • Choose consistency and give up availability?
(or YOLO it)
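The choice on this slide can be illustrated with a toy pair of replicas. This is a deliberately simplified sketch, not a real replication protocol: the `Replica` class, the `"CP"` and `"AP"` mode labels, and the `make_pair` and `partition` helpers are hypothetical, and with only two replicas there is no majority side that could keep accepting writes.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to protect consistency."""


class Replica:
    def __init__(self, mode: str) -> None:
        self.mode = mode  # "CP" or "AP" (illustrative labels)
        self.value = None
        self.peer = None
        self.partitioned = False

    def write(self, value: str) -> None:
        if not self.partitioned:
            # Healthy network: replicate synchronously to the peer.
            self.value = value
            self.peer.value = value
            return
        if self.mode == "CP":
            # Choose consistency: refuse the write and give up availability.
            raise Unavailable("peer unreachable; refusing write")
        # Choose availability: accept locally and let the replicas diverge.
        self.value = value


def make_pair(mode: str):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Replica, b: Replica) -> None:
    a.partitioned = b.partitioned = True


# AP: both sides keep accepting writes and end up disagreeing.
ap_a, ap_b = make_pair("AP")
partition(ap_a, ap_b)
ap_a.write("10 widgets")
ap_b.write("7 widgets")
diverged = ap_a.value != ap_b.value

# CP: the write is rejected, so the user sees an error instead of stale data.
cp_a, cp_b = make_pair("CP")
partition(cp_a, cp_b)
try:
    cp_a.write("10 widgets")
    rejected = False
except Unavailable:
    rejected = True
```

Either branch is visible to the user: the AP pair needs a later reconciliation step for the conflicting widget counts, while the CP pair needs an error or retry experience for the rejected write.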