This document discusses data consistency patterns in cloud native applications. It begins with an overview of data consistency challenges and models, including eventual, causal, and linearizable consistency. It then covers consistency at the application tier using patterns like sticky sessions as well as using distributed databases that provide strong consistency with global transactions. Specific databases discussed include Spanner, CockroachDB, YugaByte, FaunaDB, and FoundationDB. The document also covers using eventual consistency with patterns like CRDTs, sagas, and lightweight transactions. It concludes with an in-depth discussion of the Calvin consensus protocol used by FaunaDB to provide deterministic, serializable transactions at a global scale.
2. Agenda
• What is Data Consistency?
• Data Consistency in Microservices
• Application Tier Consistency
• Strong Consistency with Distributed Databases
• Linearizable Consistency Patterns
3. No One Solution
“Do you want your data right or right now?” - Pat Helland
PACELC Theorem -> More than CAP
• In the absence of network partitions, the trade-off is between latency and consistency - Daniel Abadi
Understand what types of concurrency problems exist
Evaluate trade-offs in the differing approaches
Minimize Development Complexity
5. Consistency Challenges
Dirty Reads - reading an uncommitted write
Read Skew / Non-Repeatable Reads
Read your own Writes
Lost Updates
Write Skew
6. Write Skew
Two concurrent transactions each determine what they are writing based on reading a data set that overlaps what the other is writing.
Classic example: two on-call doctors each read that two doctors are on call, and each concurrently removes themselves, leaving no one on call.
Credit: begriffs.com
7. Consistency in ACID Transactions
ACID - Atomic, Consistent, Isolated and Durable
• Many different levels of ACID
Atomicity
• All or Nothing - it all happened or it didn’t.
• You don’t see things in the state in-between being processed.
• Ability to abort an operation and roll-back
Isolation
• Concurrently executing transactions are isolated from each other.
• Lets the application developer pretend there are not two operations happening in parallel.
Durable - Writes Stick
Consistent
• Enforce Invariants
• Data must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof
THIS IS NOT THE CONSISTENCY WE ARE LOOKING FOR
8. Consistency Models
Credit to Peter Bailis and Aphyr - jepsen.io/consistency
9. Linearizable and Serializable Consistency
Serializability - multi-operation, multi-object, arbitrary total order
Linearizability - single-operation, single-object, real-time order
Linearizability plus Serializability provides Strict Serializability
Peter Bailis - Linearizability versus Serializability
10. What is Serializability?
Serializable Consistency
• Transaction isolation.
• Concurrency issues arise when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data.
• The database guarantees that two transactions have the same effect as if they were run serially - the illusion of running one at a time without any concurrency.
11. Levels of Serializable Isolation
Repeatable Reads - Read and Write Locks
• Prevents non-repeatable reads (read skew)
• Write skew and phantom reads still possible
Read Committed - Write Locks that prevent:
• Dirty reads - only see data once the transaction is committed
• Dirty writes - only overwrite data that has been committed
Read Uncommitted
12. Linearizability
The eventual vs. strong consistency debate is about linearizability.
Guarantees that the order of reads and writes to a single register or row will always appear the same on all nodes.
Gives the appearance that there is only one copy of the data.
It doesn’t group operations into transactions, and it doesn’t address problems like dirty reads, phantom reads, etc.
Guarantees read-your-writes behavior
13. Linearizable Consistency in CAP Theorem
CAP Theorem is about “atomic consistency”
• Atomic consistency refers only to a property of a single request/response operation sequence.
• Linearizability
14. AP w/ Session Based Consistency
Yellow Nodes
Causal consistency
Monotonic reads / writes
Strong consistency with a single process only
Not isolated
15. AP Consistency
Blue Nodes
Sacrifice consistency for higher availability and partition tolerance
Maintains availability even when the network is down
Monotonic atomic view
Read committed / uncommitted
16. CP Consistency
Strict Serializable - combines Serializable plus Linearizability
Provides Highest Level of Consistency
Moves Complexity of Transactions out of Microservices into the Database
True ACID Transactions
Atomicity allows for rollback of transactions
Complete Isolation of Transaction
High Levels of Safety Guarantees
18. From Monolith to Microservices
Data Consistency was easy in a monolithic application - a single source of truth w/ ACID transactions
With the move to microservices, each service became a bounded context that owns and manages its own data.
Data Consistency became very difficult w/ microservices
19. Consistency Challenges with Data in Microservices
Traditional ACID transactions did not scale
Data orchestration between multiple services - the number of microservices increases the number of interactions
Stateful or Stateless
Data rehydration for things like service failures and rolling updates.
20. Eventual Consistency
CAP Theorem
• Forces a choice between global scale and strong consistency
Eventual Consistency
• Sacrifices consistency for availability and partition tolerance.
• Really a necessary evil
• Last Write Wins - what if I can’t lose a write?
• Write now and figure it out later
Pushed complexity of managing consistency to application tier
21. Return of Strong Consistency
Rise of Databases providing strong consistency and global scale
Possible to push complexity of consistency back to the database
Not a panacea for data consistency challenges
22. Distributed System Design
At the heart of distributed system design is the requirement for a consistent, performant, and reliable way of managing data.
24. Advantages of Application Tier Consistency
Low Read / Write Latency
High-Throughput
Read your Writes - Same session only
Requires application to enforce session stickiness
25. Disadvantages of Application Tier Consistency
No Isolation and limited atomicity
Consistency problems are far harder to solve in the application tier
Increased Complexity
26. Use Cases of Application Tier Consistency
Music Playlists
Shopping Carts
Social Media Posts
27. Patterns for Application Tier Consistency
Sticky Sessions
• Session Consistency
• Differing Levels of Linearizability
• Example - Akka Clustering
28. Sticky Sessions
Whether or not read-your-write, session and monotonic consistency can be achieved depends in general on the "stickiness" of clients to the server that executes the distributed protocol for them. If this is the same server every time, then it is relatively easy to guarantee read-your-writes and monotonic reads. - Werner Vogels, 2007
29. Akka Clustering
Pin Session to an Actor - Sticky Session
Akka Clustering Libraries
• Cluster Sharding
• Cluster Singleton
• Cluster Proxy
Akka Persistence
Akka Distributed Data w/ Conflict Free Replicated Data Types (CRDTs)
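Below is a rough sketch of the sticky-session idea with classic Akka Cluster Sharding: every command carrying the same sessionId is routed to the same entity actor, so a session's reads see that session's writes. The SessionCommand/SessionActor names and the shard count are illustrative, not from the talk, and a real deployment needs the usual cluster configuration.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// Commands carry a sessionId; sharding routes every command with the same
// sessionId to the same entity actor, giving "sticky session" semantics.
final case class SessionCommand(sessionId: String, payload: String)

class SessionActor extends Actor {
  private var events: List[String] = Nil
  def receive: Receive = {
    case SessionCommand(_, payload) =>
      events = payload :: events // the write...
      sender() ! events          // ...is immediately visible to this session's reads
  }
}

object SessionSharding {
  private val extractEntityId: ShardRegion.ExtractEntityId = {
    case cmd: SessionCommand => (cmd.sessionId, cmd)
  }
  private val extractShardId: ShardRegion.ExtractShardId = {
    case cmd: SessionCommand => (math.abs(cmd.sessionId.hashCode) % 100).toString
  }

  // Starts the shard region; commands are then sent to the returned ActorRef.
  def start(system: ActorSystem) =
    ClusterSharding(system).start(
      typeName = "session",
      entityProps = Props[SessionActor](),
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId)
}
```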
30. What are CRDTs?
CRDT - Conflict-Free Replicated Data Types
Data types that guarantee convergence to the same value without any synchronization mechanism
Consistency without consensus
Avoid distributed locks, two-phase commit, etc.
The data structure itself describes how to build the value
Sacrifice linearizability (guaranteed ordering) while remaining correct
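To make the convergence property concrete, here is a minimal, self-contained sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is illustrative only and not the Akka implementation: each node increments only its own slot, and merge takes the per-node maximum, so replicas converge regardless of merge order.

```scala
// Minimal grow-only counter CRDT: "consistency without consensus".
final case class GrowOnlyCounter(counts: Map[String, Long] = Map.empty) {
  def increment(nodeId: String, by: Long = 1): GrowOnlyCounter =
    GrowOnlyCounter(counts.updated(nodeId, counts.getOrElse(nodeId, 0L) + by))

  def value: Long = counts.values.sum

  // Merge is commutative, associative, and idempotent, which is what lets
  // replicas converge without locking or coordination.
  def merge(other: GrowOnlyCounter): GrowOnlyCounter =
    GrowOnlyCounter((counts.keySet ++ other.counts.keySet).map { node =>
      node -> math.max(counts.getOrElse(node, 0L), other.counts.getOrElse(node, 0L))
    }.toMap)
}

// Two replicas accept writes independently and still agree after merging.
object GrowOnlyCounterDemo extends App {
  val a = GrowOnlyCounter().increment("node-a").increment("node-a")
  val b = GrowOnlyCounter().increment("node-b")
  assert(a.merge(b).value == 3 && b.merge(a).value == 3)
}
```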
31. Akka Distributed Data CRDTs
Monotonic sequences - sequences that only ever increase or only ever decrease
Monotonic sequences are eventually consistent without any need for coordination protocols
GCounter, PNCounter - Grow-Only Counter / Positive-Negative Counter
GSet, ORSet - Grow-Only Set, Observed-Remove Set
ORMap, PNCounterMap, LWWMap
Flag, LWWRegister
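As a rough usage sketch (assuming the classic Akka 2.6 Distributed Data API; the key name and actor-system name are illustrative, and a real cluster needs the usual configuration), incrementing a replicated GCounter through the Replicator looks roughly like this:

```scala
import akka.actor.ActorSystem
import akka.cluster.ddata.{DistributedData, GCounter, GCounterKey, SelfUniqueAddress}
import akka.cluster.ddata.Replicator.{Update, WriteLocal}

object DistributedCounter {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("ClusterSystem")
    implicit val selfNode: SelfUniqueAddress = DistributedData(system).selfUniqueAddress

    val replicator = DistributedData(system).replicator
    val HitsKey = GCounterKey("page-hits")

    // The modify function runs against the local replica; the change is then
    // disseminated to the other nodes via gossip. (Replies such as
    // UpdateSuccess would normally be handled inside an actor.)
    replicator ! Update(HitsKey, GCounter.empty, WriteLocal)(counter => counter :+ 1)
  }
}
```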
32. Akka Cluster Strengths
Strong Consistency within a Single Actor
Monotonic Reads / Writes
High Availability
High Throughput and Low Latency
Can be AP with a Split Brain Resolver
Reduced latency because no db roundtrip
33. Akka Cluster Weaknesses
Akka Distributed Data is limited to CRDTs
Akka Distributed Data has a limited data size
• All entries are held in memory
• Limit of 100,000 top-level elements
• Replication via gossip gets slower with large data sets
No Isolation of Data
Consistency of Akka Persistence depends on backing data store
35. Advantages of Strict Serializable Consistency
Decrease Application Tier Complexity
Reduce Cognitive Overhead
Increased Developer Productivity
Increased Focus on Business Value
Strong Isolation
Most implementations also provide strong atomicity
36. Use Cases for Global Transactions
Processing financial transactions
Fulfilling and managing orders
Anytime there needs to be coordination of complex transactions across multiple data sources.
37. Strengths and Weaknesses of Strict Serializable Consistency
Read your Writes - Across sessions
Prevent Phantom Reads, Write Skew, etc.
Higher Read / Write Latency
Lower Throughput
38. Disadvantages
Transactions are hard. Distributed transactions are harder. Distributed transactions over the WAN are final boss hardness. I'm all for new DBMSs but people should tread carefully. - Andy Pavlo on Twitter
39. Not All Transactions are the Same
Distributed Multi-Version Concurrency Control (MVCC) / Snapshots
Differences in Transaction Protocol
• Global Ordering Done in a Single Phase vs. Multi-Phase
• Pre or Post Commit Transaction Resolution
Different levels of consistency
Maximum scope of a transaction
• Single Record vs. Multiple Records
Transactions can be regional or global
40. Differing Interpretations of Consistency and ACID
Consistency and ACID sit on a spectrum:
Weak end
• Weak isolation level
• Scope of transaction - single row
• Eventually consistent
Strong end
• Strongest isolation level
• Scope of transaction - distributed across partitions
• Serializable consistency
41. Lots of Options
Google Spanner - 2-phase commit with a dependency on proprietary atomic clocks
CockroachDB & YugaByte - open source versions of Spanner with 2-phase commits and hybrid clocks
Fauna - single-phase commit with no hard dependency on clocks
FoundationDB - serializable snapshot isolation
AWS DynamoDB Transactions - multiple objects, limited to a single region
AWS Aurora - Multi-Master coming soon
• Low-latency read replicas
• Fault-tolerant - replicates six copies of your data across three Availability Zones
42. Google Spanner
External consistency - an isolation level even stricter than serializability
Relational integrity constraints
99.999% availability SLA
Uses global commit timestamps to guarantee ordering of transactions via the TrueTime API
Multiple shards use 2PC
A single shard avoids 2PC for writes; read-only transactions also avoid 2PC
No-downtime upgrades - maintenance is done by moving data between nodes
Downside is cost and some limitations to the SQL model and schema design
43. CockroachDB
Open source version of Spanner
Hybrid Logical Clock, similar to a vector clock, for ordering of transactions
Challenges with clock skew - waits up to 250 ms on reads
Provides linearizability on a single key and overlapping keys
Transactions that span a disjoint set of keys only get serializability, not linearizability
Some edge cases cause an anomaly called "causal reverse" - Jepsen analysis
44. YugaByte
Also uses a Hybrid Logical Clock
Currently supports snapshot isolation
Serializable isolation level is a work in progress
Distributed transactions across multiple partitions require a provisional record
https://docs.yugabyte.com/latest/architecture/transactions/distributed-txns/
45. Fauna DB
Distributed Global Scale OLTP Database with Global Transactions
Cloud or On-Prem
Temporality
Multi-Tenancy
Advanced Security Model w/ Row Level Security
Document Model
Multiple Indexes per table (class) similar to materialized views
46. Fauna Consensus Algorithm
Transactions can include multiple rows - not restricted to data in a single row or shard
Transaction resolution is based on the Calvin protocol - pre-ordering of transactions before commit
Global transaction ordering provides serializable consistency
A distributed log-based algorithm scales throughput with cluster size by partitioning the log
47. Fauna Upsides
Single-phase consensus algorithm provides the lowest possible global latency
Low Latency Snapshot Reads
No difference in multi-partition and single-partition transactions
Powerful Query Language - Complex Transaction in a single query
48. Fauna Downsides
Proprietary Query Language
Higher Write Latency with Global Transactions
Writes always pay the cost of multi-partition transactions
50. Distributed Saga Overview
Central Coordinator
• Manages complex transaction logic
• Uses event sourcing to store state
• State is managed in a distributed log
Split work into idempotent executors / requests
Requires compensating requests for dealing with failures / aborting the transaction (see the sketch below)
Effectively Once instead of Exactly Once
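A minimal sketch of that coordinator idea, with hypothetical step names and none of the event-sourcing plumbing: each step is an idempotent request paired with a compensating request, and on the first failure the coordinator undoes the steps that already succeeded, most recent first.

```scala
import scala.util.{Failure, Success, Try}

// Each step is an idempotent request paired with a compensating request.
final case class SagaStep(name: String, execute: () => Try[Unit], compensate: () => Unit)

object SagaCoordinator {
  // Runs steps in order; on the first failure it compensates the steps that
  // already succeeded (most recent first) and reports the saga as aborted.
  def run(steps: List[SagaStep]): Boolean = {
    @annotation.tailrec
    def loop(remaining: List[SagaStep], completed: List[SagaStep]): Boolean =
      remaining match {
        case Nil => true
        case step :: rest =>
          step.execute() match {
            case Success(_) => loop(rest, step :: completed)
            case Failure(_) =>
              completed.foreach(_.compensate())
              false
          }
      }
    loop(steps, Nil)
  }
}

// Example: an order saga with hypothetical reserve-inventory and charge-payment steps.
object SagaDemo extends App {
  val saga = List(
    SagaStep("reserve-inventory", () => Try(println("reserved")), () => println("released")),
    SagaStep("charge-payment",    () => Try(println("charged")),  () => println("refunded")))
  println(s"saga committed: ${SagaCoordinator.run(saga)}")
}
```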
51. Distributed Saga Strengths
Fault Tolerant / HA
Composable executors
Isolation of complex code into coordinator
Atomicity/abortability, if implemented by the developer
52. Distributed Saga Weaknesses
No Consistency
Weak isolation
No Guaranteed Atomicity - Unsafe partially committed states
Complexity with versioning of saga logic
Increased application complexity
Rollback and recovery logic required in application tier
Idempotency is impossible for some services
54. Cassandra LWT Design Approach
Cassandra Lightweight Transactions (LWT)
Use a single partition as a record locking mechanism
Use a CAS type operation
• Compare and Swap
• Compare and Set a new value
Cassandra Batch is not a traditional DB Batch
• Only Atomic within a single partition
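As a rough illustration of the compare-and-set style described above (using the DataStax Java driver 4.x from Scala; the keyspace, table, and values are hypothetical, and a locally reachable cluster is assumed), a conditional write only takes effect if its IF clause still holds:

```scala
import com.datastax.oss.driver.api.core.CqlSession

object LwtExample {
  def main(args: Array[String]): Unit = {
    val session = CqlSession.builder().build() // assumes a local Cassandra node

    // Compare-and-set insert: only succeeds if the row does not already exist.
    val insert = session.execute(
      "INSERT INTO shop.orders (order_id, status) VALUES (42, 'NEW') IF NOT EXISTS")
    println(s"insert applied: ${insert.wasApplied()}")

    // Conditional update: applied only if the current value still matches.
    val update = session.execute(
      "UPDATE shop.orders SET status = 'PAID' WHERE order_id = 42 IF status = 'NEW'")
    println(s"update applied: ${update.wasApplied()}")

    session.close()
  }
}
```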
55. Upside of Cassandra LWT
Linearizable Consistency within a single partition
All the benefits of Cassandra
• Great read / write performance
• High Availability / Fault Tolerant
• Tunable Consistency
If transactions are only needed for a small portion of the application, then LWTs are useful
56. Downsides of Cassandra LWT
Transaction only applies to a single partition
High-Latency with multi-phase commit
Does not provide isolation of transaction
Expensive consensus algorithm - four round trips of Paxos
58. Classic 2 Phase Commit / Non-Deterministic
• 2-phase commit is rolling the dice on effects being applied
• Transaction coordinator walks across all involved partitions throughout the cluster
• Prepare phase - determine effects and contention
• Commit phase - tells partitions whether they succeeded or failed
• 2 rounds of global consensus
Writes - Non-Deterministic (see the sketch below)
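A toy sketch of the classic two-phase commit flow just described (participant names and the vote API are illustrative): the coordinator first asks every involved partition to prepare, and only if all vote yes does it tell them to commit; otherwise everyone aborts. That is two global rounds per transaction.

```scala
// Participants expose a vote (prepare) and the two possible outcomes.
trait Participant {
  def prepare(txnId: Long): Boolean // phase 1: "can you commit this transaction?"
  def commit(txnId: Long): Unit     // phase 2a: apply the prepared effects
  def abort(txnId: Long): Unit      // phase 2b: discard the prepared effects
}

object TwoPhaseCommit {
  // Phase 1 collects votes from every involved partition; phase 2 broadcasts
  // the outcome. A single "no" vote (or a timeout) aborts the whole transaction.
  def run(txnId: Long, participants: Seq[Participant]): Boolean = {
    val allPrepared = participants.forall(_.prepare(txnId))
    if (allPrepared) participants.foreach(_.commit(txnId))
    else participants.foreach(_.abort(txnId))
    allPrepared
  }
}
```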
59. Calvin Protocol / Deterministic Commit / Single Global Consensus Round
Three Phases
• Query Speculative Execution / Calculate Effects
• Non-Deterministic Query => Deterministic Transaction => Pure Function over state
• Global Log Commit / Ordering Phase
• Deterministic Transaction Results Applied
Writes - Deterministic
60. Non-Deterministic / Classic 2 Phase Commit
• Phase One
• Transaction effects determined => intents written
• Determining transaction success or failure requires global coordination
• Phase Two
• Tell partitions whether they succeeded or need to abort
• 2-phase commit is rolling the dice on effects being applied
Deterministic Commit / Single Global Consensus Round
• When a transaction is committed to the log, its outcome is pre-determined
• Pre-commit: speculatively calculate the effects
• Transaction committed => transaction is ordered => effects determined and applied
• The transaction is a pure function over the state of the database (see the sketch below)
• No rollback of effects
Deterministic vs. Non-Deterministic
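A toy sketch of that deterministic idea (illustrative account/transfer names, not Fauna's implementation): once transactions are globally ordered in the log, each one is a pure function over the database state, so every replica that folds the same log reaches the same result with no second vote and no rollback.

```scala
object DeterministicLog {
  final case class Account(id: String, balance: Long)
  type State = Map[String, Account]

  // A committed transaction is a pure function: given the pre-state it
  // deterministically produces the post-state (or leaves it unchanged).
  type Txn = State => State

  def transfer(from: String, to: String, amount: Long): Txn = state =>
    (state.get(from), state.get(to)) match {
      case (Some(src), Some(dst)) if src.balance >= amount =>
        state +
          (from -> src.copy(balance = src.balance - amount)) +
          (to   -> dst.copy(balance = dst.balance + amount))
      case _ => state // deterministic no-op abort: every replica makes the same decision
    }

  // Every replica folds the same globally ordered log over the same starting
  // state, so all replicas converge without a second commit/abort phase.
  def applyLog(initial: State, log: Seq[Txn]): State =
    log.foldLeft(initial)((state, txn) => txn(state))
}
```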
61. Calvin inspired
• Single, global consensus domain per cluster
• All transactions handled identically on each replica
• Transaction batching maximizes throughput
(In contrast) Spanner inspired
• Multiple consensus domains, one per shard
• Per shard consensus ==> challenges in multi-shard transactions
• Wall-clock timestamp approach to solve this problem
• Introduces “window of uncertainty”
• Consistency guarantee is lost any time clock skew exceeds “uncertainty” threshold
• Spanner addresses this with unique Google hardware and API
• Others use a software-only solution
Consistency Without Clocks: Inspired by Calvin
62. Spanner / Spanner Inspired
• Problem: consistent reads with concurrent writes
• Reads using 2PC have high latency
• The alternative is to choose a stable snapshot time for reads - how stale do you want your data?
• Spanner guarantees stable read timestamps through clock synchronization
Read Consistency with Clocks
63. Calvin Protocol
• Read-only transactions avoid the transaction pipeline
• Log-based total ordering allows partitions to guarantee stable reads
• The local partition waits until data is consistent to perform reads at a given snapshot
• The wait only happens when a node is behind on applying transactions
Read Consistency Without Clocks
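As a closing sketch of that clock-free read path (illustrative names, not FaunaDB's code): a partition tracks the log position it has applied, and a snapshot read at position N simply waits until the partition has caught up to N, so waiting only happens when the node is behind on the log.

```scala
final class Partition {
  @volatile private var appliedPosition: Long = 0L
  private val store = scala.collection.concurrent.TrieMap.empty[String, String]

  // Called by the apply loop as globally ordered transactions are executed locally.
  def applyWrite(position: Long, key: String, value: String): Unit = {
    store.put(key, value)
    appliedPosition = position
  }

  // A read at snapshot position N only waits when this node is behind on the log.
  def readAt(snapshot: Long, key: String): Option[String] = {
    while (appliedPosition < snapshot) Thread.sleep(1)
    store.get(key)
  }
}
```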