Data Consistency Patterns
in Cloud Native Applications
Ryan Knight @knight_cloud
Agenda
• What is Data Consistency?
• Data Consistency in Microservices
• Application Tier Consistency
• Strong Consistency with Distributed
Databases
• Linearizable Consistency Patterns
No One Solution
“Do you want your data right or right now?” - Pat Helland
PACELC Theorem -> More than CAP
• In the absence of network partitions the trade-off is between latency and
consistency - Daniel Abadi
Understand what types of concurrency problems exist
Evaluate trade-offs in the differing approaches
Minimize Development Complexity
What is Data Consistency?
Consistency Challenges
Dirty Reads - Read Uncommitted Write
Read Skew / Non-Repeatable Reads
Read your own Writes
Lost Updates
Write Skew
Write Skew
Two concurrent transactions each
determine what they are writing based
on reading a data set which overlaps
what the other is writing
begriffs.com
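The write skew definition above can be sketched with a small in-memory simulation. This is an illustration only: the on-call scenario, the `request_leave` function, and the dictionary standing in for a database are all assumptions, not part of the original slides.

```python
# Hypothetical invariant: at least one doctor must remain on call.
def request_leave(snapshot, doctor):
    """Decide what to write based on a snapshot read of the on-call set."""
    on_call = [name for name, active in snapshot.items() if active]
    if len(on_call) >= 2:
        # The check passes against the snapshot, so this transaction
        # writes only its own row.
        return {doctor: False}
    return {}

db = {"alice": True, "bob": True}

# Both transactions read the same snapshot before either one commits.
write_a = request_leave(dict(db), "alice")
write_b = request_leave(dict(db), "bob")

# The write sets are disjoint, so no write-write conflict is detected.
db.update(write_a)
db.update(write_b)
invariant_holds = any(db.values())

# Running the same two transactions one after the other keeps the invariant.
serial_db = {"alice": True, "bob": True}
serial_db.update(request_leave(dict(serial_db), "alice"))
serial_db.update(request_leave(dict(serial_db), "bob"))
serial_invariant_holds = any(serial_db.values())

print(invariant_holds, serial_invariant_holds)
```

Each transaction reads rows that the other one writes, which is exactly the overlap the definition describes.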
Consistency in ACID Transactions
ACID - Atomic, Consistent, Isolated and Durable
• Many different levels of ACID
Atomicity
• All or nothing - it all happened or it didn’t.
• You don’t see things in the state in-between being processed.
• Ability to abort an operation and roll-back
Isolation
• Concurrently executing transactions are isolated from each other.
• Lets the application developer pretend that no two operations are happening
in parallel.
Durable - Writes Stick
Consistent
• Enforce Invariants
• Data must be valid according to all defined rules, including constraints,
cascades, triggers, and any combination thereof
THIS IS NOT THE CONSISTENCY WE ARE LOOKING FOR
Credit to Peter Bailis and Aphyr - jepsen.io/consistency
Consistency Models
Credit to Peter Bailis and Aphyr at jepsen.io
Linearizable and Serializable Consistency
Serializability - multi-operation, multi-object, arbitrary total order
Linearizability - single-operation, single-object, real-time order
Linearizability plus Serializability provides Strict Serializability
Peter Bailis - Linearizability versus Serializability
What is Serializability?
Serializability Consistency
• Transaction isolation.
• Concurrency issues when one transaction reads data that is concurrently
modified by another transaction. Or when two transactions try to
simultaneously modify the same data.
• Database guarantees that two transactions have the same effect as if they
were run serially. Or they have the illusion of running one at a time
without any concurrency.
Levels of Serializable Isolation
Repeatable Reads - Read and Write Locks
• Prevents non-repeatable reads (phantom reads are still possible under the SQL standard definition)
• Write skew still possible
Read Committed - Write Locks that prevent:
• Dirty reads - only see data once the transaction is committed
• Dirty writes - only overwrite data that has been committed
Read Uncommitted
Linearizability
Discussions of eventual vs. strong consistency are about linearizability
Guarantees that the order of reads and writes to a single register or row will
always appear the same on all nodes.
Appearance that there is only one copy of the data.
It doesn’t group operations into transactions. It doesn’t address problems
like dirty reads, phantom reads, etc.
Guarantees read-your-write behavior
Linearizable Consistency in CAP Theorem
CAP Theorem is about “atomic consistency”
• Atomic consistency refers only to a property of a single
request/response operation sequence.
• Linearizability
Linearizable Consistency in CAP Theorem
AP w/ Session Based Consistency
Yellow Nodes
Causal consistency
Monotonic reads / writes
Strong consistency with a single
process only
Not isolated
AP Consistency
Blue Nodes
Sacrifice consistency for higher-availability and
partition tolerance 

Maintains availability even when the network is
down
Monotonic atomic view
Read committed / uncommitted
CP Consistency
Strict Serializable - Combine Serializable plus
Linearizability
Provides Highest Level of Consistency

Moves Complexity of Transactions out of
Microservices into the Database 

True ACID Transactions
Atomicity allows for rollback of transactions
Complete Isolation of Transaction
High Levels of Safety Guarantees

Data Consistency in Microservices
From Monolith to Microservices
Data Consistency was easy in a monolith application - single source of
truth w/ ACID transactions
With the move to microservices, each service became a bounded context that
owns and manages its data.
Data Consistency became very difficult w/ microservices
Consistency Challenges with Data in Microservices
Traditional ACID transactions did not scale
Data orchestration between multiple services - Number of Microservices
Increases Number of Interactions
Stateful or Stateless
Data rehydration for things like service failures and rolling updates.
Eventual Consistency
CAP Theorem
• Forces a choice between Global Scale and Strong Consistency
Eventual Consistency
• Sacrificed consistency for availability and partition tolerance.
• Really a Necessary Evil
• Last Write Wins - What if I can’t lose a write?
• Write now and figure it out later
Pushed complexity of managing consistency to application tier
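The "Last Write Wins" risk above can be shown in a few lines. This is a minimal sketch under assumptions: the `Versioned` type, the integer timestamps, and the shopping cart values are hypothetical and not tied to any particular database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # hypothetical write timestamp assigned by a replica

def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the value with the higher timestamp; ties go to `a`."""
    return a if a.timestamp >= b.timestamp else b

# Two replicas accept concurrent writes for the same key.
replica_1 = Versioned(value="cart=[book]", timestamp=100)
replica_2 = Versioned(value="cart=[laptop]", timestamp=101)

merged = lww_merge(replica_1, replica_2)
print(merged.value)  # the earlier write is silently discarded
```

Both replicas converge on the same value, but one accepted write is gone, which is the cost the application tier then has to manage.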
Return of Strong Consistency
Rise of Databases providing strong consistency and global scale
Possible to push complexity of consistency back to the database
Not a panacea for data consistency challenges
Distributed System Design
Heart of distributed system design is a requirement for a
consistent, performant, and reliable way of managing data.
Application Tier Consistency
Advantages of Application Tier Consistency
Low Read / Write Latency
High-Throughput
Read your Writes - Same session only
Requires application to enforce session stickiness

Disadvantages of Application Tier Consistency
No Isolation and limited atomicity
Consistency problems are far harder to solve in the application
tier where
Increased Complexity
Use Cases of Application Tier Consistency
Music Playlists
Shopping Carts
Social Media Posts
Patterns for Application Tier Consistency
Sticky Sessions
• Session Consistency
• Differing Levels of Linearizability
• Example - Akka Clustering
Sticky Sessions
Whether or not read-your-write, session and monotonic
consistency can be achieved depends in general on the
"stickiness" of clients to the server that executes the distributed
protocol for them. If this is the same server every time than it is
relatively easy to guarantee read-your-writes and monotonic
reads. - Werner Vogels 2007
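One way to get the "stickiness" described in the quote is to route every request for a session to the same server. The sketch below is an assumption for illustration (the `pick_server` name, the node names, and hashing by session id are not from the slides); Akka Cluster Sharding does its own routing.

```python
import hashlib

def pick_server(session_id: str, servers: list[str]) -> str:
    """Map a session id to one server, the same one on every call."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["node-a", "node-b", "node-c"]
first = pick_server("session-42", servers)
second = pick_server("session-42", servers)
print(first == second)  # the session always lands on the same node
```

A plain modulo remaps most sessions whenever the server list changes; consistent hashing is a common way to reduce that effect.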
Akka Clustering
Pin Session to an Actor - Sticky Session
Akka Clustering Libraries
• Cluster Sharding
• Cluster Singleton
• Cluster Proxy
Akka Persistence
Akka Distributed Data w/ Conflict Free Replicated Data Types (CRDTs)
What are CRDTs?
CRDT - Conflict-free Replicated Data Types
Data types that guarantee convergence to the same value without
any synchronization mechanism
Consistency without Consensus
Avoid distributed locks, two-phase commit,  etc.
Data Structure that tells how to build the value
Sacrifice linearizability (guaranteed ordering) while remaining
correct
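A grow-only counter shows how convergence without synchronization can work. This is a minimal sketch, not the Akka Distributed Data API; the class and method names are assumptions.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged with element-wise max."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # max is commutative, associative, and idempotent, so replicas can
        # merge in any order, any number of times, and still converge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    @property
    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
print(a.value, b.value)
```

No locks or consensus rounds are involved; the merge function alone guarantees that both replicas end at the same value.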
Akka Distributed Data CRDTs
Monotonic Sequences - Sequence that always increases or
always decreases
Monotonic Sequences are eventually consistent without any
need for coordination protocols

GCounter, PNCounter - Grow Only Counter / Positive Negative
Counter
GSet, ORSet - Grow Only Set, Observed Remove Set
ORMap, PNCounterMap, LWWMap
Flag, LWWRegister
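A positive-negative counter can be built from two grow-only maps, one for increments and one for decrements. The sketch below illustrates the idea only; it is not Akka's `PNCounter` implementation, and the method names are assumptions.

```python
class PNCounter:
    """Counter supporting decrements, built from two grow-only maps."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.increments: dict[str, int] = {}
        self.decrements: dict[str, int] = {}

    def add(self, delta: int) -> None:
        # Each map only ever grows, so each one stays a monotonic sequence.
        target = self.increments if delta >= 0 else self.decrements
        target[self.node_id] = target.get(self.node_id, 0) + abs(delta)

    def merge(self, other: "PNCounter") -> None:
        pairs = ((self.increments, other.increments),
                 (self.decrements, other.decrements))
        for mine, theirs in pairs:
            for node, count in theirs.items():
                mine[node] = max(mine.get(node, 0), count)

    @property
    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())

x, y = PNCounter("x"), PNCounter("y")
x.add(5)
y.add(-2)
x.merge(y)
y.merge(x)
print(x.value, y.value)
```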
Akka Cluster Strengths
Strong Consistency within a Single Actor
Monotonic Read / Writes
High Availability
High Throughput and Low Latency
Can be AP with a Split Brain Resolver
Reduced latency because no db roundtrip
Akka Cluster Weaknesses
Akka Distributed Data limited to CRDTs
Akka Distributed Data has a limited data size
• All entries are held in memory
• Limit of 100,000 top level elements
• Replication via gossip with large data gets slower
No Isolation of Data
Consistency of Akka Persistence depends on backing data store
Strong Consistency with Global
Distributed Databases
Advantages of Strict Serializable Consistency
Decrease Application Tier Complexity
Reduce Cognitive Overhead
Increased Developer Productivity
Increased Focus on Business Value
Strong Isolation
Most implementations also provide strong atomicity
Use Cases for Global Transactions
Processing financial transactions
Fulfilling and managing orders
Anytime there needs to be coordination of complex transactions
across multiple data sources.
Strengths and Weaknesses of Strict Serializable Consistency
Read your Writes - Across sessions
Prevent Phantom Reads, Write Skew, etc.
Higher Read / Write Latency
Lower Throughput
Disadvantages
Transactions are hard. Distributed transactions are harder.
Distributed transactions over the WAN are final boss
hardness. I'm all for new DBMSs but people should tread
carefully. - Andy Pavlo on Twitter
Not All Transactions are the Same
Distributed Multi-Version Concurrency Control (MVCC) / Snapshots
Differences in Transaction Protocol
• Global Ordering Done in a Single Phase vs. Multi-Phase
• Pre or Post Commit Transaction Resolution
Different levels of consistency
Maximum scope of a transaction
• Single Record vs. Multiple Records
Transaction can be regional or global
Differing Interpretations of Consistency and ACID
Consistency and ACID Spectrum
Weak Isolation Level
Scope of Transaction -
Single Row
Eventually Consistent
Strongest Isolation Level
Scope of Transaction -
Distributed Across
Partitions
Serializable Consistency
Lots of Options
Google Spanner - 2 Phase Commit with dependency on proprietary atomic clocks
CockroachDB & YugaByte - Open source, Spanner-inspired databases with 2 Phase
Commits and Hybrid Clocks
Fauna - Single Phase Commit with no hard dependency on clocks
FoundationDB - Serializable Snapshot Isolation
AWS DynamoDB Transactions - Multiple Object with limits to single region
AWS Aurora - Multi-Master coming soon
• Low Latency Read Replicas
• Fault-tolerant - replicates six copies of your data across three Availability Zones
Google Spanner
External consistency, an isolation level even stricter than serializability
Relational Integrity Constraints
99.999% availability SLA
Uses global commit timestamps to guarantee ordering of transactions using the TrueTime API.
Multiple Shards with 2PC
Single Shard Avoids 2PC for Writes / Read-only Transactions also avoid 2PC
No Downtime upgrades - Maintenance done by moving data between nodes
Downside is cost and some limitations to the SQL model and schema design

CockroachDB
Open source database inspired by Spanner
Hybrid Logical Clock similar to a vector clock for ordering of transactions
Challenges with clock skew - waits up to 250 ms on reads
Provides linearizability on single key and overlapping keys
For transactions that span disjoint sets of keys it only provides serializability
and not linearizability
Some edge cases cause anomalies called “causal reverse” - Jepsen
analysis
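The hybrid logical clock mentioned above combines a physical timestamp with a logical counter. The sketch below follows the commonly published HLC update rules; it is not CockroachDB's code, and the class names and the injected `physical_clock` callable are assumptions.

```python
from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Timestamp:
    wall: int     # largest physical time observed so far
    logical: int  # tie-breaker when wall times are equal

class HybridLogicalClock:
    def __init__(self, physical_clock):
        self._now = physical_clock  # callable returning an integer time
        self._wall = 0
        self._logical = 0

    def tick(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        previous = self._wall
        self._wall = max(previous, self._now())
        self._logical = self._logical + 1 if self._wall == previous else 0
        return Timestamp(self._wall, self._logical)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        previous = self._wall
        self._wall = max(previous, remote.wall, self._now())
        if self._wall == previous and self._wall == remote.wall:
            self._logical = max(self._logical, remote.logical) + 1
        elif self._wall == previous:
            self._logical += 1
        elif self._wall == remote.wall:
            self._logical = remote.logical + 1
        else:
            self._logical = 0
        return Timestamp(self._wall, self._logical)

# A node with a fast clock sends a message to a node whose clock lags.
fast = HybridLogicalClock(lambda: 200)
slow = HybridLogicalClock(lambda: 100)
sent = fast.tick()
received = slow.update(sent)
later = slow.tick()
print(sent < received < later)
```

Even though the slow node's physical clock is behind, its timestamps stay ordered after the message it received, which is the causal ordering property the clock is meant to provide.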
YugaByte
Also uses Hybrid Logical Clock
Currently supports snapshot isolation
Serializable isolation level work in progress
Distributed Transactions to multiple partitions require a provisional record
https://docs.yugabyte.com/latest/architecture/transactions/distributed-txns/
Fauna DB
Distributed Global Scale OLTP Database with Global Transactions
Cloud or On-Prem
Temporality
Multi-Tenancy
Advanced Security Model w/ Row Level Security
Document Model
Multiple Indexes per table (class) similar to materialized views
Fauna Consensus Algorithm
Transactions can include multiple rows - not restricted to data in a single
row or shard
Transaction resolution based on the Calvin protocol - pre-ordering of
transactions before commit
Global transaction ordering provides serializable consistency
Distributed log based algorithm scales throughput with cluster size by
partitioning the log
Fauna Upsides
Single-Phase Consensus Algorithm provides lowest possible global
latency
Low Latency Snapshot Reads
No difference in multi-partition and single-partition transactions
Powerful Query Language - Complex Transaction in a single query
Fauna Downsides
Proprietary Query Language
Higher Write Latency with Global Transactions
Writes always pay the cost of multi-partition transactions 

Saga Pattern
Distributed Saga Overview
Central Coordinator
• Manages Complex Transaction Logic
• Uses Event Sourcing to store state
• State managed in a distributed log
Split work into idempotent executors /
requests
Requires compensating requests for dealing with failures / aborting
transaction
Effectively Once instead of Exactly Once
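The coordinator and compensating requests described above can be sketched as follows. This is an in-memory illustration under assumptions: the `run_saga` function, the step names, and the Python list used as a log are hypothetical, and a real coordinator would persist its state in a distributed log.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure."""
    completed = []
    log = []  # stand-in for the event-sourced saga log
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Compensate in reverse order of completion.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log

state = {"inventory_reserved": False}

def reserve():
    state["inventory_reserved"] = True

def release():
    state["inventory_reserved"] = False

def charge():
    raise RuntimeError("card declined")

def refund():
    pass  # nothing to undo because the charge never succeeded

ok, log = run_saga([("reserve", reserve, release), ("charge", charge, refund)])
print(ok, log)
```

Between the reservation and its compensation, other readers can observe the partially committed state, which is the weak isolation the saga pattern accepts.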
Distributed Saga Strengths
Fault Tolerant / HA
Composable executors
Isolation of complex code into coordinator
Atomicity/Abortability if created by the developer
Distributed Saga Weaknesses
No Consistency
Weak isolation
No Guaranteed Atomicity - Unsafe partially committed states
Complexity with versioning of saga logic
Increased application complexity
Rollback and recovery logic required in application tier
Idempotency impossible for some services
Linearizable Consistency Patterns
Cassandra LWT Design Approach
Cassandra Lightweight Transactions (LWT)
Use a single partition as a record locking mechanism
Use a CAS type operation
• Compare and Swap
• Compare and Set a new value
Cassandra Batch is not a traditional DB Batch
• Only Atomic within a single partition
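The compare-and-set idea behind LWT can be illustrated with an in-process register. This is a sketch only: the `Register` class is hypothetical, and a `threading.Lock` stands in for the consensus round that Cassandra runs across replicas.

```python
import threading

class Register:
    """Single value with an atomic compare-and-set operation."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new) -> bool:
        with self._lock:
            if self._value != expected:
                return False  # someone else wrote first; caller must retry
            self._value = new
            return True

def add_with_retry(register, amount):
    """Read, compute, and write back only if nothing changed in between."""
    while True:
        current = register.read()
        if register.compare_and_set(current, current + amount):
            return

balance = Register(100)
threads = [threading.Thread(target=add_with_retry, args=(balance, 10))
           for _ in range(20)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
final_balance = balance.read()
print(final_balance)
```

Because a stale write is rejected rather than applied, no update is lost, at the price of retries under contention.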
Upside of Cassandra LWT
Linearizable Consistency within a single partition
All the benefits of Cassandra
• Great read / write performance
• High Availability / Fault Tolerant
• Tunable Consistency
If transactions are only needed for a small portion of the application then
LWT’s are useful
Downsides of Cassandra LWT
Transaction only applies to a single partition
High-Latency with multi-phase commit
Does not provide isolation of transaction
Expensive consensus algorithm - 4 roundtrips via Paxos
Global Scale Database
Transactions In-Depth
Classic 2 Phase Commit / Non-Deterministic
• 2 Phase Commit Rolling Dice on Effects being Applied
• Transaction Coordinator Walks across all involved partitions throughout the cluster
• Prepare Phase - Determining Effects and Contention
• Commit Phase - Tells Partitions if they succeeded or failed
• 2 Rounds of Global Consensus
Writes - Non-Deterministic
Calvin Protocol / Deterministic Commit / Single Global Consensus Round
Three Phases
• Query Speculative Execution / Calculate Effects
• Non-Deterministic Query => Deterministic Transaction => Pure Function over state
• Global Log Commit / Ordering Phase
• Deterministic Transaction Results Applied
Writes - Deterministic
Non-Deterministic / Classic 2 Phase Commit
• Phase One
• Transaction Effects Determined => Intents Written
• Determining Transaction success or failure requires global coordination
• Phase Two
• Tell partitions if they succeeded or need to abort
• 2 Phase Commit Rolling Dice on Effects being Applied
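The two phases above can be sketched with an in-memory coordinator. This is an illustration under assumptions: the `Partition` class, the `can_commit` flag, and the function names are hypothetical, and failure handling of the coordinator itself is left out.

```python
class Partition:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to force a "no" vote
        self.data = {}
        self.intents = {}

    def prepare(self, txn_id, writes) -> bool:
        """Phase one: record write intents and vote."""
        if not self.can_commit:
            return False
        self.intents[txn_id] = dict(writes)
        return True

    def commit(self, txn_id):
        """Phase two (success): make the intents visible."""
        self.data.update(self.intents.pop(txn_id, {}))

    def abort(self, txn_id):
        """Phase two (failure): discard the intents."""
        self.intents.pop(txn_id, None)

def two_phase_commit(txn_id, writes_by_partition) -> bool:
    votes = [p.prepare(txn_id, writes) for p, writes in writes_by_partition]
    decision = all(votes)  # the outcome is only known after every vote is in
    for partition, _ in writes_by_partition:
        if decision:
            partition.commit(txn_id)
        else:
            partition.abort(txn_id)
    return decision

p1, p2 = Partition("p1"), Partition("p2")
committed = two_phase_commit("t1", [(p1, {"x": 1}), (p2, {"y": 2})])

failing = Partition("p3", can_commit=False)
aborted = two_phase_commit("t2", [(p1, {"x": 99}), (failing, {"z": 3})])
print(committed, aborted, p1.data)
```

The second transaction wrote an intent on `p1` before learning that another partition voted no, so that work had to be rolled back.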
Deterministic Commit / Single Global Consensus Round
• When transaction is committed to log outcome is pre-determined
• Effects are calculated speculatively before commit
• Transaction Committed => Transaction is Ordered => Effects Determined and Applied
• Transaction is a Pure Function over the State of the Database
• No Rollback of Effects
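The idea of a transaction as a pure function over database state can be sketched with a replayed log. This is an illustration of the principle only, not the Calvin or Fauna implementation; `transfer`, `replay`, and the account names are assumptions.

```python
def transfer(source, target, amount):
    """Build a transaction: a pure function from old state to new state."""
    def apply(state):
        if state.get(source, 0) < amount:
            return state  # deterministic no-op instead of a rollback
        new_state = dict(state)
        new_state[source] = new_state[source] - amount
        new_state[target] = new_state.get(target, 0) + amount
        return new_state
    return apply

def replay(log, initial_state):
    """Apply every transaction in log order."""
    state = dict(initial_state)
    for transaction in log:
        state = transaction(state)
    return state

# Once the order in the log is agreed, the outcome is fixed.
log = [transfer("a", "b", 70), transfer("a", "c", 50), transfer("b", "a", 20)]
initial = {"a": 100, "b": 0, "c": 0}

replica_1 = replay(log, initial)
replica_2 = replay(log, initial)
print(replica_1, replica_1 == replica_2)
```

Every replica that applies the same log reaches the same state, so no second coordination round is needed to agree on the result.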
Deterministic vs. Non-Deterministic
Calvin inspired
• Single, global consensus domain per cluster
• All transactions handled identically on each replica
• Transaction batching maximizes throughput
(In contrast) Spanner inspired
• Multiple consensus domains, one per shard
• Per shard consensus ==> challenges in multi-shard transactions
• Wall clock TS approach to solve this problem
• Introduces “window of uncertainty”
• Consistency guarantee is lost any time clock skew exceeds “uncertainty” threshold
• Spanner addresses with unique Google hardware and API
• Others, software only solution
Consistency Without Clocks: Inspired by Calvin
Spanner / Spanner Inspired
• Problem: consistent reads with concurrent writes
• Reads using 2PC have high latency
• Alternative is to choose a stable snapshot time for reads - How stale do you
want your data?
• Spanner guarantees stable read timestamps through clock synchronization
Read Consistency with Clocks
Calvin Protocol
• Read only transactions avoid transaction pipeline
• Log based total ordering allows partitions to guarantee stable read
• Local partition waits until data is consistent to perform reads at a given
snapshot
• Wait only happens when a node is behind on applying transactions
Read Consistency Without Clocks
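The log-based snapshot read described above can be sketched as follows. The `Replica` class and its method names are assumptions; a real node would wait until it catches up, whereas this sketch raises an error to keep the example short. It also assumes log entries are applied in increasing index order.

```python
class Replica:
    def __init__(self):
        self.applied_index = 0
        self.versions = {}  # key -> list of (log_index, value), in log order

    def apply(self, log_index, key, value):
        """Apply one log entry; entries arrive in log order."""
        self.versions.setdefault(key, []).append((log_index, value))
        self.applied_index = log_index

    def read_at(self, key, snapshot_index):
        """Read the latest value written at or before the snapshot."""
        if self.applied_index < snapshot_index:
            # The node is behind on applying transactions for this snapshot.
            raise RuntimeError("replica has not applied the log up to the snapshot")
        visible = [value for index, value in self.versions.get(key, [])
                   if index <= snapshot_index]
        return visible[-1] if visible else None

replica = Replica()
replica.apply(1, "x", "a")
replica.apply(2, "x", "b")
print(replica.read_at("x", 1), replica.read_at("x", 2))
```

The position in the log plays the role that a synchronized timestamp plays in clock-based designs.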
Calvin Transaction Processing
[Diagram, two slides: a client and three nodes on AWS, each node with a Coordinator, a Log, and Data Storage]
Thank You
Ryan Knight @knight_cloud

 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 

Recently uploaded (20)

Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Data Consistency Patterns in Cloud Native Applications

  • 1. Data Consistency Patterns in Cloud Native Applications Ryan Knight @knight_cloud
  • 2. Agenda • What is Data Consistency? • Data Consistency in Microservices • Application Tier Consistency • Strong Consistency with Distributed Databases • Linearizable Consistency Patterns
  • 3. No One Solution “Do you want your data right or right now?” - Pat Helland PACELC Theorem -> More than CAP • In the absence of network partitions the trade-off is between latency and consistency - Daniel Abadi Understand what types of concurrency problems exist Evaluate trade-offs in the differing approaches Minimize Development Complexity
  • 4. What is Data Consistency?
  • 5. Consistency Challenges Dirty Reads - Read Uncommitted Write Read Skew / Non-Repeatable Reads Read your own Writes Lost Updates Write Skew
  • 6. Write Skew: two concurrent transactions each determine what they are writing based on reading a data set which overlaps what the other is writing. Source: begriffs.com
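
The write skew described in slide 6 can be reproduced in a few lines. This is a minimal sketch, not taken from the talk: the on-call doctors scenario, the `request_leave` function, and the plain dict standing in for a database are all illustrative assumptions.

```python
def request_leave(snapshot, doctor):
    """Return the write set for `doctor` going off call, judged from a snapshot."""
    others_on_call = [d for d, on in snapshot.items() if on and d != doctor]
    # Invariant check: only leave if somebody else is still on call.
    return {doctor: False} if others_on_call else {}


# Hypothetical committed state: both doctors are on call.
db = {"alice": True, "bob": True}

# Both transactions read the same committed state before either one writes.
snap_a = dict(db)
snap_b = dict(db)

writes_a = request_leave(snap_a, "alice")
writes_b = request_leave(snap_b, "bob")

# The write sets touch different rows, so a row-level conflict check
# would let both commits through.
db.update(writes_a)
db.update(writes_b)

on_call_after = [d for d, on in db.items() if on]
print(on_call_after)  # nobody is left on call: the invariant is broken
```

Each transaction on its own preserves the invariant; only the combination violates it, which is why write skew needs serializable isolation (or explicit locking of the rows that were read) to prevent.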
  • 7. Consistency in ACID Transactions ACID - Atomic, Consistent, Isolated and Durable • Many different levels of ACID Atomicity • All or Nothing - it all happened or it didn’t. • You don’t see things in the state in-between being processed. • Ability to abort an operation and roll back Isolation • Concurrently executing transactions are isolated from each other. • Lets the application developer pretend there are not two operations happening in parallel. Durable - Writes Stick Consistent • Enforce Invariants • Data must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof THIS IS NOT THE CONSISTENCY WE ARE LOOKING FOR
  • 8. Consistency Models Credit to Peter Bailis and Aphyr - jepsen.io/consistency
  • 9. Linearizable and Serializable Consistency Serializability - multi-operation, multi-object, arbitrary total order Linearizability - single-operation, single-object, real-time order Linearizability plus Serializability provides Strict Serializability Peter Bailis - Linearizability versus Serializability
  • 10. What is Serializability? Serializability Consistency • Transaction isolation. • Concurrency issues arise when one transaction reads data that is concurrently modified by another transaction, or when two transactions try to simultaneously modify the same data. • Database guarantees that two transactions have the same effect as if they were run serially, so they have the illusion of running one at a time without any concurrency.
  • 11. Levels of Serializable Isolation Repeatable Reads - Read and Write Locks • Prevents Non-Repeatable Reads (phantom reads remain possible under the SQL standard definition) • Write skew still possible Read Committed - Write Locks that prevent: • Dirty reads - only see data once the transaction is committed • Dirty writes - only overwrite data that has been committed Read Uncommitted
  • 12. Linearizability Discussions of Eventual vs. Strong Consistency are about Linearizability Guarantees that the order of reads and writes to a single register or row will always appear the same on all nodes. Appearance that there is only one copy of the data. It doesn’t group operations into transactions. It doesn’t address problems like dirty reads, phantom reads, etc. Guarantees read-your-write behavior
  • 13. Linearizable Consistency in CAP Theorem CAP Theorem is about “atomic consistency” • Atomic consistency refers only to a property of a single request/response operation sequence. • Linearizability
  • 14. AP w/ Session Based Consistency Yellow Nodes Causal consistency Monotonic reads / writes Strong consistency with a single process only Not isolated
  • 15. AP Consistency Blue Nodes Sacrifice consistency for higher availability and partition tolerance Maintains availability even when the network is down Monotonic atomic view Read committed / uncommitted
  • 16. CP Consistency Strict Serializable - Combine Serializable plus Linearizability Provides Highest Level of Consistency Moves Complexity of Transactions out of Microservices into the Database True ACID Transactions Atomicity allows for rollback of transactions Complete Isolation of Transaction High Levels of Safety Guarantees

  • 17. Data Consistency in Microservices
  • 18. From Monolith to Microservices Data Consistency was easy in a monolith application - single source of truth w/ ACID transactions With the move to microservices, each service became a bounded context that owns and manages its data. Data Consistency became very difficult w/ microservices
  • 19. Consistency Challenges with Data in Microservices Traditional ACID transactions did not scale Data orchestration between multiple services - Number of Microservices Increases Number of Interactions Stateful or Stateless Data rehydration for things like service failures and rolling updates.
  • 20. Eventual Consistency CAP Theorem • Forces a choice between Global Scale or Strong Consistency Eventual Consistency • Sacrificed consistency for availability and partition tolerance. • Really a Necessary Evil • Last Write Wins - What if I can’t lose a write? • Write now and figure it out later Pushed complexity of managing consistency to application tier
  • 21. Return of Strong Consistency Rise of Databases providing strong consistency and global scale Possible to push complexity of consistency back to the database Not a panacea for data consistency challenges
  • 22. Distributed System Design Heart of distributed system design is a requirement for a consistent, performant, and reliable way of managing data.
  • 24. Advantages of Application Tier Consistency Low Read / Write Latency High-Throughput Read your Writes - Same session only Requires application to enforce session stickiness

  • 25. Disadvantages of Application Tier Consistency No Isolation and limited atomicity Consistency problems are far harder to solve in the application tier Increased Complexity
  • 26. Use Cases of Application Tier Consistency Music Playlists Shopping Carts Social Media Posts
  • 27. Patterns for Application Tier Consistency Sticky Sessions • Session Consistency • Differing Levels of Linearizability • Example - Akka Clustering
  • 28. Sticky Sessions Whether or not read-your-write, session and monotonic consistency can be achieved depends in general on the "stickiness" of clients to the server that executes the distributed protocol for them. If this is the same server every time than it is relatively easy to guarantee read-your-writes and monotonic reads. - Werner Vogels 2007
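
The "stickiness" in the quote above can be illustrated with a small routing sketch. Everything here is an assumption for illustration: the `Replica` class, the `route` function, and the hash-modulo placement are not from the talk, and the replicas deliberately never replicate to each other so the effect is visible.

```python
import hashlib


class Replica:
    """Hypothetical server holding its own, not yet replicated, copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


def route(session_id, replicas):
    """Pick the same replica for a given session id on every call."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return replicas[int.from_bytes(digest[:8], "big") % len(replicas)]


replicas = [Replica("r0"), Replica("r1"), Replica("r2")]

# The session writes through its "home" replica ...
home = route("session-42", replicas)
home.data["playlist"] = ["song-a"]

# ... and because routing is sticky, the next read lands on the same replica
# and sees the write, even though nothing has been replicated yet.
read_back = route("session-42", replicas).data.get("playlist")

# Any other replica would still return stale (missing) data.
stale = [r.name for r in replicas if r is not home and "playlist" not in r.data]
print(read_back, stale)
```

Hash-modulo routing reshuffles sessions whenever the replica list changes, so a real deployment would more likely use consistent hashing or a sharding layer; the read-your-writes argument is the same either way.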
  • 29. Akka Clustering Pin Session to an Actor - Sticky Session Akka Clustering Libraries • Cluster Sharding • Cluster Singleton • Cluster Proxy Akka Persistence Akka Distributed Data w/ Conflict Free Replicated Data Types (CRDTs)
  • 30. What are CRDTs? CRDT - Conflict Free Replicated Data Types Data types that guarantee convergence to the same value without any synchronization mechanism Consistency without Consensus Avoid distributed locks, two-phase commit, etc. Data Structure that tells how to build the value Sacrifice linearizability (guaranteed ordering) while remaining correct
  • 31. Akka Distributed Data CRDTs Monotonic Sequences - Sequence that always increases or always decreases Monotonic Sequences are eventually consistent without any need for coordination protocols GCounter, PNCounter - Grow Only Counter / Positive Negative Counter GSet, ORSet - Grow Only Set, Observe Remove Set ORMap, PNCounterMap, LWWMap Flag, LWWRegister
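
To make the counter types above concrete, here is a minimal state-based sketch of a grow-only counter and a positive-negative counter. It is written from the general CRDT definitions, not from Akka's implementation; the class and method names are assumptions and do not mirror the Akka Distributed Data API.

```python
class GCounter:
    """Grow-only counter: one monotonically increasing slot per node."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[node] = self.counts.get(node, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge without any coordination.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )


class PNCounter:
    """Positive-negative counter built from two grow-only counters."""

    def __init__(self, p=None, n=None):
        self.p = p or GCounter()
        self.n = n or GCounter()

    def increment(self, node, amount=1):
        self.p.increment(node, amount)

    def decrement(self, node, amount=1):
        self.n.increment(node, amount)

    def value(self):
        return self.p.value() - self.n.value()

    def merge(self, other):
        return PNCounter(self.p.merge(other.p), self.n.merge(other.n))


# Two replicas update independently, then exchange state in either order.
a, b = PNCounter(), PNCounter()
a.increment("node-a", 5)
b.increment("node-b", 2)
b.decrement("node-b", 3)

ab = a.merge(b)
ba = b.merge(a)
print(ab.value(), ba.value())
```

Merging in either order, or merging the same state twice, yields the same value, which is the "consistency without consensus" property the slides refer to.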
  • 32. Akka Cluster Strengths Strong Consistency within a Single Actor Monotonic Read / Writes High Availability High Throughput and Low Latency Can be AP with a Split Brain Resolver Reduced latency because no db roundtrip
  • 33. Akka Cluster Weaknesses Akka Distributed Data limited to CRDTs Akka Distributed Data has a limited data size • All entries are held in memory • Limit of 100,000 top level elements • Replication via gossip with large data gets slower No Isolation of Data Consistency of Akka Persistence depends on backing data store
  • 34. Strong Consistency with Global Distributed Databases
  • 35. Advantages of Strict Serializable Consistency Decrease Application Tier Complexity Reduce Cognitive Overhead Increased Developer Productivity Increased Focus on Business Value Strong Isolation Most implementations also provide strong atomicity
  • 36. Use Cases for Global Transactions Processing financial transactions Fulfilling and managing orders Anytime there needs to be coordination of complex transactions across multiple data sources.
  • 37. Strengths and Weaknesses of Strict Serializable Consistency Read your Writes - Across sessions Prevent Phantom Reads, Write Skew, etc. Higher Read / Write Latency Lower Throughput
  • 38. Disadvantages Transactions are hard. Distributed transactions are harder. Distributed transactions over the WAN are final boss hardness. I'm all for new DBMSs but people should tread carefully. - Andy Pavlo on Twitter
  • 39. Not All Transactions are the Same Distributed Multi-Version Concurrency Control (MVCC) / Snapshots Differences in Transaction Protocol • Global Ordering Done in a Single Phase vs. Multi-Phase • Pre or Post Commit Transaction Resolution Different levels of consistency Maximum scope of a transaction • Single Record vs. Multiple Records Transaction can be regional or global
  • 40. Differing Interpretations of Consistency and ACID Consistency and ACID Spectrum Weak Isolation Level Scope of Transaction - Single Row Eventually Consistent Strongest Isolation Level Scope of Transaction - Distributed Across Partitions Serializable Consistency
  • 41. Lots of Options Google Spanner - 2 Phase Commit with dependency on proprietary atomic clocks CockroachDB & YugaByte - Open Source version of Spanner with 2 Phase Commits and Hybrid Clocks Fauna - Single Phase Commit with no hard dependency on clocks FoundationDB - Serializable Snapshot Isolation AWS DynamoDB Transactions - Multiple Object with limits to single region AWS Aurora - Multi-Master coming soon • Low Latency Read Replicas • Fault-tolerant - replicates six copies of your data across three Availability Zones
  • 42. Google Spanner External consistency, an isolation level even stricter than serializability Relational Integrity Constraints 99.999% availability SLA Uses global commit timestamps to guarantee ordering of transactions using the TrueTime API. Multiple Shards with 2PC Single Shard Avoids 2PC for Writes / Read-only Transactions also avoid 2PC No Downtime upgrades - Maintenance done by moving data between nodes Downside is cost and some limitations to the SQL model and schema design

  • 43. CockroachDB Open source version of Spanner Hybrid Logical Clock similar to a vector clock for ordering of transactions Challenges with clock skew - waits up to 250 ms on reads Provides linearizability on single key and overlapping keys For transactions that span disjoint sets of keys it provides only serializability, not linearizability Some edge cases cause anomalies called “causal reverse” - Jepsen analysis
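
For readers unfamiliar with hybrid logical clocks, the following is a minimal sketch of the generally published HLC update rules: a timestamp is a pair of the largest physical time seen so far and a logical counter that breaks ties. It is not CockroachDB's or YugaByte's code; the `HybridLogicalClock` class, the injected `physical_clock` callable, and the `(l, c)` tuple layout are assumptions made for illustration.

```python
class HybridLogicalClock:
    """Timestamps are (l, c): l tracks the max physical time seen, c breaks ties."""

    def __init__(self, physical_clock):
        self._physical_clock = physical_clock  # callable returning an integer time
        self.l = 0
        self.c = 0

    def now(self):
        """Timestamp a local or send event."""
        pt = self._physical_clock()
        previous = self.l
        self.l = max(previous, pt)
        # If physical time did not advance past l, bump the logical part.
        self.c = self.c + 1 if self.l == previous else 0
        return (self.l, self.c)

    def update(self, remote):
        """Timestamp a receive event carrying the sender's (l, c)."""
        remote_l, remote_c = remote
        pt = self._physical_clock()
        previous = self.l
        self.l = max(previous, remote_l, pt)
        if self.l == previous and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == previous:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


# A controllable wall clock so the example is deterministic.
wall = {"t": 10}
clock = HybridLogicalClock(lambda: wall["t"])

t1 = clock.now()            # local event
t2 = clock.now()            # same physical tick, logical part increases
t3 = clock.update((12, 5))  # message from a node whose clock is ahead
t4 = clock.now()            # local wall clock is still behind the message
wall["t"] = 20
t5 = clock.now()            # physical time catches up, logical part resets
print(t1, t2, t3, t4, t5)
```

Timestamps compare as ordinary tuples, so every event on this node is ordered after the events that causally preceded it, even while the local wall clock lags behind a remote one.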
  • 44. YugaByte Also uses Hybrid Logical Clock Currently supports snapshot isolation Serializable isolation level work in progress Distributed Transactions to multiple partitions require a provisional record https://docs.yugabyte.com/latest/architecture/transactions/distributed-txns/
  • 45. Fauna DB Distributed Global Scale OLTP Database with Global Transactions Cloud or On-Prem Temporality Multi-Tenancy Advanced Security Model w/ Row Level Security Document Model Multiple Indexes per table (class) similar to materialized views
  • 46. Fauna Consensus Algorithm Transactions can include multiple rows - not restricted to data in a single row or shard Transaction resolution based on the Calvin protocol - pre-ordering of transactions before commit Global transaction ordering provides serializable consistency Distributed log based algorithm scales throughput with cluster size by partitioning the log
  • 47. Fauna Upsides Single-Phase Consensus Algorithm provides lowest possible global latency Low Latency Snapshot Reads No difference in multi-partition and single-partition transactions Powerful Query Language - Complex Transaction in a single query

  • 48. Fauna Downsides Proprietary Query Language Higher Write Latency with Global Transactions Writes always pay the cost of multi-partition transactions 

  • 50. Distributed Saga Overview Central Coordinator • Manages Complex Transaction Logic • Uses Event Sourcing to store state • State managed in a distributed log Split work into idempotent executors / requests Requires compensating requests for dealing with failures / aborting transaction Effectively Once instead of Exactly Once
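
The coordinator described in the saga overview can be sketched as a loop that records each step in a log and, on failure, runs the compensating requests in reverse order. This is a single-process illustration under stated assumptions: `run_saga`, the order-processing steps, and the in-memory list standing in for the distributed log are all hypothetical.

```python
def run_saga(steps, log):
    """Run (name, action, compensate) steps in order; undo completed ones on failure."""
    completed = []
    for name, action, compensate in steps:
        log.append(("begin", name))
        try:
            action()
        except Exception:
            log.append(("failed", name))
            # Abort: apply compensating requests newest-first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(("compensated", done_name))
            return False
        log.append(("end", name))
        completed.append((name, compensate))
    return True


# Hypothetical order workflow. Each request sets a value rather than toggling
# it, so re-running a request after a retry is harmless (idempotent).
state = {"reserved": False, "charged": False}


def reserve():
    state["reserved"] = True


def release():
    state["reserved"] = False


def charge():
    state["charged"] = True


def refund():
    state["charged"] = False


def ship():
    raise RuntimeError("carrier unavailable")


def cancel_shipment():
    pass


log = []
ok = run_saga(
    [("reserve", reserve, release), ("charge", charge, refund), ("ship", ship, cancel_shipment)],
    log,
)
print(ok, state)
```

Between the `charge` step and its compensation, other readers can observe the charged state, which is the weak isolation that the saga weaknesses slide points out.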
  • 51. Distributed Saga Strengths Fault Tolerant / HA Composable executors Isolation of complex code into coordinator Atomicity/Abortability if created by the developer
  • 52. Distributed Saga Weaknesses No Consistency Weak isolation No Guaranteed Atomicity - Unsafe partially committed states Complexity with versioning of saga logic Increased application complexity Rollback and recovery logic required in application tier Idempotency impossible for some services
  • 54. Cassandra LWT Design Approach Cassandra Lightweight Transactions (LWT) Use a single partition as a record locking mechanism Use a CAS type operation • Compare and Swap • Compare and Set a new value Cassandra Batch is not a traditional DB Batch • Only Atomic within a single partition
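
The compare-and-set idea behind using a single partition as a record lock can be shown without a database. The sketch below is an in-memory stand-in, not Cassandra driver code: the `Partition` class and its `compare_and_set` method are assumptions, and a `threading.Lock` plays the role that Paxos plays in a real lightweight transaction.

```python
import threading


class Partition:
    """In-memory stand-in for one partition that supports compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def compare_and_set(self, key, expected, new):
        """Write `new` only if the current value equals `expected`.

        Returns (applied, current_value), loosely analogous to a conditional
        write reporting whether it was applied.
        """
        with self._lock:
            current = self._rows.get(key)
            if current != expected:
                return False, current
            self._rows[key] = new
            return True, new


locks = Partition()

# Two workers race to claim the same record lock; None means "no owner yet".
first = locks.compare_and_set("lock:order-42", None, "worker-a")
second = locks.compare_and_set("lock:order-42", None, "worker-b")

# The owner releases the lock by swapping its own id back to None.
released = locks.compare_and_set("lock:order-42", "worker-a", None)
print(first, second, released)
```

In CQL the same shape is expressed with a conditional statement such as `INSERT ... IF NOT EXISTS` or `UPDATE ... IF column = value`, and the guarantee holds only within the single partition that the statement targets.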
  • 55. Upside of Cassandra LWT Linearizable Consistency within a single partition All the benefits of Cassandra • Great read / write performance • High Availability / Fault Tolerant • Tunable Consistency If transactions are only needed for a small portion of the application then LWTs are useful
  • 56. Downsides of Cassandra LWT Transaction only applies to a single partition High-Latency with multi-phase commit Does not provide isolation of transaction Expensive consensus algorithm - 4 roundtrips via Paxos
  • 58. Classic 2 Phase Commit / Non-Deterministic • 2 Phase Commit Rolling Dice on Effects being Applied • Transaction Coordinator Walks across all involved partitions throughout the cluster • Prepare Phase - Determining Effects and Contention • Commit Phase - Tells Partitions if they succeeded or failed • 2 Rounds of Global Consensus Writes - Non-Deterministic
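
The prepare and commit phases listed above can be sketched with an in-process coordinator. This is a simplified illustration that ignores timeouts, coordinator failure, and durable logging; the `Participant` class, its `can_commit` flag, and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """Hypothetical partition that votes in phase one and applies in phase two."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stands in for contention or validation checks
        self.state = {}
        self.pending = None

    def prepare(self, writes):
        if not self.can_commit:
            return False
        self.pending = dict(writes)  # write intents, not yet visible
        return True

    def commit(self):
        self.state.update(self.pending)
        self.pending = None

    def abort(self):
        self.pending = None


def two_phase_commit(participants, writes_by_partition):
    # Phase one: every involved partition stages its writes and votes.
    votes = [p.prepare(writes_by_partition.get(p.name, {})) for p in participants]
    # Phase two: the outcome is only known once all votes are in.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


a, b = Participant("a"), Participant("b")
first = two_phase_commit([a, b], {"a": {"x": 1}, "b": {"y": 2}})

b.can_commit = False  # e.g. partition b detects a conflict
second = two_phase_commit([a, b], {"a": {"x": 9}, "b": {"y": 9}})
print(first, second, a.state, b.state)
```

The second transaction staged intents on partition `a` before learning that `b` voted no; that wasted work and extra round are what the slide calls rolling dice on effects being applied.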
  • 59. Calvin Protocol / Deterministic Commit / Single Global Consensus Round Three Phases • Query Speculative Execution / Calculate Effects • Non-Deterministic Query => Deterministic Transaction => Pure Function over state • Global Log Commit / Ordering Phase • Deterministic Transaction Results Applied Writes - Deterministic
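
The "pure function over state" idea can be illustrated by replaying one ordered log on two replicas. This is a toy model of deterministic commit, not Fauna's or Calvin's implementation: `transfer`, `apply_log`, and the account names are assumptions, and the Python list stands in for the globally ordered log.

```python
def transfer(src, dst, amount):
    """Build a transaction as a pure function from old state to new state."""

    def txn(state):
        if state.get(src, 0) < amount:
            # Deterministic abort: every replica reaches the same decision
            # from the same input state, so no rollback message is needed.
            return state
        new_state = dict(state)
        new_state[src] -= amount
        new_state[dst] = new_state.get(dst, 0) + amount
        return new_state

    return txn


def apply_log(initial, log):
    """Apply transactions strictly in log order."""
    state = dict(initial)
    for txn in log:
        state = txn(state)
    return state


initial = {"a": 100}

# Once the order is agreed, the outcome of every transaction is fixed.
log = [transfer("a", "b", 70), transfer("a", "c", 50)]

replica_1 = apply_log(initial, log)
replica_2 = apply_log(initial, log)
print(replica_1, replica_2)
```

The second transfer aborts on both replicas because only 30 remains in account `a` after the first one; agreement on the log order is the single consensus round, and applying effects needs no further coordination.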
  • 60. Non-Deterministic / Classic 2 Phase Commit • Phase One • Transaction Effects Determined => Intents Written • Determining Transaction success or failure requires global coordination • Phase Two • Tell partitions if they succeeded or need to abort • 2 Phase Commit Rolling Dice on Effects being Applied Deterministic Commit / Single Global Consensus Round • When transaction is committed to log outcome is pre-determined • Pre-Commit Speculatively Calculate the Effects • Transaction Committed => Transaction is Ordered => Effects Determined and Applied • Transaction is a Pure Function over the State of the Database • No Rollback of Effects Deterministic vs. Non-Deterministic
  • 61. Calvin inspired • Single, global consensus domain per cluster • All transactions handled identically on each replica • Transaction batching maximizes throughput (In contrast) Spanner inspired • Multiple consensus domains, one per shard • Per shard consensus ==> challenges in multi-shard transactions • Wall clock TS approach to solve this problem • Introduces “window of uncertainty” • Consistency guarantee is lost any time clock skew exceeds “uncertainty” threshold • Spanner addresses with unique Google hardware and API • Others, software only solution Consistency Without Clocks: Inspired by Calvin
  • 62. Spanner / Spanner Inspired • Problem: consistent reads with concurrent writes • Reads using 2PC have high latency • Alternative is to choose a stable snapshot time for reads - How stale do you want your data? • Spanner guarantees stable read timestamps through clock synchronization Read Consistency with Clocks
  • 63. Calvin Protocol • Read only transactions avoid transaction pipeline • Log based total ordering allows partitions to guarantee stable read • Local partition waits until data is consistent to perform reads at a given snapshot • Wait only happens when a node is behind on applying transactions Read Consistency Without Clocks
  • 66. Thank You Ryan Knight @knight_cloud