SlideShare a Scribd company logo
1 of 54
Download to read offline
Self-healing data
An introduction to consistency models, Quorums and CRDTs

Uwe Friedrichsen, codecentric AG, 2014-2015
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
Why NoSQL?



•  Scalability
•  Easier schema evolution
•  Availability on unreliable OTS hardware
•  It‘s more fun ...
Why NoSQL?



•  Scalability
•  Easier schema evolution
•  Availability on unreliable OTS hardware
•  It‘s more fun ...
Challenges



•  Giving up ACID transactions
•  (Temporal) anomalies and inconsistencies

short-term due to replication or long-term due to partitioning


It might happen. Thus, it will happen!
C
Consistency
A
Availability
P
Partition Tolerance
Strict Consistency

ACID / 2PC
Strong Consistency

Quorum R&W / Paxos
Eventual Consistency

CRDT / Gossip / Hinted Handoff
Strict Consistency (CA)

•  Great programming model

no anomalies or inconsistencies need to be considered
•  Does not scale well

best for single node databases

„We know ACID – It works!“
„We know 2PC – It sucks!“

Use for moderate data amounts
And what if I need more data?






•  Distributed datastore
•  Partition tolerance is a must
•  Need to give up strict consistency (CP or AP)
Strong Consistency (CP)

•  Majority based consistency model

can tolerate up to N nodes failing out of 2N+1 nodes
•  Good programming model

Single-copy consistency
•  Trades consistency for availability

in case of partitioning

Paxos (for sequential consistency)
Quorum-based reads & writes
Quorum-based reads & writes


•  N – number of nodes (replicas)
•  R – number of reads delivering the same result required
•  W – number of confirmed writes required

W > N / 2
R + W > N
Example 1

•  3 replica nodes

This is: N = 3
•  W > N / 2 à W > 3 / 2 à W > 1,5

Let‘s pick: W = 2 (can tolerate failure of 1 node)
•  R + W > N à R + 2 > 3 à R > 1

Let‘s pick: R = 2 (can tolerate failure of 1 node)
Client
Node 1
 Node 3
Node 2
W
 W
W
Client
Node 1
 Node 3
Node 2
Client
Node 1
 Node 3
Node 2
✔
 ✖
✔
W = 2
 ✔
Client
Node 1
 Node 3
Node 2
R
 R
R
Client
Node 1
 Node 3
Node 2
✔
 ✔
 ✔
Client
Node 1
 Node 3
Node 2
2x / 1x
 ✔
R = 2
 ✔
Client
Node 1
 Node 3
Node 2
R
 R
R
Client
Node 1
 Node 3
Node 2
✔
 ✔
✖
Client
Node 1
 Node 3
Node 2
1x / 1x
 ✖
R = 1
 ✖
Example 2

•  3 replica nodes

This is: N = 3
•  W > N / 2 à W > 3 / 2 à W > 1,5

Let‘s pick: W = 3 (strict consistency – does not tolerate any failure)
•  R + W > N à R + 3 > 3 à R > 0

Let‘s pick: R = 1 (single read from any node sufficient)
Example 3

•  5 replica nodes

This is: N = 5
•  W > N / 2 à W > 5 / 2 à W > 2,5

Let‘s pick: W = 3 (can tolerate failure of 2 nodes)
•  R + W > N à R + 3 > 5 à R > 2

Let‘s pick: R = 3 (can tolerate failure of 2 nodes)
Limitations of QB R&W

•  Requires fixed set of replica nodes

Dynamic adding and removal of nodes not possible
•  Client-centric consistency

Consistency only perceived on client, not on nodes
•  Requires additional measures on nodes

Nodes need to implement at least eventual consistency

Use if client-perceived strong consistency is

sufficient and simplicity trumps formal precision
And what if I need more availability?






•  Need to give up strong consistency (CP)
•  Relax required consistency properties even more
•  Leads to eventual consistency (AP)
Eventual Consistency (AP)

•  Gives up some consistency guarantees

no sequential consistency, anomalies become visible
•  Maximum availability possible

can tolerate up to N-1 nodes failing out of N nodes
•  Challenging programming model

anomalies usually need to be resolved explicitly

Gossip / Hinted Handoffs
CRDT
Conflict-free Replicated Data Types


•  Eventually consistent, self-stabilizing data structures
•  Designed for maximum availability
•  Tolerates up to N-1 out of N nodes failing

State-based CRDT: Convergent Replicated Data Type (CvRDT)
Operation-based CRDT: Commutative Replicated Data Type (CmRDT)
A bit of theory first ...
Convergent Replicated Data Type

State-based CRDT – CvRDT

•  All replicas (usually) connected
•  Exchange state between replicas, calculate new state on target replica
•  State transfer at least once over eventually-reliable channels
•  Set of possible states form a Semilattice
•  Partially ordered set of elements where all subsets have a Least Upper Bound (LUB)
•  All state changes advance upwards with respect to the partial order
Commutative Replicated Data Type

Operation-based CRDT - CmRDT

•  All replicas (usually) connected
•  Exchange update operations between replicas, apply on target replica
•  Reliable broadcast with ordering guarantee for non-concurrent
updates
•  Concurrent updates must be commutative
That‘s been enough theory ...
Counter
Op-based Counter

Data
Integer i
Init
i ≔ 0
Query
return i
Operations
increment(): i ≔ i + 1
decrement(): i ≔ i - 1
State-based G-Counter (grow only)
(Naïve approach)

Data
Integer i
Init
i ≔ 0
Query
return i
Update
increment(): i ≔ i + 1
Merge( j)
i ≔ max(i, j)
State-based G-Counter (grow only)
(Naïve approach)
R1
R3
R2
i = 1
U
i = 1
U
i = 0
i = 0
i = 0
I
I
I
M
i = 1
 i = 1
M
State-based G-Counter (grow only)
(Vector-based approach)

Data
Integer V[] / one element per replica set
Init
V ≔ [0, 0, ... , 0]
Query
return ∑i V[i]
Update
increment(): V[i] ≔ V[i] + 1 / i is replica set number
Merge(V‘)
∀i ∈ [0, n-1] : V[i] ≔ max(V[i], V‘[i])
State-based G-Counter (grow only)
(Vector-based approach)
R1
R3
R2
V = [1, 0, 0]
U
V = [0, 0, 0]
I
I
I
V = [0, 0, 0]
V = [0, 0, 0]
U
V = [0, 0, 1]
M
V = [1, 0, 0]
M
V = [1, 0, 1]
State-based PN-Counter (pos./neg.)


•  Simple vector approach as with G-Counter does not work
•  Violates monotonicity requirement of semilattice
•  Need to use two vectors
•  Vector P to track incements
•  Vector N to track decrements
•  Query result is ∑i P[i] – N[i]
State-based PN-Counter (pos./neg.)

Data
Integer P[], N[] / one element per replica set
Init
P ≔ [0, 0, ... , 0], N ≔ [0, 0, ... , 0]
Query
Return ∑i P[i] – N[i]
Update
increment(): P[i] ≔ P[i] + 1 / i is replica set number
decrement(): N[i] ≔ N[i] + 1 / i is replica set number
Merge(P‘, N‘)
∀i ∈ [0, n-1] : P[i] ≔ max(P[i], P‘[i])
∀i ∈ [0, n-1] : N[i] ≔ max(N[i], N‘[i])
Non-negative Counter

Problem: How to check a global invariant with local information only?

•  Approach 1: Only dec when local state is > 0
•  Concurrent decs could still lead to negative value

•  Approach 2: Externalize negative values as 0
•  inc(negative value) == noop(), violates counter semantics

•  Approach 3: Local invariant – only allow dec if P[i] - N[i] > 0
•  Works, but may be too strong limitation

•  Approach 4: Synchronize
•  Works, but violates assumptions and prerequisites of CRDTs
Sets
Op-based Set
(Naïve approach)

Data
Set S
Init
S ≔ {}
Query(e)
return e ∈ S
Operations
add(e) : S ≔ S ∪ {e}
remove(e): S ≔ S  {e}
Op-based Set
(Naïve approach)
R1
R3
R2
S = {e}
add(e)
S = {}
S = {}
S = {}
I
I
I
S = {e}
S = {}
rmv(e)
add(e)
add(e)
 add(e)
 rmv(e)
add(e)
S = {e}
S = {e}
 S = {e}
 S = {}
State-based G-Set (grow only)

Data
Set S
Init
S ≔ {}
Query(e)
return e ∈ S
Update
add(e): S ≔ S ∪ {e}
Merge(S‘)
S = S ∪ S‘
State-based 2P-Set (two-phase)

Data
Set A, R / A: added, R: removed
Init
A ≔ {}, R ≔ {}
Query(e)
return e ∈ A ∧ e ∉ R
Update
add(e): A ≔ A ∪ {e}
remove(e): (pre query(e)) R ≔ R ∪ {e}
Merge(A‘, R‘)
A ≔ A ∪ A‘, R ≔ R ∪ R‘
Op-based OR-Set (observed-remove)

Data
Set S / Set of pairs { (element e, unique tag u), ... }
Init
S ≔ {}
Query(e)
return ∃u : (e, u) ∈ S
Operations
add(e): S ≔ S ∪ { (e, u) } / u is generated unique tag
remove(e):
pre query(e)
R ≔ { (e, u) | ∃u : (e, u) ∈ S } /at source („prepare“)
S ≔ S  R /downstream („execute“)
Op-based OR-Set (observed-remove)
R1
R3
R2
S = {ea}
add(e)
S = {}
S = {}
S = {}
I
I
I
S = {eb}
S = {}
rmv(e)
add(e)
add(eb)
 add(ea)
 rmv(ea)
add(eb)
S = {eb}
S = {eb}
 S = {ea, eb}
 S = {eb}
More datatypes

•  Register
•  Dictionary (Map)
•  Tree
•  Graph
•  Array
•  List

plus more representations for each datatype
Garbage collection

•  Sets could grow infinitely in worst case
•  Garbage collection possible, but a bit tricky
•  Only remove updates that can be deleted safely
and were received by all replicas
•  Usually implemented using vector clocks
•  Can tolerate up to n-1 crashes
•  Live only while no replicas are crashed
•  Can induce surprising behavior sometimes
•  Sometimes stronger consensus is needed (Paxos, ...)
Limitations of CRDTs

•  Very weak consistency guarantees

Strives for „quiescent consistency“
•  Eventually consistent

Not suitable for high-volume ID generator or alike
•  Not easy to understand and model
•  Not all data structures representable

Use if availability is extremely important
Further reading

1.  Shapiro et al., Conflict-free Replicated
Data Types, Inria Research report, 2011
2.  Shapiro et al., A comprehensive study
of Convergent and Commutative
Replicated Data Types,

Inria Research report, 2011
3.  Basho Tag Archives: CRDT,

https://basho.com/tag/crdt/
4.  Leslie Lamport,

Time, clocks, and the ordering of
events in a distributed system,

Communications of the ACM (1978)
Wrap-up

•  CAP requires rethinking consistency
•  Strict Consistency

ACID / 2PC
•  Strong Consistency

Quorum-based R&W, Paxos
•  Eventual Consistency

CRDT, Gossip, Hinted Handoffs

Pick your consistency model based on your

consistency and availability requirements
The real world is not ACID
Thus, it is perfectly fine to go for a relaxed consistency model
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
Self healing data

More Related Content

What's hot

Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...
Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...
Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...SlideTeam
 
DevOps & Security: Here & Now
DevOps & Security: Here & NowDevOps & Security: Here & Now
DevOps & Security: Here & NowCheckmarx
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in MicroservicesRandy Shoup
 
Unattended OutSystems Installation
Unattended OutSystems InstallationUnattended OutSystems Installation
Unattended OutSystems InstallationOutSystems
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesShiva Narayanaswamy
 
Principles of Monitoring Microservices
Principles of Monitoring MicroservicesPrinciples of Monitoring Microservices
Principles of Monitoring MicroservicesMichael Ducy
 
All about paas_iaas_saas_29.01.2015
All about paas_iaas_saas_29.01.2015All about paas_iaas_saas_29.01.2015
All about paas_iaas_saas_29.01.2015mihaiburada
 
devon2013: 사내Git저장소개발사례
devon2013: 사내Git저장소개발사례devon2013: 사내Git저장소개발사례
devon2013: 사내Git저장소개발사례Daehyun Kim
 
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...Veritas Technologies LLC
 
Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesIvo Andreev
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceFranklin Angulo
 
Mashing Up DevOps with Cloud Computing
Mashing Up DevOps with Cloud ComputingMashing Up DevOps with Cloud Computing
Mashing Up DevOps with Cloud ComputingDavid Linthicum
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud ComputingTom Eberle
 
GitOps A/B testing with Istio and Helm
GitOps A/B testing with Istio and HelmGitOps A/B testing with Istio and Helm
GitOps A/B testing with Istio and HelmWeaveworks
 

What's hot (20)

Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...
Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...
Devops Strategy Roadmap Lifecycle Ppt Powerpoint Presentation Slides Complete...
 
DevOps & Security: Here & Now
DevOps & Security: Here & NowDevOps & Security: Here & Now
DevOps & Security: Here & Now
 
Managing Data in Microservices
Managing Data in MicroservicesManaging Data in Microservices
Managing Data in Microservices
 
Unattended OutSystems Installation
Unattended OutSystems InstallationUnattended OutSystems Installation
Unattended OutSystems Installation
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
 
Principles of Monitoring Microservices
Principles of Monitoring MicroservicesPrinciples of Monitoring Microservices
Principles of Monitoring Microservices
 
Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
All about paas_iaas_saas_29.01.2015
All about paas_iaas_saas_29.01.2015All about paas_iaas_saas_29.01.2015
All about paas_iaas_saas_29.01.2015
 
devon2013: 사내Git저장소개발사례
devon2013: 사내Git저장소개발사례devon2013: 사내Git저장소개발사례
devon2013: 사내Git저장소개발사례
 
AWS Training and Certification
AWS Training and CertificationAWS Training and Certification
AWS Training and Certification
 
AWS Code + AWS Device Farm
AWS Code + AWS Device FarmAWS Code + AWS Device Farm
AWS Code + AWS Device Farm
 
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
Technical Best Practices for Veritas and Microsoft Azure Using a Detailed Ref...
 
Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on Premises
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ Squarespace
 
Mashing Up DevOps with Cloud Computing
Mashing Up DevOps with Cloud ComputingMashing Up DevOps with Cloud Computing
Mashing Up DevOps with Cloud Computing
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
How to Build a DevOps Toolchain
How to Build a DevOps ToolchainHow to Build a DevOps Toolchain
How to Build a DevOps Toolchain
 
DevOps introduction
DevOps introductionDevOps introduction
DevOps introduction
 
GitOps A/B testing with Istio and Helm
GitOps A/B testing with Istio and HelmGitOps A/B testing with Istio and Helm
GitOps A/B testing with Istio and Helm
 
Enabling The DevOps Culture At Organization
Enabling The DevOps Culture At OrganizationEnabling The DevOps Culture At Organization
Enabling The DevOps Culture At Organization
 

Viewers also liked

How to survive in a BASE world
How to survive in a BASE worldHow to survive in a BASE world
How to survive in a BASE worldUwe Friedrichsen
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismUwe Friedrichsen
 
The promises and perils of microservices
The promises and perils of microservicesThe promises and perils of microservices
The promises and perils of microservicesUwe Friedrichsen
 
Microservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskMicroservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskUwe Friedrichsen
 
Conway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITConway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITUwe Friedrichsen
 
Why resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesWhy resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesUwe Friedrichsen
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHenning Spjelkavik
 
Towards complex adaptive architectures
Towards complex adaptive architecturesTowards complex adaptive architectures
Towards complex adaptive architecturesUwe Friedrichsen
 
Resilient Functional Service Design
Resilient Functional Service DesignResilient Functional Service Design
Resilient Functional Service DesignUwe Friedrichsen
 
DevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextDevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextUwe Friedrichsen
 
Modern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITModern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITUwe Friedrichsen
 
The Next Generation (of) IT
The Next Generation (of) ITThe Next Generation (of) IT
The Next Generation (of) ITUwe Friedrichsen
 

Viewers also liked (20)

Resilience with Hystrix
Resilience with HystrixResilience with Hystrix
Resilience with Hystrix
 
Devops for Developers
Devops for DevelopersDevops for Developers
Devops for Developers
 
How to survive in a BASE world
How to survive in a BASE worldHow to survive in a BASE world
How to survive in a BASE world
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinism
 
Fault tolerance made easy
Fault tolerance made easyFault tolerance made easy
Fault tolerance made easy
 
The promises and perils of microservices
The promises and perils of microservicesThe promises and perils of microservices
The promises and perils of microservices
 
No stress with state
No stress with stateNo stress with state
No stress with state
 
Microservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskMicroservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack risk
 
Conway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITConway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective IT
 
Why resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesWhy resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudes
 
Fantastic Elastic
Fantastic ElasticFantastic Elastic
Fantastic Elastic
 
How we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.noHow we sleep well at night using Hystrix at Finn.no
How we sleep well at night using Hystrix at Finn.no
 
Towards complex adaptive architectures
Towards complex adaptive architecturesTowards complex adaptive architectures
Towards complex adaptive architectures
 
Production-ready Software
Production-ready SoftwareProduction-ready Software
Production-ready Software
 
Resilient Functional Service Design
Resilient Functional Service DesignResilient Functional Service Design
Resilient Functional Service Design
 
Watch your communication
Watch your communicationWatch your communication
Watch your communication
 
DevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextDevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader context
 
Modern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITModern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of IT
 
The Next Generation (of) IT
The Next Generation (of) ITThe Next Generation (of) IT
The Next Generation (of) IT
 
Life, IT and everything
Life, IT and everythingLife, IT and everything
Life, IT and everything
 

Similar to Self healing data

Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...AboutYouGmbH
 
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...NoSQLmatters
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresSusan Potter
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...Nesreen K. Ahmed
 
Functional Operations - Susan Potter
Functional Operations - Susan PotterFunctional Operations - Susan Potter
Functional Operations - Susan Potterdistributed matters
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with RAbhirup Mallik
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Sri Ambati
 
Predective analytcis v0.1 AS
Predective analytcis v0.1 ASPredective analytcis v0.1 AS
Predective analytcis v0.1 ASAnkur Sansanwal
 
Keynote: Machine Learning for Design Automation at DAC 2018
Keynote:  Machine Learning for Design Automation at DAC 2018Keynote:  Machine Learning for Design Automation at DAC 2018
Keynote: Machine Learning for Design Automation at DAC 2018Manish Pandey
 
VRP2013 - Comp Aspects VRP
VRP2013 - Comp Aspects VRPVRP2013 - Comp Aspects VRP
VRP2013 - Comp Aspects VRPVictor Pillac
 
MLSD18. Unsupervised Learning
MLSD18. Unsupervised LearningMLSD18. Unsupervised Learning
MLSD18. Unsupervised LearningBigML, Inc
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Parallelising Dynamic Programming
Parallelising Dynamic ProgrammingParallelising Dynamic Programming
Parallelising Dynamic ProgrammingRaphael Reitzig
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةFares Al-Qunaieer
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues listsJames Wong
 

Similar to Self healing data (20)

Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
Uwe Friedrichsen - CRDT und mehr - über extreme Verfügbarkeit und selbstheile...
 
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
Uwe Friedrichsen – Extreme availability and self-healing data with CRDTs - No...
 
Realtime Analytics
Realtime AnalyticsRealtime Analytics
Realtime Analytics
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Dynamo: Not Just For Datastores
Dynamo: Not Just For DatastoresDynamo: Not Just For Datastores
Dynamo: Not Just For Datastores
 
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
The Power of Motif Counting Theory, Algorithms, and Applications for Large Gr...
 
Functional Operations - Susan Potter
Functional Operations - Susan PotterFunctional Operations - Susan Potter
Functional Operations - Susan Potter
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
 
Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013Jan vitek distributedrandomforest_5-2-2013
Jan vitek distributedrandomforest_5-2-2013
 
Predective analytcis v0.1 AS
Predective analytcis v0.1 ASPredective analytcis v0.1 AS
Predective analytcis v0.1 AS
 
Keynote: Machine Learning for Design Automation at DAC 2018
Keynote:  Machine Learning for Design Automation at DAC 2018Keynote:  Machine Learning for Design Automation at DAC 2018
Keynote: Machine Learning for Design Automation at DAC 2018
 
VRP2013 - Comp Aspects VRP
VRP2013 - Comp Aspects VRPVRP2013 - Comp Aspects VRP
VRP2013 - Comp Aspects VRP
 
1015 track2 abbott
1015 track2 abbott1015 track2 abbott
1015 track2 abbott
 
1030 track2 abbott
1030 track2 abbott1030 track2 abbott
1030 track2 abbott
 
MLSD18. Unsupervised Learning
MLSD18. Unsupervised LearningMLSD18. Unsupervised Learning
MLSD18. Unsupervised Learning
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Parallelising Dynamic Programming
Parallelising Dynamic ProgrammingParallelising Dynamic Programming
Parallelising Dynamic Programming
 
مدخل إلى تعلم الآلة
مدخل إلى تعلم الآلةمدخل إلى تعلم الآلة
مدخل إلى تعلم الآلة
 
Stack squeues lists
Stack squeues listsStack squeues lists
Stack squeues lists
 

More from Uwe Friedrichsen

Timeless design in a cloud-native world
Timeless design in a cloud-native worldTimeless design in a cloud-native world
Timeless design in a cloud-native worldUwe Friedrichsen
 
The hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerThe hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerUwe Friedrichsen
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of softwareUwe Friedrichsen
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explainedUwe Friedrichsen
 
The 7 quests of resilient software design
The 7 quests of resilient software designThe 7 quests of resilient software design
The 7 quests of resilient software designUwe Friedrichsen
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsUwe Friedrichsen
 
The truth about "You build it, you run it!"
The truth about "You build it, you run it!"The truth about "You build it, you run it!"
The truth about "You build it, you run it!"Uwe Friedrichsen
 

More from Uwe Friedrichsen (9)

Timeless design in a cloud-native world
Timeless design in a cloud-native worldTimeless design in a cloud-native world
Timeless design in a cloud-native world
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Life after microservices
Life after microservicesLife after microservices
Life after microservices
 
The hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerThe hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developer
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of software
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explained
 
The 7 quests of resilient software design
The 7 quests of resilient software designThe 7 quests of resilient software design
The 7 quests of resilient software design
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestors
 
The truth about "You build it, you run it!"
The truth about "You build it, you run it!"The truth about "You build it, you run it!"
The truth about "You build it, you run it!"
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Self healing data

  • 1. Self-healing data An introduction to consistency models, Quorums and CRDTs Uwe Friedrichsen, codecentric AG, 2014-2015
  • 2. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  • 3. Why NoSQL? •  Scalability •  Easier schema evolution •  Availability on unreliable OTS hardware •  It‘s more fun ...
  • 4. Why NoSQL? •  Scalability •  Easier schema evolution •  Availability on unreliable OTS hardware •  It‘s more fun ...
  • 5. Challenges •  Giving up ACID transactions •  (Temporal) anomalies and inconsistencies
 short-term due to replication or long-term due to partitioning It might happen. Thus, it will happen!
  • 6. C Consistency A Availability P Partition Tolerance Strict Consistency ACID / 2PC Strong Consistency Quorum R&W / Paxos Eventual Consistency CRDT / Gossip / Hinted Handoff
  • 7. Strict Consistency (CA) •  Great programming model
 no anomalies or inconsistencies need to be considered •  Does not scale well
 best for single node databases „We know ACID – It works!“ „We know 2PC – It sucks!“ Use for moderate data amounts
  • 8. And what if I need more data? •  Distributed datastore •  Partition tolerance is a must •  Need to give up strict consistency (CP or AP)
  • 9. Strong Consistency (CP) •  Majority based consistency model
 can tolerate up to N nodes failing out of 2N+1 nodes •  Good programming model
 Single-copy consistency •  Trades consistency for availability
 in case of partitioning Paxos (for sequential consistency) Quorum-based reads & writes
  • 10. Quorum-based reads & writes •  N – number of nodes (replicas) •  R – number of reads delivering the same result required •  W – number of confirmed writes required W > N / 2 R + W > N
  • 11. Example 1 •  3 replica nodes
 This is: N = 3 •  W > N / 2 à W > 3 / 2 à W > 1,5
 Let‘s pick: W = 2 (can tolerate failure of 1 node) •  R + W > N à R + 2 > 3 à R > 1
 Let‘s pick: R = 2 (can tolerate failure of 1 node)
  • 12. Client Node 1 Node 3 Node 2
  • 13. W W W Client Node 1 Node 3 Node 2
  • 14. Client Node 1 Node 3 Node 2 ✔ ✖ ✔ W = 2 ✔
  • 15. Client Node 1 Node 3 Node 2
  • 16. R R R Client Node 1 Node 3 Node 2 ✔ ✔ ✔
  • 17. Client Node 1 Node 3 Node 2 2x / 1x ✔ R = 2 ✔
  • 18. Client Node 1 Node 3 Node 2
  • 19. R R R Client Node 1 Node 3 Node 2 ✔ ✔ ✖
  • 20. Client Node 1 Node 3 Node 2 1x / 1x ✖ R = 1 ✖
  • 21. Example 2 •  3 replica nodes
 This is: N = 3 •  W > N / 2 à W > 3 / 2 à W > 1,5
 Let‘s pick: W = 3 (strict consistency – does not tolerate any failure) •  R + W > N à R + 3 > 3 à R > 0
 Let‘s pick: R = 1 (single read from any node sufficient)
  • 22. Example 3 •  5 replica nodes
 This is: N = 5 •  W > N / 2 à W > 5 / 2 à W > 2,5
 Let‘s pick: W = 3 (can tolerate failure of 2 nodes) •  R + W > N à R + 3 > 5 à R > 2
 Let‘s pick: R = 3 (can tolerate failure of 2 nodes)
  • 23. Limitations of QB R&W •  Requires fixed set of replica nodes
 Dynamic adding and removal of nodes not possible •  Client-centric consistency
 Consistency only perceived on client, not on nodes •  Requires additional measures on nodes
 Nodes need to implement at least eventual consistency Use if client-perceived strong consistency is
 sufficient and simplicity trumps formal precision
  • 24. And what if I need more availability? •  Need to give up strong consistency (CP) •  Relax required consistency properties even more •  Leads to eventual consistency (AP)
  • 25. Eventual Consistency (AP) •  Gives up some consistency guarantees
 no sequential consistency, anomalies become visible •  Maximum availability possible
 can tolerate up to N-1 nodes failing out of N nodes •  Challenging programming model
 anomalies usually need to be resolved explicitly Gossip / Hinted Handoffs CRDT
  • 26. Conflict-free Replicated Data Types •  Eventually consistent, self-stabilizing data structures •  Designed for maximum availability •  Tolerates up to N-1 out of N nodes failing State-based CRDT: Convergent Replicated Data Type (CvRDT) Operation-based CRDT: Commutative Replicated Data Type (CmRDT)
  • 27. A bit of theory first ...
  • 28. Convergent Replicated Data Type State-based CRDT – CvRDT •  All replicas (usually) connected •  Exchange state between replicas, calculate new state on target replica •  State transfer at least once over eventually-reliable channels •  Set of possible states form a Semilattice •  Partially ordered set of elements where all subsets have a Least Upper Bound (LUB) •  All state changes advance upwards with respect to the partial order
  • 29. Commutative Replicated Data Type Operation-based CRDT - CmRDT •  All replicas (usually) connected •  Exchange update operations between replicas, apply on target replica •  Reliable broadcast with ordering guarantee for non-concurrent updates •  Concurrent updates must be commutative
  • 30. That‘s been enough theory ...
  • 32. Op-based Counter Data Integer i Init i ≔ 0 Query return i Operations increment(): i ≔ i + 1 decrement(): i ≔ i - 1
  • 33. State-based G-Counter (grow only) (Naïve approach) Data Integer i Init i ≔ 0 Query return i Update increment(): i ≔ i + 1 Merge( j) i ≔ max(i, j)
  • 34. State-based G-Counter (grow only) (Naïve approach) R1 R3 R2 i = 1 U i = 1 U i = 0 i = 0 i = 0 I I I M i = 1 i = 1 M
  • 35. State-based G-Counter (grow only) (Vector-based approach) Data Integer V[] / one element per replica set Init V ≔ [0, 0, ... , 0] Query return ∑i V[i] Update increment(): V[i] ≔ V[i] + 1 / i is replica set number Merge(V‘) ∀i ∈ [0, n-1] : V[i] ≔ max(V[i], V‘[i])
  • 36. State-based G-Counter (grow only) (Vector-based approach) R1 R3 R2 V = [1, 0, 0] U V = [0, 0, 0] I I I V = [0, 0, 0] V = [0, 0, 0] U V = [0, 0, 1] M V = [1, 0, 0] M V = [1, 0, 1]
  • 37. State-based PN-Counter (pos./neg.) •  Simple vector approach as with G-Counter does not work •  Violates monotonicity requirement of semilattice •  Need to use two vectors •  Vector P to track incements •  Vector N to track decrements •  Query result is ∑i P[i] – N[i]
  • 38. State-based PN-Counter (pos./neg.) Data Integer P[], N[] / one element per replica set Init P ≔ [0, 0, ... , 0], N ≔ [0, 0, ... , 0] Query Return ∑i P[i] – N[i] Update increment(): P[i] ≔ P[i] + 1 / i is replica set number decrement(): N[i] ≔ N[i] + 1 / i is replica set number Merge(P‘, N‘) ∀i ∈ [0, n-1] : P[i] ≔ max(P[i], P‘[i]) ∀i ∈ [0, n-1] : N[i] ≔ max(N[i], N‘[i])
  • 39. Non-negative Counter Problem: How to check a global invariant with local information only? •  Approach 1: Only dec when local state is > 0 •  Concurrent decs could still lead to negative value •  Approach 2: Externalize negative values as 0 •  inc(negative value) == noop(), violates counter semantics •  Approach 3: Local invariant – only allow dec if P[i] - N[i] > 0 •  Works, but may be too strong limitation •  Approach 4: Synchronize •  Works, but violates assumptions and prerequisites of CRDTs
  • 40. Sets
  • 41. Op-based Set (Naïve approach) Data Set S Init S ≔ {} Query(e) return e ∈ S Operations add(e) : S ≔ S ∪ {e} remove(e): S ≔ S {e}
  • 42. Op-based Set (Naïve approach) R1 R3 R2 S = {e} add(e) S = {} S = {} S = {} I I I S = {e} S = {} rmv(e) add(e) add(e) add(e) rmv(e) add(e) S = {e} S = {e} S = {e} S = {}
  • 43. State-based G-Set (grow only) Data Set S Init S ≔ {} Query(e) return e ∈ S Update add(e): S ≔ S ∪ {e} Merge(S‘) S = S ∪ S‘
  • 44. State-based 2P-Set (two-phase) Data Set A, R / A: added, R: removed Init A ≔ {}, R ≔ {} Query(e) return e ∈ A ∧ e ∉ R Update add(e): A ≔ A ∪ {e} remove(e): (pre query(e)) R ≔ R ∪ {e} Merge(A‘, R‘) A ≔ A ∪ A‘, R ≔ R ∪ R‘
  • 45. Op-based OR-Set (observed-remove) Data Set S / Set of pairs { (element e, unique tag u), ... } Init S ≔ {} Query(e) return ∃u : (e, u) ∈ S Operations add(e): S ≔ S ∪ { (e, u) } / u is generated unique tag remove(e): pre query(e) R ≔ { (e, u) | ∃u : (e, u) ∈ S } /at source („prepare“) S ≔ S R /downstream („execute“)
  • 46. Op-based OR-Set (observed-remove) R1 R3 R2 S = {ea} add(e) S = {} S = {} S = {} I I I S = {eb} S = {} rmv(e) add(e) add(eb) add(ea) rmv(ea) add(eb) S = {eb} S = {eb} S = {ea, eb} S = {eb}
  • 47. More datatypes •  Register •  Dictionary (Map) •  Tree •  Graph •  Array •  List plus more representations for each datatype
  • 48. Garbage collection •  Sets could grow infinitely in worst case •  Garbage collection possible, but a bit tricky •  Only remove updates that can be deleted safely and were received by all replicas •  Usually implemented using vector clocks •  Can tolerate up to n-1 crashes •  Live only while no replicas are crashed •  Can induce surprising behavior sometimes •  Sometimes stronger consensus is needed (Paxos, ...)
  • 49. Limitations of CRDTs •  Very weak consistency guarantees
 Strives for „quiescent consistency“ •  Eventually consistent
 Not suitable for high-volume ID generator or alike •  Not easy to understand and model •  Not all data structures representable Use if availability is extremely important
  • 50. Further reading 1.  Shapiro et al., Conflict-free Replicated Data Types, Inria Research report, 2011 2.  Shapiro et al., A comprehensive study of Convergent and Commutative Replicated Data Types,
 Inria Research report, 2011 3.  Basho Tag Archives: CRDT,
 https://basho.com/tag/crdt/ 4.  Leslie Lamport,
 Time, clocks, and the ordering of events in a distributed system,
 Communications of the ACM (1978)
  • 51. Wrap-up •  CAP requires rethinking consistency •  Strict Consistency
 ACID / 2PC •  Strong Consistency
 Quorum-based R&W, Paxos •  Eventual Consistency
 CRDT, Gossip, Hinted Handoffs Pick your consistency model based on your
 consistency and availability requirements
  • 52. The real world is not ACID Thus, it is perfectly fine to go for a relaxed consistency model
  • 53. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com