MAKE YOUR CHOICE
CONSISTENCY, AVAILABILITY, PARTITION
Andrea Giuliano
@bit_shark
DISTRIBUTED SYSTEMS
WHAT A DISTRIBUTED SYSTEM IS
“A distributed system is a software system in which
components located on networked computers communicate
and coordinate their actions by passing messages”
DISTRIBUTED SYSTEMS
EXAMPLES
DISTRIBUTED SYSTEMS
REPLICATION
REPLICATED SERVICE
PROPERTIES
CONSISTENCY
AVAILABILITY
CONSISTENCY
The result of operations will be predictable
CONSISTENCY
Strong consistency
all replicas return the same value for the same object
Weak consistency
different replicas can return different values for the same object
STRONG VS WEAK
CONSISTENCY
Strong consistency
ACID database (Atomic, Consistent, Isolated, Durable)
Weak consistency
BASE database (Basically Available, Soft state, Eventual consistency)
EXAMPLE
CONSISTENCY
put(price, 10)
get(price)
price = 10
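The difference between the two consistency levels can be sketched with a toy replicated store. This is only an illustration: the ReplicatedStore class, its propagate() step and the default value 0 for an unwritten key are assumptions made for the sketch, not part of the talk.

```python
class ReplicatedStore:
    """Toy replicated key-value store (hypothetical, for illustration only)."""

    def __init__(self, replica_count, synchronous):
        self.replicas = [{} for _ in range(replica_count)]
        self.synchronous = synchronous
        self.pending = []  # updates not yet applied to every replica

    def put(self, key, value):
        # The first replica always accepts the write immediately.
        self.replicas[0][key] = value
        if self.synchronous:
            # Strong consistency: update every replica before returning.
            for replica in self.replicas[1:]:
                replica[key] = value
        else:
            # Weak consistency: remember the update and propagate it later.
            self.pending.append((key, value))

    def get(self, key, replica_index):
        # Assumption: a key that was never written reads as 0.
        return self.replicas[replica_index].get(key, 0)

    def propagate(self):
        # Background step that brings the lagging replicas up to date.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


strong = ReplicatedStore(replica_count=3, synchronous=True)
strong.put("price", 10)
strong_reads = [strong.get("price", i) for i in range(3)]

weak = ReplicatedStore(replica_count=3, synchronous=False)
weak.put("price", 10)
weak_reads_before = [weak.get("price", i) for i in range(3)]
weak.propagate()
weak_reads_after = [weak.get("price", i) for i in range(3)]

print(strong_reads)       # [10, 10, 10]
print(weak_reads_before)  # [10, 0, 0]
print(weak_reads_after)   # [10, 10, 10]
```

With synchronous replication every replica answers get(price) with 10 right away; with lazy propagation the answer depends on which replica is asked until the update has spread.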
AVAILABILITY
EXAMPLE
AVAILABILITY
COMMUNICATION
PARTITION TOLERANCE
the system continues to operate even in the presence of partitions
PARTITION TOLERANCE
Network failure
groups of nodes on each side of a faulty network element (switch, backbone)
Process failure
the system is split into two groups: correct nodes and crashed nodes
CAP THEOREM
“Of three properties of shared-data systems
(data consistency, system availability and
tolerance to network partitions) only two can
be achieved at any given moment in time.”
THE PROOF
CAP THEOREM
[Diagram: replicas in partition 1 and partition 2 all start with price = 0. At t1 a client issues put(price, 10); at t2 a client issues get(price). The read either returns price = 0 (not consistent) or gets no response (not available).]
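The argument can be replayed in a few lines. This is a minimal sketch under stated assumptions: the Node class, the get() helper and its "prefer" switch are hypothetical names, and the put at t1 is assumed to reach only the node in partition 1.

```python
class Node:
    """A replica holding the initial state used in the proof."""

    def __init__(self):
        self.data = {"price": 0}


def get(node, key, partitioned, prefer):
    """Read `key` from `node` while the network may be partitioned.

    prefer="availability": answer from local state, even if it may be stale.
    prefer="consistency": refuse to answer when the node cannot confirm
    that it holds the latest value.
    """
    if partitioned and prefer == "consistency":
        return None  # "no response": the system is not available
    return node.data[key]


node_1, node_2 = Node(), Node()  # one node on each side of the partition
partitioned = True

# t1: put(price, 10) reaches partition 1 only; it cannot be replicated.
node_1.data["price"] = 10

# t2: get(price) reaches partition 2.
available_answer = get(node_2, "price", partitioned, prefer="availability")
consistent_answer = get(node_2, "price", partitioned, prefer="consistency")

print(available_answer)   # 0    -> answered, but not consistent
print(consistent_answer)  # None -> consistent, but not available
```

While the partition lasts, the node in partition 2 has only these two options, which is the choice the theorem describes.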
CAP THEOREM
IN PRACTICE
CONSISTENCY + PARTITION TOLERANCE
➡ distributed databases
➡ distributed locking
➡ majority protocol
➡ active/passive replication
➡ quorum-based systems
BigTable
CAP THEOREM
AVAILABILITY + PARTITION TOLERANCE
➡ web caches
➡ stateless systems
➡ DNS
DynamoDB
CAP THEOREM
CONSISTENCY + AVAILABILITY
➡ single-site databases
➡ cluster databases
➡ LDAP
DYNAMO
REQUIREMENTS
DYNAMO
“customers should be able to view and add items
to their shopping cart even if disks are failing,
network routes are flapping, or data centers are
being destroyed by tornados.”
➡ reliable
➡ highly scalable
➡ always available
SIMPLE INTERFACE
DYNAMO
get(key)
locates the object associated with the key and returns either a
single object or a list of objects with conflicting versions,
along with a context.
put(key, context, object)
determines where the replicas of the object should be
placed based on the associated key. The context
includes information such as the version of the object.
REPLICATION: THE CHOICE
DYNAMO
Synchronous replica coordination
‣ strong consistency
‣ availability is traded away
Optimistic replication technique
‣ high availability
‣ conflicting updates are possible
CONFLICTS: WHEN
DYNAMO
At write time
‣ writes may be rejected
At read time
‣ “always writable” datastore
CONFLICTS: WHO
DYNAMO
The data store
‣ e.g. “last write wins” policy
The application
‣ resolution as implementation detail
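The two resolution strategies can be contrasted on a shopping cart. This is a sketch only: the version records, their "timestamp" and "cart" fields, and the choice of set union as the application-level merge are assumptions made for the example.

```python
def last_write_wins(versions):
    """Data-store policy: keep only the version with the newest timestamp."""
    return max(versions, key=lambda version: version["timestamp"])["cart"]


def merge_carts(versions):
    """Application policy: union the items of every conflicting version."""
    merged = set()
    for version in versions:
        merged |= version["cart"]
    return merged


# Two conflicting versions of the same cart (hypothetical data).
conflicting = [
    {"timestamp": 1, "cart": {"book"}},
    {"timestamp": 2, "cart": {"pen"}},
]

lww_result = last_write_wins(conflicting)
merged_result = merge_carts(conflicting)

print(lww_result)             # {'pen'}
print(sorted(merged_result))  # ['book', 'pen']
```

The data-store policy is simple but silently drops the older write, while the application-level merge keeps both items because the application knows what a cart means.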
A RING TO RULE THEM ALL
DYNAMO
PARTITIONING: THE RING
DYNAMO
[Diagram: nodes A, B, C, D, E, F, G placed on a ring; DATA is hashed to a position on the ring]
REPLICATION
DYNAMO
[Diagram: nodes A, B, C, D, E, F, G placed on a ring; DATA is hashed to a position on the ring]
N = 3: D will store keys in the ranges (A, B], (B, C], (C, D]
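The ring and its replication rule can be sketched as follows. The ring size, the token positions assigned to nodes A to G, and the use of MD5 to place keys are all assumptions chosen to keep the example small; they are not taken from the talk.

```python
import bisect
import hashlib

RING_SIZE = 100
# Hypothetical token positions for the seven nodes in the diagram.
NODES = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60, "G": 70}
N = 3  # number of replicas per key

_tokens = sorted((position, name) for name, position in NODES.items())
_positions = [position for position, _ in _tokens]


def key_position(key):
    """Hash a key to a position on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def preference_list(position, n=N):
    """Return the n nodes that store a key located at `position`.

    The first node is the one whose token is the first at or after the
    position (walking clockwise); the other n - 1 are its successors.
    """
    start = bisect.bisect_left(_positions, position) % len(_tokens)
    return [_tokens[(start + i) % len(_tokens)][1] for i in range(n)]


print(preference_list(15))  # key in (A, B] -> ['B', 'C', 'D']
print(preference_list(25))  # key in (B, C] -> ['C', 'D', 'E']
print(preference_list(35))  # key in (C, D] -> ['D', 'E', 'F']
print(preference_list(75))  # past G, wraps around -> ['A', 'B', 'C']
```

Node D appears in the preference list of exactly the three ranges named on the slide, and a real key would be placed with preference_list(key_position(key)).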
DATA VERSIONING
DYNAMO
put()
may return before the update has been propagated to
all replicas.
get()
subsequent get() may return an object that does not
have the latest update
RECONCILIATION
DYNAMO
Syntactic reconciliation
‣ new version subsumes the previous
Semantic reconciliation
‣ conflicting versions of the same object
VECTOR CLOCK
DYNAMO
Definition
‣ list of (node, counter) pairs
D1 [Sx,1]: write handled by Sx
D2 [Sx,2]: write handled by Sx
D3 [Sx,2], [Sy,1]: write handled by Sy
D4 [Sx,2], [Sz,1]: write handled by Sz
D5 [Sx,3], [Sy,1], [Sz,1]: reconciled and written by Sx
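The D1 to D5 history can be reproduced with a small vector clock sketch. Representing a clock as a dict from node name to counter, and the helper names increment, descends, concurrent and merge, are choices made for this illustration.

```python
def increment(clock, node):
    """Return a copy of `clock` with the counter of `node` advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen everything recorded in clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def concurrent(a, b):
    """True if neither clock descends from the other (a real conflict)."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Element-wise maximum of two clocks."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


D1 = increment({}, "Sx")             # write handled by Sx
D2 = increment(D1, "Sx")             # write handled by Sx
D3 = increment(D2, "Sy")             # write handled by Sy
D4 = increment(D2, "Sz")             # write handled by Sz
D5 = increment(merge(D3, D4), "Sx")  # reconciled and written by Sx

print(descends(D2, D1))    # True: D2 subsumes D1 (syntactic reconciliation)
print(concurrent(D3, D4))  # True: D3 and D4 conflict (semantic reconciliation)
print(D5 == {"Sx": 3, "Sy": 1, "Sz": 1})  # True
```

D3 and D4 both descend from D2 but not from each other, so only a reconciling write such as D5 can supersede both.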
PUT() AND GET()
DYNAMO
R
‣ minimum number of nodes that must participate
in a successful read operation
W
‣ minimum number of nodes that must participate
in a successful write operation
PUT() AND GET()
DYNAMO
put()
‣ the coordinator generates the vector clock for the new version and
writes the new version locally
‣ the new version is sent to N nodes
‣ the write is successful if at least W-1 nodes respond
get()
‣ the coordinator requests all existing versions of the data
‣ the coordinator waits for R responses before returning the result
‣ the coordinator returns all the versions that are causally unrelated
‣ the divergent versions are reconciled and written back
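How R and W interact can be checked by brute force. The slides do not state the rule, but the Dynamo paper notes that choosing R + W > N yields a quorum-like system; the function name and the exhaustive check below are assumptions made for this sketch.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Check that every read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


print(quorums_always_overlap(3, 2, 2))  # True: a read always meets the latest write
print(quorums_always_overlap(3, 1, 1))  # False: a read may miss the latest write
```

With N = 3, the pair R = 2, W = 2 guarantees that at least one node contacted by a read has seen the last successful write, while R = 1, W = 1 favours latency and availability at the price of possibly stale reads.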
SLOPPY QUORUM
DYNAMO
[Diagram: nodes A, B, C, D, E, F, G placed on a ring, N = 3]
WHY IS AP?
DYNAMO
‣ requests are served even if some replicas are not available
‣ if a node is down, the write is stored on another node
‣ consistency conflicts are resolved at read time or in the
background
‣ eventually, all the replicas will converge
‣ concurrent read/write operations can make distinct clients
see distinct versions of the same key
BIGTABLE
REQUIREMENTS
GOOGLE BIGTABLE
‣ scale to petabytes of data
‣ thousands of machines
‣ high availability
‣ high performance
DATA MODEL
GOOGLE BIGTABLE
‣ sparse, distributed, persistent multi-dimensional
sorted map
(row: string, column: string, time: int64) → string
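The map signature can be mimicked with an in-memory toy. The TinyTable class and its put, get and scan methods are hypothetical names for this sketch and do not reflect the real BigTable API; the integer timestamps 1 and 2 stand in for t1 and t2.

```python
class TinyTable:
    """Toy in-memory model of a (row, column, time) -> string map."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the most recent value stored in a cell, or None."""
        versions = [
            (timestamp, value)
            for (r, c, timestamp), value in self.cells.items()
            if r == row and c == column
        ]
        return max(versions)[1] if versions else None

    def scan(self):
        """Yield cells ordered by row key, column key, newest timestamp first."""
        for key in sorted(self.cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self.cells[key]


table = TinyTable()
table.put("com.example", "contents:", 1, "<html>v1")
table.put("com.example", "contents:", 2, "<html>v2")
table.put("com.cnn.www", "language:", 1, "en")

latest = table.get("com.example", "contents:")
scanned_keys = [key for key, _ in table.scan()]

print(latest)           # <html>v2
print(scanned_keys[0])  # ('com.cnn.www', 'language:', 1)
```

The map is sparse because only the cells that were written exist, and multi-dimensional because every value is addressed by row, column and timestamp together.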
ROWS
GOOGLE BIGTABLE
‣ row keys are arbitrary strings
‣ read/write operations under a single row key are atomic
‣ data is maintained in lexicographic order by row key
‣ each row range is called a tablet
maps.google.com → com.google.maps
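The hostname reversal shown above is easy to express in code. The row_key function name and the extra example hostnames are assumptions for this sketch.

```python
def row_key(hostname):
    """Reverse the components of a hostname, e.g. maps.google.com -> com.google.maps."""
    return ".".join(reversed(hostname.split(".")))


hosts = ["maps.google.com", "www.example.com", "www.google.com"]
keys = sorted(row_key(host) for host in hosts)

print(row_key("maps.google.com"))  # com.google.maps
print(keys)  # ['com.example.www', 'com.google.maps', 'com.google.www']
```

Because rows are kept in lexicographic order, reversed hostnames place pages from the same domain next to each other, so they tend to fall in the same row range.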
COLUMNS
GOOGLE BIGTABLE
‣ column keys are grouped into sets called column families
‣ a column family must be created before data can be
stored under any column key in that family
‣ a column key is named family:qualifier
‣ access control and both disk and memory
accounting are performed at the column-family level
TIMESTAMPS
GOOGLE BIGTABLE
[Diagram: row com.example, column CONTENTS: holds two versions of <html>…, one at timestamp t1 and one at timestamp t2]
DATA MODEL: EXAMPLE
GOOGLE BIGTABLE
row key         | LANGUAGE: | CONTENTS:               | ANCHOR:CNNSI.COM | ANCHOR:MYLOOK.CA
com.example     | en        | <!DOCTYPE html PUBLIC … |                  |
com.cnn.www     | en        | <!DOCTYPE html PUBLIC … | "cnn"            | "cnn.com"
com.cnn.www/foo | en        | <!DOCTYPE html PUBLIC … |                  |
[Diagram labels: column families (the column headers), row keys (the first column), sorted rows]
DIFFERENCES WITH RDBMS
GOOGLE BIGTABLE
RDBMS            | BIGTABLE
query language   | specific API
joins            | no referential integrity
explicit sorting | sorting defined a priori in the column family
ARCHITECTURE
GOOGLE BIGTABLE
Google File System (GFS)
‣ stores data files and logs
Google SSTable
‣ stores BigTable data
Chubby
‣ highly available distributed lock service
COMPONENTS
GOOGLE BIGTABLE
library
‣ linked into every client
one master server
‣ assigns tablets to tablet servers
‣ detects the addition and expiration of tablet servers
‣ balances tablet-server load
‣ garbage-collects files in GFS
‣ handles schema changes
many tablet servers
‣ each manages a set of tablets (typically ten to a thousand, per the Bigtable paper)
‣ each handles read and write requests to its tablets
‣ each splits tablets that have grown too large
COMPONENTS
GOOGLE BIGTABLE
[Diagram: a client, the master server and three tablet servers; arrows labeled Metadata and read/write]
STARTUP AND GROWTH
GOOGLE BIGTABLE
[Diagram: tablet location hierarchy: Chubby file → Root tablet (1st Metadata tablet) → other metadata tablets → UserTable1 … UserTableN]
TABLET ASSIGNMENT
GOOGLE BIGTABLE
tablet server
‣ when started, creates and acquires a lock in Chubby
master
‣ grabs a unique master lock in Chubby
‣ scans Chubby to find live tablet servers
‣ asks each tablet server which tablets it already serves
‣ scans the Metadata table to learn the full set of tablets
‣ builds a set of unassigned tablets, eligible for future tablet
assignment
WHY IS CP?
GOOGLE BIGTABLE
‣ if the master dies, its services stop functioning
‣ if a tablet server dies, its tablets become unavailable
‣ if Chubby dies, BigTable can no longer execute
synchronization operations or serve client requests
‣ Google File System is a CP system
$ WHOAMI
Andrea Giuliano
@bit_shark
www.andreagiuliano.it
joind.in/13224
Please rate the talk!
REFERENCES
G. DeCandia et al. “Dynamo: Amazon’s Highly Available Key-value Store”
F. Chang et al. “Bigtable: A Distributed Storage System for Structured Data”
Assets:
https://farm1.staticflickr.com/41/86744006_0026864df8_b_d.jpg
https://farm9.staticflickr.com/8305/7883634326_4e51a1a320_b_d.jpg
https://farm5.staticflickr.com/4145/4958650244_65b2eddffc_b_d.jpg
https://farm4.staticflickr.com/3677/10023456065_e54212c52e_b_d.jpg
https://farm4.staticflickr.com/3076/2871264822_261dafa44c_o_d.jpg
https://farm1.staticflickr.com/7/6111406_30005bdae5_b_d.jpg
https://farm4.staticflickr.com/3928/15416585502_92d5e608c7_b_d.jpg
https://farm8.staticflickr.com/7046/6873109431_d3b5199f7d_b_d.jpg
https://farm4.staticflickr.com/3007/2835755867_c530b0e0c6_o_d.jpg
https://farm3.staticflickr.com/2788/4202444169_2079db9580_o_d.jpg
https://farm1.staticflickr.com/55/129619657_907b480c7c_b_d.jpg
https://farm5.staticflickr.com/4046/4368269562_b3e05e3f06_b_d.jpg
https://farm8.staticflickr.com/7344/12137775834_d0cecc5004_k_d.jpg
https://farm5.staticflickr.com/4073/4895191036_1cb9b58d75_b_d.jpg
https://farm4.staticflickr.com/3144/3025249284_b77dec2d29_o_d.jpg
https://www.flickr.com/photos/avardwoolaver/7137096221

Consistency, Availability, Partition: Make Your Choice

  • 1. MAKE YOUR CHOICE: CONSISTENCY, AVAILABILITY, PARTITION. Andrea Giuliano, @bit_shark
  • 2. DISTRIBUTED SYSTEMS
  • 3. WHAT A DISTRIBUTED SYSTEM IS: “A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages”
  • 4. EXAMPLES (DISTRIBUTED SYSTEMS)
  • 5. REPLICATION (DISTRIBUTED SYSTEMS)
  • 6. REPLICATED SERVICE PROPERTIES: CONSISTENCY, AVAILABILITY
  • 7. CONSISTENCY: The result of operations will be predictable
  • 8. CONSISTENCY: Strong consistency ‣ all replicas return the same value for the same object
  • 9. CONSISTENCY: Strong consistency ‣ all replicas return the same value for the same object. Weak consistency ‣ different replicas can return different values for the same object
  • 10. STRONG VS WEAK (CONSISTENCY)
  • 11. STRONG VS WEAK (CONSISTENCY): Strong consistency ‣ Atomic, consistent, isolated, durable (ACID) database. Weak consistency ‣ Basically Available, Soft-state, Eventual consistency (BASE) database
  • 12. EXAMPLE (CONSISTENCY): put(price, 10)
  • 13. EXAMPLE (CONSISTENCY): get(price) returns price = 10
  • 14. AVAILABILITY
  • 15. EXAMPLE (AVAILABILITY)
  • 16. COMMUNICATION
  • 17. PARTITION TOLERANCE: continue to operate even in the presence of partitions
  • 18. PARTITION TOLERANCE: Network failure ‣ groups at each side of a faulty network entity (switch, backbone). Process failure ‣ system split in two groups: correct nodes and crashed nodes
  • 19. CAP THEOREM: “Of three properties of shared-data systems (data consistency, system availability and tolerance to network partitions) only two can be achieved at any given moment in time.”
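  • 20. THE PROOF (CAP THEOREM): [Diagram: the replicas, all holding price = 0, are split into partition 1 and partition 2. A put(price, 10) at t1 on one side cannot reach the other side, so a get(price) at t2 on that other side either returns the stale price = 0 (not consistent) or gets no response (not available).]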
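The proof on slide 20 can be replayed in a few lines. This is only a sketch under stated assumptions: the `Replica` and `ReplicatedService` classes and the `prefer_consistency` flag are hypothetical names invented for the illustration, and the model has just two replicas with writes always landing on one side and reads on the other.

```python
class Replica:
    """One copy of the data, living on one side of the partition."""

    def __init__(self):
        self.store = {"price": 0}


class ReplicatedService:
    """Two replicas joined by a link that may be cut (hypothetical model)."""

    def __init__(self, prefer_consistency):
        self.a = Replica()  # receives the writes (partition 1)
        self.b = Replica()  # receives the reads (partition 2)
        self.partitioned = False
        self.prefer_consistency = prefer_consistency

    def put(self, key, value):
        # The write is forwarded to the other replica only if the link is up.
        self.a.store[key] = value
        if not self.partitioned:
            self.b.store[key] = value

    def get(self, key):
        # During a partition the read side has to pick one of two bad options.
        if self.partitioned and self.prefer_consistency:
            return None  # no response: not available
        return self.b.store[key]  # may be stale: not consistent


ap = ReplicatedService(prefer_consistency=False)
ap.partitioned = True
ap.put("price", 10)   # t1
stale = ap.get("price")   # t2: answers, but with the old value

cp = ReplicatedService(prefer_consistency=True)
cp.partitioned = True
cp.put("price", 10)   # t1
silent = cp.get("price")  # t2: refuses to answer
```

Without a partition both variants return 10, which is why the choice only has to be made once the link is cut.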
  • 21. CAP THEOREM IN PRACTICE: consistency + partition tolerance (CP) ➡ distributed databases ➡ distributed locking ➡ majority protocol ➡ active/passive replication ➡ quorum-based systems. Example: BigTable
  • 22. CAP THEOREM: availability + partition tolerance (AP) ➡ web caches ➡ stateless systems ➡ DNS. Example: DynamoDB
  • 23. CAP THEOREM: consistency + availability (CA) ➡ single-site databases ➡ cluster databases ➡ LDAP
  • 24. DYNAMO
  • 25. REQUIREMENTS (DYNAMO): “customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados.”
  • 26. REQUIREMENTS (DYNAMO): “customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados.” ➡ reliable ➡ highly scalable ➡ always available
  • 27. SIMPLE INTERFACE (DYNAMO): get(key) ‣ locates the object associated with the key and returns a single object, or a list of objects with conflicting versions, along with a context. put(key, context, object) ‣ determines where the replicas of the object should be placed based on the associated key. The context includes information such as the version of the object.
  • 28. REPLICATION: THE CHOICE (DYNAMO): Synchronous replica coordination ‣ strong consistency ‣ availability tradeoff. Optimistic replication technique ‣ high availability ‣ possible conflicts
  • 29. CONFLICTS: WHEN (DYNAMO): At write time ‣ writes may be rejected. At read time ‣ “always writable” datastore
  • 30. CONFLICTS: WHO (DYNAMO): The data store ‣ e.g. “last write wins” policy. The application ‣ resolution as implementation detail
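The two options on slide 30 can be contrasted on a shopping cart, the example from the requirements slide. This is a rough sketch, not Dynamo's code: the version layout (a dict with `timestamp` and `items`) and both function names are assumptions made up for the illustration.

```python
def last_write_wins(versions):
    """Data-store policy: keep only the version with the newest timestamp."""
    return max(versions, key=lambda version: version["timestamp"])


def merge_carts(versions):
    """Application policy: union the items of all conflicting carts."""
    merged = set()
    for version in versions:
        merged |= set(version["items"])
    return sorted(merged)


# Two conflicting versions of the same cart (hypothetical data).
conflicting = [
    {"timestamp": 1, "items": ["book", "pen"]},
    {"timestamp": 2, "items": ["book", "lamp"]},
]

kept = last_write_wins(conflicting)["items"]  # the pen is silently lost
merged = merge_carts(conflicting)             # nothing added is lost
```

The data store can only apply a generic rule, while the application knows what a cart is and can merge it meaningfully.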
  • 31. A RING TO RULE THEM ALL (DYNAMO)
  • 32. PARTITIONING: THE RING (DYNAMO): [Diagram: nodes A, B, C, D, E, F, G placed on a ring; the hash of the DATA key gives its position on the ring]
  • 33. REPLICATION (DYNAMO): [Diagram: ring of nodes A, B, C, D, E, F, G; the hash of the DATA key gives its position] N = 3 ‣ D will store keys in the ranges (A, B], (B, C], (C, D]
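The ring on slides 32 and 33 can be sketched as follows. It is a minimal illustration: the `Ring` class, the `ring_position` helper, the 32-bit ring size and the use of MD5 here are assumptions, and real deployments add virtual nodes, which this sketch leaves out.

```python
import bisect
import hashlib


def ring_position(name, ring_size=2**32):
    """Hash a node name or a key to a position on the ring."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % ring_size


class Ring:
    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas
        # Nodes sorted by their position on the ring.
        self.points = sorted((ring_position(node), node) for node in nodes)
        self.positions = [position for position, _ in self.points]

    def preference_list(self, key):
        """First node clockwise from the key, plus its N-1 successors."""
        start = bisect.bisect_left(self.positions, ring_position(key))
        count = min(self.n_replicas, len(self.points))
        return [
            self.points[(start + i) % len(self.points)][1]
            for i in range(count)
        ]


ring = Ring(["A", "B", "C", "D", "E", "F", "G"], n_replicas=3)
replicas = ring.preference_list("cart:42")  # three consecutive nodes
```

Because each key is copied to the next N-1 nodes clockwise, every node ends up holding the key ranges of its N-1 predecessors as well as its own, which is the (A, B], (B, C], (C, D] statement on the slide.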
  • 34. DATA VERSIONING (DYNAMO): put() ‣ may return before the update has been propagated to all replicas. get() ‣ a subsequent get() may return an object that does not have the latest update
  • 35. RECONCILIATION (DYNAMO)
  • 36. RECONCILIATION (DYNAMO): Syntactic reconciliation ‣ the new version subsumes the previous one. Semantic reconciliation ‣ conflicting versions of the same object
  • 37. VECTOR CLOCK (DYNAMO)
  • 38. VECTOR CLOCK (DYNAMO): Definition ‣ list of (node, counter) pairs
  • 39. VECTOR CLOCK (DYNAMO): Definition ‣ list of (node, counter) pairs. D1 [Sx,1] (write handled by Sx)
  • 40. VECTOR CLOCK (DYNAMO): Definition ‣ list of (node, counter) pairs. D1 [Sx,1] (write handled by Sx); D2 [Sx,2] (write handled by Sx)
  • 41. VECTOR CLOCK (DYNAMO): Definition ‣ list of (node, counter) pairs. D1 [Sx,1] (write handled by Sx); D2 [Sx,2] (write handled by Sx); D3 [Sx,2], [Sy,1] (write handled by Sy)
  • 42. VECTOR CLOCK (DYNAMO): Definition ‣ list of (node, counter) pairs. D1 [Sx,1] (write handled by Sx); D2 [Sx,2] (write handled by Sx); D3 [Sx,2], [Sy,1] (write handled by Sy); D4 [Sx,2], [Sz,1] (write handled by Sz)
  • 43. VECTOR CLOCK (DYNAMO): Definition ‣ list of (node, counter) pairs. D1 [Sx,1] (write handled by Sx); D2 [Sx,2] (write handled by Sx); D3 [Sx,2], [Sy,1] (write handled by Sy); D4 [Sx,2], [Sz,1] (write handled by Sz); D5 [Sx,3], [Sy,1], [Sz,1] (reconciled and written by Sx)
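The D1 to D5 sequence from the vector clock slides can be reproduced with a small sketch. The clocks are plain dicts from node name to counter, and the helper names (`increment`, `descends`, `concurrent`, `merge`) are assumptions chosen for the illustration rather than Dynamo's API.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen everything clock b has (a subsumes b)."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def concurrent(a, b):
    """True if neither clock subsumes the other: the versions conflict."""
    return not descends(a, b) and not descends(b, a)


def merge(a, b):
    """Element-wise maximum of two clocks."""
    return {
        node: max(a.get(node, 0), b.get(node, 0))
        for node in a.keys() | b.keys()
    }


d1 = increment({}, "Sx")               # write handled by Sx
d2 = increment(d1, "Sx")               # write handled by Sx
d3 = increment(d2, "Sy")               # write handled by Sy
d4 = increment(d2, "Sz")               # write handled by Sz
d5 = increment(merge(d3, d4), "Sx")    # reconciled and written by Sx
```

D2 subsumes D1, so that pair needs only syntactic reconciliation. D3 and D4 are concurrent, so they need semantic reconciliation before D5 can be written.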
  • 44. PUT() AND GET() (DYNAMO): R ‣ minimum number of nodes that must participate in a successful read operation. W ‣ minimum number of nodes that must participate in a successful write operation
  • 45. PUT() AND GET() (DYNAMO): put() ‣ the coordinator generates the vector clock for the new version and writes the new version locally ‣ the new version is sent to N nodes ‣ the write is successful if W-1 nodes respond. get() ‣ the coordinator requests all existing versions of the data ‣ the coordinator waits for R responses before returning the result ‣ the coordinator returns all the versions that are causally unrelated ‣ the divergent versions are reconciled and written back
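The role of R and W can be shown with a deliberately simplified sketch. The assumptions are loud ones: an integer version stands in for the vector clock, the coordinator is counted as one of the N nodes, there is no rollback of a failed write, and the `Node`, `put` and `get` names are invented for the example.

```python
class Node:
    """One replica; `up` says whether it is currently reachable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}  # key -> (version, value)


def put(nodes, w, key, version, value):
    """Send the write to all N replicas; succeed if at least W stored it."""
    acks = 0
    for node in nodes:
        if node.up:
            node.data[key] = (version, value)
            acks += 1
    return acks >= w


def get(nodes, r, key):
    """Collect replies from R reachable replicas, return the newest one."""
    replies = [
        node.data[key] for node in nodes if node.up and key in node.data
    ][:r]
    if len(replies) < r:
        raise RuntimeError("fewer than R replicas answered")
    return max(replies, key=lambda reply: reply[0])


nodes = [Node("A"), Node("B"), Node("C")]      # N = 3
put(nodes, 2, "price", 1, 0)                   # stored on A, B and C
nodes[2].up = False
ok = put(nodes, 2, "price", 2, 10)             # stored on A and B only
nodes[2].up = True
nodes[0].up = False
latest = get(nodes, 2, "price")                # B (new) and C (stale) answer
```

With N = 3 and R = W = 2, any read set overlaps any write set in at least one node (R + W > N), which is why the read still sees version 2 even though C missed the write.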
  • 46. SLOPPY QUORUM (DYNAMO): [Diagram: ring of nodes A, B, C, D, E, F, G] N = 3
  • 47. WHY IS IT AP? (DYNAMO): ‣ requests are served even if some replicas are not available ‣ if a node is down, the write is stored on another node ‣ consistency conflicts are resolved at read time or in the background ‣ eventually, all the replicas will converge ‣ concurrent read/write operations can make distinct clients see distinct versions of the same key
  • 48. BIGTABLE
  • 49. REQUIREMENTS (GOOGLE BIGTABLE): ‣ scale to petabytes of data ‣ thousands of machines ‣ high availability ‣ high performance
  • 50. DATA MODEL (GOOGLE BIGTABLE): ‣ sparse, distributed, persistent multi-dimensional sorted map: (row: string, column: string, time: int64) → string
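The map signature on slide 50 can be mimicked in memory to make the three dimensions concrete. This is a toy sketch only: the `SparseTable` class and its methods are hypothetical, and it ignores distribution and persistence entirely.

```python
class SparseTable:
    """Map (row, column, timestamp) -> value; unwritten cells take no space."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if never written."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self.cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda item: item[0])[1]

    def scan(self):
        """Cells ordered by row key, then column, newest timestamp first."""
        return sorted(
            self.cells.items(),
            key=lambda cell: (cell[0][0], cell[0][1], -cell[0][2]),
        )


table = SparseTable()
table.put("com.example", "contents:", 1, "<html>v1")
table.put("com.example", "contents:", 2, "<html>v2")
table.put("com.cnn.www", "contents:", 1, "<html>cnn")
newest = table.get("com.example", "contents:")
```

Only the cells that were written exist in the dictionary, which is what "sparse" means here, and `scan()` shows the lexicographic row order that the following slides rely on.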
  • 51. ROWS (GOOGLE BIGTABLE): ‣ row keys are arbitrary strings ‣ read/write operations on a single row are atomic ‣ data is maintained in lexicographic order by row key ‣ each row range is called a tablet. Example: maps.google.com is stored under the row key com.google.maps
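The maps.google.com to com.google.maps example on slide 51 is a reversed hostname. A small sketch of that transformation follows; the `row_key` function is a hypothetical helper and not part of any BigTable client.

```python
def row_key(url):
    """Reverse the hostname components and keep any path unchanged."""
    host, separator, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + separator + path


hosts = ["maps.google.com", "www.example.com", "mail.google.com"]
keys = sorted(row_key(host) for host in hosts)
```

After sorting, the two google.com hosts sit next to each other, so pages from the same domain tend to fall into the same row range and therefore the same tablet.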
  • 52. COLUMNS (GOOGLE BIGTABLE): ‣ column keys are grouped into sets: column families ‣ a column family must be created before data can be stored under any column key in that family ‣ a column key is named as family:qualifier ‣ access control and both disk and memory accounting are performed at the column-family level
  • 53. TIMESTAMPS (GOOGLE BIGTABLE): [Diagram: row com.example, column contents:, holding two versions of the cell: <html>… at t1 and <html>… at t2]
  • 54. DATA MODEL: EXAMPLE (GOOGLE BIGTABLE): [Table: sorted rows, identified by row keys, against the column families language:, contents:, anchor:cnnsi.com and anchor:mylook.ca] com.cnn.www ‣ language: en ‣ contents: <!DOCTYPE html PUBLIC … ‣ anchor:cnnsi.com “cnn” ‣ anchor:mylook.ca “cnn.com”. com.cnn.www/foo ‣ language: en ‣ contents: <!DOCTYPE html PUBLIC … com.example ‣ language: en ‣ contents: <!DOCTYPE html PUBLIC …
  • 55. DIFFERENCES WITH RDBMS (GOOGLE BIGTABLE): RDBMS ‣ query language ‣ joins ‣ explicit sorting. BigTable ‣ specific API ‣ no referential integrity ‣ sorting defined a priori in the column family
  • 56. ARCHITECTURE (GOOGLE BIGTABLE): Google File System (GFS) ‣ stores data files and logs. Google SSTable ‣ stores BigTable data. Chubby ‣ highly available distributed lock service
  • 57. COMPONENTS (GOOGLE BIGTABLE): library ‣ linked into every client. One master server ‣ assigns tablets to tablet servers ‣ detects the addition and expiration of tablet servers ‣ balances tablet-server load ‣ garbage-collects files in GFS ‣ handles schema changes. Many tablet servers ‣ each manages 10 to 100 tablets ‣ handles read and write requests to its tablets ‣ splits tablets that have grown too large
  • 58. COMPONENTS (GOOGLE BIGTABLE): [Diagram: Client, Master server and three Tablet servers, with arrows labelled Metadata and read/write]
  • 59. STARTUP AND GROWTH (GOOGLE BIGTABLE): [Diagram: tablet location hierarchy: Chubby file → Root tablet (1st Metadata tablet) → other metadata tablets → UserTable1 … UserTableN]
  • 60. TABLET ASSIGNMENT (GOOGLE BIGTABLE): tablet server ‣ when started, creates and acquires a lock in Chubby. Master ‣ grabs a unique master lock in Chubby ‣ scans Chubby to find live tablet servers ‣ asks each tablet server which tablets it already serves ‣ scans the Metadata table to learn the full set of tablets ‣ builds a set of unassigned tablets, for future tablet assignment
  • 61. WHY IS IT CP? (GOOGLE BIGTABLE): ‣ if the master dies, services stop functioning ‣ if a tablet server dies, its tablets become unavailable ‣ if Chubby dies, BigTable can neither execute synchronization operations nor serve client requests ‣ Google File System is a CP system
  • 62. $ WHOAMI: Andrea Giuliano, @bit_shark, www.andreagiuliano.it
  • 64. REFERENCES: G. DeCandia et al., “Dynamo: Amazon’s Highly Available Key-value Store”; F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data”. Assets: https://farm1.staticflickr.com/41/86744006_0026864df8_b_d.jpg https://farm9.staticflickr.com/8305/7883634326_4e51a1a320_b_d.jpg https://farm5.staticflickr.com/4145/4958650244_65b2eddffc_b_d.jpg https://farm4.staticflickr.com/3677/10023456065_e54212c52e_b_d.jpg https://farm4.staticflickr.com/3076/2871264822_261dafa44c_o_d.jpg https://farm1.staticflickr.com/7/6111406_30005bdae5_b_d.jpg https://farm4.staticflickr.com/3928/15416585502_92d5e608c7_b_d.jpg https://farm8.staticflickr.com/7046/6873109431_d3b5199f7d_b_d.jpg https://farm4.staticflickr.com/3007/2835755867_c530b0e0c6_o_d.jpg https://farm3.staticflickr.com/2788/4202444169_2079db9580_o_d.jpg https://farm1.staticflickr.com/55/129619657_907b480c7c_b_d.jpg https://farm5.staticflickr.com/4046/4368269562_b3e05e3f06_b_d.jpg https://farm8.staticflickr.com/7344/12137775834_d0cecc5004_k_d.jpg https://farm5.staticflickr.com/4073/4895191036_1cb9b58d75_b_d.jpg https://farm4.staticflickr.com/3144/3025249284_b77dec2d29_o_d.jpg https://www.flickr.com/photos/avardwoolaver/7137096221