@supercoco9 #distributed-devoxx
everything you always wanted to know about
Highly Available Distributed Databases
Javier Ramirez
@supercoco9
https://teowaki.com
• Javier Ramirez:
– 20 years in web development (C/Java/Ruby/Python)
– 6 years in NoSQL (Redis, Mongo, Neo4j)
– 4 years in Cloud (AWS, GCP)
– 3 years in Big Data (BigQuery, Spark, Apache Beam/Dataflow)
– Google Developer Expert and Authorised Trainer on the Google Cloud Platform
My projects:
• https://teowaki.com
• https://aprendoaprogramar.com
IBM Data Center in Japan during and after an earthquake
“A squirrel did take out half of our Santa Clara data centre two years back”
Mike Christian, Yahoo Director of Engineering
Hayastan Shakarian, a.k.a. The Spade Hacker, cut Armenia off from the Internet for almost one day*
* By accident, while scavenging copper
Some data center outages reported in 2015:
* Amazon Web Services
* Apple iCloud
* Microsoft Azure
* IBM Softlayer
* Google Cloud Platform
* And, of course, every hosting provider with scheduled maintenance operations (Rackspace, DigitalOcean, OVH...)
Complex systems can and will fail
You had better distribute your data, or else...
Also, distributed databases can perform better and run on cheaper hardware than centralised ones
Most basic level: Backup
And keep the copy in a separate data centre*
* Vodafone once lost one year of data in a fire because it did not do this
Next Level:
Replicas
(master-slave)
A main server sends a log of changes to one or more replicas
* Depending on the database, this log is called the binary log or the Write-Ahead Log (WAL)
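A minimal sketch of log-based master-slave replication, assuming an in-memory key-value store; `Master`, `Replica` and `LogEntry` are hypothetical names, not any database's API. Each replica remembers the last log position it applied, so the master only ships the entries the replica is missing, and replaying an entry twice is harmless.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    lsn: int                 # log sequence number, strictly increasing
    key: str
    value: Optional[str]     # None marks a delete


@dataclass
class Replica:
    data: Dict[str, str] = field(default_factory=dict)
    applied_lsn: int = 0     # last log position this replica has applied

    def apply(self, entries: List[LogEntry]) -> None:
        for e in entries:
            if e.lsn <= self.applied_lsn:
                continue     # already applied: replaying is idempotent
            if e.value is None:
                self.data.pop(e.key, None)
            else:
                self.data[e.key] = e.value
            self.applied_lsn = e.lsn


@dataclass
class Master:
    data: Dict[str, str] = field(default_factory=dict)
    log: List[LogEntry] = field(default_factory=list)

    def write(self, key: str, value: Optional[str]) -> None:
        # Append to the log first, then apply locally (write-ahead).
        self.log.append(LogEntry(len(self.log) + 1, key, value))
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value

    def ship(self, replica: Replica) -> None:
        # LSN i lives at log[i - 1], so the missing tail starts at applied_lsn.
        replica.apply(self.log[replica.applied_lsn:])


m, r = Master(), Replica()
m.write("a", "1")
m.write("b", "2")
m.write("a", None)
m.ship(r)
print(r.data)  # {'b': '2'}
```

Every write still goes through the single master, which is why this setup scales reads but not writes.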
Master-slave is good, but:
* All the operations are replicated on all slaves
* Reads scale well, but writes do not
* Replicas cut off from the master by a network partition cannot accept writes
* The master is a single point of failure (SPOF)
Next Level:
Multi-Master Cluster
(master-master)
Every server can accept reads and writes, and sends its binary log to all the other servers
* Also referred to as update-anywhere
Multi-master is great, but:
* All the operations are replicated on all masters
* When synchronous, latency is high (consistency is achieved via locks, coordination and serializable transactions)
* When asynchronous, conflict resolution is typically poor
* It is hard to scale up or down automatically
The system I want:
* Always on, even during network partitions
* Scales out both reads and writes, without keeping all the data on every server
* Runs on cheap, diverse commodity hardware
* Runs close to my users (low latency)
* Grows and shrinks elastically and survives server failures
Then you need to let go of many convenient things you take for granted in databases
CAP Theorem
Everything is a trade-off
Next Level:
Distributed Data Stores
Distributed DB design decisions
* Data (key) distribution
* Data replication/durability
* Conflict resolution
* Membership
* Status of the other peers
* Operation under partitions and during unavailability of peers
* Incremental scalability
Data distribution
Consistent hashing based on the key
This usually implies that operations work on single keys. Some solutions, like Redis, let clients group related keys so they are hashed to the same place. Some solutions, like BigTable, allow data to be co-located by group or family.
Queries are frequently limited to lookups by key or by secondary indexes (say goodbye to the power of SQL)
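For the Redis case: Redis Cluster assigns every key to one of 16384 hash slots using CRC16 (XMODEM variant) modulo 16384, and when a key contains a non-empty `{...}` hash tag, only the text between the first `{` and the following `}` is hashed. The sketch below reimplements that rule in Python; `crc16_xmodem` and `hash_slot` are our own helper names, not a Redis client API.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:          # only a non-empty tag counts
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Both keys hash only "user1000", so they land in the same slot (and node).
print(hash_slot("{user1000}.following") == hash_slot("{user1000}.followers"))  # True
```

Grouping keys this way is what makes multi-key operations on related data possible in a sharded setup.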
Data distribution. The Ring
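A minimal consistent-hashing ring with virtual nodes, assuming SHA-256 truncated to 64 bits as the position function. Real systems pick their own hash and token layout, so `Ring` is a hypothetical, illustrative class. A key belongs to the first node position found walking clockwise from the key's own position.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    # Stable 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: List[Tuple[int, str]] = []   # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns several positions ("virtual nodes") to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key: str) -> str:
        # Walk clockwise to the next node position, wrapping around the end of the ring.
        positions = [pos for pos, _ in self._points]
        idx = bisect.bisect_right(positions, _hash(key)) % len(positions)
        return self._points[idx][1]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

Removing a node only reassigns the keys that node owned; every other key keeps its owner, which is what makes the ring attractive for elastic clusters.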
Data Replication
How many replicas of each key? Typically at least 3, so that in case of conflicts a majority (quorum) can be formed
Keys are often distributed taking into account the physical location of nodes, so replicas live in different racks or different datacentres
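One common way to place replicas in different racks or datacentres is to walk the ring clockwise from the key and skip nodes whose rack is already used. The sketch below assumes a hypothetical `racks` mapping from node name to rack and SHA-256 as the ring hash; if there are fewer racks than `n`, it returns fewer than `n` nodes.

```python
import bisect
import hashlib
from typing import Dict, List


def token(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def preference_list(key: str, racks: Dict[str, str], n: int = 3) -> List[str]:
    """Pick up to n replica nodes for key, each in a different rack."""
    ring = sorted((token(node), node) for node in racks)
    start = bisect.bisect_right([t for t, _ in ring], token(key))
    chosen: List[str] = []
    used_racks = set()
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]   # clockwise, wrapping around
        if racks[node] not in used_racks:
            chosen.append(node)
            used_racks.add(racks[node])
        if len(chosen) == n:
            break
    return chosen


racks = {"a1": "rack-a", "a2": "rack-a", "b1": "rack-b",
         "b2": "rack-b", "c1": "rack-c", "c2": "rack-c"}
print(preference_list("user:1", racks))   # three nodes, one per rack
```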
Replication: durability
For a durable system, we need to make sure the data is replicated to at least 2 nodes before confirming the write to the client.
The number of nodes that must acknowledge a write is called the write quorum (W), and in many systems it can be tuned individually per operation or per data set.
Not all data are equally important, and not all systems have the same read/write ratio.
Systems can be configured to be “always writable” or “always readable”.
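The arithmetic behind configurable quorums, for N replicas, a write quorum W and a read quorum R: when R + W > N every read set overlaps every write set, so a strict-quorum read (no hinted handoff involved) contacts at least one replica holding the latest acknowledged write. A small hypothetical helper that reports these properties:

```python
def quorum_properties(n: int, r: int, w: int) -> dict:
    """Report what a strict quorum configuration guarantees (illustrative helper)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        # Every read quorum intersects every write quorum.
        "read_sees_latest_write": r + w > n,
        # Any two write quorums intersect, so two writes cannot succeed on disjoint replica sets.
        "no_disjoint_write_quorums": 2 * w > n,
        # How many replicas may be unreachable while writes (or reads) still succeed.
        "write_fault_tolerance": n - w,
        "read_fault_tolerance": n - r,
    }


print(quorum_properties(3, 2, 2))
```

With N = 3, R = 2, W = 2 both overlap conditions hold and one replica can be down. Lowering W towards 1 leans towards “always writable”; lowering R towards 1 leans towards “always readable”.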
Conflicts
I see a record that I thought was deleted
I created a record but cannot see it
I have different values in two nodes
Something should be unique, but it's not
No-conflict strategies
Quorum-based consensus: Paxos, Raft.
Requires coordination between processes, with leader elections and consensus rounds.
Worse latency
Last Write Wins (LWW):
Doesn't require coordination.
Good latency
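A sketch of LWW merging, assuming each write carries a wall-clock timestamp plus the writer's node id as a tie-breaker (both are assumptions; systems differ in what they attach). Because the merge is deterministic and order-independent, replicas converge without coordinating, but a concurrent write is silently discarded, and clock skew can let an older write win.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float   # wall-clock time assigned by the writer
    node_id: str       # tie-breaker so every replica picks the same winner


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    # Highest (timestamp, node_id) wins; the other write is dropped.
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


x = Versioned("blue", 100.0, "n1")
y = Versioned("red", 101.0, "n2")
print(lww_merge(x, y).value)  # red
```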
Conflict resolution
Can be done at write time or at read time.
Vector clocks
* No need to synchronise clocks between nodes
* There can be several versions of the same item
* Need consolidation to prune their size
* Usually the client needs to resolve the conflict and write back the result
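A minimal vector clock, assuming clocks are plain dictionaries from node id to counter (function names are illustrative). Comparing two clocks tells whether one version supersedes the other or whether they are concurrent siblings that a client must reconcile; the reconciled write merges both clocks and increments its own entry.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b descends from a: b can replace a
    if b_le_a:
        return "after"
    return "concurrent"      # siblings: the client must resolve them


v0 = increment({}, "n1")           # written on n1
v1 = increment(v0, "n1")           # updated again on n1
v2 = increment(v0, "n2")           # updated concurrently on n2
print(compare(v1, v2))             # concurrent
resolved = increment(merge(v1, v2), "n1")
print(resolved == {"n1": 3, "n2": 1})  # True
```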
Membership: gossip, infection-like protocols
Gossip
A centralised membership server is a SPOF
Communicating state to every other node is very time consuming and doesn't cope with partitions
Gossip protocols pair up random nodes at frequent, regular intervals so they can exchange information.
Through these exchanges, the nodes converge on a shared view of the cluster status
Gossip example
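A toy simulation of heartbeat-style gossip membership, assuming each node tracks the highest heartbeat counter it has heard for every member; `gossip_round` is a hypothetical helper, not a library call. Nodes start knowing only themselves and, through random pairwise exchanges, converge on the full membership list. A member whose counter stops increasing for a while could then be suspected as failed.

```python
import random
from typing import Dict

# Each node's view: member name -> highest heartbeat counter it has heard of.
Membership = Dict[str, int]


def gossip_round(views: Dict[str, Membership], rng: random.Random) -> None:
    """One round: every node bumps its own heartbeat, then swaps views with one random peer."""
    nodes = list(views)
    for node in nodes:
        views[node][node] = views[node].get(node, 0) + 1
        peer = rng.choice([n for n in nodes if n != node])
        # Push-pull exchange: both sides keep the highest counter seen for every member.
        merged = {m: max(views[node].get(m, 0), views[peer].get(m, 0))
                  for m in views[node].keys() | views[peer].keys()}
        views[node] = dict(merged)
        views[peer] = dict(merged)


rng = random.Random(42)
views = {f"n{i}": {f"n{i}": 0} for i in range(8)}
for _ in range(50):
    gossip_round(views, rng)
print(all(set(v) == set(views) for v in views.values()))
```

Each exchange only touches two nodes, so the cost per round stays constant per node no matter how large the cluster grows.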
Incremental scalability
When a new node joins the system, the rest of the nodes notice via gossip.
The new node claims a partition of the ring and asks the replicas of that partition to send their data to it.
When the rest of the nodes decide (after gossiping) that a node has left the system, and that it is not a temporary failure, the data assigned to that node's partitions is copied to other replicas until there are N copies again.
The whole process is automatic and transparent.
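A sketch of the data movement when a node joins, assuming one token per node and SHA-256 as the ring hash (real systems usually give each node many tokens). `token`, `owner` and `join` are hypothetical helpers. Only keys whose new owner is the joining node are streamed to it; everything else stays where it was.

```python
import bisect
import hashlib
from typing import Dict


def token(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def owner(tokens: Dict[int, str], key: str) -> str:
    ring = sorted(tokens)
    idx = bisect.bisect_right(ring, token(key)) % len(ring)
    return tokens[ring[idx]]


def join(tokens: Dict[int, str], store: Dict[str, Dict[str, str]],
         new_node: str) -> Dict[int, str]:
    """Add new_node to the ring and stream to it only the keys it now owns."""
    new_tokens = dict(tokens)
    new_tokens[token(new_node)] = new_node
    store[new_node] = {}
    for node, data in store.items():
        if node == new_node:
            continue
        for key in list(data):
            if owner(new_tokens, key) == new_node:
                store[new_node][key] = data.pop(key)
    return new_tokens


tokens = {token(n): n for n in ["node-a", "node-b", "node-c"]}
store: Dict[str, Dict[str, str]] = {n: {} for n in tokens.values()}
for i in range(200):
    k = f"key{i}"
    store[owner(tokens, k)][k] = str(i)
tokens = join(tokens, store, "node-d")
print({n: len(d) for n, d in store.items()})
```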
Operation under partition:
Hinted Handoff
During a network partition, fewer than W of the nodes responsible for a segment may be reachable.
In this case, the data is still written to W nodes, even if some of them aren't responsible for that segment. Those copies are kept with a “hint” in a special area.
Periodically, the node holding a hint tries to contact the original destination and “hands off” the data to it.
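A simplified hinted handoff, assuming a hypothetical in-memory cluster where `up` is the set of reachable nodes and hinted copies count towards W (a “sloppy” quorum). `Node`, `write` and `hand_off` are illustrative names, not a specific product's API; a real system would also persist hints and bound how long it keeps them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    name: str
    data: Dict[str, str] = field(default_factory=dict)
    hints: List[Tuple[str, str, str]] = field(default_factory=list)  # (intended node, key, value)


def write(key: str, value: str, preferred: List[str], fallbacks: List[str],
          nodes: Dict[str, Node], up: Set[str], w: int) -> int:
    """Write to reachable preferred replicas; park a hinted copy on a fallback for each one that is down."""
    acks = 0
    spare = [f for f in fallbacks if f in up]
    for target in preferred:
        if target in up:
            nodes[target].data[key] = value
            acks += 1
        elif spare:
            holder = spare.pop(0)
            nodes[holder].hints.append((target, key, value))
            acks += 1
    if acks < w:
        raise RuntimeError(f"only {acks} of {w} required acks")
    return acks


def hand_off(nodes: Dict[str, Node], up: Set[str]) -> None:
    """Periodic task: deliver hints whose intended node is reachable again."""
    for node in nodes.values():
        pending = []
        for target, key, value in node.hints:
            if target in up:
                nodes[target].data[key] = value
            else:
                pending.append((target, key, value))
        node.hints = pending
```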
Anti-Entropy
A system that relies only on handoffs can be chaotic and not very effective
Anti-entropy processes make sure hinted data is handed off or synchronized to the other replicas
Anti-entropy is usually implemented with Merkle trees, a hash-of-hashes structure that makes it very efficient to find the differences between nodes
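A simplified Merkle-tree comparison, assuming keys are grouped into a power-of-two number of hash buckets and SHA-256 is the hash; `build_tree` and `diff_buckets` are hypothetical names. Two replicas compare roots first and only descend into subtrees whose hashes differ, ending with the buckets that actually need to be synchronised, instead of exchanging every key.

```python
import hashlib
from typing import Dict, List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(items: Dict[str, str], buckets: int = 8) -> List[List[bytes]]:
    """Return tree levels, leaves first; each leaf hashes the keys in one bucket."""
    if buckets < 1 or buckets & (buckets - 1):
        raise ValueError("buckets must be a power of two")
    leaves: List[List[str]] = [[] for _ in range(buckets)]
    for key in sorted(items):
        b = int.from_bytes(_h(key.encode())[:4], "big") % buckets
        leaves[b].append(f"{key}={items[key]}")
    level = [_h("\n".join(entries).encode()) for entries in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes the concatenation of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_buckets(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Walk down from the roots, only descending into subtrees whose hashes differ."""
    if a[-1] == b[-1]:
        return []
    suspects = [0]
    for depth in range(len(a) - 2, -1, -1):
        suspects = [child for i in suspects for child in (2 * i, 2 * i + 1)
                    if a[depth][child] != b[depth][child]]
    return suspects
```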
All these features mean your clients need to be aware of some of the system's internals
Clients must:
* Know which nearby nodes are responsible for each segment of the ring, and hash keys locally**
* Be aware of when nodes become available or unavailable**
* Decide on durability
* Handle conflict resolution, unless using LWW
** Some solutions offer a load-balancing proxy to hide that complexity from the client, at the cost of extra latency
Now you know how it works:
* A system that can always work, even during network partitions
* That scales out both reads and writes
* On cheap, diverse commodity hardware
* Running close to your users (low latency)
* That can grow and shrink elastically and survive server failures
Extra level: Build your own distributed database
Netflix Dynomite, built in C
Uber Ringpop, built in JavaScript
Not Scared Of You Anymore
Q&A
Find related links at http://bit.ly/teowaki-distributed-systems (https://teams.teowaki.com/teams/javier-community/link-categories/distributed-systems)
Cheers!
Need help with cloud, distributed systems or big data?
https://teowaki.com

Servicios e infraestructura de AWS y la próxima región en Aragón
 
Primeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverlessPrimeros pasos en desarrollo serverless
Primeros pasos en desarrollo serverless
 
How AWS is reinventing the cloud
How AWS is reinventing the cloudHow AWS is reinventing the cloud
How AWS is reinventing the cloud
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
 
Getting started with streaming analytics
Getting started with streaming analyticsGetting started with streaming analytics
Getting started with streaming analytics
 
Getting started with streaming analytics: Setting up a pipeline
Getting started with streaming analytics: Setting up a pipelineGetting started with streaming analytics: Setting up a pipeline
Getting started with streaming analytics: Setting up a pipeline
 
Getting started with streaming analytics: Deep Dive
Getting started with streaming analytics: Deep DiveGetting started with streaming analytics: Deep Dive
Getting started with streaming analytics: Deep Dive
 
Getting started with streaming analytics: streaming basics (1 of 3)
Getting started with streaming analytics: streaming basics (1 of 3)Getting started with streaming analytics: streaming basics (1 of 3)
Getting started with streaming analytics: streaming basics (1 of 3)
 
Monitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWSMonitorización de seguridad y detección de amenazas con AWS
Monitorización de seguridad y detección de amenazas con AWS
 

Recently uploaded

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 

Recently uploaded (20)

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Everything you always wanted to know about distributed databases, at Devoxx London, by Javier Ramirez, teowaki

  • 1. @supercoco9 #distributed-devoxx Everything you always wanted to know about Highly Available Distributed Databases. Javier Ramirez, @supercoco9, https://teowaki.com
  • 2. Everything you always wanted to know about highly available distributed databases. Javier Ramirez: 20 years in web development (C/Java/Ruby/Python); 6 years in NoSQL (Redis, Mongo, Neo4j); 4 years in Cloud (AWS, GCP); 3 years in Big Data (BigQuery, Spark, Apache Beam/Dataflow); Google Developer Expert and Authorised Trainer on the Google Cloud Platform. My projects: https://teowaki.com, https://aprendoaprogramar.com
  • 3. IBM Data Center in Japan during and after an earthquake
  • 6. “A squirrel did take out half of our Santa Clara data centre two years back.” (Mike Christian, Yahoo Director of Engineering)
  • 9. Cut off Armenia from the Internet for almost one day.* (*By accident, while scavenging copper.)
  • 10. Some data center outages reported in 2015: Amazon Web Services, Apple iCloud, Microsoft Azure, IBM SoftLayer, Google Cloud Platform and, of course, every hosting provider with scheduled maintenance operations (Rackspace, DigitalOcean, OVH...)
  • 11. Complex systems can and will fail
  • 12. You'd better distribute your data, or else... Also, distributed databases can perform better and run on cheaper hardware than centralised ones.
  • 14. And keep the copy in a separate data centre.* (*Vodafone once lost a year of data in a fire because of this.)
  • 16. A main server sends a binary log* of changes to one or more replicas. (*Also known as a Write-Ahead Log, or WAL.)
  • 17. Master-slave is good, but: all operations are replicated on all slaves; reads scale well, but writes don't; it cannot function during a network partition; the master is a single point of failure (SPOF).
  • 19. Multi-master: every server can accept reads and writes, and sends its binary log to all the other servers. Also referred to as update-anywhere.
  • 20. Multi-master is great, but: all operations are replicated on all masters; when synchronous, latency is high (consistency is achieved via locks, coordination and serializable transactions); when asynchronous, conflict resolution is typically poor; it is hard to scale up or down automatically.
  • 21. The system I want: always on, even during network partitions; scales out both reads and writes, without keeping all the data on every server; runs on cheap, diverse commodity hardware; runs close to my users (low latency); grows and shrinks elastically and survives server failures.
  • 22. Then you need to let go of many convenient things you take for granted in databases
  • 26. Distributed DB design decisions: data (key) distribution; data replication and durability; conflict resolution; membership; status of the other peers; operation under partitions and while peers are unavailable; incremental scalability.
  • 27. Data distribution: consistent hashing based on the key. This usually implies that operations work on single keys. Some solutions, like Redis, let clients group related keys consistently. Some, like BigTable, let you colocate data by locality group or column family. Queries are frequently limited to lookups by key or by secondary index (say goodbye to the power of SQL).
  • 29. Data replication: how many replicas of each key? Typically at least 3, so that in case of conflicts there can be a quorum. Often, keys are distributed taking into account the physical location of nodes, so that replicas live in different racks or different data centres.
  • 30. Replication: durability. For a durable system, we need to make sure the data is replicated on at least 2 nodes before confirming the write to the client. The number of nodes that must acknowledge a write is called the write quorum, and in many systems it can be configured per operation. Not all data are equally important, and not all systems have the same read/write ratio, so systems can be configured to be “always writable” or “always readable”.
  • 31. Conflicts: I see a record that I thought was deleted; I created a record but cannot see it; I have different values on two nodes; something should be unique, but it's not.
  • 32. No-conflict strategies: quorum-based consensus systems (Paxos, Raft) require coordinating processes, with leader elections and consensus, so latency is worse. Last Write Wins (LWW) doesn't require coordination, so latency is good.
  • 33. Conflict resolution can be done at write time or at read time.
  • 34. Vector clocks: don't need synchronised time; there can be several versions of the same item; need consolidation to prune their size; usually the client needs to fix the conflict and write back the resolved version.
  • 36. Gossip: a centralised server is a SPOF, and communicating state with every node is very time-consuming and doesn't support partitions. Gossip protocols make pairs of random nodes communicate at regular, frequent intervals and exchange information. Based on that exchange, a new status is agreed upon.
  • 38. Incremental scalability: when a new node enters the system, the rest of the nodes notice via gossip. The new node claims a partition of the ring and asks the replicas of that partition to send it the data. When the rest of the nodes decide (after gossiping) that a node has left the system and it's not a temporary failure, the data assigned to that node's partitions is copied to more replicas to reach N copies again. The whole process is automatic and transparent.
  • 39. Operation under partition: hinted handoff. During a network partition, the current side of the partition may have fewer than W of the nodes responsible for a segment. In this case, the data is still replicated to W nodes, even if some of them aren't responsible for the segment. The data is kept with a “hint”, in a special area. Periodically, the server tries to contact the original destination and “hands off” the data to it.
  • 40. Anti-entropy: a system with handoffs can be chaotic and not very effective. Anti-entropy is implemented to make sure hints are handed off or synchronised to other nodes. It is usually achieved with Merkle trees, a hash-of-hashes structure that is very efficient for comparing differences between nodes.
  • 41. All these features mean your clients need to be aware of some of the system's internals.
  • 42. Clients must: know which nearby nodes are responsible for each segment of the ring, and hash keys locally**; be aware of when nodes become available or unavailable**; decide on durability; handle conflict resolution, unless using LWW. (**Some solutions offer a load-balancing proxy that hides this complexity from the client, at the cost of extra latency.)
  • 43. Now you know how it works: a system that can always work, even during network partitions; that scales out both reads and writes; on cheap, diverse commodity hardware; running close to your users (low latency); that can grow and shrink elastically and survive server failures.
  • 44. Extra level: build your own distributed database. Netflix Dynomite, built in C; Uber Ringpop, built in JavaScript.
  • 46. Q & A. Find related links at http://bit.ly/teowaki-distributed-systems (https://teams.teowaki.com/teams/javier-community/link-categories/distributed-systems). Cheers! Need help with cloud, distributed systems or big data? https://teowaki.com
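Slide 16 describes log-shipping replication. Below is a minimal in-memory sketch in Python, not the code of any particular database: the `Primary` and `Replica` classes and the `(op, key, value)` log format are assumptions for illustration. Writes only go to the primary; each replica replays the log from its own offset, so a replica that has not pulled yet serves stale data (replication lag).

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Primary:
    """The only node that accepts writes; every change is appended to an ordered log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # entries: (op, key, value)

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value
        self.log.append(("set", key, value))

    def delete(self, key: str) -> None:
        # Deletions are writes too, so they must be shipped to the replicas.
        self.data.pop(key, None)
        self.log.append(("del", key, None))


@dataclass
class Replica:
    """Read-only copy that replays the primary's log from the last offset it applied."""
    data: dict = field(default_factory=dict)
    offset: int = 0  # index of the next log entry to apply

    def pull(self, primary: Primary) -> int:
        """Apply all log entries not seen yet and return how many were applied."""
        new_entries = primary.log[self.offset:]
        for op, key, value in new_entries:
            if op == "set":
                self.data[key] = value
            else:
                self.data.pop(key, None)
        self.offset += len(new_entries)
        return len(new_entries)

    def read(self, key: str) -> Optional[Any]:
        return self.data.get(key)


primary = Primary()
fresh, lagging = Replica(), Replica()
primary.write("user:1", "Ana")
primary.write("user:2", "Bo")
fresh.pull(primary)    # both replicas apply the first two entries
lagging.pull(primary)
primary.delete("user:2")
fresh.pull(primary)    # only `fresh` sees the deletion; `lagging` still returns "Bo"
```

Every replica replays every entry of the log, which is why, as slide 17 says, adding replicas scales reads but not writes.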
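Slides 27 and 29, together with editor's note 16 on virtual nodes, can be sketched as one small consistent-hashing ring. Assumptions: MD5 as the ring hash, 16 virtual nodes per physical node, a `weight` parameter for bigger machines, and a hypothetical node-to-rack map. Walking clockwise from the key's position yields a preference list: the first node owns the key and the following ones, each from a different rack, hold the replicas. Dynamo, Cassandra and Riak each differ from this in the details.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on the ring: the MD5 digest read as a 128-bit integer."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, vnodes_per_node: int = 16):
        self.vnodes_per_node = vnodes_per_node
        self._positions: list[int] = []   # sorted virtual-node positions
        self._owner: dict[int, str] = {}  # virtual-node position -> physical node
        self.rack: dict[str, str] = {}    # physical node -> rack (hypothetical topology)

    def add_node(self, node: str, rack: str, weight: int = 1) -> None:
        """Place weight * vnodes_per_node virtual nodes; bigger machines get a bigger weight."""
        self.rack[node] = rack
        for i in range(self.vnodes_per_node * weight):
            pos = ring_hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        """Walk clockwise from the key's position and pick n nodes, each in a different rack."""
        if n > len(set(self.rack.values())):
            raise ValueError(f"need at least {n} racks to place {n} replicas")
        start = bisect.bisect_right(self._positions, ring_hash(key))
        chosen: list[str] = []
        used_racks: set[str] = set()
        for step in range(len(self._positions)):
            node = self._owner[self._positions[(start + step) % len(self._positions)]]
            if self.rack[node] not in used_racks:
                chosen.append(node)
                used_racks.add(self.rack[node])
                if len(chosen) == n:
                    break
        return chosen


ring = HashRing()
for node, rack in [("db1", "rack-a"), ("db2", "rack-a"), ("db3", "rack-b"), ("db4", "rack-c")]:
    ring.add_node(node, rack)
owners = ring.preference_list("user:42", n=3)  # owners[0] owns the key, the rest are replicas
```

Because db3 and db4 are alone in their racks, any 3-replica preference list must include both of them, plus only one of db1 and db2.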
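The durability discussion on slide 30 is usually framed with three numbers: N replicas, a write quorum W and a read quorum R. The sketch below encodes the common Dynamo-style rule that R + W > N guarantees every read quorum overlaps every write quorum, and double-checks it by brute force. The function names and the labels in `profile` are made up for illustration, and the rule ignores sloppy quorums such as the hinted handoff on slide 39.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Dynamo-style rule: every read quorum meets every write quorum iff R + W > N."""
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Check the same property the slow way, trying every read set against every write set."""
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(range(n), w)
        for read_set in combinations(range(n), r)
    )


def profile(n: int, r: int, w: int) -> str:
    """Summarise a configuration in the terms used on slide 30."""
    if w == 1:
        mode = "always writable"   # a single ack confirms the write
    elif r == 1:
        mode = "always readable"   # any single replica can answer
    else:
        mode = "balanced"
    durable = "durable" if w >= 2 else "not durable"  # slide 30: at least 2 copies before confirming
    fresh = "reads overlap writes" if quorums_overlap(n, r, w) else "reads may be stale"
    return f"N={n} R={r} W={w}: {mode}, {durable}, {fresh}"
```

For example, `profile(3, 2, 2)` describes the classic N=3 setup, while `profile(3, 1, 1)` trades both durability and freshness for availability.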
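Last Write Wins from slide 32 fits in a few lines. This sketch assumes every write carries the writer's wall-clock timestamp plus a node id used as a deterministic tie-breaker; it is one common way of doing LWW, not a specific product's implementation.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any


@dataclass(frozen=True)
class Versioned:
    value: Any
    timestamp: float  # writer's wall-clock time: clock skew can pick the "wrong" winner
    node_id: str      # deterministic tie-breaker when timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the version with the highest (timestamp, node_id); the other is silently dropped."""
    return a if (a.timestamp, a.node_id) >= (b.timestamp, b.node_id) else b


# Two clients update the same key concurrently through different nodes...
from_n1 = Versioned("blue", timestamp=1000.0, node_id="n1")
from_n2 = Versioned("green", timestamp=1000.0, node_id="n2")
# ...and a later write arrives at a third node.
later = Versioned("red", timestamp=1001.5, node_id="n3")

# Replicas can merge in any order and still converge on the same value.
winner = reduce(lww_merge, [from_n1, from_n2, later])
```

The merge is order-independent, so replicas converge without coordinating, but the concurrent `"blue"` write is discarded without any error, and with skewed clocks an older write can win.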
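Slide 34 in code: a sketch of vector clocks as plain `dict`s mapping a node id to a counter. The function names are illustrative, and pruning old entries (the "consolidation" on the slide) is left out. Two versions are concurrent when neither clock descends from the other; that is exactly the case the client has to resolve and write back.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of writes coordinated by that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with `node`'s counter bumped (done on every write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every event in `b` (component-wise a >= b)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    a_has_b, b_has_a = descends(a, b), descends(b, a)
    if a_has_b and b_has_a:
        return "equal"
    if a_has_b:
        return "a newer"
    if b_has_a:
        return "b newer"
    return "concurrent"  # siblings: the client must resolve the conflict


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum, used when writing back a resolved value."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


v0 = increment({}, "A")                   # first write, coordinated by node A
v1 = increment(v0, "A")                   # client 1 updates through A
v2 = increment(v0, "B")                   # client 2 updates through B at the same time
resolved = increment(merge(v1, v2), "A")  # a client reconciles both and writes back via A
```

No synchronised time is needed: `v1` and `v2` are detected as concurrent purely from their counters, and `resolved` supersedes both.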
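A toy simulation of the gossip exchange on slide 36. Assumptions: push-pull gossip in synchronous rounds, a single hypothetical seed node (`node-0`) that everyone knows at startup, and membership tables that map a node name to the highest heartbeat counter heard for it. Nothing here detects failures yet; a real failure detector would also suspect a member whose heartbeat stops increasing.

```python
import random


class GossipNode:
    def __init__(self, name: str, seeds: list[str]):
        self.name = name
        # Membership table: node name -> highest heartbeat counter heard for it.
        self.members: dict[str, int] = {name: 0}
        for seed in seeds:
            self.members.setdefault(seed, 0)

    def tick(self) -> None:
        """Bump our own heartbeat so that peers can see we are still alive."""
        self.members[self.name] += 1

    def merge(self, remote: dict[str, int]) -> None:
        """Keep the freshest heartbeat for every node either side knows about."""
        for node, beat in remote.items():
            if beat > self.members.get(node, -1):
                self.members[node] = beat


def gossip_round(cluster: dict[str, GossipNode], rng: random.Random) -> None:
    for node in cluster.values():
        node.tick()
        peers = [name for name in node.members if name != node.name]
        if not peers:
            continue  # knows nobody yet; someone will contact it
        peer = cluster[rng.choice(peers)]
        node.merge(peer.members)  # pull...
        peer.merge(node.members)  # ...and push


def converged(cluster: dict[str, GossipNode]) -> bool:
    """Every node knows about every other node."""
    everyone = set(cluster)
    return all(set(node.members) == everyone for node in cluster.values())


rng = random.Random(7)  # fixed seed so the run is repeatable
names = [f"node-{i}" for i in range(16)]
cluster = {name: GossipNode(name, seeds=["node-0"]) for name in names}
rounds = 0
while not converged(cluster) and rounds < 50:
    gossip_round(cluster, rng)
    rounds += 1
```

No node ever talks to all the others, yet membership spreads to the whole cluster within a few rounds, and there is no central server to lose.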
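Slide 38 relies on a property of consistent hashing that a quick experiment can show: when a node joins, only the keys that now fall on its ring positions change owner, and they all move to the new node. A sketch assuming MD5 and 64 virtual nodes per server (both arbitrary choices), compared against naive `hash % number_of_nodes` placement.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes: list[str], vnodes: int = 64) -> tuple[list[int], dict[int, str]]:
    """Sorted virtual-node positions plus a position -> physical node map."""
    owner = {ring_hash(f"{node}#{i}"): node for node in nodes for i in range(vnodes)}
    return sorted(owner), owner


def owner_of(key: str, ring: tuple[list[int], dict[int, str]]) -> str:
    """First virtual node clockwise from the key's position (wrapping around)."""
    positions, owner = ring
    idx = bisect.bisect_right(positions, ring_hash(key)) % len(positions)
    return owner[positions[idx]]


keys = [f"user:{i}" for i in range(10_000)]
before = build_ring(["db1", "db2", "db3", "db4"])
after = build_ring(["db1", "db2", "db3", "db4", "db5"])  # db5 joins

moved = [k for k in keys if owner_of(k, before) != owner_of(k, after)]
moved_fraction = len(moved) / len(keys)

# For contrast: naive modulo placement reshuffles most keys when a node is added.
naive_moved = sum(ring_hash(k) % 4 != ring_hash(k) % 5 for k in keys) / len(keys)
```

With the ring, the moved fraction should be around one fifth, all of it going to db5; with modulo placement roughly four fifths of the keys change owner, which is what makes automatic, incremental growth impractical.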
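A sketch of the hinted handoff described on slide 39, in the style of Dynamo's sloppy quorum. The `Node` class, the `fallbacks` list and the in-memory `hints` list (standing in for the "special area") are assumptions for illustration; a real system persists hints and runs the handoff on a timer.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(eq=False)
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)  # the "special area": (intended_owner, key, value)


def write(key: str, value: Any, preference_list: list, fallbacks: list, w: int) -> int:
    """Write to responsible nodes that are up; for each one that is down, use a fallback plus a hint."""
    spare = [node for node in fallbacks if node.up]
    acks = 0
    for owner in preference_list:
        if owner.up:
            owner.data[key] = value
            acks += 1
        elif spare:
            stand_in = spare.pop(0)
            stand_in.hints.append((owner, key, value))
            acks += 1
    if acks < w:
        # This sketch does not roll back the copies already written.
        raise RuntimeError(f"only {acks} acks, write quorum W={w} not reached")
    return acks


def hand_off(node: Node) -> int:
    """Run periodically: deliver hints whose intended owner is reachable again."""
    still_pending = []
    for owner, key, value in node.hints:
        if owner.up:
            owner.data[key] = value
        else:
            still_pending.append((owner, key, value))
    delivered = len(node.hints) - len(still_pending)
    node.hints = still_pending
    return delivered


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
c.up = False  # c is on the other side of the partition
acks = write("cart:7", ["book"], preference_list=[a, b, c], fallbacks=[d], w=3)
```

The write still reaches W=3 nodes, with `d` holding c's copy as a hint; once c is reachable again, `hand_off(d)` delivers it and empties the hint list.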
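The Merkle-tree comparison on slide 40, as a sketch. Each replica hashes its keys into a fixed, power-of-two number of buckets (an assumption for simplicity; real systems typically build one tree per key range of the ring), builds a binary tree of hashes, and two replicas walk their trees from the root, descending only where hashes differ. Only the differing buckets need to be exchanged, instead of the whole data set.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, buckets: int) -> int:
    """Stable bucket for a key, so every replica groups keys the same way."""
    return int.from_bytes(h(key.encode("utf-8"))[:4], "big") % buckets


def leaf_hashes(store: dict[str, str], buckets: int = 16) -> list[bytes]:
    """One hash per bucket of key/value pairs; `buckets` must be a power of two."""
    if buckets <= 0 or buckets & (buckets - 1):
        raise ValueError("buckets must be a power of two")
    groups: list[list[str]] = [[] for _ in range(buckets)]
    for key in sorted(store):
        groups[bucket_of(key, buckets)].append(f"{key}={store[key]}")
    return [h("\n".join(group).encode("utf-8")) for group in groups]


def build_tree(leaves: list[bytes]) -> list[list[bytes]]:
    """Levels of the hash-of-hashes tree, from the leaves (index 0) up to the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([h(below[i] + below[i + 1]) for i in range(0, len(below), 2)])
    return levels


def differing_buckets(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Descend from the root, only into subtrees whose hashes differ."""
    top = len(a) - 1
    frontier = [0] if a[top][0] != b[top][0] else []
    for level in range(top, 0, -1):
        frontier = [child
                    for parent in frontier
                    for child in (2 * parent, 2 * parent + 1)
                    if a[level - 1][child] != b[level - 1][child]]
    return frontier


replica_a = {f"k{i}": "v1" for i in range(100)}
replica_b = dict(replica_a)
replica_b["k42"] = "v0"  # replica b missed an update to k42
tree_a, tree_b = build_tree(leaf_hashes(replica_a)), build_tree(leaf_hashes(replica_b))
to_sync = differing_buckets(tree_a, tree_b)  # only these buckets need to be exchanged
```

Comparing the two roots costs a single hash; when they match, the replicas are in sync and nothing else is sent.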

Editor's Notes

  1. 2011
  2. in 2011
  3. A squirrel did take out half of our Santa Clara data centre two years back (Mike Christian, Yahoo Director of Engineering, at a conference in 2012). That's the reason why Google wraps submarine fibre cables in Kevlar, so that shark bites won't damage them.
  4. Rackspace was taken down when a truck driver had an accident during a delivery to the data centre.
  5. Hurricanes, truck drivers, sharks biting transoceanic cables and, of course, electronic and mechanical failures, human error, and malicious attacks.
  6. Starbucks customers couldn't buy any coffee for a whole morning. Tinder users temporarily lost their matches for a few hours. Twilio did well. Netflix had a few problems in the past, but now they are awesome.
  7. Of course, this doesn't give you high availability, but it at least protects you from data loss to an extent (depending on your backup practices).
  8. Frequently used not only in relational databases, but in every kind of distributed system. Redis, when configured as master-slave, works in a very similar way too.
  9. So the more writes you have, the busier all of your servers will be. When I say “write”, I mean updates and deletions too. Recovery is not fully automatic and, at best, requires some extra coordination.
  10. OrientDB is quite good, so I put it under distributed databases. The more writes you have, the more load on the whole system. Also, usually all the data lives on all the servers, and that simply doesn't scale. Netflix: several thousand Cassandra nodes. Facebook: several tens of thousands of nodes for analytics.
  11. Cheap hardware: it's important to support heterogeneous hardware! Otherwise it's very difficult to scale like Netflix (several thousand Cassandra nodes) or Facebook (several tens of thousands of nodes for analytics).
  12. Forget about: flexible queries; table designs where everything can be queried no matter what (even if slowly); transactions; strong consistency; delegating all the complexity to the servers.
  13. Eventually consistent Eric Brewer
  14. You know some of the names of relational, traditional, non-distributed databases: MySQL, MariaDB, Oracle, PostgreSQL, SQL Server, IBM DB2, SQLite, SAP HANA.
  15. The Amazon Dynamo paper and the Google BigTable paper are behind many of the concepts of modern distributed databases, together with the work of Leslie Lamport, the creator of LaTeX and a member of Microsoft Research. There is a new generation of systems based on the Google Spanner paper.
  16. Some systems let you define virtual nodes, so a physical node actually contains several nodes. That's one way of allowing heterogeneity in the system.
  17. Parameters W and R can also be configured as LOCAL_QUORUM, so they need agreement only from local nodes and not across data centres. By combining a global quorum for writes and a local quorum for reads, Netflix gets 500 ms from the time it writes in one region until the data can be read from another, while keeping reads very fast.
  18. Usually due to load balancing, concurrency, or network partitions.
  19. Riak: CRDTs.
  20. Systems based on gossip for membership and liveness can be extended by adding extra monitoring information. This solution is used, for example, at CERN to monitor grids of thousands of nodes and their memory/CPU usage. Amazon Dynamo uses gossip to send ring distribution information, as well as to detect disconnected, failed, or new nodes.
  21. Adding more than one node at a time is tricky
  22. Cheap hardware: it's important to support heterogeneous hardware! Netflix: several thousand Cassandra nodes. Facebook: several tens of thousands of nodes for analytics.
  23. Netflix performance: Chaos Monkey, and 500 ms for cross-region recovery. Of course, you can always read the source of any open-source solution, but it's easier to plug in a generic ring/membership library and extend it.