Many users set the replication strategy on their keyspaces to NetworkTopologyStrategy and move on with modeling their data or developing the next big application. But what does that replication strategy really mean? Let's explore replication and consistency in Cassandra.
How are replicas chosen?
Where does node topology (location in a cluster) come into play?
What can I expect when nodes are down and I'm querying with a consistency level of LOCAL_QUORUM?
If a rack goes down, can I still respond to quorum queries?
These questions may be simple to test, but have nuances that should be understood. This talk will dive into these topics in a visual and technical manner. Seasoned Cassandra veterans and new users alike stand to gain knowledge about these critical Cassandra components.
About the Speaker
Christopher Bradford Solutions Architect, DataStax
High performance drives Christopher Bradford. He has worked across various industries including the federal government, higher education, social news syndication, low-latency HD video delivery and usability research. Chris combines application engineering principles and systems administration experience to design and implement performant systems. He has architected applications and systems to create highly available, fault-tolerant, distributed services in a myriad of environments.
He has contributed to multiple open source projects around distributed computing.
Today we are talking about Consistency & Replication. We want to dive into how these key components are implemented in Cassandra and what they mean to developers and operations teams.
We’ll start with the CAP theorem. The CAP theorem says that it is impossible for a distributed system to simultaneously provide all three of the following guarantees.
Autumn of 1998
Consistency - Every read receives the most recent write or an error
Availability - Every request receives a response
Note the data returned is not guaranteed to be the most recent
Wikipedia says “due to network failures”; given my week, it is due to disk controller errors.
Partition Tolerance – the system continues to operate despite arbitrary network partitioning
Eric Brewer, creator of the CAP theorem, argues that the pick 2 statement was a bit misleading. Really system designers need to choose where to be flexible in the case of a partition.
Some systems prefer consistency, while Cassandra focuses on availability.
Consistency may be flexible to meet the needs of the application!
How are availability and partition tolerance achieved? Through replication
Let’s look at how replication affects availability and partition tolerance. Assume in this environment we have a replication factor of 3. This means every piece of data written to our cluster is stored 3 times.
The client issues a write to a node in the cluster. This node is known as the coordinator for the duration of this request.
The coordinator hashes the PARTITION KEY portion of the PRIMARY KEY and uses a REPLICATION STRATEGY to determine where the replicas live and forwards the request to those nodes.
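The hash-then-walk step can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the real default partitioner is Murmur3, and md5 merely stands in as a consistent hash (closer in spirit to the older RandomPartitioner).

```python
import bisect
import hashlib

def token_for(partition_key: str) -> int:
    """Hash the partition key to a position on the token ring."""
    return int(hashlib.md5(partition_key.encode()).hexdigest(), 16)

def replicas_for(partition_key: str, ring: list, rf: int) -> list:
    """Walk the ring clockwise from the key's token, collecting rf nodes.

    ring: list of (token, node_name) pairs.
    """
    ring = sorted(ring)
    tokens = [token for token, _ in ring]
    # First node whose token is at or past the key's token, wrapping around.
    start = bisect.bisect_right(tokens, token_for(partition_key)) % len(ring)
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]

ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]
print(replicas_for("alice", ring, 3))
```

The coordinator then forwards the mutation to each node in the returned list.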
When one of the replicas is down, the coordinator still writes to the other two replicas. This satisfies the availability portion of the CAP theorem.
A hint is then stored on the coordinator. Should the replica that was down come back up within a certain time window (default: 3 hours), the hint will be replayed against that host. This helps get the node back in sync with the rest of the cluster. This satisfies the partition tolerance portion of the CAP theorem.
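The hint mechanism can be sketched like this. A toy model only; class and parameter names are made up for illustration, and real hints are durably stored and replayed by a background process.

```python
import time

MAX_HINT_WINDOW = 3 * 60 * 60  # seconds; mirrors Cassandra's 3-hour default

class Coordinator:
    """Toy model of hinted handoff, not the real implementation."""

    def __init__(self):
        self.hints = {}  # node name -> list of (stored_at, mutation)

    def write(self, mutation, replicas, stores, up):
        """Apply the mutation to live replicas; store hints for down ones."""
        for node in replicas:
            if node in up:
                stores[node].append(mutation)
            else:
                self.hints.setdefault(node, []).append((time.time(), mutation))

    def replay_hints(self, node, store):
        """When a node comes back, replay any hints still inside the window."""
        for stored_at, mutation in self.hints.pop(node, []):
            if time.time() - stored_at <= MAX_HINT_WINDOW:
                store.append(mutation)
```

Hints that outlive the window are simply dropped, which is why expired hints call for a repair (more on that later in the anti-entropy section).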
Reads behave in a similar manner (although without the need for a hint).
The coordinator is aware that one of the replicas is down and responds with data from the two replicas that are still available.
Replication is defined at the keyspace level. The number of replicas and placement strategy are supplied during keyspace creation.
The replication strategy is specified with the class parameter. Other parameters may also be required, but they are tied to the strategy being used.
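In CQL that looks like the following (keyspace and datacenter names are illustrative):

```sql
-- SimpleStrategy takes a single cluster-wide replica count.
CREATE KEYSPACE simple_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- NetworkTopologyStrategy takes a replica count per datacenter instead.
CREATE KEYSPACE topo_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};
```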
Cassandra ships with a few replication strategies. Let’s look at those now.
The SimpleStrategy is just that: simple. The replicas for a particular piece of data are placed on consecutive nodes, circling the ring in order.
If the token hashes differently, it may look like this.
WARNING: Be very careful with SimpleStrategy in multi-DC environments. Depending on the consistency level and load balancing policy, writes may throw exceptions because hosts are not necessarily routable.
NetworkTopologyStrategy layers additional logic onto the replica selection process. It uses topology information to place replicas in a manner that ensures an even higher level of reliability.
In our first example, let’s assume every node is in the same DC and physical rack. The replica selection process looks similar to SimpleStrategy’s; there is nothing too intense here.
Note that you must use an appropriate snitch. The snitch defines how topology information is shared in the cluster. The GossipingPropertyFileSnitch uses the cassandra-rackdc.properties file to share topology info with peers, whereas the SimpleSnitch uses hardcoded values.
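For reference, the cassandra-rackdc.properties file on each node is tiny (the dc and rack values here are examples):

```properties
# cassandra-rackdc.properties, read by GossipingPropertyFileSnitch
# and gossiped to the rest of the cluster
dc=dc1
rack=rack1
```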
In this example we have two racks, the aptly named rack 1 and rack 2. Now NetworkTopologyStrategy will try to protect against an entire rack failure by placing replicas on different racks. In fact, it won’t even place a second replica on a node in the same rack until all other racks have been exhausted.
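The rack-skipping walk can be sketched as follows. A toy model of the idea only, with hypothetical node and rack names; the real strategy also handles datacenters and vnodes.

```python
def rack_aware_replicas(ring, racks, start, rf):
    """Toy model of NetworkTopologyStrategy's rack logic.

    ring:  node names in clockwise token order
    racks: node name -> rack name
    Walk clockwise from `start`; take a node only if its rack is unseen,
    falling back to the skipped nodes once every rack is represented.
    """
    chosen, skipped, seen = [], [], set()
    visited = 0
    while len(chosen) < rf and visited < len(ring):
        node = ring[(start + visited) % len(ring)]
        visited += 1
        if racks[node] not in seen:
            chosen.append(node)
            seen.add(racks[node])
        else:
            skipped.append(node)
    # All racks represented (or ring exhausted): fill remaining slots in
    # ring order from the nodes we skipped earlier.
    chosen.extend(skipped[: rf - len(chosen)])
    return chosen
```

With two racks and RF=3, the first two replicas always land on different racks; only the third reuses a rack.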
In this example we have added a third rack. Now one replica will be placed on each of the racks. This will lead to an imbalance in the cluster: the single node in rack 1 will contain 100% of the data in the cluster.
Remember how we discussed that the CAP theorem has evolved over time? This flexibility is clearly at the heart of Cassandra. Consistency can be tuned on a per-query basis to enhance performance or to strengthen correctness.
We call this tunable consistency. Consistency can even be tuned depending on cluster health to ensure application availability. Let’s look at some of these levels.
The consistency level is used to determine the number of replicas that must respond for any given query. Some levels are unique to the type of query being performed (read vs write). Others span datacenters and are not expected to be extremely performant.
Discuss advantages of each: speed, correctness, failure tolerance.
Discuss advantages of each: speed, correctness, failure tolerance. Note that even though we may return after one replica acknowledges the write, we still attempt the write on the other replicas. Bring up immediate consistency.
An exception is thrown! At this point you can alert the user to a potential issue or try the request with a different consistency level. This process may be automated with a RetryPolicy.
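A downgrading retry can be sketched like this. Everything here is hypothetical scaffolding: `execute` stands in for a driver call taking a consistency-level keyword, and `UnavailableError` stands in for the driver exception raised when too few replicas are alive. Real drivers expose this behavior through a RetryPolicy rather than an application-level loop.

```python
class UnavailableError(Exception):
    """Stand-in for the driver error raised when too few replicas are up."""

def query_with_fallback(execute, statement, levels=("QUORUM", "ONE")):
    """Try each consistency level in order; re-raise the last failure."""
    last_error = None
    for level in levels:
        try:
            return execute(statement, consistency_level=level)
        except UnavailableError as exc:
            last_error = exc
    raise last_error
```

Be deliberate about when you downgrade: a read that succeeds at ONE may return stale data, which is exactly the trade-off you are opting into.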
This is a big win for your application as you can stay available for users while alerting them that there may be a slight delay on changes.
Cassandra has a number of anti-entropy processes to ensure nodes respond with the latest information.
We were first introduced to this earlier with hinted handoffs.
Should hints expire, a manual repair of the node will synchronize it with its neighbors.
There is also a read repair system in place. This takes the data returned by multiple replicas, compares them, and returns the latest value. A write is also issued which fixes any replicas that are out of date.
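The resolution step can be sketched as follows. A toy model: in reality the comparison happens per column using write timestamps, and the coordinator issues the repair mutation itself.

```python
def resolve_and_repair(responses):
    """Toy model of read repair.

    responses: node name -> (value, write_timestamp)
    Returns the newest value plus the replicas holding stale data,
    which the coordinator would then fix with a write.
    """
    _, (value, newest) = max(responses.items(), key=lambda kv: kv[1][1])
    stale = [node for node, (_, ts) in responses.items() if ts < newest]
    return value, stale
```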
This was taken a step further by Netflix with the Cassandra tickler. This process performs a read at the ALL consistency level across every primary key value, forcing replicas to be brought in sync.
Replication
Controls where copies live
Set on the keyspace level
Imperative during both availability and partition situations
Consistency
Dictates trade-offs between performance and correctness
Achieves synchronization of replicas
Consistency levels
Both are core building blocks of Cassandra. Understand their usage and don’t be afraid to jump into the code. The classes that make this up are fairly simple and easy to reason about.