Cassandra Insider
By:
Bhavya Aggarwal
Manjot Kaur
CONTENTS
● Why NoSQL
● Features of Cassandra
● Gossip Protocol
● Data Distribution in Cassandra
● Write Path
● Read Path
WHY NOSQL
● Within corporations, around 80% of data is
unstructured.
● Availability and Scalability issues with RDBMS.
● NoSQL databases offer horizontal scalability and high
availability, in some cases at the cost of strong
consistency and ACID semantics.
CASSANDRA
● Apache Cassandra is a massively scalable
NoSQL database.
Big Companies Using Cassandra
More than 30,000 companies use (or have used)
Apache Cassandra in production.
FEATURES
● Distributed
● Decentralized
● Linear scalability
● Tunable consistency
Distributed
Distributed, i.e., capable of running on multiple
machines while appearing to users as a unified
whole.
Decentralized
● Decentralized, i.e., every node is identical.
● There is no single point of failure.
Linear Scalability
It means that throughput grows roughly in proportion to
the number of nodes, so your cluster can seamlessly
scale up and scale back down.
Tunable Consistency
You can choose between strong and weak (eventual)
consistency in Cassandra, per operation, with the help
of the Replication Factor and Consistency Level.
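A common way to reason about this trade-off, sketched here under the assumption that a read waits for R replicas and a write waits for W replicas out of RF: reads are guaranteed to see the latest acknowledged write when R + W > RF. The function name is hypothetical and is not part of any Cassandra API.

```python
def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """Return True when every read set must overlap every write set.

    Illustrative helper (not a Cassandra API): with R replicas answering a
    read and W replicas acknowledging a write, R + W > RF guarantees that at
    least one replica in any read set holds the latest acknowledged write.
    """
    if not 1 <= read_replicas <= replication_factor:
        raise ValueError("read_replicas must be between 1 and RF")
    if not 1 <= write_replicas <= replication_factor:
        raise ValueError("write_replicas must be between 1 and RF")
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    quorum = rf // 2 + 1  # 2 when RF = 3
    # Quorum reads combined with quorum writes overlap on at least one replica.
    print(is_strongly_consistent(quorum, quorum, rf))
    # Single-replica reads and writes may miss each other.
    print(is_strongly_consistent(1, 1, rf))
```

Lower values of R and W favor latency and availability; higher values favor consistency.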
Brewer’s CAP Theorem
Cassandra vs RDBMS
Feature             Cassandra  RDBMS
ACID                no         yes
Foreign Keys        no         yes
Joins               no         yes
Secondary Indexes   yes        yes
Distributed         yes        no
Linear Scalability  yes        no
Fault Tolerance     yes        no
Cassandra Architecture
In Cassandra, all the nodes are identical.
A Cassandra cluster has no special nodes, i.e., the
cluster has no masters, no slaves, and no elected leaders.
Cassandra cluster
Cassandra supports a masterless ring architecture.
Tracking Nodes
Let's see how Cassandra keeps track of the nodes in a
cluster.
● Gossip Protocol
● Snitches
Gossip protocol
A node (the initiator) in a cluster randomly chooses
another node (the peer) to gossip with.
It sends the metadata it has about itself and other
nodes in the cluster.
It receives the metadata/updates that the other node has.
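The exchange described above can be sketched as a small simulation. This is a simplified model, not Cassandra's Gossiper: each node keeps a hypothetical "view" mapping node names to the highest heartbeat version it has seen, and one gossip exchange merges the views of the initiator and its randomly chosen peer. All names below are assumptions made for illustration.

```python
import random
from typing import Dict, List

# Each node's view maps a node name to the highest heartbeat version seen.
View = Dict[str, int]


def merge(a: View, b: View) -> View:
    """Combine two views, keeping the higher version for every node."""
    return {name: max(a.get(name, 0), b.get(name, 0)) for name in a.keys() | b.keys()}


def gossip_round(views: Dict[str, View], rng: random.Random) -> None:
    """Every node bumps its own heartbeat, then exchanges state with one random peer."""
    names: List[str] = sorted(views)
    for initiator in names:
        views[initiator][initiator] += 1
        peer = rng.choice([n for n in names if n != initiator])
        combined = merge(views[initiator], views[peer])
        # Both sides end the exchange with the same merged metadata.
        views[initiator] = dict(combined)
        views[peer] = dict(combined)


def rounds_until_everyone_known(node_count: int, seed: int = 0,
                                max_rounds: int = 1000) -> int:
    """Run gossip rounds until every node has heard of every other node."""
    rng = random.Random(seed)
    views = {f"node{i}": {f"node{i}": 0} for i in range(node_count)}
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == node_count for view in views.values()):
            return round_number
    raise RuntimeError("views did not converge")


if __name__ == "__main__":
    print(rounds_until_everyone_known(8, seed=42))
```

Even though each node talks to only one peer per round in this sketch, knowledge of all nodes spreads through the cluster within a few rounds.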
Main points
● Every second, each node gossips with up to three
other nodes in the cluster, not with every node.
● The Gossiper class maintains lists of the nodes that
are alive and dead.
● The gossiper runs every second on a timer on
every node of a cluster.
3 Way Handshake
Snitches
The job of a snitch is to determine relative host
proximity for each node in a cluster, which is used to
determine which nodes to read from and write to.
Example: Snitch in Read Operation
While reading data, Cassandra must contact a number
of replicas determined by the consistency level. For
fast read operations, it selects a single replica to
query for the full object and requests only hash values
(digests) from the others, in order to ensure that the
latest version of the requested data is returned.
The snitch finds the closest replica, and the coordinator
node queries it for the full data.
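A minimal sketch of that read, assuming replicas are plain in-memory values and the snitch is reduced to a hypothetical proximity score (lower means closer). The function names and the use of SHA-256 for the digest are assumptions for illustration, not Cassandra internals.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(value: str) -> str:
    """Hash a value so replicas can be compared without shipping the full data."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def coordinate_read(replicas: Dict[str, str], proximity: Dict[str, int],
                    required: int) -> Tuple[str, bool]:
    """Read from the `required` closest replicas.

    replicas  maps replica name -> stored value (in-memory stand-in)
    proximity maps replica name -> distance score, lower is closer
              (the kind of ordering a snitch provides)
    Returns the value from the closest replica and whether all digests matched.
    """
    closest_first: List[str] = sorted(replicas, key=lambda name: proximity[name])
    contacted = closest_first[:required]
    full_data = replicas[contacted[0]]  # full read from the closest replica
    expected = digest(full_data)
    # The remaining contacted replicas return only a digest of their copy.
    others_match = all(digest(replicas[name]) == expected for name in contacted[1:])
    return full_data, others_match


if __name__ == "__main__":
    stored = {"a": "v2", "b": "v2", "c": "v1"}
    distance = {"a": 2, "b": 1, "c": 3}
    print(coordinate_read(stored, distance, required=2))
    print(coordinate_read(stored, distance, required=3))
```

A digest mismatch tells the coordinator that at least one contacted replica holds a different version and that the replicas need to be reconciled.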
Data Distribution Across Nodes
● Tokens
● Partitioners
Single Token Architecture
Rings and Tokens
● Each node in the ring is assigned one or more
ranges of data, each described by a token, which
determines its position in the ring.
● With the default partitioner, a token is a 64-bit
integer ID (from -2^63 to 2^63-1) used to identify each
partition.
Partitioners
● A partitioner is a hash function for computing the
token of a partition key.
● Each row of data is distributed within the ring
according to the token that the partitioner computes
from its partition key; every node uses the same partitioner.
● Murmur3Partitioner is the default partitioner.
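To make the mapping from partition key to node concrete, here is a minimal sketch. It assumes a single-token ring stored as a dict from token to node name, and it uses SHA-256 truncated to 64 bits as a stand-in hash, since Murmur3 is not in Python's standard library. The function names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Stand-in hash: Cassandra's default partitioner uses Murmur3; SHA-256 is
    used here only because it ships with Python's standard library.
    """
    raw = hashlib.sha256(partition_key.encode("utf-8")).digest()[:8]
    return int.from_bytes(raw, byteorder="big", signed=True)


def owner_of(token: int, ring: Dict[int, str]) -> str:
    """Return the node owning `token`.

    A node with token T owns the range (previous token, T], so the owner is
    the first node token that is >= `token`, wrapping around the ring.
    """
    tokens: List[int] = sorted(ring)
    index = bisect.bisect_left(tokens, token)
    return ring[tokens[index % len(tokens)]]


if __name__ == "__main__":
    ring = {-100: "A", 0: "B", 100: "C"}
    print(owner_of(50, ring))    # falls in (0, 100]
    print(owner_of(101, ring))   # past the last token, wraps to the first node
    print(owner_of(token_for("user:42"), {-(2 ** 62): "A", 0: "B", 2 ** 62: "C"}))
```

Because every node applies the same hash, any node can act as coordinator and compute which node owns a given partition key.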
Virtual Nodes
● Cassandra’s 1.2 release introduced the concept of
virtual nodes (vnodes): instead of a single token,
each node is assigned many tokens, each covering a small range.
● By default in these releases, each node is assigned 256
tokens, meaning that it contains 256 virtual
nodes. (Cassandra 4.0 lowered the default to 16.)
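The effect of virtual nodes on data distribution can be explored with a small simulation. This sketch assumes tokens are drawn uniformly at random, which approximates but does not reproduce Cassandra's token allocation; `assign_tokens`, `ownership`, and `tokens_per_node` are hypothetical names, the last one loosely mirroring the `num_tokens` setting.

```python
import random
from typing import Dict, List

RING_SIZE = 2 ** 64        # number of possible 64-bit tokens
MIN_TOKEN = -(2 ** 63)


def assign_tokens(nodes: List[str], tokens_per_node: int, seed: int = 0) -> Dict[int, str]:
    """Give every node `tokens_per_node` random tokens (one per virtual node)."""
    rng = random.Random(seed)
    ring: Dict[int, str] = {}
    for node in nodes:
        assigned = 0
        while assigned < tokens_per_node:
            token = rng.randrange(MIN_TOKEN, -MIN_TOKEN)
            if token not in ring:  # skip the (very unlikely) collision
                ring[token] = node
                assigned += 1
    return ring


def ownership(ring: Dict[int, str]) -> Dict[str, float]:
    """Fraction of the token space owned by each node."""
    tokens = sorted(ring)
    shares: Dict[str, float] = {}
    for index, token in enumerate(tokens):
        previous = tokens[index - 1]  # index 0 wraps around to the last token
        # Size of the range (previous, token]; a single token owns the whole ring.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[ring[token]] = shares.get(ring[token], 0.0) + width / RING_SIZE
    return shares


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    for count in (1, 256):
        shares = ownership(assign_tokens(nodes, count))
        print(count, {name: round(share, 3) for name, share in sorted(shares.items())})
```

Running it with 1 and then 256 tokens per node lets you compare how evenly the token space is shared in each case.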
Vnode Ring Architecture
Advantages
● Tokens are generated automatically by Cassandra.
● Smaller token ranges per virtual node.
● Load is spread more evenly across nodes.
Replication Strategies
● Cassandra replicates data across nodes in a
manner transparent to the user, and the replication
factor is the number of nodes in your cluster that
will receive copies (replicas) of the same data.
● If your replication factor is 3, then three nodes in
the ring will have copies of each row.
Replication in SimpleStrategy
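A minimal sketch of SimpleStrategy-style placement, assuming the ring is a dict from token to node name: the first replica is the node that owns the token, and the remaining replicas are the next distinct nodes found walking clockwise, without regard to racks or data centers. The function name is hypothetical.

```python
import bisect
from typing import Dict, List


def replicas_for(token: int, ring: Dict[int, str], replication_factor: int) -> List[str]:
    """Walk the ring clockwise from `token`, collecting distinct nodes."""
    tokens = sorted(ring)
    start = bisect.bisect_left(tokens, token)  # first node token >= token
    chosen: List[str] = []
    for step in range(len(tokens)):
        node = ring[tokens[(start + step) % len(tokens)]]
        if node not in chosen:  # a node with several tokens counts once
            chosen.append(node)
        if len(chosen) == replication_factor:
            break
    return chosen


if __name__ == "__main__":
    ring = {-100: "A", 0: "B", 100: "C", 200: "D"}
    print(replicas_for(50, ring, 3))   # owner first, then the next nodes clockwise
    print(replicas_for(250, ring, 2))  # wraps around the end of the ring
```

If the replication factor exceeds the number of nodes, this sketch simply returns every node once.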
Consistency Levels
● For read queries, the consistency level specifies
how many replica nodes must respond to a read
request before returning the data.
● For write operations, the consistency level
specifies how many replica nodes must respond
for the write to be reported as successful to the
client.
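The number of responses a coordinator waits for can be sketched as a lookup. This assumes a single data center and covers only the ONE, TWO, THREE, QUORUM, and ALL levels; the helper names are hypothetical, not driver or server APIs.

```python
def required_responses(level: str, replication_factor: int) -> int:
    """Number of replica responses the coordinator waits for (single data center)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        needed = fixed[level]
    elif level == "QUORUM":
        needed = replication_factor // 2 + 1  # a strict majority of replicas
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    if needed > replication_factor:
        raise ValueError(f"{level} needs {needed} replicas but RF is {replication_factor}")
    return needed


def request_succeeds(responses: int, level: str, replication_factor: int) -> bool:
    """True when enough replicas responded to satisfy the consistency level."""
    return responses >= required_responses(level, replication_factor)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_responses(level, replication_factor=3))
```

With a replication factor of 3, QUORUM tolerates one unavailable replica while ALL tolerates none.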
A Write Request in Cassandra
Write Path in Cassandra
Interactions Within a Node
Hinted Handoff
Tombstones
When you execute a delete operation, the data is not
immediately deleted. Instead, it’s treated as an
update operation that places a tombstone on the
value. A tombstone is a deletion marker that is
required to suppress older data in SSTables until
compaction can run.
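The idea can be sketched with in-memory dictionaries standing in for SSTables. This is a simplified model: the `TOMBSTONE` sentinel and both function names are assumptions for illustration, and the compaction here drops tombstones right away, whereas Cassandra keeps them for a grace period first.

```python
from typing import Dict, List, Optional, Tuple

TOMBSTONE = object()  # sentinel standing in for a deletion marker

# An "SSTable" here is just a dict: key -> (write timestamp, value or TOMBSTONE)
SSTable = Dict[str, Tuple[int, object]]


def read(key: str, sstables: List[SSTable]) -> Optional[object]:
    """Return the newest value for `key`, or None if the newest entry is a tombstone."""
    newest: Optional[Tuple[int, object]] = None
    for table in sstables:
        entry = table.get(key)
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge tables, keep only the newest entry per key, and drop deleted keys.

    Simplification: real compaction keeps tombstones until a grace period has passed.
    """
    merged: SSTable = {}
    for table in sstables:
        for key, entry in table.items():
            if key not in merged or entry[0] > merged[key][0]:
                merged[key] = entry
    return {key: entry for key, entry in merged.items() if entry[1] is not TOMBSTONE}


if __name__ == "__main__":
    older = {"k": (1, "v1"), "j": (1, "x")}
    newer = {"k": (2, TOMBSTONE)}  # the delete is written as a newer entry
    print(read("k", [older, newer]))
    print(compact([older, newer]))
```

The older value for "k" still exists on disk until compaction, but the newer tombstone hides it from reads.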
READ PATH
Row cache and Key cache
Request flow
Bloom Filters
● Bloom filters condense a larger data set into a
compact digest (a bit array) using hash functions.
● The digests are stored in memory and are
used to improve performance by reducing the
need for disk access on key lookups.
● So a Bloom filter acts like a special kind of cache. When
a query is performed, the Bloom filter is checked
first before accessing disk. It can report false positives
but never false negatives, so a "not present" answer
safely skips the disk read.
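A minimal Bloom filter sketch, assuming a fixed-size bit array and a small number of hash positions derived from SHA-256. The class and its parameters are illustrative and are not Cassandra's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hash_count: int = 3) -> None:
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        # Derive `hash_count` bit positions by salting the key with a counter.
        for seed in range(self.hash_count):
            data = f"{seed}:{key}".encode("utf-8")
            yield int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> position) & 1 for position in self._positions(key))


if __name__ == "__main__":
    keys_on_disk = BloomFilter()
    for key in ("user:1", "user:2", "user:3"):
        keys_on_disk.add(key)
    print(keys_on_disk.might_contain("user:2"))
    print(keys_on_disk.might_contain("user:999"))
```

A lookup that returns False can skip the disk entirely; a lookup that returns True still has to check the data on disk.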
Compaction
Replica synchronization
Read repair refers to the synchronization of replicas
as data is read. While reading, if any replicas have
out-of-date values, a read repair is performed immediately
to update the out-of-date replicas.
Anti-entropy repair (manual repair) is a manually
initiated operation performed on nodes as part of a
regular maintenance process. This type of repair is
executed by running nodetool repair on a node, which
triggers a validation compaction: replicas build Merkle
trees of their data, compare them, and stream the ranges
that differ.
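Read repair can be sketched as follows, assuming each replica is an in-memory dict from key to a (timestamp, value) pair and that the newest timestamp wins. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Each replica stores key -> (write timestamp, value); in-memory stand-in.
Replica = Dict[str, Tuple[int, str]]


def read_with_repair(key: str, replicas: List[Replica]) -> Tuple[str, int]:
    """Return the newest value and the number of replicas that were repaired."""
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        raise KeyError(key)
    newest = max(versions, key=lambda version: version[0])
    repaired = 0
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # push the newest version to the stale replica
            repaired += 1
    return newest[1], repaired


if __name__ == "__main__":
    r1: Replica = {"k": (2, "new")}
    r2: Replica = {"k": (1, "old")}
    r3: Replica = {}
    print(read_with_repair("k", [r1, r2, r3]))  # stale and missing copies get updated
    print(read_with_repair("k", [r1, r2, r3]))  # nothing left to repair
```

Read repair only fixes data that is actually read, which is why the manual repair described above is still needed for rarely read data.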
Thank you


Cassandra Insider

  • 1. Cassandra Insider By : Bhavya Aggarwal Manjot kaur
  • 2. CONTENTS ● Why NoSQL ● Features of Cassandra ● Gossip Protocol ● Data Distribution in Cassandra ● Write Path ● Read Path
  • 3. WHY NOSQL ● Within corporations, around 80% of data is unstructured. ● Availability and Scalability issues with RDBMS. ● NoSQL dbs have horizontal scalability and high availability, in some cases at the cost of strong consistency and ACID semantics.
  • 4. CASSANDRA ● Apache Cassandra is a massively scalable NoSQL database.
  • 5. Big Companies using cassandra More than 30,000 Companies use(or have used) Apache Cassandra in Production.
  • 6. FEATURES ● Distributed ● Decentralized ● Linearly scalability ● Tunable consistency
  • 7. Distributed Distributed i.e. capable of running on multiple machines while appearing to users as a unified whole.
  • 8. Decentralized ● Decentralized i.e every node is identical ● There is no single point of failure.
  • 9. Linear Scalability It means that your cluster can seamlessly scale up and scale back down.
  • 10. Tunable Consistency You can have strict, weak or causal consistency in cassandra with the help of Replication Factor and Consistency Level.
  • 12. Cassandra vs RDBMS Cassandra RDBMS ACID ❌ yes Foreign Keys ❌ yes Joins ❌ yes Secondary Indexes yes yes Distributed yes ❌ Linear Scalability yes ❌ Fault Tolerance yes ❌
  • 13.
  • 14. Cassandra Architecture In cassandra all the nodes are identical. A Cassandra cluster has no special nodes i.e. the cluster has no masters, no slaves or elected leaders.
  • 15. Cassandra cluster Cassandra supports a masterless ring architecture.
  • 16. Tracking Nodes Lets see how cassandra keeps a track of nodes in a cluster. ● Gossip Protocol ● Snitches
  • 17. Gossip protocol A node/initiator in a cluster chooses a node/peer randomly to gossip with. Sends the metadata it has about itself and other nodes in the cluster. Receives metadata/updates that the other node has.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Main points ● Every node gossips with every other node in a cluster every second. ● The Gossiper class maintains a list of nodes that are alive and dead. ● The gossiper runs every second on a timer on every node of a cluster.
  • 26. Snitches The job of a snitch is to determine relative host proximity for each node in a cluster, which is used to determine which nodes to read and write from.
  • 27. Example: Snitch in Read Operation While reading data cassandra must contact a number of replicas determined by the consistency level. For fast read operations, it selects a single replica to query for the full object, and take hash values from others in order to ensure the latest version of the requested data is returned. Snitch finds the closest replica and the coordinator node queries it for full data.
  • 28. Example: Snitch in Read Operation
  • 29. Data Distribution Across Nodes ● Tokens ● Partitioners
  • 31. Rings and Tokens ● Each node in the ring is assigned one or more ranges of data described by a token, which determines its position in the ring. ● A token is a 64-bit integer ID used to identify each partition.
  • 32. Partitioners ● A partitioner, is a hash function for computing the token of a partition key. ● Each row of data is distributed within the ring according to the value of the partition key token calculated by the partitioner at every node. ● Murmur3Partitioner is the default partitioner.
  • 33. Virtual Nodes ● Cassandra’s 1.2 release introduced the concept of virtual nodes, instead of assigning a single token to a node, a range of tokens is assigned. ● By default, each node will be assigned 256 of these tokens, meaning that it contains 256 virtual nodes.
  • 35. Advantages ● Tokens are generated automatically by cassandra. ● Smaller Partitions. ● Less load on nodes.
  • 36. Replication Strategies ● Cassandra replicates data across nodes in a manner transparent to the user, and the replication factor is the number of nodes in your cluster that will receive copies (replicas) of the same data. ● If your replication factor is 3, then three nodes in the ring will have copies of each row.
  • 38. Consistency Levels ● For read queries, the consistency level specifies how many replica nodes must respond to a read request before returning the data. ● For write operations, the consistency level specifies how many replica nodes must respond for the write to be reported as successful to the client.
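The relationship between replication factor and consistency level can be made concrete with a little arithmetic. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The helper names below are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, QUORUM means 2 replicas. Reading and writing at QUORUM gives 2 + 2 > 3, so reads overlap writes, while reading and writing at ONE gives 1 + 1, which does not exceed 3, so a read may return stale data until the replicas converge.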
  • 39. A Write Request in Cassandra
  • 40. Write Path in Cassandra
  • 43. Tombstones When you execute a delete operation, the data is not immediately deleted. Instead, it’s treated as an update operation that places a tombstone on the value. A tombstone is a deletion marker that is required to suppress older data in SSTables until compaction can run.
  • 45. Row cache and Key cache Request flow
  • 46. Bloom Filters ● Bloom filters condense a larger data set into a compact bit array (a digest) using hash functions. ● The filters are stored in memory and are used to improve performance by reducing the need for disk access on key lookups. ● So a Bloom filter is a special kind of cache. When a query is performed, the Bloom filter is checked first before accessing disk. ● A Bloom filter can report false positives but never false negatives, so a negative answer means the disk lookup can safely be skipped.
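A minimal Bloom filter sketch shows why a negative answer is reliable and a positive one is only a hint. The class name, the default sizes, and the use of SHA-256 to derive bit positions are all assumptions for illustration; Cassandra's own filter implementation differs.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        """Derive `num_hashes` bit positions for a key by salting one hash function."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos] for pos in self._positions(key))
```

On a read, a node would consult such a filter per SSTable and go to disk only when `might_contain` returns `True`.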
  • 48. Replica synchronization Read repair refers to the synchronization of replicas as data is read. If any replicas return out-of-date values during a read, a read repair is performed immediately to update them. Anti-entropy repair (manual repair) is a manually initiated operation performed on nodes as part of a regular maintenance process. This type of repair is executed by running nodetool repair on a node, which triggers a validation compaction that builds Merkle trees so that replicas can be compared and their differences streamed between nodes.
  • 49. References ● https://docs.datastax.com/en/landing_page/doc/landing_ ● https://www.youtube.com/watch?v=FuP1Fvrv6ZQ ● https://www.youtube.com/watch?v=FNfiYJm1GJs&t=153 ● Cassandra: The Definitive Guide, 2nd Edition, O'Reilly.

Editor's notes

  1. Under high load, joins make our queries slow, so we tend to denormalize our tables.
  2. Big companies use Cassandra to manage their big data effectively. It started with Facebook's Inbox Search and became an Apache Incubator project in 2009.
  3. A cluster in Cassandra is a group of several nodes. A node is a Cassandra server, that is, a Cassandra instance that we run on a machine.
  4. There is no master-slave architecture in Cassandra and there are no special nodes: every node is the same and has similar responsibilities. No single point of failure means that if any node in the cluster fails, Cassandra's functionality (reads and writes) is not affected. Cassandra stores replicas on several nodes, so if a node fails, the data belonging to that node can still be retrieved.
  5. If we add nodes to our cluster, throughput increases linearly without degrading performance. Cassandra handles growing data loads gracefully.
  6. We set the replication factor per keyspace in Cassandra. Replication factor = how many replicas of our data we want in the system. Consistency can be set per read or write query.
  7. Cassandra provides partition tolerance and availability and is eventually consistent.
  8. A row is indexed by its partition key and, without a secondary index, can be searched only by partition key. We define the partition key when defining the table itself. We have to set a replication factor and a replication strategy for every keyspace in Cassandra.
  9. So how do nodes in Cassandra store information about other nodes in a cluster?
  10. A communication protocol
  11. Explain replicas for read and write path.
  12. The partitioner is present on every node of the cluster. The partition key token generated by the partitioner is compared to the token values of the various nodes to identify the range, and therefore the node, that owns the data. Token ranges are represented by the org.apache.cassandra.dht.Range class.
  13. Early versions of Cassandra assigned a single token to each node, in a fairly static manner, requiring you to calculate tokens for each node.
  14. To understand read and write paths we must understand Replication Strategies and consistency level.
  15. Use SimpleStrategy for a single data center only. If you ever intend to have more than one data center, use the NetworkTopologyStrategy.
  16. Because Cassandra is eventually consistent, updates to other replica nodes may continue in the background. ALL, QUORUM, ONE are some of the consistency levels available. Consistency level can be configured on a cluster, datacenter, or individual I/O operation basis. Consistency among participating nodes can be set globally and also controlled on a per-operation basis (for example insert or update) using Cassandra’s drivers and client libraries.
  17. Suppose a write request is sent to Cassandra, but a replica node where the write belongs is not available. The coordinator then creates a hint for that node and stores it; once it detects via gossip that the node is back online, the coordinator sends the hint to it. Consider a cluster consisting of three nodes, A, B, and C, with a replication factor of 2. When a row K is written to the coordinator (node A in this case), even if node C is down, the consistency level of ONE or QUORUM can be met. Why? Both nodes A and B will receive the data, so the consistency level requirement is met. A hint is stored for node C and written when node C comes up. In the meantime, the coordinator can acknowledge that the write succeeded.
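The hinted handoff flow in this note can be sketched as follows. The sketch assumes, for simplicity, that every node passed to the coordinator is a replica of the key being written, and it models storage as plain dictionaries. The `Coordinator` class and its methods are hypothetical names, not Cassandra APIs.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas  # replica name -> dict acting as its storage
        self.up = set(replicas)   # replicas currently believed alive (via gossip)
        self.hints = []           # pending (target replica, key, value) hints

    def mark_down(self, name):
        self.up.discard(name)

    def write(self, key, value, required_acks):
        """Write to live replicas; return True if the consistency level is met."""
        live = [name for name in self.replicas if name in self.up]
        if len(live) < required_acks:
            return False  # not enough live replicas: reject, store nothing
        for name, store in self.replicas.items():
            if name in self.up:
                store[key] = value
            else:
                self.hints.append((name, key, value))  # replay later
        return True

    def mark_up(self, name):
        """Replica is back online: replay its hints, keep the others."""
        self.up.add(name)
        remaining = []
        for target, key, value in self.hints:
            if target == name:
                self.replicas[target][key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining
```

With replicas A, B, and C, marking C down and writing with `required_acks=2` succeeds immediately; C receives the row only after `mark_up("C")` replays the stored hint.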
  18. A compaction operation in Cassandra is performed in order to merge SSTables. During compaction, the data in SSTables is merged: the keys are merged, columns are combined, tombstones are discarded, and a new index is created. Compaction is the process of freeing up space by merging large accumulated data files.
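The merge step of compaction can be illustrated with a small last-write-wins sketch. It assumes each SSTable is a dictionary mapping a key to a `(timestamp, value)` pair and that a deletion is stored as a tombstone marker. The names `TOMBSTONE` and `compact` are hypothetical, and the sketch drops tombstones right away, whereas Cassandra keeps them until the table's `gc_grace_seconds` period has passed.

```python
TOMBSTONE = object()  # deletion marker written instead of removing data in place


def compact(sstables):
    """Merge SSTables into one, keeping the newest version of each key.

    sstables: list of dicts mapping key -> (timestamp, value).
    Keys whose newest version is a tombstone are dropped from the result.
    """
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}
```

For example, if an older table holds `"a"` at timestamp 1 and a newer table holds a tombstone for `"a"` at timestamp 2, the compacted result no longer contains `"a"` at all, which is how the space is reclaimed.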