Overview of
Cassandra
Outline
● History/motivation
● Semi structured data in Cassandra
  ○ CFs and SuperCFs
● Architecture of Cassandra system
  ○   Distribution of content
  ○   Replication of content
  ○   Consistency level
  ○   Node internals
  ○   Gossip
● Thrift API
● Design patterns - denormalization
History/motivation
● Initially developed by Facebook for Inbox
  Search
  ○ in late 2007/early 2008
● Designed for
  ○ node failure - commodity hardware
  ○ scale - can increase number of nodes easily to
    accommodate increasing demand
  ○ fast write access while delivering good read
    performance
● Combination of Bigtable and Dynamo
● Was operational for over 2 years
  ○ Dropped in favour of HBase
History/motivation
● Released as open source in July 2008
● Apache liked it
  ○ Became Apache Incubator project in March 2009
  ○ Became Apache top level project in Feb 2010
● Active project with releases every few
  months
  ○ currently on version 1.1
    ■ production ready, but still evolving
Why it's interesting (in this
context)...
● Has seen significant growth in last couple of
  years
● Enough deployments to be credible
   ○ Netflix, Ooyala, Digg, Cisco
● Is scalable and robust enough for big data
  problems
   ○ no single point of failure
● Complex system
   ○ perhaps excessively complex today
Cassandra - semi
structured data
● Column based database
  ○ has similarities to standard RDBMS
● Terminology:
  ○ Keyspace -> database
  ○ ColumnFamily -> table
Cassandra - semi
structured data
● No specific schema is required
  ○ although it is possible to define schema
    ■ can include typing information for parts of
        schema to minimize data integrity problems
● Rows can have large numbers of columns
  ○ limit on number of columns is 2 billion
● Column values should not exceed a few MB
● SuperColumns are columns embedded
  within columns
  ○ third level in a map
  ○ little discussion of SC here
Supercolumns depicted
Cassandra - secondary
indexing
● Columns can be indexed
  ○ so-called 'secondary indexing'
    ■ row keys form the primary index
● Some debate about the merits of secondary
  indexing in Cassandra
  ○ secondary indexing is an atomic operation
    ■ unlike alternative 'manual' indexing approach
  ○ causes change in thinking regarding NoSQL design
    ■ very similar to classical RDBMS thinking
Cassandra Architecture
● Cluster configuration is typical
● All nodes are peers
   ○ although there are some seeds which should be
     more reliable, larger nodes
● Peers have common view of tokenspace
   ○ tokenspace is a ring
      ■ of size 2^127
   ○ peers have responsibility for some part of ring
      ■ ie some range of tokens within ring
● Row key/keyspace mapped to token
   ○ used to determine which node is responsible for row
     data
Cassandra - Cluster and
Tokenspace
Cassandra - Data
Distribution
● Map from RowKey to token determines data
  distribution
● RandomPartitioner is most important map
  ○   generates MD5 hash of rowkey
  ○   distributes data evenly over nodes in cluster
  ○   highly preferred solution
  ○   constraint: rows cannot be iterated over in key order
● OrderedPartitioner
  ○ generates token based on a simple byte mapping of
    the row key
  ○ most probably results in uneven distribution of data
  ○ can be used to iterate over rows
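
The mapping from row key to token to node can be sketched in a few lines of Python. This is a toy illustration, not Cassandra's implementation: the reduction of the MD5 digest modulo 2^127, the four evenly spaced node tokens, and the names `md5_token` and `owner` are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_left
from typing import List

RING_SIZE = 2 ** 127  # token space size quoted on the architecture slide


def md5_token(row_key: bytes) -> int:
    """Map a row key onto the ring via its MD5 hash (toy reduction, an assumption)."""
    digest = hashlib.md5(row_key).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(token: int, node_tokens: List[int]) -> int:
    """Index of the node owning `token`: the first node token >= token, wrapping round the ring."""
    idx = bisect_left(node_tokens, token)
    return idx % len(node_tokens)


# Four hypothetical nodes with evenly spaced tokens (must be sorted).
node_tokens = [i * RING_SIZE // 4 for i in range(4)]

# Hash 10,000 made-up row keys and count how many land on each node.
counts = [0, 0, 0, 0]
for i in range(10_000):
    counts[owner(md5_token(f"user{i}".encode()), node_tokens)] += 1
```

Because the hash spreads keys uniformly, each entry of `counts` should sit close to 2,500. The price is that neighbouring row keys land on unrelated tokens, which is why key-ordered iteration is lost.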
Cassandra - Data
Replication
● Multiple levels of replication supported
   ○ can support arbitrary level of replication
   ○ replication factors specified per keyspace
● Two replication strategies
   ○ RackUnaware
     ■ Make replicas in next n nodes along token ring
   ○ RackAware
     ■ Make one replica in remote data centre
     ■ Make remaining replicas in next nodes along
       token ring
          ●   good ring configuration should result in diversity over data
              centres
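
The RackUnaware rule ("next n nodes along the token ring") amounts to a modular walk around the ring. A minimal sketch, assuming nodes are numbered by ring position and that the node owning the token holds the first replica; the function name is hypothetical:

```python
from typing import List


def rack_unaware_replicas(primary_index: int, node_count: int,
                          replication_factor: int) -> List[int]:
    """Return ring positions holding replicas: the owning node, then its clockwise successors."""
    # A cluster cannot hold more distinct replicas than it has nodes.
    rf = min(replication_factor, node_count)
    return [(primary_index + i) % node_count for i in range(rf)]
```

For example, with six nodes and a replication factor of 3, a row owned by node 4 is also stored on nodes 5 and 0.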
Cassandra - Consistency
Level
● A mechanism to trade off latency with data
  consistency
  ○ Write case:
    ■ Faster response <-> less sure data written
       properly
  ○ Read case:
    ■ Faster response <-> less sure most recent data
       read
● Related to data replication above
  ○ replication factor determines meaningful levels for
    consistency level
Cassandra - Consistency
  Level - Write
Level         Behavior
ANY           Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
ONE           Ensure that the write has been written to at least 1 replica's commit log and memory table
              before responding to the client.
TWO           Ensure that the write has been written to at least 2 replicas before responding to the client.
THREE         Ensure that the write has been written to at least 3 replicas before responding to the client.
QUORUM        Ensure that the write has been written to N / 2 + 1 replicas before responding to the client.
LOCAL_QUORUM  Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes within the local
              datacenter (requires NetworkTopologyStrategy).
EACH_QUORUM   Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each datacenter
              (requires NetworkTopologyStrategy).
ALL           Ensure that the write is written to all N replicas before responding to the client. Any
              unresponsive replicas will fail the operation.
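
The quorum arithmetic in the table uses integer division. The sketch below computes it and also checks the usual Dynamo-style overlap rule (a read of R replicas is guaranteed to see a write acknowledged by W replicas when R + W > N); the rule is standard, but the function names are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size N / 2 + 1, using integer division."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when any R replicas must include at least one of the W that acknowledged the write."""
    return r + w > n
```

With a replication factor of 3, QUORUM means 2 replicas. Writing and reading at QUORUM gives 2 + 2 > 3, so reads overlap with writes, whereas ONE/ONE gives 1 + 1 <= 3 and a read may miss the latest write.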
Cassandra - Consistency
  Level - Read
Level         Behavior
ANY           Not supported. You probably want ONE instead.
ONE           Will return the record returned by the first replica to respond. A consistency check is always
              done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is
              used. This means subsequent calls will have correct data even if the initial read gets an older
              value. (This is called ReadRepair.)
TWO           Will query 2 replicas and return the record with the most recent timestamp. Again, the
              remaining replicas will be checked in the background.
THREE         Will query 3 replicas and return the record with the most recent timestamp.
QUORUM        Will query all replicas and return the record with the most recent timestamp once it has at least
              a majority of replicas (N / 2 + 1) reported. Again, the remaining replicas will be checked in the
              background.
LOCAL_QUORUM  Returns the record with the most recent timestamp once a majority of replicas within the local
              datacenter have replied.
EACH_QUORUM   Returns the record with the most recent timestamp once a majority of replicas within each
              datacenter have replied.
ALL           Will query all replicas and return the record with the most recent timestamp once all replicas
              have replied. Any unresponsive replicas will fail the operation.
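
The "most recent timestamp wins, then repair the rest" behaviour can be modelled with plain dictionaries. This is a simplified sketch under stated assumptions: each replica is a dict of `key -> (timestamp, value)`, the first `r` replicas in the list stand in for "the first to respond", and the repair runs inline rather than in a background thread.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]  # (timestamp, value)


def read_with_repair(replicas: List[Dict[str, Versioned]],
                     key: str, r: int) -> Optional[str]:
    """Answer from the first r replicas, then bring every replica up to the newest version."""
    responses = [rep[key] for rep in replicas[:r] if key in rep]
    if not responses:
        return None
    answer = max(responses)  # tuples compare by timestamp first

    # Read repair (done inline here): push the newest known version to stale replicas.
    latest = max(rep[key] for rep in replicas if key in rep)
    for rep in replicas:
        if rep.get(key, (-1, "")) < latest:
            rep[key] = latest
    return answer[1]
```

If the first replica holds an older value, a read with `r=1` returns that stale value once; the repair step then updates it, so the next read returns the current value.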
Cassandra - Node Internals
● Node comprises
  ○ commit log
    ■ list of pending writes
  ○ memtable
    ■ data written to system resident in memory
  ○ SSTables
    ■ per CF file containing persistent data
● Memtable is flushed to an SSTable when it runs out of space,
  holds too many keys, or after a time period
● SSTables comprise
  ○ Data - sorted strings
  ○ Index, Bloom Filter
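
How the three pieces fit together can be shown with a toy node. This is an illustrative model only: the class and its `memtable_limit` parameter are invented for the example, the flush trigger is reduced to a key count, and the index and Bloom filter are left out.

```python
class ToyNode:
    """Toy write/read path: commit log, in-memory memtable, immutable sorted SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # flushed tables, oldest first; each a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no read or seek
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable table and start afresh.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []   # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            for k, v in table:
                if k == key:
                    return v
        return None
```

A write only appends and updates memory, while a read may have to consult the memtable and several SSTables, which is the asymmetry the Behaviour slides describe.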
Cassandra - Node Internals
● Compaction occurs from time to time
  ○ cleans up SSTable
  ○ removes redundant rows
  ○ regenerates indexes
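
Compaction can be pictured as merging several SSTables into one and keeping only the newest value for each key. A minimal sketch, assuming each SSTable is a sorted list of `(key, value)` pairs supplied oldest first; deletion markers and index regeneration are not modelled.

```python
def compact(sstables):
    """Merge SSTables (oldest first) into one sorted table, newest value per key."""
    merged = {}
    for table in sstables:
        for key, value in table:
            merged[key] = value  # later tables overwrite earlier ones
    return sorted(merged.items())
```

Merging `[("a", "1"), ("b", "2")]` with a newer `[("a", "3"), ("c", "4")]` leaves a single table in which `a` maps to `"3"`; the redundant older row is gone.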
Cassandra - Behaviour -
Write
● Write properties:
  ○   No reads
  ○   No seeks
  ○   Fast!
  ○   Atomic within CF
  ○   Always writable
Cassandra - Behaviour -
Read
● Read Path:
  ○   Any node
  ○   Partitioner
  ○   Wait for R responses
  ○   Wait for N-R responses in background and perform
      read repair
● Read Properties:
  ○   Read multiple SSTables
  ○   Slower than writes (but still fast)
  ○   Seeks can be mitigated with more RAM
  ○   Scales to billions of rows
Cassandra - Gossip
● Gossip protocol used to relay information
  between nodes in cluster
● Proactive communications mechanism to
  share information
  ○ nodes proactively share what they know with
    random other nodes
● Token space information exchanged via
  gossip
● Failure detection based on gossip
  ○ heartbeat mechanism
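
A gossip round can be simulated in a few lines. This is a toy model rather than Cassandra's protocol: each node keeps a table of the highest heartbeat counter it has seen per node, bumps its own counter, and merges tables with one randomly chosen peer. The node names and the `gossip_round` function are assumptions for illustration.

```python
import random


def gossip_round(views, rng):
    """Each node bumps its own heartbeat, then merges tables with one random peer."""
    nodes = list(views)
    for node in nodes:
        views[node][node] += 1
        peer = rng.choice([n for n in nodes if n != node])
        for name in set(views[node]) | set(views[peer]):
            # Both sides keep the highest heartbeat seen for each node.
            best = max(views[node].get(name, 0), views[peer].get(name, 0))
            views[node][name] = best
            views[peer][name] = best


# Four hypothetical nodes that initially know only about themselves.
views = {name: {name: 0} for name in "ABCD"}
rng = random.Random(0)
for _ in range(20):
    gossip_round(views, rng)
```

After a handful of rounds every node has an entry for every other node. A failure detector built on this would flag a node whose heartbeat counter stops advancing in the local table.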
Thrift API - basic calls
● insert(key, column_parent, column,
    consistency_level)
    ○ key is row/keyspace identifier
    ○ column_parent identifies where the column lives
       ■ a column family name, optionally with a super column identifier
    ○ column is column data
●   get(key, column_path, consistency_level)
    ○ returns a column corresponding to the key
●   get_slice(key, column_parent,
    slice_predicate, consistency_level)
    ○ typically returns set of columns corresponding to key
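
The semantics of these three calls can be mimicked with an in-memory stand-in. The class below is not the Thrift client and its signatures are simplified on purpose (no `column_parent`, `slice_predicate` or `consistency_level`); it only shows the row -> named, timestamped columns model that the calls operate on.

```python
from bisect import bisect_left, bisect_right


class ToyColumnFamily:
    """In-memory model: row key -> {column name: (value, timestamp)}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, name, value, timestamp):
        row = self.rows.setdefault(key, {})
        current = row.get(name)
        # The column version with the highest timestamp wins.
        if current is None or timestamp >= current[1]:
            row[name] = (value, timestamp)

    def get(self, key, name):
        # Raises KeyError when the row or column does not exist.
        return self.rows[key][name][0]

    def get_slice(self, key, start, finish):
        """Columns of one row whose names fall in [start, finish], in name order."""
        row = self.rows.get(key, {})
        names = sorted(row)
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(n, row[n][0]) for n in names[lo:hi]]
```

Because columns within a row are kept in name order, a slice is a cheap contiguous range, which is what makes wide rows practical.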
Thrift API - other
operations
● get multiple rows
● delete row
● batch operations
  ○ important for speeding up system
  ○ can batch up mix of add, insert and delete
    operations
● keyspace and cluster management
Denormalization
● Cassandra requires query oriented design
  ○ determine queries first, design data models
    accordingly
  ○ in contrast to standard RDBMS
     ■ normalize data at design time
     ■ construct arbitrary queries usually based on joins
● Quite fundamental difference in approach
  ○ typically results in quite different data models
● Common use of valueless columns
  ○ column name contains data
    ■ good for time series data
  ○ can have very many columns in given row
Denormalization
● Standard SQL
   ○ SELECT * FROM USER WHERE CITY = 'Dublin'
● Typically create CF which groups users by
  city
   ○ row key is city identifier
   ○ columns are user IDs
● Can get UID of all users in given city by
  querying this CF
   ○ give city as row-key
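
The users-by-city pattern can be sketched with two dictionaries standing in for two column families. The names `users`, `users_by_city`, `add_user` and `users_in_city` are hypothetical; the point is that the application maintains the second CF itself on every write.

```python
users = {}          # CF 1: row key = user ID, columns = user attributes
users_by_city = {}  # CF 2: row key = city, column names = user IDs (valueless columns)


def add_user(uid, name, city):
    """Write the user row and the denormalized city row together."""
    users[uid] = {"name": name, "city": city}
    # The column name carries the data; the value is left empty.
    users_by_city.setdefault(city, {})[uid] = b""


def users_in_city(city):
    """Equivalent of SELECT * FROM USER WHERE CITY = ..., answered from a single row."""
    return sorted(users_by_city.get(city, {}))
```

The lookup touches only the row keyed by the city, so the query that would need a scan or index in an RDBMS becomes a single-row read, at the cost of writing the data twice.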
Other considerations...
● SuperColumnFamily
  ○ when is it useful?
● Multi data centre deployments
  ○ Cassandra can leverage topology to maximize
    resiliency
● Reaction to node failure
● Reconfiguration of system
  ○ introduction of new nodes into existing system


● It is a complex system with many working
  parts
