Cassandra FTW


   Andrew Byde
 Principal Scientist
Menu

• Introduction
• Data model + storage architecture
• Partitioning + replication
• Consistency
• De-normalisation
History + design
History

• 2007: Started at Facebook for inbox search
• July 2008: Open sourced by Facebook
• March 2009: Apache Incubator
• February 2010: Apache top-level project
• May 2011: Version 0.8
What it’s good for

• Horizontal scalability
• No single point of failure -- symmetric
• Multi-data centre support
• Very high write workloads
• Tuneable consistency -- per operation
What it’s not so good for

• Transactions
• Read heavy workloads
• Low latency applications
 •   compared to in-memory dbs
Data model
Keyspaces and Column Families
SQL            Cassandra
Database   →   Keyspace
Table      →   Column Family
(each row: row key, col_1, col_2, ...)
Column Family

rowkey: {
  column: value,
  column: value,
  ...
 }

        ...every value is timestamped
Super Column Family
rowkey: {
  supercol: {
    column: value,
    column: value,
    ...
  }
  supercol: {
    column: value,
    column: value,
    ...
  }
}
Rows and columns
       col1   col2   col3   col4   col5   col6   col7
row1           x                    x      x
row2    x      x      x      x      x
row3           x      x             x      x      x
row4           x      x      x             x
row5           x             x      x      x
row6           x
row7    x      x             x
Reads
• One row, some cols
 • get
 • get_slice
  • name predicate
  • slice range
• Multiple rows
 • multiget_slice
 • get_range_slices
[Figures: the rows-and-columns grid above, repeated once per call, marking the cells each call returns]
• get
• get_slice: name predicate
• get_slice: slice range
• multiget_slice: name predicate
• get_range_slices: slice range
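The read calls on these slides can be sketched over a toy in-memory model, where a column family is a dict of rows and each row is a dict of columns. The function names follow the slides, but the signatures are simplified assumptions for illustration and not the actual Thrift API.

```python
# Toy column family: row key -> {column name -> value}.
# Signatures are simplified assumptions, not the real Thrift calls.
cf = {
    "row1": {"col2": "x", "col5": "x", "col6": "x"},
    "row2": {"col1": "x", "col2": "x", "col3": "x", "col4": "x", "col5": "x"},
    "row3": {"col2": "x", "col3": "x", "col5": "x", "col6": "x", "col7": "x"},
}

def get(cf, key, column):
    """One row, one column."""
    return cf[key][column]

def get_slice_names(cf, key, names):
    """One row, columns chosen by a name predicate (explicit list of names)."""
    row = cf.get(key, {})
    return {c: row[c] for c in names if c in row}

def get_slice_range(cf, key, start, finish):
    """One row, columns whose names lie in [start, finish], in sorted order."""
    row = cf.get(key, {})
    return {c: row[c] for c in sorted(row) if start <= c <= finish}

def multiget_slice(cf, keys, names):
    """Several named rows, same name predicate applied to each."""
    return {k: get_slice_names(cf, k, names) for k in keys}

def get_range_slices(cf, start_key, end_key, start, finish):
    """A range of rows (key order), same column slice range applied to each."""
    return {k: get_slice_range(cf, k, start, finish)
            for k in sorted(cf) if start_key <= k <= end_key}
```

For example, `get_slice_range(cf, "row2", "col2", "col4")` returns the three columns col2 to col4 of row2.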
Storage architecture
Data Layout: writes
• key-value insert goes to
 • on-disk, un-ordered commit log
 • in-memory, (key,col)-sorted memtable
• flush: memtable → on-disk, (key,col)-sorted SSTables
Data Layout: SSTables
• Each SSTable holds
 • Bloom Filter
 • Index
 • Data
Data Layout: reads
[Figure: a read (?) is checked against three SSTables; two of them are crossed out (X)]
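The write and read paths on the Data Layout slides can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class names, the `memtable_limit` parameter, and the use of a plain Python set in place of a real Bloom filter are all hypothetical simplifications.

```python
import bisect

class SSTable:
    """Immutable run of ((key, col), (timestamp, value)) pairs, sorted by (key, col)."""
    def __init__(self, items):
        self.keys = [k for k, _ in items]      # sorted (key, col) pairs, acts as the index
        self.values = [v for _, v in items]    # the data
        self.filter = set(self.keys)           # stand-in for a Bloom filter (exact here)

    def get(self, k):
        if k not in self.filter:               # filter says "absent": skip this SSTable
            return None
        return self.values[bisect.bisect_left(self.keys, k)]

class Store:
    def __init__(self, memtable_limit=2):
        self.commit_log = []                   # append-only, un-ordered
        self.memtable = {}                     # (key, col) -> (timestamp, value)
        self.sstables = []
        self.memtable_limit = memtable_limit   # hypothetical flush threshold

    def insert(self, key, col, value, ts):
        self.commit_log.append((key, col, value, ts))
        self.memtable[(key, col)] = (ts, value)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new (key, col)-sorted SSTable."""
        self.sstables.append(SSTable(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key, col):
        """Collect candidates from the memtable and every SSTable; newest timestamp wins."""
        candidates = []
        if (key, col) in self.memtable:
            candidates.append(self.memtable[(key, col)])
        for table in self.sstables:
            hit = table.get((key, col))
            if hit is not None:
                candidates.append(hit)
        return max(candidates, key=lambda c: c[0])[1] if candidates else None
```

A real Bloom filter can return false positives, in which case the SSTable is consulted and simply yields nothing; the exact set used here never does.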
Distribution: Partitioning + Replication
Partitioning + Replication
(k, v) → ?
• Partitioning data on to nodes
 • load balancing
 • row-based
• Replication
 • to protect against failure
 • better availability
Partitioning
• Random: take hash of row key
 • good for load balancing
 • bad for range queries
• Ordered: subdivide key space
 • bad for load balancing
 • good for range queries
• Or build your own...
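The trade-off between the two partitioners can be illustrated with a small sketch. The function names are hypothetical, and MD5 is used here only as an example hash.

```python
import hashlib

def random_token(row_key: str) -> int:
    """Random partitioning: the token is a hash of the row key (MD5 as an example)."""
    return int.from_bytes(hashlib.md5(row_key.encode()).digest(), "big")

def ordered_token(row_key: str) -> str:
    """Ordered partitioning: the token is the key itself, so key order is preserved."""
    return row_key

keys = ["user1", "user2", "user3", "user4"]

# Ordered: neighbouring keys stay neighbours, so a key range maps to a
# contiguous piece of the key space (good for range queries, bad for balance).
by_ordered = sorted(keys, key=ordered_token)

# Random: tokens are spread over the 128-bit hash space regardless of key
# order (good for balance, but a key range is scattered).
by_random = sorted(keys, key=random_token)
```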
Simple Replication
• Nodes arranged on a ‘ring’
• (k, v) has a primary location on the ring
• Extra copies are successors on the ring
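Ring placement can be sketched in a few lines, assuming each node owns the tokens up to and including its own token. The ring contents and the `replicas` helper are hypothetical.

```python
import bisect

def replicas(ring, token, n):
    """ring: list of (node_token, node_name) sorted by token.
    The primary is the first node whose token is >= the key's token,
    wrapping around; the extra copies are its successors on the ring."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    count = min(n, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical four-node ring with evenly spaced tokens.
ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
```

With three copies, a key with token 30 goes to C as primary and then to D and A, since the walk wraps around the ring.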
Topology-aware Replication
• Snitch: node IP → (DataCenter, rack)
• EC2Snitch
  • Region → DC; availability_zone → rack
• PropertyFileSnitch
  • Configured from a file
Topology-aware Replication
[Figure: (k, v) placed on nodes in racks r1 and r2 of DC 1 and DC 2]
• extra copies to different data center
• spread across racks within a data center
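A simplified sketch of topology-aware placement follows. It assumes a snitch table mapping each node to (data center, rack) and a per-data-center copy count; the `place` function and all node names are hypothetical, and this is not Cassandra's actual placement code.

```python
def place(ring_order, snitch, copies_per_dc):
    """ring_order: nodes in ring order, starting at the key's primary location.
    snitch: node -> (data center, rack). copies_per_dc: data center -> copies wanted.
    Within each data center, prefer nodes on racks that hold no copy yet."""
    chosen = []
    for dc, want in copies_per_dc.items():
        in_dc = [n for n in ring_order if snitch[n][0] == dc]
        picked, racks = [], set()
        for n in in_dc:                      # first pass: one node per rack
            if len(picked) < want and snitch[n][1] not in racks:
                picked.append(n)
                racks.add(snitch[n][1])
        for n in in_dc:                      # second pass: fill up if racks ran out
            if len(picked) < want and n not in picked:
                picked.append(n)
        chosen.extend(picked)
    return chosen

# Hypothetical cluster: two data centers, two racks each.
snitch = {
    "a1": ("DC1", "r1"), "a2": ("DC1", "r1"), "a3": ("DC1", "r2"),
    "b1": ("DC2", "r1"), "b2": ("DC2", "r2"),
}
ring_order = ["a1", "b1", "a2", "a3", "b2"]
```

Here `place(ring_order, snitch, {"DC1": 2, "DC2": 2})` skips a2 in favour of a3, because a2 shares rack r1 with a1.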
Distribution: Consistency
Consistency Level
• How many replicas must respond in order to declare success
• W of the N replicas must succeed for a write to succeed
 • write with client-generated timestamp
• R of the N replicas must succeed for a read to succeed
 • return most recent, by timestamp
• Tuneable per request
Consistency Level

• 1, 2, 3 responses
• Quorum (more than half)
• Quorum in local data center
• Quorum in each data center
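The consistency levels above can be turned into response counts with a small sketch. The level names ONE, TWO, THREE, QUORUM, LOCAL_QUORUM and EACH_QUORUM are Cassandra's; the `required` and `overlaps` helpers and the replication factors are hypothetical.

```python
def quorum(n):
    """More than half of n replicas."""
    return n // 2 + 1

def required(level, copies_per_dc, local_dc):
    """Number of replica responses needed to declare success.
    copies_per_dc: data center -> number of replicas held there."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(sum(copies_per_dc.values()))
    if level == "LOCAL_QUORUM":
        return quorum(copies_per_dc[local_dc])
    if level == "EACH_QUORUM":
        # Total across data centers; each data center must supply its own quorum.
        return sum(quorum(c) for c in copies_per_dc.values())
    raise ValueError(level)

def overlaps(r, w, n):
    """True when every read set of r replicas intersects every write set of w replicas."""
    return r + w > n

copies = {"DC1": 3, "DC2": 3}   # hypothetical: three replicas in each data center
```

With N = 3, reading and writing at quorum (R = W = 2) guarantees that a read touches at least one replica holding the latest successful write, whereas R = W = 1 does not.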
Maintaining consistency

• Read repair
• Hinted handoff
• Anti-entropy
Read repair
• If the replicas disagree on read, send most recent data back
• read k?
 • n1: v, t1
 • n2: not found!
 • n3: v’, t2
• the most recent value (v’, t2) goes to the user
• write (k, v’, t2) goes to the out-of-date replicas
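The read repair example can be sketched as below, under the simplifying assumption that every replica is contacted on each read. The `read_with_repair` function and the integer timestamps are hypothetical.

```python
def read_with_repair(replicas, key):
    """replicas: node -> {key -> (timestamp, value)}.
    Return the most recent value and write it back to replicas that lack it."""
    answers = {node: data.get(key) for node, data in replicas.items()}
    found = [a for a in answers.values() if a is not None]
    if not found:
        return None
    newest = max(found, key=lambda a: a[0])    # most recent, by timestamp
    for node, answer in answers.items():
        if answer != newest:                   # missing or stale: repair it
            replicas[node][key] = newest
    return newest[1]

# The situation on the slide: n1 holds an old value, n2 holds nothing, n3 is current.
replicas = {
    "n1": {"k": (1, "v")},
    "n2": {},
    "n3": {"k": (2, "v'")},
}
```

After one call to `read_with_repair(replicas, "k")`, all three replicas hold the value written at timestamp 2.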
Hinted handoff

• When a node is unavailable, writes meant for it can be written to any other node as a hint
• Hints are delivered when the node comes back online
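Hinted handoff can be sketched with a toy cluster in which the coordinating node stores the hints. The `Cluster` class and its method names are hypothetical and only illustrate the idea.

```python
class Cluster:
    def __init__(self, nodes):
        self.data = {n: {} for n in nodes}     # node -> {key -> value}
        self.up = set(nodes)
        self.hints = {n: [] for n in nodes}    # holder -> [(target, key, value)]

    def node_down(self, node):
        self.up.discard(node)

    def write(self, coordinator, targets, key, value):
        """Write to every live target; keep a hint for each unavailable one."""
        for target in targets:
            if target in self.up:
                self.data[target][key] = value
            else:
                self.hints[coordinator].append((target, key, value))

    def node_up(self, node):
        """Mark the node as back online and deliver the hints addressed to it."""
        self.up.add(node)
        for holder in list(self.hints):
            remaining = []
            for target, key, value in self.hints[holder]:
                if target == node:
                    self.data[node][key] = value
                else:
                    remaining.append((target, key, value))
            self.hints[holder] = remaining
```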
Anti-entropy

• Equivalent to ‘read repair all’
• Requires reading all data (woah)
 • (Although only hashes are sent to calculate diffs)
• Manual process
De-normalisation
De-normalisation

• Disk space is much cheaper than disk seeks
• Read at 100 MB/s, seek at 100 IO/s
• => copy data to avoid seeks
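The figures above imply that one seek costs as much time as reading about 1 MB sequentially. A short worked calculation, using a hypothetical workload of 100 records of 1,000 bytes each, shows why copying data to keep it contiguous pays off.

```python
READ_BYTES_PER_S = 100 * 10**6    # 100 MB/s sequential read
SEEKS_PER_S = 100                 # 100 IO/s

# Bytes that could be read sequentially in the time one seek takes.
bytes_per_seek = READ_BYTES_PER_S // SEEKS_PER_S

def scattered_s(records):
    """Each record lives somewhere else on disk: one seek per record."""
    return records / SEEKS_PER_S

def contiguous_s(records, record_bytes):
    """Records stored together: one seek, then a sequential read."""
    return 1 / SEEKS_PER_S + records * record_bytes / READ_BYTES_PER_S

scattered = scattered_s(100)            # 100 seeks
contiguous = contiguous_s(100, 1000)    # 1 seek plus 100 KB read sequentially
```

Under these assumptions the scattered layout takes about 1 second and the contiguous copy about 11 milliseconds.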
Inbox query
[Figure: user1 sends msg1, msg2, msg3, ... to recipients user2, user3, user4]
Q? inbox for user3
Data-centric model
   m1: {
     sender: user1
     content: “Mary had a little lamb”
     recipients: user2, user3
   }


• but how to do ‘recipients’ for Inbox?
• one-to-many modelled by a join table
To join
Messages:
m1: {
  sender: user1
  subject: “A rhyme”
  content: “Mary had a little lamb”
}
m2: {
  sender: user1
  subject: “colours”
  content: “Its fleece was white as snow”
}
m3: {
  sender: user1
  subject: “loyalty”
  content: “And everywhere that Mary went”
}
Join table (recipient → messages):
user2: {
  m1: true
}
user3: {
  m1: true
  m2: true
}
user4: {
  m2: true
  m3: true
}
.. or not to join
• Joins are expensive, so de-normalise to trade
  off space for time
• We can have lots of columns, so think BIG:
• Make message id a time-typed super-column.
• This makes get_slice an efficient way of
  searching for messages in a time window
Super Column Family
     user2: {
       m1: {
         sender: user1
         subject: “A rhyme”
       }
     }
     user3: {
       m1: {
         sender: user1
         subject: “A rhyme”
       }
       m2: {
         sender: user1
         subject: “colours”
       }
     }
     ...
De-normalisation + Cassandra
• have to write a copy of the record for each
  recipient ... but writes are very cheap
• get_slice fetches columns for a particular
  row, so gets received messages for a user
• on-disk column order is optimal for this
  query
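The de-normalised inbox can be sketched as a dict keyed by recipient, with each entry keyed by (timestamp, message id) so that a time-window slice is a simple range scan. The `send` and `inbox_slice` helpers and the integer timestamps are hypothetical; the messages and recipients follow the earlier slides.

```python
from collections import defaultdict

# Row key: recipient. Super-column name: (timestamp, message id).
# Columns inside each super-column: sender and subject.
inbox = defaultdict(dict)

def send(ts, msg_id, sender, subject, recipients):
    """Write one copy of the message record per recipient."""
    for recipient in recipients:
        inbox[recipient][(ts, msg_id)] = {"sender": sender, "subject": subject}

def inbox_slice(user, start_ts, end_ts):
    """Slice one row by time window: messages received between start_ts and end_ts."""
    row = inbox.get(user, {})
    return [(name[1], row[name]["subject"])
            for name in sorted(row) if start_ts <= name[0] <= end_ts]

send(1, "m1", "user1", "A rhyme", ["user2", "user3"])
send(2, "m2", "user1", "colours", ["user3", "user4"])
send(3, "m3", "user1", "loyalty", ["user4"])
```

The inbox for user3 is then a single-row read, `inbox_slice("user3", 1, 2)`, with no join against a separate message table.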
Conclusion
What it’s good for

• Horizontal scalability
• No single point of failure -- symmetric
• Multi-data centre support
• Very high write workloads
• Tuneable consistency -- per operation
Q?

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Cassandra deep-dive @ NoSQLNow!

  • 1. Cassandra FTW Andrew Byde Principal Scientist
  • 2. Menu • Introduction • Data model + storage architecture • Partitioning + replication • Consistency • De-normalisation
  • 4. History • 2007: Started at Facebook for inbox search • July 2008: Open sourced by Facebook • March 2009: Apache Incubator • February 2010: Apache top-level project • May 2011: Version 0.8
  • 5. What it’s good for • Horizontal scalability • No single-point of failure -- symmetric • Multi-data centre support • Very high write workloads • Tuneable consistency -- per operation
  • 6. What it’s not so good for • Transactions • Read heavy workloads • Low latency applications (compared to in-memory dbs)
  • 8. Keyspaces and Column Families • SQL Database ↔ Cassandra Keyspace • SQL Table ↔ Cassandra Column Family (rows of row/key, col_1, col_2, ...)
  • 9. Column Family rowkey: { column: value, column: value, ... } ...every value is timestamped
  • 10. Super Column Family rowkey: { supercol: { column: value, column: value, ... } supercol: { column: value, column: value, ... } }
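The two map shapes above can be sketched as nested Python dictionaries. This is only an illustration under stated assumptions: the `put` helper, the sample rows and the "newest timestamp wins" rule with ties going to the later call are hypothetical simplifications, not Cassandra's API.

```python
from typing import Any, Dict, Tuple

Cell = Tuple[Any, int]                                      # (value, timestamp)
ColumnFamily = Dict[str, Dict[str, Cell]]                   # rowkey -> column -> cell
SuperColumnFamily = Dict[str, Dict[str, Dict[str, Cell]]]   # rowkey -> supercol -> column -> cell


def put(cf: ColumnFamily, rowkey: str, column: str, value: Any, timestamp: int) -> None:
    """Store a cell unless the row already holds a strictly newer one (assumed rule)."""
    row = cf.setdefault(rowkey, {})
    current = row.get(column)
    if current is None or timestamp >= current[1]:
        row[column] = (value, timestamp)


users: ColumnFamily = {}
put(users, "alice", "email", "alice@example.com", timestamp=1)
put(users, "alice", "email", "alice@new.example.com", timestamp=3)
put(users, "alice", "email", "stale@example.com", timestamp=2)  # older write is ignored
put(users, "bob", "city", "London", timestamp=1)                # rows need not share columns

# A super column family adds one more level of nesting.
inbox: SuperColumnFamily = {
    "user3": {"m1": {"sender": ("user1", 1), "subject": ("A rhyme", 1)}},
}
```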
  • 11. Rows and columns
           col1   col2   col3   col4   col5   col6   col7
    row1           x                    x      x
    row2    x      x      x      x      x
    row3           x      x             x      x      x
    row4           x      x      x             x
    row5           x             x      x      x
    row6           x
    row7    x      x             x
  • 12. Reads • get • get_slice One row, some cols • name predicate • slice range • multiget_slice Multiple rows • get_range_slices
  • 13. get [figure: the rows-and-columns grid from slide 11]
  • 14. get_slice: name predicate [figure: the rows-and-columns grid from slide 11]
  • 15. get_slice: slice range [figure: the rows-and-columns grid from slide 11]
  • 16. multiget_slice: name predicate [figure: the rows-and-columns grid from slide 11]
  • 17. get_range_slices: slice range [figure: the rows-and-columns grid from slide 11]
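The four read calls can be imitated on a plain dictionary holding the sparse grid from slide 11. The function names follow the slides, but the signatures below are simplified assumptions made for illustration, not the real Thrift API. Note also that a key range in `get_range_slices` is only meaningful when keys are stored in order.

```python
# The sparse grid from slide 11: each row holds only the columns it actually has.
ROWS = {
    "row1": {"col2": "x", "col5": "x", "col6": "x"},
    "row2": {"col1": "x", "col2": "x", "col3": "x", "col4": "x", "col5": "x"},
    "row3": {"col2": "x", "col3": "x", "col5": "x", "col6": "x", "col7": "x"},
    "row4": {"col2": "x", "col3": "x", "col4": "x", "col6": "x"},
    "row5": {"col2": "x", "col4": "x", "col5": "x", "col6": "x"},
    "row6": {"col2": "x"},
    "row7": {"col1": "x", "col2": "x", "col4": "x"},
}


def get(rows, key, column):
    """One row, one column."""
    return rows[key][column]


def get_slice(rows, key, names=None, start="", finish="", count=100):
    """One row, some columns: either a list of names or a [start, finish] name range."""
    row = rows.get(key, {})
    if names is not None:                      # name predicate
        return {c: row[c] for c in names if c in row}
    cols = sorted(c for c in row if c >= start and (finish == "" or c <= finish))
    return {c: row[c] for c in cols[:count]}   # slice range


def multiget_slice(rows, keys, **predicate):
    """Several named rows, same column predicate for each."""
    return {k: get_slice(rows, k, **predicate) for k in keys}


def get_range_slices(rows, start_key, end_key, **predicate):
    """A contiguous range of row keys, same column predicate for each."""
    keys = sorted(k for k in rows if start_key <= k <= end_key)
    return {k: get_slice(rows, k, **predicate) for k in keys}
```

In every case the row selection is by key and the column selection is by name; none of these calls can filter on a column's value.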
  • 19. Data Layout: writes • key-value insert → on-disk, un-ordered commit log • key-value insert → in-memory, (key,col)-sorted memtable • flush → on-disk, (key,col)-sorted SSTables
  • 20. Data Layout: SSTables • each SSTable: Bloom Filter, Index, Data
  • 21. Data Layout: reads [figure: read-path diagram, with "?" marking the lookups]
  • 22. Data Layout: reads [figure: the same diagram, with two lookups marked "X"]
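A toy version of this write and read path may make the layout easier to follow. It is a sketch under loud assumptions: the `Store` and `BloomFilter` classes, the tiny flush threshold and the SHA-256 based filter are all hypothetical, and compaction is left out entirely.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class Store:
    def __init__(self, memtable_limit=2):
        self.commit_log = []    # append-only and un-ordered (on disk in reality)
        self.memtable = {}      # (key, col) -> (value, timestamp), in memory
        self.sstables = []      # immutable (bloom, sorted cells) pairs, oldest first
        self.memtable_limit = memtable_limit

    def insert(self, key, col, value, ts):
        self.commit_log.append((key, col, value, ts))   # sequential write
        self.memtable[(key, col)] = (value, ts)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        bloom = BloomFilter()
        for key, _col in self.memtable:
            bloom.add(key)                              # filter is keyed on row key
        self.sstables.append((bloom, sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key, col):
        candidates = []
        if (key, col) in self.memtable:
            candidates.append(self.memtable[(key, col)])
        for bloom, cells in self.sstables:
            if not bloom.might_contain(key):
                continue                                # skip this SSTable entirely
            candidates.extend(cell for k, cell in cells if k == (key, col))
        # Several tables may hold the cell; the newest timestamp wins.
        return max(candidates, key=lambda cell: cell[1])[0] if candidates else None
```

Writes only ever append or update memory, which is where the sequential-write benefit comes from; reads pay for it by consulting several tables and merging by timestamp.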
  • 25. Partitioning + Replication • Partitioning data on to nodes • load balancing • row-based • Replication • to protect against failure • better availability
  • 26. Partitioning • Random: take hash of row key • good for load balancing • bad for range queries • Ordered: subdivide key space • bad for load balancing • good for range queries • Or build your own...
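The trade-off between the two partitioners can be seen in a few lines. This is a minimal sketch, assuming four hypothetical nodes, evenly spaced MD5 tokens for the random case and hand-picked key boundaries for the ordered case.

```python
import bisect
import hashlib

NODES = ["n1", "n2", "n3", "n4"]   # hypothetical cluster


def random_owner(key):
    """Random partitioning: hash the row key, then map the 128-bit token to a node."""
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[(token * len(NODES)) >> 128]   # evenly spaced token ranges


# Ordered partitioning: each node owns a contiguous range of the key space.
ORDERED_BOUNDS = ["g", "n", "t"]   # n1: < "g", n2: ["g","n"), n3: ["n","t"), n4: >= "t"


def ordered_owner(key):
    return NODES[bisect.bisect_right(ORDERED_BOUNDS, key)]


keys = [f"user{i}" for i in range(100)]
random_spread = {random_owner(k) for k in keys}    # scattered over the nodes
ordered_spread = {ordered_owner(k) for k in keys}  # all land on one node
```

With the ordered scheme a range scan over `user0`..`user99` touches a single node, which is good for range queries, but that node also takes the whole load. Hashing spreads the load and destroys key adjacency.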
  • 27. Simple Replication (k, v) Nodes arranged on a ‘ring’
  • 28. Simple Replication Primary location (k, v) Nodes arranged on a ‘ring’
  • 29. Simple Replication Primary location (k, v) Extra copies are successors on the ring Nodes arranged on a ‘ring’
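The ring walk described on these slides can be sketched as follows. The integer tokens and the `replicas_for_token` helper are illustrative assumptions; a real cluster would derive the token from the row key.

```python
import bisect


def replicas_for_token(ring, token, replication_factor):
    """ring: list of (token, node) sorted by token.

    The primary is the first node whose token is >= the key's token (wrapping
    around the ring); the extra copies go to its successors.
    """
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    copies = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(copies)]


RING = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]
```

For example, a key with token 30 gets its primary copy on `n3` and, with three replicas, further copies on `n4` and `n1`.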
  • 30. Topology-aware Replication • Snitch: node IP → (DataCenter, rack) • EC2Snitch • Region → DC; availability_zone → rack • PropertyFileSnitch • Configured from a file
  • 31. Topology-aware Replication [figure: DC 1 and DC 2, each with racks r1 and r2; (k, v) is written]
  • 32. Topology-aware Replication [figure: DC 1 and DC 2, each with racks r1 and r2; (k, v) is written]
  • 33. Topology-aware Replication [figure as on slide 31] • extra copies to different data center
  • 34. Topology-aware Replication [figure as on slide 31] • extra copies to different data center • spread across racks within a data center
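A rough sketch of rack- and data-centre-aware placement, assuming a snitch that is just a lookup table from IP to (data centre, rack). The `place` function and its two-pass rule are a simplification written for this example, not Cassandra's actual placement algorithm.

```python
# Hypothetical snitch output: node IP -> (data centre, rack)
SNITCH = {
    "10.0.1.1": ("DC1", "r1"),
    "10.0.1.2": ("DC1", "r1"),
    "10.0.1.3": ("DC1", "r2"),
    "10.0.2.1": ("DC2", "r1"),
    "10.0.2.2": ("DC2", "r2"),
}
RING_ORDER = ["10.0.1.1", "10.0.2.1", "10.0.1.2", "10.0.2.2", "10.0.1.3"]


def place(ring_nodes, start, snitch, per_dc):
    """Walk the ring from the primary, taking per_dc[dc] replicas in each data centre."""
    order = [ring_nodes[(start + i) % len(ring_nodes)] for i in range(len(ring_nodes))]
    chosen = []
    for dc, wanted in per_dc.items():
        in_dc = [n for n in order if snitch[n][0] == dc]
        picked, racks = [], set()
        for n in in_dc:                       # first pass: one node per unused rack
            if len(picked) < wanted and snitch[n][1] not in racks:
                picked.append(n)
                racks.add(snitch[n][1])
        for n in in_dc:                       # second pass: fill up if racks ran out
            if len(picked) < wanted and n not in picked:
                picked.append(n)
        chosen.extend(picked)
    return chosen
```

With two replicas in DC1 and one in DC2, the second DC1 copy skips the node that shares rack r1 with the primary and lands on rack r2 instead.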
  • 36. Consistency Level • How many replicas must respond in order to declare success • W/N must succeed for write to succeed • write with client-generated timestamp • R/N must succeed for read to succeed • return most recent, by timestamp • Tuneable per request
  • 37. Consistency Level • 1, 2, 3 responses • Quorum (more than half) • Quorum in local data center • Quorum in each data center
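The arithmetic behind these levels is short enough to write down. The rule that a read is guaranteed to see the latest successful write when R + W > N is standard quorum reasoning rather than something stated on the slides, and the helper names are made up for this sketch.

```python
def quorum(n):
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def read_sees_latest_write(r, w, n):
    """True when every read set of size r must overlap every write set of size w."""
    return r + w > n


N = 3
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}
# For each (write level, read level) pair, is the overlap guaranteed?
overlap = {
    (w_name, r_name): read_sees_latest_write(r, w, N)
    for w_name, w in levels.items()
    for r_name, r in levels.items()
}
```

Writing and reading at quorum overlaps with N = 3 because 2 + 2 > 3, whereas writing and reading at ONE does not, so a read may return stale data until the replicas converge.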
  • 38. Maintaining consistency • Read repair • Hinted handoff • Anti-entropy
  • 39. Read repair • If the replicas disagree on read, send most recent data back n1 read k? n2 n3
  • 40. Read repair • If the replicas disagree on read, send most recent data back n1 v, t1 read k? n2 not found! n3 v’, t2
  • 41. Read repair • If the replicas disagree on read, send most recent data back n1 v, t1 n2 not found! user n3 v’, t2
  • 42. Read repair • If the replicas disagree on read, send most recent data back n1 n2 n3 write (k, v’, t2)
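The four slides above amount to the following loop, sketched with each replica as a plain dictionary. The `read_with_repair` function is a hypothetical stand-in for the coordinator, using the same example values as the slides.

```python
def read_with_repair(replicas, key):
    """replicas: name -> {key: (value, timestamp)}. Returns the newest value found."""
    responses = {name: store.get(key) for name, store in replicas.items()}
    found = [cell for cell in responses.values() if cell is not None]
    if not found:
        return None
    latest = max(found, key=lambda cell: cell[1])   # most recent by timestamp
    for name, cell in responses.items():
        if cell != latest:                          # stale or missing copy
            replicas[name][key] = latest            # write the newest data back
    return latest[0]


replicas = {
    "n1": {"k": ("v", 1)},     # v, t1
    "n2": {},                  # not found!
    "n3": {"k": ("v'", 2)},    # v', t2
}
result = read_with_repair(replicas, "k")
```

After the read, all three replicas hold `("v'", 2)`.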
  • 43. Hinted handoff • When a node is unavailable • Writes can be written to any node as a hint • Delivered when the node comes back online
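A minimal sketch of the idea, assuming a single coordinator that keeps the hints itself. The `Cluster` class and its methods are hypothetical names for this example.

```python
class Cluster:
    def __init__(self, names):
        self.data = {n: {} for n in names}   # node -> {key: (value, timestamp)}
        self.up = {n: True for n in names}
        self.hints = {}                      # unavailable node -> queued writes

    def write(self, targets, key, value, ts):
        """Write to every target replica; queue a hint for each one that is down."""
        acks = 0
        for node in targets:
            if self.up[node]:
                self.data[node][key] = (value, ts)
                acks += 1
            else:
                self.hints.setdefault(node, []).append((key, value, ts))
        return acks   # compare against the requested consistency level

    def mark_up(self, node):
        """Node is back online: deliver its hints, keeping any newer data it has."""
        self.up[node] = True
        for key, value, ts in self.hints.pop(node, []):
            current = self.data[node].get(key)
            if current is None or ts > current[1]:
                self.data[node][key] = (value, ts)
```

A typical sequence is: mark `n2` down, write to `n1`, `n2`, `n3` (two acknowledgements plus one hint), then call `mark_up("n2")` to replay the hint.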
  • 44. Anti-entropy • Equivalent to ‘read repair all’ • Requires reading all data (woah) • (Although only hashes are sent to calculate diffs) • Manual process
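The "only hashes are sent" point can be illustrated with a small hash tree. This is a sketch under assumptions: keys are bucketed by hash into a power-of-two number of leaves, and the function names are invented here. Both replicas still read all of their data to build the tree, but only the differing buckets need to be exchanged.

```python
import hashlib


def h(text):
    return hashlib.sha256(text.encode()).hexdigest()


def bucket_of(key, buckets):
    return int(h(key), 16) % buckets


def build_tree(store, buckets=8):
    """Return the tree as a list of levels: leaf hashes first, root last.

    buckets must be a power of two so that every level pairs up evenly.
    """
    leaves = [[] for _ in range(buckets)]
    for key in sorted(store):
        leaves[bucket_of(key, buckets)].append(f"{key}={store[key]}")
    levels = [[h("|".join(items)) for items in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(tree_a, tree_b):
    """Descend from the root, following only the branches whose hashes differ."""
    def walk(level, idx):
        if tree_a[level][idx] == tree_b[level][idx]:
            return []
        if level == 0:
            return [idx]
        return walk(level - 1, 2 * idx) + walk(level - 1, 2 * idx + 1)
    return walk(len(tree_a) - 1, 0)


replica_a = {f"k{i}": i for i in range(20)}
replica_b = dict(replica_a)
replica_b["k7"] = 999   # one divergent value
diff = differing_buckets(build_tree(replica_a), build_tree(replica_b))
```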
  • 46. De-normalisation • Disk space is much cheaper than disk seeks • Read at 100 MB/s, seek at 100 IO/s • => copy data to avoid seeks
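A worked version of that arithmetic, using the slide's figures of 100 MB/s sequential reads and 100 seeks per second. The 50 messages of 1 KB each are an assumed workload chosen only to make the comparison concrete.

```python
SEQUENTIAL_BYTES_PER_S = 100 * 10**6   # read at 100 MB/s
SEEKS_PER_S = 100                      # seek at 100 IO/s, i.e. 10 ms per seek


def scattered_read_seconds(items, item_bytes):
    """Each item lives somewhere else on disk: one seek per item."""
    return items / SEEKS_PER_S + items * item_bytes / SEQUENTIAL_BYTES_PER_S


def contiguous_read_seconds(items, item_bytes):
    """Items were copied next to each other: one seek, then a sequential read."""
    return 1 / SEEKS_PER_S + items * item_bytes / SEQUENTIAL_BYTES_PER_S


scattered = scattered_read_seconds(50, 1000)     # about 0.5005 s
contiguous = contiguous_read_seconds(50, 1000)   # about 0.0105 s
```

One 10 ms seek costs as much as reading 1 MB sequentially, so storing an extra copy of the data next to where it will be read is usually the cheaper option.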
  • 47. Inbox query user2 user1 msg1 user3 msg2 msg3 user4 ... Q? inbox for user3
  • 48. Data-centric model m1: { sender: user1 content: “Mary had a little lamb” recipients: user2, user3 } • but how to do ‘recipients’ for Inbox? • one-to-many modelled by a join table
  • 49. To join • messages: m1: { sender: user1 subject: “A rhyme” content: “Mary had a little lamb” } m2: { sender: user1 subject: “colours” content: “Its fleece was white as snow” } m3: { sender: user1 subject: “loyalty” content: “And everywhere that Mary went” } • join table: user2: { m1: true } user3: { m1: true m2: true } user4: { m2: true m3: true }
  • 50. .. or not to join • Joins are expensive, so de-normalise to trade off space for time • We can have lots of columns, so think BIG: • Make message id a time-typed super-column. • This makes get_slice an efficient way of searching for messages in a time window
  • 51. Super Column Family user2: { m1: { sender: user1 subject: “A rhyme” } } user3: { m1: { sender: user1 subject: “A rhyme” } m2: { sender: user1 subject: “colours” } } ...
  • 52. De-normalisation + Cassandra • have to write a copy of the record for each recipient ... but writes are very cheap • get_slice fetches columns for a particular row, so gets received messages for a user • on-disk column order is optimal for this query
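The de-normalised inbox can be sketched with one row per recipient, holding a copy of each message header keyed by time. The `send` and `inbox_slice` helpers and the integer timestamps are assumptions for this example. The content is stored once and only the header is copied, which is one way to keep rows small.

```python
inbox = {}      # recipient -> list of (timestamp, msg_id, header), kept in time order
contents = {}   # msg_id -> content, stored once and not duplicated per recipient


def send(msg_id, ts, sender, subject, content, recipients):
    contents[msg_id] = content
    header = {"sender": sender, "subject": subject}
    for user in recipients:                      # one cheap write per recipient
        row = inbox.setdefault(user, [])
        row.append((ts, msg_id, dict(header)))
        row.sort(key=lambda entry: entry[:2])    # keep columns ordered by time


def inbox_slice(user, start_ts, end_ts):
    """Messages received in [start_ts, end_ts], newest first, from a single row."""
    row = inbox.get(user, [])
    window = [e for e in row if start_ts <= e[0] <= end_ts]
    return [(msg_id, header["subject"]) for _ts, msg_id, header in reversed(window)]


send("m1", 1, "user1", "A rhyme", "Mary had a little lamb", ["user2", "user3"])
send("m2", 2, "user1", "colours", "Its fleece was white as snow", ["user3", "user4"])
send("m3", 3, "user1", "loyalty", "And everywhere that Mary went", ["user4"])
```

Fetching the inbox for user3 is now a single-row slice with no join. A real store would serve the time window as a range scan over sorted columns rather than the linear filter used here.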
  • 54. What it’s good for • Horizontal scalability • No single-point of failure -- symmetric • Multi-data centre support • Very high write workloads • Tuneable consistency -- per operation
  • 55.
  • 56. Q?

Editor's notes

  1. We provide Cassandra training and support and the Acunu Data Platform, high performance storage software that incorporates Cassandra. Come and talk to us if you want to know more. We have an ebook to give away to those that want to dive into Cassandra details. You've probably heard about 'eventual consistency / scale out / de-norm' … I'm going to explain what they mean.
  10. but... Tables have a fixed structure, described in a schema. Columns are much more flexible: no fixed schema in the RDBMS sense; little structure. Add a column whenever you want. Don't need the same columns in each row, etc.
  11. * two-level map
      * everything in Cassandra has a timestamp, which is used to help with consistency
      * you might use your own timestamp as a key, but you don't normally do anything with the internal timestamps
      * (of course this means your clocks need to be reasonably accurate, so you can tell people they need to use NTP)
  12. * three-level map
  13. * three-level map
  15. * sparse
      * up to 2 billion columns per row
      * ... but big rows are a problem (repair etc. is done per row)
      * on a single node, data is sorted by row key
  16. * Queries are all key-based, i.e. the ‘WHERE’ is all on key; the above differ in the SELECT
  20. * note that the predicate is on NAME -- can’t do ‘WHERE col3=x’ with this
  23. * memtable default is a skip list
      * background compaction of SSTables
      * BENEFIT IS SEQUENTIAL WRITES
  24. * data is sorted, key then column
      * compactions are streaming, hence efficient
  25. * reads go everywhere in parallel
      * Bloom filters are per-row, so help with get_slice but not multi-row range queries
  27. Amazon Dynamo
      connect to any node in the cluster
      nodes talk to one another using a p2p protocol called ‘gossip’ -- entirely symmetric.
  30. Hash ring based: keys are hashed; regions of hash output space are claimed by nodes
  31. Hash ring based: keys are hashed; regions of hash output space are claimed by nodes
  41. PER REQUEST
  49. * Merkle trees
  50. * at scale you have to optimise for queries
      * de-normalisation is not specific to Cassandra
  51. * de-normalisation is not specific to Cassandra
      * but it’s well suited because writes are relatively cheap, and there is little infrastructure for queries
  52. get inbox for user 3
  54. * extra table holding recipient -> msg
      * have to do a point query per message to show the inbox for a user
  56. * note, content not duplicated, only subject -- row would become too large
      * columns need to be ordered by time decreasing -- custom comparator