SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Downloaden Sie, um offline zu lesen
Rethinking Topology in Cassandra


                            ApacheCon North America
                                February 28, 2013



                                   Eric Evans
                               eevans@acunu.com
                                  @jericevans


Thursday, February 28, 13                             1
DHT 101



Thursday, February 28, 13             2
DHT 101
                             partitioning
                                Z   A




Thursday, February 28, 13                   3
DHT 101
                                    partitioning



                                Z                  A


                            Y                          B


                                                   C



Thursday, February 28, 13                                  4
DHT 101
                                    partitioning



                                Z                  A


                            Y       Key = Aaa          B


                                                   C



Thursday, February 28, 13                                  5
DHT 101
                                    replica placement



                                Z                       A


                            Y         Key = Aaa             B


                                                        C



Thursday, February 28, 13                                       6
DHT 101
                                       consistency




                        Consistency
                        Availability
                        Partition tolerance


Thursday, February 28, 13                            7
DHT 101
                            scenario: consistency level = one


                                 A
                                                                W

                                      ?



                                  ?



Thursday, February 28, 13                                           8
DHT 101
                            scenario: consistency level = all


                                 A
                                                                R

                                      ?



                                  ?



Thursday, February 28, 13                                           9
DHT 101
                              scenario: quorum write


                              A
                                                       W

                    R+W > N         B



                                ?



Thursday, February 28, 13                                  10
DHT 101
                              scenario: quorum read


                              ?



                    R+W > N        B


                                                      R
                               C



Thursday, February 28, 13                                 11
Awesome, yes?




Thursday, February 28, 13                   12
Well...




Thursday, February 28, 13             13
Problem:
                            Poor load distribution




Thursday, February 28, 13                            14
Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Thursday, February 28, 13                        15
Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Thursday, February 28, 13                        16
Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Thursday, February 28, 13                        17
Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Thursday, February 28, 13                        18
Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Thursday, February 28, 13                        19
Distributing Load

                                 Z       A
                                         A1


                             Y                B


                                         C
                                     M

Thursday, February 28, 13                         20
Distributing Load

                                 Z       A
                                         A1


                             Y                B


                                         C
                                     M

Thursday, February 28, 13                         21
Distributing Load

                                 Z       A
                                         A1


                             Y                B


                                         C
                                     M

Thursday, February 28, 13                         22
Problem:
                            Poor data distribution




Thursday, February 28, 13                            23
Distributing Data
                                    A



                                          C
                             D




                                    B

Thursday, February 28, 13                       24
Distributing Data
                                     A
                                 E


                                          C
                             D




                                     B

Thursday, February 28, 13                       25
Distributing Data
                                      A   A
                                  E


                                              C
                             D
                                              C
                              D


                                      B B

Thursday, February 28, 13                         26
Distributing Data
                                      A   A
                                  E


                                              C
                             D
                                              C
                              D


                                      B B

Thursday, February 28, 13                         27
Distributing Data
                                     A
                                 H       E


                                             C
                             D


                                 G       F
                                     B

Thursday, February 28, 13                        28
Distributing Data
                                     A
                                 H       E


                                             C
                             D


                                 G       F
                                     B

Thursday, February 28, 13                        29
Virtual Nodes



Thursday, February 28, 13                   30
In a nutshell...

               host


                                               host


               host



Thursday, February 28, 13                             31
Benefits
                     • Operationally simpler (no token
                            management)
                     •      Better distribution of load
                     •      Concurrent streaming involving all hosts
                     •      Smaller partitions mean greater reliability
                     •      Supports heterogenous hardware


Thursday, February 28, 13                                                 32
Strategies

                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment



Thursday, February 28, 13                           33
Strategy
                                        Automatic Sharding



                     • Partitions are split when data exceeds a
                            threshold
                     • Newly created partitions are relocated to a
                            host with lower data load
                     • Similar to sharding performed by Bigtable,
                            or Mongo auto-sharding



Thursday, February 28, 13                                            34
Strategy
                                     Fixed Partition Assignment

                     • Namespace divided into Q evenly-sized
                            partitions
                     • Q/N partitions assigned per host (where N
                            is the number of hosts)
                     • Joining hosts “steal” partitions evenly from
                            existing hosts.
                     • Used by Dynamo and Voldemort (described
                            in Dynamo paper as “strategy 3”)


Thursday, February 28, 13                                             35
Strategy
                                    Random Token Assignment



                     • Each host assigned T random tokens
                     • T random tokens generated for joining
                            hosts; New tokens divide existing ranges
                     • Similar to libketama; Identical to Classic
                            Cassandra when T=1



Thursday, February 28, 13                                              36
Considerations

                     1. Number of partitions
                     2. Partition size
                     3. How 1 changes with more nodes and data
                     4. How 2 changes with more nodes and data




Thursday, February 28, 13                                        37
Evaluating
                            Strategy         No. Partitions   Partition size

                            Random                  O(N)         O(B/N)


                             Fixed                  O(1)          O(B)


                      Auto-sharding                 O(B)          O(1)

                B ~ total data size, N ~ number of hosts


Thursday, February 28, 13                                                      38
Evaluating
                     • Automatic sharding
                       • partition size constant (great)
                       • number of partitions scales linearly with
                            data size (bad)
                     • Fixed partition assignment
                     • Random token assignment

Thursday, February 28, 13                                            39
Evaluating
                     •      Automatic sharding
                     •      Fixed partition assignment
                            •   Number of partitions is constant (good)
                            •   Partition size scales linearly with data size
                                (bad)
                            •   Higher operational complexity (bad)
                     •      Random token assignment


Thursday, February 28, 13                                                       40
Evaluating
                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment
                       • Number of partitions scales linearly with
                              number of hosts (good ok)
                            • Partition size increases with more data;
                              decreases with more hosts (good)


Thursday, February 28, 13                                                41
Evaluating


                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment



Thursday, February 28, 13                           42
Cassandra



Thursday, February 28, 13               43
Configuration
                              conf/cassandra.yaml


                # Comma separated list of tokens,
                # (new installs only).
                initial_token:<token>,<token>,<token>

                or

                # Number of tokens to generate.
                num_tokens: 256



Thursday, February 28, 13                               44
Configuration
                                    nodetool info

       Token           :    (invoke with -T/--tokens to see all 256 tokens)
       ID              :    64090651-6034-41d5-bfc6-ddd24957f164
       Gossip active   :    true
       Thrift active   :    true
       Load            :    92.69 KB
       Generation No   :    1351030018
       Uptime (seconds):    45
       Heap Memory (MB):    95.16 / 1956.00
       Data Center     :    datacenter1
       Rack            :    rack1
       Exceptions      :    0
       Key Cache       :    size 240 (bytes), capacity 101711872 (bytes ...
       Row Cache       :    size 0 (bytes), capacity 0 (bytes), 0 hits, ...




Thursday, February 28, 13                                                     45
Configuration
                                              nodetool ring
       Datacenter: datacenter1
       ==========
       Replicas: 2

       Address              Rack    Status State    Load         Owns     Token
                                                                          9022770486425350384
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -9182469192098976078
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -9054823614314102214
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8970752544645156769
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8927190060345427739
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8880475677109843259
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8817876497520861779
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8810512134942064901
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8661764562509480261
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8641550925069186492
       127.0.0.1            rack1   Up     Normal   97.24   KB   66.03%   -8636224350654790732
       ...
       ...




Thursday, February 28, 13                                                                        46
Configuration
                                   nodetool status




       Datacenter: datacenter1
       =======================
       Status=Up/Down
       |/ State=Normal/Leaving/Joining/Moving
       -- Address   Load    Tokens Owns   Host ID                               Rack
       UN 10.0.0.1 97.2 KB 256     66.0% 64090651-6034-41d5-bfc6-ddd24957f164   rack1
       UN 10.0.0.2 92.7 KB 256     66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c   rack1
       UN 10.0.0.3 92.6 KB 256     67.7% e4eef159-cb77-4627-84c4-14efbc868082   rack1




Thursday, February 28, 13                                                               47
Configuration
                                   nodetool status




       Datacenter: datacenter1
       =======================
       Status=Up/Down
       |/ State=Normal/Leaving/Joining/Moving
       -- Address   Load    Tokens Owns   Host ID                               Rack
       UN 10.0.0.1 97.2 KB 256     66.0% 64090651-6034-41d5-bfc6-ddd24957f164   rack1
       UN 10.0.0.2 92.7 KB 256     66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c   rack1
       UN 10.0.0.3 92.6 KB 256     67.7% e4eef159-cb77-4627-84c4-14efbc868082   rack1




Thursday, February 28, 13                                                               48
Configuration
                                   nodetool status




       Datacenter: datacenter1
       =======================
       Status=Up/Down
       |/ State=Normal/Leaving/Joining/Moving
       -- Address   Load    Tokens Owns   Host ID                               Rack
       UN 10.0.0.1 97.2 KB 256     66.0% 64090651-6034-41d5-bfc6-ddd24957f164   rack1
       UN 10.0.0.2 92.7 KB 256     66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c   rack1
       UN 10.0.0.3 92.6 KB 256     67.7% e4eef159-cb77-4627-84c4-14efbc868082   rack1




Thursday, February 28, 13                                                               49
Migration
                                    A




                            C               B




Thursday, February 28, 13                       50
Migration
                            edit conf/cassandra.yaml and restart




                # Number of tokens to generate.
                num_tokens: 256




Thursday, February 28, 13                                          51
Migration
                            convert to T contiguous tokens in existing ranges

                                             A AA
                                        A AA               B




                                                               A
                                       A




                                                                AA
                                      A
                                     A




                                                                 AA A
                                    A
                                    A
                                    A




                                                                AAA AA
                                      A A
                                            A




                                                           C
                                           A
                                          A



                                         A
                                         A

                                        A
                                        A
                                        A
                                        A




Thursday, February 28, 13                                                       52
Migration
                                      shuffle

                                     A AA
                                A AA           B




                                                   A
                               A




                                                    AA
                              A
                             A




                                                    AA A
                            A
                            A
                            A




                                                    AAA AA
                             A A
                                   A




                                               C
                                  A
                                 A



                                A
                                A

                               A
                               A
                               A
                               A




Thursday, February 28, 13                                    53
Shuffle

                     • Range transfers are queued on each host
                     • Hosts initiate transfer of ranges to self
                     • Pay attention to the logs!


Thursday, February 28, 13                                          54
Shuffle
                                          bin/shuffle
       Usage: shuffle [options] <sub-command>

       Sub-commands:
        create              Initialize a new shuffle operation
        ls                  List pending relocations
        clear               Clear pending relocations
        en[able]            Enable shuffling
        dis[able]           Disable shuffling

       Options:
        -dc, --only-dc              Apply only to named DC (create only)
        -tp, --thrift-port          Thrift port number (Default: 9160)
        -p,   --port                JMX port number (Default: 7199)
        -tf, --thrift-framed        Enable framed transport for Thrift (Default: false)
        -en, --and-enable           Immediately enable shuffling (create only)
        -H,   --help                Print help information
        -h,   --host                JMX hostname or IP address (Default: localhost)
        -th, --thrift-host          Thrift hostname or IP address (Default: JMX host)




Thursday, February 28, 13                                                                 55
Performance



Thursday, February 28, 13                 56
removenode
                  400


                  300


                  200


                 100


                     0
                            Cassandra 1.2   Cassandra 1.1




Thursday, February 28, 13                                   57
bootstrap
                  500


                  375


                 250


                 125


                     0
                            Cassandra 1.2   Cassandra 1.1




Thursday, February 28, 13                                   58
The End
          • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan
               Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan
               Sivasubramanian, Peter Vosshall and Werner Vogels “Dynamo: Amazon’s
               Highly Available Key-value Store” Web.

          • Low, Richard. “Improving Cassandra's uptime with virtual nodes” Web.
          • Overton, Sam. “Virtual Nodes Strategies.” Web.
          • Overton, Sam. “Virtual Nodes: Performance Results.” Web.
          • Jones, Richard. "libketama - a consistent hashing algo for memcache
               clients” Web.



Thursday, February 28, 13                                                            59

Weitere ähnliche Inhalte

Andere mochten auch

Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseEric Evans
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseEric Evans
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced CassandraEric Evans
 
Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Eric Evans
 
Webinaire Business&Decision - Trifacta
Webinaire  Business&Decision - TrifactaWebinaire  Business&Decision - Trifacta
Webinaire Business&Decision - TrifactaVictor Coustenoble
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Johnny Miller
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)Eric Evans
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architectureT Jake Luciani
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3Eric Evans
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraEric Evans
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraEric Evans
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In CassandraEric Evans
 
It's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDIt's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDEric Evans
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkVictor Coustenoble
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in CassandraEric Evans
 
User Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsUser Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsEran Chinthaka Withana
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 

Andere mochten auch (20)

Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 
Castle enhanced Cassandra
Castle enhanced CassandraCastle enhanced Cassandra
Castle enhanced Cassandra
 
Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
 
Webinar Degetel DataStax
Webinar Degetel DataStaxWebinar Degetel DataStax
Webinar Degetel DataStax
 
Webinaire Business&Decision - Trifacta
Webinaire  Business&Decision - TrifactaWebinaire  Business&Decision - Trifacta
Webinaire Business&Decision - Trifacta
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
DataStax Enterprise BBL
DataStax Enterprise BBLDataStax Enterprise BBL
DataStax Enterprise BBL
 
CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)CQL In Cassandra 1.0 (and beyond)
CQL In Cassandra 1.0 (and beyond)
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
 
Cassandra by Example: Data Modelling with CQL3
Cassandra by Example:  Data Modelling with CQL3Cassandra by Example:  Data Modelling with CQL3
Cassandra by Example: Data Modelling with CQL3
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
Virtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in CassandraVirtual Nodes: Rethinking Topology in Cassandra
Virtual Nodes: Rethinking Topology in Cassandra
 
CQL: SQL In Cassandra
CQL: SQL In CassandraCQL: SQL In Cassandra
CQL: SQL In Cassandra
 
It's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRDIt's not you, it's me: Ending a 15 year relationship with RRD
It's not you, it's me: Ending a 15 year relationship with RRD
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and Spark
 
Time series storage in Cassandra
Time series storage in CassandraTime series storage in Cassandra
Time series storage in Cassandra
 
User Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsUser Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and Clouds
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0Cassandra 2.2 & 3.0
Cassandra 2.2 & 3.0
 

Mehr von Eric Evans

Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLEric Evans
 
NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?Eric Evans
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraEric Evans
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
An Introduction To Cassandra
An Introduction To CassandraAn Introduction To Cassandra
An Introduction To CassandraEric Evans
 
Cassandra In A Nutshell
Cassandra In A NutshellCassandra In A Nutshell
Cassandra In A NutshellEric Evans
 

Mehr von Eric Evans (9)

Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQL
 
NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?NoSQL Yes, But YesCQL, No?
NoSQL Yes, But YesCQL, No?
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
An Introduction To Cassandra
An Introduction To CassandraAn Introduction To Cassandra
An Introduction To Cassandra
 
Cassandra In A Nutshell
Cassandra In A NutshellCassandra In A Nutshell
Cassandra In A Nutshell
 

Kürzlich hochgeladen

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Rethinking Topology In Cassandra (ApacheCon NA)

  • 1. Rethinking Topology in Cassandra ApacheCon North America February 28, 2013 Eric Evans eevans@acunu.com @jericevans Thursday, February 28, 13 1
  • 3. DHT 101 partitioning Z A Thursday, February 28, 13 3
  • 4. DHT 101 partitioning Z A Y B C Thursday, February 28, 13 4
  • 5. DHT 101 partitioning Z A Y Key = Aaa B C Thursday, February 28, 13 5
  • 6. DHT 101 replica placement Z A Y Key = Aaa B C Thursday, February 28, 13 6
  • 7. DHT 101 consistency Consistency Availability Partition tolerance Thursday, February 28, 13 7
  • 8. DHT 101 scenario: consistency level = one A W ? ? Thursday, February 28, 13 8
  • 9. DHT 101 scenario: consistency level = all A R ? ? Thursday, February 28, 13 9
  • 10. DHT 101 scenario: quorum write A W R+W > N B ? Thursday, February 28, 13 10
  • 11. DHT 101 scenario: quorum read ? R+W > N B R C Thursday, February 28, 13 11
  • 14. Problem: Poor load distribution Thursday, February 28, 13 14
  • 15. Distributing Load Z A Y B C M Thursday, February 28, 13 15
  • 16. Distributing Load Z A Y B C M Thursday, February 28, 13 16
  • 17. Distributing Load Z A Y B C M Thursday, February 28, 13 17
  • 18. Distributing Load Z A Y B C M Thursday, February 28, 13 18
  • 19. Distributing Load Z A Y B C M Thursday, February 28, 13 19
  • 20. Distributing Load Z A A1 Y B C M Thursday, February 28, 13 20
  • 21. Distributing Load Z A A1 Y B C M Thursday, February 28, 13 21
  • 22. Distributing Load Z A A1 Y B C M Thursday, February 28, 13 22
  • 23. Problem: Poor data distribution Thursday, February 28, 13 23
  • 24. Distributing Data A C D B Thursday, February 28, 13 24
  • 25. Distributing Data A E C D B Thursday, February 28, 13 25
  • 26. Distributing Data A A E C D C D B B Thursday, February 28, 13 26
  • 27. Distributing Data A A E C D C D B B Thursday, February 28, 13 27
  • 28. Distributing Data A H E C D G F B Thursday, February 28, 13 28
  • 29. Distributing Data A H E C D G F B Thursday, February 28, 13 29
  • 31. In a nutshell... host host host Thursday, February 28, 13 31
  • 32. Benefits • Operationally simpler (no token management) • Better distribution of load • Concurrent streaming involving all hosts • Smaller partitions mean greater reliability • Supports heterogenous hardware Thursday, February 28, 13 32
  • 33. Strategies • Automatic sharding • Fixed partition assignment • Random token assignment Thursday, February 28, 13 33
  • 34. Strategy Automatic Sharding • Partitions are split when data exceeds a threshold • Newly created partitions are relocated to a host with lower data load • Similar to sharding performed by Bigtable, or Mongo auto-sharding Thursday, February 28, 13 34
  • 35. Strategy Fixed Partition Assignment • Namespace divided into Q evenly-sized partitions • Q/N partitions assigned per host (where N is the number of hosts) • Joining hosts “steal” partitions evenly from existing hosts. • Used by Dynamo and Voldemort (described in Dynamo paper as “strategy 3”) Thursday, February 28, 13 35
  • 36. Strategy Random Token Assignment • Each host assigned T random tokens • T random tokens generated for joining hosts; New tokens divide existing ranges • Similar to libketama; Identical to Classic Cassandra when T=1 Thursday, February 28, 13 36
  • 37. Considerations 1. Number of partitions 2. Partition size 3. How 1 changes with more nodes and data 4. How 2 changes with more nodes and data Thursday, February 28, 13 37
  • 38. Evaluating Strategy No. Partitions Partition size Random O(N) O(B/N) Fixed O(1) O(B) Auto-sharding O(B) O(1) B ~ total data size, N ~ number of hosts Thursday, February 28, 13 38
  • 39. Evaluating • Automatic sharding • partition size constant (great) • number of partitions scales linearly with data size (bad) • Fixed partition assignment • Random token assignment Thursday, February 28, 13 39
  • 40. Evaluating • Automatic sharding • Fixed partition assignment • Number of partitions is constant (good) • Partition size scales linearly with data size (bad) • Higher operational complexity (bad) • Random token assignment Thursday, February 28, 13 40
  • 41. Evaluating • Automatic sharding • Fixed partition assignment • Random token assignment • Number of partitions scales linearly with number of hosts (good ok) • Partition size increases with more data; decreases with more hosts (good) Thursday, February 28, 13 41
  • 42. Evaluating • Automatic sharding • Fixed partition assignment • Random token assignment Thursday, February 28, 13 42
  • 44. Configuration conf/cassandra.yaml # Comma separated list of tokens, # (new installs only). initial_token:<token>,<token>,<token> or # Number of tokens to generate. num_tokens: 256 Thursday, February 28, 13 44
  • 45. Configuration nodetool info Token : (invoke with -T/--tokens to see all 256 tokens) ID : 64090651-6034-41d5-bfc6-ddd24957f164 Gossip active : true Thrift active : true Load : 92.69 KB Generation No : 1351030018 Uptime (seconds): 45 Heap Memory (MB): 95.16 / 1956.00 Data Center : datacenter1 Rack : rack1 Exceptions : 0 Key Cache : size 240 (bytes), capacity 101711872 (bytes ... Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, ... Thursday, February 28, 13 45
  • 46. Configuration nodetool ring Datacenter: datacenter1 ========== Replicas: 2 Address Rack Status State Load Owns Token 9022770486425350384 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -9182469192098976078 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -9054823614314102214 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8970752544645156769 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8927190060345427739 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8880475677109843259 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8817876497520861779 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8810512134942064901 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8661764562509480261 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8641550925069186492 127.0.0.1 rack1 Up Normal 97.24 KB 66.03% -8636224350654790732 ... ... Thursday, February 28, 13 46
  • 47. Configuration nodetool status Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1 UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1 UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1 Thursday, February 28, 13 47
  • 48. Configuration nodetool status Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1 UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1 UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1 Thursday, February 28, 13 48
  • 49. Configuration nodetool status Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns Host ID Rack UN 10.0.0.1 97.2 KB 256 66.0% 64090651-6034-41d5-bfc6-ddd24957f164 rack1 UN 10.0.0.2 92.7 KB 256 66.2% b3c3b03c-9202-4e7b-811a-9de89656ec4c rack1 UN 10.0.0.3 92.6 KB 256 67.7% e4eef159-cb77-4627-84c4-14efbc868082 rack1 Thursday, February 28, 13 49
  • 50. Migration A C B Thursday, February 28, 13 50
  • 51. Migration edit conf/cassandra.yaml and restart # Number of tokens to generate. num_tokens: 256 Thursday, February 28, 13 51
  • 52. Migration convert to T contiguous tokens in existing ranges A AA A AA B A A AA A A AA A A A A AAA AA A A A C A A A A A A A A Thursday, February 28, 13 52
  • 53. Migration shuffle A AA A AA B A A AA A A AA A A A A AAA AA A A A C A A A A A A A A Thursday, February 28, 13 53
  • 54. Shuffle • Range transfers are queued on each host • Hosts initiate transfer of ranges to self • Pay attention to the logs! Thursday, February 28, 13 54
  • 55. Shuffle bin/shuffle Usage: shuffle [options] <sub-command> Sub-commands: create Initialize a new shuffle operation ls List pending relocations clear Clear pending relocations en[able] Enable shuffling dis[able] Disable shuffling Options: -dc, --only-dc Apply only to named DC (create only) -tp, --thrift-port Thrift port number (Default: 9160) -p, --port JMX port number (Default: 7199) -tf, --thrift-framed Enable framed transport for Thrift (Default: false) -en, --and-enable Immediately enable shuffling (create only) -H, --help Print help information -h, --host JMX hostname or IP address (Default: localhost) -th, --thrift-host Thrift hostname or IP address (Default: JMX host) Thursday, February 28, 13 55
  • 57. removenode 400 300 200 100 0 Cassandra 1.2 Cassandra 1.1 Thursday, February 28, 13 57
  • 58. bootstrap 500 375 250 125 0 Cassandra 1.2 Cassandra 1.1 Thursday, February 28, 13 58
  • 59. The End • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels “Dynamo: Amazon’s Highly Available Key-value Store” Web. • Low, Richard. “Improving Cassandra's uptime with virtual nodes” Web. • Overton, Sam. “Virtual Nodes Strategies.” Web. • Overton, Sam. “Virtual Nodes: Performance Results.” Web. • Jones, Richard. "libketama - a consistent hashing algo for memcache clients” Web. Thursday, February 28, 13 59