Rethinking Topology in Cassandra


                            ApacheCon North America
                                February 28, 2013



                                   Eric Evans
                               eevans@acunu.com
                                  @jericevans


Thursday, February 28, 13                             1
DHT 101



DHT 101
                             partitioning
                                Z   A




DHT 101
                                    partitioning



                                Z                  A


                            Y                          B


                                                   C



DHT 101
                                    partitioning



                                Z                  A


                            Y       Key = Aaa          B


                                                   C
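
A minimal sketch of the lookup pictured above, assuming each host owns the range that ends at its own token and that keys compare as plain strings (as an order-preserving partitioner would). `RING` and `owner` are hypothetical names for illustration, not Cassandra APIs.

```python
from bisect import bisect_left

# Hypothetical ring: each host is named after its token, as on the slide.
RING = sorted(["A", "B", "C", "Y", "Z"])

def owner(key, ring=RING):
    """Return the first host clockwise whose token is >= key, wrapping at the end."""
    i = bisect_left(ring, key)
    return ring[i % len(ring)]

print(owner("Aaa"))  # B: "Aaa" sorts after "A" and before "B"
print(owner("Zzz"))  # A: past the last token, so the lookup wraps around
```

Keeping the tokens sorted makes the lookup a binary search, so the cost grows with the logarithm of the number of tokens rather than with the amount of data.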



DHT 101
                                    replica placement



                                Z                       A


                            Y         Key = Aaa             B


                                                        C
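
One way to sketch replica placement, assuming the simplest policy: the owner of the key plus the next hosts clockwise, with no rack or datacenter awareness. The function name `replicas` and the replication factor `rf` are assumptions for illustration.

```python
from bisect import bisect_left

def replicas(key, ring, rf):
    """Return up to rf distinct hosts: the key's owner, then its clockwise successors."""
    ring = sorted(ring)
    start = bisect_left(ring, key) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]

print(replicas("Aaa", ["A", "B", "C", "Y", "Z"], 3))  # ['B', 'C', 'Y']
```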



DHT 101
                                       consistency




                        Consistency
                        Availability
                        Partition tolerance


DHT 101
                            scenario: consistency level = one


                                 A
                                                                W

                                      ?



                                  ?



DHT 101
                            scenario: consistency level = all


                                 A
                                                                R

                                      ?



                                  ?



DHT 101
                              scenario: quorum write


                              A
                                                       W

                    R+W > N         B



                                ?



DHT 101
                              scenario: quorum read


                              ?



                    R+W > N        B


                                                      R
                               C
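
The R+W > N rule from the quorum scenarios can be checked by brute force. This is a small sketch, not Cassandra code: it enumerates every possible set of W replicas that acknowledged a write and every set of R replicas consulted by a read, and reports whether they always share at least one replica. The helper names are hypothetical.

```python
from itertools import combinations

def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def always_overlap(r, w, n):
    """True if every choice of w written replicas intersects every choice of r read replicas."""
    nodes = range(n)
    return all(
        set(written) & set(read)
        for written in combinations(nodes, w)
        for read in combinations(nodes, r)
    )

n = 3
q = quorum(n)
print(q, always_overlap(q, q, n))  # 2 True  (quorum write + quorum read)
print(always_overlap(1, 1, n))     # False   (write ONE + read ONE)
print(always_overlap(1, n, n))     # True    (write ALL + read ONE)
```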



Awesome, yes?




Well...




Problem:
                            Poor load distribution




Distributing Load

                                 Z       A


                             Y               B


                                         C
                                     M

Distributing Load

                                 Z       A
                                         A1


                             Y                B


                                         C
                                     M

Problem:
                            Poor data distribution




Distributing Data
                                    A



                                          C
                             D




                                    B

Distributing Data
                                     A
                                 E


                                          C
                             D




                                     B

Distributing Data
                                      A   A
                                  E


                                              C
                             D
                                              C
                              D


                                      B B

Distributing Data
                                     A
                                 H       E


                                             C
                             D


                                 G       F
                                     B

Virtual Nodes



In a nutshell...

               host


                                               host


               host



Benefits
                     • Operationally simpler (no token
                            management)
                     • Better distribution of load
                     • Concurrent streaming involving all hosts
                     • Smaller partitions mean greater reliability
                     • Supports heterogeneous hardware


Strategies

                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment



Strategy
                                        Automatic Sharding



                     • Partitions are split when data exceeds a
                            threshold
                     • Newly created partitions are relocated to a
                            host with lower data load
                     • Similar to sharding performed by Bigtable,
                            or Mongo auto-sharding
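
The two rules above (split past a threshold, move the new partition to a lightly loaded host) can be sketched as follows. This is an illustrative toy, not how Bigtable or MongoDB implement it: partitions are modeled as `(host, size)` pairs, a split halves the size, and `auto_shard` is a hypothetical name.

```python
def auto_shard(partitions, hosts, threshold):
    """Split every partition larger than threshold; place each new half on the least-loaded host."""
    work = list(partitions)
    done = []
    while work:
        host, size = work.pop()
        if size <= threshold:
            done.append((host, size))
            continue
        half = size // 2
        # Current load per host, counting the half that stays where it is.
        load = {h: 0 for h in hosts}
        for h, s in done + work:
            load[h] += s
        load[host] += size - half
        target = min(hosts, key=lambda h: load[h])
        work.append((host, size - half))  # stays on the original host
        work.append((target, half))       # relocated to the least-loaded host
    return done

result = auto_shard([("h1", 300), ("h2", 50)], ["h1", "h2", "h3"], threshold=100)
print(result)
```

Every partition ends up no larger than the threshold, so the number of partitions grows with the amount of data.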



Strategy
                                     Fixed Partition Assignment

                     • Namespace divided into Q evenly-sized
                            partitions
                     • Q/N partitions assigned per host (where N
                            is the number of hosts)
                     • Joining hosts “steal” partitions evenly from
                            existing hosts.
                     • Used by Dynamo and Voldemort (described
                            in Dynamo paper as “strategy 3”)
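
A rough sketch of fixed partition assignment, assuming Q is chosen up front and never changes. The round-robin initial layout and the "take from whichever host currently holds the most" stealing rule are assumptions made for illustration; the Dynamo paper does not prescribe this exact procedure.

```python
def initial_assignment(q, hosts):
    """Map partition ids 0..q-1 to hosts round-robin, about q/len(hosts) each."""
    return {p: hosts[p % len(hosts)] for p in range(q)}

def join(assignment, new_host):
    """Give new_host about Q/(N+1) partitions, stolen evenly from the existing hosts."""
    hosts = sorted(set(assignment.values()))
    owned = {h: [p for p, o in assignment.items() if o == h] for h in hosts}
    target = len(assignment) // (len(hosts) + 1)
    result = dict(assignment)
    for _ in range(target):
        donor = max(hosts, key=lambda h: len(owned[h]))  # most-loaded host donates
        result[owned[donor].pop()] = new_host
    return result

ring = initial_assignment(12, ["a", "b", "c"])
ring = join(ring, "d")
counts = {h: list(ring.values()).count(h) for h in "abcd"}
print(counts)  # {'a': 3, 'b': 3, 'c': 3, 'd': 3}
```

Only partition ownership moves; the partition boundaries themselves stay fixed, which is why the partition count is constant.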


Strategy
                                    Random Token Assignment



                     • Each host assigned T random tokens
                     • T random tokens generated for joining
                            hosts; New tokens divide existing ranges
                     • Similar to libketama; Identical to Classic
                            Cassandra when T=1
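
To see why more tokens per host evens out ownership, here is a small simulation. It assumes a signed 64-bit token space (the range the token values in the `nodetool ring` output appear to use) and that each token owns the range ending at it; `assign_tokens` and `ownership` are hypothetical helpers, not Cassandra code.

```python
import random

TOKEN_SPACE = 2**64  # assumed signed 64-bit range: -2**63 .. 2**63 - 1

def assign_tokens(hosts, t, rng):
    """Give each host t random tokens; returns {token: host}."""
    ring = {}
    for host in hosts:
        for _ in range(t):
            ring[rng.randrange(-2**63, 2**63)] = host
    return ring

def ownership(ring):
    """Fraction of the token space owned by each host."""
    tokens = sorted(ring)
    owns = {}
    for i, token in enumerate(tokens):
        prev = tokens[i - 1]  # for i == 0 this wraps to the last token
        width = (token - prev) % TOKEN_SPACE or TOKEN_SPACE
        owns[ring[token]] = owns.get(ring[token], 0.0) + width / TOKEN_SPACE
    return owns

rng = random.Random(42)
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
for t in (1, 256):
    shares = ownership(assign_tokens(hosts, t, rng))
    print(t, {h: round(s, 3) for h, s in shares.items()})
```

With T=1 the shares depend entirely on where three random tokens happen to land; with T=256 each host's share should come out close to one third.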



Considerations

                     1. Number of partitions
                     2. Partition size
                     3. How 1 changes with more nodes and data
                     4. How 2 changes with more nodes and data




Evaluating
                      Strategy         No. Partitions   Partition size

                      Random                O(N)            O(B/N)
                      Fixed                 O(1)            O(B)
                      Auto-sharding         O(B)            O(1)

                B ~ total data size, N ~ number of hosts
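
The table can be made concrete with a few numbers. In this sketch the constants are all hypothetical: `t` tokens per host for random assignment, `q` fixed partitions, and a split threshold `s` for auto-sharding; `b` and `n` are the table's B and N.

```python
import math

def scaling(b, n, t=256, q=1024, s=100):
    """Return {strategy: (partition_count, partition_size)} for data size b and n hosts."""
    return {
        "random": (t * n, b / (t * n)),           # O(N) partitions, O(B/N) size
        "fixed": (q, b / q),                      # O(1) partitions, O(B) size
        "auto-sharding": (math.ceil(b / s), s),   # O(B) partitions, O(1) size
    }

for b, n in [(10_000, 4), (20_000, 4), (20_000, 8)]:
    print(f"B={b} N={n}", scaling(b, n))
```

Doubling the data doubles the partition size for both random and fixed assignment, but only random assignment shrinks it again when hosts are added.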


Evaluating
                     • Automatic sharding
                       • partition size constant (great)
                       • number of partitions scales linearly with
                            data size (bad)
                     • Fixed partition assignment
                     • Random token assignment

Evaluating
                     • Automatic sharding
                     • Fixed partition assignment
                       • Number of partitions is constant (good)
                       • Partition size scales linearly with data size
                            (bad)
                       • Higher operational complexity (bad)
                     • Random token assignment


Evaluating
                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment
                       • Number of partitions scales linearly with
                            number of hosts (ok)
                       • Partition size increases with more data;
                            decreases with more hosts (good)


Evaluating


                     • Automatic sharding
                     • Fixed partition assignment
                     • Random token assignment



Cassandra



Configuration
                              conf/cassandra.yaml


                # Comma separated list of tokens,
                # (new installs only).
                initial_token: <token>,<token>,<token>

                or

                # Number of tokens to generate.
                num_tokens: 256



Configuration
                                    nodetool info

       Token           :    (invoke with -T/--tokens to see all 256 tokens)
       ID              :    64090651-6034-41d5-bfc6-ddd24957f164
       Gossip active   :    true
       Thrift active   :    true
       Load            :    92.69 KB
       Generation No   :    1351030018
       Uptime (seconds):    45
       Heap Memory (MB):    95.16 / 1956.00
       Data Center     :    datacenter1
       Rack            :    rack1
       Exceptions      :    0
       Key Cache       :    size 240 (bytes), capacity 101711872 (bytes ...
       Row Cache       :    size 0 (bytes), capacity 0 (bytes), 0 hits, ...




Configuration
                                              nodetool ring
       Datacenter: datacenter1
       ==========
       Replicas: 2

       Address     Rack    Status  State    Load       Owns     Token
                                                                9022770486425350384
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -9182469192098976078
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -9054823614314102214
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8970752544645156769
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8927190060345427739
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8880475677109843259
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8817876497520861779
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8810512134942064901
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8661764562509480261
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8641550925069186492
       127.0.0.1   rack1   Up      Normal   97.24 KB   66.03%   -8636224350654790732
       ...
       ...




Configuration
                                   nodetool status




       Datacenter: datacenter1
       =======================
       Status=Up/Down
       |/ State=Normal/Leaving/Joining/Moving
       --  Address   Load     Tokens  Owns   Host ID                               Rack
       UN  10.0.0.1  97.2 KB  256     66.0%  64090651-6034-41d5-bfc6-ddd24957f164  rack1
       UN  10.0.0.2  92.7 KB  256     66.2%  b3c3b03c-9202-4e7b-811a-9de89656ec4c  rack1
       UN  10.0.0.3  92.6 KB  256     67.7%  e4eef159-cb77-4627-84c4-14efbc868082  rack1




Migration
                                    A




                            C               B




Migration
                            edit conf/cassandra.yaml and restart




                # Number of tokens to generate.
                num_tokens: 256




Migration
                            convert to T contiguous tokens in existing ranges
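
A sketch of what "T contiguous tokens in existing ranges" could look like for one host. It assumes the host's old range is cut into T equal-width pieces that all stay with the same host, and that the range does not wrap around the ring; the slide does not say how Cassandra actually spaces the tokens, so `split_range` is purely illustrative.

```python
def split_range(prev_token, token, t):
    """Cut the range (prev_token, token] into t contiguous sub-ranges; return their end tokens."""
    width = token - prev_token
    return [prev_token + width * i // t for i in range(1, t + 1)]

print(split_range(0, 100, 4))  # [25, 50, 75, 100]
```

Because the last new token equals the old one and every piece stays on the same host, no data has to move at this step; redistribution is left to the shuffle that follows.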

                            [Figure: ring with hosts A, B, and C; A's range is subdivided into many contiguous tokens, each labeled A]
Migration
                                      shuffle

                            [Figure: ring with hosts A, B, and C and the many tokens labeled A, shown for the shuffle step]
Shuffle

                     • Range transfers are queued on each host
                     • Hosts initiate transfer of ranges to self
                     • Pay attention to the logs!


Shuffle
                                          bin/shuffle
       Usage: shuffle [options] <sub-command>

       Sub-commands:
        create              Initialize a new shuffle operation
        ls                  List pending relocations
        clear               Clear pending relocations
        en[able]            Enable shuffling
        dis[able]           Disable shuffling

       Options:
        -dc, --only-dc          Apply only to named DC (create only)
        -tp, --thrift-port      Thrift port number (Default: 9160)
        -p,  --port             JMX port number (Default: 7199)
        -tf, --thrift-framed    Enable framed transport for Thrift (Default: false)
        -en, --and-enable       Immediately enable shuffling (create only)
        -H,  --help             Print help information
        -h,  --host             JMX hostname or IP address (Default: localhost)
        -th, --thrift-host      Thrift hostname or IP address (Default: JMX host)




Performance



removenode
                  [Bar chart: Cassandra 1.2 vs. Cassandra 1.1; y-axis from 0 to 400, units not shown]




bootstrap
                  [Bar chart: Cassandra 1.2 vs. Cassandra 1.1; y-axis from 0 to 500, units not shown]




The End
          • DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan
               Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan
               Sivasubramanian, Peter Vosshall, and Werner Vogels. “Dynamo: Amazon’s
               Highly Available Key-value Store.” Web.
          • Low, Richard. “Improving Cassandra's uptime with virtual nodes.” Web.
          • Overton, Sam. “Virtual Nodes Strategies.” Web.
          • Overton, Sam. “Virtual Nodes: Performance Results.” Web.
          • Jones, Richard. “libketama - a consistent hashing algo for memcache
               clients.” Web.



"ML in Production",Oleksandr BaganFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Rethinking Topology in Cassandra with Virtual Nodes

  • 1. Rethinking Topology in Cassandra ApacheCon North America February 28, 2013 Eric Evans eevans@acunu.com @jericevans Thursday, February 28, 13 1
  • 3. DHT 101: partitioning (ring diagram; nodes Z, A)
  • 4. DHT 101: partitioning (ring diagram; nodes Z, A, B, C, Y)
  • 5. DHT 101: partitioning (ring diagram; nodes Z, A, B, C, Y; Key = Aaa)
  • 6. DHT 101: replica placement (ring diagram; nodes Z, A, B, C, Y; Key = Aaa)
  • 7. DHT 101: consistency
      • Consistency
      • Availability
      • Partition tolerance
  • 8. DHT 101: scenario: consistency level = one (diagram labels: A, W, ?, ?)
  • 9. DHT 101: scenario: consistency level = all (diagram labels: A, R, ?, ?)
  • 10. DHT 101: scenario: quorum write, R+W > N (diagram labels: A, W, B, ?)
  • 11. DHT 101: scenario: quorum read, R+W > N (diagram labels: ?, B, R, C)
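The quorum condition R+W > N on the two slides above can be checked by brute force. The sketch below is illustrative only and is not taken from the talk: the function names `quorums_overlap` and `quorum` are hypothetical, and replicas are modeled as plain integers. It enumerates every possible read set and write set and confirms that they always share at least one replica exactly when R+W > N.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read set of size r shares a replica
    with every write set of size w, out of n replicas."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


if __name__ == "__main__":
    n = 3
    # Quorum reads and writes (2 of 3) always overlap.
    print(quorums_overlap(n, quorum(n), quorum(n)))
    # Reading and writing at level ONE gives no overlap guarantee.
    print(quorums_overlap(n, 1, 1))
```

With N = 3, a quorum is 2 replicas, so a quorum read always touches at least one replica that acknowledged the latest quorum write.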
  • 14. Problem: Poor load distribution
  • 15-19. Distributing Load (ring diagram; nodes Z, A, B, C, M, Y)
  • 20-22. Distributing Load (ring diagram; nodes Z, A, A1, B, C, M, Y)
  • 23. Problem: Poor data distribution
  • 24. Distributing Data (diagram; nodes A, C, D, B)
  • 25. Distributing Data (diagram; nodes A, E, C, D, B)
  • 26-27. Distributing Data (two diagrams; nodes A, E, C, D, B and A, C, D, B)
  • 28-29. Distributing Data (diagram; nodes A, H, E, C, D, G, F, B)
  • 31. In a nutshell... (diagram; host, host, host)
  • 32. Benefits
      • Operationally simpler (no token management)
      • Better distribution of load
      • Concurrent streaming involving all hosts
      • Smaller partitions mean greater reliability
      • Supports heterogeneous hardware
  • 33. Strategies
      • Automatic sharding
      • Fixed partition assignment
      • Random token assignment
  • 34. Strategy: Automatic Sharding
      • Partitions are split when data exceeds a threshold
      • Newly created partitions are relocated to a host with lower data load
      • Similar to sharding performed by Bigtable, or Mongo auto-sharding
  • 35. Strategy: Fixed Partition Assignment
      • Namespace divided into Q evenly-sized partitions
      • Q/N partitions assigned per host (where N is the number of hosts)
      • Joining hosts “steal” partitions evenly from existing hosts
      • Used by Dynamo and Voldemort (described in Dynamo paper as “strategy 3”)
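As a rough illustration of fixed partition assignment, the sketch below divides the namespace into Q partitions, spreads them over N hosts, and lets a joining host take partitions from the most loaded existing hosts until it holds about Q/(N+1). This is a simplified model under stated assumptions, not the Dynamo or Voldemort implementation: the helper names `assign_fixed` and `join`, the round-robin initial layout, and the donor selection rule are all hypothetical.

```python
from collections import Counter


def assign_fixed(q, hosts):
    """Spread q fixed partitions round-robin over the given hosts."""
    return {p: hosts[p % len(hosts)] for p in range(q)}


def join(assignment, new_host):
    """Return a new assignment in which new_host has stolen partitions
    evenly from existing hosts (always from the currently most loaded one)."""
    owners = dict(assignment)
    counts = Counter(owners.values())
    target = len(owners) // (len(counts) + 1)  # about Q / (N + 1)
    for _ in range(target):
        # Pick the most loaded existing host; ties broken by name
        # so that the result is deterministic.
        donor = max(
            (h for h in counts if h != new_host),
            key=lambda h: (counts[h], h),
        )
        # Hand over the donor's highest-numbered partition.
        part = max(p for p, h in owners.items() if h == donor)
        owners[part] = new_host
        counts[donor] -= 1
        counts[new_host] += 1
    return owners


if __name__ == "__main__":
    before = assign_fixed(12, ["A", "B", "C"])
    after = join(before, "D")
    print(Counter(after.values()))
```

Note that the number of partitions never changes here; only ownership does, which is why partition size grows with the total amount of data under this strategy.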
  • 36. Strategy: Random Token Assignment
      • Each host assigned T random tokens
      • T random tokens generated for joining hosts; new tokens divide existing ranges
      • Similar to libketama; identical to Classic Cassandra when T=1
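Random token assignment can be sketched as a sorted ring of (token, host) pairs, where a key belongs to the host owning the first token at or after the key's hash. The code below is a minimal model, not Cassandra's implementation: the 64-bit token space, the MD5-based key hash, and the names `build_ring`, `add_host`, `owner`, and `ownership` are assumptions made for illustration.

```python
import bisect
import hashlib
import random

TOKEN_SPACE = 2**64  # assumed size of the token space


def build_ring(hosts, t, seed=0):
    """Give each host t random tokens; return the ring sorted by token."""
    rng = random.Random(seed)
    return sorted((rng.randrange(TOKEN_SPACE), h) for h in hosts for _ in range(t))


def add_host(ring, host, t, seed=1):
    """A joining host gets t new random tokens that divide existing ranges."""
    rng = random.Random(seed)
    return sorted(ring + [(rng.randrange(TOKEN_SPACE), host) for _ in range(t)])


def owner(ring, key):
    """Host owning the first token at or after the key's hash (wrapping)."""
    token = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    tokens = [tok for tok, _ in ring]
    return ring[bisect.bisect_left(tokens, token) % len(ring)][1]


def ownership(ring):
    """Fraction of the token space owned by each host."""
    shares = {}
    prev = ring[-1][0] - TOKEN_SPACE  # wrap around from the last token
    for tok, host in ring:
        shares[host] = shares.get(host, 0) + (tok - prev)
        prev = tok
    return {h: s / TOKEN_SPACE for h, s in shares.items()}


if __name__ == "__main__":
    ring = build_ring(["A", "B", "C"], t=256)
    print(ownership(ring))
    bigger = add_host(ring, "D", t=256)
    print(ownership(bigger))
```

Because a joining host only inserts new tokens into existing ranges, any key that changes owner moves to the new host; no data moves between the existing hosts.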
  • 37. Considerations
      1. Number of partitions
      2. Partition size
      3. How 1 changes with more nodes and data
      4. How 2 changes with more nodes and data
  • 38. Evaluating

      Strategy        No. Partitions   Partition size
      Random          O(N)             O(B/N)
      Fixed           O(1)             O(B)
      Auto-sharding   O(B)             O(1)

      B ~ total data size, N ~ number of hosts
  • 39. Evaluating
      • Automatic sharding
          • Partition size is constant (great)
          • Number of partitions scales linearly with data size (bad)
      • Fixed partition assignment
      • Random token assignment
  • 40. Evaluating
      • Automatic sharding
      • Fixed partition assignment
          • Number of partitions is constant (good)
          • Partition size scales linearly with data size (bad)
          • Higher operational complexity (bad)
      • Random token assignment
  • 41. Evaluating
      • Automatic sharding
      • Fixed partition assignment
      • Random token assignment
          • Number of partitions scales linearly with number of hosts (ok)
          • Partition size increases with more data; decreases with more hosts (good)
  • 42. Evaluating
      • Automatic sharding
      • Fixed partition assignment
      • Random token assignment
  • 44. Configuration: conf/cassandra.yaml

      # Comma separated list of tokens,
      # (new installs only).
      initial_token: <token>,<token>,<token>

    or

      # Number of tokens to generate.
      num_tokens: 256

  • 45. Configuration: nodetool info

      Token            : (invoke with -T/--tokens to see all 256 tokens)
      ID               : 64090651-6034-41d5-bfc6-ddd24957f164
      Gossip active    : true
      Thrift active    : true
      Load             : 92.69 KB
      Generation No    : 1351030018
      Uptime (seconds) : 45
      Heap Memory (MB) : 95.16 / 1956.00
      Data Center      : datacenter1
      Rack             : rack1
      Exceptions       : 0
      Key Cache        : size 240 (bytes), capacity 101711872 (bytes ...
      Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, ...
  • 46. Configuration: nodetool ring

      Datacenter: datacenter1
      ==========
      Replicas: 2

      Address    Rack   Status  State   Load      Owns    Token
                                                          9022770486425350384
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -9182469192098976078
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -9054823614314102214
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8970752544645156769
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8927190060345427739
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8880475677109843259
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8817876497520861779
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8810512134942064901
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8661764562509480261
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8641550925069186492
      127.0.0.1  rack1  Up      Normal  97.24 KB  66.03%  -8636224350654790732
      ...
  • 47-49. Configuration: nodetool status

      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address   Load     Tokens  Owns   Host ID                               Rack
      UN  10.0.0.1  97.2 KB  256     66.0%  64090651-6034-41d5-bfc6-ddd24957f164  rack1
      UN  10.0.0.2  92.7 KB  256     66.2%  b3c3b03c-9202-4e7b-811a-9de89656ec4c  rack1
      UN  10.0.0.3  92.6 KB  256     67.7%  e4eef159-cb77-4627-84c4-14efbc868082  rack1
  • 50. Migration (ring diagram; nodes A, C, B)
  • 51. Migration: edit conf/cassandra.yaml and restart

      # Number of tokens to generate.
      num_tokens: 256

  • 52. Migration: convert to T contiguous tokens in existing ranges (ring diagram; nodes A, B, C)
  • 53. Migration: shuffle (ring diagram; nodes A, B, C)
  • 54. Shuffle
      • Range transfers are queued on each host
      • Hosts initiate transfer of ranges to self
      • Pay attention to the logs!
  • 55. Shuffle: bin/shuffle

      Usage: shuffle [options] <sub-command>

      Sub-commands:
        create        Initialize a new shuffle operation
        ls            List pending relocations
        clear         Clear pending relocations
        en[able]      Enable shuffling
        dis[able]     Disable shuffling

      Options:
        -dc, --only-dc        Apply only to named DC (create only)
        -tp, --thrift-port    Thrift port number (Default: 9160)
        -p,  --port           JMX port number (Default: 7199)
        -tf, --thrift-framed  Enable framed transport for Thrift (Default: false)
        -en, --and-enable     Immediately enable shuffling (create only)
        -H,  --help           Print help information
        -h,  --host           JMX hostname or IP address (Default: localhost)
        -th, --thrift-host    Thrift hostname or IP address (Default: JMX host)
  • 57. removenode (bar chart comparing Cassandra 1.2 and Cassandra 1.1; axis from 0 to 400)
  • 58. bootstrap (bar chart comparing Cassandra 1.2 and Cassandra 1.1; axis from 0 to 500)
  • 59. The End
      • DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. “Dynamo: Amazon’s Highly Available Key-value Store.” Web.
      • Low, Richard. “Improving Cassandra's uptime with virtual nodes.” Web.
      • Overton, Sam. “Virtual Nodes Strategies.” Web.
      • Overton, Sam. “Virtual Nodes: Performance Results.” Web.
      • Jones, Richard. “libketama - a consistent hashing algo for memcache clients.” Web.