DATASTAX C*OLLEGE CREDIT:

AN INTRODUCTION TO
 APACHE CASSANDRA
                      Aaron Morton
Apache Cassandra Committer, DataStax MVP for Apache Cassandra
                      @aaronmorton
                   www.thelastpickle.com


            Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
Overview
  The Cluster
The Data Model
    The API
Cassandra
  - Started at Facebook
  - Open sourced in 2008
  - Top Level Apache project since 2010.
Used by...

   Netflix, Twitter,
 Reddit, Rackspace...
Inspiration
  - Google Big Table (2006)
  - Amazon Dynamo (2007)
Why Cassandra?
 - Scale
 - Operations
 - Data Model
Why Cassandra?
 Is My App a Good Fit for Apache Cassandra?
 Eric Lubow (CTO, SimpleReach)
 Wednesday October 24 @ 8:30AM PST

 http://www.datastax.com/resources/webinars/collegecredit
Overview
 The Cluster
The Data Model
   The API
Store 'foo' key with Replication Factor 3.
  (Diagram: a ring of four nodes. Node 1, Node 2 and Node 3 each hold a copy of 'foo'; Node 4 does not.)
Consistent Hashing.
 - Evenly map keys to nodes
 - Minimise key movements when nodes join or leave
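A minimal Python sketch of why consistent hashing minimises key movement, assuming the slides' toy token space of 0 to 99 (real partitioners use a far larger range). Keys are hashed with MD5 onto tokens, and each key belongs to the first node token at or after it, wrapping around the ring. The joining node's token of 60 and the helper names are hypothetical. When that node joins, only keys in the range 51-60 change owner, whereas naive `token % node_count` placement reshuffles most keys.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 100  # toy token space 0..99, as on the slides


def token_for(key: str) -> int:
    """Hash a key with MD5 and fold it into the toy token space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(node_tokens: list[int], token: int) -> int:
    """Return the token of the node owning `token`: the first node token
    >= the key's token, wrapping around to the lowest one."""
    ordered = sorted(node_tokens)
    return ordered[bisect_left(ordered, token) % len(ordered)]


keys = [f"key{i}" for i in range(1000)]

# Four nodes, then a hypothetical fifth node joins at token 60.
before = {k: owner([0, 25, 50, 75], token_for(k)) for k in keys}
after = {k: owner([0, 25, 50, 60, 75], token_for(k)) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

# Naive placement for comparison: node index = token % number of nodes.
naive_moved = [k for k in keys if token_for(k) % 4 != token_for(k) % 5]

print(f"consistent hashing moved {len(moved)} of {len(keys)} keys")
print(f"naive modulo placement moved {len(naive_moved)} of {len(keys)} keys")
```

Only the new node's neighbour (token 75) gives up keys, which is the "minimise key movements" property in practice.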
Partitioner.
  RandomPartitioner transforms Keys to Tokens using MD5.
     (Default, there are others.)
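A rough Python sketch of a RandomPartitioner-style token, assuming the behaviour of hashing the key with MD5, reading the 16-byte digest as a signed big-endian integer and taking its absolute value, which gives tokens between 0 and 2**127. The function name is hypothetical; treat this as an illustration rather than a byte-for-byte port of Cassandra's Java code.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Sketch of a RandomPartitioner-style token: MD5 the key, read the
    16-byte digest as a signed big-endian integer, take the absolute value."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


for key in (b"foo", b"fop"):
    print(key.decode(), random_partitioner_token(key))
```

Because MD5 spreads similar keys such as 'foo' and 'fop' to unrelated tokens, rows are distributed evenly around the ring.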
Keys and Tokens?
  (Diagram: a token line from 0 to 99. Key 'fop' maps to token 10; key 'foo' maps to token 90.)
Token Ring.
  (Diagram: the token line bent into a ring so that 99 wraps around to 0. 'foo' sits at token: 90 and 'fop' at token: 10.)
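Token Ranges.
  (Diagram: four nodes on the ring. Each node owns the range from the previous node's token, exclusive, up to its own token.)
  Node 1, token: 0, range 76-0
  Node 2, token: 25, range 1-25
  Node 3, token: 50, range 26-50
  Node 4, token: 75, range 51-75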
Locate Token Range.
  (Diagram: 'foo', token: 90, falls in the range 76-0 and so belongs to Node 1, token: 0. The other nodes are Node 2, token: 25, Node 3, token: 50, and Node 4, token: 75.)
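The lookup on this slide can be sketched in a few lines of Python, assuming the toy four-node ring above. The function name `owning_node` is hypothetical. A binary search finds the first node token at or after the key's token, and falling off the end of the ring wraps back to the lowest token, which is why token 90 lands on Node 1.

```python
from bisect import bisect_left

# Toy ring from the slides: node name -> token (token space 0..99).
RING = {"Node 1": 0, "Node 2": 25, "Node 3": 50, "Node 4": 75}


def owning_node(ring: dict[str, int], token: int) -> str:
    """Each node owns (previous node's token, its own token], so the owner is
    the first node whose token is >= the key's token, wrapping past the end."""
    ordered = sorted(ring, key=ring.get)
    tokens = [ring[name] for name in ordered]
    return ordered[bisect_left(tokens, token) % len(ordered)]


print(owning_node(RING, 90))  # 'foo' -> Node 1 (range 76-0)
print(owning_node(RING, 10))  # 'fop' -> Node 2 (range 1-25)
```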
The Replication Strategy selects
Replication Factor (RF) nodes
      to hold each row.
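SimpleStrategy with RF 3.
  (Diagram: 'foo', token: 90, is stored on Node 1, token: 0, which owns its range, and on the next two nodes clockwise around the ring, Node 2, token: 25, and Node 3, token: 50. Node 4, token: 75, holds no copy.)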
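A minimal Python sketch of SimpleStrategy-style placement, assuming the same toy ring. The function name is hypothetical. The first replica is the node that owns the token, and each further replica is the next node clockwise. Data centres and racks are ignored, which is exactly what distinguishes SimpleStrategy from the topology-aware strategy.

```python
from bisect import bisect_left

RING = {"Node 1": 0, "Node 2": 25, "Node 3": 50, "Node 4": 75}


def simple_strategy_replicas(ring: dict[str, int], token: int, rf: int) -> list[str]:
    """First replica: the node owning the token. Further replicas: the next
    nodes clockwise around the ring, ignoring data centres and racks."""
    ordered = sorted(ring, key=ring.get)
    tokens = [ring[name] for name in ordered]
    start = bisect_left(tokens, token) % len(ordered)
    return [ordered[(start + i) % len(ordered)] for i in range(min(rf, len(ordered)))]


print(simple_strategy_replicas(RING, 90, 3))  # ['Node 1', 'Node 2', 'Node 3']
```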
NetworkTopologyStrategy uses a
Replication Factor per Data Centre.
            (Default.)
Multi DC Replication with RF 3 and RF 2.
  (Diagram: two data centres. West DC: Node 1 token: 0, Node 2 token: 25, Node 3 token: 50, Node 4 token: 75. East DC: Node 10 token: 1, Node 20 token: 26, Node 30 token: 51, Node 40 token: 76. 'foo' has token: 90.)
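A simplified Python sketch of per-data-centre replication, assuming the node layout above and (as an assumption, since the slide does not say which DC gets which factor) RF 3 in the West DC and RF 2 in the East DC. It walks the whole ring clockwise from the key's token and takes nodes until each data centre has reached its own replication factor. Rack awareness, which the real NetworkTopologyStrategy also applies, is deliberately left out, and the function name is hypothetical.

```python
from bisect import bisect_left

# Nodes from the slide: name -> (data centre, token).
NODES = {
    "Node 1": ("West", 0), "Node 2": ("West", 25),
    "Node 3": ("West", 50), "Node 4": ("West", 75),
    "Node 10": ("East", 1), "Node 20": ("East", 26),
    "Node 30": ("East", 51), "Node 40": ("East", 76),
}


def network_topology_replicas(nodes, token, rf_per_dc):
    """Walk the whole ring clockwise from the key's token and take nodes
    until each data centre has its own replication factor (racks ignored)."""
    ring = sorted(nodes, key=lambda name: nodes[name][1])
    tokens = [nodes[name][1] for name in ring]
    start = bisect_left(tokens, token)
    replicas = {dc: [] for dc in rf_per_dc}
    for i in range(len(ring)):
        name = ring[(start + i) % len(ring)]
        dc = nodes[name][0]
        if dc in replicas and len(replicas[dc]) < rf_per_dc[dc]:
            replicas[dc].append(name)
    return replicas


# Assumption: West DC uses RF 3 and East DC uses RF 2.
print(network_topology_replicas(NODES, 90, {"West": 3, "East": 2}))
```

Offsetting the East tokens by one (1, 26, 51, 76) keeps every token unique across the cluster while giving each DC an even split of its own ring.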
The Snitch knows which Data
Centre and Rack each Node is in.
SimpleSnitch.
 Places all nodes in the same
        DC and Rack.
          (Default, there are others.)
PropertyFileSnitch.
 DC and Rack are specified per
   node via configuration.
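A small Python sketch of the kind of lookup PropertyFileSnitch performs, assuming its configuration is a properties file of `address=DC:RACK` lines plus a `default=DC:RACK` fallback for unlisted nodes. The addresses, DC and rack names, and the helper functions are hypothetical; this parses the format for illustration and is not Cassandra's own parser.

```python
# Hypothetical topology in the "address=DC:RACK" style of
# PropertyFileSnitch's cassandra-topology.properties file.
TOPOLOGY = """
# West DC
10.0.0.1=West:RAC1
10.0.0.2=West:RAC2
# East DC
10.1.0.1=East:RAC1
# fallback for nodes not listed
default=West:RAC1
"""


def parse_topology(text: str) -> dict[str, tuple[str, str]]:
    """Parse address=DC:RACK lines, skipping blanks and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        address, _, location = line.partition("=")
        dc, _, rack = location.partition(":")
        mapping[address.strip()] = (dc.strip(), rack.strip())
    return mapping


def locate(mapping: dict[str, tuple[str, str]], address: str) -> tuple[str, str]:
    """Look up a node's (DC, rack), falling back to the 'default' entry."""
    return mapping.get(address, mapping["default"])


topology = parse_topology(TOPOLOGY)
print(locate(topology, "10.1.0.1"))  # ('East', 'RAC1')
print(locate(topology, "10.9.9.9"))  # ('West', 'RAC1') via default
```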
Ec2Snitch.
The DC is set to the AWS Region
 and the Rack to the Availability Zone.
DynamicSnitch.
Re-orders nodes according to
their observed performance.
           (Wraps another snitch.)
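The idea behind DynamicSnitch can be sketched in Python as "take the order the wrapped snitch proposes, then prefer replicas that have recently been fast". This toy version simply sorts by mean observed latency; the latency figures and function name are made up, and the real DynamicSnitch computes its own scores and only overrides the wrapped snitch when the difference is large enough.

```python
from statistics import mean


def order_by_latency(replicas: list[str], latencies_ms: dict[str, list[float]]) -> list[str]:
    """Order replicas fastest first by mean observed latency.
    Replicas with no samples sort last; ties keep the wrapped snitch's order."""
    def score(node: str) -> float:
        samples = latencies_ms.get(node)
        return mean(samples) if samples else float("inf")
    return sorted(replicas, key=score)


# Order proposed by the wrapped snitch, plus some made-up latency samples.
proximity_order = ["Node 1", "Node 2", "Node 3"]
observed = {"Node 1": [12.0, 15.0], "Node 2": [2.0, 3.0], "Node 3": []}
print(order_by_latency(proximity_order, observed))  # ['Node 2', 'Node 1', 'Node 3']
```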
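The Client and the Coordinator.
  (Diagram: the Client connects to one node on the ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50 and Node 4 token: 75. That node acts as the coordinator for the request for 'foo', token: 90, and forwards it to the replicas.)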
Gossip.
Nodes share information with
a small number of neighbours,
who share it with a small
   number of neigh...
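A toy Python simulation of push-style gossip, assuming each node that already knows a piece of state tells one random peer per round. It is not Cassandra's gossip protocol (which exchanges versioned state digests), and the function name and parameters are hypothetical; it only shows why talking to "a small number of neighbours" still reaches every node in a number of rounds that grows roughly logarithmically with cluster size.

```python
import random


def gossip_rounds(num_nodes: int, fanout: int = 1, seed: int = 42) -> int:
    """Push-style gossip: each round, every node that knows a piece of state
    tells `fanout` random peers. Returns rounds until all nodes know it."""
    rng = random.Random(seed)
    informed = {0}  # node 0 learns something new
    rounds = 0
    while len(informed) < num_nodes:
        rounds += 1
        newly_informed = set()
        for node in informed:
            for _ in range(fanout):
                peer = rng.randrange(num_nodes - 1)
                if peer >= node:
                    peer += 1  # skip ourselves: pick uniformly among the other nodes
                newly_informed.add(peer)
        informed |= newly_informed
    return rounds


for size in (10, 100, 1000):
    print(size, "nodes:", gossip_rounds(size), "rounds")
```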
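Multi DC Client and the Coordinator.
  (Diagram: one ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50 and Node 4 token: 75, and a second ring of Node 10 token: 1, Node 20 token: 26, Node 30 token: 51 and Node 40 token: 76. A Client connects to one node, which coordinates the request for 'foo', token: 90, across both data centres.)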
Consistency Level (CL).
  - Specified for each request
  - Number of nodes to wait for.
Consistency Level (CL)
  - ANY*
  - ONE, TWO, THREE
  - QUORUM
  - LOCAL_QUORUM, EACH_QUORUM*
  - ALL
QUORUM at Replication Factor...
   Replication Factor   2 or 3   4 or 5   6 or 7
   QUORUM               2        3        4
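The table follows from QUORUM being a majority of the replicas, floor(RF / 2) + 1. A one-line Python version (the function name is illustrative) reproduces it; LOCAL_QUORUM applies the same formula to the replication factor of the local data centre only.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


for rf in range(2, 8):
    print(f"RF {rf}: QUORUM = {quorum(rf)}")
```

This is also why RF 3 is a common choice: QUORUM needs 2 of 3 replicas, so one replica can be down without failing QUORUM requests.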
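QUORUM with RF 3.
  (Diagram: the Client's request for 'foo', token: 90, goes through a coordinator to the ring of Node 1 token: 0, Node 2 token: 25, Node 3 token: 50 and Node 4 token: 75. With RF 3, QUORUM waits for 2 of the 3 replicas.)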
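Write 'foo' at QUORUM with Hinted Handoff.
  (Diagram: the Client writes 'foo', token: 90. Node 1 and Node 2 store 'foo'. Node 3 is unavailable, so Node 4 stores 'foo' as a hint for Node 3, to be handed off when Node 3 returns.)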
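A toy Python sketch of the write on this slide, assuming RF 3 with replicas Node 1, Node 2 and Node 3, Node 3 down, and QUORUM needing 2 acknowledgements. The function `coordinate_write` is hypothetical. It follows two rules of Cassandra's write path: the coordinator refuses the write up front if too few replicas are alive, and a hint stored for a down replica does not count towards QUORUM (hints only satisfy ANY). In the slide, Node 4 holds the hint for Node 3.

```python
def coordinate_write(replicas, live_nodes, required_acks, value):
    """Toy coordinator: fail fast if too few replicas are up, otherwise write
    to the live replicas and keep a hint for each replica that is down."""
    live_replicas = [n for n in replicas if n in live_nodes]
    if len(live_replicas) < required_acks:
        return False, {}, {}  # unavailable: nothing is written
    stored = {node: value for node in live_replicas}
    hints = {node: value for node in replicas if node not in live_nodes}
    # Only acknowledgements from real replicas count towards QUORUM.
    return len(stored) >= required_acks, stored, hints


ok, stored, hints = coordinate_write(
    replicas=["Node 1", "Node 2", "Node 3"],
    live_nodes={"Node 1", "Node 2", "Node 4"},  # Node 3 is down
    required_acks=2,                            # QUORUM for RF 3
    value="foo",
)
print(ok, stored, hints)
```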
Read 'foo' at QUORUM.
  (Diagram: the Client reads 'foo', token: 90. Node 1 and Node 2 hold 'foo' and answer, which satisfies QUORUM; Node 3 has no copy yet.)
The nodes required by the
Consistency Level must agree.
Column Timestamps are used
  to resolve differences.
Resolving differences.
    Column       Node 1                     Node 2                     Node 3
    purple       cromulent (timestamp 10)   cromulent (timestamp 10)   <missing>
    monkey       embiggens (timestamp 10)   embiggens (timestamp 10)   debigulator (timestamp 5)
    dishwasher   tomato (timestamp 10)      tomato (timestamp 10)      tomacco (timestamp 15)
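Resolution works column by column: the value with the highest timestamp wins, regardless of how many replicas returned it. A minimal Python sketch using the table's data (the function name is illustrative) shows that Node 3's newer "tomacco" beats the two older "tomato" values, while its older "debigulator" loses. Ties are simply kept at the first value seen here; Cassandra has its own tie-break rule.

```python
def resolve(responses):
    """responses: one {column: (value, timestamp)} dict per replica.
    Keep, for each column, the value with the highest timestamp."""
    merged = {}
    for response in responses:
        for column, (value, ts) in response.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


node1 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10), "dishwasher": ("tomato", 10)}
node2 = dict(node1)
node3 = {"monkey": ("debigulator", 5), "dishwasher": ("tomacco", 15)}  # purple is missing

for column, (value, ts) in resolve([node1, node2, node3]).items():
    print(column, value, ts)
```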
Consistent read for 'foo' at QUORUM.
  (Diagram, two panels: the Client reads 'foo' through the ring of Nodes 1 to 4; the replica responses shown include the values cromulent and embiggens.)
Strong Consistency.

          W + R > N
  (#Write Nodes + #Read Nodes > Replication Factor)
Achieving Strong Consistency.
  - QUORUM Read + QUORUM Write
  - ALL Read + ONE Write
  - ONE Read + ALL Write
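The rule W + R > N guarantees that every set of replicas read overlaps every set written in at least one node, so a read always sees the latest acknowledged write. A short Python check (names are illustrative) confirms the three combinations listed above for RF 3, and that ONE + ONE only gives eventual consistency.

```python
RF = 3
ONE, QUORUM, ALL = 1, RF // 2 + 1, RF


def is_strongly_consistent(write_nodes: int, read_nodes: int, rf: int = RF) -> bool:
    """W + R > N: every read set overlaps every write set in at least one replica."""
    return write_nodes + read_nodes > rf


combinations = {
    "QUORUM read + QUORUM write": (QUORUM, QUORUM),  # (W, R)
    "ALL read + ONE write": (ONE, ALL),
    "ONE read + ALL write": (ALL, ONE),
    "ONE read + ONE write": (ONE, ONE),
}
for label, (w, r) in combinations.items():
    print(f"{label}: {is_strongly_consistent(w, r)}")
```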
Eventual Consistency.

       W + R <= N
Achieving Consistency.
  - Hinted Handoff
  - Read Repair
  - Scheduled nodetool repair
Overview
  The Cluster
The Data Model
    The API
Data Model so far.


     Row Key:   Column        Column   Column


                  (Incomplete.)
Data Model.
                           Keyspace

               Column Family   Column Family   Column Family
                  Column          Column          Column
    Row Key:      Column          Column          Column
                  Column          Column          Column


      (Column Family and Table mean the same.)
Rows are the unit of replication.
The Column Family is the unit of storage.
Inside the Column Family.
                            Keyspace

                                Column Family
                Column: name, value, timestamp
     Row Key:   Column: name, value, timestamp
                Column: name, value, timestamp



                   (Also TTL Columns)
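The nesting on this slide maps naturally onto nested dictionaries. This Python sketch assumes illustrative names throughout: a `Column` dataclass holding name, value and timestamp, an optional TTL in seconds, and a hypothetical `written_at` field standing in for the time the column was written, so that expiry can be checked. It shows the structure only, not how Cassandra stores it on disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    value: str
    timestamp: int               # write timestamp, used to resolve conflicts
    ttl: Optional[int] = None    # seconds to live; None means the column never expires
    written_at: float = 0.0      # hypothetical field: local time (seconds) of the write

    def is_live(self, now: float) -> bool:
        return self.ttl is None or now < self.written_at + self.ttl


# Keyspace -> Column Family -> Row Key -> column name -> Column
keyspace = {
    "ColumnFamily1": {
        "row_key": {
            "col_name": Column("col_name", "col_val", timestamp=10),
            "session": Column("session", "abc", timestamp=10, ttl=60, written_at=1000.0),
        }
    }
}

row = keyspace["ColumnFamily1"]["row_key"]
print(sorted(name for name, col in row.items() if col.is_live(now=1030.0)))  # ['col_name', 'session']
print(sorted(name for name, col in row.items() if col.is_live(now=1061.0)))  # ['col_name']
```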
Basic Data Types
  - ASCII, UTF8
  - Integer, Long, Float, Double, Boolean
  - Date
  - UUID
  - Bytes
  - Counter*
Composite Data Types
   - Two or more Basic types
   - Ordered by each component
   - e.g. (IntegerType, UTF8Type) to hold
(timestamp, user_name)
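Composite ordering behaves like Python tuple ordering: compare the first component, and only on a tie move to the next. This sketch assumes integer timestamps and ASCII user names, for which Python's string order matches UTF8Type's byte order; the sample values are made up.

```python
# Composite column names modelled as Python tuples of (timestamp, user_name).
# Tuples compare component by component, like a composite comparator.
columns = [
    (1350000000, "bob"),
    (1340000000, "zoe"),
    (1350000000, "alice"),
]
for name in sorted(columns):
    print(name)
```

"zoe" sorts first because its timestamp is lower; "alice" and "bob" share a timestamp, so the user name decides.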
Data Modelling.
  Data Modelling for Apache Cassandra
  Aaron Morton (Cassandra Committer)
  Wednesday November 7 @ 11AM PST

  http://www.datastax.com/resources/webinars/collegecredit
Overview
  The Cluster
The Data Model
   The API
The API.
  - Original Thrift based RPC
  - Declarative Cassandra Query Language (CQL)
RPC via Python pycassa.

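# pycassa - Python
# Assumes a keyspace named 'Keyspace1' (hypothetical) on a local node.

>>> import pycassa
>>> connection_pool = pycassa.ConnectionPool('Keyspace1')
>>> col_fam = pycassa.ColumnFamily(connection_pool, 'ColumnFamily1')

>>> col_fam.insert('row_key', {'col_name': 'col_val'})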
RPC via Python pycassa...

# pycassa - Python

>>> col_fam.get('row_key')
{'col_name': 'col_val', 'col_name2': 'col_val2'}

>>> col_fam.multiget(['row_key'], columns=['col_name'])
{'row_key': {'col_name': 'col_val'}}
RPC via Python pycassa...

# pycassa - Python

>>> col_fam.remove('row_key')

>>> col_fam.remove('row_key', columns=['col_name'])
CQL.

# Cassandra Query Language (CQL)

INSERT INTO ColumnFamily1 (KEY, col_name) VALUES ('row_key', 'col_value');
CQL...

# Cassandra Query Language (CQL)

SELECT * FROM ColumnFamily1 WHERE KEY IN ('row_key_1');

SELECT col_name FROM ColumnFamily1 WHERE KEY IN ('row_key_1', 'row_key_2');
CQL...

# Cassandra Query Language (CQL)

DELETE FROM ColumnFamily1 WHERE key IN ('row_key');

DELETE col_name FROM ColumnFamily1 WHERE key = 'row_key';
Thanks.
Aaron Morton
                     @aaronmorton
                   www.thelastpickle.com




Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

Weitere ähnliche Inhalte

Andere mochten auch

What is DataStax Enterprise?
What is DataStax Enterprise?What is DataStax Enterprise?
What is DataStax Enterprise?DataStax
 
durability, durability, durability
durability, durability, durabilitydurability, durability, durability
durability, durability, durabilityMatthew Dennis
 
DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarMatthew Dennis
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopPatricia Gorla
 
From rdbms to cassandra without a hitch
From rdbms to cassandra without a hitchFrom rdbms to cassandra without a hitch
From rdbms to cassandra without a hitchDuyhai Doan
 
Community Webinar: 15 Commandments of Cassandra DBAs
Community Webinar: 15 Commandments of Cassandra DBAsCommunity Webinar: 15 Commandments of Cassandra DBAs
Community Webinar: 15 Commandments of Cassandra DBAsDataStax
 
Cassandra Community Webinar | The World's Next Top Data Model
Cassandra Community Webinar | The World's Next Top Data ModelCassandra Community Webinar | The World's Next Top Data Model
Cassandra Community Webinar | The World's Next Top Data ModelDataStax
 
Cassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on FireCassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on FireDataStax
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraDataStax
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...DataStax
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Modelebenhewitt
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016DataStax
 
How Do I Cassandra?
How Do I Cassandra?How Do I Cassandra?
How Do I Cassandra?Rick Branson
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...DataStax
 
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...DataStax
 
Visualising Data with Code
Visualising Data with CodeVisualising Data with Code
Visualising Data with CodeRi Liu
 

Andere mochten auch (20)

What is DataStax Enterprise?
What is DataStax Enterprise?What is DataStax Enterprise?
What is DataStax Enterprise?
 
durability, durability, durability
durability, durability, durabilitydurability, durability, durability
durability, durability, durability
 
DZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling WebinarDZone Cassandra Data Modeling Webinar
DZone Cassandra Data Modeling Webinar
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
From rdbms to cassandra without a hitch
From rdbms to cassandra without a hitchFrom rdbms to cassandra without a hitch
From rdbms to cassandra without a hitch
 
Community Webinar: 15 Commandments of Cassandra DBAs
Community Webinar: 15 Commandments of Cassandra DBAsCommunity Webinar: 15 Commandments of Cassandra DBAs
Community Webinar: 15 Commandments of Cassandra DBAs
 
Cassandra Community Webinar | The World's Next Top Data Model
Cassandra Community Webinar | The World's Next Top Data ModelCassandra Community Webinar | The World's Next Top Data Model
Cassandra Community Webinar | The World's Next Top Data Model
 
Cassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on FireCassandra Community Webinar | Data Model on Fire
Cassandra Community Webinar | Data Model on Fire
 
Understanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache CassandraUnderstanding Data Consistency in Apache Cassandra
Understanding Data Consistency in Apache Cassandra
 
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
Using Spark to Load Oracle Data into Cassandra (Jim Hatcher, IHS Markit) | C*...
 
Cassandra Data Model
Cassandra Data ModelCassandra Data Model
Cassandra Data Model
 
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
Data Modeling a Scheduling App (Adam Hutson, DataScale) | Cassandra Summit 2016
 
How Do I Cassandra?
How Do I Cassandra?How Do I Cassandra?
How Do I Cassandra?
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
What is in All of Those SSTable Files Not Just the Data One but All the Rest ...
 
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
Webinar - DataStax Enterprise 5.1: 3X the operational analytics speed, help f...
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
Visualising Data with Code
Visualising Data with CodeVisualising Data with Code
Visualising Data with Code
 

Mehr von DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceDataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 

Mehr von DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

C*ollege Credit: An Introduction to Apache Cassandra

  • 1. DATASTAX C*OLLEGE CREDIT: AN INTRODUCTION TO APACHE CASSANDRA Aaron Morton Apache Cassandra Committer, Data Stax MVP for Apache Cassandra @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  • 2. Overview The Cluster The Data Model The API
  • 3. Cassandra - Started at Facebook - Open sourced in 2008 - Top Level Apache project since 2010.
  • 4. Used by... Netflix, Twitter, Reddit, Rackspace...
  • 5. Inspiration - Google Big Table (2006) - Amazon Dynamo (2007)
  • 6. Why Cassandra? - Scale - Operations - Data Model
  • 7. Why Cassandra? Is My App a Good Fit for Apache Cassandra? Eric Lubow (CTO, SimpleReach) Wednesday October 24 @ 8:30AM PST http://www.datastax.com/resources/webinars/collegecredit
  • 8. Overview The Cluster The Data Model The API
  • 9. Store ‘foo’ key with Replication Factor 3. Node 1 - 'foo' Node 4 Node 2 - 'foo' Node 3 - 'foo'
  • 10. Consistent Hashing. - Evenly map keys to nodes - Minimise key movements when nodes join or leave
  • 11. Partitioner. RandomPartitioner transforms Keys to Tokens using MD5. (Default, there are others.)
  • 12. Keys and Tokens? key 'fop' 'foo' token 0 10 90 99
  • 13. Token Ring. 99 0 'foo' 'fop' token: 90 token: 10
  • 14. Token Ranges. Node 1 token: 0 76-0 1-25 Node 4 Node 2 token: 75 token: 25 Node 3 token: 50
  • 15. Locate Token Range. Node 1 token: 0 'foo' token: 90 Node 4 Node 2 token: 75 token: 25 Node 3 token: 50
  • 16. Replication Strategy selects Replication Factor number of nodes for a row.
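  • 17. SimpleStrategy with RF 3. 'foo' (token 90) falls in Node 1's range, so it is stored on Node 1 (token 0) and on the next two nodes clockwise, Node 2 (token 25) and Node 3 (token 50). Node 4 (token 75) holds no copy.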
  • 18. NetworkTopologyStrategy uses a Replication Factor per Data Centre. (Default.)
  • 19. Multi DC Replication with RF 3 and RF 2. West DC: Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75). East DC: Node 10 (token 1), Node 20 (token 26), Node 30 (token 51), Node 40 (token 76). 'foo' has token 90.
  • 20. The Snitch knows which Data Centre and Rack the Node is in.
  • 21. SimpleSnitch. Places all nodes in the same DC and Rack. (Default, there are others.)
  • 22. PropertyFileSnitch. The DC and Rack are specified per node via configuration.
  • 23. EC2Snitch. The DC is set to the AWS Region and the Rack to the Availability Zone.
  • 24. DynamicSnitch. Re-orders nodes according to their observed performance. (Wraps other snitch.)
  • 25. The Client and the Coordinator. The Client sends its request for 'foo' (token 90) to one node, which acts as the coordinator for that request. Ring: Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75).
  • 26. Gossip. Nodes share information with a small number of neighbours, who share information with a small number of neighbours...
  • 27. Multi DC Client and the Coordinator. The Client sends its request for 'foo' (token 90) to a coordinator node. Nodes: Node 1 (token 0), Node 2 (token 25), Node 3 (token 50), Node 4 (token 75), Node 10 (token 1), Node 20 (token 26), Node 30 (token 51), Node 40 (token 76).
  • 28. Consistency Level (CL). - Specified for each request - Number of nodes to wait for.
  • 29. Consistency Level (CL). - ANY* - ONE, TWO, THREE - QUORUM - LOCAL_QUORUM, EACH_QUORUM*
  • 30. QUORUM at Replication Factor... RF 2 or 3: QUORUM is 2. RF 4 or 5: QUORUM is 3. RF 6 or 7: QUORUM is 4.
  • 31. QUORUM with RF 3. The Client requests 'foo' (token 90) from the ring of Node 1 (token 0), Node 2 (token 25), Node 3 (token 50) and Node 4 (token 75).
  • 32. Write 'foo' at QUORUM with Hinted Handoff. 'foo' (token 90) is written to Node 1 and Node 2; Node 3 is unavailable, so a hint ('foo' for #3) is kept to be delivered to Node 3 later.
  • 33. Read 'foo' at QUORUM. The Client's read of 'foo' (token 90) is answered once two of its three replicas (Node 1, Node 2, Node 3) have responded.
  • 35. Column Timestamps used to resolve differences.
  • 36. Resolving differences.
      Column     | Node 1                   | Node 2                   | Node 3
      purple     | cromulent (timestamp 10) | cromulent (timestamp 10) | <missing>
      monkey     | embiggens (timestamp 10) | embiggens (timestamp 10) | debigulator (timestamp 5)
      dishwasher | tomato (timestamp 10)    | tomato (timestamp 10)    | tomacco (timestamp 15)
  • 37. Consistent read for 'foo' at QUORUM. (Two-panel diagram: the Client reads 'foo' from Nodes 1 to 4; the replica values shown are cromulent and embiggins.)
  • 38. Strong Consistency. W + R > N (number of nodes written + number of nodes read > Replication Factor)
  • 39. Achieving Strong Consistency. - QUORUM Read + QUORUM Write - ALL Read + ONE Write - ONE Read + ALL Write
  • 40. Eventual Consistency. W + R <= N
  • 41. Achieving Consistency. - Hinted Handoff - Read Repair - Scheduled nodetool repair
  • 42. Overview The Cluster The Data Model The API
  • 43. Data Model so far. Row Key: Column Column Column (Incomplete.)
  • 44. Data Model. A Keyspace contains Column Families; a Column Family contains rows, and each Row Key holds a set of Columns. (Column Family and Table mean the same.)
  • 45. Rows are the unit of replication.
  • 46. The Column Family is the unit of storage.
  • 47. Inside the Column Family. Within a Keyspace, each Column Family maps a Row Key to Columns, and every Column is a (name, value, timestamp) triple. (Also TTL Columns.)
  • 48. Basic Data Types - ASCII, UTF8 - Integer, Long, Float, Double, Boolean - Date - UUID - Bytes - Counter*
  • 49. Composite Data Types - Two or more Basic types - Ordered by each component - e.g. (IntegerType, UTF8) to hold (timestamp, user_name)
  • 50. Data Modelling. Data Modelling for Apache Cassandra Aaron Morton (Cassandra Committer) Wednesday November 7 @ 11AM PST http://www.datastax.com/resources/webinars/collegecredit
  • 51. Overview The Cluster The Data Model The API
  • 52. The API. - Original Thrift based RPC - Declarative Cassandra Query Language (CQL)
  • 53. RPC via Python pycassa.
      # pycassa - Python ('Keyspace1' is a placeholder keyspace name)
      >>> import pycassa
      >>> connection_pool = pycassa.ConnectionPool('Keyspace1')
      >>> col_fam = pycassa.ColumnFamily(connection_pool, 'ColumnFamily1')
      >>> col_fam.insert('row_key', {'col_name': 'col_val'})
  • 54. RPC via Python pycassa...
      # pycassa - Python
      >>> col_fam.get('row_key')
      {'col_name': 'col_val', 'col_name2': 'col_val2'}
      >>> col_fam.multiget(['row_key'], columns=['col_name'])
      {'row_key': {'col_name': 'col_val'}}
  • 55. RPC via Python pycassa...
      # pycassa - Python
      >>> col_fam.remove('row_key')
      >>> col_fam.remove('row_key', ['col_name'])
  • 56. CQL.
      # Cassandra Query Language (CQL)
      INSERT INTO ColumnFamily1 (KEY, col_name) VALUES ('row_key', 'col_value');
  • 57. CQL...
      # Cassandra Query Language (CQL)
      SELECT * FROM ColumnFamily1 WHERE KEY IN ('row_key_1');
      SELECT col_name FROM ColumnFamily1 WHERE KEY IN ('row_key_1', 'row_key_2');
  • 58. CQL...
      # Cassandra Query Language (CQL)
      DELETE FROM ColumnFamily1 WHERE key IN ('row_key');
      DELETE col_name FROM ColumnFamily1 WHERE key = 'row_key';
  • 60. Aaron Morton @aaronmorton www.thelastpickle.com Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
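The token ring, token ranges and SimpleStrategy placement on slides 11 to 17 can be sketched in a few lines of Python. This is a minimal illustration, assuming the slides' toy 0..99 token space: `toy_token`, `primary_node` and `simple_strategy_replicas` are hypothetical helpers, and folding an MD5 digest into 100 buckets is not RandomPartitioner's real token calculation.

```python
import bisect
import hashlib

# Toy ring from slides 14 to 17: node name -> token on a 0..99 token space.
RING = {"Node 1": 0, "Node 2": 25, "Node 3": 50, "Node 4": 75}
# Assumption: the slides' toy 0..99 space, not RandomPartitioner's real token range.
RING_SIZE = 100


def toy_token(key: str) -> int:
    """Hash a key with MD5 and fold the digest into the toy token space (illustration only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def _clockwise(ring):
    """Return (token, node) pairs sorted by token, i.e. the order of a clockwise walk."""
    return sorted((token, node) for node, token in ring.items())


def primary_node(token: int, ring) -> str:
    """Each node owns (previous token, own token]; tokens past the last node wrap to the first."""
    nodes = _clockwise(ring)
    index = bisect.bisect_left([t for t, _ in nodes], token)
    return nodes[index % len(nodes)][1]


def simple_strategy_replicas(token: int, ring, rf: int):
    """SimpleStrategy-style placement: the owning node plus the next rf - 1 nodes clockwise."""
    nodes = _clockwise(ring)
    start = bisect.bisect_left([t for t, _ in nodes], token) % len(nodes)
    return [nodes[(start + i) % len(nodes)][1] for i in range(min(rf, len(nodes)))]
```

With the slides' hand-picked token for 'foo', `primary_node(90, RING)` gives `'Node 1'` and `simple_strategy_replicas(90, RING, 3)` gives `['Node 1', 'Node 2', 'Node 3']`, matching slides 9 and 17. `toy_token` will not reproduce the slides' tokens of 10 and 90, which were chosen for the diagrams.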
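Slides 18 and 19 describe NetworkTopologyStrategy choosing a separate Replication Factor per Data Centre. The sketch below walks the two-DC ring from slide 19 and keeps the first RF nodes found in each DC. It is a simplified model, not Cassandra's code: the real strategy also tries to spread replicas across racks, and which DC gets RF 3 versus RF 2 is an assumption made here for illustration.

```python
import bisect

# Two-DC ring modelled on slide 19: node -> (token, data centre).
NODES = {
    "Node 1": (0, "West"), "Node 2": (25, "West"), "Node 3": (50, "West"), "Node 4": (75, "West"),
    "Node 10": (1, "East"), "Node 20": (26, "East"), "Node 30": (51, "East"), "Node 40": (76, "East"),
}
# Assumption: West uses RF 3 and East uses RF 2 (the slide does not say which is which).
REPLICATION = {"West": 3, "East": 2}


def nts_replicas(token, nodes, replication):
    """Walk the whole ring clockwise from the token, keeping the first RF nodes seen in each DC.

    Simplified: rack awareness is ignored.
    """
    ring = sorted((tok, name, dc) for name, (tok, dc) in nodes.items())
    start = bisect.bisect_left([tok for tok, _, _ in ring], token)
    chosen = {dc: [] for dc in replication}
    for i in range(len(ring)):
        _, name, dc = ring[(start + i) % len(ring)]
        if dc in chosen and len(chosen[dc]) < replication[dc]:
            chosen[dc].append(name)
    return chosen
```

For 'foo' at token 90 this returns `{'West': ['Node 1', 'Node 2', 'Node 3'], 'East': ['Node 10', 'Node 20']}`.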
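The QUORUM table on slide 30 and the W + R > N rule on slides 38 to 40 are simple arithmetic. QUORUM is a majority of the replicas, floor(RF / 2) + 1, and a read is guaranteed to overlap the latest acknowledged write when the write and read node counts together exceed the Replication Factor. The function names below are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """QUORUM is a majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_nodes: int, read_nodes: int, replication_factor: int) -> bool:
    """W + R > N means every read set shares at least one node with every write set."""
    return write_nodes + read_nodes > replication_factor
```

`[quorum(rf) for rf in range(2, 8)]` gives `[2, 2, 3, 3, 4, 4]`, the same values as slide 30. At RF 3, QUORUM writes plus QUORUM reads give 2 + 2 > 3, while ONE plus ONE gives 1 + 1 <= 3, which is eventual consistency.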
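Slide 32 shows a QUORUM write succeeding while Node 3 is down, with a hint kept for Node 3. The toy model below illustrates that flow under stated assumptions: `ToyCoordinator`, `write` and `node_recovered` are hypothetical names, not Cassandra's API, and the model ignores the ANY level, where a stored hint alone can satisfy the write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyCoordinator:
    """Hypothetical coordinator: writes to live replicas and keeps hints for replicas that are down."""

    down: Set[str] = field(default_factory=set)
    stores: Dict[str, Dict[str, str]] = field(default_factory=dict)  # node -> {key: value}
    hints: List[Tuple[str, str, str]] = field(default_factory=list)  # (target node, key, value)

    def write(self, key: str, value: str, replicas: List[str], required_acks: int) -> bool:
        acks = 0
        for node in replicas:
            if node in self.down:
                # Hinted handoff: remember the write so it can be delivered later.
                self.hints.append((node, key, value))
            else:
                self.stores.setdefault(node, {})[key] = value
                acks += 1
        # Only acknowledgements from live replicas count toward the consistency level here.
        return acks >= required_acks

    def node_recovered(self, node: str) -> None:
        """Mark the node as up and replay the hints that were stored for it."""
        self.down.discard(node)
        pending = []
        for target, key, value in self.hints:
            if target == node:
                self.stores.setdefault(node, {})[key] = value
            else:
                pending.append((target, key, value))
        self.hints = pending
```

With Node 3 down, writing 'foo' to Node 1, Node 2 and Node 3 with `required_acks=2` succeeds and leaves one hint for Node 3; calling `node_recovered('Node 3')` delivers it.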
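Slides 35 and 36 resolve differences between replicas column by column using timestamps. The sketch below merges replica rows by keeping, for each column name, the version with the highest timestamp. The function name is illustrative, and the tie handling is a simplification: on equal timestamps it keeps the first version seen, which is not necessarily Cassandra's own tie-break rule.

```python
from typing import Dict, Iterable, Tuple

# A column version is (value, timestamp); a replica's row is column name -> version.
Version = Tuple[str, int]


def resolve(replicas: Iterable[Dict[str, Version]]) -> Dict[str, Version]:
    """Merge replica rows column by column, keeping the version with the highest timestamp."""
    merged: Dict[str, Version] = {}
    for row in replicas:
        for name, (value, timestamp) in row.items():
            current = merged.get(name)
            # Simplification: on a timestamp tie the first version seen is kept.
            if current is None or timestamp > current[1]:
                merged[name] = (value, timestamp)
    return merged
```

Fed the three replicas from slide 36, this keeps cromulent for purple, embiggens (timestamp 10) for monkey, and tomacco (timestamp 15) for dishwasher.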
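Slide 49 notes that composite column names are ordered by each component in turn. Python tuples compare the same way, so the sketch below models hypothetical (IntegerType, UTF8Type) names of (timestamp, user_name) as tuples. It illustrates the ordering rule only and is not a Cassandra comparator.

```python
from typing import List, Tuple

# Hypothetical (IntegerType, UTF8Type) composite column names: (timestamp, user_name).
CompositeName = Tuple[int, str]


def sort_composite(names: List[CompositeName]) -> List[CompositeName]:
    """Order by the first component, then by the second when the first components are equal."""
    # Python compares tuples component by component, and str comparison by code point
    # agrees with byte-wise comparison of the UTF-8 encoding.
    return sorted(names)
```

For example, `sort_composite([(20, 'bob'), (10, 'zoe'), (20, 'alice')])` returns `[(10, 'zoe'), (20, 'alice'), (20, 'bob')]`: the timestamp decides first, and the user name only breaks ties.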
