
Cassandra20141009


Talk given at McGraw Hill Financial in Oct 2014


  1. Details And Data Modeling
  2. Agenda
      • Quick Review Of Cassandra
      • New Developments In Cassandra
      • Basic Data Modeling Concepts
      • Materialized Views
      • Secondary Indexes
      • Counters
      • Time Series Data
      • Expiring Data
  3. Cassandra High Level
      Cassandra's architecture is based on the combination of two technologies:
      • Google BigTable: Data Model
      • Amazon Dynamo: Distributed Architecture
      • Cassandra = C*
  4. Architecture Basics & Terminology
      • Nodes are single instances of C*
      • A cluster is a group of nodes
      • Data is organized by keys (tokens), which are distributed across the cluster
      • Replication Factor (RF) determines how many copies of each key are stored
      • Data Center Aware
      • Consistency Level: a powerful feature to tune consistency vs. speed vs. availability
  5. C* Ring
  6. More Architecture
      • Information on who has what data and who is available is transferred using gossip.
      • No single point of failure (SPOF); every node can service requests.
      • Data Center Aware
  7. CAP Theorem
      • Distributed Systems Law: Consistency, Availability, Partition Tolerance (you can only really have two in a distributed system)
      • Cassandra is AP with Eventual Consistency
  8. Consistency
      • Cassandra uses the concept of Tunable Consistency, which makes it very powerful and flexible for system needs.
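A minimal cqlsh sketch of tunable consistency, assuming the `users` table defined later in the deck and a replication factor of 3 (both are assumptions for illustration). `CONSISTENCY` is a cqlsh session command rather than a CQL statement; client drivers set the level per request instead.

```cql
-- Require a majority of replicas to respond (2 of 3 when RF = 3).
CONSISTENCY QUORUM;
SELECT firstname, lastname FROM users WHERE emailaddress = 'tpeters@example.com';

-- Trade consistency for latency and availability: one replica is enough.
CONSISTENCY ONE;
SELECT firstname, lastname FROM users WHERE emailaddress = 'tpeters@example.com';
```

As a rule of thumb, a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds RF, which is why QUORUM reads paired with QUORUM writes behave as strongly consistent.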
  9. C* Persistence Model
  10. Read Path
  11. Write Path
  12. Data Model Architecture
      • Keyspace: container of column families (tables). Defines the RF, among other settings.
      • Table: a column family. Contains the schema definition.
      • Row: a “record” identified by a key
      • Column: a key and a value
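A short sketch of how a keyspace carries the replication settings. The keyspace names `demo` and `demo_multi_dc` and the data center names `dc1` and `dc2` are hypothetical; real data center names must match what the cluster's snitch reports.

```cql
-- Single data center: three copies of every row.
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Multiple data centers: replica count is set per data center.
CREATE KEYSPACE IF NOT EXISTS demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};

-- Tables created from here on live in the demo keyspace.
USE demo;
```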
  13. (no text)
  14. Keys
      • Primary Key
      • Partition Key: identifies a row
      • Cluster Key: sorting within a row
      • Using CQL these are defined together as a compound (composite) key
      • Compound keys are how you implement “wide rows”, which we will look at a lot!
  15. Single Primary Key
        create table users (
          user_id UUID PRIMARY KEY,
          firstname text,
          lastname text,
          emailaddress text
        );
      ** Cassandra Data Types: http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
  16. Compound Key
        create table users (
          emailaddress text,
          department text,
          firstname text,
          lastname text,
          PRIMARY KEY (emailaddress, department)
        );
      • Partition Key plus Cluster Key
      • emailaddress is the partition key
      • department is the cluster key
  17. Compound Key
        create table users (
          emailaddress text,
          department text,
          country text,
          firstname text,
          lastname text,
          PRIMARY KEY ((emailaddress, department), country)
        );
      • Partition Key plus Cluster Key
      • emailaddress and department together form the partition key
      • country is the cluster key
  18. Deletions
      • Distributed systems present a unique problem for deletes. If the data were actually removed while a node was down and missed the delete, that node would try to recreate the record when it came back online. So…
      • Tombstone: the data is replaced with a special value called a tombstone, which works within the distributed architecture.
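A brief sketch of what produces tombstones, assuming the `users` table with `PRIMARY KEY (emailaddress, department)` from the Compound Key slide. The `gc_grace_seconds` table option controls how long tombstones are kept before compaction may purge them; the value shown is the usual default of ten days, and the row values are made up for illustration.

```cql
-- Deleting by partition key writes a tombstone covering the whole partition.
DELETE FROM users WHERE emailaddress = 'jsmith@example.com';

-- Deleting one column writes a tombstone for just that cell;
-- the full primary key is needed to address the row.
DELETE lastname FROM users
  WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering';

-- Tombstones must outlive any node outage so that a returning node learns
-- about the delete instead of resurrecting the data.
ALTER TABLE users WITH gc_grace_seconds = 864000;
```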
  19. New Rules
      • Writes Are Cheap
      • Denormalize All You Need
      • Model Your Queries, Not Data (understand access patterns)
      • Application Worries About Joins
  20. What’s New In 2.0
      • Conditional DDL: IF EXISTS or IF NOT EXISTS
      • Drop Column Support
        ALTER TABLE users DROP lastname;
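A small sketch of conditional DDL, which makes schema scripts safe to re-run. The table name `users_archive` and the keyspace name `scratch` are hypothetical.

```cql
-- Does nothing (instead of raising an error) if the table already exists.
CREATE TABLE IF NOT EXISTS users (
  user_id uuid PRIMARY KEY,
  firstname text,
  lastname text,
  emailaddress text
);

-- Does nothing (instead of raising an error) if the target is missing.
DROP TABLE IF EXISTS users_archive;
DROP KEYSPACE IF EXISTS scratch;
```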
  21. More New Stuff
      • Triggers
        CREATE TRIGGER myTrigger ON myTable USING 'com.thejavaexperts.cassandra.updateevt';
      • Lightweight Transactions (CAS)
        UPDATE users SET firstname = 'tim' WHERE emailaddress = 'tpeters@example.com' IF firstname = 'tom';
      ** Not like an ACID Transaction!!
  22. CAS & Transactions
      • CAS: compare-and-set operations. A single atomic operation compares the value of a column in the database and applies a modification depending on the result of the comparison.
      • Consider the performance hit. CAS is (was) considered an anti-pattern.
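A sketch of the two common lightweight transaction forms, assuming the `users` table with `PRIMARY KEY (emailaddress, department)` from the Compound Key slide; the row values are made up for illustration.

```cql
-- Insert only if no row with this primary key exists yet.
INSERT INTO users (emailaddress, department, firstname, lastname)
  VALUES ('tpeters@example.com', 'engineering', 'tom', 'peters')
  IF NOT EXISTS;

-- Update only if the current value matches the expected one.
UPDATE users SET firstname = 'tim'
  WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering'
  IF firstname = 'tom';
```

Each statement returns an `[applied]` column indicating whether the condition held. The condition check requires extra coordination rounds between replicas, which is the performance cost the slide warns about.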
  23. Data Modeling… The Basics
      • Cassandra now feels very familiar to RDBMS/SQL users.
      • It hides the underlying data storage model very nicely.
      • You still have all the power of Cassandra; it is all in the key definition.
      RDBMS = model data
      Cassandra = model access (queries)
  24. Side-Note On Querying
      • Create table with compound key
      • Select using ALLOW FILTERING
      • Counts
      • Select using IN or =
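The demo steps above might look roughly like the following, assuming the `users` table with `PRIMARY KEY (emailaddress, department)` from the Compound Key slide. Exactly which restrictions are accepted, with or without `ALLOW FILTERING`, depends on the Cassandra version, so treat this as a sketch rather than a guaranteed transcript.

```cql
-- Equality on the partition key: routed straight to the owning replicas.
SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';

-- IN on the partition key: one lookup per listed key.
SELECT * FROM users
  WHERE emailaddress IN ('tpeters@example.com', 'jsmith@example.com');

-- Restricting only the clustering column may touch every partition, so
-- Cassandra rejects the query unless ALLOW FILTERING is stated explicitly.
SELECT * FROM users WHERE department = 'engineering' ALLOW FILTERING;

-- Counting reads all matching rows and can be slow on large tables.
SELECT COUNT(*) FROM users;
```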
  25. Batch Operations
      • Saves Network Roundtrips
      • Can contain INSERT, UPDATE, DELETE
      • Atomic by default (all or nothing)
      • Can use timestamp for specific ordering
  26. Batch Operation Example
        BEGIN BATCH
          INSERT INTO users (emailaddress, firstname, lastname, country) values ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
          INSERT INTO users (emailaddress, firstname, lastname, country) values ('tpeters@example.com', 'tom', 'peters', 'DE');
          INSERT INTO users (emailaddress, firstname, lastname, country) values ('jsmith@example.com', 'jim', 'smith', 'USA');
          INSERT INTO users (emailaddress, firstname, lastname, country) values ('arogers@example.com', 'alan', 'rogers', 'USA');
          DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
        APPLY BATCH;
      • select in cqlsh
      • List in cassandra-cli with timestamp
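A sketch of using explicit timestamps to control ordering inside a batch. It assumes the same `users` columns as the batch example, with `emailaddress` as the primary key (an assumption, since the slide does not show the table definition). Timestamps are microseconds since the Unix epoch; the values are arbitrary.

```cql
BEGIN BATCH
  -- Written "first": lower timestamp.
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('jsmith@example.com', 'jim', 'smith', 'USA')
    USING TIMESTAMP 1412812800000000;

  -- Written "later": the higher timestamp makes the delete win,
  -- regardless of the order in which replicas apply the statements.
  DELETE FROM users USING TIMESTAMP 1412812800000001
    WHERE emailaddress = 'jsmith@example.com';
APPLY BATCH;
```

Without explicit timestamps, all statements in a batch share one timestamp, so the order in which they are written in the batch does not determine which write wins.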
  27. More Data Modeling…
      • No Joins
      • No Foreign Keys
      • No Third (or any other) Normal Form Concerns
      • Redundant Data Encouraged. Apps maintain consistency.
  28. Secondary Indexes
      • Allow defining indexes to support access by columns other than the partition key.
      • Each node has a local index for its data.
      • They have uses, but shouldn’t be used all the time without consideration.
      • We will look at alternatives.
  29. Secondary Index Example
      • Create a table
      • Try to select with column not in PK
      • Add Secondary Index
      • Try select again.
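The four demo steps above could be sketched as follows. The table name `employees` and the index name `employees_country_idx` are hypothetical.

```cql
-- 1. Create a table keyed by email address.
CREATE TABLE IF NOT EXISTS employees (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- 2. Rejected at this point: country is neither part of the primary key
--    nor indexed.
SELECT * FROM employees WHERE country = 'USA';

-- 3. Add a secondary index on the column.
CREATE INDEX employees_country_idx ON employees (country);

-- 4. The same query is now accepted; each node consults its local index.
SELECT * FROM employees WHERE country = 'USA';
```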
  30. When to use?
      • Low cardinality: small number of unique values
      • High cardinality: high number of distinct values
      • Secondary indexes are good for low cardinality, such as country codes or department codes. Not email addresses.
  31. Materialized View
      • If you want full distribution, you can use what is called the Materialized View pattern.
      • Remember redundant data is fine.
      • Model the queries
  32. Materialized View Example
      • Show normal table with compound key and querying limitations
      • Create a Materialized View table with a different compound key to support alternate access.
      • Selects use partition key.
      • Secondary indexes are local, not distributed
      • ALLOW FILTERING can cause performance issues
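A sketch of the Materialized View pattern as an application-maintained query table. The table names `users_by_email` and `users_by_department` are hypothetical; the point is that the same data is stored twice, keyed by each access path.

```cql
-- Primary table: look users up by email address.
CREATE TABLE IF NOT EXISTS users_by_email (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);

-- "View" table: same data, keyed for lookups by department.
CREATE TABLE IF NOT EXISTS users_by_department (
  department text,
  emailaddress text,
  firstname text,
  lastname text,
  PRIMARY KEY (department, emailaddress)
);

-- The application writes to both tables; a logged batch keeps them in step.
BEGIN BATCH
  INSERT INTO users_by_email (emailaddress, department, firstname, lastname)
    VALUES ('tpeters@example.com', 'engineering', 'tom', 'peters');
  INSERT INTO users_by_department (department, emailaddress, firstname, lastname)
    VALUES ('engineering', 'tpeters@example.com', 'tom', 'peters');
APPLY BATCH;

-- Both access paths are now plain partition key lookups.
SELECT * FROM users_by_department WHERE department = 'engineering';
```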
  33. Counters
      • Updated in 2.1 and now work in a more distributed and accurate manner.
      • Table organization, example
      • How to update, view etc.
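A minimal counter sketch. The table name `page_views` and its columns are hypothetical. In a counter table, every column outside the primary key must be of type `counter`, and values are changed with `UPDATE` rather than `INSERT`.

```cql
CREATE TABLE IF NOT EXISTS page_views (
  page text PRIMARY KEY,
  views counter
);

-- Increment (or decrement) relative to the current value;
-- the row is created on first update.
UPDATE page_views SET views = views + 1 WHERE page = '/home';
UPDATE page_views SET views = views + 5 WHERE page = '/home';

-- Read the counter back like any other column.
SELECT views FROM page_views WHERE page = '/home';
```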
  34. Time Series Example….
      • Time series table model.
      • Need to consider the interval for event frequency and the wide row size.
      • Make the partition key a combination of what is being tracked and the unit of time interval.
  35. Time Series Data
      • Due to its quick writing model, Cassandra is well suited for storing time series data.
      • The Cassandra wide row is a perfect fit for modeling time series / time-based events.
      • Let’s look at an example….
  36. Event Data
      • Notice the primary key and cluster key.
      • Insert some data
      • View in CQL, then in CLI as wide row
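A sketch of a time series table along the lines described above. The table `sensor_events` and its columns are hypothetical; the partition key combines what is tracked (`sensor_id`) with the interval unit (`day`), so each wide row holds one sensor's events for one day.

```cql
CREATE TABLE IF NOT EXISTS sensor_events (
  sensor_id text,
  day text,               -- interval bucket, e.g. '2014-10-09'
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-10-09', '2014-10-09 08:15:00+0000', 21.5);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-10-09', '2014-10-09 08:16:00+0000', 21.7);

-- Newest events first, restricted to a time range within one partition.
SELECT event_time, reading FROM sensor_events
  WHERE sensor_id = 'sensor-1' AND day = '2014-10-09'
    AND event_time >= '2014-10-09 08:00:00+0000'
  LIMIT 100;
```

Choosing a smaller bucket (an hour instead of a day) keeps partitions smaller for high-frequency events, at the cost of more partitions to query for long ranges.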
  37. TTL: Self-Expiring Data
      • Another technique is data that has a defined lifespan.
      • For instance session identifiers, temporary passwords etc.
      • For this Cassandra provides a Time To Live (TTL) mechanism.
  38. TTL Example…
      • Create table
      • Insert data using TTL
      • Can update a specific column with a TTL
      • Show using selects.
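The demo steps above might look like this. The table `sessions` and its columns are hypothetical; TTL values are in seconds.

```cql
CREATE TABLE IF NOT EXISTS sessions (
  session_id text PRIMARY KEY,
  user_email text,
  token text
);

-- The whole row expires one hour after the write.
INSERT INTO sessions (session_id, user_email, token)
  VALUES ('s-123', 'tpeters@example.com', 'abc')
  USING TTL 3600;

-- Give a single column its own, shorter lifetime.
UPDATE sessions USING TTL 600 SET token = 'def' WHERE session_id = 's-123';

-- TTL() reports the remaining seconds for a column.
SELECT user_email, TTL(user_email), token, TTL(token)
  FROM sessions WHERE session_id = 's-123';
```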
  39. Questions
      • Email: brian.enochson@gmail.com
      • Twitter: @benochso
      • G+: https://plus.google.com/+BrianEnochson
