Architecture et modèle de données Cassandra

Trivadis DataStax Cassandra seminar, 26.01.2015, Geneva
Ulises Fasoli
Senior Consultant, Trivadis SA

Published in: Data & Analytics

  1. Architecture et modèle de données Cassandra (Cassandra Architecture and Data Model)
     Genève, 26.01.2015
     Ulises Fasoli, Senior Consultant, Trivadis AG
     2013 © Trivadis. BASEL BERN BRUGES LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA
     Slide footer date: January 2016
  2. Agenda
     1. Introduction to NoSQL datastores and Polyglot Persistence
     2. What is Apache Cassandra?
     3. Why Cassandra, What is DataStax?
     4. Cassandra Architecture
     5. Cassandra Data Model
     6. Cassandra Query Language (CQL)
     7. Cassandra/DataStax @ Trivadis
  3. History of Databases
     • 1960s: File-based, Network (CODASYL) and Hierarchical Databases
     • 1970s: Relational Databases
     • 1980s: SQL became the standard query language
     • Early 1990s: Object Databases
     • Late 1990s: XML Databases
     • 2004: NoSQL Databases
  4. What's wrong with Relational Databases? They are great ...
     • SQL provides a rich, declarative query language
     • Databases enforce referential integrity
     • ACID semantics
     • Well understood by developers and database administrators
     • Well supported by different languages, frameworks and tools
       - Hibernate, JPA, JDBC, iBATIS, Entity Framework
     • Well understood and accepted by operations people (DBAs)
       - Configuration
       - Monitoring
       - Backup and Recovery
       - Tuning
       - Design
  5. Relational Databases are great ... But!
     New trends
     • Big Data
     • Concurrency
     • Connectivity
     • Diversity
     • P2P Knowledge
     • Cloud/Grid
  6. Relational Databases are great ... But!
     Problem: Complex Object Graphs
     • Object/Relational impedance mismatch
     • Complicated to map a rich domain model to a relational schema
     • Performance issues
       - Many rows in many tables
       - Many joins
       - Eager vs. lazy loading
     Diagram: tables ORDER, ADDRESS, CUSTOMER and ORDER_LINES next to a sample order object:
     Order ID: 1001, Order Date: 15.9.2012
     Customer: First Name: Peter, Last Name: Sample
     Billing Address: Street: Somestreet 10, City: Somewhere, Postal Code: 55901
     Line Items:
     Name         | Quantity | Price
     -------------+----------+-------
     Ipod Touch   | 1        | 220.95
     Monster Beat | 2        | 190.00
     Apple Mouse  | 1        | 69.90
  7. Relational Databases are great ... But!
     Problem: Schema evolution
     Adding attributes to an object => have to add columns to the table
     Expensive if there is a lot of data in that table:
     • Locks are held on the tables for a long time
     • If new values should be mandatory, a NOT NULL constraint cannot be enforced
     • Application downtime ...
  8. Relational Databases are great ... But!
     Problem: Semi-structured data
     A relational schema doesn't easily handle semi-structured data
     Common solutions
     • Name/Value table
       - Poor performance
       - Lack of constraints
     • Serialize as Blob
       - Fewer joins, but no query capabilities
  9. Relational Databases are great ... But!
     Problem: Scaling
     Scaling writes is difficult/expensive/impossible => Big Data
     Scaling a relational database:
     • Vertical scaling is limited and expensive
     • Horizontal scaling is limited and expensive
     Diagram: Single DB => Partitioned Table (P1, P2, P3) => Database Sharding => Database Cluster (Node 1, Node 2)
  10. So, what's wrong with RDBMS? Nothing, but no one size fits all.
     • Many programmers are already familiar with it.
     • Transactions and ACID make development easy.
     • Lots of tools to use.
     • Rigid schema design.
     • Harder to scale.
     • Replication.
  11. Solution: NoSQL?
     No standard definition of what NoSQL means
     • "Not Only SQL", not "No SQL"
     • "Not only relational" would have been better
     The term originated at a workshop organized in 2009
     Use the right tools (DBs) for the job
     It is more like a feature set, or even the negation of a feature set
  12. Use Cases for NoSQL
     • Massive write performance.
     • Fast key-value lookups.
     • Flexible schema and data types.
     • No single point of failure.
     • Fast prototyping and development.
     • Out-of-the-box scalability.
     • Easy maintenance.
  13. Brewer's CAP Theorem
     Any networked shared-data system can have at most two of the three desirable properties:
     • Consistency: all of the nodes see the same data at the same time, regardless of where the data is stored
     • Availability: node failures do not prevent survivors from continuing to operate
     • Network Partition tolerance: the system continues to operate despite arbitrary message loss
     Diagram: the pairwise combinations CA, CP and AP; all three together: n/a
  14. Data Store Positioning
     Diagram axes: Scalability vs. Standardized Model, Tooling, Complexity
     Data stores shown: Key-value, Wide Column (Column Families / Extensible Records), Document, Graph, Relational
     Labels: SQL Comfort Zone, Multi Dimensional
  15. Polyglot Persistence
     In 2006, Neal Ford coined the term Polyglot Programming
     • Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
     Polyglot Persistence defines a hybrid approach to persistence
     • Using multiple data storage technologies
     • Selected based on the way data is being used by individual applications
     • Why store binary images in an RDBMS when there are better storage systems?
  16. Polyglot Persistence
     Today we use the same database for all kinds of data
     • Business transactions, session management data, reporting, logging information, content information, ...
     These kinds of data do not all need the same availability, consistency or backup properties
     Polyglot data storage allows mixing and matching relational and NoSQL data stores
     Diagram, "traditional" persistence model: an e-commerce application keeps shopping cart data, user sessions, product catalog, recommendations and completed orders in one RDBMS
     Diagram, polyglot persistence model: the same e-commerce data is spread over Key-Value, RDBMS, Document and Graph stores
  17. Agenda: section 2, What is Apache Cassandra?
  18. Definition of Cassandra
     Apache Cassandra™ is a free
     • Distributed ...
     • High performance ...
     • Extremely scalable ...
     • Fault tolerant (i.e. no single point of failure) ...
     post-relational database solution.
     Cassandra can serve both as a real-time datastore (the "system of record") for online/transactional applications and as a read-intensive database for business intelligence systems.
  19. History of Cassandra
     Diagram: Bigtable, Dynamo
  20. Architecture Overview
     Cassandra was designed with the understanding that system/hardware failures can and do occur:
     • Peer-to-peer, distributed system
     • All nodes are the same
     • Data is partitioned among all nodes in the cluster
     • Custom data replication to ensure fault tolerance
     • Read/write-anywhere design
  21. Big Data Scalability
     • Capable of comfortably scaling to petabytes
     • New nodes = linear performance increases
     • Add new nodes online
  22. Who is using Cassandra?
     The largest publicly known cluster has over 300 TB of data spanning 400 machines
  23. Agenda: section 3, Why Cassandra, What is DataStax?
  24. Why Cassandra?
     • Tunable data consistency
     • Flexible schema design
     • Data compression
     • CQL language (like SQL)
     • Support for key languages and platforms
     • No need for special hardware or software
     • Gigabyte to petabyte scalability
     • Linear performance gains through adding nodes
     • No single point of failure
     • Easy replication / data distribution
     • Multi-data center and Cloud capable
     • No need for a separate caching layer
  25. Cassandra Use Cases
     • Product Catalog / Playlists
     • Personalization
       - Ads
       - Recommendations
       - Ratings
     • Fraud Detection
     • Time Series
       - Finance
       - Smart Meter
     • IoT / Sensor Data
     • Graph / Network data
  26. DataStax Enterprise Edition (DSE)
  27. DataStax OpsCenter
  28. Agenda: section 4, Cassandra Architecture
  29. Architecture Overview
     • Nodes communicate with each other through the Gossip protocol, which exchanges information across the cluster every second
     • A commit log is used on each node to capture write activity, which assures data durability
     • Data is also written to an in-memory structure (memtable) and then, once the memory structure is full, to disk (an SSTable)
  30. No Single Point of Failure
     • All nodes are the same
     • Customized replication affords tunable data redundancy
     • Read/write from any node
     • Can replicate data among different physical data center racks
  31. Easy Replication / Data Distribution
     • Transparently handled by Cassandra
     • Multi-data center capable
     • Exploits all the benefits of Cloud computing
     • Able to do a hybrid Cloud/on-premise setup
  32. Partitioning
     • Nodes are logically structured in a ring topology.
     • The hashed value of the key associated with a data partition is used to assign it to a node in the ring.
     • Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.
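The ring assignment on the partitioning slide can be sketched roughly as follows. This is a minimal illustration, not Cassandra's partitioner: the MD5 hash, the `Ring` class and the node names A to F are assumptions chosen for the example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring: each node owns the range that ends at its token."""

    def __init__(self, nodes):
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def owner(self, key: str) -> str:
        # First node clockwise whose token is >= the key's token;
        # the modulo wraps around past the last node back to the first.
        index = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        return self.entries[index][1]


ring = Ring(["A", "B", "C", "D", "E", "F"])
print(ring.owner("key1"), ring.owner("key2"))
```

The same key always hashes to the same position, so any node can compute the owner without asking a central directory.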
  33. Data Replication
     Replication for high availability and data durability
     • Replication factor N: each row is replicated at N nodes
     • Each row key k is assigned to a coordinator node
     • The coordinator node is responsible for replicating the rows within its key range
  34. Partitioning and Replication
     Diagram: ring (positions 0, 1/2, 1) with nodes A, B, C, D, E, F; N=3; keys placed at h(key1) and h(key2)
  35. Data Replication
     Each data item is replicated at N (replication factor) nodes.
     Different replication policies:
     • Rack Unaware: replicate data at the N-1 successive nodes after its coordinator
     • Rack Aware: uses 'Zookeeper' to choose a leader which tells nodes the range they are replicas for
     • Datacenter Aware: similar to Rack Aware, but the leader is chosen at datacenter level instead of rack level
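As a rough sketch of the Rack Unaware policy above, the replica set is the coordinator plus the N-1 nodes that follow it on the ring. The function name `replicas` and the six-node ring order are assumptions for illustration only.

```python
def replicas(ring_order, coordinator, n):
    """Return the coordinator plus the N-1 nodes that follow it on the ring."""
    start = ring_order.index(coordinator)
    # The modulo wraps around from the last node to the first.
    return [ring_order[(start + i) % len(ring_order)] for i in range(n)]


ring_order = ["A", "B", "C", "D", "E", "F"]
print(replicas(ring_order, "B", 3))  # ['B', 'C', 'D']
print(replicas(ring_order, "E", 3))  # ['E', 'F', 'A']
```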
  36. Write Path
     When a write occurs, Cassandra stores the data in a structure in memory, the memtable, and also appends the write to the commit log on disk, providing configurable durability.
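A toy model of this write path might look like the sketch below. It is an assumption-laden illustration: the `Node` class, the list standing in for the on-disk commit log, and the row-count flush threshold are all hypothetical and far simpler than the real storage engine.

```python
class Node:
    """Hypothetical single node: commit log, memtable, flushed SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list, stands in for the on-disk log
        self.memtable = {}     # in-memory structure: row key -> {column: (value, ts)}
        self.sstables = []     # flushed tables, never modified after the flush
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value, timestamp):
        # Record the write in the commit log, then apply it to the memtable.
        self.commit_log.append((row_key, column, value, timestamp))
        self.memtable.setdefault(row_key, {})[column] = (value, timestamp)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # A full memtable is written out as a new SSTable and then emptied.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("john", "age", 37, timestamp=1)
node.write("eric", "age", 38, timestamp=2)   # second row key triggers a flush
print(len(node.commit_log), len(node.sstables), node.memtable)
```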
  37. Write Requests
     The coordinator sends a write request to all replicas that own the row being written
  38. Write Consistency
     The consistency level for writing to Cassandra specifies on how many replicas the write must succeed before an ACK is returned to the client
     • Quorum: (replication_factor / 2) + 1, with the division rounded down
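The quorum formula above can be checked with a few lines of arithmetic. The helper name `quorum` is just an illustrative choice; the integer division reflects the rounding down.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: (replication_factor / 2) + 1, using integer division."""
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"replication_factor={rf} -> quorum={quorum(rf)}")
```

With a replication factor of 3, a quorum write needs 2 acknowledgements, so one replica may be down without failing the write.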
  39. Read Path
     When a read request for a row comes in to a node, the row must be combined from all SSTables on that node that contain columns from the row in question, as well as from any unflushed memtables, to produce the requested data
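The merge described on this slide can be sketched as below, assuming each column carries a write timestamp and the newest timestamp wins. The function `read_row` and the sample data are hypothetical.

```python
def read_row(row_key, sstables, memtable):
    """Combine one row from all SSTables and the memtable; newest timestamp wins."""
    merged = {}
    for source in [*sstables, memtable]:
        for column, (value, ts) in source.get(row_key, {}).items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    # Strip the timestamps before returning the row to the caller.
    return {column: value for column, (value, _) in merged.items()}


sstables = [
    {"john": {"age": (37, 1), "role": ("dev", 1)}},
    {"john": {"age": (38, 2)}},
]
memtable = {"john": {"role": ("lead", 3)}}
print(read_row("john", sstables, memtable))  # {'age': 38, 'role': 'lead'}
```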
  40. Read Requests
     There are two types of read requests that a coordinator can send to a replica:
     • A direct read request
     • A background read repair request
     The number of replicas contacted by a direct read request is determined by the consistency level specified by the client.
  41. Read Consistency
     The consistency level for reading from Cassandra specifies how many replicas must respond before a result is returned to the client
     • Quorum: (replication_factor / 2) + 1, with the division rounded down
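Read and write consistency levels interact: when the number of replicas written (W) plus the number read (R) exceeds the replication factor (N), every read set overlaps every write set in at least one replica. The sketch below only illustrates that arithmetic; the function name is an assumption.

```python
def read_overlaps_write(n: int, w: int, r: int) -> bool:
    """True if any R replicas must include at least one of the W written replicas."""
    return r + w > n


n = 3
print(read_overlaps_write(n, w=2, r=2))  # quorum write + quorum read -> True
print(read_overlaps_write(n, w=1, r=1))  # one replica each -> False
```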
  42. Agenda: section 5, Cassandra Data Model
  43. Cassandra Data Model
     • A table is a multi-dimensional map indexed by key (row key).
     • Columns are grouped into Column Families
     • Dynamic schema design allows for much more flexible data storage than a rigid RDBMS
     • Each column has
       - Name
       - Value
       - Timestamp
  44. How Cassandra stores data
     • Model brought from Google Bigtable
     • Row key and a lot of columns
     • Column names are sorted (UTF8, Int, Timestamp, etc.)
     Diagram: each row key maps to between 1 and 2 billion columns; each column holds Column Name, Column Value, Timestamp and TTL; billions of rows
  45. Cassandra Data Model
     Diagram: a Keyspace containing several Column Families
  46. Row, row key, column key, and column value
     Diagram: one row with a row key, column keys (or column names) cola, colb, colc, cold and column values (or cells) va, vb, vc, vd
     • Rows: individual rows constitute a column family
     • Row key: uniquely identifies a row in a column family
     • Row: stores pairs of column keys and column values
     • Column key: uniquely identifies a column value in a row
     • Column value: stores one value or a collection of values
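One way to picture these terms is as a map of maps. The sketch below is only a mental model under that assumption; the `Column` type and the sample row are hypothetical and not how Cassandra lays data out on disk.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """A column value together with its write timestamp."""
    value: object
    timestamp: int


# Column family: row key -> {column key -> Column}
column_family = {
    "john": {"role": Column("dev", 1), "age": Column(37, 1)},
}


def get(cf, row_key, column_key):
    """The row key finds the row, the column key finds the value inside it."""
    return cf[row_key][column_key].value


def column_keys(cf, row_key):
    """Column keys of one row, returned in sorted order."""
    return sorted(cf[row_key])


print(get(column_family, "john", "age"))     # 37
print(column_keys(column_family, "john"))    # ['age', 'role']
```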
  47. Static vs. Dynamic Column Family
     Static column family (skinny rows)
     • Contains a predefined set of columns with metadata
     • Number of columns can vary across multiple rows within the column family
     • Similar to an RDBMS, except that there are no NULL values
     Example rows:
     John Lennon: born = 1940, country = England, died = 1980, style = Rock, type = artist
     The Beatles: country = England, founded = 1957, style = Rock, type = band
  48. What is a wide row?
     Rows may be described as "skinny" or "wide"
     • Wide row: has a relatively large number of column keys (hundreds or thousands); this number may increase as new data values are inserted
       - For example, a row that stores all bands of the same style
       - The number of such bands will increase as new bands are formed
     • Note that column values do not exist in this example
       - The column key, in this case a band name, stores all the data desired
       - Could have stored the number of albums, or year founded, etc., as column values
     Example row: Rock: The Animals, The Beatles, ...
     ©2014 DataStax Training. Use only with permission.
  49. What are composite row key and composite column key?
     • Composite row key: multiple components separated by a colon
       - 'Revolver' and 1966 are the album title and year
       - The 'tracks' value is a collection (map)
     • Composite column key: multiple components separated by a colon
       - Composite column keys are sorted by each component
     Example row Revolver:1966 with a collection value: genre = Rock, performer = The Beatles, tracks = {1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}
     Example row Revolver:1966 with composite column keys: 1:title = Taxman, 2:title = Eleanor Rigby, ..., 14:title = Tomorrow Never Knows
     ©2014 DataStax Training. Use only with permission.
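Sorting composite column keys "by each component" can be illustrated with tuples, which Python also compares component by component. The dictionary below is a hypothetical stand-in for the Revolver:1966 row; only the three track titles named on the slide are used.

```python
# Composite column keys modelled as (track_number, field) tuples.
row = {
    (14, "title"): "Tomorrow Never Knows",
    (1, "title"): "Taxman",
    (2, "title"): "Eleanor Rigby",
}

# Component-wise comparison puts track 2 before track 14,
# which a plain string comparison of "2:title" and "14:title" would not.
ordered_keys = sorted(row)
print(ordered_keys)

# Because the keys are sorted, a contiguous slice answers a range query.
first_two = [row[key] for key in ordered_keys if 1 <= key[0] <= 2]
print(first_two)  # ['Taxman', 'Eleanor Rigby']
```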
  50. Data Modelling with Cassandra: From Query to Model
     • De-normalize, de-normalize, de-normalize
       - Forget about old-school 3NF
       - De-normalize wherever you can for quicker retrieval and let application logic handle the responsibility of reliably updating redundancies
     • Rows are gigantic and sorted
       - Giga-sized rows (2 billion columns max) can be used to store sortable and sliceable columns
       - Comments by timestamp, ordered bids by quoted price, ratings by product, ...
     • One row, one machine
       - Each row stays on one machine
       - Rows are not shared across nodes
       - Beware of this: don't create hotspots with a high-demand row!
  51. Remember this
     • Cassandra finds rows fast
     • Cassandra scans columns fast
     • Cassandra does not scan rows
  52. Agenda: section 6, Cassandra Query Language (CQL)
  53. Cassandra API: Thrift vs. CQL
     Thrift
     • Exposes the internal storage structure of Cassandra pretty much directly
     • Complicated, low-level, full control
     • Legacy
     CQL
     • The new way to go
     • Provides a thin abstraction layer over Cassandra's internal structure
     • Hides some distracting and useless implementation details
     • Provides native syntax for common encodings/idioms (like collections) instead of letting each client library re-implement them in its own, different and thus incompatible way
  54. CQL Language
     • Very similar to RDBMS SQL syntax
     • Create objects via DDL (e.g. CREATE ...)
     • Core DML commands supported: INSERT, UPDATE, DELETE
     • Query data with SELECT
     • Current version is CQL3
  55. CQL Shell for Apache Cassandra
     cqlsh is the command line utility for executing CQL commands (think of SQL*Plus for Cassandra)
     CQL3 is the default since Cassandra 1.2
     $ cqlsh
     Connected to DataStaxCluster at localhost:9160.
     [cqlsh 4.1.0 | Cassandra 2.0.5.24 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
     Use HELP for help.
     cqlsh>
  56. The CQL/Cassandra Mapping: Static Table
     CREATE TABLE employee (
       name text PRIMARY KEY,
       age int,
       role text);
     CQL view:
     name | age | role
     -----+-----+-----
     john | 37  | dev
     eric | 38  | ceo
     Storage view:
     row key john: age = 37, role = dev
     row key eric: age = 38, role = ceo
  57. Create a Dynamic table (wide-row) Employee
     A dynamic table is also created with the CREATE TABLE statement, but using a composite primary key
     cqlsh:training> CREATE TABLE employees (
       company text,
       name text,
       age int,
       role text,
       PRIMARY KEY (company, name)
     );
  58. The CQL/Cassandra Mapping: Dynamic Table
     CREATE TABLE employees (
       company text,
       name text,
       age int,
       role text,
       PRIMARY KEY (company, name)
     );
     CQL view:
     company | name | age | role
     --------+------+-----+-----
     OSC     | eric | 38  | ceo
     OSC     | john | 37  | dev
     RKG     | anya | 29  | lead
     RKG     | ben  | 27  | dev
     RKG     | chad | 35  | ops
     Storage view (one wide row per company):
     row key OSC: eric:age = 38, eric:role = ceo, john:age = 37, john:role = dev
     row key RKG: anya:age = 29, anya:role = lead, ben:age = 27, ben:role = dev, chad:age = 35, chad:role = ops
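The mapping from the CQL view to the wide-row storage view can be sketched as below. This is an illustrative model only: the function `to_wide_rows` is hypothetical, and real composite column names are encoded in binary rather than as `name:column` strings.

```python
# Rows as they appear in the CQL view: (company, name, age, role).
cql_rows = [
    ("OSC", "eric", 38, "ceo"),
    ("OSC", "john", 37, "dev"),
    ("RKG", "anya", 29, "lead"),
    ("RKG", "ben", 27, "dev"),
    ("RKG", "chad", 35, "ops"),
]


def to_wide_rows(rows):
    """Group CQL rows by partition key; clustering key prefixes each column name."""
    storage = {}
    for company, name, age, role in rows:
        wide_row = storage.setdefault(company, {})
        wide_row[f"{name}:age"] = age
        wide_row[f"{name}:role"] = role
    return storage


storage = to_wide_rows(cql_rows)
print(storage["OSC"])
```

The first primary key component (`company`) becomes the row key, so all employees of one company end up in a single wide row.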
  59. Insert data into Employee
     • The INSERT command is similar to its SQL counterpart
     • The major difference is that the PRIMARY KEY is always required
     • If the same statement is executed twice, there will be no error
     • If the same PRIMARY KEY value is reused with different values for the other columns, the last one wins!
     cqlsh:training> INSERT INTO employee (name, age, role) VALUES ('john', 37, 'dev');
     cqlsh:training> INSERT INTO employee (name, age, role) VALUES ('eric', 38, 'ceo');
  60. Retrieving data from Employee table (II)
     • A restriction on a column other than the PRIMARY KEY won't work
     • Can be solved with an index (but be careful, better use de-normalization)
     cqlsh:training> SELECT * FROM employee WHERE age = 37;
     Bad Request: No indexed columns present in by-columns clause with Equal operator
     cqlsh:training> CREATE INDEX employee_age_idx ON employee (age);
     cqlsh:training> SELECT * FROM employee WHERE age = 37;
      name | age | role
     ------+-----+------
      john |  37 |  dev
     (1 rows)
  61. Update data in Employee
     • The UPDATE statement is similar to the SQL UPDATE command
     • Just as with the INSERT, the PRIMARY KEY column must be specified as part of the UPDATE
     • In CQL, the UPDATE does not check for the existence of the row; if the row does not exist, CQL will just create it
     cqlsh:training> UPDATE employee SET age = 38 WHERE name = 'john';
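Both INSERT and UPDATE therefore behave like an "upsert". A dictionary-based sketch of that behaviour follows; the `upsert` helper and the row 'anna' are hypothetical and only mimic the semantics described on the slides.

```python
employee = {}


def upsert(table, name, **columns):
    """Create the row if it is missing, then overwrite the given columns."""
    table.setdefault(name, {}).update(columns)


upsert(employee, "john", age=37, role="dev")   # like the INSERT for john
upsert(employee, "john", age=37, role="dev")   # same statement again: no error
upsert(employee, "john", age=38)               # like UPDATE ... SET age = 38
upsert(employee, "anna", age=30)               # update of a missing row creates it

print(employee)
```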
  62. Cassandra Data Types

       Category | CQL Data Type | Description
      ----------+---------------+---------------------------------------------------------------------
       String   | ascii         | US-ASCII character string
                | text          | UTF-8 encoded string, used most of the time for storing string data
                | varchar       | UTF-8 encoded string
                | inet          | Used for storing IP addresses
       Numeric  | int           | 32-bit signed integer
                | float         | 32-bit IEEE-754 floating point
                | double        | 64-bit IEEE-754 floating point
                | varint        | Arbitrary-precision integer
                | bigint        | 64-bit signed integer, equivalent to long
                | decimal       | Variable-precision decimal
                | counter       | Distributed counter value (64-bit long)
  63. Cassandra Data Types (II)

       Category      | CQL Data Type | Description
      ---------------+---------------+------------------------------------------------------
       UUIDs         | uuid          | A UUID in standard UUID format
                     | timeuuid      | Type 1 UUID only, for storing unique time-based IDs
       Collections   | list          | Ordered collection of one or more elements
                     | map           | Collection of arbitrary key-value pairs
                     | set           | Unordered collection of one or more unique elements
       Miscellaneous | boolean       | Boolean (true/false)
                     | blob          | Used for storing binary data, written in hexadecimal
                     | timestamp     | Date/Time
  64. Cassandra Data Types (III)

      TimeUUID
      • Has a few extra functions that allow extracting the time information
      • now() returns a new TimeUUID based on the current timestamp and ensures globally unique values
      • minTimeuuid() and maxTimeuuid() are used when querying ranges of TimeUUIDs

          WHERE event_time > maxTimeuuid('2013-01-01 00:05+0000')
            AND event_time < minTimeuuid('2013-02-02 10:00+0000')

      Counter
      • Counter columns cannot be mixed with columns of other types (apart from the PRIMARY KEY columns)
      • The value cannot be set, only incremented or decremented by a specified amount
      • Counters may not be part of the PRIMARY KEY of the table
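To make the "time information" inside a TimeUUID more concrete, here is a minimal client-side sketch using Python's standard `uuid` module, whose `uuid1()` also produces version 1 (time-based) UUIDs. The helper name `timeuuid_to_datetime` is an assumption for illustration; it is not a Cassandra or driver API.

```python
import uuid
from datetime import datetime, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value: uuid.UUID) -> datetime:
    """Return the UTC time embedded in a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_100ns = value.time - UUID_EPOCH_OFFSET
    return datetime.fromtimestamp(unix_100ns / 1e7, tz=timezone.utc)


if __name__ == "__main__":
    event_id = uuid.uuid1()  # client-side analogue of a time-based ID
    print(event_id, timeuuid_to_datetime(event_id))
```

Since the timestamp is part of the identifier, sorting or range-filtering on such IDs is effectively sorting or filtering by time, which is what the minTimeuuid()/maxTimeuuid() range query relies on.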
  65. Collections

      CQL3 also supports collections for storing complex data structures
      • Set {value,…}, List [value,…], Map {key:value,…}

          cqlsh:training> CREATE TABLE collection_sample (
                            id int PRIMARY KEY,
                            string_set set<text>,
                            string_list list<text>,
                            string_map map<text, text>);

          cqlsh:training> INSERT INTO collection_sample (id, string_set, string_list, string_map)
                          VALUES (1, {'text1','text2','text1'}, ['text1','text2','text1'], {'key1':'value1'});
  66. Collections (II)

          cqlsh:training> SELECT * FROM collection_sample;

           id | string_list                 | string_map         | string_set
          ----+-----------------------------+--------------------+--------------------
            1 | ['text1', 'text2', 'text1'] | {'key1': 'value1'} | {'text1', 'text2'}

          (1 rows)
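The query result above shows the duplicate 'text1' surviving in the list but not in the set. As a rough analogy only, Python's built-in collection types behave the same way for the same literal values; the variable names below just mirror the column names from the slide.

```python
# Same literal values as in the INSERT statement on the previous slide.
string_set = {"text1", "text2", "text1"}   # duplicates collapse
string_list = ["text1", "text2", "text1"]  # order and duplicates are kept
string_map = {"key1": "value1"}            # unique keys mapped to values

if __name__ == "__main__":
    print(sorted(string_set))
    print(string_list)
    print(string_map)
```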
  67. Counter Columns

      Create a counter column table that counts “favorite” events

          cqlsh:training> CREATE TABLE favorites (
                            product_id int,
                            month int,
                            number COUNTER,
                            PRIMARY KEY (product_id, month));

          cqlsh:training> UPDATE favorites SET number = number + 1
                          WHERE product_id = 4910 AND month = 06;

          cqlsh:training> SELECT * FROM favorites;

           product_id | month | number
          ------------+-------+--------
                 4910 |     6 |      1
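The counter rules (no direct assignment, only increments and decrements, first update creates the row) can be mimicked with a toy single-process model. This sketch assumes nothing about how Cassandra coordinates counters across nodes; `ToyCounterTable` and its methods are hypothetical names.

```python
from collections import defaultdict


class ToyCounterTable:
    """Toy model of a counter table keyed by (product_id, month)."""

    def __init__(self):
        # Missing keys start at 0, so the first increment creates the row.
        self._counts = defaultdict(int)

    def add(self, product_id, month, delta=1):
        """Increment (or decrement, with a negative delta) the counter."""
        self._counts[(product_id, month)] += delta

    def get(self, product_id, month):
        """Return the counter value, or None if it was never updated."""
        return self._counts.get((product_id, month))


if __name__ == "__main__":
    favorites = ToyCounterTable()
    favorites.add(4910, 6)          # number = number + 1
    print(favorites.get(4910, 6))   # 1
```

Note that the model deliberately offers no `set` operation: the only way to change a value is relative to its current state, as in the `number = number + 1` statement on the slide.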
  68. Time-to-Live (TTL) on Insert

      Insert a row with a TTL of 30 seconds; once the TTL has expired, the row is deleted

          cqlsh:training> INSERT INTO employee (name, age, role)
                          VALUES ('bob', 29, 'dev') USING TTL 30;

          cqlsh:training> SELECT TTL(role) FROM employee WHERE name='bob';

           ttl(role)
          -----------
                  22

          cqlsh:training> SELECT TTL(role) FROM employee WHERE name='bob';

          (0 rows)
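The TTL session above can be replayed with a small in-memory model and a fake clock. This is a simplified sketch: it tracks expiry per row rather than per column, and `ToyTtlTable`, its methods, and the injected `clock` callable are hypothetical names chosen for illustration.

```python
class ToyTtlTable:
    """Toy table whose rows can expire after a time-to-live in seconds."""

    def __init__(self, clock):
        self._clock = clock  # callable returning the current time in seconds
        self._rows = {}      # key -> (columns, expiry time or None)

    def insert(self, key, columns, ttl=None):
        """Store a row; with a ttl, it expires `ttl` seconds from now."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._rows[key] = (dict(columns), expires_at)

    def _live(self, key):
        """Return the stored entry if it is still alive, else None."""
        entry = self._rows.get(key)
        if entry is None:
            return None
        _, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._rows[key]  # expired rows disappear on access
            return None
        return entry

    def select(self, key):
        """Return the row's columns, or None if absent or expired."""
        entry = self._live(key)
        return None if entry is None else entry[0]

    def ttl(self, key):
        """Return the remaining seconds, or None if no live row or no TTL."""
        entry = self._live(key)
        if entry is None or entry[1] is None:
            return None
        return entry[1] - self._clock()


if __name__ == "__main__":
    now = [0]  # fake clock, advanced by hand
    table = ToyTtlTable(clock=lambda: now[0])
    table.insert("bob", {"age": 29, "role": "dev"}, ttl=30)
    now[0] = 8
    print(table.ttl("bob"))     # 22
    now[0] = 31
    print(table.select("bob"))  # None
```

Injecting the clock keeps the example deterministic; with a real clock the remaining TTL printed would depend on how quickly the second call follows the insert, just as in the cqlsh session.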
  69. Agenda
      1. Introduction to NoSQL datastores and Polyglot Persistence
      2. What is Apache Cassandra?
      3. Why Cassandra, What is DataStax?
      4. Cassandra Architecture
      5. Cassandra Data Model
      6. Cassandra Query Language (CQL)
      7. Cassandra/DataStax @ Trivadis
  70. Trivadis / DataStax Partnership
      • We have been a DataStax silver partner since December 2014
      • DataStax Partner Network (DSPN)
      • Available certifications
        • Admin
        • Developer
        • Architect
      • Currently only one other partner in Switzerland: Intersys
      • http://www.datastax.com/partners
  71. Questions and answers ...

      Ulises Fasoli
      Senior consultant
      +41 21 321 47 00
      ulises.fasoli@trivadis.com
