Cassandra Community Webinar: Back to Basics with CQL3

Back to Basics with CQL3
Matt Overstreet
OpenSource Connections


Outline
• Overview
• Architecture
• Data Modeling
• Good At/Bad At
• Using Cassandra


Outline
• Overview
o What is Big Data?
o How does Cassandra fit?
• Architecture
• Data Modeling
• Good At/Bad At
• Using Cassandra

What is Big Data?
• The three V’s (and a C)
o Velocity
o Volume
o Variety
o Complexity

What is Big Data?
• Brewer’s CAP theorem
o Consistency - all nodes have the same world view
o Availability - requests can be serviced
o Partition tolerance - the system keeps working despite network/machine failure
o Can’t have all 3 -- Pick 2!

• Examples
o MySQL – Consistent, Available
o HBase – Consistent, Partition Tolerant
o Cassandra – Available, Partition Tolerant
– and “Tunably Consistent”!
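
A rough sketch of what "tunably consistent" can mean in practice. It assumes the usual rule of thumb that a read is guaranteed to see the latest write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (N). The function names are made up for illustration and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


RF = 3
# QUORUM writes plus QUORUM reads always overlap on at least one replica.
strong = reads_see_latest_write(RF, quorum(RF), quorum(RF))
# ONE plus ONE favours availability and latency; a read may hit a stale replica.
eventual = reads_see_latest_write(RF, 1, 1)
print(strong, eventual)
```

Lower levels keep requests available when replicas are down; higher levels trade some of that availability for consistency.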


What is Big Data?
• Common theme: Denormalize everything!
o What’s that?
• JOIN all the tables in the database...
• … well not all the tables

o Why?
• You can shard database at any point
• All related data is co-located

• What this means for you
o No joins
o No transactions - potential for inconsistency
o Vastly simplified querying
o No data-modeling -- Instead, query-modeling
o “Infinite and easy” scaling potential

How Does Cassandra Fit?
• No single point of failure
• Optimized for writes, still good with reads
• Can decide between Consistency and Availability concerns


Outline
• Overview
• Architecture
o Ring architecture
o Data partitioning
o Operations: Writes, Reads
• Data Modeling
• Good At/Bad At
• Using Cassandra

Ring Architecture
• No single point of failure
• Nodes talk via gossip
• Democratic - all nodes are equal


Data Partitioning

Original partitioning method.

Data Partitioning

Flexible partitioning with virtual nodes.
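
The two partitioning slides can be approximated with a small consistent-hashing sketch. This is an illustration only: MD5 stands in for the real partitioner, the node names are invented, and `build_ring` / `owner` are hypothetical helpers. With one token per node you get the original scheme; with many tokens per node you get something like virtual nodes.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string onto the ring; MD5 is used here only for illustration."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, vnodes_per_node):
    """Return a sorted list of (token, node) pairs, several tokens per node."""
    ring = [(token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)]
    ring.sort()
    return ring


def owner(ring, row_key):
    """The first token at or after the key's token owns it, wrapping around."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token(row_key)) % len(ring)
    return ring[index][1]


nodes = ["node-a", "node-b", "node-c"]
single = build_ring(nodes, vnodes_per_node=1)    # original method
virtual = build_ring(nodes, vnodes_per_node=64)  # virtual nodes
print(owner(single, "Nate"), owner(virtual, "Nate"))
```

Many small token ranges per node spread load more evenly and make it cheaper to add or remove a node, because each change moves only small slices of the ring.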

Operations: Writes

Requests are sent out to nodes and replicas.

Operations: Reads

Coordinator node reaches out to relevant replicas.
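
A minimal sketch of the coordinator's job for both reads and writes, under simplifying assumptions: one token per node, replicas placed on the next nodes clockwise around the ring, and a request that succeeds once enough live replicas answer. `replicas_for` and `coordinate` are illustrative names, not Cassandra APIs.

```python
def replicas_for(ring_nodes, start_index, replication_factor):
    """Walk clockwise from the owning node and take the next RF nodes."""
    return [ring_nodes[(start_index + i) % len(ring_nodes)]
            for i in range(replication_factor)]


def coordinate(replicas, live_nodes, required_acks):
    """Send to every replica; succeed once enough live replicas respond."""
    acks = [node for node in replicas if node in live_nodes]
    return len(acks) >= required_acks


ring_nodes = ["n1", "n2", "n3", "n4"]
replicas = replicas_for(ring_nodes, start_index=2, replication_factor=3)

# With only n1 alive, a request needing one ack succeeds, one needing two fails.
ok_one = coordinate(replicas, live_nodes={"n1"}, required_acks=1)
ok_quorum = coordinate(replicas, live_nodes={"n1"}, required_acks=2)
print(replicas, ok_one, ok_quorum)
```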

Outline
• Overview
• Architecture
• Data Modeling
o Internals
o Cassandra Query Language
o Modeling Strategy
o Example
• Good At/Bad At
• Using Cassandra


C* Data Model
• Keyspace contains Column Families
• Column Family contains rows, each identified by a Row Key
• Row contains Columns
• Column consists of:
o Column Name
o Column Value (or Tombstone)
o Timestamp
o Time-to-live

● Row Key, Column Name, Column Value have types
● Column Name has comparator
● Row Key has partitioner
● Rows can have any number of columns - even in same column family
● Rows can have many columns
● Column Values can be omitted
● Time-to-live is useful!
● Tombstones
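
One way to picture a single column as described above is a small record with a name, an optional value, a timestamp and an optional time-to-live. The sketch below is a simplification: `Column` and `reconcile` are hypothetical names, and the tie-breaking rule for equal timestamps is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: Optional[bytes]      # None marks a tombstone (a deleted column)
    timestamp: int              # microseconds since the epoch, set by the writer
    ttl: Optional[int] = None   # seconds to live; None means no expiry

    def is_live(self, now_micros: int) -> bool:
        """A column is readable unless it is a tombstone or its TTL has passed."""
        if self.value is None:
            return False
        if self.ttl is None:
            return True
        return now_micros < self.timestamp + self.ttl * 1_000_000


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep the version with the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


old = Column("firstname", b"Jon", timestamp=1_000_000)
deleted = Column("firstname", None, timestamp=2_000_000)
winner = reconcile(old, deleted)
print(winner.is_live(now_micros=3_000_000))
```

The delete is itself a write with a newer timestamp, which is why a tombstone can override an older value that still exists on another replica.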

C* Data Model: Writes

[Diagram: MemTable, CommitLog, Row Cache, Bloom Filter, and four SSTables, each with its own Key Cache]

● Insert into MemTable
● Append to CommitLog
● No read
● Very Fast!
● Blocks on CPU before I/O!
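
The write path can be sketched as a toy in-memory node. The assumptions are loud ones: the "disk" is a Python list, the flush threshold is a made-up entry count, and the class and method names are illustrative only. The point is that a write appends to a log and updates memory without reading anything first.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []    # append-only record, kept for crash recovery
        self.memtable = {}      # in-memory: (row_key, column) -> value
        self.sstables = []      # immutable, sorted snapshots on "disk"
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value):
        """Append to the log and update memory; no read is needed."""
        self.commit_log.append((row_key, column, value))
        self.memtable[(row_key, column)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted, immutable table, then reset."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()


demo = Node(memtable_limit=2)
demo.write("Nate", "firstname", "Nate")
demo.write("Nate", "lastname", "Smith")   # reaches the limit and triggers a flush
demo.write("Patricia", "firstname", "Patricia")
print(len(demo.sstables), demo.memtable)
```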


C* Data Model: Reads

[Diagram: MemTable, CommitLog, Row Cache, Bloom Filter, and four SSTables, each with its own Key Cache]

● Get values from MemTable
● Get values from row cache if present
● Otherwise check bloom filter to find appropriate SSTables
● Check Key Cache for fast SSTable search
● Get values from SSTables
● Repopulate Row Cache
● Super fast column retrieval
● Fast row slicing
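
A simplified sketch of the read path, leaving out the row cache and key cache entirely. It assumes each stored value carries a timestamp and that the newest version wins. The `BloomFilter`, `SSTable` and `read` names are illustrative; the Bloom filter is a toy version whose only purpose is to show how a read can skip tables that cannot contain the key.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = 0

    def _positions(self, key):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        return all((self.bits >> position) & 1
                   for position in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)          # row_key -> (timestamp, value)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(row_key, memtable, sstables):
    """Collect every version of the row and return the newest value."""
    candidates = []
    if row_key in memtable:
        candidates.append(memtable[row_key])
    for table in sstables:
        if not table.bloom.might_contain(row_key):
            continue                    # this table cannot hold the key
        if row_key in table.rows:
            candidates.append(table.rows[row_key])
    if not candidates:
        return None
    return max(candidates, key=lambda versioned: versioned[0])[1]


memtable = {"Nate": (2, "new value")}
sstables = [SSTable({"Nate": (1, "old value")}),
            SSTable({"Patricia": (1, "hello")})]
latest = read("Nate", memtable, sstables)
print(latest)
```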


Internals: Twitter Example
• 4 ColumnFamilies
o followers
o following
o tweets
o timeline


• Nate follows Patricia
o SET followers[Patricia][Nate] = '';
o SET following[Nate][Patricia] = '';
o storing data in column names (not values)
o denormalized, redundant!

• Get all Patricia’s followers
o GET followers[Patricia]
o => Nate,Eric,Scott,Matt,Doug,Kate
o No JOIN!
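
The follower example maps naturally onto nested dictionaries, which makes the "data in column names" idea easier to see. This is an in-memory analogy only; the `follow` helper is invented for illustration.

```python
from collections import defaultdict

followers = defaultdict(dict)   # row key: user, column names: who follows them
following = defaultdict(dict)   # row key: user, column names: who they follow


def follow(user, target):
    """Write both directions; the column value stays empty."""
    followers[target][user] = ""
    following[user][target] = ""


follow("Nate", "Patricia")
follow("Eric", "Patricia")

print(sorted(followers["Patricia"]))  # ['Eric', 'Nate']
print(sorted(following["Nate"]))      # ['Patricia']
```

Each question ("who follows Patricia?", "whom does Nate follow?") is answered by reading one row, at the cost of writing the same fact twice.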


• Nate tweets
o SET tweets[Nate][2013-07-19T09:20] = “Wonderful morning. This coffee is great.”
o SET tweets[Nate][2013-07-19T09:21] = “Oops, smoke is coming out of the SQL server!”
o SET tweets[Nate][2013-07-19T09:51] = “Now my coffee is cold :-(”

• Get Nate’s tweets
o GET tweets[Nate]
…(what you’d expect)...
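
Because column names are kept sorted, timestamps used as column names come back in time order, and a range of them (a row slice) is cheap to fetch. A small sketch of that idea, assuming ISO-style timestamp strings so that string order matches time order; `Row`, `set` and `slice` are illustrative names.

```python
import bisect


class Row:
    """A single wide row whose column names are kept in sorted order."""

    def __init__(self):
        self.names = []     # sorted column names
        self.values = {}    # column name -> column value

    def set(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def slice(self, start, end):
        """Return the columns whose names fall between start and end, inclusive."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, end)
        return [(name, self.values[name]) for name in self.names[lo:hi]]


nate = Row()
nate.set("2013-07-19T09:51", "Now my coffee is cold :-(")
nate.set("2013-07-19T09:20", "Wonderful morning. This coffee is great.")
nate.set("2013-07-19T09:21", "Oops, smoke is coming out of the SQL server!")

later = nate.slice("2013-07-19T09:21", "2013-07-19T09:59")
print([name for name, _ in later])
```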


CQL (Cassandra Query Language)
CREATE TABLE users (
id timeuuid PRIMARY KEY,
lastname varchar,
firstname varchar,
dateOfBirth timestamp );


INSERT INTO users (id, lastname, firstname, dateofbirth)
VALUES (now(), 'Berryman', 'John', '1975-09-15');


UPDATE users SET firstname = 'John'
WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;


SELECT dateofbirth, firstname, lastname FROM users;

 dateofbirth              | firstname | lastname
--------------------------+-----------+----------
 1975-09-15 00:00:00-0400 | John      | Berryman

The CQL/Cassandra Mapping
CREATE TABLE employees (
company text,
name text,
age int,
role text,
PRIMARY KEY (company,name)
);


 company | name | age | role
---------+------+-----+------
 OSC     | eric |  38 | ceo
 OSC     | john |  37 | dev
 RKG     | anya |  29 | lead
 RKG     | ben  |  27 | dev
 RKG     | chad |  35 | ops
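
The slide's mapping diagram did not survive extraction, so the following is a hedged reconstruction of the general idea rather than the author's picture. With PRIMARY KEY (company, name), the first key column selects the storage row and the second becomes a prefix of each column name. The sketch ignores internal details such as marker cells, and `to_storage` is a hypothetical helper.

```python
from collections import defaultdict

cql_rows = [
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
    {"company": "RKG", "name": "ben", "age": 27, "role": "dev"},
    {"company": "RKG", "name": "chad", "age": 35, "role": "ops"},
]


def to_storage(rows, partition_key, clustering_key):
    """Group CQL rows into wide rows keyed by the partition key."""
    storage = defaultdict(dict)
    for row in rows:
        for field, value in row.items():
            if field in (partition_key, clustering_key):
                continue
            column_name = (row[clustering_key], field)
            storage[row[partition_key]][column_name] = value
    # Column names are kept sorted inside each wide row.
    return {key: dict(sorted(columns.items()))
            for key, columns in storage.items()}


storage = to_storage(cql_rows, partition_key="company", clustering_key="name")
for row_key, columns in storage.items():
    print(row_key, columns)
```

Five CQL rows become two wide storage rows, one per company, which is why all employees of a company can be read together in name order.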


Modeling Strategy
• Don’t think about the data structure
• Do think of the questions you’ll ask
• Consider efficient operations for Cassandra
o Writing (4K writes per second per core)
o Retrieving a row
o Retrieving a row slice
o Retrieving in natural order (which you control)

• Write the data in the way you will query it
• Disk space is cheap
• Separate read-heavy and write-heavy tasks
o Make wise use of caches


Modeling Strategy: Anti-Patterns
• Read-then-write
• Heavy deletes
o Scatters dead columns throughout SSTables
o Won’t be cleaned up until the first compaction after gc_grace_seconds (10 days by default)

• Distributed queue
• JOIN-like behavior
• Super wide-row sneak attack (>2B columns)


QUESTIONS?

