Introduction to Cassandra


Shimi Kiviti
@shimi_k

Motivation

Scaling

How do you scale your database?
● reads
● writes

Influential Papers

● Bigtable: A Distributed Storage System for Structured Data, 2006
● Dynamo: Amazon's Highly Available Key-value Store, 2007

Cassandra takes:
● partitioning and replication from Dynamo
● the log-structured column family data model from Bigtable

Cassandra Highlights

● Symmetric - all nodes are exactly the same
○ No single point of failure
○ Linearly scalable
○ Ease of administration
● High availability with multiple datacenters
● Tunable trade-off between consistency and latency
● Read and write through any node
● Flexible Schema
● Column TTL
● Distributed Counters

DHT

● O(1) node lookup: every node knows the whole ring, so a request reaches a replica in one hop
● Explicit replication
● Linear Scalability
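To make the DHT bullets concrete, here is a minimal Scala sketch of a token ring. It assumes one token per node, a toy hash (`String.hashCode` masked to a non-negative `Int`) in place of a real partitioner, and SimpleStrategy-style placement where the replicas are the owner plus the next nodes clockwise. The names `Ring` and `TokenRing` are hypothetical. Because every node holds the whole token map, finding a key's replicas is a local lookup and the request needs only one hop, which is what "O(1) node lookup" refers to.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical sketch of a token ring; not Cassandra's implementation.
object Ring {
  // Toy token function standing in for the partitioner (assumption: not MD5).
  def token(key: String): Int = key.hashCode & Int.MaxValue

  // One token per node; a node with token T owns the range (previous token, T].
  final case class TokenRing(nodes: TreeMap[Int, String]) {
    // Nodes in clockwise order, starting at the first token >= t and wrapping around.
    private def clockwiseFrom(t: Int): List[String] =
      (nodes.iteratorFrom(t).map(_._2).toList ++ nodes.values.toList).distinct

    // The owner is found with a local lookup: no extra network hops.
    def ownerOf(t: Int): String = clockwiseFrom(t).head

    // SimpleStrategy-like placement: the owner plus the next n - 1 nodes clockwise.
    def replicasOf(t: Int, n: Int): List[String] = clockwiseFrom(t).take(n)

    def replicas(key: String, n: Int): List[String] = replicasOf(token(key), n)
  }
}
```

For example, with tokens 100, 200 and 300 for nodes A, B and C, token 150 is owned by B and its two replicas are B and C, while token 350 wraps around to A. Adding a node only splits one existing range, which is why capacity grows linearly.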

Consistency

N = replication factor
R = number of replicas a read waits for (R <= N)
W = number of replicas a write waits for (W <= N)
Quorum = floor(N/2) + 1

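When W + R > N, every set of R replicas read overlaps every set of W replicas written, so a read always sees the latest write (full consistency).
Examples: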
● W = 1, R = N
● W = N, R = 1
● W = Quorum, R = Quorum
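The arithmetic can be checked directly. This sketch (the object name `Quorum` is hypothetical) computes Quorum with integer division and compares the W + R > N rule against a brute-force test that every possible write set shares at least one replica with every possible read set.

```scala
object Quorum {
  // Quorum = floor(N / 2) + 1; Int division already floors for non-negative N.
  def quorum(n: Int): Int = n / 2 + 1

  // The rule from the slide.
  def stronglyConsistent(n: Int, w: Int, r: Int): Boolean = w + r > n

  // Brute force: does every W-subset of the N replicas intersect every R-subset?
  def alwaysOverlap(n: Int, w: Int, r: Int): Boolean = {
    val replicas = (0 until n).toList
    val writeSets = replicas.combinations(w).map(_.toSet).toList
    val readSets = replicas.combinations(r).map(_.toSet).toList
    writeSets.forall(ws => readSets.forall(rs => ws.intersect(rs).nonEmpty))
  }
}
```

With N = 3, Quorum = 2, so quorum writes plus quorum reads give 2 + 2 > 3. With W = R = 1, a read can land on a replica that never saw the write.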

Consistency Level

● Every request sets its own consistency level:
○ Any
○ One
○ Two
○ Three
○ Quorum
○ Local Quorum
○ Each Quorum
○ All
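A rough model of what each level asks of the coordinator, assuming replication factors are given per datacenter. The level names mirror Cassandra's ConsistencyLevel values, but `satisfied` is a hypothetical helper, not a Cassandra API. ANY applies to writes only: a hint stored by the coordinator is enough.

```scala
object ConsistencyLevels {
  sealed trait Level
  case object ANY extends Level
  case object ONE extends Level
  case object TWO extends Level
  case object THREE extends Level
  case object QUORUM extends Level
  case object LOCAL_QUORUM extends Level
  case object EACH_QUORUM extends Level
  case object ALL extends Level

  private def quorum(n: Int): Int = n / 2 + 1

  // rf: replication factor per datacenter; acks: replicas that answered, per datacenter.
  // hinted: whether the coordinator stored a hint (only ANY accepts that).
  def satisfied(level: Level, rf: Map[String, Int], localDc: String,
                acks: Map[String, Int], hinted: Boolean = false): Boolean = {
    val total = acks.values.sum
    level match {
      case ANY          => total >= 1 || hinted
      case ONE          => total >= 1
      case TWO          => total >= 2
      case THREE        => total >= 3
      case QUORUM       => total >= quorum(rf.values.sum)             // across all DCs
      case LOCAL_QUORUM => acks.getOrElse(localDc, 0) >= quorum(rf(localDc))
      case EACH_QUORUM  => rf.forall { case (dc, n) => acks.getOrElse(dc, 0) >= quorum(n) }
      case ALL          => total >= rf.values.sum
    }
  }
}
```

With two datacenters of three replicas each, two answers from the local datacenter satisfy LOCAL_QUORUM but not QUORUM (which needs 4 of 6) or EACH_QUORUM.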

Data Model

● Keyspace ~ schema
● ColumnFamilies ~ table
● Rows
● Columns

Column Family

Key1 Column Column Column

Key2 Column Column

Column Family

ColumnFamily: {
  TOK: {
    chen: 1,
    ronen: 7
  },
  CityPath: {
    yuval: 5
  }
}
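The column family above behaves like a map from row key to a sorted map of columns. A minimal Scala sketch, assuming `Long` values and using only the standard library (`ColumnFamilyModel` is a hypothetical name):

```scala
import scala.collection.immutable.TreeMap

object ColumnFamilyModel {
  // A row is a sorted map of column name -> value; a column family maps row keys to rows.
  type Row = TreeMap[String, Long]
  type ColumnFamily = Map[String, Row]

  val cf: ColumnFamily = Map(
    "TOK" -> TreeMap("chen" -> 1L, "ronen" -> 7L),
    "CityPath" -> TreeMap("yuval" -> 5L)
  )

  def get(cf: ColumnFamily, key: String, column: String): Option[Long] =
    cf.get(key).flatMap(_.get(column))

  // Insert or overwrite one column, creating the row if it does not exist yet.
  def insert(cf: ColumnFamily, key: String, column: String, value: Long): ColumnFamily =
    cf.updated(key, cf.getOrElse(key, TreeMap.empty[String, Long]).updated(column, value))
}
```

Rows do not have to share column names, which is what the flexible schema amounts to.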

Super Column Family
Key
  Super1: Column Column Column
  Super2: Column Column Column

ColumnFamily: {
  Key: {
    super1: {
      name: value,
      name: value
    },
    super2: {
      name: value
    }
  }
}
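A super column family adds one more level of nesting: row key, then super column, then column. The same idea sketched with nested sorted maps (the object name is hypothetical, and values are assumed to be strings):

```scala
import scala.collection.immutable.TreeMap

object SuperColumnFamilyModel {
  // row key -> super column name -> column name -> value
  type SuperRow = TreeMap[String, TreeMap[String, String]]
  type SuperColumnFamily = Map[String, SuperRow]

  def insert(cf: SuperColumnFamily, key: String, superCol: String,
             column: String, value: String): SuperColumnFamily = {
    val row = cf.getOrElse(key, TreeMap.empty[String, TreeMap[String, String]])
    val sub = row.getOrElse(superCol, TreeMap.empty[String, String])
    cf.updated(key, row.updated(superCol, sub.updated(column, value)))
  }

  def get(cf: SuperColumnFamily, key: String, superCol: String, column: String): Option[String] =
    for {
      row <- cf.get(key)
      sub <- row.get(superCol)
      v   <- sub.get(column)
    } yield v
}
```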

Write

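● The client sends the write to any node (the coordinator)
● The partitioner maps the row key to its replica nodes
● Each replica appends to the commit log, then updates the memtable
● The coordinator waits for W responses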
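The write path above can be simulated in a few lines. This is a simplified sketch, not Cassandra's code: `Replica`, `Mutation` and `coordinate` are hypothetical names, the coordinator writes to replicas synchronously instead of waiting for the first W asynchronous acknowledgements, and a flush simply empties the commit log.

```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable.ArrayBuffer

object WritePath {
  final case class Mutation(key: String, column: String, value: String, timestamp: Long)
  type Table = TreeMap[(String, String), Mutation]

  // Hypothetical replica: commit log + memtable, flushed to immutable SSTables.
  final class Replica(memtableLimit: Int) {
    val commitLog = ArrayBuffer.empty[Mutation]                     // append-only, sequential
    var memtable: Table = TreeMap.empty[(String, String), Mutation] // sorted, in memory
    val sstables = ArrayBuffer.empty[Table]                         // immutable once flushed

    def write(m: Mutation): Unit = {
      commitLog += m // 1. durability first: a sequential append, no seek
      val cell = (m.key, m.column)
      // 2. no disk read: keep whichever version carries the newer timestamp
      if (memtable.get(cell).forall(_.timestamp <= m.timestamp))
        memtable = memtable.updated(cell, m)
      if (memtable.size >= memtableLimit) flush()
    }

    // Write the sorted memtable out as a new SSTable and start a fresh memtable.
    def flush(): Unit = {
      sstables += memtable
      memtable = TreeMap.empty[(String, String), Mutation]
      commitLog.clear() // simplification: flushed data no longer needs replaying
    }
  }

  // Coordinator: forward to every live replica; succeed if at least W acknowledged.
  def coordinate(replicas: Seq[Replica], up: Set[Int], m: Mutation, w: Int): Boolean = {
    val acks = replicas.indices.count { i =>
      val alive = up(i)
      if (alive) replicas(i).write(m)
      alive
    }
    acks >= w
  }
}
```

Note that a write reported as failed (fewer than W acknowledgements) may still have been applied on some replicas; there is no rollback.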

Write

● No reads
● No seeks
● Sequential disk access
● Atomic per row within a column family
● Fast
● Always writable, even when replicas are down (hinted handoff)
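Hinted handoff is what keeps writes flowing when some replicas are down: the coordinator keeps a hint for each unreachable replica and replays it once that replica is back. A minimal sketch with hypothetical names; a stored hint does not count toward W except at consistency level ANY.

```scala
import scala.collection.mutable

object HintedHandoff {
  final case class Write(key: String, column: String, value: String)

  // Hypothetical coordinator; `applied` stands in for each replica's storage.
  final class Coordinator(replicas: Seq[String]) {
    val applied: mutable.Map[String, Vector[Write]] =
      mutable.Map.from(replicas.map(r => r -> Vector.empty[Write]))
    val hints: mutable.Map[String, Vector[Write]] = mutable.Map.empty

    // Live replicas get the write; for each down replica the coordinator stores a hint.
    // Returns the number of real acknowledgements (hints are not counted).
    def write(w: Write, up: Set[String]): Int = {
      replicas.foreach { r =>
        if (up(r)) applied(r) = applied(r) :+ w
        else hints(r) = hints.getOrElse(r, Vector.empty) :+ w
      }
      replicas.count(up)
    }

    // When the replica is reachable again, replay its hints and drop them.
    def replayHints(r: String): Unit = {
      applied(r) = applied(r) ++ hints.getOrElse(r, Vector.empty)
      hints -= r
    }
  }
}
```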

Read

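● The client sends the read to any node (the coordinator)
● The partitioner maps the row key to its replicas
● The coordinator waits for R responses
● Tunable read repair fixes stale replicas in the background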
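Read repair compares the replies the coordinator received and pushes the newest value to replicas that returned an older one. This sketch assumes last-write-wins by timestamp and a single column (`ReadRepair.resolve` is a hypothetical helper). In Cassandra of this era, how often repair runs in the background is tunable per column family through its read_repair_chance setting.

```scala
object ReadRepair {
  final case class Cell(value: String, timestamp: Long)

  // Given the replies from the replicas that were read, return the newest cell
  // (last write wins by timestamp) and the replicas that need to be repaired.
  def resolve(responses: Map[String, Cell]): (Cell, Set[String]) = {
    val newest = responses.values.maxBy(_.timestamp)
    val stale = responses.collect { case (r, c) if c.timestamp < newest.timestamp => r }.toSet
    (newest, stale)
  }
}
```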

Read

A read may have to merge data from the memtable and several SSTables
Reads are therefore slower than writes
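Because SSTables are immutable, one row's columns can be spread across the memtable and several SSTables, and a read must merge them. Cassandra uses a Bloom filter per SSTable to skip files that cannot contain the row, but some reads still touch several files. A sketch of the merge step, assuming last-write-wins by timestamp (names hypothetical):

```scala
object ReadMerge {
  final case class Cell(value: String, timestamp: Long)
  type Table = Map[String, Cell] // column name -> cell, for a single row

  // Merge one row's columns from the memtable and any number of SSTables:
  // for each column, the cell with the highest timestamp wins.
  def read(sources: Seq[Table]): Map[String, String] =
    sources.flatten
      .groupBy(_._1)
      .map { case (col, cells) => col -> cells.map(_._2).maxBy(_.timestamp).value }
}
```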

Cache

● There is no need to use memcached
● There is an internal configurable cache
○ Key cache
○ Row cache
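Both caches are bounded, so something has to be evicted when they fill up. A least-recently-used map is the simplest way to sketch that on the JVM; this is an illustration only, and the eviction policy and sizes here are assumptions, not Cassandra's implementation. The key cache maps a row key to its position in an SSTable, saving an index lookup; the row cache holds the whole row, saving the read entirely.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

object CacheSketch {
  // Fixed-capacity LRU map: an access-ordered LinkedHashMap that evicts its eldest entry.
  final class LruCache[K, V](capacity: Int) extends JLinkedHashMap[K, V](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > capacity
  }

  // Hypothetical stand-ins for the two caches (capacities are arbitrary):
  // key cache: row key -> position of the row in an SSTable data file
  val keyCache = new LruCache[String, Long](1000)
  // row cache: row key -> the whole row (column name -> value)
  val rowCache = new LruCache[String, Map[String, String]](100)
}
```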

Sorting

Results of a get come back sorted:
● Rows are sorted according to the partitioner
● Columns within a row are sorted by the column family's comparator, i.e. the type of the
column name

Partitioner

● RandomPartitioner: uses MD5 hash values as tokens, which spreads the load evenly
across all nodes.
If you use it, set each node's token manually.

● OrderPreservingPartitioner: you can get rows in sorted key order, but it is much
harder to keep the load even across the cluster
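Why RandomPartitioner tokens are usually assigned by hand: the token is the MD5 digest of the key read as a 128-bit integer (absolute value), so tokens fall between 0 and 2^127, and evenly spaced initial tokens i * 2^127 / N give each of N nodes an equal share of the ring. A sketch using only the JDK (`RandomPartitionerSketch` is a hypothetical name):

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object RandomPartitionerSketch {
  // Token for a row key: MD5 digest read as a signed 128-bit integer, then abs().
  def token(key: String): BigInt = {
    val md5 = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8))
    BigInt(new BigInteger(md5)).abs
  }

  // Evenly spaced initial tokens for an n-node ring: i * 2^127 / n.
  def initialTokens(n: Int): Seq[BigInt] = {
    val range = BigInt(2).pow(127)
    (0 until n).map(i => range * BigInt(i) / BigInt(n))
  }
}
```

For a 4-node cluster this yields 0, 2^125, 2^126 and 3 * 2^125. With OrderPreservingPartitioner the token is the key itself, so nothing spreads the keys out and balancing depends on choosing tokens that match the actual key distribution.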

Column Types

Available types:
● Bytes
● UTF8
● Ascii
● Long
● Date
● UUID
● Composite - <Type1>:<Type2>

Column Types

Examples:

Sort1 (Long vs. UTF8):
Long   UTF8
8      10
9      8
10     9

Sort2 (Composite <UTF8>:<Long> vs. UTF8):
Composite   UTF8
dan:8       dan:10
dan:10      dan:8
shimi:1     shimi:1
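The two orderings in these examples follow from how each comparator compares column names. A Scala sketch reproducing them, assuming ASCII names (for ASCII, `String` ordering matches byte-wise UTF8 ordering) and modeling a composite <UTF8>:<Long> name as a tuple compared component by component (`ComparatorSketch` is a hypothetical name):

```scala
object ComparatorSketch {
  val numbers = List("8", "9", "10")

  // UTF8-style comparison is lexicographic, so "10" sorts before "8".
  val asUtf8: List[String] = numbers.sorted
  // Long-style comparison is numeric.
  val asLong: List[Long] = numbers.map(_.toLong).sorted

  val names = List("dan:10", "shimi:1", "dan:8")

  // Composite <UTF8>:<Long>: compare the first component, then the second.
  def parse(s: String): (String, Long) = {
    val parts = s.split(":")
    (parts(0), parts(1).toLong)
  }
  val asComposite: List[String] = names.sortBy(parse)
  // The same names compared as one plain UTF8 string.
  val asPlainUtf8: List[String] = names.sorted
}
```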

Clients

● Thrift: the low-level RPC interface that drivers are built on
● CQL: Cassandra Query Language (SQL-like)
● High level clients:
○ Python
○ Java
○ Scala
○ Clojure
○ .Net
○ Ruby
○ PHP
○ Perl
○ C++
○ Haskell

Cascal - Scala client

Insert column:

session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")

val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")

Get column value:

val pass = session.get(key \ "passwd")

Cascal

Get multiple columns:

val row = session.list(key)
val rangeCols = session.list(key, RangePredicate("email", "passwd"))       // columns from "email" to "passwd"
val namedCols = session.list(key, ColumnPredicate(List("passwd", "email"))) // only the named columns

Cascal

Get multiple rows:

val family = "app" \ "users"
val rangeRows = session.list(family, RangePredicate("dan", "shimi"))
val namedRows = session.list(family, KeyPredicate("dan", "shimi"))

Cascal

Remove column:
session.remove("app" \ "users" \ "shimi" \ "passwd")

Remove row:
session.remove("app" \ "users" \ "shimi")

Batch operations:

val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
val insertEmail = Insert(key \ "email" \ "shimi.k@...")
session.batch(insertEmail :: deleteCols :: Nil)

Guidelines

● Keep together the data you query together
● Think about your use case and how you should fetch your
data.
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might
consider using different solutions together
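To illustrate the first three guidelines, here is a sketch of a query-first model: one hypothetical column family holds each user's columns in a single row, and a second one, `usersByCity`, is written alongside it so that "users in a city" is a single-row read instead of a join. The names and data are made up for illustration; in Cassandra both mutations could be sent in one batch, as in the Cascal batch example.

```scala
import scala.collection.immutable.TreeMap

object DenormalizedModel {
  // users: user id -> that user's columns (everything one screen needs, in one row)
  // usersByCity: city -> (user id -> ""), so "who lives in X" is a single-row slice
  final case class Store(
      users: Map[String, TreeMap[String, String]] = Map.empty,
      usersByCity: Map[String, TreeMap[String, String]] = Map.empty)

  // One logical write updates both column families.
  def addUser(s: Store, id: String, email: String, city: String): Store = {
    val row = TreeMap("email" -> email, "city" -> city)
    val index = s.usersByCity.getOrElse(city, TreeMap.empty[String, String]).updated(id, "")
    Store(s.users.updated(id, row), s.usersByCity.updated(city, index))
  }

  // Column names in the index row are the user ids, already sorted.
  def usersIn(s: Store, city: String): List[String] =
    s.usersByCity.get(city).map(_.keys.toList).getOrElse(Nil)
}
```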

The End

Useful links:
● Cassandra, http://cassandra.apache.org/
● Wiki, http://wiki.apache.org/cassandra/
● Cassandra mailing list
● IRC
● Bigtable, http://labs.google.com/papers/bigtable.html
● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
● Cascal, https://github.com/shimi/cascal
