1. An introduction to Cassandra
Pedro Gomes
pedrogomes@lsd.di.uminho.pt
Braga Geek Nights - April 2010
2. Context
• NoSQL movement - Not only SQL
• Unstructured data
• Web-oriented interfaces
• Scale problems
Voldemort
• 20+ emerging non-relational databases
• Document stores
• Graph databases
• Key-Value and Wide Column Stores
3. Cassandra - introduction
• Named after the Greek prophetess Cassandra
• Based on Amazon Dynamo and Google BigTable
• Built at Facebook, open sourced in 2008
• Scalable, decentralized and structured data store
4. Why Cassandra?
• Highly available
• Eventually consistent
• Decentralized
• Elastic
• Fault tolerant
• Flexible Schema
7. Consistency
• CAP theorem: Consistency, Availability, Partition tolerance
• Trade consistency for availability
• Eventual consistency
• Read Repair, Hinted Handoff, Proactive Repair
• A choice, not an obligation
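As a rough illustration of read repair, the sketch below assumes each replica stores a (value, timestamp) pair per key and that the highest timestamp wins. `Replica` and `read_with_repair` are hypothetical names made up for this example; they are not Cassandra's API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica: key -> (value, timestamp)."""
    data: dict = field(default_factory=dict)


def read_with_repair(replicas, key):
    """Return the newest value and copy it to replicas that are behind."""
    versions = [r.data[key] for r in replicas if key in r.data]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[1])  # highest timestamp wins
    for r in replicas:
        if r.data.get(key) != newest:
            r.data[key] = newest  # repair a stale or missing copy
    return newest[0]


a = Replica({"title": ("Geek Nights", 2)})
b = Replica({"title": ("Geek Night", 1)})  # stale copy
c = Replica()                              # missed the write entirely
value = read_with_repair([a, b, c], "title")
```

After the read, all three replicas hold the same newest version, which is the sense in which the system converges "eventually".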
8. Consistency - N,W,R
• Define your Consistency:
• Define the replication factor N
• For writes and reads, choose the number of nodes, W or R
• ALL, ONE, QUORUM, ZERO.
• W + R > N guarantees consistency: every read overlaps the latest write
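The N, W, R rule can be checked with a few lines of arithmetic. This is a minimal sketch, assuming QUORUM means a simple majority of the N replicas; the helper names `quorum` and `is_consistent` are invented for the example.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def is_consistent(n, w, r):
    """True when every read set must overlap every write set."""
    return w + r > n


N = 3  # replication factor
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

# Evaluate every (write level, read level) combination for N = 3.
results = {
    (w_name, r_name): is_consistent(N, w, r)
    for w_name, w in levels.items()
    for r_name, r in levels.items()
}
```

With N = 3, writing and reading at QUORUM satisfies the rule (2 + 2 > 3), while ONE/ONE does not (1 + 1 ≤ 3) and falls back to eventual consistency.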
9. Data model
• Keyspaces - the namespace that holds your column families and their unique keys
• Column Families - groups of columns
• Columns - a tuple of column name, value, and timestamp
• Super columns - a column whose value is a set of columns
• I will show pictures next, don’t worry.
10. Data model - Column Families
• Using the blog example:
• Posts
Keys         Columns
Geek Nights  Title: Geek Nights | Author: Pedro | Body: The...
Cassandra    Title: Cassandra | Author: Pedro | Body: This... | Tags: Data, ...
Stuff        Title: Stuff | Author: Someone | Body: Something
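One way to picture the Posts column family is as a map of maps. The sketch below is only a mental model built from plain Python dicts, not a client API; the timestamps are placeholder integers and `get_column` is a hypothetical helper.

```python
# Column family "Posts": row key -> {column name: (value, timestamp)}.
posts = {
    "Geek Nights": {
        "Title": ("Geek Nights", 1),
        "Author": ("Pedro", 1),
        "Body": ("The...", 1),
    },
    "Cassandra": {
        "Title": ("Cassandra", 1),
        "Author": ("Pedro", 1),
        "Body": ("This...", 1),
        "Tags": ("Data, ...", 1),  # only this row carries a Tags column
    },
    "Stuff": {
        "Title": ("Stuff", 1),
        "Author": ("Someone", 1),
        "Body": ("Something", 1),
    },
}


def get_column(column_family, key, name):
    """Return the value stored under (key, name), or None if absent."""
    value_and_ts = column_family.get(key, {}).get(name)
    return None if value_and_ts is None else value_and_ts[0]


tags = get_column(posts, "Cassandra", "Tags")
missing = get_column(posts, "Stuff", "Tags")
```

Rows in the same column family need not share the same columns, which is what the flexible schema amounts to in practice.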
11. Data model - Super Columns
• Comments
Keys         SuperColumns
Geek Nights  4/5/2010 20:00  Author: Ricardo | Comment: I think... | email: email@
             4/5/2010 19:00  Author: Jack | Comment: IMO ... | email: email@
Cassandra    1/4/2010 14:00  Author: Filipe | Comment: My POV.. | email: email@
             1/4/2010 14:00  Author: Jon | Comment: ... | email: email@
Stuff        1/4/2010 14:00  Author: Filipe | Comment: Great... | email: email@
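Super columns add one more level of nesting: row key, then super column name, then columns. The sketch below models the "Geek Nights" row with plain dicts keyed by a parsed timestamp. The day-first date order, the `ts` helper and `latest_comments` are assumptions made for this example; a real deployment would need unique names (such as time-based UUIDs) so that two comments posted at the same minute do not collide.

```python
from datetime import datetime


def ts(text):
    """Parse a slide-style timestamp; day-first order is an assumption."""
    return datetime.strptime(text, "%d/%m/%Y %H:%M")


# Column family "Comments": row key -> {super column name: {column: value}}.
comments = {
    "Geek Nights": {
        ts("4/5/2010 19:00"): {
            "Author": "Jack", "Comment": "IMO ...", "email": "email@",
        },
        ts("4/5/2010 20:00"): {
            "Author": "Ricardo", "Comment": "I think...", "email": "email@",
        },
    },
}


def latest_comments(column_family, key, limit=10):
    """Return up to `limit` comments for a post, newest first."""
    row = column_family.get(key, {})
    newest_first = sorted(row.items(), key=lambda item: item[0], reverse=True)
    return [columns for _, columns in newest_first[:limit]]


authors = [c["Author"] for c in latest_comments(comments, "Geek Nights")]
```

Keeping super columns ordered by time is what makes "latest comments for a post" a single-row read.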
12. Data model
<Keyspace Name="BloggyAppy">
  <!-- CF definitions -->
  <ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
  <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"
                CompareSubcolumnsWith="BytesType" ColumnType="Super"/>
</Keyspace>
• Think about your schema
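To review a schema like the one above before deploying it, the definition can be read with Python's standard XML parser. This is only a sketch: the `CONFIG` string repeats the fragment from the slide, and treating a missing `ColumnType` attribute as "Standard" is an assumption of this example.

```python
import xml.etree.ElementTree as ET

# The keyspace fragment from the slide, embedded as a string.
CONFIG = """
<Keyspace Name="BloggyAppy">
  <ColumnFamily CompareWith="BytesType" Name="BlogEntries"/>
  <ColumnFamily CompareWith="TimeUUIDType" Name="Comments"
                CompareSubcolumnsWith="BytesType" ColumnType="Super"/>
</Keyspace>
"""

root = ET.fromstring(CONFIG.strip())
keyspace = root.get("Name")

# Map each column family name to its type; "Standard" is the assumed default.
families = {
    cf.get("Name"): cf.get("ColumnType", "Standard")
    for cf in root.iter("ColumnFamily")
}
```

Here `families` shows at a glance that BlogEntries is a standard column family while Comments uses super columns ordered by TimeUUID.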