NoSQL (part 2) - CAP Theorem & Column Oriented
Winter 2015
What is NoSQL? (Review)
Stands for Not Only SQL
Class of non-relational data storage systems
Usually do not require a fixed table schema, nor do they use joins
NoSQL offerings typically relax one or more of the ACID properties (we will discuss the CAP theorem)
Dynamo and BigTable
Three major papers were the seeds of the NoSQL movement:
• BigTable (Google)
• Dynamo (Amazon)
—Gossip protocol (discovery and error detection)
—Distributed key-value data store
—Eventual consistency
• CAP Theorem (discussed shortly)
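The gossip idea that Dynamo popularized can be sketched as a toy model. This is not Dynamo's or Cassandra's implementation. Each hypothetical GossipNode keeps a table of the highest heartbeat it has seen for every node. In each round, every node swaps tables with one random peer (push-pull) and keeps the freshest entries. Real systems also gossip node state and run a failure detector on top of the heartbeats (Cassandra uses a phi accrual detector), and this sketch omits both.

```python
import random


class GossipNode:
    """Toy cluster member tracking the latest heartbeat it has seen for every node."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.heartbeat = 0
        self.table = {node_id: 0}  # node_id -> highest heartbeat seen so far

    def tick(self) -> None:
        # A live node bumps its own heartbeat once per round.
        self.heartbeat += 1
        self.table[self.node_id] = self.heartbeat

    def merge(self, other_table: dict) -> None:
        # Keep the freshest (highest) heartbeat known for each node.
        for node_id, beat in other_table.items():
            if beat > self.table.get(node_id, -1):
                self.table[node_id] = beat


def gossip_round(nodes: list, rng: random.Random) -> None:
    for node in nodes:
        node.tick()
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        # Push-pull exchange: afterwards both sides hold the merged view.
        node.merge(peer.table)
        peer.merge(node.table)


def everyone_knows_everyone(nodes: list) -> bool:
    return all(len(n.table) == len(nodes) for n in nodes)


rng = random.Random(7)
cluster = [GossipNode(i) for i in range(8)]
rounds = 0
while not everyone_knows_everyone(cluster) and rounds < 50:
    gossip_round(cluster, rng)
    rounds += 1
print(f"membership converged after {rounds} rounds")
```

No node ever talks to the whole cluster, yet membership information still spreads to every node within a few rounds.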
What Kinds of NoSQL? (Review)
NoSQL solutions fall into two major areas:
• Key/Value or ‘the big hash table’.
—Amazon S3 (Dynamo)
—Voldemort
—Scalaris
• Schema-less, which comes in multiple flavors: column-based, document-based, or graph-based.
—Cassandra (column-based)
—CouchDB (document-based)
—Neo4J (graph-based)
—HBase (column-based)
Key-Value Stores
Extremely simple interface
• Data model: (key, value) pairs
• Operations:
—Insert(key,value),
—Fetch(key),
—Update(key, value),
—Delete(key).
Implementation: efficiency, scalability, fault tolerance
• Records distributed to nodes based on key
• Replication
• Single-record transactions, “eventual consistency”
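The interface above can be sketched in Python as a minimal example. The class PartitionedKVStore and its method names are hypothetical and were chosen only to mirror the slide's operations. Each "node" is an in-memory dict, and each record is assigned to a node by hashing its key. The sketch uses simple modulo hashing for brevity. That approach reassigns most keys whenever the node count changes, which is why Dynamo-style systems use consistent hashing instead.

```python
import hashlib


class PartitionedKVStore:
    """Toy key-value store that places each record on a node chosen by hashing its key."""

    def __init__(self, num_nodes: int) -> None:
        # Each "node" is just an in-memory dict in this sketch.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def insert(self, key: str, value) -> None:
        self._node_for(key)[key] = value

    def fetch(self, key: str):
        return self._node_for(key).get(key)

    def update(self, key: str, value) -> None:
        node = self._node_for(key)
        if key not in node:
            raise KeyError(key)  # update only touches existing records
        node[key] = value

    def delete(self, key: str) -> None:
        self._node_for(key).pop(key, None)


store = PartitionedKVStore(num_nodes=4)
store.insert("user:1", "en")
store.update("user:1", "fr")
```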
Key-Value Data Stores: Suitable Use Cases
Storing Session Information
User Profiles, Preferences: Almost every user has a unique userID as well as preferences such as language, color, timezone, which products the user has access to, and so on.
Key-Value Data Stores: Shopping Cart Data
Because shopping carts should be available all the time, across browsers, machines, and sessions, all the shopping information can be put into the value, with the userID as the key.
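Here is a minimal sketch of the shopping-cart pattern. It assumes a plain Python dict stands in for the key-value store and that the cart is serialized as a JSON string; both are assumptions made for illustration. The store never looks inside the value. The application reads the whole cart by userID, modifies it, and writes it back.

```python
import json

# Stand-in for a key-value store: userID -> opaque value (a JSON string here).
store: dict = {}


def add_to_cart(user_id: str, sku: str, qty: int) -> None:
    # Read-modify-write of the whole value; the store only ever sees (key, blob).
    cart = json.loads(store.get(user_id, "{}"))
    cart[sku] = cart.get(sku, 0) + qty
    store[user_id] = json.dumps(cart)


def get_cart(user_id: str) -> dict:
    return json.loads(store.get(user_id, "{}"))


add_to_cart("user-42", "book-123", 1)
add_to_cart("user-42", "book-123", 2)
add_to_cart("user-42", "pen-7", 1)
```

In this sketch, concurrent updates from two sessions can overwrite each other. The Dynamo paper addresses this for shopping carts by keeping conflicting versions and letting the application merge them.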
Key-Value Data Stores: When Not to Use
• Relationships among data
• Multi-operation transactions
• Query by data
• Operations by sets
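To see why "query by data" is a poor fit, consider finding users by a field inside the stored value. In this sketch, a dict of JSON strings stands in for the store, and the profile fields are hypothetical. Because the store indexes keys only, the only option is a full scan that fetches and decodes every value.

```python
import json

# Stand-in key-value store: userID -> serialized profile.
profiles = {
    "u1": json.dumps({"language": "en", "timezone": "UTC"}),
    "u2": json.dumps({"language": "fr", "timezone": "CET"}),
    "u3": json.dumps({"language": "en", "timezone": "EST"}),
}


def users_with_language(store: dict, language: str) -> list:
    # No index on the value's contents: every value must be fetched and decoded.
    return sorted(
        key for key, blob in store.items() if json.loads(blob).get("language") == language
    )


print(users_with_language(profiles, "en"))
```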
Column-Oriented Stores: Intro
Store data in column order
Allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
• Data model: families of attributes defined in a schema; new attributes can be added
• Storing principle: big hashed distributed tables
• Properties: partitioning (horizontal and/or vertical), high availability, etc., completely transparent to the application
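The difference between row order and column order can be sketched with plain Python lists. The records and the to_column_order helper are illustrative assumptions, and every record is assumed to have the same attributes. After pivoting, an aggregate over one attribute touches a single contiguous list instead of every full record.

```python
# Row order: each record's attributes are stored together.
rows = [
    {"id": 1, "name": "Ann", "state": "TX"},
    {"id": 2, "name": "Bob", "state": "CA"},
    {"id": 3, "name": "Cy", "state": "TX"},
]


def to_column_order(records: list) -> dict:
    """Pivot row-ordered records into one list per attribute (assumes uniform attributes)."""
    columns = {}
    for record in records:
        for attr, value in record.items():
            columns.setdefault(attr, []).append(value)
    return columns


columns = to_column_order(rows)
# Counting TX customers reads only the "state" column, not every full record.
tx_count = columns["state"].count("TX")
```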
Cassandra
Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, fault-tolerant (i.e., no single point of failure), post-relational database solution.
Cassandra can serve as both a real-time datastore and a read-intensive database.
Thrift: client code can be generated for C++, Java, PHP, Ruby, Erlang, Perl, ...
Cassandra
Originally developed at Facebook
Follows the BigTable data model: column-oriented
Uses the Dynamo Eventual Consistency model
Written in Java
Open-sourced and exists within the Apache family
Uses Apache Thrift as its API
Some of its many users: [Figure: user logos]
Cassandra: Data Model
keyspace: usually the name of the application, e.g., 'Twitter', 'WordPress'
column family: structure containing an unlimited number of rows
• Simple
• Super (nested column families)
column: a tuple with a name, a value, and a timestamp
key: the name of a record (row)
super column: a column that contains more columns
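The data model above can be sketched as nested dictionaries (keyspace → column family → row key → column name → Column). The write and read helpers and the sample data are hypothetical. Each column carries its own timestamp, and the sketch resolves conflicting writes by keeping the version with the newest timestamp (last-write-wins), which is how Cassandra reconciles replicas. Ties and deletes (tombstones) are not modeled. A super column family would add one more level of nesting between the row key and the columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # write time, e.g. microseconds since the epoch


# keyspace = {column family: {row key: {column name: Column}}}
def write(keyspace: dict, column_family: str, row_key: str, column: Column) -> None:
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(column.name)
    # Last-write-wins: replace a column only with a strictly newer timestamp.
    if current is None or column.timestamp > current.timestamp:
        row[column.name] = column


def read(keyspace: dict, column_family: str, row_key: str, name: str):
    column = keyspace.get(column_family, {}).get(row_key, {}).get(name)
    return None if column is None else column.value


twitter = {}  # hypothetical keyspace named after the application
write(twitter, "Users", "jsmith", Column("email", "js@example.com", 200))
write(twitter, "Users", "jsmith", Column("email", "old@example.com", 100))  # stale write, ignored
write(twitter, "Users", "jsmith", Column("lang", "en", 150))  # rows can gain new columns freely
```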
Cassandra – Data Model
keyspace (settings)
  column family (settings)
    column: name | value | timestamp
[Figure: Cassandra Column Family & Super Column Family]
Cassandra: Architecture Overview
Cassandra was designed with the understanding that system/hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes are the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/write-anywhere design
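Data is commonly partitioned among all nodes with a token ring (consistent hashing). This sketch assumes one token per node, derived from an MD5 hash of the node name. Real Cassandra clusters assign tokens through their configured partitioner (Murmur3Partitioner by default in recent versions) and often use virtual nodes. In the sketch, a key belongs to the first node clockwise whose token is greater than or equal to the key's token.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # 64-bit token derived from an MD5 hash (an assumption for this sketch).
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    def __init__(self, node_names: list) -> None:
        # One token per node: its position on the ring is token(name).
        self.ring = sorted((token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, key: str) -> str:
        # First node clockwise whose token is >= the key's token,
        # wrapping around to the smallest token past the end of the ring.
        i = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return self.ring[i][1]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{n}" for n in range(1000)]
placement = {k: ring.owner(k) for k in keys}
```

Every node can compute the same placement locally from the ring, which is what allows reads and writes to be sent to any node.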
Cassandra: Architecture Overview
Nodes communicate with each other through the gossip protocol, which exchanges state information across the cluster every second.
A commit log on each node captures write activity, ensuring data durability.
Data is also written to an in-memory structure (the memtable) and then flushed to disk as an SSTable once the memtable is full.
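The write path described above (commit log, then memtable, then SSTable flush) can be sketched with in-memory structures. The Node class is a hypothetical toy, not Cassandra's code. Its commit log is a Python list rather than an on-disk file, the memtable flushes after a fixed number of keys rather than at a memory threshold, and compaction is omitted. Reads check the memtable first, then the SSTables from newest to oldest.

```python
class Node:
    """Toy single-node write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log = []  # append-only log (an on-disk file in a real node)
        self.memtable = {}  # latest values held in memory
        self.sstables = []  # immutable tables: lists of (key, value) sorted by key
        self.memtable_limit = memtable_limit

    def write(self, key: str, value) -> None:
        self.commit_log.append((key, value))  # 1. record the write for durability
        self.memtable[key] = value  # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()  # 3. memtable "full": flush it to an SSTable

    def flush(self) -> None:
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Flushed data no longer needs replaying, so the sketch clears the log
        # (Cassandra recycles commit log segments whose contents have been flushed).
        self.commit_log.clear()

    def read(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)  # triggers a flush
node.write("a", 3)  # newer value stays in the memtable
```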
Why Cassandra?
Gigabyte to Petabyte scalability
Linear performance gains through adding nodes
No single point of failure
Easy replication / data distribution
Multi-data-center and cloud capable
No need for separate caching layer
Tunable data consistency
Flexible schema design
Data Compression
CQL (Cassandra Query Language), similar to SQL
Support for key languages and platforms
No need for special hardware or software
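Tunable consistency comes down to how many replicas must acknowledge a read (R) and a write (W) out of the replication factor (RF). When R + W > RF, every read's replica set overlaps every write's replica set, so each read reaches at least one replica that holds the latest acknowledged write. This sketch maps three of Cassandra's consistency levels (ONE, QUORUM, ALL) to replica counts. It ignores failures, multiple data centers, and the other levels.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must acknowledge a request at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def read_sees_latest_write(rf: int, read_level: str, write_level: str) -> bool:
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    # Overlap condition: any R replicas and any W replicas share at least one node.
    return r + w > rf


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_sees_latest_write(3, read_level, write_level))
```

Lower levels trade this overlap guarantee for latency and availability, which is the "tunable" part.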
Why Cassandra? Big Data Scalability
Capable of comfortably scaling to petabytes
Adding nodes yields linear performance increases
New nodes can be added online
[Figure: Double Throughput Capabilities]
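On a token ring, adding a node online is cheap. When a node joins, only the keys in its new token range change owners, and all of those keys move to the new node. This hypothetical sketch reuses an MD5-based ring with one token per node (an assumption, since real clusters use their configured partitioner) and checks that property.

```python
import bisect
import hashlib


def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def owner(nodes: list, key: str) -> str:
    # First node clockwise whose token is >= the key's token (with wrap-around).
    ring = sorted((token(n), n) for n in nodes)
    i = bisect.bisect_left([t for t, _ in ring], token(key)) % len(ring)
    return ring[i][1]


keys = [f"user:{n}" for n in range(2000)]
before = ["node-a", "node-b", "node-c", "node-d"]
after = before + ["node-e"]  # a fifth node joins online

moved = [k for k in keys if owner(before, k) != owner(after, k)]
# Only keys in the new node's token range change hands, and all of them go to node-e.
assert all(owner(after, k) == "node-e" for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved")
```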
Why Cassandra? No Single Point of Failure
All nodes are the same
Customized replication affords tunable data redundancy
Read/write from any node
Can replicate data among different physical data centers and racks
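Replica placement can be sketched as a walk around the token ring. This is a simplified, hypothetical take on rack-aware placement. Starting at the key's owner, the walk goes clockwise and prefers nodes on racks not yet used. If there are fewer racks than replicas, it then fills the remaining slots in ring order. The node names, rack names, and MD5 token function are assumptions. Cassandra's actual strategies (SimpleStrategy, NetworkTopologyStrategy) are configured per keyspace.

```python
import bisect
import hashlib


def token(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def replicas(nodes: dict, key: str, rf: int) -> list:
    """nodes maps node name -> rack name; returns up to rf distinct replica nodes for key."""
    ring = sorted((token(name), name) for name in nodes)
    start = bisect.bisect_left([t for t, _ in ring], token(key)) % len(ring)
    walk = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]
    chosen, racks_used = [], set()
    # First pass: walk clockwise from the owner, taking at most one node per rack.
    for name in walk:
        if len(chosen) < rf and nodes[name] not in racks_used:
            chosen.append(name)
            racks_used.add(nodes[name])
    # Second pass: if racks ran out, fill the remaining slots in ring order.
    for name in walk:
        if len(chosen) < rf and name not in chosen:
            chosen.append(name)
    return chosen


cluster = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2", "n5": "r3", "n6": "r3"}
reps = replicas(cluster, "user:42", 3)
print(reps, [cluster[n] for n in reps])
```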
Why Cassandra? No Need for Caching Software
Peer-to-peer architecture removes the need for a special caching layer and the programming that goes with it
The database cluster uses the memory of all participating nodes to cache the data assigned to each node
No inconsistencies arise between a memory cache and the database
[Figure: Application Servers, Memcached Servers, and Database Server, with Reads and Writes]
Why Cassandra? Data Compression
Uses Google’s Snappy data compression algorithm
Compresses data on a per-column-family basis
Internal tests at DataStax show up to 80%+ compression of raw data
No performance penalty (and some increases in overall performance due to less physical I/O)!
[Figure: Portfolio Keyspace, Customer Column Family]
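Per-column-family compression pays off because values within one column family tend to repeat. Python's standard library has no Snappy binding, so this sketch substitutes zlib (DEFLATE) purely to illustrate the effect on a hypothetical Customer column family. The ratio it prints says nothing about Snappy's actual ratio or speed.

```python
import json
import zlib

# Hypothetical "Customer" column family rows with highly repetitive values.
rows = [
    {"customer_id": i, "state": "TX", "status": "active", "plan": "standard"}
    for i in range(1000)
]
raw = "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")
compressed = zlib.compress(raw, 6)
ratio = 1 - len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, saved {ratio:.0%}")
```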
Why Cassandra? CQL
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g., CREATE…)
Core DML commands supported: INSERT, UPDATE, DELETE
Query data with SELECT
[Figure: Portfolio Keyspace]
-- assumes state is the partition key or has a secondary index
SELECT *
FROM USERS
WHERE STATE = 'TX';
Comparison with MySQL (> 50 GB of data)
              Writes (average)   Reads (average)
MySQL         ~300 ms            ~350 ms
Cassandra     0.12 ms            15 ms
Stats provided by the authors using Facebook data.
Where to get Cassandra?
Go to www.datastax.com
DataStax makes free smart start installers available for
Cassandra that include:
• The most up-to-date Cassandra version that is production quality
• A version of DataStax OpsCenter, which is a visual, browser-
based management tool for managing and monitoring
Cassandra
• Drivers and connectors for popular development languages
• Sample database and application
• Automatic configuration assistance for ensuring optimal
performance and setup for either stand-alone or cluster
implementations
• Getting Started Guide
Where Can I Learn More?
www.datastax.com
Free Online Documentation
User/Customer Case Studies
Technical White Papers
Software downloads
Technical Articles
User Forums
Videos
Tutorials
FAQs
Blogs
Resources
Sites
Cassandra
• http://cassandra.apache.org
NoSQL News websites
• http://nosql.mypopescu.com
• http://www.nosqldatabases.com
“A Practical Guide to NoSQL”, posted by Denise Miura on March 17, 2011
• http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/