2. www.edureka.in/cassandra
Slide 2
Course Structure
Module 1:
Getting Started With Cassandra
Module 2:
Understanding Cassandra Data Model
Module 3:
Understanding Cassandra Architecture
Module 4:
Creating Sample Application
Module 5:
Configuring, Monitoring, Maintenance and
Tuning Cassandra
Module 6:
Integrating Cassandra With Hadoop
Module 7:
CRUD operations in Cassandra
Module 8:
Live Project
3. www.edureka.in/cassandra
Slide 3
How it Works?
Live Classes
Class Recordings
Module wise Quizzes, Coding Assignments
24x7 on-demand Technical Support
Sample Application and Live Project
Online Certification Exam
Lifetime access to the Learning Management System
4. www.edureka.in/cassandra
Slide 4
Module 1
Getting Started With Cassandra
New Problems which can’t be handled by traditional RDBMS
Tradeoff between Consistency, Availability, Partition Tolerance (CAP theorem)
What are the different solutions available?
What is Cassandra?
Use-Cases for Cassandra
Cassandra Features – Tunable Consistency, P2P Architecture, Elastic Scalability, Column Orientation
Demo Application using Cassandra
Questions?
5. www.edureka.in/cassandra
Slide 5
Module 2
Understanding Cassandra Data Model
Understand what a database model is.
Understand the analogy between the RDBMS and Cassandra Data Model.
Understand the following Cassandra database elements:
Cluster
Keyspaces
Column Families
Columns
Super Columns
Rows
Indexes in Cassandra
Primary and Composite Keys and their limitations
Design Differences between RDBMS and Cassandra
Materialized Views
Valueless Columns
Aggregate Keys
6. www.edureka.in/cassandra
Slide 6
Module 3
Understanding Cassandra Architecture
Learn about the System Keyspaces
Learn about internode communication: the Peer to Peer structure and the Gossip Protocol
Learn how Cassandra detects node failures and repairs them
Learn about Anti Entropy and Read Repair
Learn about the Memtables, SSTables, and Commit logs
Hinted Handoffs
Compaction
Bloom Filters
Tombstones
SEDA
Manager and Services
7. www.edureka.in/cassandra
Slide 7
Module 4
Creating Sample Application
Identify challenges faced by RDBMS
Identify various possible available solutions
Identify the rationale behind choosing Cassandra
Understand how data modelling differs in Cassandra from traditional relational databases
Understand how queries are used to design Cassandra data model
Apply Cassandra data modelling to various use cases
Create the application, which involves creating the various data elements you learned about in Module 2
Perform batch updates and search column families
Overview of the whole project specifying how Cassandra solved the problem which was laid out in the beginning
8. www.edureka.in/cassandra
Slide 8
Module 5
Configuring, Monitoring, Maintenance and Tuning Cassandra
Learn about various options of configuring Keyspaces and Column Families
Learn about various Cassandra Replica Placement Strategies
Learn about Replication
Learn about Partitioners
Learn about Snitches
Learn about configuring Cluster
Learn about Security
Learn about Monitoring Cassandra Cluster
Learn about Cassandra Maintenance
Getting Ring information
Basic Maintenance
Snapshots
Load Balancing
Decommissioning and Updating nodes
Learn about Performance Tuning
Data storage, Reply timeouts
Commit Logs, MemTables, Caching and Buffer sizes
9. www.edureka.in/cassandra
Slide 9
Module 6
Integrating Cassandra with Hadoop
Learn what Hadoop is
Learn the Hadoop Distributed File System (HDFS)
Learn how to work with MapReduce
Learn tools like Pig and Hive
Learn how Pig and Hive interact with Cassandra
10. www.edureka.in/cassandra
Slide 10
Module 7
CRUD Operations in Cassandra
Learn about Reading and writing data in Cassandra
Learn about the Cassandra API (Thrift)
Learn about Slice Predicates
Learn Data Definition Language (DDL) in Cassandra
Learn Data Manipulation Language (DML) statements within Cassandra
Learn to execute CQL scripts from within CQL and from the command prompt
Learn to Create and Modify Users
Learn about Batch Mutates and Batch Deletes
Learn various Security configurations in Cassandra
Learn to Capture CQL outputs to a file
Learn to Import and Export data with CQL
12. www.edureka.in/cassandra
Slide 12
What are we going to learn today?
New Problems which can’t be handled by traditional RDBMS
Tradeoff between Consistency, Availability, Partition Tolerance (CAP theorem)
What are the different solutions available?
What is Cassandra?
Use-Cases for Cassandra
Cassandra Features – Tunable Consistency, P2P Architecture, Elastic Scalability, Column Orientation
Demo Application using Cassandra
Questions
18. www.edureka.in/cassandra
Slide 18
So, What Is Common?
Huge Data
Fast Random access
Variable Schema
Need of Compression
High Availability
Need for Consistency
Need of Distribution (Sharding)
21. www.edureka.in/cassandra
Slide 21
NoSQL Database Types
Document Data Store Databases
Example: CouchDB, MongoDB
Data Model: Collection of key-value collections
Strength: Incomplete data tolerant
Weakness: Query performance, no standard query syntax
Columnar NoSQL Databases
Example: HBase, Cassandra
Data Model: Column Families
Strength: Fast look-ups
Weakness: Very low-level API
Key Value Databases
Example: Amazon SimpleDB, Redis
Data Model: Collection of key-value pairs
Strength: Fast look-ups
Weakness: Stored data has no schema
Graph NoSQL Databases
Example: InfoGrid, Infinite Graph
Data Model: “Property Graph” (nodes)
Strength: Graph algorithms (shortest path, connectedness, etc.)
Weakness: Not easy to cluster; must traverse the whole graph to get an answer
28. www.edureka.in/cassandra
Slide 28
More Steps…
Database Configuration
Might mean manipulating writes, such as turning write logs off, which is not a desirable situation
Caching Layer
Consistency problem between updates in the cache and updates in the database; the problem gets more complex over clusters
30. www.edureka.in/cassandra
Slide 30
Why Use Cassandra?
For High Velocity Data
Writing Data Anywhere, Everywhere
Scaling Writes and Reads
No Downtime
Scaling Out Strategy
Scaling for both READS and WRITES
Voluminous Data
Data Originating from Multiple Locations
Retaining Data for Long
Storing all types of Data
Delivering Fast Response Time
Keeping Business Online and Serving Customers
32. www.edureka.in/cassandra
Slide 32
Column Oriented
Emp_no  Dept_id  Hire_date   Emp_ln     Emp_fn
1       2        2010-08-05  Teresa     Annie
2       4        2012-03-10  Ronald     Susane
3       3        2012-11-06  Brown      Donald
4       3        2011-07-03  Ruth       David
5       1        2010-09-12  Stancy     Elizabeth
6       2        2012-10-03  Catherine  Amelia
Row-Oriented Database (values stored one row after another):
1, 2, 2010-08-05, Teresa, Annie
2, 4, 2012-03-10, Ronald, Susane
3, 3, 2012-11-06, Brown, Donald
Column-Oriented Database (values stored one column after another):
Emp_no: 1, 2, 3, 4, 5
Hire_date: 2010-08-05, 2012-03-10, 2012-11-06, 2011-07-03, 2010-09-12
Dept_id: 2, 4, 3, 3, 1
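The two layouts on this slide can be sketched in a few lines of Python. This is only an illustration of the picture, using the first five employee rows and three of the columns; it is not Cassandra's on-disk format, and the function names are made up for the example.

```python
# Illustrative only: the same table laid out row by row and column by column.
columns = ["Emp_no", "Dept_id", "Hire_date"]
rows = [
    (1, 2, "2010-08-05"),
    (2, 4, "2012-03-10"),
    (3, 3, "2012-11-06"),
    (4, 3, "2011-07-03"),
    (5, 1, "2010-09-12"),
]


def row_oriented(rows):
    """Flatten records one after another, as a row store would lay them out."""
    return [value for row in rows for value in row]


def column_oriented(columns, rows):
    """Group the values of each column together, as a column store would."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


row_layout = row_oriented(rows)
col_layout = column_oriented(columns, rows)

# Reading one column is a single lookup in the column layout ...
dept_ids = col_layout["Dept_id"]
# ... but means stepping over every record in the row layout.
dept_ids_from_rows = row_layout[1::len(columns)]

print(dept_ids)
print(dept_ids_from_rows)
```

Both print statements show the same department ids; the difference is how much unrelated data has to be skipped to collect them.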
33. www.edureka.in/cassandra
Slide 33
Schema Free
Schema Based Table
Primary Key  First Name  Last Name  E-mail ID
1            Avril       D’Souza    NULL
2            David       Gomes      davidgomes1@yahoo.com
3            Susane      NULL       NULL
Schema Free (each row stores only the columns it has)
First Name: Avril, Last Name: D’Souza
First Name: David, Last Name: Gomes, E-mail ID: davidgomes1@yahoo.com
First Name: Susane
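As a rough sketch of the contrast on this slide, the schema-free rows can be modelled as Python dictionaries that hold only the columns each row actually has. The dictionary layout and the helper name to_schema_based are assumptions made for this illustration, not a Cassandra API.

```python
# Hypothetical in-memory model: each row is a dict holding only its own columns.
rows = {
    1: {"first_name": "Avril", "last_name": "D'Souza"},
    2: {"first_name": "David", "last_name": "Gomes",
        "email": "davidgomes1@yahoo.com"},
    3: {"first_name": "Susane"},
}


def to_schema_based(rows, columns):
    """Project sparse rows onto a fixed column list, padding gaps with None."""
    return {key: tuple(row.get(c) for c in columns) for key, row in rows.items()}


fixed = to_schema_based(rows, ["first_name", "last_name", "email"])

# Cells actually stored in the schema-free model versus the fixed table.
stored_cells = sum(len(row) for row in rows.values())
padded_cells = sum(len(cells) for cells in fixed.values())

print(fixed[3])
print(stored_cells, padded_cells)
```

The fixed-schema projection has to carry a placeholder for every missing value, while the schema-free rows simply omit the column.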
34. www.edureka.in/cassandra
Slide 34
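Brewer’s CAP Theorem
Source: http://www.w3resource.com/mongodb/nosql.php
Consistency, Availability, Partition Tolerance
CA (Consistency + Availability): RDBMS
CP (Consistency + Partition Tolerance): MongoDB, HBase, Redis
AP (Availability + Partition Tolerance): CouchDB, Cassandra, DynamoDB, Riak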
38. www.edureka.in/cassandra
Slide 38
Cassandra Use Case - Summary
E-Commerce (Travel Portal)
Both B2B & B2C Consumers
High volume of shopping transactions (> 500 Million Visits / Day)
High volume of supply changes (manual & system generated)
Huge Inventory Database (Millions of hotels)
High Read/Write load (Thousands of Reads & Writes/Second)
Application has to be 99.99% Available
Fault Tolerant & Reliable
Fast & Quick Shopping Experience
Elastic Scale
Innovative Recommendations & Algorithms
Should be fast for new changes
Should be cost effective for maintenance
Development Approaches
Legacy Way (Pure RDBMS)
Augmented (RDBMS + Caching, Heavy Database Hardware)
Using Cassandra
39. www.edureka.in/cassandra
Slide 39
What is Apache Cassandra?
Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database.
Cassandra Features
Open Source
Distributed
Decentralized
Elastically Scalable
Highly Available
Fault Tolerant
Tunably Consistent
Column Oriented
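The "tunably consistent" feature comes down to simple arithmetic on replica counts: a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor. A minimal sketch of that rule follows; the function names are invented for this example and are not part of any Cassandra client.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that forms a majority of the replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)

# QUORUM reads with QUORUM writes always overlap on at least one replica.
quorum_both = is_strongly_consistent(q, q, rf)
# ONE read with ONE write may hit different replicas: faster, possibly stale.
one_both = is_strongly_consistent(1, 1, rf)
# ONE read with ALL writes still overlaps, at the cost of write availability.
one_read_all_write = is_strongly_consistent(1, rf, rf)

print(q, quorum_both, one_both, one_read_all_write)
```

Choosing lower levels trades consistency for latency and availability, which is the tuning knob the feature name refers to.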
41. www.edureka.in/cassandra
Slide 41
Distributed and Decentralized
Every Node Is Identical.
Uses a Peer to Peer Protocol, with the Gossip Protocol keeping the List of nodes in Sync.
No Single Point of Failure.
No Special Host to Coordinate Activities.
Easier to Operate and Maintain because all nodes are the same.
[Figure: three identical nodes, each labeled "CCY, Stationery, Letter/Couriers", shown alongside separate "Ccy", "Courier" and "Stationery" labels]
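How gossip keeps the node list in sync can be sketched with a toy simulation. This is a heavily simplified model under stated assumptions: every node starts out knowing only itself and one seed node ("n0"), and in each round it swaps views with one randomly chosen known peer. The Node class, the seed choice and the round structure are all hypothetical; Cassandra's real gossip exchange and failure detector are more elaborate.

```python
import random


class Node:
    def __init__(self, name, seed):
        self.name = name
        # view maps node name -> highest heartbeat version heard so far
        self.view = {name: 0, seed: 0}

    def tick(self):
        """Advance this node's own heartbeat."""
        self.view[self.name] += 1

    def merge(self, other_view):
        """Keep the newest heartbeat seen for every node."""
        for name, version in other_view.items():
            if version > self.view.get(name, -1):
                self.view[name] = version


def gossip_round(nodes, rng):
    """Each node exchanges views with one random peer it already knows."""
    for node in nodes.values():
        node.tick()
        peers = [name for name in node.view if name != node.name]
        if not peers:
            continue  # the seed knows nobody until someone contacts it
        peer = nodes[rng.choice(peers)]
        theirs = dict(peer.view)  # snapshot before the push
        peer.merge(node.view)     # push our view
        node.merge(theirs)        # pull their view


def converged(nodes):
    """True when every node lists every member of the cluster."""
    return all(set(node.view) == set(nodes) for node in nodes.values())


rng = random.Random(42)
names = ["n0", "n1", "n2", "n3", "n4"]
nodes = {name: Node(name, seed="n0") for name in names}

rounds = 0
while not converged(nodes) and rounds < 20:
    gossip_round(nodes, rng)
    rounds += 1

print(f"all {len(nodes)} nodes agree on membership after {rounds} round(s)")
```

No node plays a coordinating role in the exchange itself; the seed is only a first contact point, which matches the "no special host" point on the slide.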
42. www.edureka.in/cassandra
Slide 42
Elastic Scalability
Types of Scalability
Vertical Scalability
Horizontal Scalability
What is Elastic Scalability?
This is a special property of Horizontal Scalability.
The cluster can seamlessly scale up and scale back down without major disruption.
43. www.edureka.in/cassandra
Slide 43
Elastic Scalability
Cluster must accept new nodes without major disruption or reconfiguration.
ADD A NODE AND MOVE ON!!
Process does not have to be restarted
Do not have to change application queries
Don’t have to manually rebalance data
[Figure: four nodes, each labeled "CCY, Stationery, Letter/Couriers", shown alongside separate "Ccy", "Courier" and "Stationery" labels]
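One reason a node can be added without reshuffling the whole cluster is consistent hashing: keys and nodes are placed on the same token ring, so a new node takes over only the ranges just before its own tokens. The sketch below illustrates the principle only. The Ring class, the MD5-based token function and the count of 64 virtual nodes are assumptions of this example, not Cassandra's partitioner.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place several virtual tokens for the node on the ring."""
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owners[t] = node

    def owner(self, key):
        """The first token clockwise from the key's position owns the key."""
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[i]]


keys = [f"user-{i}" for i in range(1000)]
ring = Ring(["node1", "node2", "node3"])
before = {k: ring.owner(k) for k in keys}

ring.add("node4")  # add a node and move on
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Every key that changes owner lands on the new node; keys owned by the existing nodes never move between them, so the cluster does not need a full rebalance.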
44. www.edureka.in/cassandra
Slide 44
High Availability and Fault Tolerance
Highly Available
No Downtime
[Figure: three nodes, each labeled "CCY, Stationery, Letter/Couriers", shown alongside separate "Ccy", "Courier" and "Stationery" labels]
46. www.edureka.in/cassandra
Slide 46
High Performance
Cassandra was designed specifically from the ground up to take full advantage of multiprocessor/multicore machines, and to run across many dozens of these machines housed in multiple data centres.
It scales consistently and seamlessly to hundreds of terabytes.
Shows exceptional performance under heavy loads.
Consistently shows very fast throughput for writes per second on a basic commodity workstation.
47. www.edureka.in/cassandra
Slide 47
Where to Use Cassandra?
Use if your application has:
Big Data (Billions of Records, Rows & Columns)
Very High Velocity Random Reads & Writes
Flexible Sparse / Wide Column Requirements
No Multiple Secondary Index Needs
Low Latency
Use Cases:
eCommerce Inventory Cache Use Cases
Time Series / Events Use Cases
Feed Based Activities / Use Cases
48. www.edureka.in/cassandra
Slide 48
Where NOT to Use Cassandra?
Don’t Use if your application has:
Secondary Indexes.
Relational Data.
Transactional (Rollback, Commit)
Primary & Financial Records.
Stringent Security & Authorization Needs On Data
Dynamic Queries on Columns.
Searching Column Data
Low Latency
49. www.edureka.in/cassandra
Slide 49
Cassandra Installation & Configuration
conf/cassandra.yaml
Tools
Key Space Setup
Column Family / Data Model Setup
Key
Columns & Data Types
Indexes (Primary & Secondary)
Programmatic Consistency
Thrift Hector API
CQL3 API
Application Demo
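The keyspace, column family and index steps listed above map onto three CQL3 statements. As a hedged sketch, the Python helpers below only assemble the statement text; the keyspace name demo, the users table, its columns and the helper functions are all hypothetical, and the strings would still need to be run through cqlsh or a client driver.

```python
def keyspace_cql(name: str, replication_factor: int) -> str:
    """CQL3 statement creating a keyspace with SimpleStrategy replication."""
    return (
        f"CREATE KEYSPACE {name} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def table_cql(keyspace: str, table: str, columns: dict, primary_key: str) -> str:
    """CQL3 statement creating a table (column family) with typed columns."""
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    return f"CREATE TABLE {keyspace}.{table} ({cols}, PRIMARY KEY ({primary_key}));"


def index_cql(keyspace: str, table: str, column: str) -> str:
    """CQL3 statement creating a secondary index on one column."""
    return f"CREATE INDEX {table}_{column}_idx ON {keyspace}.{table} ({column});"


statements = [
    keyspace_cql("demo", replication_factor=1),
    table_cql(
        "demo",
        "users",
        {"user_id": "uuid", "first_name": "text", "last_name": "text", "email": "text"},
        primary_key="user_id",
    ),
    index_cql("demo", "users", "email"),
]

for statement in statements:
    print(statement)
```

A replication factor of 1 suits a single-node demo; a multi-node cluster would normally use a higher value.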