Cassandra: Overview and Real World Applications
Jersey Shore Tech Meetup
Nov 13, 2014
You Are Not Here… 
*** http://njhalloffame.org/ 
Agenda 
• Some Basic Concepts/Overview
• New Developments in Cassandra
• Basic Data Modeling Concepts
• Materialized Views
• Secondary Indexes
• Counters
• Time Series Data
• Expiring Data
Cassandra High Level 
Cassandra's architecture is based on the combination of two technologies:
• Google BigTable – data model
• Amazon Dynamo – distributed architecture
BTW, these mean the same thing: Cassandra = C*
Architecture Basics & Terminology 
• Nodes are single instances of C*.
• A cluster is a group of nodes.
• Data is organized by keys (tokens), which are distributed across the cluster.
• Replication Factor (RF) determines how many copies of each key are stored.
• Data center aware – works well in multi-DC/EC2 deployments.
• Consistency Level – a powerful feature for tuning consistency vs. speed vs. availability.
C* Ring 
More Architecture 
• Information about which node holds which data, and which nodes are available, is exchanged using gossip.
• No single point of failure (SPOF); every node can service requests.
• Handles replication and downed nodes (within reason).
CAP Theorem 
• Distributed systems law:
  • Consistency
  • Availability
  • Partition Tolerance
  (you can only really have two in a distributed system)
• Cassandra is AP with eventual consistency.
Consistency 
• Cassandra uses the concept of tunable consistency: each read or write can choose its own consistency level (for example ONE, QUORUM, or ALL), which makes it very powerful and flexible for system needs.
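As a rough illustration of tunable consistency, the cqlsh session below changes the consistency level between requests. It assumes a `users` table keyed by `emailaddress` (as on the later slides) in a keyspace with a replication factor of 3; both are assumptions made for this sketch, not part of the original demo.

```cql
-- cqlsh command: show the consistency level currently in effect
CONSISTENCY;

-- Require a majority of replicas (2 of 3 when RF = 3) to answer each request
CONSISTENCY QUORUM;
SELECT firstname, lastname FROM users WHERE emailaddress = 'tpeters@example.com';

-- Favor latency and availability: a single replica is enough
CONSISTENCY ONE;
SELECT firstname, lastname FROM users WHERE emailaddress = 'tpeters@example.com';
```

With RF = 3, writing and reading at QUORUM means the two replica sets always overlap (2 + 2 > 3), so a read sees the latest acknowledged write. At ONE, a read may briefly return stale data.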
C* Persistence Model 
Read Path
Write Path
Data Model Architecture 
• Keyspace – container of column families (tables). Defines the replication factor, among other settings.
• Table – a column family. Contains the schema definition.
• Row – a "record" identified by a key.
• Column – a key and a value.
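To make the keyspace level concrete, here is a minimal sketch of how the replication factor is declared when a keyspace is created. The keyspace names and the data center names `dc_east` and `dc_west` are hypothetical placeholders.

```cql
-- Single data center: keep three copies of every row
CREATE KEYSPACE demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Multiple data centers: set the number of copies per data center.
-- The data center names must match what the cluster's snitch reports.
CREATE KEYSPACE demo_multi_dc
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_east': 3, 'dc_west': 2};

-- Tables created from here on belong to the demo keyspace
USE demo;
```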
Deletions 
• Distributed systems present a unique problem for deletes. If Cassandra physically removed the data while a node was down, that node would miss the delete notice and would try to re-create the record when it came back online. So…
• Tombstone – the data is replaced with a special marker called a tombstone, which is replicated like any other write and therefore works within the distributed architecture.
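A short sketch of what this looks like in CQL, assuming the `users` table keyed by `emailaddress` from the later slides. The deletes write tombstones rather than removing data in place, and the table option `gc_grace_seconds` controls how long tombstones are kept before compaction may discard them.

```cql
-- Writes a tombstone covering the whole row
DELETE FROM users WHERE emailaddress = 'jsmith@example.com';

-- Writes a tombstone for a single column only
DELETE lastname FROM users WHERE emailaddress = 'tpeters@example.com';

-- Tombstones become eligible for removal at compaction after this many seconds.
-- 864000 (10 days) is the default; downed nodes should be repaired within this window.
ALTER TABLE users WITH gc_grace_seconds = 864000;
```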
Keys 
• Primary Key
  • Partition Key – identifies a row (and which nodes store it)
  • Cluster (clustering) Key – sorting within a row
• Using CQL, these are defined together as a compound (composite) key.
• Compound keys are how you implement "wide rows", the COOL FEATURE!
Single Primary Key 
create table users (
  user_id UUID PRIMARY KEY,
  firstname text,
  lastname text,
  emailaddress text
);
** Cassandra Data Types
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
Compound Key 
create table users (
  emailaddress text,
  department text,
  firstname text,
  lastname text,
  PRIMARY KEY (emailaddress, department)
);
• Partition key plus cluster key
• emailaddress is the partition key
• department is the cluster key
Compound Key 
create table users (
  emailaddress text,
  department text,
  country text,
  firstname text,
  lastname text,
  PRIMARY KEY ((emailaddress, department), country)
);
• Partition key plus cluster key
• emailaddress and department together form the (composite) partition key
• country is the cluster key
New Rules 
• Writes are cheap
• Denormalize all you need
• Model your queries, not your data (understand access patterns)
• The application worries about joins
What’s New In 2.0 
• Conditional DDL: IF EXISTS or IF NOT EXISTS
• Drop column support:
ALTER TABLE users DROP lastname;
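For illustration, conditional DDL might be used as follows so that setup and teardown scripts can be re-run without errors. The `audit_log` table and `demo` keyspace are hypothetical names used only for this sketch.

```cql
-- No error if the keyspace already exists
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

-- No error if the table already exists
CREATE TABLE IF NOT EXISTS demo.audit_log (
  id uuid PRIMARY KEY,
  message text
);

-- No error if the table is already gone
DROP TABLE IF EXISTS demo.audit_log;
```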
More New Stuff 
• Triggers
CREATE TRIGGER myTrigger
  ON myTable
  USING 'com.thejavaexperts.cassandra.updateevt';
• Lightweight Transactions (CAS)
UPDATE users
  SET firstname = 'tim'
  WHERE emailaddress = 'tpeters@example.com'
  IF firstname = 'tom';
** Not like an ACID transaction!!
CAS & Transactions 
• CAS – compare-and-set operations. In a single atomic operation, CAS compares the value of a column in the database and applies a modification depending on the result of the comparison.
• Consider the performance hit: replicas must agree on the outcome (via Paxos), which costs extra round trips. CAS is (was) considered an anti-pattern.
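Besides the conditional UPDATE shown on the previous slide, a common use of CAS is an insert that only succeeds when the row does not exist yet. The sketch below assumes the `users` table keyed by `emailaddress`.

```cql
-- Claim the email address only if nobody has registered it yet
INSERT INTO users (emailaddress, firstname, lastname)
VALUES ('tpeters@example.com', 'tom', 'peters')
IF NOT EXISTS;
```

In cqlsh the result contains an `[applied]` column: true if the row was written, false if it already existed (in which case the existing values are returned).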
Data Modeling… The Basics 
• With CQL, Cassandra now looks very familiar to RDBMS/SQL users.
• CQL very nicely hides the underlying data storage model.
• You still have all the power of Cassandra; it is all in the key definition.
RDBMS = model data
Cassandra = model access (queries)
Side-Note On Querying 
• Create a table with a compound key
• Select using ALLOW FILTERING
• Counts
• Select using IN or =
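The live demo is not part of the transcript; the queries below are a sketch of what these steps could look like. They assume the `users` table from the first Compound Key slide (partition key `emailaddress`, cluster key `department`), and the department value is made up. Which filters are accepted with ALLOW FILTERING varies between Cassandra versions, so treat that query as illustrative.

```cql
-- Equality on the partition key: served by the replicas that own that key
SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';

-- IN on the partition key: several partitions in one statement
SELECT * FROM users
  WHERE emailaddress IN ('tpeters@example.com', 'jsmith@example.com');

-- Partition key plus cluster key: one specific entry within the partition
SELECT * FROM users
  WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering';

-- No partition key given: Cassandra must scan and filter, so it asks for explicit consent
SELECT * FROM users WHERE department = 'engineering' ALLOW FILTERING;

-- Counting touches every partition; fine for a demo, expensive on large tables
SELECT COUNT(*) FROM users;
```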
Batch Operations 
• Saves network round trips
• Can contain INSERT, UPDATE, DELETE
• Atomic by default (all or nothing)
• Can use timestamps for specific ordering
Batch Operation Example 
BEGIN BATCH
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('tpeters@example.com', 'tom', 'peters', 'DE');
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('jsmith@example.com', 'jim', 'smith', 'USA');
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('arogers@example.com', 'alan', 'rogers', 'USA');
  DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
APPLY BATCH;
• Select in cqlsh
• List in cassandra-cli with timestamp
More Data Modeling… 
• No joins
• No foreign keys
• No third (or any other) normal form concerns
• Redundant data is encouraged. Apps maintain consistency.
Secondary Indexes 
• Allow defining indexes to support access paths other than the partition key.
• Each node has a local index for its own data.
• They have their uses, but shouldn't be used without careful consideration.
• We will look at alternatives.
Secondary Index Example 
• Create a table
• Try to select with a column not in the PK
• Add a secondary index
• Try the select again (may need to reinsert)
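The demo itself is not in the transcript; a minimal sketch of these steps could look like this. The table layout follows the `users` columns used in the batch example, and the index name `users_country_idx` is hypothetical.

```cql
CREATE TABLE users (
  emailaddress text PRIMARY KEY,
  firstname text,
  lastname text,
  country text
);

-- Rejected before the index exists: country is neither a key column nor indexed
-- SELECT * FROM users WHERE country = 'USA';

-- Each node builds an index over the rows it stores locally
CREATE INDEX users_country_idx ON users (country);

-- Now accepted
SELECT * FROM users WHERE country = 'USA';
```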
When to use? 
• Low cardinality – a small number of unique values
• High cardinality – a large number of distinct values
• Secondary indexes are good for low cardinality, such as country codes or department codes. Not email addresses, where almost every value is unique.
Materialized View 
• If you want full distribution, you can use what is called the Materialized View pattern.
• Remember, redundant data is fine.
• Model the queries.
Materialized View Example 
• Show a normal table with a compound key and its querying limitations.
• Create a materialized view table with a different compound key to support alternate access.
• Selects use the partition key.
• Secondary indexes are local, not distributed.
• ALLOW FILTERING can cause performance issues.
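As a sketch of the pattern described above: a second, application-maintained table stores the same data under a different key, so the alternate query becomes a normal partition-key lookup. The table name `users_by_country` is hypothetical, and `users` is assumed to be keyed by `emailaddress` with a `country` column.

```cql
-- "View" table: same data, partitioned by country instead of email address
CREATE TABLE users_by_country (
  country text,
  emailaddress text,
  firstname text,
  lastname text,
  PRIMARY KEY (country, emailaddress)
);

-- The application writes to both tables; a batch keeps the two writes together
BEGIN BATCH
  INSERT INTO users (emailaddress, firstname, lastname, country)
    VALUES ('arogers@example.com', 'alan', 'rogers', 'USA');
  INSERT INTO users_by_country (country, emailaddress, firstname, lastname)
    VALUES ('USA', 'arogers@example.com', 'alan', 'rogers');
APPLY BATCH;

-- The alternate access path is now a plain partition-key query
SELECT * FROM users_by_country WHERE country = 'USA';
```

The trade-off is redundant storage and extra writes in exchange for reads that hit a single partition instead of querying an index on every node.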
Counters 
• Updated in 2.1 and now work in a more distributed and accurate manner.
• Table organization, example
• How to update, view, etc.
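The counter demo is not included in the transcript. A minimal sketch, using a hypothetical `page_views` table, could look like this. In a counter table every non-key column must be a counter, and values are changed with UPDATE rather than INSERT.

```cql
CREATE TABLE page_views (
  page text PRIMARY KEY,
  views counter
);

-- Counters are incremented or decremented, never set directly.
-- The first update creates the row, starting from zero.
UPDATE page_views SET views = views + 1 WHERE page = '/home';
UPDATE page_views SET views = views + 5 WHERE page = '/home';
UPDATE page_views SET views = views - 1 WHERE page = '/home';

-- Read the current value
SELECT page, views FROM page_views WHERE page = '/home';
```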
Time Series Example…. 
• Time series table model.
• Need to consider the interval for event frequency and the wide row size.
• Make the partition key a combination of what is being tracked and the unit of time interval.
Time Series Data 
• Thanks to its fast write path, Cassandra is well suited to storing time series data.
• The Cassandra wide row is a perfect fit for modeling time series / time-based events.
• Let's look at an example…
Event Data 
• Notice the primary key and cluster key.
• Insert some data
• View in CQL, then in the CLI as a wide row
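The event table from the demo is not in the transcript. One possible shape, assuming a hypothetical `sensor_events` table, makes the partition key the tracked item plus a day bucket and uses the event time as the cluster key, so each day for each sensor is one wide row.

```cql
CREATE TABLE sensor_events (
  sensor_id text,
  day text,               -- bucket such as '2014-11-13'; bounds the wide row size
  event_time timestamp,
  reading double,
  PRIMARY KEY ((sensor_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-11-13', '2014-11-13 08:00:00+0000', 21.5);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
  VALUES ('sensor-1', '2014-11-13', '2014-11-13 08:01:00+0000', 21.7);

-- Latest readings first, sliced by time within a single partition
SELECT event_time, reading FROM sensor_events
  WHERE sensor_id = 'sensor-1' AND day = '2014-11-13'
    AND event_time >= '2014-11-13 08:00:00+0000'
  LIMIT 10;
```

A coarser bucket (month) gives fewer, larger rows; a finer one (hour) gives more, smaller rows. The right choice depends on how often events arrive.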
TTL – Self Expiring Data 
• Another technique is storing data that has a defined lifespan.
• For instance, session identifiers, temporary passwords, etc.
• For this, Cassandra provides a Time To Live (TTL) mechanism.
TTL Example… 
• Create a table
• Insert data using TTL
• Can update a specific column with its own TTL
• Show using selects
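The TTL demo is not included in the transcript; the following is a sketch of the steps, using a hypothetical `user_sessions` table. TTL values are in seconds.

```cql
CREATE TABLE user_sessions (
  session_id text PRIMARY KEY,
  username text,
  token text
);

-- The whole row expires one hour after it is written
INSERT INTO user_sessions (session_id, username, token)
  VALUES ('abc123', 'tpeters', 'initial-token')
  USING TTL 3600;

-- A single column can be rewritten with its own, shorter TTL
UPDATE user_sessions USING TTL 60
  SET token = 'short-lived-token'
  WHERE session_id = 'abc123';

-- TTL() reports the seconds remaining for a column
SELECT username, TTL(username), token, TTL(token)
  FROM user_sessions WHERE session_id = 'abc123';
```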
Questions 
• http://www.thejavaexperts.net/
• Email: brian.enochson@gmail.com
• Twitter: @benochso
• G+: https://plus.google.com/+BrianEnochson

Weitere ähnliche Inhalte

Was ist angesagt?

Dynamo cassandra
Dynamo cassandraDynamo cassandra
Dynamo cassandraWu Liang
 
Ado.Net Architecture
Ado.Net ArchitectureAdo.Net Architecture
Ado.Net ArchitectureUmar Farooq
 
Data decomposition techniques
Data decomposition techniquesData decomposition techniques
Data decomposition techniquesMohamed Ramadan
 
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory option
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory optionStar Transformation, 12c Adaptive Bitmap Pruning and In-Memory option
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory optionFranck Pachot
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Anyscale
 
Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016) Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016) Ameer B. Alaasam
 
MySQL 8.0 Featured for Developers
MySQL 8.0 Featured for DevelopersMySQL 8.0 Featured for Developers
MySQL 8.0 Featured for DevelopersDave Stokes
 

Was ist angesagt? (10)

Dynamo cassandra
Dynamo cassandraDynamo cassandra
Dynamo cassandra
 
Ado.Net Architecture
Ado.Net ArchitectureAdo.Net Architecture
Ado.Net Architecture
 
Data decomposition techniques
Data decomposition techniquesData decomposition techniques
Data decomposition techniques
 
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory option
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory optionStar Transformation, 12c Adaptive Bitmap Pruning and In-Memory option
Star Transformation, 12c Adaptive Bitmap Pruning and In-Memory option
 
FractalTreeIndex
FractalTreeIndexFractalTreeIndex
FractalTreeIndex
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
06 linked list
06 linked list06 linked list
06 linked list
 
Db2 faqs
Db2 faqsDb2 faqs
Db2 faqs
 
Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016) Data structures "1" (Lectures 2015-2016)
Data structures "1" (Lectures 2015-2016)
 
MySQL 8.0 Featured for Developers
MySQL 8.0 Featured for DevelopersMySQL 8.0 Featured for Developers
MySQL 8.0 Featured for Developers
 

Andere mochten auch

Yimby and growing your audience from zero to lots
Yimby and growing your audience from zero to lotsYimby and growing your audience from zero to lots
Yimby and growing your audience from zero to lotsJonathan Waddingham
 
Medical Information Workshop (23 Jan 2007 )
Medical Information Workshop (23 Jan 2007 )Medical Information Workshop (23 Jan 2007 )
Medical Information Workshop (23 Jan 2007 )rwakefor
 
Kathleen's Powerpoint Presentation in Sir Rey's Computer Class
Kathleen's Powerpoint Presentation in Sir Rey's Computer ClassKathleen's Powerpoint Presentation in Sir Rey's Computer Class
Kathleen's Powerpoint Presentation in Sir Rey's Computer Classrey ayento
 
Postal De Nadal 2008 09 Manel Sons
Postal De Nadal 2008 09 Manel SonsPostal De Nadal 2008 09 Manel Sons
Postal De Nadal 2008 09 Manel Sonsmanelagui
 
10th Ceepus – Biomedicine Students’ Council Summer Eng
10th Ceepus – Biomedicine Students’ Council Summer Eng10th Ceepus – Biomedicine Students’ Council Summer Eng
10th Ceepus – Biomedicine Students’ Council Summer EngSU07
 
Autotools
Autotools Autotools
Autotools easychen
 
Presentatie Lizzy Jongma Masterclass Open Cultuur Data
Presentatie Lizzy Jongma Masterclass Open Cultuur DataPresentatie Lizzy Jongma Masterclass Open Cultuur Data
Presentatie Lizzy Jongma Masterclass Open Cultuur DataKennisland
 
Enabling co-­creation of e-services through virtual worlds
Enabling co-­creation of e-services through virtual worldsEnabling co-­creation of e-services through virtual worlds
Enabling co-­creation of e-services through virtual worldsThomas Kohler
 
Integrating PHP With System-i using Web Services
Integrating PHP With System-i using Web ServicesIntegrating PHP With System-i using Web Services
Integrating PHP With System-i using Web ServicesIvo Jansch
 
Social media and local government
Social media and local governmentSocial media and local government
Social media and local governmentsimonwakeman
 
Achievo ATK, an Open Source project
Achievo ATK, an Open Source projectAchievo ATK, an Open Source project
Achievo ATK, an Open Source projectIvo Jansch
 
Ict4volunteering Mm
Ict4volunteering MmIct4volunteering Mm
Ict4volunteering Mmhavs
 
ICT Sustainability
ICT SustainabilityICT Sustainability
ICT Sustainabilityhavs
 
Publizitate Eraginkortasunaren Baliospena 5
Publizitate Eraginkortasunaren Baliospena 5Publizitate Eraginkortasunaren Baliospena 5
Publizitate Eraginkortasunaren Baliospena 5katixa
 

Andere mochten auch (20)

Yimby and growing your audience from zero to lots
Yimby and growing your audience from zero to lotsYimby and growing your audience from zero to lots
Yimby and growing your audience from zero to lots
 
Medical Information Workshop (23 Jan 2007 )
Medical Information Workshop (23 Jan 2007 )Medical Information Workshop (23 Jan 2007 )
Medical Information Workshop (23 Jan 2007 )
 
Kathleen's Powerpoint Presentation in Sir Rey's Computer Class
Kathleen's Powerpoint Presentation in Sir Rey's Computer ClassKathleen's Powerpoint Presentation in Sir Rey's Computer Class
Kathleen's Powerpoint Presentation in Sir Rey's Computer Class
 
Postal De Nadal 2008 09 Manel Sons
Postal De Nadal 2008 09 Manel SonsPostal De Nadal 2008 09 Manel Sons
Postal De Nadal 2008 09 Manel Sons
 
10th Ceepus – Biomedicine Students’ Council Summer Eng
10th Ceepus – Biomedicine Students’ Council Summer Eng10th Ceepus – Biomedicine Students’ Council Summer Eng
10th Ceepus – Biomedicine Students’ Council Summer Eng
 
Autotools
Autotools Autotools
Autotools
 
Presentatie Lizzy Jongma Masterclass Open Cultuur Data
Presentatie Lizzy Jongma Masterclass Open Cultuur DataPresentatie Lizzy Jongma Masterclass Open Cultuur Data
Presentatie Lizzy Jongma Masterclass Open Cultuur Data
 
Enabling co-­creation of e-services through virtual worlds
Enabling co-­creation of e-services through virtual worldsEnabling co-­creation of e-services through virtual worlds
Enabling co-­creation of e-services through virtual worlds
 
Integrating PHP With System-i using Web Services
Integrating PHP With System-i using Web ServicesIntegrating PHP With System-i using Web Services
Integrating PHP With System-i using Web Services
 
Social media and local government
Social media and local governmentSocial media and local government
Social media and local government
 
Achievo ATK, an Open Source project
Achievo ATK, an Open Source projectAchievo ATK, an Open Source project
Achievo ATK, an Open Source project
 
Ict4volunteering Mm
Ict4volunteering MmIct4volunteering Mm
Ict4volunteering Mm
 
IoF South West Conference
IoF South West ConferenceIoF South West Conference
IoF South West Conference
 
HTML5 - Um Ano Depois
HTML5 - Um Ano DepoisHTML5 - Um Ano Depois
HTML5 - Um Ano Depois
 
MapIt1418
MapIt1418MapIt1418
MapIt1418
 
Good Luck
Good LuckGood Luck
Good Luck
 
Visual image
Visual imageVisual image
Visual image
 
ICT Sustainability
ICT SustainabilityICT Sustainability
ICT Sustainability
 
H1B 2017 Predictions: Will There Be A H-1B Lottery Again?
H1B 2017 Predictions: Will There Be A H-1B Lottery Again?H1B 2017 Predictions: Will There Be A H-1B Lottery Again?
H1B 2017 Predictions: Will There Be A H-1B Lottery Again?
 
Publizitate Eraginkortasunaren Baliospena 5
Publizitate Eraginkortasunaren Baliospena 5Publizitate Eraginkortasunaren Baliospena 5
Publizitate Eraginkortasunaren Baliospena 5
 

Ähnlich wie Cassandra20141113

A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandraPL dream
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)zznate
 
Meetup cassandra for_java_cql
Meetup cassandra for_java_cqlMeetup cassandra for_java_cql
Meetup cassandra for_java_cqlzznate
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceDataStax Academy
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELJenny Liu
 
Apache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryApache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryAndrey Lomakin
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruTim Callaghan
 
Vsam interview questions and answers.
Vsam interview questions and answers.Vsam interview questions and answers.
Vsam interview questions and answers.Sweta Singh
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
SenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige WhiteSenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige WhiteSencha
 
Learning Cassandra NoSQL
Learning Cassandra NoSQLLearning Cassandra NoSQL
Learning Cassandra NoSQLPankaj Khattar
 

Ähnlich wie Cassandra20141113 (20)

A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Storage cassandra
Storage   cassandraStorage   cassandra
Storage cassandra
 
Cassandra
CassandraCassandra
Cassandra
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 
Meetup cassandra for_java_cql
Meetup cassandra for_java_cqlMeetup cassandra for_java_cql
Meetup cassandra for_java_cql
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis Price
 
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLELA TALE of DATA PATTERN DISCOVERY IN PARALLEL
A TALE of DATA PATTERN DISCOVERY IN PARALLEL
 
Cassandra no sql ecosystem
Cassandra no sql ecosystemCassandra no sql ecosystem
Cassandra no sql ecosystem
 
Apache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machineryApache Cassandra, part 2 – data model example, machinery
Apache Cassandra, part 2 – data model example, machinery
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
 
Vsam interview questions and answers.
Vsam interview questions and answers.Vsam interview questions and answers.
Vsam interview questions and answers.
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
7. SQL.pptx
7. SQL.pptx7. SQL.pptx
7. SQL.pptx
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
SenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige WhiteSenchaCon 2016: The Once and Future Grid - Nige White
SenchaCon 2016: The Once and Future Grid - Nige White
 
Learning Cassandra NoSQL
Learning Cassandra NoSQLLearning Cassandra NoSQL
Learning Cassandra NoSQL
 

Mehr von Brian Enochson

Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBrian Enochson
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionBrian Enochson
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
Cassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data ModelingCassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data ModelingBrian Enochson
 

Mehr von Brian Enochson (6)

Hadoop20141125
Hadoop20141125Hadoop20141125
Hadoop20141125
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Big Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and CassasdraBig Data, NoSQL with MongoDB and Cassasdra
Big Data, NoSQL with MongoDB and Cassasdra
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB Introdction
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
Cassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data ModelingCassandra Deep Diver & Data Modeling
Cassandra Deep Diver & Data Modeling
 

Kürzlich hochgeladen

Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noidabntitsolutionsrishis
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 

Kürzlich hochgeladen (20)

Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Best Web Development Agency- Idiosys USA.pdf

Cassandra20141113

  • 1. Cassandra: Overview and Real World Applications. Jersey Shore Tech Meetup, Nov 13, 2014
  • 2. You Are Not Here… *** http://njhalloffame.org/
  • 3. Agenda
      • Some Basic Concepts/Overview
      • New Developments In Cassandra
      • Basic Data Modeling Concepts
      • Materialized Views
      • Secondary Indexes
      • Counters
      • Time Series Data
      • Expiring Data
  • 4. Cassandra High Level
      Cassandra's architecture is based on the combination of two technologies:
      • Google BigTable – Data Model
      • Amazon Dynamo – Distributed Architecture
      BTW – these mean the same thing -> Cassandra = C*
  • 5. Architecture Basics & Terminology
      • Nodes are single instances of C*
      • Cluster is a group of nodes
      • Data is organized by keys (tokens) which are distributed across the cluster
      • Replication Factor (rf) determines how many copies of each key's data are kept
      • Data Center Aware – works well in multi-DC/EC2 etc.
      • Consistency Level – powerful feature to tune consistency vs. speed vs. availability
  • 7. More Architecture
      • Information on who has what data and who is available is exchanged between nodes using gossip.
      • No single point of failure (SPOF); every node can service requests.
      • Handles replication and downed nodes (within reason).
  • 8. CAP Theorem
      • Distributed Systems Law:
          • Consistency
          • Availability
          • Partition Tolerance
        (you can only really have two in a distributed system)
      • Cassandra is AP with Eventual Consistency
  • 9. Consistency
      • Cassandra uses the concept of Tunable Consistency, which makes it very powerful and flexible for system needs.
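As a rough illustration of tunable consistency, cqlsh lets you set the consistency level for a session with its `CONSISTENCY` command. The sketch below assumes a keyspace with a replication factor of 3 and a hypothetical `users` table whose primary key is `emailaddress` alone; neither is taken from the slides. With three replicas, QUORUM means two of them, so a QUORUM write followed by a QUORUM read always overlaps on at least one replica (2 + 2 > 3), while ONE gives up that guarantee in exchange for lower latency and higher availability.

```cql
-- cqlsh session; table name and values are illustrative assumptions.
CONSISTENCY QUORUM;   -- reads and writes wait for 2 of 3 replicas
INSERT INTO users (emailaddress, firstname) VALUES ('tpeters@example.com', 'tom');
SELECT firstname FROM users WHERE emailaddress = 'tpeters@example.com';

CONSISTENCY ONE;      -- faster and more available, but a read may return stale data
SELECT firstname FROM users WHERE emailaddress = 'tpeters@example.com';
```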
  • 13. Data Model Architecture
      • Keyspace – container of column families (tables). Defines RF among others.
      • Table – column family. Contains definition of schema.
      • Row – a “record” identified by a key
      • Column – a key and a value
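The keyspace/table/row/column hierarchy above can be sketched in CQL roughly as follows. The keyspace name `demo` and the replication settings are assumptions chosen for illustration; `SimpleStrategy` suits a single data center, while multi-DC clusters would normally use `NetworkTopologyStrategy`.

```cql
-- Keyspace: holds tables and defines the replication settings (RF = 3 here).
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

USE demo;

-- Table (column family): defines the schema; user_id is the row key.
CREATE TABLE IF NOT EXISTS users (
  user_id   uuid PRIMARY KEY,
  firstname text,
  lastname  text
);
```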
  • 15. Deletions
      • Distributed systems present a unique problem for deletes. If Cassandra actually removed the data and a node that was down missed the delete notice, that node would try to recreate the record when it came back online. So…
      • Tombstone – the data is replaced with a special value called a tombstone, which works within the distributed architecture.
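A minimal sketch of what this looks like from CQL, assuming a hypothetical `users` table keyed by a `user_id` UUID (the UUID value is made up). The delete writes a tombstone, and the table's `gc_grace_seconds` setting controls how long tombstones are retained before compaction may drop them; a node that was down needs to be repaired within that window, otherwise deleted data can reappear.

```cql
-- Writes a tombstone for the row rather than removing the data immediately.
DELETE FROM users WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- Tombstones become eligible for removal at compaction after gc_grace_seconds.
-- 864000 seconds (10 days) is the default; it is set explicitly here only for illustration.
ALTER TABLE users WITH gc_grace_seconds = 864000;
```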
  • 16. Keys
      • Primary Key
      • Partition Key – identifies a row
      • Cluster Key – sorting within a row
      • Using CQL these are defined together as a compound (composite) key
      • Compound keys are how you implement “wide rows”, the COOL FEATURE!
  • 17. Single Primary Key
      create table users (
        user_id UUID PRIMARY KEY,
        firstname text,
        lastname text,
        emailaddress text
      );
      ** Cassandra Data Types: http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html
  • 18. Compound Key
      create table users (
        emailaddress text,
        department text,
        firstname text,
        lastname text,
        PRIMARY KEY (emailaddress, department)
      );
      • Partition Key plus Cluster Key
      • emailaddress is partition key
      • department is cluster key
  • 19. Compound Key
      create table users (
        emailaddress text,
        department text,
        country text,
        firstname text,
        lastname text,
        PRIMARY KEY ((emailaddress, department), country)
      );
      • Partition Key plus Cluster Key
      • emailaddress and department together form the partition key
      • country is cluster key
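To make the effect of the composite partition key concrete, here is a hedged sketch of queries against the slide 19 table. The literal values are invented for illustration.

```cql
-- Both partition key columns must be supplied to locate the partition.
SELECT * FROM users
 WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering';

-- The cluster key can then narrow the result within that partition.
SELECT * FROM users
 WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering'
   AND country = 'DE';

-- Rejected as written: only part of the composite partition key is given.
-- SELECT * FROM users WHERE emailaddress = 'tpeters@example.com';
```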
  • 20. New Rules
      • Writes Are Cheap
      • Denormalize All You Need
      • Model Your Queries, Not Data (understand access patterns)
      • Application Worries About Joins
  • 21. What’s New In 2.0
      • Conditional DDL: IF EXISTS or IF NOT EXISTS
      • Drop Column Support
        ALTER TABLE users DROP lastname;
  • 22. More New Stuff
      • Triggers
        CREATE TRIGGER myTrigger ON myTable USING 'com.thejavaexperts.cassandra.updateevt'
      • Lightweight Transactions (CAS)
        UPDATE users SET firstname = 'tim' WHERE emailaddress = 'tpeters@example.com' IF firstname = 'tom';
        ** Not like an ACID Transaction!!
  • 23. CAS & Transactions
      • CAS – compare-and-set operations. A single, atomic operation compares the value of a column in the database and applies a modification depending on the result of the comparison.
      • Consider the performance hit. CAS is (was) considered an anti-pattern.
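A short sketch of the two common lightweight-transaction forms, assuming the slide 18 `users` table (partition key `emailaddress`, cluster key `department`); the values are illustrative. cqlsh reports the outcome in an `[applied]` column. These statements are coordinated through Paxos and need extra round trips between replicas, which is where the performance hit comes from.

```cql
-- Insert only if no row with this primary key exists yet.
INSERT INTO users (emailaddress, department, firstname, lastname)
VALUES ('tpeters@example.com', 'engineering', 'tom', 'peters')
IF NOT EXISTS;

-- Update only if the current value matches the expected one.
UPDATE users SET firstname = 'tim'
 WHERE emailaddress = 'tpeters@example.com' AND department = 'engineering'
 IF firstname = 'tom';
```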
  • 24. Data Modeling… The Basics
      • Cassandra now feels very familiar to RDBMS/SQL users.
      • It very nicely hides the underlying data storage model.
      • You still have all the power of Cassandra; it is all in the key definition.
      RDBMS = model data
      Cassandra = model access (queries)
  • 25. Side-Note On Querying
      • Create table with compound key
      • Select using ALLOW FILTERING
      • Counts
      • Select using IN or =
  • 26. Batch Operations
      • Saves network round trips
      • Can contain INSERT, UPDATE, DELETE
      • Atomic by default (all or nothing)
      • Can use timestamp for specific ordering
  • 27. Batch Operation Example
      BEGIN BATCH
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('tpeters@example.com', 'tom', 'peters', 'DE');
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('jsmith@example.com', 'jim', 'smith', 'USA');
        INSERT INTO users (emailaddress, firstname, lastname, country) values ('arogers@example.com', 'alan', 'rogers', 'USA');
        DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
      APPLY BATCH;
      • select in cqlsh
      • List in cassandra-cli with timestamp
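The point about timestamps from the previous slide can be sketched as below. This assumes a `users` table whose primary key is `emailaddress` alone, as the batch above suggests, and the microsecond timestamp values are arbitrary. Giving the delete a later timestamp than the insert makes the ordering explicit, so the row is absent once the batch is applied. Per-statement timestamps cannot be combined with a batch-level `USING TIMESTAMP`.

```cql
BEGIN BATCH
  INSERT INTO users (emailaddress, firstname, lastname, country)
  VALUES ('jsmith@example.com', 'jim', 'smith', 'USA')
  USING TIMESTAMP 1415836800000000;

  -- One microsecond later than the insert, so the delete wins.
  DELETE FROM users USING TIMESTAMP 1415836800000001
  WHERE emailaddress = 'jsmith@example.com';
APPLY BATCH;
```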
  • 28. More Data Modeling…
      • No Joins
      • No Foreign Keys
      • No Third (or any other) Normal Form Concerns
      • Redundant data encouraged. Apps maintain consistency.
  • 29. Secondary Indexes
      • Allow queries on columns other than the partition key.
      • Each node has a local index for its data.
      • They have uses, but shouldn’t be used all the time without consideration.
      • We will look at alternatives.
  • 30. Secondary Index Example
      • Create a table
      • Try to select with a column not in the PK
      • Add a secondary index
      • Try the select again (may need to reinsert)
  • 31. When to use?
      • Low Cardinality – small number of unique values
      • High Cardinality – high number of distinct values
      • Secondary Indexes are good for Low Cardinality. So country codes, department codes etc. Not email addresses.
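The secondary index demo outlined on the previous slides might look roughly like this. The table name `users_by_id`, the index name, and the data are assumptions made for illustration; `country` is used as the indexed column because it is the kind of low-cardinality value the slide recommends.

```cql
CREATE TABLE IF NOT EXISTS users_by_id (
  user_id   uuid PRIMARY KEY,
  firstname text,
  lastname  text,
  country   text
);

-- Without an index this query is rejected, because country is not part of the primary key:
-- SELECT * FROM users_by_id WHERE country = 'USA';

CREATE INDEX users_by_id_country_idx ON users_by_id (country);

-- Now accepted; each node answers from its local index.
SELECT * FROM users_by_id WHERE country = 'USA';
```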
  • 32. Materialized View
      • If you want full distribution, you can use what is called a Materialized View pattern.
      • Remember redundant data is fine.
      • Model the queries
  • 33. Materialized View Example
      • Show a normal table with a compound key and its querying limitations
      • Create a Materialized View table with a different compound key to support alternate access.
      • Selects use the partition key.
      • Secondary indexes are local, not distributed
      • ALLOW FILTERING can cause performance issues
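A minimal sketch of the pattern, with hypothetical table names `users_by_email` and `users_by_country`: the same data is stored twice under different keys, and the application writes both copies, here in a logged batch so that either both writes happen or neither does.

```cql
-- Base table: look up a user by email address.
CREATE TABLE users_by_email (
  emailaddress text PRIMARY KEY,
  firstname    text,
  lastname     text,
  country      text
);

-- "Materialized view" table: same data, keyed for lookups by country.
CREATE TABLE users_by_country (
  country      text,
  emailaddress text,
  firstname    text,
  lastname     text,
  PRIMARY KEY (country, emailaddress)
);

-- The application maintains both copies.
BEGIN BATCH
  INSERT INTO users_by_email (emailaddress, firstname, lastname, country)
  VALUES ('tpeters@example.com', 'tom', 'peters', 'DE');
  INSERT INTO users_by_country (country, emailaddress, firstname, lastname)
  VALUES ('DE', 'tpeters@example.com', 'tom', 'peters');
APPLY BATCH;

-- Served from a single partition; no index or ALLOW FILTERING needed.
SELECT * FROM users_by_country WHERE country = 'DE';
```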
  • 34. Counters
      • Updated in 2.1 and now work in a more distributed and accurate manner.
      • Table organization, example
      • How to update, view etc.
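A counter table could be organized along these lines; the `page_views` table is a made-up example. In a counter table every column outside the primary key must be a counter, and counters are changed only through `UPDATE`, never `INSERT`.

```cql
CREATE TABLE page_views (
  page  text PRIMARY KEY,
  views counter
);

-- Increment (or decrement) relative to the current value.
UPDATE page_views SET views = views + 1 WHERE page = '/home';
UPDATE page_views SET views = views + 5 WHERE page = '/home';

SELECT page, views FROM page_views WHERE page = '/home';
```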
  • 35. Time Series Example…
      • Time series table model.
      • Need to consider the interval for event frequency and the wide row size.
      • Make the thing being tracked plus the unit of time interval the partition key.
  • 36. Time Series Data
      • Due to its quick writing model, Cassandra is well suited for storing time series data.
      • The Cassandra wide row is a perfect fit for modeling time series / time based events.
      • Let’s look at an example…
  • 37. Event Data
      • Notice the primary key and cluster key.
      • Insert some data
      • View in CQL, then in the CLI as a wide row
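One way the time series model described above might be written, assuming a hypothetical `sensor_events` table bucketed by day. The partition key combines what is tracked (`sensor_id`) with the interval (`day`), which bounds the wide row size, and the event time is the cluster key so events are stored in time order.

```cql
CREATE TABLE sensor_events (
  sensor_id  text,
  day        text,        -- interval bucket, e.g. '2014-11-13'
  event_time timestamp,
  reading    double,
  PRIMARY KEY ((sensor_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

INSERT INTO sensor_events (sensor_id, day, event_time, reading)
VALUES ('sensor-1', '2014-11-13', '2014-11-13 19:00:00+0000', 21.5);
INSERT INTO sensor_events (sensor_id, day, event_time, reading)
VALUES ('sensor-1', '2014-11-13', '2014-11-13 19:01:00+0000', 21.7);

-- Newest events first, restricted to a time range within one partition.
SELECT event_time, reading FROM sensor_events
 WHERE sensor_id = 'sensor-1' AND day = '2014-11-13'
   AND event_time >= '2014-11-13 19:00:00+0000'
 LIMIT 10;
```

A busier source would call for a smaller bucket (an hour, say) so that no single partition grows too large.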
  • 38. TTL – Self Expiring Data
      • Another technique is data that has a defined lifespan.
      • For instance session identifiers, temporary passwords etc.
      • For this Cassandra provides a Time To Live (TTL) mechanism.
  • 39. TTL Example…
      • Create table
      • Insert data using TTL
      • Can update a specific column with a TTL
      • Show using selects.
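The steps listed above can be sketched as follows, using a hypothetical `sessions` table and made-up values. TTLs are given in seconds, and the `TTL()` function in a select shows how many seconds a column has left.

```cql
CREATE TABLE sessions (
  session_id uuid PRIMARY KEY,
  user_email text,
  token      text
);

-- Everything written by this insert expires after one hour.
INSERT INTO sessions (session_id, user_email, token)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'tpeters@example.com', 'abc123')
USING TTL 3600;

-- A TTL can also be applied to a single column on update.
UPDATE sessions USING TTL 600 SET token = 'def456'
 WHERE session_id = 62c36092-82a1-3a00-93d1-46196ee77204;

-- Remaining lifetime of the column, in seconds.
SELECT token, TTL(token) FROM sessions
 WHERE session_id = 62c36092-82a1-3a00-93d1-46196ee77204;
```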
  • 40. Questions
      • http://www.thejavaexperts.net/
      • Email: brian.enochson@gmail.com
      • Twitter: @benochso
      • G+: https://plus.google.com/+BrianEnochson