Getting to know Cassandra
by Michelle Darling
mdarlingcmt@gmail.com
August 2013
Agenda:
● What is Cassandra?
● Installation, CQL3
● Data Modelling
● Summary
Only 15 minutes to cover these, so
please hold questions until the
end, or email me :-) and I'll
summarize the Q&A for everyone.
Unfortunately, no time for:
● DB Admin
○ Detailed Architecture
○ Partitioning /
Consistent Hashing
○ Consistency Tuning
○ Data Distribution &
Replication
○ System Tables
● App Development
○ Using Python, Ruby etc
to access Cassandra
○ Using Hadoop to
stream data into
Cassandra
What is Cassandra?
“Fortuneteller of Doom”
from Greek Mythology. Tried to
warn others about future disasters,
but no one listened. Unfortunately,
she was 100% accurate.
NoSQL Distributed DB
● Consistency - tunable (ACID without the "C")
● Availability - High
● Point of Failure - none
● Good for Event
Tracking & Analysis
○ Time series data
○ Sensor device data
○ Social media analytics
○ Risk Analysis
○ Failure Prediction
Rackspace: “Which servers
are under heavy load
and are about to crash?”
The Evolution of Cassandra
2008: Open-Source Release / 2013: Enterprise & Community Editions
Data Model
● Wide rows, sparse arrays
● High performance through very
fast write throughput.
Infrastructure
● Peer-to-Peer Gossip
● Key-Value Pairs
● Tunable Consistency
2006
2005
● Originally for Inbox Search
● But now used for Instagram
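The "Tunable Consistency" bullet above can be made concrete with a little arithmetic. The sketch below is not taken from the deck; the function names are hypothetical. It shows the usual rule of thumb: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int,
                           write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


# Example: replication factor 3, QUORUM reads and QUORUM writes.
n = 3
r = w = quorum(n)
overlap = reads_see_latest_write(r, w, n)
```

Lowering R or W trades this overlap guarantee for lower latency and higher availability, which is the "tunable" part.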
Other NoSQL vs. Cassandra
NoSQL Taxonomy:
● Key-Value Pairs
○ Dynamo, Riak, Redis
● Column-Based
○ BigTable, HBase,
Cassandra
● Document-Based
○ MongoDB, Couchbase
● Graph
○ Neo4J
Big Data Capable
C* Differentiators:
● Production-proven at
Netflix, eBay, Twitter,
20 of Fortune 100
● “Clear Winner” in
Scalability,
Performance,
Availability
-- DataStax
Architecture
● Cluster (ring)
● Nodes (circles)
● Peer-to-Peer Model
● Gossip Protocol
Partitioner:
Consistent Hashing
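To illustrate the ring and consistent hashing named above, here is a minimal sketch. It is an assumption-laden toy, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 as the hash are all hypothetical stand-ins for the real partitioner. Each node gets one token on the ring; a row key is stored on the first node clockwise from the key's token, and extra replicas go to the next nodes around the ring.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5 is only an illustrative hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring with one token per node."""

    def __init__(self, nodes):
        # Sorted (token, node) pairs; each node owns the arc ending at its token.
        self._tokens = sorted((token(n), n) for n in nodes)

    def replicas(self, row_key: str, replication_factor: int = 1):
        """Return the nodes holding row_key, walking clockwise from its token."""
        start = bisect.bisect_left(self._tokens, (token(row_key), ""))
        count = min(replication_factor, len(self._tokens))
        size = len(self._tokens)
        return [self._tokens[(start + i) % size][1] for i in range(count)]

    def owner(self, row_key: str) -> str:
        """Return the first replica for row_key."""
        return self.replicas(row_key, 1)[0]


# Hypothetical node names, for illustration only.
ring = Ring(["node-a", "node-b", "node-c"])
primary = ring.owner("paul")
copies = ring.replicas("paul", replication_factor=2)
```

The useful property is that adding a node only moves the keys that fall into the new node's arc; keys owned by other nodes stay where they are.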
Netflix
Streaming Video
● Personalized
Recommendations per
family member
● Built on Amazon Web
Services (AWS) +
Cassandra
Cloud installation using
● Amazon Web Services (AWS)
● Elastic Compute Cloud (EC2)
○ Free for the 1st year! Then pay only for what you use.
○ Sign up for an AWS EC2 account: see the Big Data University video (4:34 minutes).
● Amazon Machine Image (AMI)
○ Preconfigured installation template
○ Choose: “DataStax AMI for Cassandra
Community Edition”
○ Follow these *very good* step-by-step
instructions from DataStax.
○ AMIs are also available for Couchbase and MongoDB
(make sure you pick the free-tier community versions to avoid
monthly charge$$!!!).
AWS EC2 Dashboard
DataStax AMI Setup
--clustername Michelle
--totalnodes 1
--version community
“Roll your Own” Installation
DataStax Community Edition
● Install instructions
for Linux, Windows,
MacOS:
http://www.datastax.com/2012/01/getting-started-with-cassandra
● Video: “Set up a 4-
node Cassandra
cluster in under 2
minutes”
http://www.screenr.com/5G6
Invoke CQLSH, CREATE KEYSPACE
./bin/cqlsh
cqlsh> CREATE KEYSPACE big_data
   ... WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE big_data;
cqlsh:big_data>
Tip: Skip Thrift -- use CQL3
Thrift RPC
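import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.Mutation;
// Assumes `cassandra` is a connected Cassandra.Client and
// `consistency_level` is a ConsistencyLevel chosen by the caller.
// Your Column
Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());
// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumn_or_supercolumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
// Row key -> (column family name -> mutations for that row)
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);
cassandra.batch_mutate(mutations_map, consistency_level);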
CQL3
- Uses cqlsh
- “SQL-like” language
- Runs on top of Thrift RPC
- Much more user-friendly.
The Thrift code above
equals this in CQL3:
INSERT INTO Standard1 (id, name)
VALUES ('key', 'value');
CREATE TABLE
cqlsh:big_data> CREATE TABLE user_tags (
            ... user_id varchar,
            ... tag varchar,
            ... value counter,
            ... PRIMARY KEY (user_id, tag)
            ... );
● TABLE user_tags: “How many times has a user
mentioned a hashtag?”
● COUNTER datatype - Computes & stores counter value
at the time data is written. This optimizes query
performance.
UPDATE TABLE
cqlsh:big_data> UPDATE user_tags SET value = value + 1
            ... WHERE user_id = 'paul' AND tag = 'cassandra';
SELECT FROM TABLE
cqlsh:big_data> SELECT * FROM user_tags;
 user_id | tag       | value
---------+-----------+-------
    paul | cassandra |     1
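The shape of the user_tags table can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the `UserTags` class is hypothetical and keeps everything in one process, whereas a real counter column is maintained across replicas. It shows how the primary key (user_id, tag) splits into a partition key and a clustering key, and how the counter is updated at write time so reads do no aggregation.

```python
from collections import defaultdict


class UserTags:
    """Toy model of: PRIMARY KEY (user_id, tag) with a counter column."""

    def __init__(self):
        # partition key (user_id) -> clustering key (tag) -> counter value
        self._partitions = defaultdict(lambda: defaultdict(int))

    def increment(self, user_id: str, tag: str, delta: int = 1) -> None:
        """Mirror of: UPDATE user_tags SET value = value + delta WHERE ..."""
        self._partitions[user_id][tag] += delta

    def select(self, user_id: str):
        """Return (tag, value) pairs for one user, ordered by clustering key."""
        return sorted(self._partitions.get(user_id, {}).items())


table = UserTags()
table.increment("paul", "cassandra")
rows = table.select("paul")
```

Answering "How many times has a user mentioned a hashtag?" is then a single-partition lookup rather than a COUNT over raw events.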
DATA MODELING
A Major Paradigm Shift!
RDBMS | Cassandra
Structured data, fixed schema | Unstructured data, flexible schema
"Array of Arrays", 2D: ROW x COLUMN | "Nested Key-Value Pairs", 3D: ROW key x COLUMN key x COLUMN values
DATABASE | KEYSPACE
TABLE | TABLE, a.k.a. COLUMN FAMILY
ROW | ROW, a.k.a. PARTITION. Unit of replication.
COLUMN | COLUMN [Name, Value, Timestamp], a.k.a. CLUSTER. Unit of storage. Up to 2 billion columns per row.
FOREIGN KEYS, JOINS, ACID consistency | Referential integrity not enforced, so no "C" in ACID. BUT relationships can be represented using COLLECTIONS.
RDBMS: 2D (rows x columns)
Cassandra: 3D+ (nested objects)
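The "nested key-value pairs" idea from the comparison above can be sketched as a dictionary of dictionaries. This is a simplified illustration, not Cassandra's storage engine: the `ColumnFamily` class and its method names are hypothetical, and timestamp ties are resolved by keeping the existing value, which is a simplification. Each row key maps to its own set of column keys, and each column stores a value plus the timestamp of the write, with the newest timestamp winning.

```python
class ColumnFamily:
    """Toy model: row key -> column key -> (value, timestamp)."""

    def __init__(self):
        self._rows = {}

    def insert(self, row_key, column, value, timestamp):
        """Write a column; an older timestamp never overwrites a newer one."""
        row = self._rows.setdefault(row_key, {})
        current = row.get(column)
        if current is None or timestamp > current[1]:
            row[column] = (value, timestamp)

    def get(self, row_key, column):
        """Return the column's value, or None if the row or column is absent."""
        cell = self._rows.get(row_key, {}).get(column)
        return None if cell is None else cell[0]

    def columns(self, row_key):
        """Return the column keys present in this row, in sorted order."""
        return sorted(self._rows.get(row_key, {}))


users = ColumnFamily()
users.insert("paul", "email", "paul@example.com", timestamp=1)
users.insert("paul", "city", "Austin", timestamp=2)
users.insert("ann", "email", "ann@example.com", timestamp=3)  # no "city" column
```

Rows are sparse: "ann" simply has no "city" column, with no NULL placeholder as a fixed-schema table would need.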
Example:
“Twissandra” Web App
Twitter-Inspired
sample application
written in Python +
Cassandra.
● Play with the app:
twissandra.com
● Examine & learn
from the code on
GitHub.
Features/Queries:
● Sign In, Sign Up
● Post Tweet
● Userline (User’s tweets)
● Timeline (All tweets)
● Following (Users being
followed by user)
● Followers (Users
following this user)
Twissandra.com vs Twitter.com
Twissandra - RDBMS Version
Entities
● USER, TWEET
● FOLLOWER, FOLLOWING
● FRIENDS
Relationships:
● USER has many TWEETs.
● USER is a FOLLOWER of many
USERs.
● Many USERs are FOLLOWING
USER.
Twissandra - Cassandra Version
Tip: Model tables to mirror queries.
TABLES or CFs
● TWEET
● USER, USERNAME
● FOLLOWERS, FOLLOWING
● USERLINE, TIMELINE
Notes:
● Extra tables mirror queries.
● Denormalized tables are
"pre-formed" for faster
performance.
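A rough sketch of what "tables mirror queries" means in practice follows. It is loosely modeled on the table names above, not on the actual Twissandra code: the `Twissandra` class here is hypothetical, and the fan-out of each tweet to followers' timelines is an assumption made for illustration. One logical write updates several tables so that each read is a single lookup.

```python
from collections import defaultdict
import itertools


class Twissandra:
    """Toy denormalized model: one table per query, duplicated on write."""

    def __init__(self):
        self.tweets = {}                     # tweet_id -> (username, body)
        self.followers = defaultdict(set)    # username -> users following them
        self.following = defaultdict(set)    # username -> users they follow
        self.userline = defaultdict(list)    # username -> own tweet_ids
        self.timeline = defaultdict(list)    # username -> tweet_ids to display
        self._ids = itertools.count(1)

    def follow(self, follower, followee):
        # The relationship is stored twice, once per query direction.
        self.following[follower].add(followee)
        self.followers[followee].add(follower)

    def post_tweet(self, username, body):
        tweet_id = next(self._ids)
        self.tweets[tweet_id] = (username, body)
        self.userline[username].append(tweet_id)
        # Assumed fan-out: copy the tweet id into each follower's timeline.
        for follower in self.followers[username]:
            self.timeline[follower].append(tweet_id)
        return tweet_id

    def get_userline(self, username):
        return [self.tweets[t] for t in reversed(self.userline[username])]

    def get_timeline(self, username):
        return [self.tweets[t] for t in reversed(self.timeline[username])]


app = Twissandra()
app.follow("bob", "alice")
app.post_tweet("alice", "hello C*")
```

The write does more work and stores data redundantly, but "show Bob's timeline" never needs a join.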
Tip: Remember,
Skip Thrift -- use CQL3
What does C* data look like?
TABLE Userline
“List all of user’s Tweets”
*************
Row Key: user_id
Columns
● Column Key: tweet_id
● “at” Timestamp
● TTL (Time to Live) -
seconds until the column
expires.
*************
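The Userline layout above (row key user_id, one column per tweet_id, each with a write timestamp and an optional TTL) can be sketched as follows. This is an illustrative toy: the `UserlineRow` class is hypothetical, and the current time is passed in explicitly so the behavior is deterministic. Columns whose TTL has elapsed are simply skipped on read.

```python
class UserlineRow:
    """Toy model of one Userline row: tweet_id -> (written_at, ttl_seconds)."""

    def __init__(self, user_id):
        self.user_id = user_id      # row key
        self._columns = {}          # column key (tweet_id) -> metadata

    def add(self, tweet_id, written_at, ttl_seconds=None):
        """Store a column; ttl_seconds=None means the column never expires."""
        self._columns[tweet_id] = (written_at, ttl_seconds)

    def live_tweet_ids(self, now):
        """Return tweet ids, in column-key order, that have not expired at `now`."""
        live = []
        for tweet_id, (written_at, ttl) in sorted(self._columns.items()):
            if ttl is None or now < written_at + ttl:
                live.append(tweet_id)
        return live


row = UserlineRow("paul")
row.add(tweet_id=1, written_at=100, ttl_seconds=60)   # expires at t=160
row.add(tweet_id=2, written_at=120)                   # no TTL
```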
Cassandra Data Model = LEGOs?
Flexible Schema
Summary:
● Go straight from SQL
to CQL3; skip Thrift, Column
Families, SuperColumns, etc.
● Denormalize tables to
mirror important queries.
Roughly 1 table per important query.
● Choose wisely:
○ Partition Keys
○ Cluster Keys
○ Indexes
○ TTL
○ Counters
○ Collections
See DataStax Music Service
Example
● Consider hybrid
approach:
○ 20% - RDBMS for highly
structured, OLTP, ACID
requirements.
○ 80% - Scale Cassandra to
handle the rest of data.
Remember:
● Cheap: storage,
servers, OpenSource
software.
● Precious: User AND
Developer Happiness.
Resources
C* Summit 2013:
● Slides
● Cassandra at eBay Scale (slides)
● Data Modelers Still Have Jobs - Adjusting for the NoSQL Environment (slides)
● Real-time Analytics using Cassandra, Spark and Shark (slides)
● Cassandra By Example: Data Modelling with CQL3 (slides)
● DataStax C*ollege Credit: Data Modelling for Apache Cassandra (slides)
I wish I had found these first:
● How do I Cassandra? (slides)
● Mobile version of DataStax web docs (link)

Cassandra NoSQL Tutorial

  • 1. Getting to know by Michelle Darling mdarlingcmt@gmail.com August 2013
  • 2. Agenda: ● What is Cassandra? ● Installation, CQL3 ● Data Modelling ● Summary Only 15 min to cover these, so please hold questions til the end, or email me :-) and I’ll summarize Q&A for everyone. Unfortunately, no time for: ● DB Admin ○ Detailed Architecture ○ Partitioning / Consistent Hashing ○ Consistency Tuning ○ Data Distribution & Replication ○ System Tables ● App Development ○ Using Python, Ruby etc to access Cassandra ○ Using Hadoop to stream data into Cassandra
  • 3. What is Cassandra? “Fortuneteller of Doom” from Greek Mythology. Tried to warn others about future disasters, but no one listened. Unfortunately, she was 100% accurate. NoSQL Distributed DB ● Consistency - A__ID ● Availability - High ● Point of Failure - none ● Good for Event Tracking & Analysis ○ Time series data ○ Sensor device data ○ Social media analytics ○ Risk Analysis ○ Failure Prediction Rackspace: “Which servers are under heavy load and are about to crash?”
  • 4. The Evolution of Cassandra 2008: Open-Source Release / 2013: Enterprise & Community Editions Data Model ● Wide rows, sparse arrays ● High performance through very fast write throughput. Infrastructure ● Peer-Peer Gossip ● Key-Value Pairs ● Tunable Consistency 2006 2005 ● Originally for Inbox Search ● But now used for Instagram
  • 5. Other NoSQL vs. Cassandra NoSQL Taxonomy: ● Key-Value Pairs ○ Dynamo, Riak, Redis ● Column-Based ○ BigTable, HBase, Cassandra ● Document-Based ○ MongoDB, Couchbase ● Graph ○ Neo4J Big Data Capable C* Differentiators: ● Production-proven at Netflix, eBay, Twitter, 20 of Fortune 100 ● “Clear Winner” in Scalability, Performance, Availability -- DataStax
  • 6. Architecture ● Cluster (ring) ● Nodes (circles) ● Peer-to-Peer Model ● Gossip Protocol Partitioner: Consistent Hashing
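The consistent-hashing idea on the architecture slide can be sketched in a few lines of Python. This is a toy model, not Cassandra's partitioner: the `Ring` class, the `token` helper, the node names, and the use of MD5 are all assumptions made for illustration. It shows the property that matters for a ring: when a node joins, the only keys that change owner are the ones the new node takes over.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each node owns the arc that ends at its token."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, row_key: str) -> str:
        # First node token at or after the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._tokens)
        return self._entries[i][1]


# Hypothetical node names; compare ownership before and after adding node-d.
ring3 = Ring(["node-a", "node-b", "node-c"])
ring4 = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user-{i}" for i in range(1000)]
moved = [k for k in keys if ring3.owner(k) != ring4.owner(k)]
```

Every key in `moved` ends up on `node-d`; keys owned by the other three nodes stay where they were, which is why a ring can grow without reshuffling all data.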
  • 7. Netflix Streaming Video ● Personalized Recommendations per family member ● Built on Amazon Web Services (AWS) + Cassandra
  • 8. Cloud installation using ● Amazon Web Services (AWS) ● Elastic Compute Cloud (EC2) ○ Free for the 1st year! Then pay only for what you use. ○ Sign up for AWS EC2 account: Big Data University Video (4:34 minutes) ● Amazon Machine Image (AMI) ○ Preconfigured installation template ○ Choose: “DataStax AMI for Cassandra Community Edition” ○ Follow these *very good* step-by-step instructions from DataStax. ○ AMIs also available for CouchBase, MongoDB (make sure you pick the free tier community versions to avoid monthly charge$$!!!).
  • 11. DataStax AMI Setup --clustername Michelle --totalnodes 1 --version community
  • 12. “Roll your Own” Installation DataStax Community Edition ● Install instructions for Linux, Windows, MacOS: http://www.datastax.com/2012/01/getting-started-with-cassandra ● Video: “Set up a 4-node Cassandra cluster in under 2 minutes” http://www.screenr.com/5G6
  • 13. Invoke CQLSH, CREATE KEYSPACE ./bin/cqlsh cqlsh> CREATE KEYSPACE big_data … WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE big_data; cqlsh:big_data>
  • 14. Tip: Skip Thrift -- use CQL3 Thrift RPC // Your Column Column col = new Column(ByteBuffer.wrap("name".getBytes())); col.setValue(ByteBuffer.wrap("value".getBytes())); col.setTimestamp(System.currentTimeMillis()); // Don't ask ColumnOrSuperColumn cosc = new ColumnOrSuperColumn(); cosc.setColumn(col); // Prepare to be amazed Mutation mutation = new Mutation(); mutation.setColumnOrSuperColumn(cosc); List<Mutation> mutations = new ArrayList<Mutation>(); mutations.add(mutation); Map mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>(); Map cf_map = new HashMap<String, List<Mutation>>(); cf_map.put("Standard1", mutations); mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map); cassandra.batch_mutate(mutations_map, consistency_level); CQL3 - Uses cqlsh - “SQL-like” language - Runs on top of Thrift RPC - Much more user-friendly. The Thrift code above is equivalent to this in CQL3: INSERT INTO Standard1 (id, name) VALUES ('key', 'value');
  • 15. CREATE TABLE cqlsh:big_data> CREATE TABLE user_tags ( … user_id varchar, … tag varchar, … value counter, … PRIMARY KEY (user_id, tag) … ); ● TABLE user_tags: “How many times has a user mentioned a hashtag?” ● COUNTER datatype - Computes & stores the counter value at the time data is written, so reads return the total without aggregating at query time.
  • 16. UPDATE TABLE, SELECT FROM TABLE cqlsh:big_data> UPDATE user_tags SET value = value + 1 WHERE user_id = 'paul' AND tag = 'cassandra'; cqlsh:big_data> SELECT * FROM user_tags; user_id | tag | value --------+-----------+---------- paul | cassandra | 1
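To make the counter table concrete without a running cluster, here is a minimal Python stand-in for `user_tags`. The `UserTags` class and its method names are hypothetical; only the table name, the column names, and the `paul` / `cassandra` row come from the slides. It models the compound primary key as a nested map (partition key `user_id`, then `tag`) and an increment as an in-place update.

```python
from collections import defaultdict


class UserTags:
    """Toy stand-in for user_tags: primary key (user_id, tag), counter value."""

    def __init__(self):
        # partition key (user_id) -> clustering key (tag) -> counter value
        self._rows = defaultdict(lambda: defaultdict(int))

    def increment(self, user_id: str, tag: str, delta: int = 1) -> None:
        # Mirrors: UPDATE user_tags SET value = value + 1
        #          WHERE user_id = ... AND tag = ...
        self._rows[user_id][tag] += delta

    def select_all(self):
        # Mirrors: SELECT * FROM user_tags (tags sorted within each user).
        return [
            (user_id, tag, value)
            for user_id, tags in self._rows.items()
            for tag, value in sorted(tags.items())
        ]


table = UserTags()
table.increment("paul", "cassandra")
rows = table.select_all()
```

As in the slide, the first increment creates the row; there is no separate INSERT step for a counter.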
  • 17. DATA MODELING A Major Paradigm Shift! RDBMS → Cassandra: ● Structured Data, Fixed Schema → Unstructured Data, Flexible Schema ● “Array of Arrays”, 2D: ROW x COLUMN → “Nested Key-Value Pairs”, 3D: ROW key x COLUMN key x COLUMN values ● DATABASE → KEYSPACE ● TABLE → TABLE a.k.a. COLUMN FAMILY ● ROW → ROW a.k.a. PARTITION. Unit of replication. ● COLUMN → COLUMN [Name, Value, Timestamp] a.k.a. CLUSTER. Unit of storage. Up to 2 billion columns per row. ● FOREIGN KEYS, JOINS, ACID Consistency → Referential Integrity not enforced, so A_CID. BUT relationships represented using COLLECTIONS.
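The “nested key-value pairs” picture can be sketched as a map of maps in Python. The `ColumnFamily` class below is a hypothetical illustration, not a Cassandra API: each row key maps to its own set of column keys, and each column holds a (value, timestamp) pair. Conflicting writes are resolved by timestamp, newest wins; letting the later call win on an exact tie is a simplification made here.

```python
import time


class ColumnFamily:
    """Toy column family: row key -> column key -> (value, timestamp)."""

    def __init__(self):
        self._rows = {}

    def write(self, row_key, column_key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        columns = self._rows.setdefault(row_key, {})
        current = columns.get(column_key)
        # Keep the write only if it is at least as new as what is stored.
        if current is None or ts >= current[1]:
            columns[column_key] = (value, ts)

    def read(self, row_key):
        # Columns come back sorted by column key; timestamps are dropped.
        columns = self._rows.get(row_key, {})
        return {key: value for key, (value, _) in sorted(columns.items())}


users = ColumnFamily()
# Sparse rows: each row carries only the columns that were written to it.
users.write("paul", "email", "paul@example.com", timestamp=1)
users.write("paul", "city", "Austin", timestamp=1)
users.write("mary", "email", "mary@example.com", timestamp=1)
# An older write arriving late does not overwrite a newer value.
users.write("paul", "city", "Boston", timestamp=3)
users.write("paul", "city", "Chicago", timestamp=2)
```

Here `users.read("paul")` has two columns while `users.read("mary")` has one, and the stale `Chicago` write is ignored.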
  • 19. Example: “Twissandra” Web App Twitter-Inspired sample application written in Python + Cassandra. ● Play with the app: twissandra.com ● Examine & learn from the code on GitHub. Features/Queries: ● Sign In, Sign Up ● Post Tweet ● Userline (User’s tweets) ● Timeline (All tweets) ● Following (Users being followed by user) ● Followers (Users following this user)
  • 21. Twissandra - RDBMS Version Entities ● USER, TWEET ● FOLLOWER, FOLLOWING ● FRIENDS Relationships: ● USER has many TWEETs. ● USER is a FOLLOWER of many USERs. ● Many USERs are FOLLOWING USER.
  • 22. Twissandra - Cassandra Version Tip: Model tables to mirror queries. TABLES or CFs ● TWEET ● USER, USERNAME ● FOLLOWERS, FOLLOWING ● USERLINE, TIMELINE Notes: ● Extra tables mirror queries. ● Denormalized tables are “pre-formed” for faster performance.
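A rough Python sketch of what “tables mirror queries” means in practice: the write path does extra work so that each read is a single lookup. The function names and the fan-out rule (copy each tweet id into every follower's timeline) are assumptions for illustration and are not taken from the Twissandra source; the dictionaries stand in for the TWEET, USERLINE, TIMELINE and FOLLOWERS tables.

```python
from collections import defaultdict
from itertools import count

tweets = {}                     # TWEET: tweet_id -> (username, body)
userline = defaultdict(list)    # USERLINE: username -> own tweet ids
timeline = defaultdict(list)    # TIMELINE: username -> tweet ids to show
followers = defaultdict(set)    # FOLLOWERS: username -> users following them
_next_id = count(1)


def follow(follower: str, followee: str) -> None:
    followers[followee].add(follower)


def post_tweet(username: str, body: str) -> int:
    """Write once to TWEET, then denormalize into the query tables."""
    tweet_id = next(_next_id)
    tweets[tweet_id] = (username, body)
    userline[username].append(tweet_id)
    for follower in followers[username]:
        timeline[follower].append(tweet_id)
    return tweet_id


def read_userline(username: str):
    """'List all of user's tweets', newest first: one lookup, no join."""
    return [tweets[tid][1] for tid in reversed(userline[username])]


def read_timeline(username: str):
    return [tweets[tid] for tid in reversed(timeline[username])]


follow("mary", "paul")
post_tweet("paul", "hello cassandra")
post_tweet("paul", "denormalize all the things")
```

The trade-off is the one the slide names: more writes and duplicated data in exchange for reads that never need a join.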
  • 24. What does C* data look like? TABLE Userline “List all of user’s Tweets” ************* Row Key: user_id Columns ● Column Key: tweet_id ● “at” Timestamp ● TTL (Time to Live) - seconds til expiration date. *************
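The Userline layout on this slide (row key `user_id`, one column per `tweet_id`, each with an “at” timestamp and an optional TTL) can be mimicked in Python as follows. The `Userline` class, its method names, and the explicit `now` argument are hypothetical choices that keep the example deterministic; Cassandra tracks expiry itself.

```python
class Userline:
    """Toy Userline row store: user_id -> {tweet_id: (at, expires_at)}."""

    def __init__(self):
        self._rows = {}

    def add(self, user_id, tweet_id, at, ttl=None):
        # ttl is in seconds; None means the column never expires.
        expires_at = None if ttl is None else at + ttl
        self._rows.setdefault(user_id, {})[tweet_id] = (at, expires_at)

    def tweets(self, user_id, now):
        # Return unexpired tweet ids, sorted by column key (tweet_id).
        columns = self._rows.get(user_id, {})
        return [
            tweet_id
            for tweet_id, (_, expires_at) in sorted(columns.items())
            if expires_at is None or now < expires_at
        ]


line = Userline()
line.add("paul", tweet_id=101, at=1000)           # no TTL
line.add("paul", tweet_id=102, at=1000, ttl=60)   # expires at t=1060
```

Reading at `now=1030` returns both tweets; reading at `now=1060` or later returns only tweet 101.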
  • 25. Cassandra Data Model = LEGOs? FlexibleSchema
  • 26. Summary: ● Go straight from SQL to CQL3; skip Thrift, Column Families, SuperColumns, etc. ● Denormalize tables to mirror important queries. Roughly 1 table per important query. ● Choose wisely: ○ Partition Keys ○ Cluster Keys ○ Indexes ○ TTL ○ Counters ○ Collections See DataStax Music Service Example ● Consider hybrid approach: ○ 20% - RDBMS for highly structured, OLTP, ACID requirements. ○ 80% - Scale Cassandra to handle the rest of data. Remember: ● Cheap: storage, servers, open-source software. ● Precious: User AND Developer Happiness.
  • 27. Resources C* Summit 2013: ● Slides ● Cassandra at eBay Scale (slides) ● Data Modelers Still Have Jobs - Adjusting For the NoSQL Environment (slides) ● Real-time Analytics using Cassandra, Spark and Shark (slides) ● Cassandra By Example: Data Modelling with CQL3 (slides) ● DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA (slides) I wish I found these 1st: ● How do I Cassandra? (slides) ● Mobile version of DataStax web docs (link)