Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Percona Live Santa Clara 2014
Robert Hodges
CEO
Continuent
Tim Callaghan
VP/Engineering
Tokutek
Who are we?
Robert Hodges
●  CEO at Continuent
●  Database nerd since 1982 starting with M204, RDBMS since 1990, NoSQL since 2012; designed Continuent Tungsten
●  Continuent offers clustering and replication for MySQL and other fine DBMS types
Tim Callaghan
●  VP/Engineering at Tokutek
●  Long-time database consumer (Oracle) and producer (VoltDB, Tokutek)
●  Tokutek offers Fractal Tree indexes in MySQL (TokuDB) and MongoDB (TokuMX)
Cassandra!
Cassandra, used by Netflix, eBay, Twitter, Reddit and many others, is one of the most popular NoSQL databases in use today. According to the project website, the largest known Cassandra deployment holds over 300 TB of data on over 400 machines.
High Performance Reads and Writes
Linear Scalability
High Availability
One Good Thing about Cassandra
●  Cassandra makes it easy to scale capacity
Existing cluster nodes running out of space?
Start new nodes and let them join;
stored data is redistributed automatically.
One Bad Thing about Cassandra
cql> create table foo (
id int primary key,
customerId int,
orderId int,
orderValue double);
OK!
<Me: I think I’m gonna like Cassandra.>
cql> create index idx_foo_1 on foo (customerId, orderId);
ERROR! - Secondary indexes only support 1 column
<Me: I think I just changed my mind. CQL != SQL>
<Me: (much later) secondary indexes aren’t that useful>
Today’s Question
How can you use your MySQL knowledge to get up to speed on Cassandra?
CQL
CQL is not SQL!
What is CQL?
●  Cassandra originally used a Thrift RPC-based API
●  CQL was added in 0.8
○  Simplifies access
○  Smaller learning curve for SQL users
●  So, you’ll feel right at home
○  Create table
○  Data types
○  Insert, update, select, delete
●  But Cassandra isn’t pretending to be relational...
What ISN’T CQL?
●  It’s familiar, and then it’s not!
○  No joins
○  No foreign keys
○  No “not null”
○  No aggregates: sum(), min(), GROUP BY, …
○  Some “ORDER BY” support
○  Single row UPDATE
○  Limited secondary indexing
○  INSERT == UPDATE
■  both are upserts, like REPLACE INTO, except that columns you don't write keep their old values
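To make the upsert behaviour concrete, here is a minimal Python model (not Cassandra code): a table is a dict from primary key to a dict of columns. The helper names cql_upsert and mysql_replace_into are hypothetical; the first stands in for both CQL INSERT and UPDATE, the second for MySQL's REPLACE INTO, which deletes the old row before inserting the new one.

```python
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[Any, Row]


def cql_upsert(table: Table, key: Any, **columns: Any) -> None:
    """Model of a Cassandra INSERT or UPDATE: an existing key is not an error,
    and only the columns named in the statement are overwritten."""
    table.setdefault(key, {}).update(columns)


def mysql_replace_into(table: Table, key: Any, **columns: Any) -> None:
    """Model of MySQL REPLACE INTO: delete any existing row, then insert a new
    one, so columns not named in the statement lose their old values."""
    table[key] = dict(columns)


cassandra: Table = {}
cql_upsert(cassandra, 1, customerId=10, orderValue=9.99)
cql_upsert(cassandra, 1, orderValue=19.99)  # same key again: no duplicate-key error

mysql: Table = {}
mysql_replace_into(mysql, 1, customerId=10, orderValue=9.99)
mysql_replace_into(mysql, 1, orderValue=19.99)

print(cassandra[1])  # {'customerId': 10, 'orderValue': 19.99}
print(mysql[1])      # {'orderValue': 19.99}
```

The second write in each pair names only orderValue. The upsert keeps customerId, while the REPLACE INTO model drops it.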
The Bottom Line on CQL
●  It’s just a language
●  It’s very similar to SQL
●  Don’t blame CQL
○  What first appears as a “limitation” in Cassandra can also be a strength
○  CQL enables us to get started quickly
○  Try Cassandra 0.7 for a little while…
Commit this to memory...
“You do not just throw data into Cassandra and add indexes later to make it fast.”
Schema Design
“Easy to learn and
difficult to master”
- Nolan Bushnell (founder of Atari and Chuck E. Cheese’s Pizza-Time)
Coming from a Relational World?
Tradeoffs are hard
[Table: RDBMS vs. Cassandra on Single Point of Failure, Cross Datacenter, Linear Scaling, Data Modeling]
*source = Patrick McFadin (@PatrickMcFadin)
How is My Data “Organized”?
Relational Model | Cassandra Model
Database | Keyspace
Table | Column Family
Primary Key | Row Key
Column Name | Column Name/Key
Column Value | Column Value
What is a BigTable?
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
[Diagram: a BigTable row: a row key followed by many columns, each holding a column name, column value, timestamp and TTL; up to 2 billion columns per row]
●  Cassandra uses it for the data model
●  Supports extremely wide rows
●  Row lookup is fast and easily distributed
●  Columns are sorted by name
What About Data Types?
●  Unlike some other NoSQL databases, types
are included via CQL
●  Usual suspects
○  ascii, bigint, blob, boolean, decimal, varchar, …
●  More interesting
○  uuid - global uniqueness
○  timeuuid - global uniqueness, sorted by time portion
○  inet - IPv4 or IPv6 address
○  varint - variable precision integer
How Do I Create a Static Table?
CREATE TABLE users (
user varchar,
email varchar,
state varchar,
PRIMARY KEY (user)
);
●  Primary key is hashed for location/placement (more on that later)
○  Meaning you cannot range scan by PK
●  No “not null” or varchar sizing (enforce in applications)
●  Reads (on username) and all writes are easily distributed
●  Schema flexibility without downtime
○  alter table users add password varchar;
[Diagram: one row per user; the email and state columns each carry their own timestamp and TTL]
These rows look and feel familiar.
No Auto-Increment?
●  Auto-increment is extremely difficult in a distributed system
●  Use a natural primary key, if possible
○  Remember, small primary keys aren’t important when denormalizing
●  Or, use uuid / timeuuid
○  Generated in your client applications
CREATE TABLE payments (
id timeuuid,
user varchar,
type varchar,
amount decimal,
PRIMARY KEY (id)
);
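Client-side generation of ids can be sketched with Python's standard uuid module: uuid1() produces version-1 (time-based) UUIDs, the format CQL's timeuuid type holds, and uuid4() produces random ones for a plain uuid column. The sort key below (embedded timestamp, then raw bytes) is an assumption that approximates Cassandra's timeuuid ordering rather than reproducing its exact comparator.

```python
import uuid
from typing import Iterable, List


def new_timeuuid() -> uuid.UUID:
    """Version-1 (time-based) UUID: the format a CQL timeuuid column holds."""
    return uuid.uuid1()


def new_uuid() -> uuid.UUID:
    """Version-4 (random) UUID: fine for a plain CQL uuid column."""
    return uuid.uuid4()


def sort_by_time(ids: Iterable[uuid.UUID]) -> List[uuid.UUID]:
    """Order version-1 UUIDs by their embedded 60-bit timestamp (100 ns ticks
    since 1582-10-15), breaking ties on the raw bytes. This approximates, but is
    not identical to, Cassandra's timeuuid comparator."""
    return sorted(ids, key=lambda u: (u.time, u.bytes))


payment_ids = [new_timeuuid() for _ in range(5)]  # generated client-side
shuffled = list(reversed(payment_ids))
for u in sort_by_time(shuffled):
    print(u, "version", u.version, "time", u.time)
```

A timeuuid gives you both uniqueness across clients and a natural time order. Use a random uuid when you need only uniqueness.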
What About Secondary Indexes?
CREATE TABLE users (
user varchar,
email varchar,
state varchar,
PRIMARY KEY (user));
-- OPTION 1 : create an index
CREATE INDEX idxUBS on users (state);
-- OPTION 2 : create another table (store data twice)
CREATE TABLE usersByState (
state varchar,
user varchar,
PRIMARY KEY (state, user));
Why Not Create Secondary Indexes?
select * from users where state = 'CA';
Secondary index
●  Pro : Index is automatically maintained
●  Con : Above query sent to entire cluster (slows everyone down)
Additional table
●  Pro : Above query is directed to single node (important for scalability)
●  Con : Index is manually maintained (insert data twice, which is OK)
General tips
●  Low selectivity is bad - (male/female)
●  Extremely high selectivity is bad - (unique)
●  In general, create an additional table rather than busying the entire cluster for reads
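The tradeoff can be modelled in a few lines of Python. This is a sketch, not Cassandra: node_for is a hypothetical placement function (hash modulo node count), and storage is a plain dict per node. A secondary-index query must ask every node because each node indexes only its own rows. The usersByState table is keyed by state, so a single node answers.

```python
import hashlib
from typing import Any, Dict, List, Tuple

NODES = ["db1", "db2", "db3", "db4"]


def node_for(partition_key: str) -> str:
    """Hypothetical placement: hash the partition key and pick one node."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# storage[node][(table, partition_key)] -> partition contents
storage: Dict[str, Dict[Tuple[str, str], Any]] = {n: {} for n in NODES}


def write_user(user: str, email: str, state: str) -> None:
    """Write the data twice: users is keyed by user, usersByState by state."""
    storage[node_for(user)][("users", user)] = {"email": email, "state": state}
    by_state = storage[node_for(state)].setdefault(("usersByState", state), set())
    by_state.add(user)


def query_secondary_index(state: str) -> Tuple[List[str], List[str]]:
    """Secondary index: each node only indexes its own rows, so every node is asked."""
    contacted = list(NODES)
    users = sorted(
        key[1]
        for node in contacted
        for key, row in storage[node].items()
        if key[0] == "users" and row["state"] == state
    )
    return users, contacted


def query_lookup_table(state: str) -> Tuple[List[str], List[str]]:
    """usersByState: state is the partition key, so a single node answers."""
    node = node_for(state)
    users = sorted(storage[node].get(("usersByState", state), set()))
    return users, [node]


for user, state in [("alice", "CA"), ("bob", "NY"), ("carol", "CA"), ("dave", "TX")]:
    write_user(user, f"{user}@example.com", state)

print(query_secondary_index("CA"))  # (['alice', 'carol'], ['db1', 'db2', 'db3', 'db4'])
print(query_lookup_table("CA")[0])  # ['alice', 'carol']
```

Both queries return the same users. The cost is paid on write (two inserts) instead of on every read (all nodes).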
How Do I Create a Dynamic Table?
CREATE TABLE payments (
user varchar,
id timeuuid,
type varchar,
amount decimal,
PRIMARY KEY (user, id)
);
- Compound PK creates wide row
- Remainder of columns are “grouped”
- Still 1 “row” per user
[Diagram: still one storage row per user; for each payment id (id1, id2, …) it holds type and amount cells, each with its own timestamp and TTL]
●  Affects how data is stored/organized
●  Still returned in “row” format to CQL
●  Enables “ORDER BY id” and range query on “id” for “user = ?” queries
●  Remember, BigTable rows support up to 2 billion columns
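A minimal Python sketch of why the compound key enables range queries: one partition per user keeps its clustering keys (the payment ids) sorted, so "user = ? AND id >= ? AND id < ?" becomes two binary searches. Integer ids stand in for timeuuids here, and the Partition class and insert_payment helper are illustrative only, not Cassandra's storage format.

```python
import bisect
from typing import Any, Dict, List, Tuple


class Partition:
    """One wide row: clustering keys kept sorted, each mapping to its columns."""

    def __init__(self) -> None:
        self.keys: List[Any] = []
        self.cells: Dict[Any, Dict[str, Any]] = {}

    def upsert(self, clustering_key: Any, **columns: Any) -> None:
        if clustering_key not in self.cells:
            bisect.insort(self.keys, clustering_key)  # keep clustering order
            self.cells[clustering_key] = {}
        self.cells[clustering_key].update(columns)

    def range(self, lo: Any, hi: Any) -> List[Tuple[Any, Dict[str, Any]]]:
        """Rows with lo <= clustering_key < hi, in clustering order."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        return [(k, self.cells[k]) for k in self.keys[start:end]]


# payments: partition key = user, clustering key = id
payments: Dict[str, Partition] = {}


def insert_payment(user: str, pid: int, type_: str, amount: float) -> None:
    payments.setdefault(user, Partition()).upsert(pid, type=type_, amount=amount)


insert_payment("tim", 3, "visa", 30.0)
insert_payment("tim", 1, "cash", 10.0)
insert_payment("tim", 2, "visa", 20.0)
insert_payment("robert", 1, "amex", 99.0)

print(payments["tim"].range(1, 3))
# [(1, {'type': 'cash', 'amount': 10.0}), (2, {'type': 'visa', 'amount': 20.0})]
```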
What About Relationships?
Relational
table emp (empName text PK, deptId int, ...);
index empIdx1 on emp (deptId);
table dept (deptId int PK, deptName text);
Cassandra
table emp (empName text PK, deptId int, deptName text);
table dept (
deptId int,
deptName text,
empName text,
PRIMARY KEY ((deptId, deptName), empName));
Store the department name with each employee.
Store employee names with the department (up to ~2 billion per row).
Example: (1, Accounting) → FrankJones, FredJones, SamSmith
What About Time Series Data?
CREATE TABLE dashboard (
dashboardId text,
event_time timestamp,
event_value double,
PRIMARY KEY (dashboardId, event_time))
WITH CLUSTERING ORDER BY (event_time DESC);
●  great for “last n”, like dashboards
●  “WITH CLUSTERING ORDER BY ()”
○  Data is stored in given order in BigTable row
○  In this case descending by event_time
○  Easy to access most recent data
How Do I Remove Time Series Data?
CREATE TABLE dashboard (
dashboardId text,
event_time timestamp,
event_value double,
PRIMARY KEY (dashboardId, event_time))
WITH CLUSTERING ORDER BY (event_time DESC);
insert into dashboard (dashboardId, event_time, event_value)
values ('25', dateOf(now()), 500.12) using ttl 86400;  -- on 2.2+ use toTimestamp(now())
●  TTL is defined at the “data” level (per write, in seconds; 86400 = one day)
●  data “magically” disappears when the TTL expires
●  no more cron jobs
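Per-write expiry can be sketched in Python as follows, assuming a simplified store where each cell remembers its write time and TTL, and reads skip cells whose TTL has elapsed. The TtlStore class and the injectable clock are illustrative assumptions. Real Cassandra also purges expired cells later, during compaction.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# (value, write time in seconds, ttl in seconds or None)
Cell = Tuple[Any, float, Optional[int]]


class TtlStore:
    """Per-write TTL sketch: expired cells are not returned, as if deleted."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.cells: Dict[Tuple[str, str], Cell] = {}

    def insert(self, key: str, column: str, value: Any, ttl: Optional[int] = None) -> None:
        self.cells[(key, column)] = (value, self.clock(), ttl)

    def get(self, key: str, column: str) -> Any:
        cell = self.cells.get((key, column))
        if cell is None:
            return None
        value, written, ttl = cell
        if ttl is not None and self.clock() >= written + ttl:
            return None  # expired: nothing returned, no cron job needed
        return value


fake_now = [0.0]  # controllable clock for the example
store = TtlStore(clock=lambda: fake_now[0])
store.insert("25", "event_value", 500.12, ttl=86400)  # USING TTL 86400 (one day)
print(store.get("25", "event_value"))  # 500.12
fake_now[0] = 86400.0
print(store.get("25", "event_value"))  # None
```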
Are There Other Cool Schemas?
●  collections : sets, lists, maps
○  an alternative to making the row “wide”
○  set<text> : ordered by CQL Type comparator
○  list<text> : ordered by insertion
○  map<int,text> : ordered by key
○  support for 64,000 objects per collection (but don’t go crazy)
create table users (
user varchar,
emails set<text>,
PRIMARY KEY (user));
insert into users (user, emails) values ('tmcallaghan',
{'tim@tokutek.com', 'timc@tokutek.com'});
Topic:
Transactions
MySQL Transactions and Isolation
mysql> BEGIN;
...
mysql> INSERT INTO sample(data) VALUES ('Hello world!');
mysql> INSERT INTO sample(data) VALUES ('Goodbye world!');
...
mysql> COMMIT;
InnoDB creates an MVCC view of the data, locks updated rows, and commits atomically.
MyISAM locks the table and commits each row immediately.
Does Cassandra Have Transactions?
“Transactions” includes a lot of things so…
No:
●  Updates on different rows are separate, like MyISAM
●  Failed transactions on replicas might create partial writes (Cassandra does not guarantee clean-up)
Yes:
●  Updates to single rows are atomic and isolated
●  Updates to rows are durable (logged) as well
(Important: Cassandra rows can be “bigger” than MySQL)
How Does Cassandra Handle Locks?
Cassandra uses timestamps instead
create columnfamily sample(id int primary key, data varchar, n int);
insert into sample(id, data, n) values(1, 'hello', 25);
insert into sample(id, data, n) values(2, 'goodbye', 27);
cqlsh:cbench> update sample set data='goodbye!' where id=2;
cqlsh:cbench> select id, writetime(data),writetime(n) from sample;
id | writetime(data) | writetime(n)
----+------------------+------------------
1 | 1396326015674000 | 1396326015674000 ⇐ Updated together
2 | 1396326144783000 | 1396326027160000 ⇐ Updated separately
Timestamps are the same for columns updated at the same time
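The writetime() values above are client-supplied timestamps, conventionally microseconds since the Unix epoch. A small Python helper (hypothetical name writetime_to_datetime) converts them to UTC using integer arithmetic, which makes the "updated together" and "updated separately" rows easy to read.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def writetime_to_datetime(micros: int) -> datetime:
    """Convert a WRITETIME() value (microseconds since the Unix epoch) to UTC.
    Integer microseconds avoid float rounding."""
    return EPOCH + timedelta(microseconds=micros)


# Values from the cqlsh output above
for label, wt in [
    ("row 1, data and n", 1396326015674000),
    ("row 2, n", 1396326027160000),
    ("row 2, data", 1396326144783000),
]:
    print(label, writetime_to_datetime(wt).isoformat())
```

All three fall on 2014-04-01 (UTC): row 1 was written at 04:20:15.674, row 2 was inserted at 04:20:27.160, and its data column was updated alone at 04:22:24.783.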
How Does Cassandra Handle Isolation?
●  Row updates are completely isolated until they are completed
●  Updates can propagate out to replicas at different times
Topic:
Replication and HA
Review of MySQL Replication
How Does Cassandra Replication Work?
●  Cassandra is fully multi-master
●  Updates are allowed at any location
●  Updates can happen even when there is a network partition
[Diagram: a client program issues a write to one of four nodes (db1 to db4); that node acts as coordinator and proxies the write to the other instances]
How Many Replicas Are There?
The number of replicas and how they are
distributed are properties of keyspaces
CREATE KEYSPACE cbench WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '3'
};
●  replication_factor: keep 3 copies of data
●  class: strategy class used to distribute replicas
MySQL Partitioning
●  MySQL partitioning breaks a table into <n> tables
○  “PARTITION” is actually a storage engine
●  Tables can be partitioned by hash or range
○  Hash = random distribution
○  Range = user-controlled distribution (e.g., date ranges)
●  Helpful in “big data” use cases
●  Partitions can usually be dropped efficiently
○  Unlike “DELETE FROM table1 WHERE timeField < '2012-12-31';”
How Does Cassandra Partition Data?
Cassandra splits data by key across hosts
using a “ring” to assign locations
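[Diagram: a ring of eight virtual nodes A to H. Each host stores six consecutive ranges: db1 = ABCDEF, db2 = CDEFGH, db3 = EFGHAB, db4 = GHABCD. Virtual node D gets ⅛ of the hash range; one of its copies is assigned to host db4 by the strategy.]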
Partitioning in Action
Replica placement is based on primary key
insert into sample(id, data) values
(550e8400-e29b-41d4-a716-446655440000,
'hello!');
The primary key value is hashed; the hash assigns the row to a virtual node, and from there to actual hosts.
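The hash, vnode and host chain can be sketched in Python, with two stated assumptions: MD5 truncated to 64 bits stands in for the Murmur3 hash that Cassandra's default partitioner uses, and each host gets only 2 virtual nodes. Replica selection follows SimpleStrategy's idea. The first vnode at or after the key's token owns the row, and the walk continues clockwise until replication_factor distinct hosts are found.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Hash a key to a 64-bit token (MD5 stand-in for Cassandra's Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, hosts: List[str], vnodes_per_host: int = 2) -> None:
        # Each host claims several tokens (virtual nodes), sorted around the ring.
        self.ring: List[Tuple[int, str]] = sorted(
            (token(f"{host}#vnode{i}"), host)
            for host in hosts
            for i in range(vnodes_per_host)
        )
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, rf: int) -> List[str]:
        """SimpleStrategy-style placement: the first vnode at or after the key's
        token owns it; keep walking clockwise until rf distinct hosts are found."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        hosts: List[str] = []
        for step in range(len(self.ring)):
            host = self.ring[(start + step) % len(self.ring)][1]
            if host not in hosts:
                hosts.append(host)
                if len(hosts) == rf:
                    break
        return hosts


ring = Ring(["db1", "db2", "db3", "db4"])
key = "550e8400-e29b-41d4-a716-446655440000"  # the primary key from the slide
print(ring.replicas(key, rf=3))  # three distinct hosts
```

Adding a host only inserts new tokens into the ring, so only the ranges next to those tokens move. That is why new nodes can join without reshuffling all the data.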
What About Conflicts?
●  Any client can update rows from any location
●  Cassandra resolves conflicts using
timestamps
●  Conflicts are resolved at the cell level
●  The latest timestamp value wins
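Example: two clients update row 352 at about the same time:
UPDATE sample SET data = 'hello', age = 35 WHERE id = 352;  -- earlier timestamp: age WINS, data LOSES
UPDATE sample SET data = 'bye' WHERE id = 352;              -- later timestamp: data WINS
Result:
Id  | Data  | Age
352 | 'bye' | 35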
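Cell-level last-write-wins for the row 352 example can be sketched in Python. The timestamps 1000 and 2000 are hypothetical, and apply_update keeps the existing cell on an exact tie, which is a simplification (Cassandra has its own tie-break rule). Replaying the two updates in either order gives the same row, which is what lets replicas converge without locks.

```python
from typing import Any, Dict, List, Tuple

Cells = Dict[str, Tuple[Any, int]]  # column -> (value, write timestamp)


def apply_update(row: Cells, timestamp: int, **columns: Any) -> None:
    """Last write wins, per cell: overwrite only if the new write is newer."""
    for column, value in columns.items():
        current = row.get(column)
        if current is None or timestamp > current[1]:
            row[column] = (value, timestamp)


def replay(updates: List[Tuple[int, Dict[str, Any]]]) -> Dict[str, Any]:
    """Apply updates in the given arrival order and return visible values."""
    row: Cells = {}
    for ts, cols in updates:
        apply_update(row, ts, **cols)
    return {column: value for column, (value, _) in row.items()}


u1 = (1000, {"data": "hello", "age": 35})  # earlier write
u2 = (2000, {"data": "bye"})               # later write

print(replay([u1, u2]))  # {'data': 'bye', 'age': 35}
print(replay([u2, u1]))  # {'data': 'bye', 'age': 35}
```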
So Is This Like Galera?
●  Galera uses ACID transactions with
optimistic locking
●  It would accept the first transaction and roll
back the second completely
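Same two updates under Galera, assuming the 'hello' transaction commits first:
UPDATE sample SET data = 'hello', age = 35 WHERE id = 352;  -- commits: both changes WIN
UPDATE sample SET data = 'bye' WHERE id = 352;              -- rolled back: LOSES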
How Does “Failover” Work?
●  Cassandra does not have failover
●  If a node fails or disappears, others continue
●  Writes or reads may stop if too many replicas fail
[Diagram: the client's write goes to a coordinator, which proxies it to the surviving nodes db1, db2 and db4; the failed node is marked X]
Tunable Consistency
Clients define the level of consistency
cqlsh:cbench> consistency all
Consistency level set to ALL.
cqlsh:cbench> update rtest set k1=4444 where id=3;
Unable to complete request: one or more nodes were unavailable.
cqlsh:cbench> select * from cbench.rtest where id=3;
Unable to complete request: one or more nodes were unavailable.
cqlsh:cbench> consistency quorum
Consistency level set to QUORUM.
cqlsh:cbench> update rtest set k1=4444 where id=3;
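The session's outcome follows from simple arithmetic. This Python sketch assumes rtest lives in the cbench keyspace (replication_factor 3) with one of its three replicas down. ALL needs every replica, while QUORUM needs a majority (RF // 2 + 1). The helper names are illustrative, not a driver API.

```python
def required_replicas(level: str, rf: int) -> int:
    """Replica acknowledgements a request needs at a few common levels."""
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return needed[level.upper()]


def can_succeed(level: str, rf: int, live_replicas: int) -> bool:
    """A request fails with 'nodes were unavailable' if too few replicas are up."""
    return live_replicas >= required_replicas(level, rf)


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """R + W > RF means every read replica set overlaps every write replica set."""
    return required_replicas(read_level, rf) + required_replicas(write_level, rf) > rf


for level in ("ALL", "QUORUM", "ONE"):
    print(level, can_succeed(level, rf=3, live_replicas=2))
# ALL False
# QUORUM True
# ONE True
```

Writing and reading at QUORUM (2 + 2 > 3) keeps reads consistent while tolerating one failed replica. ONE/ONE is faster but can return stale data.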
What Happens To Failed Writes?
●  Cassandra has several repair mechanisms for failures
●  Hinted Handoff - the coordinator remembers the write and replays it when the node returns
●  Read Repair - the coordinator for a later read notices that a value is out of sync and fixes it
●  Node Repair - run an external process (nodetool repair) to scan for and fix inconsistencies
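Read repair can be sketched in Python as follows. The coordinator reads a column from every replica, returns the newest version by timestamp, and writes that version back to any stale replica. This is a simplification: real Cassandra compares digests from the replicas it contacts, and how often it repairs depends on version and consistency level. The replica contents and timestamps below are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Cell = Tuple[Any, int]     # (value, write timestamp)
Replica = Dict[str, Cell]  # column name -> cell


def read_with_repair(replicas: List[Replica], column: str) -> Tuple[Optional[Any], int]:
    """Return (newest value, number of replicas repaired) for one column."""
    versions = [r[column] for r in replicas if column in r]
    if not versions:
        return None, 0
    newest = max(versions, key=lambda cell: cell[1])  # last write wins
    repaired = 0
    for r in replicas:
        if r.get(column) != newest:
            r[column] = newest  # push the newest version to the stale replica
            repaired += 1
    return newest[0], repaired


# Hypothetical state after "update rtest set k1=4444 where id=3" at QUORUM
# while db3 was down: db3 still holds the old value.
db1: Replica = {"k1": (4444, 2000)}
db2: Replica = {"k1": (4444, 2000)}
db3: Replica = {"k1": (1111, 1000)}

print(read_with_repair([db1, db2, db3], "k1"))  # (4444, 1)
print(db3)                                      # {'k1': (4444, 2000)}
```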
What Else is There to Learn?
“A lot” would be an understatement, but here are some topics to consider.
●  You need to “rethink” your data model, so read up, practice, repeat
●  Data consistency is an application problem
●  Storage/Internals : LSM trees, Compaction, Tombstones, Bloom Filters
What Should You Do?
Summary 1
We liked…
●  CQL + “Tables” made it easy to get started
●  HA
●  Scaling model
Look [out] for…
●  Don’t bother trying to “port” your MySQL application
●  Query patterns are critical, model around them
●  Pay attention to version when reading docs/blogs
○  Cassandra is quickly evolving (0.7, 1.0, 1.2, 2.0, …)
Summary 2
Highly recommended reading
●  Book: “Practical Cassandra: A Developer’s Approach”
●  DataStax has great docs, http://www.datastax.com/docs
●  Presentations
○  http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
●  Blogs
○  http://planetcassandra.org
○  http://ebaytechblog.com
Questions?
Robert Hodges
CEO, Continuent
robert.hodges@continuent.com
@continuent
Tim Callaghan
VP/Engineering, Tokutek
tim@tokutek.com
@tmcallaghan
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Kürzlich hochgeladen (20)

A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Use Your MySQL Knowledge to Become an Instant Cassandra Guru

  • 1. Use Your MySQL Knowledge to Become an Instant Cassandra Guru
    Percona Live Santa Clara 2014
    Robert Hodges, CEO, Continuent
    Tim Callaghan, VP/Engineering, Tokutek
  • 2. Who are we?
    Robert Hodges
    ● CEO at Continuent
    ● Database nerd since 1982 starting with M204, RDBMS since 1990, NoSQL since 2012; designed Continuent Tungsten
    ● Continuent offers clustering and replication for MySQL and other fine DBMS types
    Tim Callaghan
    ● VP/Engineering at Tokutek
    ● Long-time database consumer (Oracle) and producer (VoltDB, Tokutek)
    ● Tokutek offers Fractal Tree indexes in MySQL (TokuDB) and MongoDB (TokuMX)
  • 3. Cassandra!
    Cassandra, used by Netflix, eBay, Twitter, Reddit and many others, is one of today's most popular NoSQL databases. According to the Cassandra website, the largest known Cassandra deployment holds over 300 TB of data on over 400 machines.
    ● High-performance reads and writes
    ● Linear scalability
    ● High availability
  • 4. One Good Thing about Cassandra
    ● Cassandra makes it easy to scale capacity: when the existing cluster nodes run out of space, start new nodes and let them join, and the stored data redistributes automatically.
  • 5. One Bad Thing about Cassandra
        cql> create table foo (
               id int primary key,
               customerId int,
               orderId int,
               orderValue double);
        OK!
    <Me: I think I’m gonna like Cassandra.>
        cql> create index idx_foo_1 on foo (customerId, orderId);
        ERROR! - Secondary indexes only support 1 column
    <Me: I think I just changed my mind. CQL != SQL>
    <Me: (much later) secondary indexes aren’t that useful>
  • 6. Today’s Question How can you use your MySQL knowledge to get up to speed on Cassandra?
  • 8. What is CQL?
    ● Cassandra originally used a Thrift RPC-based API
    ● CQL was added in 0.8
      ○ Simplifies access
      ○ Smaller learning curve for SQL users
    ● So, you’ll feel right at home
      ○ Create table
      ○ Data types
      ○ Insert, update, select, delete
    ● But Cassandra isn’t pretending to be relational...
  • 9. What ISN’T CQL?
    ● It’s familiar, and then it’s not!
      ○ No joins
      ○ No foreign keys
      ○ No "not null"
      ○ No aggregates such as sum() or min(), and no GROUP BY
      ○ Limited "ORDER BY" support
      ○ Single-row UPDATE
      ○ Limited secondary indexing
      ○ INSERT == UPDATE
        ■ both are upserts, much like MySQL's REPLACE INTO, except that columns you don't write keep their old values
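The "INSERT == UPDATE" point can be illustrated with a small Python model. Each table is a dict keyed by primary key. This is a simplified sketch and not Cassandra's storage code. The function names `cql_upsert` and `mysql_replace_into` are hypothetical. The sketch shows the one place where the REPLACE INTO analogy breaks down: an upsert leaves unwritten columns alone.

```python
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[Any, Row]


def cql_upsert(table: Table, key: Any, values: Row) -> None:
    """INSERT or UPDATE: merge the given columns into the row, creating it if absent."""
    table.setdefault(key, {}).update(values)


def mysql_replace_into(table: Table, key: Any, values: Row) -> None:
    """REPLACE INTO: drop any existing row, then store only the given columns
    (unlisted columns would fall back to their defaults; modelled here as absent)."""
    table[key] = dict(values)


cql: Table = {}
cql_upsert(cql, 1, {"customerId": 10, "orderValue": 9.5})
cql_upsert(cql, 1, {"orderValue": 12.0})  # existing key: no duplicate-key error
print(cql[1])    # {'customerId': 10, 'orderValue': 12.0}

mysql: Table = {}
mysql_replace_into(mysql, 1, {"customerId": 10, "orderValue": 9.5})
mysql_replace_into(mysql, 1, {"orderValue": 12.0})
print(mysql[1])  # {'orderValue': 12.0}
```

Neither call ever fails because a key already exists. That is why CQL needs no separate "ON DUPLICATE KEY" form.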
  • 10. The Bottom Line on CQL
    ● It’s just a language
    ● It’s very similar to SQL
    ● Don’t blame CQL
      ○ What first appears as a "limitation" in Cassandra can also be a strength
      ○ CQL enables us to get started quickly
      ○ Try Cassandra 0.7 (pre-CQL) for a little while…
    Commit this to memory...
    "You do not just throw data into Cassandra and add indexes later to make it fast."
  • 11. Schema Design “Easy to learn and difficult to master” - Nolan Bushnell (founder of Atari and Chuck E. Cheese’s Pizza-Time)
  • 12. Coming from a Relational World?
    Tradeoffs are hard.
    [Comparison table, RDBMS vs. Cassandra, for: Single Point of Failure, Cross Datacenter, Linear Scaling, Data Modeling]
    *Source: Patrick McFadin (@PatrickMcFadin)
  • 13. How is My Data "Organized"?
    Relational Model | Cassandra Model
    Database         | Keyspace
    Table            | Column Family
    Primary Key      | Row Key
    Column Name      | Column Name/Key
    Column Value     | Column Value
  • 14. What is a BigTable?
    http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
    [Diagram: a Row Key followed by many columns, each holding Column Name, Column Value, Timestamp and TTL; up to 2 billion columns per row]
    ● Cassandra uses it for the data model
    ● Supports extremely wide rows
    ● Row lookup is fast and easily distributed
    ● Columns are sorted by name
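A BigTable-style row can be pictured as a row key plus a name-sorted collection of cells. Each cell carries a value, a timestamp and an optional TTL. The Python sketch below is an in-memory illustration only. The class names `Cell` and `WideRow` are hypothetical. It keeps column names sorted with `bisect`, so a slice over a name range is a contiguous scan.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Cell:
    value: str
    timestamp: int              # write time, e.g. microseconds since the epoch
    ttl: Optional[int] = None   # seconds until expiry, or None for no expiry


@dataclass
class WideRow:
    row_key: str
    names: List[str] = field(default_factory=list)        # column names, kept sorted
    cells: Dict[str, Cell] = field(default_factory=dict)  # column name -> cell

    def put(self, name: str, cell: Cell) -> None:
        """Write one column; a new name is inserted at its sorted position."""
        if name not in self.cells:
            bisect.insort(self.names, name)
        self.cells[name] = cell

    def slice(self, start: str, end: str) -> List[Tuple[str, Cell]]:
        """Return the columns with start <= name < end, in name order."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_left(self.names, end)
        return [(n, self.cells[n]) for n in self.names[lo:hi]]


row = WideRow("user1")
row.put("state", Cell("CA", 1396326015674000))
row.put("email", Cell("user1@example.com", 1396326015674000, ttl=86400))
print(row.names)                                  # ['email', 'state']
print([name for name, _ in row.slice("a", "f")])  # ['email']
```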
  • 15. What About Data Types?
    ● Unlike some other NoSQL databases, Cassandra has column types, declared via CQL
    ● Usual suspects
      ○ ascii, bigint, blob, boolean, decimal, varchar, …
    ● More interesting
      ○ uuid - globally unique
      ○ timeuuid - globally unique, sorted by its time portion
      ○ inet - IPv4 or IPv6 address
      ○ varint - arbitrary-precision integer
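A timeuuid is a version-1 UUID, which embeds a 60-bit timestamp. "Sorted by its time portion" matters because the usual text form puts the low-order time bits first, so ordering the strings can disagree with time order. The Python sketch below builds UUIDs from chosen timestamps with the standard `uuid` module. The helper names `make_timeuuid` and `sort_by_time` are hypothetical, and the node value is arbitrary.

```python
import uuid
from typing import Iterable, List


def make_timeuuid(ts: int, clock_seq: int = 0, node: int = 0x010203040506) -> uuid.UUID:
    """Build a version-1 UUID from a 60-bit timestamp (100-ns ticks since 1582-10-15)."""
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                    # time_low
        (ts >> 32) & 0xFFFF,                # time_mid
        ((ts >> 48) & 0x0FFF) | (1 << 12),  # time_hi plus version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,   # clock_seq_hi plus RFC 4122 variant
        clock_seq & 0xFF,                   # clock_seq_low
        node,                               # 48-bit node id
    ))


def sort_by_time(uuids: Iterable[uuid.UUID]) -> List[uuid.UUID]:
    """Order version-1 UUIDs by their embedded timestamp (UUID.time)."""
    return sorted(uuids, key=lambda u: u.time)


later = make_timeuuid(0x100000005)  # larger timestamp, but small time_low
earlier = make_timeuuid(0x9)        # smaller timestamp, but larger time_low
print(sorted([later, earlier], key=str) == [later, earlier])  # True: text order misleads
print(sort_by_time([later, earlier]) == [earlier, later])     # True: time order
```

In a client application, `uuid.uuid1()` generates this kind of time-based UUID. That is one way to fill a timeuuid key such as `payments.id` without auto-increment.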
  • 16. How Do I Create a Static Table?
        CREATE TABLE users (
          user varchar,
          email varchar,
          state varchar,
          PRIMARY KEY (user)
        );
    ● The primary key is hashed to determine location/placement (more on that later)
      ○ Meaning you cannot range scan by PK
    ● No "not null" or varchar sizing (enforce these in applications)
    ● Reads (on user) and all writes are easily distributed
    ● Schema flexibility without downtime
      ○ ALTER TABLE users ADD password varchar;
    [Diagram: each row stores user, then email and state cells, each with its own timestamp and TTL]
    These rows look and feel familiar.
  • 17. No Auto-Increment?
    ● Auto-increment is extremely difficult in a distributed system
    ● Use a natural primary key, if possible
      ○ Remember, small primary keys aren’t important when denormalizing
    ● Or, use uuid / timeuuid
      ○ Generated in your client applications
        CREATE TABLE payments (
          id timeuuid,
          user varchar,
          type varchar,
          amount decimal,
          PRIMARY KEY (id)
        );
  • 18. What About Secondary Indexes?
        CREATE TABLE users (
          user varchar, email varchar, state varchar,
          PRIMARY KEY (user));
        -- OPTION 1 : create an index
        CREATE INDEX idxUBS ON users (state);
        -- OPTION 2 : create another table (store data twice)
        CREATE TABLE usersByState (
          state varchar, user varchar,
          PRIMARY KEY (state, user));
  • 19. Why Not Create Secondary Indexes?
        SELECT * FROM users WHERE state = 'CA';
    Secondary index
    ● Pro: the index is maintained automatically
    ● Con: the query above is sent to the entire cluster (slows everyone down)
    Additional table
    ● Pro: the query above is directed to a single node (important for scalability)
    ● Con: the index is maintained manually (insert data twice, which is OK)
    General tips
    ● Low selectivity is bad (male/female)
    ● Extremely high selectivity is bad (unique)
    ● In general, create an additional table rather than keeping the entire cluster busy with reads
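"Index is manually maintained" means the application writes every change twice: once to `users` and once to `usersByState`. The Python sketch below models both tables as in-memory dicts. It is an illustration under that assumption, not a driver example. The function names `save_user` and `users_in_state` and the sample users are hypothetical. When a user changes state, the application must also remove the stale index entry.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set

users: Dict[str, Dict[str, str]] = {}                          # users, PRIMARY KEY (user)
users_by_state: DefaultDict[str, Set[str]] = defaultdict(set)  # usersByState, PRIMARY KEY (state, user)


def save_user(user: str, email: str, state: str) -> None:
    """Write the base row and the "index" row; the application keeps them in step."""
    old = users.get(user)
    if old is not None and old["state"] != state:
        users_by_state[old["state"]].discard(user)  # remove the stale index entry
    users[user] = {"email": email, "state": state}
    users_by_state[state].add(user)


def users_in_state(state: str) -> List[str]:
    """Models SELECT user FROM usersByState WHERE state = ?, a single-partition read."""
    return sorted(users_by_state.get(state, set()))


save_user("alice", "alice@example.com", "MA")
save_user("bob", "bob@example.com", "CA")
save_user("alice", "alice@example.com", "CA")  # alice moves: both tables are updated
print(users_in_state("CA"))  # ['alice', 'bob']
print(users_in_state("MA"))  # []
```

Handling a move means reading the old row before writing. Against a real cluster, that read is an extra query the application pays for.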
  • 20. How Do I Create a Dynamic Table?
        CREATE TABLE payments (
          user varchar,
          id timeuuid,
          type varchar,
          amount decimal,
          PRIMARY KEY (user, id)
        );
    - The compound PK creates a wide row
    - The remaining columns are "grouped" under each id
    - Still one storage "row" per user
    [Diagram: user → id1: type, amount | id2: type, amount; each cell with its own timestamp and TTL]
    ● Affects how data is stored/organized
    ● Still returned in "row" format to CQL
    ● Enables "ORDER BY id" and range queries on "id" for "user = ?" queries
    ● Remember, BigTables support up to 2 billion columns
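The compound key `PRIMARY KEY (user, id)` turns each CQL row into a group of cells inside one partition per user. A read regroups those cells into rows in `id` order. The Python sketch below models that mapping in memory. Plain integers stand in for timeuuid values (an assumption for readability), and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# partition key (user) -> {(clustering key id, column name): value}
Partitions = Dict[str, Dict[Tuple[int, str], object]]


def insert_payment(store: Partitions, user: str, pid: int, ptype: str, amount: float) -> None:
    """One CQL row becomes several cells inside the user's partition."""
    part = store.setdefault(user, {})
    part[(pid, "type")] = ptype
    part[(pid, "amount")] = amount


def select_payments(store: Partitions, user: str, id_from: int, id_to: int) -> List[dict]:
    """Like SELECT * FROM payments WHERE user = ? AND id >= ? AND id < ?,
    regrouping the cells back into rows ordered by id."""
    rows: Dict[int, dict] = {}
    for pid, col in sorted(store.get(user, {})):
        if id_from <= pid < id_to:
            rows.setdefault(pid, {"user": user, "id": pid})[col] = store[user][(pid, col)]
    return [rows[pid] for pid in sorted(rows)]


store: Partitions = {}
insert_payment(store, "alice", 1, "card", 10.0)
insert_payment(store, "alice", 2, "cash", 5.0)
insert_payment(store, "alice", 3, "card", 7.5)
insert_payment(store, "bob", 1, "cash", 1.0)
print(len(store))  # 2: one partition per user
print(select_payments(store, "alice", 2, 4))
```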
  • 21. What About Relationships?
    Relational
        table emp (empName text PK, deptId int, ...);
        index empIdx1 on emp (deptId);
        table dept (deptId int PK, deptName text);
    Cassandra
        table emp (empName text PK, deptId int, deptName text);
        table dept (
          deptId int,
          deptName text,
          empName text,
          PRIMARY KEY ((deptId, deptName), empName));
    ● Store the department name with each employee
    ● Store employee names with the department (up to ~2 billion)
    [Diagram: partition (1, Accounting) → FrankJones, FredJones, SamSmith]
  • 22. What About Time Series Data?
        CREATE TABLE dashboard (
          dashboardId text,
          event_time timestamp,
          event_value double,
          PRIMARY KEY (dashboardId, event_time))
        WITH CLUSTERING ORDER BY (event_time DESC);
    ● Great for "last n" queries, like dashboards
    ● "WITH CLUSTERING ORDER BY ()"
      ○ Data is stored in the given order in the BigTable row
      ○ In this case, descending by event_time
      ○ Easy to access the most recent data
  • 23. How Do I Remove Time Series Data?
        CREATE TABLE dashboard (
          dashboardId text,
          event_time timestamp,
          event_value double,
          PRIMARY KEY (dashboardId, event_time))
        WITH CLUSTERING ORDER BY (event_time DESC);
        INSERT INTO dashboard (dashboardId, event_time, event_value)
          VALUES ('25', now(), 500.12) USING TTL 86400;
    ● TTL is defined at the "data" level (per write, in seconds; 86400 is one day)
    ● Data "magically" disappears once its TTL expires
    ● No more cron jobs
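The two dashboard slides combine two ideas: events stored newest-first within a partition, and each write expiring after its TTL. The Python sketch below models one `dashboardId` partition in memory. The class name `DashboardPartition` is hypothetical, and time is passed in explicitly so the behaviour is reproducible. It models only read-time filtering of expired data. Cassandra also purges expired cells later, during compaction.

```python
import time
from typing import List, Optional, Tuple


class DashboardPartition:
    """One dashboardId partition, newest-first like CLUSTERING ORDER BY (event_time DESC)."""

    def __init__(self) -> None:
        self.events: List[Tuple[float, float, Optional[float]]] = []  # (event_time, value, expires_at)

    def insert(self, event_time: float, value: float,
               ttl: Optional[float] = None, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl  # USING TTL <seconds>
        self.events.append((event_time, value, expires_at))
        self.events.sort(key=lambda e: e[0], reverse=True)

    def last_n(self, n: int, now: Optional[float] = None) -> List[Tuple[float, float]]:
        """Newest n live events; expired ones are skipped at read time."""
        now = time.time() if now is None else now
        live = [(t, v) for t, v, exp in self.events if exp is None or exp > now]
        return live[:n]


d = DashboardPartition()
for t, v in [(100, 1.0), (200, 2.0), (300, 3.0)]:
    d.insert(t, v, ttl=86400, now=t)
print(d.last_n(2, now=300))    # [(300, 3.0), (200, 2.0)]
print(d.last_n(5, now=86500))  # [(300, 3.0), (200, 2.0)]: the t=100 event has expired
```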
  • 24. Are There Other Cool Schemas?
    ● Collections: sets, lists, maps
      ○ An alternative to making the row "wide"
      ○ set<text>: ordered by the CQL type's comparator
      ○ list<text>: ordered by insertion
      ○ map<int,text>: ordered by key
      ○ Support for 64,000 objects per collection (but don’t go crazy)
        CREATE TABLE users (
          user varchar,
          emails set<text>,
          PRIMARY KEY (user));
        INSERT INTO users (user, emails)
          VALUES ('tmcallaghan', {'tim@tokutek.com', 'timc@tokutek.com'});
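The ordering rules for the three collection types can be summarised with plain Python equivalents. This is a behavioural sketch of what a read returns, not how Cassandra stores collections. The function names are hypothetical. Python's string ordering matches byte order for ASCII text, which is assumed here.

```python
from typing import Dict, Iterable, List, Tuple


def as_cql_set(values: Iterable[str]) -> List[str]:
    """set<text>: duplicates collapse and elements come back in sorted order."""
    return sorted(set(values))


def as_cql_list(values: Iterable[str]) -> List[str]:
    """list<text>: duplicates allowed, insertion order preserved."""
    return list(values)


def as_cql_map(pairs: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """map<int,text>: one value per key (the later write wins here), returned sorted by key."""
    merged: Dict[int, str] = {}
    for key, value in pairs:
        merged[key] = value
    return dict(sorted(merged.items()))


print(as_cql_set(["timc@tokutek.com", "tim@tokutek.com", "tim@tokutek.com"]))
# ['tim@tokutek.com', 'timc@tokutek.com']
print(as_cql_map([(2, "b"), (1, "a"), (2, "c")]))  # {1: 'a', 2: 'c'}
```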
  • 26. MySQL Transactions and Isolation
        mysql> BEGIN;
        ...
        mysql> INSERT INTO sample(data) VALUES ('Hello world!');
        mysql> INSERT INTO sample(data) VALUES ('Goodbye world!');
        ...
        mysql> COMMIT;
    ● InnoDB creates an MVCC view of the data, locks updated rows, and commits atomically
    ● MyISAM locks the table and commits each row immediately
  • 27. Does Cassandra Have Transactions?
    "Transactions" includes a lot of things, so…
    No:
    ● Updates on different rows are separate, like MyISAM
    ● Failed transactions on replicas might leave partial writes (Cassandra does not guarantee clean-up)
    Yes:
    ● Updates to single rows are atomic and isolated
    ● Updates to rows are durable (logged) as well
    (Important: Cassandra rows can be "bigger" than MySQL rows)
  • 28. How Does Cassandra Handle Locks?
    Cassandra uses timestamps instead.
        create columnfamily sample(id int primary key, data varchar, n int);
        insert into sample(id, data, n) values(1, 'hello', 25);
        insert into sample(id, data, n) values(2, 'goodbye', 27);
        cqlsh:cbench> update sample set data='goodbye!' where id=2;
        cqlsh:cbench> select id, writetime(data), writetime(n) from sample;
         id | writetime(data)  | writetime(n)
        ----+------------------+------------------
          1 | 1396326015674000 | 1396326015674000   ⇐ Updated together
          2 | 1396326144783000 | 1396326027160000   ⇐ Updated separately
    Timestamps are the same for columns updated at the same time.
  • 29. How Does Cassandra Handle Isolation? ●  Row updates are completely isolated until they are completed ●  Updates can propagate out to replicas at different times
  • 31. Review of MySQL Replication
  • 32. How Does Cassandra Replication Work?
    ● Cassandra is fully multi-master
    ● Updates are allowed at any location
    ● Updates can happen even when there is a network partition
    [Diagram: a client program issues a write to a coordinator (one of db1, db2, db3, db4), which proxies the write to the other instances]
  • 33. How Many Replicas Are There?
    The number of replicas and how they are distributed are properties of keyspaces.
        CREATE KEYSPACE cbench WITH replication = {
          'class': 'SimpleStrategy',
          'replication_factor': '3'
        };
    ● 'class': the strategy class used to distribute replicas
    ● 'replication_factor': keep 3 copies of the data
  • 34. MySQL Partitioning
    ● MySQL partitioning breaks a table into <n> tables
      ○ "PARTITION" is actually a storage engine
    ● Tables can be partitioned by hash or range
      ○ Hash = random distribution
      ○ Range = user-controlled distribution (date range)
    ● Helpful in "big data" use cases
    ● Partitions can usually be dropped efficiently
      ○ Unlike: delete from table1 where timeField < '2012-12-31';
  • 35. How Does Cassandra Partition Data?
    Cassandra splits data by key across hosts, using a "ring" to assign locations.
    [Diagram: a ring of eight virtual nodes A to H; each host holds copies of consecutive ranges: db1 ABCDEF, db2 CDEFGH, db3 EFGHAB, db4 GHABCD]
    ● Virtual node D gets ⅛ of the hash range
    ● A copy of D is assigned to host db4 by the replication strategy
  • 36. Partitioning in Action
    Replica placement is based on the primary key:
        insert into sample(id, data) values (550e8400-e29b-41d4-a716-446655440000, 'hello!');
    Cassandra runs a hash function on the primary key value, assigns the result to a virtual node, and maps it from there to actual hosts.
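The hash, then virtual node, then hosts path can be sketched as a token ring. The key's hash selects the first token at or after it. SimpleStrategy-style placement then walks clockwise until it has `replication_factor` distinct hosts. In this Python sketch, MD5 stands in for the partitioner's hash. Modern Cassandra defaults to Murmur3, while the older RandomPartitioner used MD5. The vnode tokens come from hashing `host#i` rather than from real token allocation, and the class name `Ring` is hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


class Ring:
    """Token ring with virtual nodes and SimpleStrategy-style replica placement."""

    def __init__(self, hosts: List[str], vnodes_per_host: int = 2, replication_factor: int = 3) -> None:
        self.rf = replication_factor
        # Each host owns several tokens (virtual nodes); here a token is the hash of "host#i".
        self.tokens: List[Tuple[int, str]] = sorted(
            (self._hash(f"{host}#{i}"), host) for host in hosts for i in range(vnodes_per_host)
        )
        self.positions = [token for token, _ in self.tokens]

    @staticmethod
    def _hash(value: str) -> int:
        # MD5 stands in for the partitioner's hash function.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

    def replicas(self, primary_key: str) -> List[str]:
        """Hash the key, find the owning vnode, then walk clockwise to distinct hosts."""
        start = bisect.bisect_left(self.positions, self._hash(primary_key)) % len(self.tokens)
        owners: List[str] = []
        for step in range(len(self.tokens)):
            host = self.tokens[(start + step) % len(self.tokens)][1]
            if host not in owners:
                owners.append(host)
                if len(owners) == self.rf:
                    break
        return owners


ring = Ring(["db1", "db2", "db3", "db4"], vnodes_per_host=2, replication_factor=3)
print(ring.replicas("550e8400-e29b-41d4-a716-446655440000"))  # three distinct hosts
```

With four hosts, two vnodes each and a replication factor of 3, every key lands on three of the four hosts, much like the eight-vnode ring on the slide.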
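  • 37. What About Conflicts?
    ● Any client can update rows from any location
    ● Cassandra resolves conflicts using timestamps
    ● Conflicts are resolved at the cell level
    ● The latest timestamp value wins
        UPDATE sample SET data = 'hello', age = 35 WHERE id = 352;
        UPDATE sample SET data = 'bye' WHERE id = 352;
    [Diagram: row 352 ends as data = "bye", age = 35; with the second update carrying the later timestamp, its data value WINS, age = 35 from the first update WINS, and the first update's data value LOSES]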
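Cell-level "latest timestamp wins" can be shown with the two updates from the slide. The Python sketch below merges updates column by column. It checks that two replicas receiving the updates in different orders converge to the same row. The function name `apply_update` and the timestamps 100 and 200 are hypothetical. Ties are not modelled: equal timestamps simply keep the existing cell here, and Cassandra has its own tie-break rule.

```python
from typing import Dict, Tuple

Cell = Tuple[object, int]  # (value, write timestamp)


def apply_update(row: Dict[str, Cell], columns: Dict[str, object], ts: int) -> None:
    """Merge one UPDATE cell by cell: for each column, the newer timestamp wins."""
    for name, value in columns.items():
        current = row.get(name)
        if current is None or ts > current[1]:
            row[name] = (value, ts)


first = ({"data": "hello", "age": 35}, 100)  # SET data = 'hello', age = 35
second = ({"data": "bye"}, 200)              # SET data = 'bye' (later timestamp)

replica_a: Dict[str, Cell] = {}
replica_b: Dict[str, Cell] = {}
for cols, ts in (first, second):
    apply_update(replica_a, cols, ts)
for cols, ts in (second, first):  # this replica sees the updates in the other order
    apply_update(replica_b, cols, ts)

print(replica_a == replica_b)  # True
print(replica_a)               # {'data': ('bye', 200), 'age': (35, 100)}
```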
  • 38. So Is This Like Galera?
    ● Galera uses ACID transactions with optimistic locking
    ● It would accept the first transaction and roll back the second completely
    [Diagram: the same two UPDATE statements on id = 352, contrasted with Cassandra's cell-level resolution]
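For contrast, here is a heavily simplified "first committer wins" check in the spirit of Galera's optimistic approach. A transaction is rejected as a whole if any row it writes was committed by another transaction after it started. This Python sketch is not Galera's certification algorithm. The class `CertifyingCluster` and its sequence-number bookkeeping are hypothetical.

```python
from typing import Dict


class CertifyingCluster:
    """First-committer-wins sketch: a transaction aborts entirely if any row it
    writes was committed by someone else after the transaction began."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.seqno = 0                       # global commit sequence number
        self.last_write: Dict[int, int] = {}  # row id -> seqno of its last commit

    def begin(self) -> int:
        return self.seqno  # snapshot point for this transaction

    def commit(self, start: int, writes: Dict[int, dict]) -> bool:
        if any(self.last_write.get(rid, 0) > start for rid in writes):
            return False  # conflict: the whole transaction is rolled back
        self.seqno += 1
        for rid, cols in writes.items():
            self.rows.setdefault(rid, {}).update(cols)
            self.last_write[rid] = self.seqno
        return True


cluster = CertifyingCluster()
t1, t2 = cluster.begin(), cluster.begin()                       # concurrent transactions
print(cluster.commit(t1, {352: {"data": "hello", "age": 35}}))  # True
print(cluster.commit(t2, {352: {"data": "bye"}}))               # False
print(cluster.rows[352])                                        # {'data': 'hello', 'age': 35}
```

Unlike the cell-level merge on the previous slide, none of the second update survives.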
  • 39. How Does "Failover" Work?
    ● Cassandra does not have failover
    ● If a node fails or disappears, the others continue
    ● Writes or reads may stop if too many replicas fail
    [Diagram: a client program issues a write to a coordinator, which proxies it to db1, db2 and db4 while one instance (X) is down]
  • 40. Tunable Consistency
    Clients define the level of consistency.
        cqlsh:cbench> consistency all
        Consistency level set to ALL.
        cqlsh:cbench> update rtest set k1=4444 where id=3;
        Unable to complete request: one or more nodes were unavailable.
        cqlsh:cbench> select * from cbench.rtest where id=3;
        Unable to complete request: one or more nodes were unavailable.
        cqlsh:cbench> consistency quorum
        Consistency level set to QUORUM.
        cqlsh:cbench> update rtest set k1=4444 where id=3;
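The cqlsh session follows from simple arithmetic on replica acknowledgements. A quorum is a majority of the replicas, that is floor(RF / 2) + 1. So with a replication factor of 3, as in the cbench keyspace, and one replica down, ALL cannot succeed while QUORUM can. The Python sketch below covers only three levels. The function names are hypothetical.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a few common consistency levels."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"consistency level not modelled here: {level}")


def can_complete(level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are up to satisfy the level."""
    return live_replicas >= required_acks(level, replication_factor)


for level in ("ONE", "QUORUM", "ALL"):
    print(level, can_complete(level, replication_factor=3, live_replicas=2))
# ONE True
# QUORUM True
# ALL False
```

QUORUM reads combined with QUORUM writes always overlap in at least one replica, because 2 + 2 > 3. That overlap is why this pairing is a common choice.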
  • 41. What Happens To Failed Writes?
    ● Cassandra has several repair mechanisms for failures
    ● Hinted handoff: the coordinator remembers the write and replays it when the node returns
    ● Read repair: the coordinator for a later read notices that a value is out of sync and fixes it
    ● Node repair: run an external process (nodetool repair) to scan for and fix inconsistencies
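Hinted handoff can be sketched as a coordinator that parks a missed write as a "hint" for a down replica and replays it when the replica returns. This in-memory Python model is an illustration only. The class name `Coordinator` is hypothetical, and the replay simply overwrites values without comparing timestamps.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


class Coordinator:
    """Hinted handoff sketch: writes for a down replica are kept as hints
    and replayed when that replica comes back."""

    def __init__(self, replicas: List[str]) -> None:
        self.data: Dict[str, Dict[str, str]] = {r: {} for r in replicas}
        self.down: Set[str] = set()
        self.hints: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def write(self, key: str, value: str) -> None:
        for replica in self.data:
            if replica in self.down:
                self.hints[replica].append((key, value))  # remember the missed write
            else:
                self.data[replica][key] = value

    def replica_up(self, replica: str) -> None:
        self.down.discard(replica)
        for key, value in self.hints.pop(replica, []):
            self.data[replica][key] = value  # replay the hint


coord = Coordinator(["db1", "db2", "db4"])
coord.down.add("db2")
coord.write("id=3", "k1=4444")
print(coord.data["db2"])  # {}
coord.replica_up("db2")
print(coord.data["db2"])  # {'id=3': 'k1=4444'}
```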
  • 42. What Else is There to Learn?
    "A lot" would be an understatement, but here are some topics to consider:
    ● You need to "rethink" your data model, so read up, practice, repeat
    ● Data consistency is an application problem
    ● Storage/internals: LSM trees, compaction, tombstones, Bloom filters
  • 44. Summary 1
    We liked…
    ● CQL + "tables" made it easy to get started
    ● HA
    ● Scaling model
    Look out for…
    ● Don’t bother trying to "port" your MySQL application
    ● Query patterns are critical; model around them
    ● Pay attention to the version when reading docs/blogs
      ○ Cassandra is quickly evolving (0.7, 1.0, 1.2, 2.0, …)
  • 45. Summary 2
    Highly recommended reading
    ● Book: "Practical Cassandra: A Developer’s Approach"
    ● DataStax has great docs: http://www.datastax.com/docs
    ● Presentations
      ○ http://www.slideshare.net/jaykumarpatel/cassandra-data-modeling-best-practices
    ● Blogs
      ○ http://planetcassandra.org
      ○ http://ebaytechblog.com
  • 46. Questions?
    Robert Hodges, CEO, Continuent: robert.hodges@continuent.com, @continuent
    Tim Callaghan, VP/Engineering, Tokutek: tim@tokutek.com, @tmcallaghan