Cassandra Storage Engine in MariaDB
MariaDB Cassandra interoperability
Sergei Petrunia
Colin Charles
Who are we
● Sergei Petrunia
– Principal developer of CassandraSE, optimizer
developer, formerly from MySQL
– psergey@mariadb.org
● Colin Charles
– Chief Evangelist, MariaDB, formerly from MySQL
– colin@mariadb.org
Agenda
● An introduction to Cassandra
● The Cassandra Storage Engine
(Cassandra SE)
● Data mapping
● Use cases
● Benchmarks
● Conclusion
Background: what is Cassandra
• A distributed NoSQL database
– Key-Value store
● Limited range scan support
– Optionally flexible schema
● Pre-defined “static” columns
● Ad-hoc “dynamic” columns
– Automatic sharding / replication
– Eventual consistency
Background: Cassandra's data model
• “Column families” like tables
• Row key → columns
• Somewhat similar to SQL, but with some important differences
• Supercolumns are not supported
CQL – Cassandra Query Language
Looks like SQL at first glance
bash$ cqlsh -3
cqlsh> CREATE KEYSPACE mariadbtest
... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1};
cqlsh> use mariadbtest;
cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key,
... data1 varchar, data2 bigint
... ) with compact storage;
cqlsh:mariadbtest> insert into cf1 (pk, data1,data2)
... values ('row1', 'data-in-cassandra', 1234);
cqlsh:mariadbtest> select * from cf1;
 pk   | data1             | data2
------+-------------------+-------
 row1 | data-in-cassandra |  1234
CQL is not SQL
Similarity with SQL is superficial
cqlsh:mariadbtest> select * from cf1 where pk='row1';
 pk   | data1             | data2
------+-------------------+-------
 row1 | data-in-cassandra |  1234
cqlsh:mariadbtest> select * from cf1 where data2=1234;
Bad Request: No indexed columns present in by-columns clause with Equal operator
cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2';
Bad Request: line 1:34 missing EOF at 'or'
• No joins or subqueries
• No GROUP BY; ORDER BY must be able to use an available index
• WHERE clause must represent an index lookup.
Cassandra Storage Engine
Provides a “view” of Cassandra's data
from MariaDB.
Starts a NoCQL movement
1. Load the Cassandra SE plugin
• Get MariaDB 10.0.1+
• Load the Cassandra plugin
– From SQL:
MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so';
– Or, add a line to my.cnf:
[mysqld]
...
plugin-load=ha_cassandra.so
• Check it is loaded
MariaDB [(none)]> show plugins;
+--------------------+--------+-----------------+-----------------+---------+
| Name               | Status | Type            | Library         | License |
+--------------------+--------+-----------------+-----------------+---------+
...
| CASSANDRA          | ACTIVE | STORAGE ENGINE  | ha_cassandra.so | GPL     |
+--------------------+--------+-----------------+-----------------+---------+
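The full SHOW PLUGINS listing can be long. As a narrower check, the standard information_schema.PLUGINS table can be filtered by name. This is a minimal sketch, assuming the plugin is registered under the name CASSANDRA as in the listing on this slide:

```sql
-- Show only the Cassandra plugin row instead of the whole plugin list.
-- PLUGIN_STATUS should read ACTIVE once the plugin has been loaded.
SELECT plugin_name, plugin_status, plugin_type, plugin_library
FROM information_schema.plugins
WHERE plugin_name = 'CASSANDRA';
```

An empty result would suggest the plugin has not been loaded yet.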
2. Connect to Cassandra
• Create an SQL table which is a view of a column family
MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> create table t2 (pk varchar(36) primary key,
-> data1 varchar(60),
-> data2 bigint
-> ) engine=cassandra
-> keyspace='mariadbtest'
-> thrift_host='10.196.2.113'
-> column_family='cf1';
Query OK, 0 rows affected (0.01 sec)
– thrift_host can be set per table
– @@cassandra_default_thrift_host makes it possible to
● Re-point the table to a different node dynamically
● Avoid changing the table DDL when the Cassandra IP changes
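The two ways of naming the Cassandra node can be sketched as follows. The example assumes that a table created without a thrift_host option falls back to @@cassandra_default_thrift_host, which is how the bullets above read. The table name t2_default and the second IP address are made up for illustration.

```sql
-- Set the default Thrift host once (address taken from the slide).
set global cassandra_default_thrift_host='10.196.2.113';

-- No thrift_host option here, so the table relies on the global default.
-- t2_default is a hypothetical table name.
create table t2_default (
  pk varchar(36) primary key,
  data1 varchar(60),
  data2 bigint
) engine=cassandra
  keyspace='mariadbtest'
  column_family='cf1';

-- Later: point at another node without touching the table DDL.
-- 10.196.2.114 is a made-up address.
set global cassandra_default_thrift_host='10.196.2.114';
```

When an already-open table picks up the new host is not stated on the slide, so it is worth verifying on a test instance.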
Possible gotchas
• SELinux blocks the connection
MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1]
– Packaging bug
– To get running quickly: echo 0 >/selinux/enforce
• Cassandra 1.2 and CFs without “COMPACT STORAGE”
MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ;
ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in keyspace mariadbtest
– Caused by a change in Cassandra 1.2
– They broke Pig also
– We intend to update CassandraSE for 1.2
Accessing Cassandra data
• Can get Cassandra's data
MariaDB [test]> select * from t2;
+------+-------------------+-------+
| pk   | data1             | data2 |
+------+-------------------+-------+
| row1 | data-in-cassandra |  1234 |
+------+-------------------+-------+
• Can insert data
MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123);
Query OK, 1 row affected (0.00 sec)
• Cassandra sees inserted data
cqlsh:mariadbtest> select * from cf1;
 pk   | data1             | data2
------+-------------------+-------
 row1 | data-in-cassandra |  1234
 row2 | data-from-mariadb |   123
Data mapping between
Cassandra and SQL
create table tbl (
pk varchar(36) primary key,
data1 varchar(60),
data2 bigint
) engine=cassandra keyspace='ks1' column_family='cf1'
• MariaDB table represents Cassandra's Column Family
– Can use any table name, column_family=... specifies CF.
• Table must have a primary key
– Name/type must match Cassandra's rowkey
• Columns map to Cassandra's static columns
– Name must be the same as in Cassandra
– Datatypes must match
– Can map any subset of the CF's columns
Datatype mapping
Cassandra   MariaDB
blob        BLOB, VARBINARY(n)
ascii       BLOB, VARCHAR(n), use charset=latin1
text        BLOB, VARCHAR(n), use charset=utf8
varint      VARBINARY(n)
int         INT
bigint      BIGINT, TINY, SHORT
uuid        CHAR(36) (text in MariaDB)
timestamp   TIMESTAMP (second precision), TIMESTAMP(6) (microsecond precision), BIGINT
boolean     BOOL
float       FLOAT
double      DOUBLE
decimal     VARBINARY(n)
counter     BIGINT
• CF column datatype determines MariaDB datatype
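To make the mapping concrete, here is a sketch of a table definition that follows the table above. The column family events and its columns are hypothetical. It assumes the CF has a text row key named pk plus static columns of the Cassandra types named in the comments.

```sql
-- Hypothetical column family 'events' in keyspace 'ks1'.
-- Each MariaDB type is chosen from the mapping table for the Cassandra type in the comment.
create table events (
  pk       varchar(36) primary key,          -- Cassandra text row key
  event_id char(36),                         -- Cassandra uuid
  created  timestamp(6),                     -- Cassandra timestamp, microsecond precision
  body     varchar(255) character set utf8,  -- Cassandra text
  raw      varbinary(1024),                  -- Cassandra blob
  score    double,                           -- Cassandra double
  active   bool                              -- Cassandra boolean
) engine=cassandra keyspace='ks1' column_family='events';
```

The lengths (255, 1024) are arbitrary choices for the sketch; pick sizes that fit the real data.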
Dynamic columns
• Cassandra supports “dynamic column families”
• Can access ad-hoc columns with MariaDB's
dynamic columns feature
create table tbl
(
  rowkey type PRIMARY KEY,
  column1 type,
  ...
  dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra keyspace=... column_family=...;
insert into tbl (rowkey, dynamic_cols) values
  (1, column_create('col1', 1, 'col2', 'value-2'));
select rowkey,
  column_get(dynamic_cols, 'uuidcol' as char)
from tbl;
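A concrete version of the template above might look like the following. It is only a sketch: the column family cf_dyn and the table name t_dyn are hypothetical, and the CF is assumed to be a dynamic column family with a text row key and text values.

```sql
-- One declared column (the row key) plus a blob that carries all ad-hoc columns.
create table t_dyn (
  rowkey varchar(36) primary key,
  dyn    blob DYNAMIC_COLUMN_STORAGE=yes
) engine=cassandra keyspace='mariadbtest' column_family='cf_dyn';

-- Write two ad-hoc columns for one row.
insert into t_dyn (rowkey, dyn)
values ('row1', column_create('col1', 'value-1', 'col2', 'value-2'));

-- Read them back by name; a name that is absent from a row yields NULL.
select rowkey,
       column_get(dyn, 'col1' as char) as col1,
       column_get(dyn, 'col2' as char) as col2
from t_dyn;
```

COLUMN_CREATE and COLUMN_GET are MariaDB's regular dynamic-column functions; string column names require MariaDB 10.0.1 or later.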
Data mapping is safe
create table t3 (pk varchar(60) primary key, no_such_field int)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra'
create table t3 (pk varchar(60) primary key, data1 double)
engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.UTF8Type'
• Cassandra SE will refuse incorrect mappings
Command mapping
● Cassandra commands
– PUT (upsert)
– GET
● Scan
– DELETE (if exists)
● SQL commands
– SELECT → GET/Scan
– INSERT → PUT (upsert)
– UPDATE/DELETE → read+write.
SELECT command mapping
● MariaDB has an SQL interpreter
● Cassandra SE supports lookups and scans
● Can now do
– Arbitrary WHERE clauses
– JOINs between Cassandra tables and
MariaDB tables
● Batched Key Access is supported
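The points above can be sketched with the t2 table from earlier in the deck. The local_users table is hypothetical. The two session settings at the end are an assumption about how hashed Batched Key Access is typically enabled in MariaDB; both variables exist, but the values that suit Cassandra SE should be checked against the MariaDB documentation.

```sql
-- Predicates that cqlsh rejected earlier in the deck: here MariaDB's SQL layer
-- evaluates the WHERE clause, using lookups or scans on the Cassandra side.
select * from t2 where data2 = 1234;
select * from t2 where pk = 'row1' or pk = 'row2';

-- A local table to join against (hypothetical).
create table local_users (
  user_id       int primary key,
  cassandra_key varchar(36)
) engine=InnoDB;

-- Join local rows with Cassandra rows through the row key.
select u.user_id, c.data1, c.data2
from local_users u
join t2 c on c.pk = u.cassandra_key;

-- Assumed settings for hashed Batched Key Access; verify before relying on them.
set optimizer_switch='join_cache_hashed=on';
set join_cache_level=7;
```

Running EXPLAIN on the join is a simple way to see which join method the optimizer picked.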
DML command mapping
● No SQL semantics
– INSERT overwrites rows
– UPDATE reads, then writes
● Have you updated what you read?
– DELETE reads, then deletes
● Can't be sure if/what you have deleted
● Not as bad as it sounds, it's Cassandra
– Cassandra SE doesn't make it SQL.
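A short sketch of what these semantics mean in practice, using the t2 table and the rows from earlier in the deck. The new values are made up, and the comments restate the behavior described on this slide rather than verified output.

```sql
-- row1 already exists. Per the slide, INSERT is a PUT (upsert), so this
-- overwrites the row rather than raising a duplicate-key error.
insert into t2 values ('row1', 'overwritten-from-mariadb', 5678);

-- UPDATE is a read followed by a write, not an atomic operation. Another
-- client may change row1 between the two steps, and that change is then lost.
update t2 set data2 = data2 + 1 where pk = 'row1';

-- DELETE also reads first, then deletes. The row may have changed, or already
-- be gone, by the time the delete reaches Cassandra.
delete from t2 where pk = 'row2';
```

Workloads keyed by UUIDs or timestamps, as in the use cases that follow, largely avoid these conflicts because writers rarely touch the same row.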
Cassandra SE use cases
Cassandra use cases
● Collect massive amounts
of data
– Web page hits
– Sensor updates
● Updates are naturally non-conflicting
– Keyed by UUIDs, timestamps
● Reads are served with one lookup
● Good for certain kinds of data
– Moving from SQL entirely may be difficult
Cassandra SE use cases (1)
Access Cassandra data from SQL
● Send an update to Cassandra
– Be a sensor
● Grab a piece of data from Cassandra
– “This web page was last viewed by …”
– “Last known position of this user was ...”.
Cassandra SE use cases (2)
Coming from MySQL/MariaDB side:
● Want a special table that is
– auto-replicated
– fault-tolerant
– Very fast?
● Get Cassandra, and create a
Cassandra SE table.
Cassandra Storage Engine non-use cases
• Huge, sift-through-all-data joins
– Use Pig
• Bulk data transfer to/from Cassandra
cluster
– Use Sqoop
• A replacement for InnoDB
– No full SQL semantics
A “benchmark”
• One table
• EC2 environment
– m1.large nodes
– Ephemeral disks
• Stream of single-line
INSERTs
• Tried InnoDB and Cassandra
• Hardly any tuning
Conclusions
• Cassandra SE can be used to peek at
data in Cassandra from MariaDB.
• It is not a replacement for Pig/Hive
• It is really easy to set up and use
Going Forward
• Looking for input
• Do you want support for
– Fast counter column updates?
– Awareness of Cassandra cluster
topology?
– Secondary indexes?
– …?
Resources
• https://kb.askmonty.org/en/cassandrase/
• http://wiki.apache.org/cassandra/DataModel
• http://cassandra.apache.org/
• http://www.datastax.com/docs/1.1/ddl/column_family
Thanks!
Q & A
Extra: Cassandra SE internals
• Developed against Cassandra 1.1
• Uses Thrift API
– cannot stream CQL resultset in 1.1
– Cannot use secondary indexes
• Only supports AllowAllAuthenticator
• In Cassandra 1.2
– “CQL Binary Protocol” with streaming
– CASSANDRA-5234: Thrift can only read CFs
“WITH COMPACT STORAGE”
  • 1. Cassandra Storage Engine in MariaDB MariaDB Cassandra interoperability Sergei Petrunia Colin Charles
  • 2. Who are we ● Sergei Petrunia – Principal developer of CassandraSE, optimizer developer, formerly from MySQL – psergey@mariadb.org ● Colin Charles – Chief Evangelist, MariaDB, formerly from MySQL – colin@mariadb.org
  • 3. Agenda ● An introduction to Cassandra ● The Cassandra Storage Engine (Cassandra SE) ● Data mapping ● Use cases ● Benchmarks ● Conclusion
  • 4. Background: what is Cassandra • A distributed NoSQL database – Key-Value store ● Limited range scan suppor – Optionally flexible schema ● Pre-defined “static” columns ● Ad-hoc “dynamic” columns – Automatic sharding / replication – Eventual consistency 4
  • 5. Background: Cassandra's data model • “Column families” like tables • Row key → columns • Somewhat similar to SQL but some important differences. • Supercolumns are not supported 5
  • 6. CQL – Cassandra Query Language Looks like SQL at first glance 6 bash$ cqlsh -3 cqlsh> CREATE KEYSPACE mariadbtest ... WITH REPLICATION ={'class':'SimpleStrategy','replication_factor':1}; cqlsh> use mariadbtest; cqlsh:mariadbtest> create columnfamily cf1 ( pk varchar primary key, ... data1 varchar, data2 bigint ... ) with compact storage; cqlsh:mariadbtest> insert into cf1 (pk, data1,data2) ... values ('row1', 'data-in-cassandra', 1234); cqlsh:mariadbtest> select * from cf1; pk | data1 | data2 ------+-------------------+------- row1 | data-in-cassandra | 1234
  • 7. CQL is not SQL Similarity with SQL is superficial 7 cqlsh:mariadbtest> select * from cf1 where pk='row1'; pk | data1 | data2 ------+-------------------+------- row1 | data-in-cassandra | 1234 cqlsh:mariadbtest> select * from cf1 where data2=1234; Bad Request: No indexed columns present in by-columns clause with Equal operator cqlsh:mariadbtest> select * from cf1 where pk='row1' or pk='row2'; Bad Request: line 1:34 missing EOF at 'or' • No joins or subqueries • No GROUP BY, ORDER BY must be able to use available indexes • WHERE clause must represent an index lookup.
  • 8. Cassandra Storage Engine 8 Provides a “view” of Cassandra's data from MariaDB. Starts a NoCQL movement
  • 9. 1. Load the Cassandra SE plugin • Get MariaDB 10.0.1+ • Load the Cassandra plugin – From SQL: 9 MariaDB [(none)]> install plugin cassandra soname 'ha_cassandra.so'; [mysqld] ... plugin-load=ha_cassandra.so – Or, add a line to my.cnf: MariaDB [(none)]> show plugins; +--------------------+--------+-----------------+-----------------+---------+ | Name | Status | Type | Library | License | +--------------------+--------+-----------------+-----------------+---------+ ... | CASSANDRA | ACTIVE | STORAGE ENGINE | ha_cassandra.so | GPL | +--------------------+--------+-----------------+-----------------+---------+ • Check it is loaded
  • 10. 2. Connect to Cassandra • Create an SQL table which is a view of a column family 10 MariaDB [test]> set global cassandra_default_thrift_host='10.196.2.113'; Query OK, 0 rows affected (0.00 sec) MariaDB [test]> create table t2 (pk varchar(36) primary key, -> data1 varchar(60), -> data2 bigint -> ) engine=cassandra -> keyspace='mariadbtest' -> thrift_host='10.196.2.113' -> column_family='cf1'; Query OK, 0 rows affected (0.01 sec) – thrift_host can be set per-table – @@cassandra_default_thrift_host allows to ● Re-point the table to different node dynamically ● Not change table DDL when Cassandra IP changes.
  • 11. Possible gotchas 11 • SELinux blocks the connection MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ; ERROR 1429 (HY000): Unable to connect to foreign data source: connect() failed: Permission denied [1] MariaDB [test]> create table t1 ( ... ) engine=cassandra ... ; ERROR 1429 (HY000): Unable to connect to foreign data source: Column family cf1 not found in keyspace mariadbtest • Cassandra 1.2 and CFs without “COMPACT STORAGE” – Packaging bug – To get running quickly: echo 0 >/selinux/enforce – Caused by a change in Cassandra 1.2 – They broke Pig also – We intend to update CassandraSE for 1.2
  • 12. Accessing Cassandra data ● Can insert data 12 MariaDB [test]> insert into t2 values ('row2','data-from-mariadb', 123); Query OK, 1 row affected (0.00 sec) cqlsh:mariadbtest> select * from cf1; pk | data1 | data2 ------+-------------------+------- row1 | data-in-cassandra | 1234 row2 | data-from-mariadb | 123 • Cassandra sees inserted data MariaDB [test]> select * from t2; +------+-------------------+-------+ | pk | data1 | data2 | +------+-------------------+-------+ | row1 | data-in-cassandra | 1234 | +------+-------------------+-------+ • Can get Cassandra's data
  • 14. Data mapping between Cassandra and SQL 14 create table tbl ( pk varchar(36) primary key, data1 varchar(60), data2 bigint ) engine=cassandra keyspace='ks1' column_family='cf1' • MariaDB table represents Cassandra's Column Family – Can use any table name, column_family=... specifies CF.
  • 16. Data mapping between Cassandra and SQL
    create table tbl (
      pk varchar(36) primary key,
      data1 varchar(60),
      data2 bigint
    ) engine=cassandra
      keyspace='ks1'
      column_family='cf1'
    • MariaDB table represents Cassandra's Column Family
      – Can use any table name, column_family=... specifies the CF
    • Table must have a primary key
      – Name/type must match Cassandra's rowkey
    • Columns map to Cassandra's static columns
      – Name must be the same as in Cassandra
      – Datatypes must match
      – Can map any subset of the CF's columns
  • 17. Datatype mapping
    Cassandra   MariaDB
    ---------   -------
    blob        BLOB, VARBINARY(n)
    ascii       BLOB, VARCHAR(n), use charset=latin1
    text        BLOB, VARCHAR(n), use charset=utf8
    varint      VARBINARY(n)
    int         INT
    bigint      BIGINT, TINY, SHORT
    uuid        CHAR(36) (text in MariaDB)
    timestamp   TIMESTAMP (second precision), TIMESTAMP(6) (microsecond precision), BIGINT
    boolean     BOOL
    float       FLOAT
    double      DOUBLE
    decimal     VARBINARY(n)
    counter     BIGINT
    • CF column datatype determines MariaDB datatype
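To make the mapping table concrete, here is a minimal sketch of a table definition that follows it. The keyspace `ks1`, the column family `sensors`, and all of its column names and types are hypothetical and only illustrate the pairing of Cassandra and MariaDB types; the sketch also assumes `@@cassandra_default_thrift_host` has already been set, since no `thrift_host` is given.

```sql
-- Assumed Cassandra side (hypothetical, not from the slides):
--   keyspace 'ks1', column family 'sensors',
--   rowkey 'id' of type uuid, static columns:
--   label text, reading double, active boolean, hits bigint
create table sensors_sql (
  id      char(36) primary key,              -- uuid    -> CHAR(36)
  label   varchar(100) character set utf8,   -- text    -> VARCHAR(n), charset utf8
  reading double,                            -- double  -> DOUBLE
  active  bool,                              -- boolean -> BOOL
  hits    bigint                             -- bigint  -> BIGINT
) engine=cassandra
  keyspace='ks1'
  column_family='sensors';
```

Only a subset of the column family's columns needs to be listed; any column that is listed has to use one of the MariaDB types shown for its Cassandra type.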
  • 18. Dynamic columns
    • Cassandra supports “dynamic column families”
    • Can access ad-hoc columns with MariaDB's dynamic columns feature
    create table tbl (
      rowkey type PRIMARY KEY,
      column1 type,
      ...
      dynamic_cols blob DYNAMIC_COLUMN_STORAGE=yes
    ) engine=cassandra keyspace=... column_family=...;
    insert into tbl values (1, column_create('col1', 1, 'col2', 'value-2'));
    select rowkey, column_get(dynamic_cols, 'uuidcol' as char) from tbl;
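The slide's statements are schematic (`type` and `...` are placeholders). A filled-in sketch might look like the following; the keyspace `ks1`, the column family `cf_dyn`, and the column names are hypothetical, and the table is reduced to a rowkey plus the dynamic-column blob so that the INSERT matches the column list.

```sql
-- Hypothetical column family 'cf_dyn' with a text rowkey named 'pk'
-- and no static columns mapped on the MariaDB side.
create table t_dyn (
  pk   varchar(36) primary key,
  dyn  blob DYNAMIC_COLUMN_STORAGE=yes   -- holds all ad-hoc columns of the row
) engine=cassandra
  keyspace='ks1'
  column_family='cf_dyn';

-- Write two ad-hoc columns for one row.
insert into t_dyn values ('row1', column_create('col1', 1, 'col2', 'value-2'));

-- Read one ad-hoc column back as a string.
select pk, column_get(dyn, 'col2' as char) from t_dyn;

-- List which ad-hoc columns each row has.
select pk, column_list(dyn) from t_dyn;
```

`column_create`, `column_get` and `column_list` are MariaDB's general dynamic-column functions; how each ad-hoc value is typed on the Cassandra side depends on the column family's definition and is not covered by this sketch.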
  • 19. Data mapping is safe
    • Cassandra SE will refuse incorrect mappings
    create table t3 (pk varchar(60) primary key, no_such_field int)
      engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
    ERROR 1928 (HY000): Internal error: 'Field `no_such_field` could not be mapped to any field in Cassandra'
    create table t3 (pk varchar(60) primary key, data1 double)
      engine=cassandra `keyspace`='mariadbtest' `column_family`='cf1';
    ERROR 1928 (HY000): Internal error: 'Failed to map column data1 to datatype org.apache.cassandra.db.marshal.UTF8Type'
  • 21. Command Mapping
    ● Cassandra commands
      – PUT (upsert)
      – GET
        ● Scan
      – DELETE (if exists)
    ● SQL commands
      – SELECT → GET/Scan
      – INSERT → PUT (upsert)
      – UPDATE/DELETE → read+write
  • 22. SELECT command mapping
    ● MariaDB has an SQL interpreter
    ● Cassandra SE supports lookups and scans
    ● Can now do
      – Arbitrary WHERE clauses
      – JOINs between Cassandra tables and MariaDB tables
        ● Batched Key Access is supported
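As a sketch of what this mapping makes possible, the query below joins a Cassandra-backed table with an ordinary MariaDB table. `t2` is the table created on the earlier slides; the local table `page_owner` and its columns are hypothetical. The `optimizer_switch` and `join_cache_level` settings are MariaDB's general controls for Batched Key Access; whether the optimizer actually chooses BKA for a given query is not guaranteed, which is why the example starts with EXPLAIN.

```sql
-- Hypothetical local table stored in InnoDB (not part of the original slides).
create table page_owner (
  pk    varchar(36) primary key,
  owner varchar(60)
) engine=innodb;

-- General MariaDB settings that permit Batched Key Access joins.
set optimizer_switch='mrr=on';
set join_cache_level=6;

-- Check the chosen plan first.
explain
select o.owner, c.data1, c.data2
from page_owner o
join t2 c on c.pk = o.pk
where c.data2 > 100;

-- MariaDB evaluates the WHERE clause and the join;
-- Cassandra SE only performs key lookups and scans.
select o.owner, c.data1, c.data2
from page_owner o
join t2 c on c.pk = o.pk
where c.data2 > 100;
```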
  • 23. DML command mapping
    ● No SQL semantics
      – INSERT overwrites rows
      – UPDATE reads, then writes
        ● Have you updated what you read?
      – DELETE reads, then deletes
        ● Can't be sure if/what you have deleted
    ● Not as bad as it sounds, it's Cassandra
      – Cassandra SE doesn't make it SQL
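The statements below sketch these semantics against the table `t2` from the earlier slides. The key `row3` and its values are made up for illustration, and no output is shown because the results depend on the state of the cluster.

```sql
-- First INSERT creates the row (a Cassandra PUT).
insert into t2 values ('row3', 'first-version', 1);

-- A second INSERT with the same key is also a PUT: per the slide it
-- overwrites the existing row rather than failing as a duplicate.
insert into t2 values ('row3', 'second-version', 2);

-- UPDATE is a read followed by a write; another client may change the row
-- in between, so the written value can be based on stale data.
update t2 set data2 = data2 + 1 where pk = 'row3';

-- DELETE is a read followed by a delete; there is no guarantee about
-- what, if anything, was actually removed.
delete from t2 where pk = 'row3';
```

Because there are no transactions across the read and the write, a counter-style UPDATE like the one above is only safe when a single writer owns the key.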
  • 25. Cassandra use cases
    ● Collect massive amounts of data
      – Web page hits
      – Sensor updates
    ● Updates are naturally non-conflicting
      – Keyed by UUIDs, timestamps
    ● Reads are served with one lookup
    ● Good for certain kinds of data
      – Moving from SQL entirely may be difficult
  • 26. Cassandra SE use cases (1)
    Access Cassandra data from SQL
    ● Send an update to Cassandra
      – Be a sensor
    ● Grab a piece of data from Cassandra
      – “This web page was last viewed by …”
      – “Last known position of this user was ...”
  • 27. Cassandra SE use cases (2)
    Coming from MySQL/MariaDB side:
    ● Want a special table that is
      – auto-replicated
      – fault-tolerant
      – Very fast?
    ● Get Cassandra, and create a Cassandra SE table
  • 28. Cassandra Storage Engine non-use cases
    • Huge, sift-through-all-data joins
      – Use Pig
    • Bulk data transfer to/from Cassandra cluster
      – Use Sqoop
    • A replacement for InnoDB
      – No full SQL semantics
  • 29. A “benchmark”
    • One table
    • EC2 environment
      – m1.large nodes
      – Ephemeral disks
    • Stream of single-line INSERTs
    • Tried InnoDB and Cassandra
    • Hardly any tuning
  • 30. Conclusions
    • Cassandra SE can be used to peek at data in Cassandra from MariaDB
    • It is not a replacement for Pig/Hive
    • It is really easy to set up and use
  • 31. Going Forward
    • Looking for input
    • Do you want support for
      – Fast counter column updates?
      – Awareness of Cassandra cluster topology?
      – Secondary indexes?
      – …?
  • 32. Resources
    • https://kb.askmonty.org/en/cassandrase/
    • http://wiki.apache.org/cassandra/DataModel
    • http://cassandra.apache.org/
    • http://www.datastax.com/docs/1.1/ddl/column_family
  • 34. Extra: Cassandra SE internals
    • Developed against Cassandra 1.1
    • Uses Thrift API
      – Cannot stream CQL resultset in 1.1
      – Can't use secondary indexes
    • Only supports AllowAllAuthenticator
    • In Cassandra 1.2
      – “CQL Binary Protocol” with streaming
      – CASSANDRA-5234: Thrift can only read CFs “WITH COMPACT STORAGE”