This document discusses migrating Oracle databases to Cassandra. Cassandra offers lower costs, supports more data types, and can scale to handle large volumes of data across multiple data centers. It also allows for more flexible data modeling and built-in compression. The document compares Cassandra and Oracle on features, provides examples of companies using Cassandra, and outlines best practices for data modeling in Cassandra. It also discusses strategies for migrating data from Oracle to Cassandra including using loaders, Sqoop, and Spark.
2. Why Cassandra
Lower cost of ownership makes it a top choice for big data OLTP applications.
Unlike Oracle, Cassandra can store structured, semi-structured, and unstructured data.
Cassandra is the right choice when you need availability and performance at scale, and it normally costs 80-90% (or more) less than Oracle's Enterprise Edition alone.
Oracle is not architected to tackle the new wave of big data, online applications being developed today.
Provides continuous availability with redundancy in both data and function across one or more locations or data centers, versus simple failover for the Oracle database.
Can handle high-velocity data coming in from sensors, mobile devices, and the like, with extreme write speed and low-latency queries.
Supports all types of workloads without needing to ETL data into a different data model.
Built-in compression reduces data size by up to 80% without performance overhead.
Migrating Oracle Databases To Cassandra Umair Mansoob
4. Comparing Cost Oracle vs Cassandra
The cost above, combined with Oracle's lack of support for unstructured data, can make the decision a no-brainer for many companies.
5. Comparing with Other NoSQL Databases
7. Oracle vs Cassandra
Name                            Oracle                          Cassandra
Database Schemas                Yes                             Schema Free
Secondary Indexes               Yes                             Limited
SQL                             Yes                             CQL (DDL, DML)
DB Scripting                    Yes (PL/SQL)                    No
Partitioning Methods            Horizontal Partitioning         Sharding
Consistency                     Immediate                       Eventual / Immediate
Concurrency                     Yes                             Yes
Durability                      Yes                             Yes
Multi-Data Center Capabilities  No                              Yes
Data Consistency Model          ACID                            CAP Theorem
Data Compression                Various types of compression    Built-in
Data Modeling                   3rd Normal Form                 1st or 2nd Normal Form
8. When Cassandra Is not Right
ACID-compliant transactions, with nested transactions, commits/rollbacks, and full referential integrity, are required.
You cannot avoid join operations and cannot code the joins in your application programs.
Your application only has structured data; not even semi-structured data is needed.
Application load is in the low to medium range, where MySQL might be a better choice.
There is no requirement for a single database/cluster to span many different data centers.
High availability requirements can be met by a synchronous replication architecture that is primarily maintained at a single data center.
9. CAP vs ACID consistency
CAP stands for "consistency, availability, and partition tolerance".
The CAP theorem states that, at most, only two of these properties can be guaranteed at once in any shared-data system.
ACID (atomicity, consistency, isolation, durability) describes the properties of a traditional relational database management system (RDBMS) such as Oracle.
ACID consistency is all about database rules. If a schema declares that a value must be unique, then a consistent system will enforce uniqueness.
CAP consistency promises that every replica of the same logical value, spread across nodes in a distributed system, has exactly the same value at all times.
10. Achieving Data Consistency
Data written to a database cluster is first written to a commit log, in the same fashion as in nearly every popular RDBMS.
Cassandra offers tunable data consistency. This means a developer or administrator can choose how strong consistency across nodes should be.
The strongest form of consistency is to mandate that any data modification be made to all replica nodes.
Cassandra can provide consistency in the CAP sense, in that all readers will see the same values, when a sufficiently strong consistency level is chosen.
Cassandra supports different types of consistency models (strict consistency, causal consistency, eventual consistency).
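Tunable consistency is often reasoned about with a simple overlap rule: if the number of replicas that must acknowledge a read plus the number that must acknowledge a write exceeds the replication factor, every read set overlaps the latest write set. The sketch below is plain Python arithmetic illustrating that rule, not Cassandra driver code; the function names are hypothetical and chosen only for this example.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for a given replication factor."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3                      # assumed replication factor for the example
q = quorum(rf)              # majority of 3 replicas

print(q)                                   # 2
print(reads_see_latest_write(q, q, rf))    # True: quorum reads + quorum writes
print(reads_see_latest_write(1, 1, rf))    # False: single-replica reads and writes
```

With a replication factor of 3, quorum reads and writes give strong consistency while tolerating one unavailable replica; reading and writing a single replica trades that guarantee for lower latency.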
11. Data Modeling Best Practices
Don't optimize your data model to minimize writes; they are cheap in Cassandra.
Don't optimize your data model to minimize data duplication; duplication is good for efficient reads.
Focus on spreading data evenly around the cluster by picking a good primary key for each table.
Focus on minimizing the number of partitions read, ideally one partition per read, because each partition might reside on a different node.
The way to minimize partition reads is to model your data to fit your queries.
In general, you will use roughly one table per query pattern. If you need to support multiple query patterns, you usually need more than one table.
Remember, data duplication is okay. Many of your tables may repeat the same data.
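The "one table per query pattern" guideline can be illustrated without a running cluster. The sketch below uses plain Python dictionaries as stand-ins for Cassandra tables, with the dictionary key playing the role of the partition key; the sample rows and the names orders_by_customer and orders_by_product are hypothetical.

```python
from collections import defaultdict

# Hypothetical source rows, as they might come from a normalized Oracle table.
orders = [
    {"order_id": 1, "customer": "alice", "product": "disk", "qty": 2},
    {"order_id": 2, "customer": "bob", "product": "disk", "qty": 1},
    {"order_id": 3, "customer": "alice", "product": "cable", "qty": 5},
]

# One "table" per query pattern; the dict key stands in for the partition key.
orders_by_customer = defaultdict(list)
orders_by_product = defaultdict(list)

for row in orders:
    # Every write is duplicated into each table that serves a query.
    orders_by_customer[row["customer"]].append(row)
    orders_by_product[row["product"]].append(row)

# Each query now reads exactly one partition, with no join required.
alice_orders = [r["order_id"] for r in orders_by_customer["alice"]]
disk_orders = [r["order_id"] for r in orders_by_product["disk"]]

print(alice_orders)  # [1, 3]
print(disk_orders)   # [1, 2]
```

The same order appears in both structures, which is the deliberate duplication the guidelines describe: extra writes are accepted so that each read is served from a single partition.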
12. Migrating Data to Cassandra
Using Cassandra's high-speed loader: data from Oracle can be extracted into delimited flat files and then loaded into Cassandra tables via the CQL COPY command.
Using Sqoop: DataStax Enterprise supports Sqoop, a utility designed to transfer data directly from an RDBMS like Oracle into Cassandra.
Using Pentaho's Data Integration product, called Kettle, which has a free community edition.
Using Spark to load Oracle data into Cassandra.
Using ETL tools: there is a variety of ETL tools (e.g. Informatica) that support Cassandra as both a source and a target data platform.
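The flat-file route can be sketched as two steps: write the extracted rows to a delimited file, then load that file with a cqlsh COPY ... FROM statement. The Python below only prepares the CSV text and the statement string; it does not connect to Oracle or Cassandra. The keyspace hr, the table employees, the column names, and the sample rows are all hypothetical.

```python
import csv
import io


def rows_to_csv(columns, rows):
    """Render a header line plus data rows as comma-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def copy_statement(keyspace, table, columns, csv_path):
    """Build a cqlsh COPY ... FROM statement for a CSV file with a header row."""
    column_list = ", ".join(columns)
    return (
        f"COPY {keyspace}.{table} ({column_list}) "
        f"FROM '{csv_path}' WITH HEADER = TRUE AND DELIMITER = ','"
    )


# Hypothetical rows standing in for an extract from an Oracle table.
columns = ["emp_id", "name", "dept"]
rows = [(1, "Ada", "ENG"), (2, "Grace", "OPS")]

csv_text = rows_to_csv(columns, rows)
stmt = copy_statement("hr", "employees", columns, "employees.csv")

print(csv_text)
print(stmt)
```

In practice the CSV text would be saved to the path named in the statement, and the statement would be run from cqlsh against a table whose columns match the file.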
13. What parts of an Oracle database cannot be migrated to Cassandra
Stored procedures
Views
Triggers
Functions
Security privileges
Referential integrity constraints
Rules
Partitioned table definitions