This session covers best practices for migrating existing data from Apache Cassandra to Scylla, and how to do it while staying online the entire time.
Scylla Summit 2017: Migrating to Scylla From Cassandra and Others With No Downtime
Migration to Scylla from Cassandra
Alexander Sicular
Senior Solutions Architect, ScyllaDB
Alexander "Sasha" Sicular
● Over 16 years at Columbia University, the last seven as Director of Medical Informatics, working in clinical informatics building EMRs, billing, data integration, and research systems.
● With extensive experience in relational, non-relational, and distributed databases, Alexander helps customers get the most out of Scylla as a Senior Solutions Architect at ScyllaDB.
Agenda
+ Compatibility
+ DB Migration 101
+ Offline migration
+ Live migration
+ Migration From Cassandra to Scylla
+ Migration Tools
+ Best Practices
Compatibility
Scylla Compatibility
+ SSTable file format (compatible with Cassandra 2.1)
+ Configuration file format (compatible with Cassandra 2.1)
+ CQL language (CQL version 3.3.1)
+ CQL native protocol (CQL version 3.3.1)
+ JMX management protocol (compatible with Cassandra 2.1)
+ Management command line (nodetool from C* 3.0)
+ All Drivers (Java, C++, Python, Node, Ruby, Go…)
DB Migration 101
DB Migration Steps
+ Schema Migration
+ Migrating Historical Data (Forklifting)
+ Migrating Live Data (Dual Writes)
+ Validation (Offline and/or Dual Reads)*
+ Fade out old DB
* Optional step
Offline Migration
From DB-OLD to DB-NEW
[Timeline figure along a Time axis; labels: Migrate Schema, Read / Write to DB-OLD, Down Time, Forklifting Historical Data, DBs in Sync, Write to DB-NEW, Read from DB-NEW, Validation*, Fade out DB-OLD]
Live Migration
From DB-OLD to DB-NEW
[Timeline figure along a Time axis, with no down time; labels: Migrate Schema, Write to DB-OLD, Write to DB-NEW (Dual Writes), Forklifting Historical Data, DBs in Sync, Read from DB-OLD, Read from DB-NEW (Dual Reads*), Validation*, Fade out DB-OLD]
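As a rough model of the live-migration timeline, each phase can be described by which databases receive writes and which serve reads. The Python sketch below is illustrative only: the `Phase` class, the phase names, and the exact phase boundaries are assumptions read off the figure, not part of any tool. The check encodes one property of a live migration: in every phase some database is writable and readable, and no database serves reads without also receiving writes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

OLD, NEW = "DB-OLD", "DB-NEW"

@dataclass(frozen=True)
class Phase:
    # Hypothetical model of one step on the live-migration timeline.
    name: str
    writes: FrozenSet[str]  # databases that receive every write
    reads: FrozenSet[str]   # databases that serve reads

LIVE_MIGRATION: List[Phase] = [
    Phase("migrate schema", frozenset({OLD}), frozenset({OLD})),
    Phase("dual writes + forklifting", frozenset({OLD, NEW}), frozenset({OLD})),
    Phase("dual reads / validation (optional)", frozenset({OLD, NEW}), frozenset({OLD, NEW})),
    Phase("read from DB-NEW", frozenset({OLD, NEW}), frozenset({NEW})),
    Phase("fade out DB-OLD", frozenset({NEW}), frozenset({NEW})),
]

def always_available(phases: List[Phase]) -> bool:
    # True if every phase has at least one writable and one readable database,
    # and every database serving reads is also receiving writes.
    return all(p.writes and p.reads and p.reads <= p.writes for p in phases)
```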
Migration Tools
Migrating a Multi-DC Cluster
[Figure: SSTable Loader moving SSTables from Cassandra DC A, DC B and DC C into Scylla DC A and DC B over CQL; the DCs of each cluster are linked by internal communication]
If every Cassandra DC holds the same data, loading the SSTables from a single DC is sufficient.
Dual writes need to be implemented in all regions.
The number of DCs and their replication factor (RF) do not have to be preserved.
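Since the target topology need not mirror the source, the Scylla keyspace can be declared with its own DCs and RFs, for example by editing the replication settings of the exported schema before loading it. A minimal sketch that builds such a statement with `NetworkTopologyStrategy`; the function name and the keyspace and DC names in the usage are hypothetical.

```python
from typing import Dict

def create_keyspace_cql(keyspace: str, dc_rf: Dict[str, int]) -> str:
    # Build a CREATE KEYSPACE statement for the target (Scylla) topology.
    # dc_rf maps each target DC name to its replication factor; it does not
    # have to mirror the DCs or RFs of the source Cassandra cluster.
    if not dc_rf:
        raise ValueError("at least one DC is required")
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_rf.items()))
    return (f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")
```

For example, a keyspace replicated across three Cassandra DCs could be recreated in Scylla with only `{"DC_A": 3, "DC_B": 2}`.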
Migrate Schema
+ Use DESCRIBE to export each Cassandra keyspace, table, and UDT (excluding system tables)
+ Cassandra
+ cqlsh -e "DESC SCHEMA" > schema.cql
+ Scylla
+ cqlsh --file schema.cql
+ When migrating from Cassandra 3.x, some schema updates are required
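The export and import steps can be scripted. A minimal sketch, assuming `cqlsh` is on the PATH and can reach each cluster on its default port; the host arguments and function names are hypothetical. It does not perform the Cassandra 3.x schema updates, so the intended flow is to call `export_schema`, review or edit `schema.cql`, and only then call `import_schema`.

```python
import subprocess
from typing import List

def export_schema_cmd(cassandra_host: str) -> List[str]:
    # Equivalent to: cqlsh <host> -e "DESC SCHEMA"
    # DESC SCHEMA prints the non-system keyspaces, tables and UDTs as CQL.
    return ["cqlsh", cassandra_host, "-e", "DESC SCHEMA"]

def import_schema_cmd(scylla_host: str, path: str) -> List[str]:
    # Equivalent to: cqlsh <host> --file <path>
    return ["cqlsh", scylla_host, "--file", path]

def export_schema(cassandra_host: str, path: str = "schema.cql") -> None:
    # Run the export and save its output (like "> schema.cql" in a shell).
    result = subprocess.run(export_schema_cmd(cassandra_host),
                            capture_output=True, text=True, check=True)
    with open(path, "w") as f:
        f.write(result.stdout)

def import_schema(scylla_host: str, path: str = "schema.cql") -> None:
    # Replay the (reviewed) schema file against Scylla.
    subprocess.run(import_schema_cmd(scylla_host, path), check=True)
```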
Dual Write
+ Update the application logic to send each write to both clusters (Cassandra and Scylla) in parallel
+ Recommendations:
+ Compare the results and log inconsistencies, if any
+ Use client-side timestamps
+ Create knobs for each DB writer, allowing you to stop/start writing to each DB at runtime
+ Use a rolling upgrade of the application logic for zero downtime
+ Dual reads can follow the same logic
[Figure: Client sending CQL to both clusters]
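The knob and client-side timestamp recommendations can be combined in one small wrapper. A minimal sketch: `DualWriter` and the `Writer` signature are hypothetical, and each writer stands in for code that executes a prepared statement on one cluster with the given timestamp. For brevity it calls the writers one after the other rather than in parallel; with the Python driver you would issue `execute_async` on each session and then wait on the futures.

```python
import logging
import time
from typing import Callable, Dict, List

# Hypothetical writer signature: (values, timestamp_us) -> None, raises on failure.
Writer = Callable[[List[object], int], None]

class DualWriter:
    """Fan each write out to every enabled DB writer (illustrative sketch)."""

    def __init__(self, writers: Dict[str, Writer]):
        self.writers = writers
        # Runtime knobs: one on/off switch per DB writer.
        self.enabled = {name: True for name in writers}

    def set_enabled(self, name: str, on: bool) -> None:
        self.enabled[name] = on

    def write(self, values: List[object]) -> Dict[str, bool]:
        # One client-side timestamp (microseconds) shared by all DBs, so the
        # same row gets the same write time in both clusters.
        ts = int(time.time() * 1_000_000)
        results: Dict[str, bool] = {}
        for name, writer in self.writers.items():
            if not self.enabled[name]:
                continue  # this DB's knob is switched off
            try:
                writer(values, ts)
                results[name] = True
            except Exception:
                logging.exception("write to %s failed", name)
                results[name] = False
        # Log an inconsistency when the enabled DBs disagree on the outcome.
        if len(set(results.values())) > 1:
            logging.warning("inconsistent dual write for %r: %r", values, results)
        return results
```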
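Dual Writes
Use two different cluster sessions.
import cassandra.cluster
# IP_C1 / IP_C2: a contact-point address for each cluster
# connect to cluster 1 (contact points are passed as a list)
db1 = cassandra.cluster.Cluster([IP_C1]).connect()
# connect to cluster 2
db2 = cassandra.cluster.Cluster([IP_C2]).connect()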
Dual Writes
Two prepared statements, one for each DB session.
# INSERT statement with an explicit write TIMESTAMP
insert_statement = ("INSERT INTO keyspace.table (c1, c2) "
                    "VALUES (?, ?) USING TIMESTAMP ?")
# prepare the statement on each session
prepared_statement_1 = db1.prepare(insert_statement)
prepared_statement_2 = db2.prepare(insert_statement)
Dual Writes
Create sample values and execute the insert statements asynchronously.
import random
import time
import uuid
# random values; the write time is set explicitly, in microseconds
values = [random.randrange(0, 1000), str(uuid.uuid4()), int(time.time() * 1000000)]
# collect the response futures
inserts = []
# execute the 1st prepared statement on the 1st session
inserts.append(db1.execute_async(prepared_statement_1, values))
# execute the 2nd prepared statement on the 2nd session
inserts.append(db2.execute_async(prepared_statement_2, values))
Dual Writes
Wait for the results, and record each outcome and the values in a list.
# wait on each future and record success (1) or failure (0)
results = []
for future in inserts:
    try:
        future.result()
        results.append(1)
    except Exception:
        results.append(0)
# keep the written values alongside the outcomes
results.append(values)
Dual Writes
Check for failures in either write.
import logging
# did either write fail?
if results[0] == 0:
    # do something (handling is application-specific)
    logging.error('Write to cluster 1 failed')
if results[1] == 0:
    # do something
    logging.error('Write to cluster 2 failed')
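Dual reads can follow the same logic as the dual writes: query both sessions, wait on both results, then compare. A minimal sketch of the comparison step only, assuming each cluster's rows have already been fetched and that every row can be keyed by its primary key; `compare_reads` and `key_fn` are hypothetical names. Rows returned by the Python driver are named tuples by default, so they compare by value.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

def compare_reads(rows_1: Iterable[Any], rows_2: Iterable[Any],
                  key_fn: Callable[[Any], Any]) -> List[Any]:
    # Index each cluster's rows by primary key.
    by_key_1: Dict[Any, Any] = {key_fn(r): r for r in rows_1}
    by_key_2: Dict[Any, Any] = {key_fn(r): r for r in rows_2}
    mismatched = []
    # Visit every key seen in either cluster; a missing row counts as a mismatch.
    for key in by_key_1.keys() | by_key_2.keys():
        if by_key_1.get(key) != by_key_2.get(key):
            logging.warning("dual read mismatch for key %r: %r != %r",
                            key, by_key_1.get(key), by_key_2.get(key))
            mismatched.append(key)
    return mismatched
```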
Forklifting Historical Data
+ Install Scylla's sstableloader on the Cassandra nodes or on intermediate servers
+ Create a snapshot on each Cassandra node (nodetool snapshot)
+ Run sstableloader from each Cassandra node:
sstableloader -x -d [Scylla IP] .../[ks]/[table]
Or run it from intermediate servers that mount the Cassandra file system; the path under the mount point must be in /[ks]/[table] format:
sstableloader -x -d [Scylla IP] .../[mount point]/[ks]/[table]
+ Watch for impact on the Cassandra nodes, and use throttling (-t) to limit the loader throughput
[Figure: SSTables read by the SSTable Loader and sent to Scylla over CQL]
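Running the loader over many tables is easy to script. A minimal sketch that walks a directory already laid out as `<root>/<ks>/<table>` (for example a staging copy of the snapshots; that layout is an assumption here) and builds one sstableloader command per table. The `-x` flag is copied verbatim from the slide, and `-t` is the throttle (in Mbit/s in Cassandra's sstableloader); check both against your loader's `--help`. The function names are hypothetical, and it defaults to a dry run that only returns the commands.

```python
import os
import subprocess
from typing import List, Optional

def loader_command(scylla_ip: str, table_dir: str,
                   throttle_mbits: Optional[int] = None) -> List[str]:
    # -x copied verbatim from the slide; -d is the target node; -t throttles.
    cmd = ["sstableloader", "-x", "-d", scylla_ip]
    if throttle_mbits is not None:
        cmd += ["-t", str(throttle_mbits)]
    cmd.append(table_dir)  # path must end in /<ks>/<table>
    return cmd

def table_dirs(root: str) -> List[str]:
    # Collect every <root>/<ks>/<table> directory, skipping plain files.
    dirs = []
    for ks in sorted(os.listdir(root)):
        ks_path = os.path.join(root, ks)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            table_path = os.path.join(ks_path, table)
            if os.path.isdir(table_path):
                dirs.append(table_path)
    return dirs

def forklift(root: str, scylla_ip: str, throttle_mbits: Optional[int] = None,
             dry_run: bool = True) -> List[List[str]]:
    cmds = [loader_command(scylla_ip, d, throttle_mbits) for d in table_dirs(root)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # one table at a time
    return cmds
```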
Best Practices
Best Practices
+ Clean up the origin database in advance. Don't waste time on old data!
+ More data = longer migration time
+ Migrate and validate iteratively, for example one table, one region, or one user prefix at a time. After validation, keep that dataset or delete and restart it
+ Verify and validate at every point. You can always roll back to the origin DB, for any reason
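The iterative approach can be expressed as a small driver loop. A minimal sketch under stated assumptions: `migrate`, `validate`, and `reset` are hypothetical callbacks for one unit of data (a table, region, or user prefix), and `max_attempts` is an arbitrary retry limit. Units that never validate are reported rather than kept, and the origin DB remains the source of truth for them.

```python
from typing import Callable, Iterable, List

def migrate_iteratively(units: Iterable[str],
                        migrate: Callable[[str], None],
                        validate: Callable[[str], bool],
                        reset: Callable[[str], None],
                        max_attempts: int = 3) -> List[str]:
    """Migrate and validate one unit at a time; return units that never validated."""
    failed = []
    for unit in units:
        for _ in range(max_attempts):
            migrate(unit)
            if validate(unit):
                break  # keep this dataset and move on
            reset(unit)  # delete the dataset so it can be restarted
        else:
            failed.append(unit)  # no attempt validated
    return failed
```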
Best Practices… Continued
+ Make sure to have a monitoring stack in place for both DBs and the application during the entire migration
+ Validate the process by sampling data at different points
+ Before fading out the origin DB, make sure there are no live connections to it
+ Make sure all relevant users are aware of the process and its limitations (don't update your schema!)
+ Get Scylla involved. We want to help!
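Sampling-based validation can be sketched as follows, assuming you can list candidate keys and read a single row by key from each database; `read_origin`, `read_target`, and `sample_validate` are hypothetical names. It returns the fraction of sampled keys whose rows differ, which can be tracked at each stage of the migration.

```python
import random
from typing import Any, Callable, Optional, Sequence

def sample_validate(keys: Sequence[Any],
                    read_origin: Callable[[Any], Any],
                    read_target: Callable[[Any], Any],
                    sample_size: int = 100,
                    seed: Optional[int] = None) -> float:
    # Pick a random sample of keys (without replacement).
    rng = random.Random(seed)
    sample = rng.sample(list(keys), min(sample_size, len(keys)))
    if not sample:
        return 0.0
    # Count sampled keys whose row differs between origin and target.
    mismatches = sum(1 for k in sample if read_origin(k) != read_target(k))
    return mismatches / len(sample)
```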
THANK YOU!
Any questions?
Please stay in touch:
siculars@scylladb.com
@siculars