C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Hindsight is 20/20:
MySQL to Cassandra
Michael Kjellman (@mkjellman)
Barracuda Networks
#cassandra13

What I Do
• Build and maintain “real-time” Spam detection
and Web Filter classification
• Java/Perl/C (and bits of everything else)
• Author perlcassa (Perl C* client)
• Frontend? Backend? Customer? Internal?
Broken RAID Card? Bad Disk? I touch it all.
#cassandra13

Our C* Cluster
• In production for ~2 years since 0.8
• Running 1.2.5 + minor patches
• 24 nodes in 2 datacenters
• (2) 2TB Hard Drives (no RAID)
• (1) Small SSD for small hot CFs
• 64GB of RAM
• Puppet for management
• Cobbler for deployment
• Target max load at 600GB per node
#cassandra13

What is “real-time” exactly?
#cassandra13

Our Rewrite by the Numbers
Cassandra Based MySQL Based
Average Application Latency 2.41ms 5.0ms
Elements in Database 32,836,767 3,946,713
Elements Application Handles 32,836,767 314,974
Element Seen Prior to Tracking 1st request Various Thresholds
Datacenters 2 1
Average Latency of Automated
Classification
3 seconds 8 minutes
#cassandra13

Should you Rewrite?
• How To Survive a Ground-Up Rewrite Without
Losing Your Sanity[1] – Joel Spolsky
• Past engineering decisions preventing
implementation of new business requirements
• New threats smarter and more targeted
[1]http://onstartups.com/tabid/3339/bid/97052/How-To-Survive-a-Ground-Up-Rewrite-Without-Losing-Your-Sanity.aspx
#cassandra13

Evolving Legacy Systems
• Even good developers can write sloppy code
• Too much duct tape
– Most layers applied around the database
#cassandra13

Hitting the Reset Button
• Plan for continuous failure
• Easily Scalable
• No Single Point of Failure – that you know of
• Many smaller boxes vs. one monolithic box
#cassandra13

Whiteboard to Reality
• Get technical buy-in from all parties
• Migrate and rewrite in stages
– Business requirements forced hybrid period with the
old and new systems operated in parallel
#cassandra13

Cassandra is Not…
1. Direct MySQL replacement
2. Magic bullet to solve everything
#cassandra13

Migrating
• Painful
• Painful
• Painful
• Tons of rewriting
• Tons of regressions
• Did I say painful?
#cassandra13

So Why Migrate?
• C* is the best option for persistence tier
• Business success motivation
• Don‟t let your database hold you back
#cassandra13

Lessons Learned (the good)
• Carefully defining data model up front
• Creating a flexible systems architecture that
adapts well to changes during implementation
• Seriously – “Measure twice, cut once.”
#cassandra13

Lessons Learned (the bad)
• Consider migration and delivery requirements
from the very beginning
• Adjust expectations – didn‟t expect relying on
legacy systems for so long
• Make syncing data between systems a priority
#cassandra13

Tips
1. Define requirements early
2. Start with the queries
3. Think differently regarding reads
4. Syncing and migrating data
5. Don‟t use C* as a queue
6. Estimate capacity
7. Automate, Automate, Automate
8. Some maintenance required
#cassandra13

1. Define Requirements Early
• What kind of queries will your application make?
• Do you need ordered results for all of your
rows?
• What is your read load? Write load?
#cassandra13

2. Start with the Queries
• C* != “#dontneedtothinkaboutmyschema”
• Counters and Composites
• Optimize for use case
– Don‟t be afraid of writes. Storage is cheap.
– Optimize to reduce the number of tombstones
#cassandra13

3. Think Differently Regarding Reads
• Do you really need all that data at once?
• mysql> SELECT * FROM mysupercooltable WHERE
foo = ‘bar’;
– Slow, but eventually will work
• cqlsh> SELECT * FROM myreallybigcf WHERE foo
= ‘bar’;
– Won‟t work. Expect RPC timeout exceptions on reads generally
after ~10,000 rows even with paging
• Our solutions:
– ElasticSearch
– Hadoop/Pig
#cassandra13

4. Syncing and Migrating Data
• Sync and migration scripts – take more seriously
than production code
• Design sync to be continuous with both systems
running in parallel during migration
• Prioritize the sync
#cassandra13

5. Don‟t use C* as a Queue
• Cassandra anti-patterns: Queues and queue-like
datasets[2] – Aleksey Yeschenko
• Tombstones + read performance
• Our solution:
– Kafka (multiple publisher, multiple consumer durable
queue)
[2]http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
#cassandra13

6. Estimate Capacity
• Don‟t forget the Java heap (8GB Max)
• Plan capacity – today and future
• Stress Tool – profile node and multiply
• MySQL hardware != Cassandra hardware
• New bottlenecks thanks to C* being so
awesome?
• I/O still an important concern with C*
#cassandra13

7. Automate, Automate, Automate
• Love your inner Ops self. Distributed systems
move complexity to operations.
• Puppet or something similar (really)
• Learn CCM earlier rather than later
– www.github.com/pcmanus/ccm
#cassandra13

8. Some Maintenance Required
• Repairs & Cleanup ops
– automate and run frequently
• Rolling restart meet rolling
repair
• Learn jconsole
• Solution:
– Jolokia (JMX via HTTP)
#cassandra13

Where is Barracuda Today?
• 2 years in production with Cassandra
• Definitely the right choice for our persistence tier
• 2 product lines on C* based system and another
major product in beta
• Achieved “real-time” response
#cassandra13

2.0 and Beyond
• Thrift -> CQL
• CQL helps the MySQL to C* migration
– Easier to comprehend / grasp
• Everyone understands SELECT * FROM cf WHERE
key = „foo‟;
• CAS and other 2.0 features make C* an even
better replacement option for MySQL
#cassandra13

C* Community
• Supercalifragilisticexpialidocious community!
• Riak, HBase, Oracle are other options. How is
their dev community?
• Great client support. Great people. Great
motivated developers.
• IRC: #cassandra on freenode
• Mailing List: user@cassandra.apache.org
#cassandra13

C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Ähnlich wie C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman (20)

Mehr von DataStax Academy

Mehr von DataStax Academy (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

C* Summit 2013 - Hindsight is 20/20. MySQL to Cassandra by Michael Kjellman

Hinweis der Redaktion