Cassandra - An Introduction

English translation of my slides for the talk I gave at LinuxTag 2011. I give an overview of Cassandra and describe our experience using it for real-time analysis at TWIMPACT.


  1. Cassandra – An Introduction
     Mikio L. Braun, Leo Jugel (TU Berlin, twimpact)
     LinuxTag Berlin, 13 May 2011
     (c) 2011 by Mikio L. Braun, @mikiobraun, blog.mikiobraun.de
  2. What is NoSQL?
     For many web applications, “classical databases” are not the right choice:
     ● The database is just used for storing objects.
     ● Consistency is not essential.
     ● There is a lot of concurrent access.
  3. NoSQL in comparison
     Classical databases | NoSQL
     Powerful query language | Very simple query language
     Scales by using larger servers (“scaling up”) | Scales through clustering (“scaling out”)
     Changes of database schema very costly | No fixed database schema
     ACID: Atomicity, Consistency, Isolation, Durability | Typically only “eventually consistent”
     Transactions, locking, etc. | Typically no support for transactions etc.
  4. Brewer's CAP Theorem
     ● CAP: Consistency, Availability, Partition Tolerance
       ● Consistency: You never get old data.
       ● Availability: Read/write operations are always possible.
       ● Partition Tolerance: The other guarantees hold even if the network between the servers breaks.
     ● You can only have two of these!
     Gilbert, Lynch, “Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services”, ACM SIGACT News, Volume 33, Issue 2, June 2002
  5. Homepage: http://cassandra.apache.org
     Language: Java
     History:
     ● Developed at Facebook for inbox search, released as open source in July 2008
     ● Apache Incubator since March 2009
     ● Apache top-level project since February 2010
     Main properties:
     ● Structured key-value store
     ● “Eventually consistent”
     ● Fully equivalent nodes
     ● Cluster can be modified without restarting
     Support: DataStax (http://datastax.com)
     Licence: Apache 2.0
  6. Version 0.6.x and 0.7.x
     ● Most important changes in 0.7.x:
       ● Config file format changed from XML to YAML
       ● Schema modification (ColumnFamilies) without restart
       ● Initial support for secondary indices
     ● However, 0.7.x also had stability problems initially.
  7. Inspirations for Cassandra
     ● Amazon Dynamo
       ● Clustering without a dedicated master node
       ● Peer-to-peer discovery of nodes, Hinted Handoff, etc.
     ● Google BigTable
       ● Data model
       ● Requires a central master node
       ● Provides much more fine-grained control:
         – which data should be stored together
         – on-the-fly compression, etc.
  8. Installation
     ● Download the tar.gz from http://cassandra.apache.org/download/
     ● Unpack
     ● ./conf contains the config files
     ● Run ./bin/cassandra -f to start Cassandra, press Ctrl-C to stop
  9. Configuration
     ● Database
       ● Version 0.6.x: conf/storage-conf.xml
       ● Version 0.7.x: conf/cassandra.yaml
     ● JVM parameters
       ● Version 0.6.x: bin/cassandra.in.sh
       ● Version 0.7.x: conf/cassandra-env.sh
  10. Cassandra's Data Model
      ● Keyspace (= database)
      ● Column Family (= table): row key → {name1: value1, name2: value2, name3: value3, ...}
        ● Keyspace and column family names are strings; row keys, column names and values are byte arrays.
        ● Columns are sorted by name!
        ● Rows are sorted according to the partitioner.
      ● Super Column Family: key → key → {name1: value1, ...}
  11. Example: Simple Object Store
      class Person { long id; String name; String affiliation; }
      Convert the fields to byte arrays:
      Keyspace “MyDatabase”:
        ColumnFamily “Person”:
          “1”: {“id”: “1”, “name”: “Mikio Braun”, “affiliation”: “TU Berlin”}
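The conversion in the object store example can be sketched in memory as follows. This is a minimal illustration, assuming every field is stored as a UTF-8 encoded string; the `to_row` helper and the nested dictionaries standing in for keyspace and column family are hypothetical and not part of any Cassandra client API.

```python
from dataclasses import dataclass, asdict


@dataclass
class Person:
    id: int
    name: str
    affiliation: str


def to_row(person):
    """Return (row_key, columns) with every name and value encoded as bytes."""
    columns = {
        name.encode("utf-8"): str(value).encode("utf-8")
        for name, value in asdict(person).items()
    }
    return str(person.id).encode("utf-8"), columns


# Nested dicts stand in for: keyspace -> column family -> row key -> columns.
keyspace = {"MyDatabase": {"Person": {}}}

key, columns = to_row(Person(1, "Mikio Braun", "TU Berlin"))
keyspace["MyDatabase"]["Person"][key] = columns

# Cassandra keeps the columns of a row sorted by name; emulate that on read.
stored = sorted(keyspace["MyDatabase"]["Person"][b"1"].items())
```

Sorting on read is only a stand-in here: in Cassandra itself the ordering is maintained by the column family's comparator.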
  12. Example: Index
      class Page { long id; … (object data fields) List<Link> links; }
      class Link { long id; ... int numberOfHits; }
      Keyspace “MyDatabase”
        ColumnFamily “Pages”:
          “3”: {“id”: 3, …}
          “4”: {“id”: 4, …}
          ...
        ColumnFamily “Links”:
          “1”: {“id”: 1, “url”: …}
          “17”: {“id”: 17, “url”: …}
        ColumnFamily “LinksPerPageByNumberOfHits” (used for both linking and indexing!):
          “3”: { “00000132:00000001”: “t”, “000025: 00000017”: … }
          “4”: { “00000044:00000024”: “t”, … }
      Here we exploit that columns are sorted by their names. Of course, everything is encoded in byte arrays, not ASCII.
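The index trick relies on column names whose byte order matches the numeric order of the hit count. A minimal sketch of such an encoding follows, assuming the column family compares names as raw bytes and that both numbers fit into unsigned 32-bit integers. The helper names are hypothetical; the two entries are the ones shown for page “3”, read as (number of hits, link id).

```python
import struct


def column_name(number_of_hits, link_id):
    """Pack (hits, link id) as two big-endian unsigned 32-bit integers."""
    return struct.pack(">II", number_of_hits, link_id)


def parse_column_name(name):
    """Inverse of column_name: returns (number_of_hits, link_id)."""
    return struct.unpack(">II", name)


# One index row: page id 3 -> {column name: marker value}.
row = {
    column_name(132, 1): b"t",
    column_name(25, 17): b"t",
}

# Byte-wise ordering of the names equals numeric ordering of (hits, link id).
by_hits = [parse_column_name(name) for name in sorted(row)]
top_link = by_hits[-1]
```

Big-endian packing matters: with little-endian integers the byte-wise order would no longer agree with the numeric order.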
  13. Are SuperColumnFamilies necessary?
      ● Usually, you can replace a SuperColumnFamily by several ColumnFamilies.
      ● Since SuperColumnFamilies make the implementation and the protocol more complex, some people also advocate removing SuperCFs.
  14. Cassandra's Architecture
      [Diagram: in memory, the MemTable; on disk, the Commit Log and several SSTables. Write operations go to the Commit Log and the MemTable; the MemTable is flushed to SSTables; read operations consult the MemTable and the SSTables. Compaction!]
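The interplay of commit log, MemTable, SSTables and compaction can be sketched with a toy in-memory model. This is a strong simplification under stated assumptions: the class `ToyStore` and its `memtable_limit` parameter are hypothetical, nothing is written to disk, and the whole commit log is cleared on flush, whereas a real node manages log segments, tombstones and timestamps.

```python
class ToyStore:
    """Toy model of the write path: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable sorted tables, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the memtable into an immutable, key-sorted SSTable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all SSTables into one, keeping the newest value per key."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values win
            merged.update(table)
        self.sstables = [sorted(merged.items())]


store = ToyStore(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    store.write(key, value)
tables_before = len(store.sstables)
store.compact()
```

The sketch also hints at why reads are comparatively expensive: a read may have to look into several SSTables until compaction has merged them.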
  15. Cassandra's API
      ● THRIFT-based API
      Read operations:
        get: single column
        get_slice: range of columns
        multiget_slice: range of columns in several rows
        get_count: column count
        get_range_slice: several columns from a range of rows
        get_indexed_slices: range of columns from an index
      Write operations:
        insert: single column
        batch_mutate: several columns in several rows
        remove: single column
        truncate: whole ColumnFamily
      Other:
        login, describe_*, add/drop column family/keyspace (since 0.7.x)
  16. Cassandra Clustering
      ● Fully equivalent nodes, no master node.
      ● Bootstrapping requires a seed node.
      [Diagram: a query arrives at one node, whose “Storage Proxy” reads from and writes to the other nodes according to the consistency level.]
  17. Consistency Level and Replication Factor
      ● Replication factor: On how many nodes is a piece of data stored?
      ● Consistency level:
        ANY: A node has received the operation, even a HintedHandoff node.
        ONE: One node has completed the request.
        QUORUM: Operation has completed on a majority of nodes / newest result is returned.
        LOCAL_QUORUM: QUORUM in local data center
        GLOBAL_QUORUM: QUORUM in global data center
        ALL: Wait until all nodes have completed the request.
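How replication factor and consistency level interact can be made concrete with a little arithmetic. The sketch below assumes the usual majority rule for QUORUM and the standard overlap argument: a read is guaranteed to see the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor. The function names are hypothetical.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True if every read set must overlap every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)
quorum_both = reads_see_latest_write(rf, q, q)
one_both = reads_see_latest_write(rf, 1, 1)
write_all_read_one = reads_see_latest_write(rf, rf, 1)
```

With a replication factor of 3, QUORUM for both reads and writes gives overlapping replica sets, while ONE for both does not, which is the “eventually consistent” case.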
  18. How to deal with failure
      ● As long as the requirements of the consistency level can be met, everything is fine.
      ● Hinted Handoff:
        ● A write operation for a faulty node is stored on another node and delivered to the faulty node once it is available again.
        ● The data won't be readable until then!
      ● Read Repair:
        ● After a read operation has completed, the data is compared and updated on all nodes in the background.
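The idea behind Read Repair can be sketched in a few lines. This is a toy model, assuming that each replica is a plain dictionary and that conflicts are resolved by taking the value with the highest timestamp; the function `read_with_repair` is hypothetical, and the repair runs inline here rather than in the background.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and update stale replicas.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    newest = max(answers, key=lambda pair: pair[0])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # done in the background in a real cluster
    return newest[1]


# Three replicas: one stale, one current, one that missed the write entirely.
replicas = [{"k": (1, "old")}, {"k": (2, "new")}, {}]
value = read_with_repair(replicas, "k")
```

After the call, all three replicas hold the same entry, so a later read at consistency level ONE returns the current value regardless of which node answers.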
  19. Libraries
      Python
        Pycassa: http://github.com/pycassa/pycass
        Telephus: http://github.com/driftx/Telephus
      Java
        Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
        Hector: http://github.com/rantav/hector
        Kundera: http://code.google.com/p/kundera/
        Pelops: http://github.com/s7/scale7-pelops
      Grails
        grails-cassandra: https://github.com/wolpert/grails-cassandra
      .NET
        Aquiles: http://aquiles.codeplex.com/
        FluentCassandra: http://github.com/managedfusion/fluentcassandra
      Ruby
        Cassandra: http://github.com/fauna/cassandra
      PHP
        phpcassa: http://github.com/thobbs/phpcassa
        SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
      Or roll your own based on THRIFT: http://thrift.apache.org/ :)
  20. TWIMPACT: An Application
      ● Real-time analysis of Twitter
      ● Trend analysis based on retweets
      ● Very high data rate (several million tweets per day, about 50 per second)
  21. TWIMPACT: twimpact.jp
  22. TWIMPACT: twimpact.com
  23. Application Profile
      ● Information about tweets, users, and retweets
      ● Text matching for non-API retweets
      ● Retweet frequency and user impact
      ● Operation profile:
        Operation | get_slice | get | get_slice | batch_mutate | insert | batch_mutate | remove
        Fraction | 50.1% | 6.0% | 0.1% | 14.9% | 21.5% | 6.8% | 0.8%
        Duration | 1.1ms | 1.7ms | 0.8ms | 0.9ms | 1.1ms | 0.8ms | 1.2ms
        Operation variants given on the slide: (all), (range), (one row)
  24. Practical Experiences with Cassandra
      ● Very stable
      ● Read operations are relatively expensive
      ● Multithreading leads to a huge performance increase
      ● Requires quite extensive tuning
      ● Clustering doesn't automatically lead to better performance
      ● Compaction leads to a performance decrease of up to 50%
  25. Performance through Multithreading
      ● Multithreading leads to much higher throughput
      ● How to achieve multithreading without locking support?
      [Plot with series labelled 64, 32, 16, 8, 4, 2, 1; measured on a Core i7, 4 cores (2 + 2 HT)]
  26. Performance through Multithreading
      ● Multithreading leads to much higher throughput
      ● How to achieve multithreading without locking support?
  27. Cassandra Tuning
      ● Tuning opportunities:
        ● Size of memtables, thresholds for flushes
        ● Size of JVM heap
        ● Frequency and depth of compaction
      ● Where?
        ● MemTableThresholds etc. in conf/cassandra.yaml
        ● JVM parameters in conf/cassandra-env.sh
  28. Overview of JVM GC
      [Diagram of the JVM heap:]
      ● Young Generation (“Eden”, “Survivors”): up to a few hundred MB
      ● Old Generation: dozens of GBs; CMSInitiatingOccupancyFraction
      ● Additional memory usage while GC is running
  29. Cassandra's Memory Usage
      [Plot of memory usage over time, annotated with: Flush; Memtables, indexes, etc.; Compaction. Settings: size of Memtable: 128M, JVM heap: 3G, #CF: 12]
  30. Cassandra's Memory Usage
      ● Memtables may survive for a very long time (up to several hours)
        ● They are placed in the old generation
        ● GC has to process several dozen GBs
        ● Heap too small, GC triggered too late → “GC storm”
      ● Trade-off:
        ● I/O load vs. memory usage
        ● Do not neglect compaction!
  31. The Effects of GC and Compactions
      [Plot annotated with: Major GC, Compaction]
  32. Cluster vs Single Node
      ● Our set-up:
        ● Single node: six-core CPU and RAID 5 with 6 hard disks
        ● 4-node cluster: six-core CPU and RAID 0 with 2 hard disks per node
      ● The single node consistently performs 1.5 to 3 times better.
      ● Possible causes:
        ● Overhead through network communication, consistency levels, etc.
        ● Hard disk performance is significant
        ● Cluster still too small
      ● Effectively available disk space:
        ● Single node: 6 * 500 GB = 3 TB, with RAID 5 = 2.5 TB (83%)
        ● 4-node cluster: 4 * 1 TB = 4 TB, with replication factor 2 = 2 TB (50%)
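The disk space figures for the two set-ups can be reproduced with a short calculation. The sketch assumes that RAID 5 spends the capacity of exactly one disk on parity, that replication divides the raw capacity by the replication factor, and that 1 TB is counted as 1000 GB; the function names are hypothetical.

```python
def raid5_usable(disks, disk_size_gb):
    """RAID 5 spends the capacity of one disk on parity."""
    return (disks - 1) * disk_size_gb


def replicated_usable(total_gb, replication_factor):
    """Each piece of data is stored replication_factor times."""
    return total_gb / replication_factor


# RAID 5 set-up: 6 disks of 500 GB each.
single_raw = 6 * 500
single_usable = raid5_usable(6, 500)
single_fraction = single_usable / single_raw

# Replicated set-up: 4 * 1 TB, replication factor 2.
cluster_raw = 4 * 1000
cluster_usable = replicated_usable(cluster_raw, 2)
cluster_fraction = cluster_usable / cluster_raw
```

The calculation gives 2500 GB usable (about 83%) for the RAID 5 set-up and 2000 GB (50%) for the replicated one, matching the numbers on the slide.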
  33. Alternatives
      ● MongoDB, CouchDB, redis, even memcached...
      ● Persistence: Disk or RAM?
      ● Replication: Master/Slave or Peer-to-Peer?
      ● Sharding?
      ● Upcoming trend towards more complex query languages (JavaScript), map-reduce operations, etc.
  34. Summary: Cassandra
      ● A platform which scales well
      ● Active user and developer community
      ● Read operations are quite expensive
      ● Optimal performance requires extensive tuning
      ● Depending on your application, eventual consistency and the lack of transactions/locking might be problematic.
  35. Links
      ● Apache Cassandra: http://cassandra.apache.org
      ● Apache Cassandra Wiki: http://wiki.apache.org/cassandra/FrontPage
      ● DataStax documentation for Cassandra: http://www.datastax.com/docs/0.7/index
      ● My blog: http://blog.mikiobraun.de
      ● Twimpact: http://beta.twimpact.com
