Cassandra - An Introduction

English translation of my slides for the talk held at LinuxTag 2011. I give an overview of Cassandra and talk about our experience using Cassandra for real-time analysis at TWIMPACT.

Presentation Transcript

  • Cassandra – An Introduction
    Mikio L. Braun, Leo Jugel
    TU Berlin, twimpact
    LinuxTag Berlin, 13 May 2011
    (c) 2011 by Mikio L. Braun, @mikiobraun, blog.mikiobraun.de
  • What is NoSQL?
    ● For many web applications, "classical" databases are not the right choice:
      – The database is just used for storing objects.
      – Consistency is not essential.
      – There is a lot of concurrent access.
  • NoSQL in Comparison

    Classical databases                            NoSQL
    Powerful query language                        Very simple query language
    Scales by using larger servers ("scaling up")  Scales through clustering ("scaling out")
    Changing the database schema is very costly    No fixed database schema
    ACID: Atomicity, Consistency,                  Typically only "eventually consistent"
    Isolation, Durability
    Transactions, locking, etc.                    Typically no support for transactions etc.
  • Brewer's CAP Theorem
    ● CAP: Consistency, Availability, Partition tolerance
    ● Consistency: you never get stale data.
    ● Availability: read/write operations are always possible.
    ● Partition tolerance: the other guarantees hold even if the network between servers breaks.
    ● You can only have two of these!
    Gilbert, Lynch: "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", ACM SIGACT News, Volume 33, Issue 2, June 2002
  • Homepage: http://cassandra.apache.org
    Language: Java
    History:
      ● Developed at Facebook for inbox search, released as open source in July 2008
      ● Apache Incubator since March 2009
      ● Apache top-level project since February 2010
    Main properties:
      ● Structured key-value store
      ● "Eventually consistent"
      ● Fully equivalent nodes
      ● The cluster can be modified without restarting
    Support: DataStax (http://datastax.com)
    Licence: Apache 2.0
  • Versions 0.6.x and 0.7.x
    ● Most important changes in 0.7.x:
      – Config file format changed from XML to YAML
      – Schema modification (ColumnFamilies) without restart
      – Initial support for secondary indices
    ● However, there were also stability problems initially.
  • Inspirations for Cassandra
    ● Amazon Dynamo
      – Clustering without a dedicated master node
      – Peer-to-peer discovery of nodes, HintedHandoff, etc.
    ● Google BigTable
      – Data model
      – Requires a central master node
      – Provides much more fine-grained control: which data should be stored together, on-the-fly compression, etc.
  • Installation
    ● Download the tar.gz from http://cassandra.apache.org/download/
    ● Unpack it
    ● ./conf contains the config files
    ● ./bin/cassandra -f starts Cassandra, Ctrl-C stops it
  • Configuration
    ● Database
      – Version 0.6.x: conf/storage-conf.xml
      – Version 0.7.x: conf/cassandra.yaml
    ● JVM parameters
      – Version 0.6.x: bin/cassandra.in.sh
      – Version 0.7.x: conf/cassandra-env.sh
  • Cassandra's Data Model
    ● Keyspace (= database) contains Column Families (= tables).
    ● Column Family: row key → {name1: value1, name2: value2, name3: value3, ...}
      – Keys, column names, and values are byte arrays.
      – Within a row, columns are sorted by name!
      – Rows are sorted according to the partitioner.
    ● Super Column Family: key → key → {name1: value1, ...}
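
To make the nesting above concrete, here is a minimal Java sketch (my illustration, not part of the original slides) that models the keyspace/column-family/row hierarchy with plain in-memory maps; a TreeMap stands in for the sorted columns within a row. Real Cassandra stores keys, column names, and values as byte arrays.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;

    // Conceptual model only: column family -> row key -> columns sorted by name.
    public class DataModelSketch {
        // one column family: row key -> (column name -> value), columns kept sorted by name
        static Map<String, TreeMap<String, String>> personColumnFamily =
                new HashMap<String, TreeMap<String, String>>();

        public static void main(String[] args) {
            TreeMap<String, String> row = new TreeMap<String, String>();
            row.put("name", "Mikio Braun");
            row.put("affiliation", "TU Berlin");
            personColumnFamily.put("1", row);                  // row key "1"
            System.out.println(personColumnFamily.get("1"));   // {affiliation=TU Berlin, name=Mikio Braun}
        }
    }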
  • Example: Simple Object Store

    class Person {
        long id;
        String name;
        String affiliation;
    }

    Convert the fields to byte arrays:

    Keyspace "MyDatabase":
      ColumnFamily "Person":
        "1": {"id": "1", "name": "Mikio Braun", "affiliation": "TU Berlin"}
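
As a sketch of how such an object could actually be written, the following uses the Hector client (one of the Java libraries listed further below); the cluster name, host/port, and the keyspace/column family from the slide are assumptions for illustration, not TWIMPACT's code.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class PersonStore {
        public static void main(String[] args) {
            // connect to a local node; "Test Cluster" is Cassandra's default cluster name
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyDatabase", cluster);

            // write one row of the "Person" column family, keyed by the object's id
            Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
            m.addInsertion("1", "Person", HFactory.createStringColumn("id", "1"))
             .addInsertion("1", "Person", HFactory.createStringColumn("name", "Mikio Braun"))
             .addInsertion("1", "Person", HFactory.createStringColumn("affiliation", "TU Berlin"));
            m.execute();  // all three columns go out as a single batch_mutate
        }
    }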
  • Example: Index

    class Page {
        long id;
        ...            // other data fields
        List<Link> links;
    }

    class Link {
        long id;
        ...
        int numberOfHits;
    }

    Keyspace "MyDatabase":
      ColumnFamily "Pages":
        "3": {"id": 3, ...}
        "4": {"id": 4, ...}
      ColumnFamily "Links":            (used both for linking and for indexing!)
        "1": {"id": 1, "url": ...}
        "17": {"id": 17, "url": ...}
      ColumnFamily "LinksPerPageByNumberOfHits":
        "3": {"00000132:00000001": "t", "000025:00000017": ..., ...}
        "4": {"00000044:00000024": "t", ...}

    Here we exploit that columns are sorted by their names.
    Of course, everything is encoded in byte arrays, not ASCII.
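
The trick in the LinksPerPageByNumberOfHits family is the fixed-width, zero-padded column name: lexicographic order of the names then matches numeric order of (numberOfHits, linkId), so a slice over a page's row returns links already sorted by hit count. A small illustration (my own; the exact width and encoding on the slide may differ, and real column names are byte arrays):

    public class IndexColumnName {
        // "numberOfHits:linkId" with fixed-width, zero-padded decimal fields, so that
        // sorting the column names as strings sorts links by hit count, then by id
        static String columnName(long numberOfHits, long linkId) {
            return String.format("%08d:%08d", numberOfHits, linkId);
        }

        public static void main(String[] args) {
            System.out.println(columnName(306, 1));   // 00000306:00000001
            System.out.println(columnName(37, 17));   // 00000037:00000017
        }
    }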
  • Are SuperColumnFamilies necessary?
    ● Usually, you can replace a SuperColumnFamily with several ColumnFamilies.
    ● Since SuperColumnFamilies make the implementation and the protocol more complex, some people advocate removing them altogether.
  • Cassandra's Architecture
    ● A write operation goes to the commit log (on disk) and to a MemTable (in memory).
    ● When a MemTable is flushed, it is written to disk as an SSTable.
    ● A read operation consults the MemTable and the SSTables.
    ● SSTables are periodically merged (compaction!).
  • Cassandra's API
    ● Thrift-based API

    Read operations:
      get                  single column
      get_slice            range of columns
      multiget_slice       range of columns in several rows
      get_count            column count
      get_range_slice      several columns from a range of rows
      get_indexed_slices   range of columns from an index

    Write operations:
      insert               single column
      batch_mutate         several columns in several rows
      remove               single column
      truncate             whole ColumnFamily

    Other:
      login, describe_*, add/drop column family/keyspace (since 0.7.x)
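
For the read side, here is a hedged sketch of a get_slice issued through Hector; the connection details and the "Person" column family are the ones assumed in the object-store example above.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class PersonReader {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyDatabase", cluster);

            // get_slice: up to 100 columns of row "1" in the "Person" column family
            SliceQuery<String, String, String> query = HFactory.createSliceQuery(
                    keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
            query.setColumnFamily("Person").setKey("1").setRange("", "", false, 100);

            ColumnSlice<String, String> slice = query.execute().get();
            for (HColumn<String, String> column : slice.getColumns()) {
                System.out.println(column.getName() + " = " + column.getValue());
            }
        }
    }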
  • Cassandra Clustering
    ● Fully equivalent nodes, no master node.
    ● Bootstrapping requires a seed node.
    ● A query can be sent to any node, which acts as a "Storage Proxy" and performs the reads/writes on the responsible nodes according to the consistency level.
  • Consistency Level and Replication Factor
    ● Replication factor: on how many nodes is a piece of data stored?
    ● Consistency level:

      ANY           A node has received the operation, even if only as a HintedHandoff on another node.
      ONE           One node has completed the request.
      QUORUM        The operation has completed on a majority of nodes / the newest result is returned.
      LOCAL_QUORUM  QUORUM in the local data center.
      EACH_QUORUM   QUORUM in every data center.
      ALL           Wait until all nodes have completed the request.
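
How the consistency level is chosen depends on the client library; as a sketch, Hector lets you attach a default policy per keyspace (the class and enum are Hector's, the cluster and keyspace names are placeholders):

    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.HConsistencyLevel;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class ConsistencySetup {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

            // QUORUM on reads and writes: with replication factor 3, a majority (2 of 3)
            // of the replicas must answer before the operation completes
            ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
            policy.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
            policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

            Keyspace keyspace = HFactory.createKeyspace("MyDatabase", cluster, policy);
            // queries and mutators created from this Keyspace now use QUORUM by default
        }
    }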
  • How to Deal with Failure
    ● As long as the requirements of the consistency level can be met, everything is fine.
    ● Hinted Handoff:
      – A write operation for a failed node is stored on another node and pushed to the failed node once it is available again.
      – The data won't be readable right after the write!
    ● Read Repair:
      – After a read operation has completed, the data is compared and updated on all nodes in the background.
  • Libraries
    Python:  Pycassa: http://github.com/pycassa/pycass
             Telephus: http://github.com/driftx/Telephus
    Java:    Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
             Hector: http://github.com/rantav/hector
             Kundera: http://code.google.com/p/kundera/
             Pelops: http://github.com/s7/scale7-pelops
    Grails:  grails-cassandra: https://github.com/wolpert/grails-cassandra
    .NET:    Aquiles: http://aquiles.codeplex.com/
             FluentCassandra: http://github.com/managedfusion/fluentcassandra
    Ruby:    Cassandra: http://github.com/fauna/cassandra
    PHP:     phpcassa: http://github.com/thobbs/phpcassa
             SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
    Or roll your own based on Thrift (http://thrift.apache.org/) :)
  • TWIMPACT: An Application
    ● Real-time analysis of Twitter
    ● Trend analysis based on retweets
    ● Very high data rate (several million tweets per day, about 50 per second)
  • TWIMPACT: twimpact.jp
  • TWIMPACT: twimpact.com
  • Application Profile
    ● Information about tweets, users, and retweets
    ● Text matching for non-API retweets
    ● Retweet frequency and user impact
    ● Operation profile:

      Operation                Fraction   Duration
      get_slice (all)          50.1%      1.1 ms
      get                      6.0%       1.7 ms
      get_slice (range)        0.1%       0.8 ms
      batch_mutate (one row)   14.9%      0.9 ms
      insert                   21.5%      1.1 ms
      batch_mutate             6.8%       0.8 ms
      remove                   0.8%       1.2 ms
  • Practical Experiences with Cassandra
    ● Very stable
    ● Read operations are relatively expensive
    ● Multithreading leads to a huge performance increase
    ● Requires quite extensive tuning
    ● Clustering doesn't automatically lead to better performance
    ● Compaction leads to a performance decrease of up to 50%
  • Performance through Multithreading
    ● Multithreading leads to much higher throughput
    ● How to achieve multithreading without locking support?
    [Benchmark chart: throughput for 1, 2, 4, ..., 64 threads on a Core i7, 4 cores (2 + 2 HT)]
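
The slides don't spell out the threading approach, so this is a hedged sketch only (not necessarily TWIMPACT's solution): one common client-side pattern is to issue independent writes from a fixed thread pool, relying on each operation touching its own row rather than on locks. The sketch reuses the earlier Hector examples; the thread count and workload are made up.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class ParallelWriter {
        public static void main(String[] args) throws InterruptedException {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            final Keyspace keyspace = HFactory.createKeyspace("MyDatabase", cluster);

            ExecutorService pool = Executors.newFixedThreadPool(8);  // e.g. 2x the number of cores
            for (int i = 0; i < 10000; i++) {
                final String key = Integer.toString(i);
                pool.submit(new Runnable() {
                    public void run() {
                        // each task writes a different row; no client-side locking required
                        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
                        m.insert(key, "Person", HFactory.createStringColumn("name", "user-" + key));
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
        }
    }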
  • Cassandra Tuning
    ● Tuning opportunities:
      – Size of the memtables, thresholds for flushes
      – Size of the JVM heap
      – Frequency and depth of compaction
    ● Where?
      – MemTable thresholds etc. in conf/cassandra.yaml
      – JVM parameters in conf/cassandra-env.sh
  • Overview of JVM GC
    [Diagram: the JVM heap is split into a Young Generation ("Eden" + "Survivors", up to a few hundred MB) and an Old Generation (dozens of GB); CMSInitiatingOccupancyFraction marks when collection of the old generation starts; there is additional memory usage while the GC is running.]
  • Cassandra's Memory Usage
    [Memory usage chart: memtables, indexes, etc.; flushes and compactions are visible. Memtable size: 128 MB, JVM heap: 3 GB, 12 ColumnFamilies.]
  • Cassandra's Memory Usage
    ● Memtables may survive for a very long time (up to several hours)
      – They end up in the old generation
      – The GC has to process several dozen GB
      – If the heap is too small and GC is triggered too late: "GC storm"
    ● Trade-off:
      – I/O load vs. memory usage
      – Do not neglect compaction!
  • The Effects of GC and Compactions
    [Chart: impact of major GC runs and compactions on performance.]
  • Cluster vs. Single Node
    ● Our setup:
      – Single node with a six-core CPU and RAID 5 over 6 hard disks
      – 4-node cluster, each node with a six-core CPU and RAID 0 over 2 hard disks
    ● The single node consistently performs 1.5-3 times better.
    ● Possible causes:
      – Overhead through network communication, consistency levels, etc.
      – Hard disk performance matters a lot
      – The cluster is still too small
    ● Effectively available disk space:
      – Single node: 6 * 500 GB = 3 TB, with RAID 5 = 2.5 TB (83%)
      – 4-node cluster: 4 * 1 TB = 4 TB, with replication factor 2 = 2 TB (50%)
  • Alternatives
    ● MongoDB, CouchDB, Redis, even memcached...
    ● Persistence: disk or RAM?
    ● Replication: master/slave or peer-to-peer?
    ● Sharding?
    ● Upcoming trend towards more complex query languages (JavaScript), map-reduce operations, etc.
  • Summary: Cassandra
    ● A platform that scales well
    ● Active user and developer community
    ● Read operations are quite expensive
    ● For optimal performance, extensive tuning is necessary
    ● Depending on your application, eventual consistency and the lack of transactions/locking might be problematic.
  • Links
    ● Apache Cassandra: http://cassandra.apache.org
    ● Apache Cassandra Wiki: http://wiki.apache.org/cassandra/FrontPage
    ● DataStax documentation for Cassandra: http://www.datastax.com/docs/0.7/index
    ● My blog: http://blog.mikiobraun.de
    ● Twimpact: http://beta.twimpact.com