Introduction to Cassandra

  1. Introduction to Cassandra, Shimi Kiviti (@shimi_k)
  2. Motivation: Scaling
      How do you scale your database?
      ● reads
      ● writes
  3. Influential Papers
      ● Bigtable: A Distributed Storage System for Structured Data, 2006
      ● Dynamo: Amazon's Highly Available Key-value Store, 2007
      Cassandra takes:
      ● partitioning and replication from Dynamo
      ● log-structured column families from Bigtable
  4. Cassandra Highlights
      ● Symmetric: all nodes are exactly the same
        ○ No single point of failure
        ○ Linearly scalable
        ○ Ease of administration
      ● High availability with multiple datacenters
      ● Tunable trade-off between consistency and latency
      ● Read/write anywhere
      ● Flexible schema
      ● Column TTL
      ● Distributed counters
  5. DHT: Distributed Hash Table
  6. DHT
      ● O(1) node lookup: every node knows the whole ring, so a request reaches a replica in one hop
      ● Explicit replication
      ● Linear scalability
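The ring lookup can be sketched as follows, assuming one token per node and SimpleStrategy-style placement (the replicas are the next distinct nodes clockwise, with no racks or datacenters). `TokenRing`, `replicas` and `primary` are hypothetical names for illustration, not Cassandra's API. The search itself is an in-memory sorted-map lookup; the O(1) on the slide is about network hops.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical sketch: one token per node, nodes ordered by token on a ring.
// A node owns the range (previous token, its own token].
final case class TokenRing(tokens: TreeMap[BigInt, String]) {
  require(tokens.nonEmpty, "the ring needs at least one node")

  // Walk clockwise from the key's token and collect the first n distinct nodes.
  def replicas(keyToken: BigInt, n: Int): List[String] = {
    val fromKey = tokens.iteratorFrom(keyToken).map(_._2).toList // nodes with token >= keyToken
    val wrapped = tokens.values.toList                            // then wrap around the ring
    (fromKey ++ wrapped).distinct.take(n)
  }

  // The first replica is the node that owns the key's range.
  def primary(keyToken: BigInt): String = replicas(keyToken, 1).head
}
```

For example, with nodes A, B, C at tokens 0, 100 and 200, a key with token 150 is stored on C and then A when N = 2.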
  7. Consistency
      N = replication factor
      R = number of replicas to block for on read (R <= N)
      W = number of replicas to block for on write (W <= N)
      Quorum = floor(N/2) + 1
      When W + R > N, every read set overlaps every write set, so reads are fully consistent
      Examples:
      ● W = 1, R = N
      ● W = N, R = 1
      ● W = Quorum, R = Quorum
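The quorum arithmetic is small enough to check in code. This is a sketch assuming a single datacenter; `ConsistencyMath` is a hypothetical helper, not part of any Cassandra client.

```scala
// Hypothetical helper for the single-datacenter quorum arithmetic.
object ConsistencyMath {
  // Quorum = floor(N / 2) + 1; Int division already floors.
  def quorum(n: Int): Int = n / 2 + 1

  // A read is guaranteed to overlap the latest acknowledged write when W + R > N.
  def isStronglyConsistent(n: Int, r: Int, w: Int): Boolean = {
    require(r >= 1 && r <= n && w >= 1 && w <= n, "R and W must be between 1 and N")
    w + r > n
  }
}
```

With N = 3, Quorum is 2, so W = R = Quorum gives 2 + 2 = 4 > 3, while W = R = 1 gives no guarantee.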
  8. Consistency Level
      ● Every request defines its consistency level:
        ○ Any
        ○ One
        ○ Two
        ○ Three
        ○ Quorum
        ○ Local Quorum
        ○ Each Quorum
        ○ All
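A rough sketch of how many acknowledgements each level waits for, assuming the replication factor is given per datacenter. The case object names mirror the level names above; `BlockFor` is a hypothetical helper. Any applies only to writes and may be satisfied by a hinted hand-off instead of a real replica; Each Quorum needs a quorum in every datacenter, so the total below is the sum of the per-datacenter quorums.

```scala
// Hypothetical model of the consistency levels listed on the slide.
sealed trait ConsistencyLevel
case object ANY extends ConsistencyLevel
case object ONE extends ConsistencyLevel
case object TWO extends ConsistencyLevel
case object THREE extends ConsistencyLevel
case object QUORUM extends ConsistencyLevel
case object LOCAL_QUORUM extends ConsistencyLevel
case object EACH_QUORUM extends ConsistencyLevel
case object ALL extends ConsistencyLevel

object BlockFor {
  private def quorum(n: Int): Int = n / 2 + 1

  // rfByDc: replication factor per datacenter, e.g. Map("dc1" -> 3, "dc2" -> 3).
  def apply(level: ConsistencyLevel, rfByDc: Map[String, Int], localDc: String): Int = {
    val total = rfByDc.values.sum
    level match {
      case ANY          => 1 // writes only; a hint may count instead of a replica
      case ONE          => 1
      case TWO          => 2
      case THREE        => 3
      case QUORUM       => quorum(total)                  // quorum over all replicas
      case LOCAL_QUORUM => quorum(rfByDc(localDc))        // quorum in the coordinator's DC
      case EACH_QUORUM  => rfByDc.values.map(quorum).sum  // a quorum in every DC
      case ALL          => total
    }
  }
}
```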
  9. Data Model
      ● Keyspace ~ schema
      ● Column families ~ tables
      ● Rows
      ● Columns
  10. Column Family
      Key1 → Column | Column | Column
      Key2 → Column | Column
  11. Column Family
      ColumnFamily: {
        TOK: { chen: 1, ronen: 7 },
        CityPath: { yuval: 5 }
      }
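One way to picture the column family example in memory is a map from row key to a sorted map of columns. This is a hypothetical model, not Cassandra's storage format; it shows that each row can hold a different set of columns and that columns come back sorted by name.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical in-memory model: row key -> (column name -> value).
object ColumnFamilyExample {
  type Row = SortedMap[String, Int] // columns kept sorted by name

  val columnFamily: Map[String, Row] = Map(
    "TOK"      -> SortedMap("ronen" -> 7, "chen" -> 1), // inserted out of order on purpose
    "CityPath" -> SortedMap("yuval" -> 5)               // a different set of columns
  )
}
```

Listing the columns of `TOK` yields `chen` before `ronen`, regardless of insertion order.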
  12. Super Column Family
      Key → Super1: Column | Column | Column
            Super2: Column | Column
      ColumnFamily: {
        Key: {
          super1: { name: value, name: value },
          super2: { name: value }
        }
      }
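A super column family adds one more level of sorted nesting: row key, then super column, then column. The sketch below is a hypothetical in-memory model; `name1`, `value1` and the other column names are placeholders standing in for the slide's `name: value` pairs.

```scala
import scala.collection.immutable.SortedMap

// Hypothetical model of a super column family.
object SuperColumnFamilyExample {
  type SuperColumn = SortedMap[String, String] // column name -> value
  type Row = SortedMap[String, SuperColumn]    // super column name -> its columns

  val family: Map[String, Row] = Map(
    "Key" -> SortedMap(
      "super2" -> SortedMap("name3" -> "value3"),
      "super1" -> SortedMap("name2" -> "value2", "name1" -> "value1")
    )
  )

  // Look up one column inside one super column; None if any level is missing.
  def get(key: String, superName: String, column: String): Option[String] =
    family.get(key).flatMap(_.get(superName)).flatMap(_.get(column))
}
```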
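  13. Write
      ● Any node can receive the write and act as coordinator
      ● The partitioner determines which nodes hold the row's replicas
      ● Each replica appends the write to its commit log, then applies it to the memtable
      ● The coordinator waits for W responses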
  14. Write
  15. Write
      ● No reads (no read before write)
      ● No seeks
      ● Sequential disk access
      ● Atomic within a column family
      ● Fast
      ● Always writable (hinted hand-off)
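The properties above follow from a log-structured write path. Below is a minimal single-node sketch, assuming an in-memory buffer in place of the commit-log file and a hypothetical count-based flush threshold; `WritePath` and `Mutation` are illustrative names, not Cassandra's code. Note that a write never reads existing SSTables.

```scala
import scala.collection.immutable.SortedMap
import scala.collection.mutable.ArrayBuffer

// Hypothetical single-node sketch of a log-structured write path.
final case class Mutation(key: String, column: String, value: String, timestamp: Long)

final class WritePath(flushThreshold: Int) {
  type Table = SortedMap[(String, String), Mutation] // (row key, column name) -> newest mutation

  // Append-only commit log (stands in for a file): sequential, never read on writes.
  val commitLog: ArrayBuffer[Mutation] = ArrayBuffer.empty
  // Flushed SSTables: immutable sorted tables, oldest first.
  val sstables: ArrayBuffer[Table] = ArrayBuffer.empty
  // Memtable: sorted in-memory table that absorbs writes until it is flushed.
  private var memtable: Table = SortedMap.empty

  def write(m: Mutation): Unit = {
    commitLog += m // 1. append to the commit log for durability
    val id = (m.key, m.column)
    // 2. update the memtable, keeping the cell with the newest timestamp
    if (memtable.get(id).forall(_.timestamp <= m.timestamp)) memtable = memtable.updated(id, m)
    if (memtable.size >= flushThreshold) flush() // 3. flush when the memtable is "full"
  }

  // Write the memtable out as a new immutable SSTable and start an empty one.
  def flush(): Unit =
    if (memtable.nonEmpty) {
      sstables += memtable
      memtable = SortedMap.empty
    }

  def memtableSize: Int = memtable.size
}
```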
  16. Read
      ● Choose any node as coordinator
      ● The partitioner locates the replicas
      ● Wait for R responses
      ● Tunable read repair in the background
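A sketch of what the coordinator does with the R responses, assuming last-write-wins by timestamp. `ReadResolver` and `Versioned` are hypothetical; the stale replicas it reports are the ones a background read repair would bring up to date.

```scala
// Hypothetical sketch of resolving replica responses on read (last write wins).
final case class Versioned(value: String, timestamp: Long)

object ReadResolver {
  // Returns the newest version and the replicas that answered with older data.
  def resolve(responses: Map[String, Versioned]): (Versioned, Set[String]) = {
    require(responses.nonEmpty, "need at least one replica response")
    val newest = responses.values.maxBy(_.timestamp)
    val stale = responses.collect {
      case (replica, v) if v.timestamp < newest.timestamp => replica // read-repair candidates
    }.toSet
    (newest, stale)
  }
}
```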
  17. Read
      ● A read may have to merge data from multiple SSTables
      ● Slower than writes
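Why reads cost more: one row can be spread over several SSTables, each holding only some of its columns, and the read must combine them. The sketch assumes the newest timestamp wins per column; `RowMerge` and `Cell` are hypothetical names, and real Cassandra also uses indexes and bloom filters to skip SSTables.

```scala
// Hypothetical sketch of merging one row's fragments from several SSTables.
final case class Cell(value: String, timestamp: Long)

object RowMerge {
  type RowFragment = Map[String, Cell] // column name -> cell, as found in one SSTable

  // Combine all fragments column by column; the cell with the newest timestamp wins.
  def merge(fragments: Seq[RowFragment]): Map[String, Cell] =
    fragments.flatten.groupBy(_._1).map { case (column, entries) =>
      column -> entries.map(_._2).maxBy(_.timestamp)
    }
}
```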
  18. Cache
      ● No need for an external cache such as memcached
      ● Cassandra has an internal, configurable cache:
        ○ Key cache
        ○ Row cache
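The row cache keeps whole rows in memory and the key cache keeps where keys live in the SSTables; both are bounded. As an illustration of a bounded cache only, here is a generic LRU built on `java.util.LinkedHashMap` in access order. `LruCache` is hypothetical and is not Cassandra's cache implementation.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical generic LRU cache, illustrating a bounded in-process cache.
final class LruCache[K, V](maxEntries: Int) {
  require(maxEntries > 0, "capacity must be positive")

  // accessOrder = true: iteration runs from least to most recently accessed entry.
  private val underlying: JLinkedHashMap[K, V] = new JLinkedHashMap[K, V](16, 0.75f, true) {
    // Evict the least recently used entry once the cache grows past its bound.
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean = this.size() > maxEntries
  }

  def get(key: K): Option[V] = Option(underlying.get(key))
  def put(key: K, value: V): Unit = { underlying.put(key, value); () }
  def entries: Int = underlying.size()
}
```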
  19. Sorting
      When you perform a get, the result is sorted:
      ● Rows are sorted according to the partitioner
      ● Columns in a row are sorted according to the type (comparator) of the column names
  20. Partitioner
      ● RandomPartitioner: uses hash values of the keys as tokens. Useful for distributing the load across all nodes. If you use it, set the nodes' tokens manually.
      ● OrderPreservingPartitioner: you can get rows in sorted order, but at the cost of an uneven cluster.
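A sketch of RandomPartitioner-style tokens, assuming the token is the MD5 digest of the key read as a signed 128-bit integer and made non-negative, giving tokens in [0, 2^127]. "Set the nodes' tokens manually" usually means spacing them evenly, token i = i * 2^127 / n. `RandomTokens` is a hypothetical helper, not Cassandra's class.

```scala
import java.math.BigInteger
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper: MD5-based tokens and evenly spaced initial tokens.
object RandomTokens {
  val RingSize: BigInt = BigInt(2).pow(127)

  // Token = |MD5(key)| read as a signed big-endian integer.
  def token(key: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8))
    BigInt(new BigInteger(digest)).abs
  }

  // Node i of n gets token i * 2^127 / n, splitting the ring into n equal ranges.
  def initialTokens(n: Int): Seq[BigInt] =
    (0 until n).map(i => RingSize * BigInt(i) / BigInt(n))
}
```

Because MD5 output is spread roughly uniformly, evenly spaced tokens give each node about the same share of keys; an order-preserving partitioner has no such guarantee.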
  21. Column Types
      Available types:
      ● Bytes
      ● UTF8
      ● Ascii
      ● Long
      ● Date
      ● UUID
      ● Composite: <Type1>:<Type2>
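  22. Column Types
      Examples:
      Sort 1, numeric (Long) vs. string (UTF8) order:
        Long      UTF8
        8         10
        9         8
        10        9
      Sort 2, Composite (UTF8:Long) vs. string (UTF8) order:
        Composite UTF8
        dan:8     dan:10
        dan:10    dan:8
        shimi:1   shimi:1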
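The two sort examples can be reproduced with Scala orderings. This is a simplification: real comparators work on the encoded bytes, and the Composite case here assumes a `name:number` string split on the first `:`. `ColumnOrdering` is a hypothetical helper.

```scala
// Hypothetical orderings that mimic the comparators in the sort examples.
object ColumnOrdering {
  // Long-like: compare column names as numbers.
  val longOrder: Ordering[String] = Ordering.by((s: String) => s.toLong)

  // UTF8-like: compare column names as plain strings (lexicographic).
  val utf8Order: Ordering[String] = Ordering.String

  // Composite(UTF8, Long)-like: text component first, then the numeric component.
  val compositeOrder: Ordering[String] = Ordering.by { (s: String) =>
    val parts = s.split(":", 2)
    (parts(0), parts(1).toLong)
  }
}
```

Sorting `List("10", "8", "9")` with `longOrder` gives 8, 9, 10; with `utf8Order` it gives 10, 8, 9, because "1" sorts before "8".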
  23. Clients
      ● Thrift: Cassandra's low-level driver interface
      ● CQL: Cassandra Query Language (SQL-like)
      ● High-level clients:
        ○ Python
        ○ Java
        ○ Scala
        ○ Clojure
        ○ .Net
        ○ Ruby
        ○ PHP
        ○ Perl
        ○ C++
        ○ Haskell
  24. Cascal: Scala client
      Insert a column:
        session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")
        val key = "app" \ "users" \ "shimi"
        session.insert(key \ "email" \ "shimi.k@...")
      Get a column value:
        val pass = session.get(key \ "passwd")
  25. Cascal
      Get multiple columns:
        val row = session.list(key)
        val colsInRange = session.list(key, RangePredicate("email", "passwd"))
        val namedCols = session.list(key, ColumnPredicate(List("passwd", "email")))
  26. Cascal
      Get multiple rows:
        val family = "app" \ "users"
        val rowsInRange = session.list(family, RangePredicate("dan", "shimi"))
        val namedRows = session.list(family, KeyPredicate("dan", "shimi"))
  27. Cascal
      Remove a column:
        session.remove("app" \ "users" \ "shimi" \ "passwd")
      Remove a row:
        session.remove("app" \ "users" \ "shimi")
      Batch operations:
        val deleteCols = Delete(key, ColumnPredicate(List("age", "sex")))
        val insertEmail = Insert(key \ "email" \ "shimi.k@...")
        session.batch(List(insertEmail, deleteCols))
  28. Guidelines
      ● Keep together the data you query together
      ● Think about your use case and how you will fetch your data
      ● Don't try to normalize your data
      ● You can't beat the disk
      ● Be ready to get your hands dirty
      ● There is no single solution for everything; consider combining different solutions
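"Keep together the data you query together" and "don't normalize" usually mean writing the same data once per query. The sketch below models two hypothetical column families, one keyed by user name and one by email, as plain maps; `UserStore`, `User` and the column names are illustrative assumptions, not a Cassandra or Cascal API.

```scala
import scala.collection.mutable

// Hypothetical sketch of query-driven, denormalized storage: one "column family" per query.
final case class User(name: String, email: String, city: String)

final class UserStore {
  // Column family for "get user by name": row key = name.
  val usersByName = mutable.Map.empty[String, Map[String, String]]
  // Column family for "get user by email": the same columns under another row key.
  val usersByEmail = mutable.Map.empty[String, Map[String, String]]

  // One logical write becomes two cheap writes, so each query is a single-row read.
  def save(u: User): Unit = {
    val columns = Map("name" -> u.name, "email" -> u.email, "city" -> u.city)
    usersByName(u.name) = columns
    usersByEmail(u.email) = columns
  }
}
```

The trade-off is extra writes and storage in exchange for reads that never need a join; since writes are the cheap operation here, that is usually the side to pay on.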
  29. The End
      Useful links:
      ● Cassandra: http://cassandra.apache.org/
      ● Wiki: http://wiki.apache.org/cassandra/
      ● Cassandra mailing list
      ● IRC
      ● Bigtable: http://labs.google.com/papers/bigtable.html
      ● Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
      ● Cascal: https://github.com/shimi/cascal
