Oh why is my cluster’s read performance terrible?
What it is.
• In one sentence:
  – A data store that resembles:
     • A hash table (key/value) that is evenly distributed across a
       cluster of servers. Practically speaking, not an n-level k/v store.
     • Or, an Excel spreadsheet that is chopped up and
       housed on different servers.
• Basic nomenclature follows.
Where did it come from?
• Legacy:
  – Google Bigtable and Amazon Dynamo.
  – Open sourced by Facebook in 2008; an Apache project since 2009.
  – They had designed it for ‘Inbox search’.
What it is not.
• Not a general-purpose data store.
  – Highly specialized use cases.
     • The business use case must align with Cassandra’s architecture.
  – No transactions.
  – No joins (make multiple round trips). Denormalize data.
  – No stored procedures.
  – No range queries across keys (by default).
  – No referential integrity (primary key constraints, foreign keys).
  – No locking.
  – Uses timestamps to upsert data.
  – Charlie must be aghast.
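The ‘uses timestamps to upsert data’ bullet above amounts to last-write-wins, which can be sketched in a few lines of plain Python (illustrative only, not Cassandra code; all names are invented):

```python
# Last-write-wins upsert: no locking, the newest timestamp simply wins.
store = {}  # key -> (value, timestamp)

def upsert(key, value, ts):
    """Apply a write only if its timestamp beats the stored one."""
    current = store.get(key)
    if current is None or ts > current[1]:
        store[key] = (value, ts)

upsert("MSFT", 28.01, ts=1)
upsert("MSFT", 28.03, ts=5)
upsert("MSFT", 27.99, ts=3)   # stale write: older timestamp, silently dropped
print(store["MSFT"])          # (28.03, 5)
```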
Who uses it?
• Web-scale companies.
  – Netflix, Twitter.
     • Capture clickstream data.
     • User activity/gaming.
     • Backing store for search tools (Lucene).
        – Structured/unstructured data.

• Trend: web-scale companies moving from
  distributed MySQL to Cassandra.
Interest over time (Google Trends chart).
Netflix slide (chart).
Where do people use it?
• Mostly in analytic/reporting ‘stacks’.
  – Fire-hose (value proposition 1) vast amounts of ‘log-like’
    data into it.
  – Hopefully your data model ensures that your data
    is physically clustered (value proposition 2) on read.
     • Data that is physically clustered is conducive to
       reporting.
     • Can be used ‘real time’, but that is not its strength.
Important to know right up front.
• Designed for high write rates (all activity is sequential I/O).
   – If improperly used, read performance will suffer.
   – Always strive to minimize disk seeks on read.
• Millions of inserts should result in tens of thousands of row keys, not
  millions of keys.
• Main usage pattern: high write / low read (rates). See the Netflix slide.
• Anti-pattern (OLTP-like): millions of inserts / millions of reads (for the
  main data tables).
• If your Cassandra use is kosher, you will find that I/O is the bottleneck.
  Need better performance? Simply add more boxes (more I/O bandwidth for
  your cluster).
• It’s all about physically clustering your data for efficient reads.
• You have a query in mind? Design a Cassandra table that satisfies that
  query (lots of data duplication all over the place). Make sure the
  query is satisfied by navigating to a single row key.
• Favors throughput over latency.
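The query-first advice above (one table per query, duplicated data, one row key per query) can be sketched as follows; the table names and tick-data shape are invented for illustration:

```python
# Query-first modeling sketch: the same tick is written to two "tables",
# one per query we must serve, so each query hits a single row key.
by_symbol = {}   # row key = symbol     -> "all ticks for a symbol"
by_day = {}      # row key = trade date -> "all ticks on a day"

def record_tick(symbol, day, ts, price):
    # Denormalize: duplicate the write into every query-specific table.
    by_symbol.setdefault(symbol, []).append((ts, price))
    by_day.setdefault(day, []).append((ts, symbol, price))

record_tick("MSFT", "2012-06-01", 1, 28.01)
record_tick("MSFT", "2012-06-01", 5, 28.03)
print(by_symbol["MSFT"])   # one key lookup answers the query, no join
```

Storage is traded for seeks: every query resolves with a single key lookup, at the cost of writing the data more than once.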
• With analytics/reporting in mind:
  – Let’s explore RDBMS storage inefficiency (for a large
    query) and Cassandra’s value proposition #2.
Data in an RDBMS (physical)

Query: Select * … Where Symbol=MSFT
Block size 8 KB; row size ~1 KB. Columns: Symbol, Price, Time.

Matching rows are scattered across the table’s blocks:

  db block 1:    MSFT  28.01  t1   …   MSFT  28.03  t5
  db block 20:   …   MSFT  28.03  t7   …
  db block 1000: …   MSFT  28.01  t22  …

Minimum IO = 24K (8k x blocks visited). 3 seeks.
Slow.
Data in Cassandra (physical)

Query: Select * Where Symbol=MSFT

  KEY    Col/Value
  MSFT   t1 => 28.01   t5 => 28.03   t7 => 28.03   t22 => 28.01

Minimum IO = 8K (8k x 1).
- 1 seek to the KEY (+ overhead), then sequentially read the data.
- You want to make sure that you are getting a lot of value per seek!
  Cassandra likes “wide rows”.
- Your data is physically clustered and sorted (t1, t5, …).
- Millions of inserts have resulted in thousands of keys (high write / low
  read).
- Fast.
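A toy model of the wide row above: columns are kept physically sorted by column name (here, a timestamp), so a read is one key lookup plus a sequential scan. A sketch only, using Python’s bisect:

```python
import bisect

# Toy wide row: "columns" under a single row key, kept sorted on insert,
# so reading the row back is one seek plus a sequential, ordered scan.
row = []  # sorted (timestamp, price) columns under row key MSFT

def insert_column(ts, price):
    bisect.insort(row, (ts, price))   # row stays physically sorted

for ts, price in [(5, 28.03), (1, 28.01), (22, 28.01), (7, 28.03)]:
    insert_column(ts, price)

print(row)  # [(1, 28.01), (5, 28.03), (7, 28.03), (22, 28.01)]
```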
What it is, in depth:
• Log-structured data store (all activity is
  sequentially written).
• Favors Availability and Partition tolerance over
  Consistency.
   – Consistency is tunable, but if you tune for high
     consistency you trade off performance.
   – Cassandra consistency is not the same as database
     consistency; it is read-your-writes consistency.
• Column oriented.
• TTL (time to live / expire data).
• Compaction (coalesces data files).
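The log-structured and compaction bullets can be illustrated with a rough sketch (not Cassandra’s actual on-disk format): writes land in immutable segments, and compaction merges segments, keeping the newest timestamp per (row key, column) cell:

```python
# Log-structured sketch: each flushed segment is an immutable dict of
# (row_key, column) -> (value, timestamp). Compaction merges them,
# keeping the newest timestamp for each cell.
segments = [
    {("MSFT", "t1"): (28.01, 1)},   # older flushed segment
    {("MSFT", "t1"): (28.02, 9)},   # newer segment shadows the same cell
]

def compact(segs):
    merged = {}
    for seg in segs:
        for cell, (value, ts) in seg.items():
            if cell not in merged or ts > merged[cell][1]:
                merged[cell] = (value, ts)
    return merged

print(compact(segments))  # {('MSFT', 't1'): (28.02, 9)}
```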
System Properties
• Distributed / elastic scalability. (value proposition 3)
• Fault Tolerant – Rack aware, Inter/intra
  datacenter data replication. (value proposition 4)
• Peer to peer, no single point of failure (value
  proposition 5). Write to/read from any node; it acts
  as the proxy to the cluster. No master node.
• Durable.
Evenly distributes data (default)
• Consistent hashing.
• Token range: 0 to 2^127 - 1.
• Your ‘key’ gets assigned a token.
  E.g. key = smith = token 15: place it on the eastern
  node.
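A minimal sketch of the token placement described above, assuming the default random (MD5-based) partitioner; the three node tokens and names are invented:

```python
import hashlib
from bisect import bisect_left

# A key hashes to a token in [0, 2**127 - 1] and is placed on the first
# node whose token is >= the key's token, wrapping around the ring.
RING_MAX = 2**127 - 1
node_tokens = [(0, "north"), (RING_MAX // 3, "east"),
               (2 * RING_MAX // 3, "west")]   # sorted by token

def token_for(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (RING_MAX + 1)

def node_for_token(t):
    tokens = [tok for tok, _ in node_tokens]
    i = bisect_left(tokens, t) % len(node_tokens)  # wrap past the end
    return node_tokens[i][1]

def node_for(key):
    return node_for_token(token_for(key))

print(node_for_token(15))  # east -- the slide's "token 15" example
print(node_for("smith"))   # deterministic: same node on every call
```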
Replication Factor = 3
• Consistency Level
• Hinted Handoff
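With replication factor 3, replica placement can be sketched as the owning node plus the next two nodes clockwise on the ring (the simple placement strategy; rack- and datacenter-aware strategies are more involved). Node names are invented:

```python
# Replica placement sketch for replication factor 3 on a 4-node ring:
# the owning node plus the next rf - 1 nodes clockwise, wrapping around.
ring = ["north", "east", "south", "west"]   # nodes in token order

def replicas(owner_index, rf=3):
    return [ring[(owner_index + i) % len(ring)] for i in range(rf)]

print(replicas(2))  # ['south', 'west', 'north'] -- wraps around the ring
```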
Consistency Level
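The arithmetic behind tunable consistency levels: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap (read-your-writes) whenever R + W > N. A sketch:

```python
# QUORUM on both the write and read side gives overlapping replica
# sets whenever R + W > N, so a read sees the latest acknowledged write.
def quorum(n):
    return n // 2 + 1

N = 3                       # replication factor
W = R = quorum(N)           # QUORUM writes and QUORUM reads
print(W, R, R + W > N)      # 2 2 True
```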
Data types.
ACID?
• A/I/D (in bits and bobs).
• BASE: Basically Available, Soft state, Eventual
  consistency.
Cassandra/Future
• Will slowly take on more RDBMS-like features.
  – Cassandra 1.1 has row-level isolation. Previously
    you could read someone else’s in-flight data.
Reference: CAP Theorem.
• Consistency (all nodes see the same data at
  the same time)
• Availability (a guarantee that every request
  receives a response about whether it was
  successful or failed)
• Partition tolerance (the system continues to
  operate despite arbitrary message loss or
  failure of part of the system)

Apache Cassandra Opinion and Fact
