YCSB
Yahoo! Cloud Serving Benchmark
Scalable Distributed Systems
Antonio L. Severien
antonio.severien@gmail.com
João Rosa
Joao.rui.rosa@gmail.com
Overview
• Distributed Databases
• Cassandra
• HBase
• YCSB General View
• YCSB Details
• Amazon EC2
• YCSB Results
• YCSB Future
• Conclusions
• References
Distributed Databases
Traditional RDBMS
• ACID transactions
• Query language (SQL)
• Data tied to the modeling (hard to analyze)
• Scalable to a limit
Distributed Databases
• Not ACID
• Not Relational
• Column oriented (key-value)
• CAP (Consistency, Availability, Partition tolerance)
• Big Data (Massively scalable)
Distributed Databases
• Sherpa/PNUTS
• BigTable
• HBase, Hypertable, HTable
• Megastore
• Azure
• Cassandra
• Amazon Web Services
• S3, SimpleDB, EBS
• CouchDB
• Voldemort
• Dynomite
• Tokyo
• Redis
• MongoDB
Distributed Databases
• NoSQL databases have different designs and architectures
• Cassandra: Thrift, Gossip, Token ring, …
• HBase: HDFS, ZooKeeper, Hadoop (MapReduce)
• BigTable: GFS, Chubby (Lock Service), MapReduce
Cassandra
• Highlights
• High availability
• Incremental scalability
• Eventually consistent
• Tradeoffs between consistency and latency
• Minimal administration
• No SPOF (Single Point of Failure)
Cassandra
• CAP-aware
• Cassandra values Availability and Partition tolerance (AP) → eventually consistent
• Providing strong Consistency in Cassandra increases latency
• Partitioning
• Token oriented
• Explicit Replication
• Replication factor ≤ Total nodes
• High level clients
• Python, Java, C#, .NET, Scala, Ruby, PHP, Erlang, Haskell, etc.
• Thrift: driver-level interface
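The token-oriented partitioning and the rule "replication factor ≤ total nodes" can be illustrated with a toy token ring. This is only a sketch under stated assumptions: the class name `TokenRing`, the MD5-based tokens and the rule "replicas are the next nodes clockwise" are chosen for illustration and are not taken from the slides or from Cassandra's code.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns one token; a key belongs to the first
    node whose token is >= the key's hash, wrapping around the ring."""

    def __init__(self, nodes, replication_factor):
        # Rule from the slide: replication factor <= total nodes.
        if replication_factor > len(nodes):
            raise ValueError("replication factor must be <= total nodes")
        self.replication_factor = replication_factor
        # Assumption: a node's token is the MD5 hash of its name.
        self.ring = sorted((self._hash(node), node) for node in nodes)
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key):
        """Return the nodes that store `key`: the owner plus its successors."""
        start = bisect.bisect_left(self.tokens, self._hash(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


nodes = ["node1", "node2", "node3"]
ring = TokenRing(nodes, replication_factor=2)
owners = ring.replicas("user1000")
print(owners)
```

Because replicas are distinct successors on the ring, asking for more replicas than nodes cannot be satisfied, which is why the constructor rejects it.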
Cassandra
• Data Model
• Cluster:
• Machines (nodes) in a logical Cassandra instance
• Can contain multiple keyspaces
• Keyspace:
• Namespace for ColumnFamilies
• ColumnFamilies:
• Contain multiple columns, each with a name, value and timestamp, referenced by row keys
• Analogous to a table in an RDBMS
• SuperColumns:
• Columns with subcolumns
Rows and columns (example):
keyA: Column1, Column2, Column3
keyB: Column5, Column6, Column10
Column structure:
Byte[] Name
Byte[] Value
I64 Timestamp
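The data model above can be sketched in a few lines. The `Column` fields mirror the slide (name, value, timestamp); the `ColumnFamily` class, its method names and the "newest timestamp wins" rule for repeated writes are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int  # I64 in the slide


class ColumnFamily:
    """Rows referenced by row key; each row holds its own set of columns."""

    def __init__(self):
        self.rows = {}  # row key -> {column name -> Column}

    def insert(self, row_key, column):
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Assumption for this sketch: the newest timestamp wins.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get(self, row_key, name):
        return self.rows[row_key][name].value


cf = ColumnFamily()
cf.insert(b"keyA", Column(b"Column1", b"v1", 1))
cf.insert(b"keyA", Column(b"Column1", b"v2", 2))
cf.insert(b"keyA", Column(b"Column1", b"stale", 1))  # older write is ignored
cf.insert(b"keyB", Column(b"Column5", b"x", 1))
print(cf.get(b"keyA", b"Column1"))
```

Rows do not need to share the same columns, which is why keyA and keyB in the slide list different column names.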
Cassandra
Partitioning Replication
HBase
“HBase is more a datastore than a database”
• It lacks many of the features of RDBMS
• Distributed and scalable big data store.
• Regions model
• Strong consistency
HBase
Built on top of Hadoop Distributed Filesystem (HDFS)
HBase
• The NameNode is responsible for maintaining the filesystem metadata.
• The DataNodes are responsible for storing HDFS blocks.
Note: In our case study, we were only interested in the HDFS layer.
HBase
[Figure: diagram with Namenode and Datanodes]
HBase
• Data is stored into HBase tables.
• Tables are made of rows and columns.
• All columns belong to a particular column family.
Important note: All column family members are stored together.
• As a result, queries on a single column family perform better
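A toy model can show why keeping the members of a column family together helps queries. The `Table` class, its method names and the sample rows are hypothetical; the sketch only mirrors the idea that each column family has its own store.

```python
class Table:
    """Toy table that keeps one separate store per column family."""

    def __init__(self, families):
        self.stores = {family: {} for family in families}

    def put(self, row, family, qualifier, value):
        self.stores[family].setdefault(row, {})[qualifier] = value

    def get_family(self, row, family):
        # Only the store of the requested family is touched.
        return dict(self.stores[family].get(row, {}))


table = Table(["info", "stats"])
table.put("user1", "info", "name", "Ana")
table.put("user1", "info", "email", "ana@example.com")
table.put("user1", "stats", "logins", 3)
info = table.get_family("user1", "info")
print(info)
```

A read that names one family never has to scan the data of the other families.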
YCSB General View
• Which is the best NoSQL DB?
• How to compare?
• Yahoo! Cloud Serving Benchmark (YCSB)
• Benchmarking tool
• Evaluates the performance of key-value and cloud DBs on a common set of workloads
• Client – an extensible workload generator
• Yahoo! Research
• Brian F. Cooper - cooperb@yahoo-inc.com
• Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears
YCSB Details
• How does it work?
YCSB Client
• Workload Executor
• Client Threads
• Statistics
• DB Interface Layer
Cloud Serving Store
Workload file
• Read/write mix
• Record size
• Popularity distribution
• …
Command line
• DB to use
• Workload to use
• Target throughput
• Number of threads
• …
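The components listed above can be pictured with a small stand-alone sketch: client threads execute operations through a DB interface layer and report latencies to a statistics object. All class and function names here are hypothetical, and an in-memory dictionary stands in for the cloud serving store; this is not YCSB's actual code.

```python
import threading
import time
from abc import ABC, abstractmethod


class DB(ABC):
    """Hypothetical stand-in for the DB interface layer."""

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def update(self, key, value): ...


class InMemoryDB(DB):
    """Dictionary playing the role of the cloud serving store."""

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()

    def read(self, key):
        with self.lock:
            return self.data.get(key)

    def update(self, key, value):
        with self.lock:
            self.data[key] = value


class Statistics:
    def __init__(self):
        self.latencies = {}
        self.lock = threading.Lock()

    def record(self, op, seconds):
        with self.lock:
            self.latencies.setdefault(op, []).append(seconds)

    def operations(self, op):
        return len(self.latencies.get(op, []))


def client_thread(db, stats, operations):
    """Execute a share of the workload and time every operation."""
    for op, key in operations:
        start = time.perf_counter()
        if op == "read":
            db.read(key)
        else:
            db.update(key, "value")
        stats.record(op, time.perf_counter() - start)


def run_workload(db, stats, operations, threads):
    # Split the operations evenly across the client threads.
    chunks = [operations[i::threads] for i in range(threads)]
    workers = [threading.Thread(target=client_thread, args=(db, stats, chunk))
               for chunk in chunks]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


# 50/50 read/update mix over 10 keys, run by 10 client threads.
operations = [("update" if i % 2 else "read", "user%d" % (i % 10))
              for i in range(1000)]
db, stats = InMemoryDB(), Statistics()
run_workload(db, stats, operations, threads=10)
print(stats.operations("read"), stats.operations("update"))
```

Swapping `InMemoryDB` for another `DB` subclass is the only change needed to target a different store, which is the point of the interface layer.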
YCSB Details
Benchmark Tiers
• Performance
• Measure latency/throughput curve
• Increase throughput until saturation
• Scalability
• Scale up: increase hardware, data size and throughput proportionally
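The performance tier ("increase throughput until saturation") can be sketched as a sweep over target throughputs. The command template reuses the run command shown elsewhere in these slides; the target values, the 0.95 tolerance and the (target, achieved) pairs are made-up inputs for illustration, not measurements.

```python
def sweep_commands(binding, workload, threads, targets):
    """Build one run command per target throughput (ops/sec)."""
    template = "./bin/ycsb run {} -P {} -s -threads {} -target {}"
    return [template.format(binding, workload, threads, t) for t in targets]


def saturation_target(measurements, tolerance=0.95):
    """Return the first target whose achieved throughput falls below
    tolerance * target, or None if the store kept up with every target.
    The 0.95 tolerance is an arbitrary illustrative threshold."""
    for target, achieved in measurements:
        if achieved < tolerance * target:
            return target
    return None


commands = sweep_commands("cassandra-10", "workloads/workloada", 10,
                          [1000, 2000, 4000])
for command in commands:
    print(command)

# Made-up (target, achieved) pairs, only to exercise the function.
example = [(1000, 998.0), (2000, 1990.0), (4000, 3100.0)]
print(saturation_target(example))
```

Plotting the latency reported by each run against the achieved throughput gives the latency/throughput curve named in the slide.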
• Elastic speedup: add servers while running a workload
YCSB Details
Load phase
- Loads the database
$ ycsb load cassandra-10 -p hosts=127.0.0.1 -P workloadX
Transactions phase
- Executes the workload
$ ycsb run cassandra-10 -p hosts=127.0.0.1 -P workloadX
Random Load Distribution
YCSB Details
# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
# Application example: Session store recording recent actions
#
# Read/update ratio: 50/50
# Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
# Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
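To make the workload file concrete, the sketch below parses a few of its properties, draws an operation mix from the proportions and picks keys with a Zipf-like skew. It is a simplified illustration, not YCSB's generator: the function names and the exponent 0.99 are assumptions of this sketch.

```python
import random

WORKLOAD_A = """
# Workload A: Update heavy workload
recordcount=1000
operationcount=1000
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
"""


def parse_properties(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def operation_mix(props):
    names = ["read", "update", "scan", "insert"]
    return {name: float(props.get(name + "proportion", 0)) for name in names}


def choose_operations(props, rng):
    """Draw operationcount operations according to the proportions."""
    mix = operation_mix(props)
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=int(props["operationcount"]))


def zipfian_keys(record_count, k, rng, exponent=0.99):
    """Pick k record indices; low ranks are far more popular than high ones.
    The exponent value is an assumption of this sketch."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, record_count + 1)]
    return rng.choices(range(record_count), weights=weights, k=k)


rng = random.Random(42)
props = parse_properties(WORKLOAD_A)
ops = choose_operations(props, rng)
keys = zipfian_keys(int(props["recordcount"]), len(ops), rng)
print(ops.count("read"), ops.count("update"))
```

With scan and insert proportions of 0, only reads and updates are ever drawn, roughly half each.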
YCSB Details
• Execution parameters
• $ ./bin/ycsb run cassandra-10 -P workloads/workloada -s -threads 10 -target 100 > transactions.dat
[OVERALL],RunTime(ms), 10110
[OVERALL],Throughput(ops/sec), 98.91196834817013
[UPDATE], Operations, 491
[UPDATE], AverageLatency(ms), 0.054989816700611
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 1
[UPDATE], 95thPercentileLatency(ms), 1
[UPDATE], 99thPercentileLatency(ms), 1
[UPDATE], Return=0, 491
[UPDATE], 0, 464
[UPDATE], 1, 27
[UPDATE], 2, 0
[UPDATE], 3, 0
[UPDATE], 4, 0
...
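The figures in this output can be cross-checked with a few lines of arithmetic. The sketch below parses the lines shown above (only the buckets that are printed, not the elided ones) and recomputes the average update latency from the histogram. The parser is a hypothetical helper written for this example, not part of YCSB.

```python
OUTPUT = """\
[OVERALL],RunTime(ms), 10110
[OVERALL],Throughput(ops/sec), 98.91196834817013
[UPDATE], Operations, 491
[UPDATE], AverageLatency(ms), 0.054989816700611
[UPDATE], 0, 464
[UPDATE], 1, 27
[UPDATE], 2, 0
"""


def parse_output(text):
    """Split lines into named metrics and per-millisecond histogram buckets."""
    metrics, histogram = {}, {}
    for line in text.splitlines():
        section, name, value = [part.strip() for part in line.split(",")]
        section = section.strip("[]")
        if name.isdigit():
            histogram.setdefault(section, {})[int(name)] = int(value)
        else:
            metrics[(section, name)] = float(value)
    return metrics, histogram


def mean_latency(buckets):
    """Weighted mean of the bucket latencies (ms)."""
    total = sum(buckets.values())
    return sum(ms * count for ms, count in buckets.items()) / total


metrics, histogram = parse_output(OUTPUT)
average = mean_latency(histogram["UPDATE"])
# Operations implied by throughput and runtime.
implied_ops = round(metrics[("OVERALL", "Throughput(ops/sec)")]
                    * metrics[("OVERALL", "RunTime(ms)")] / 1000.0)
print(average, implied_ops)
```

The 464 updates in the 0 ms bucket and 27 in the 1 ms bucket give 27/491, which agrees with the reported AverageLatency, and throughput times runtime gives back the 1,000 operations of the workload.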
YCSB Details
• $ ./bin/ycsb run basic -P workloads/workloada -P large.dat -s -threads 10 -target 100 -p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat
[OVERALL],RunTime(ms), 10077
[OVERALL],Throughput(ops/sec), 9923.58836955443
[UPDATE], Operations, 50396
[UPDATE], AverageLatency(ms), 0.04339630129375347
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 338
[UPDATE], Return=0, 50396
[UPDATE], 0, 0.10264765784114054
[UPDATE], 2000, 0.026989343690867442
[UPDATE], 4000, 0.0352882703777336
[UPDATE], 6000, 0.004238958990536277
[UPDATE], 8000, 0.052813085033008175
[UPDATE], 10000, 0.0
[READ], Operations, 49604
[READ], AverageLatency(ms), 0.038242883638416256
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 230
[READ], Return=0, 49604
[READ], 0, 0.08997245741099663
[READ], 2000, 0.02207505518763797
[READ], 4000, 0.03188493260913297
[READ], 6000, 0.004869141813755326
[READ], 8000, 0.04355329949238579
[READ], 10000, 0.005405405405405406
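As a quick consistency check on the time-series output, the overall throughput can be recomputed from the operation counts and the runtime. The helper name `throughput` is hypothetical; the numbers are the ones printed above.

```python
def throughput(operations, runtime_ms):
    """Operations per second for a run that lasted runtime_ms milliseconds."""
    return 1000.0 * operations / runtime_ms


updates, reads, runtime_ms = 50396, 49604, 10077
total = updates + reads
overall = throughput(total, runtime_ms)
update_share = updates / total

# Window start times for a granularity of 2000 ms, as listed in the output.
windows = list(range(0, 10001, 2000))

print(total, overall, update_share, windows)
```

The 100,000 operations over 10,077 ms reproduce the reported throughput, and the update share stays close to the 50/50 mix that workload A requests.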
YCSB Details
Status Output
Amazon EC2 Configuration
Large Instance
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large
Experiment Set-up
Cassandra Cluster
3 nodes + 1 node (Elasticity)
HBase Cluster
3 nodes
Amazon EC2 Usage
Load phase: 60,000,000 records of 1 KB
[Figures: Amazon EC2 usage during the load phase for Cassandra, for HBase, and for Cassandra and HBase side by side]
Amazon EC2 Usage
Transaction phase:
- 10,000 records
- 1,000,000 operations
- 250 threads
Cassandra
YCSB Cassandra Results
Update Heavy Workload (50/50)
[Figure: two panels, Update and Read, each plotting AverageLatency (ms), 0 to 60, against Throughput (ops/sec), 0 to 6,000]
YCSB HBase Results
[Figure: two panels for HBase 0.90.5, Update and Read, each plotting AverageLatency (ms) against Throughput (ops/sec) at 471.15, 485, 492.38, 507.17, 562.33, 620.04, 634.82, 734.32 and 845.15; Update latency axis 0.00 to 0.30, Read latency axis 0.00 to 1200.00]
YCSB Cassandra Results
[Figure: Elasticity, Cassandra 1.0; Latency (ms), 0 to 80,000, against Time (milliseconds), 0 to 400,000]
YCSB Future
Provide statistics for:
- Availability
- Replication
Additional Distributed Databases
Currently supported:
- Cassandra
- MongoDB
- Voldemort
- HBase
- MapKeeper
- Redis
- VMware vFabric GemFire
Conclusions
• YCSB provides a common ground for benchmarking cloud DB services
• Good for learning and experimenting with different distributed databases
• Open source, extensible for new databases
• Laboratory with Amazon EC2 provided good insight into setting up cloud services
• Challenges
• Installation problems
• Hard-to-follow documentation
• Working in a distributed environment requires a lot of configuration
References
• YCSB (Yahoo! Cloud Serving Benchmark)
• https://github.com/brianfrankcooper/YCSB/wiki
• Yahoo! Research
• http://research.yahoo.com/Web_Information_Management/YCSB
• BigTable
• http://en.wikipedia.org/wiki/BigTable
• Cassandra
• http://wiki.apache.org/cassandra/
• HBase
• http://hbase.apache.org/
Questions

Weitere ähnliche Inhalte

Was ist angesagt?

HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardMatthew Blair
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightHBaseCon
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBaseHBaseCon
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path HBaseCon
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationSchubert Zhang
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 

Was ist angesagt? (20)

HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBase
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance Evaluation
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 

Andere mochten auch

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBlehresman
 
Ycsb benchmarking
Ycsb benchmarkingYcsb benchmarking
Ycsb benchmarkingSqrrl
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDBTim Callaghan
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Sematext Group, Inc.
 
Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceCloudera, Inc.
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
STAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructureSTAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructureGord Sissons
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyDataStax Academy
 
Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Alexey Rusnak
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Yahoo Cloud Serving Benchmark
Yahoo Cloud Serving BenchmarkYahoo Cloud Serving Benchmark
Yahoo Cloud Serving Benchmarkkevin han
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupCarlos Juzarte Rolo
 
Преимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBПреимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBUNETA
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
VENU_Hadoop_Resume
VENU_Hadoop_ResumeVENU_Hadoop_Resume
VENU_Hadoop_ResumeVenu Gopal
 

Andere mochten auch (20)

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDB
 
Ycsb benchmarking
Ycsb benchmarkingYcsb benchmarking
Ycsb benchmarking
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
 
Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase Performance
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
STAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructureSTAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructure
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
 
Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Yahoo Cloud Serving Benchmark
Yahoo Cloud Serving BenchmarkYahoo Cloud Serving Benchmark
Yahoo Cloud Serving Benchmark
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Преимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBПреимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDB
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
VENU_Hadoop_Resume
VENU_Hadoop_ResumeVENU_Hadoop_Resume
VENU_Hadoop_Resume
 

Ähnlich wie NoSQL: Cassadra vs. HBase

Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPTony Rogerson
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Databases in the hosted cloud
Databases in the hosted cloud Databases in the hosted cloud
Databases in the hosted cloud Colin Charles
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudMichael Stack
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesHaohui Mai
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
PASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesPASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesAmazon Web Services
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWSAmazon Web Services
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)Amazon Web Services Korea
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinAmazon Web Services
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinIan Massingham
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 

Ähnlich wie NoSQL: Cassadra vs. HBase (20)

Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
NoSQL: Cassandra vs. HBase

  • 1. YCSB Yahoo! Cloud Serving Benchmark Scalable Distributed Systems Antonio L. Severien antonio.severien@gmail.com João Rosa Joao.rui.rosa@gmail.com
  • 2. Overview • Distributed Databases • Cassandra • HBase • YCSB General View • YCSB Details • Amazon EC2 • YCSB Results • YCSB Future • Conclusions • References
  • 3. Distributed Databases Traditional RDBMS • ACID transactions • Query language (SQL) • Data tied to the modeling (hard to analyze) • Scalable to a limit Distributed Databases • Not ACID • Not Relational • Column oriented (key-value) • CAP (Consistency, Availability, Partition tolerance) • Big Data (Massively scalable)
  • 4. Distributed Databases • Sherpa/PNUTS • BigTable • HBase, Hypertable, HTable • Megastore • Azure • Cassandra • Amazon Web Services • S3, SimpleDB, EBS • CouchDB • Voldemort • Dynomite • Tokyo • Redis • MongoDB
  • 5. Distributed Databases • NoSQL databases have different designs and architectures • Cassandra: Thrift, Gossip, Token ring, … • HBase: HDFS, ZooKeeper, Hadoop (MapReduce) • BigTable: GFS, Chubby (Lock Service), MapReduce
  • 6. Cassandra • Highlights • High availability • Incremental scalability • Eventually consistent • Tradeoffs between consistency and latency • Minimal administration • No SPOF (Single Point of Failure)
  • 7. Cassandra • CAP-aware • Cassandra values Availability and Partition tolerance (AP) → eventually consistent • Providing strong Consistency in Cassandra increases latency • Partitioning • Token oriented • Explicit Replication • Replication factor ≤ Total nodes • High level clients • Python, Java, C#, .NET, Scala, Ruby, PHP, Erlang, Haskell, etc. • Thrift → driver-level interface
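The token-oriented partitioning and the "replication factor ≤ total nodes" rule on this slide can be illustrated with a small sketch. It is not Cassandra's actual code: the MD5-based `token` function, the evenly spaced tokens in `build_ring`, and the "next nodes clockwise" placement in `replicas` are assumptions chosen for illustration, and all names are hypothetical.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 128  # size of the assumed MD5 token space


def token(key: str) -> int:
    """Map a row key to a position on the ring (assumed: MD5 as an integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes):
    """Give each node one evenly spaced token; returns a sorted list of (token, node)."""
    return [(i * TOKEN_SPACE // len(nodes), node) for i, node in enumerate(nodes)]


def replicas(key, ring, replication_factor):
    """Return the nodes that would hold `key`: the first node whose token is
    greater than or equal to the key's token, then the next nodes clockwise."""
    if replication_factor > len(ring):
        # A key cannot have more replicas than there are nodes.
        raise ValueError("replication factor must be <= total nodes")
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)  # wrap around the ring
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


ring = build_ring(["node0", "node1", "node2"])
print(replicas("user1", ring, 2))
```

With three nodes, a replication factor of 3 places every key on every node, and a factor of 4 is rejected.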
  • 8. Cassandra • Data Model • Cluster: • Machines (nodes) in a logical Cassandra instance • can contain multiple keyspaces • Keyspace: • name for ColumnFamilies • ColumnFamilies: • contain multiple columns, each with name, value and timestamp, referenced by row keys • Analogous to a table in an RDBMS • SuperColumns: • columns with subcolumns • Rows and columns (figure): row keyA holds Column1, Column2, Column3; row keyB holds Column5, Column6, Column10 • Column structure: byte[] name, byte[] value, i64 timestamp
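The data model on this slide can be pictured as nested maps: a column family maps row keys to columns, and each column carries a name, a value and a timestamp. The sketch below is only an in-memory illustration under that reading. The class names are hypothetical, and keeping the column with the newest timestamp on conflict is an assumption added here, not something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: bytes      # byte[] name
    value: bytes     # byte[] value
    timestamp: int   # i64 timestamp


class ColumnFamily:
    """Rows referenced by row key; each row maps column name -> Column."""

    def __init__(self, name: str):
        self.name = name
        self.rows = {}

    def insert(self, row_key: str, column: Column) -> None:
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Assumed conflict rule: keep the column with the newest timestamp.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get(self, row_key: str, column_name: bytes) -> bytes:
        return self.rows[row_key][column_name].value


users = ColumnFamily("Users")
users.insert("keyA", Column(b"Column1", b"new", timestamp=2))
users.insert("keyA", Column(b"Column1", b"old", timestamp=1))  # older write is ignored
print(users.get("keyA", b"Column1"))
```

Rows in the same column family need not share the same set of columns, which matches the figure where keyA and keyB hold different columns.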
  • 10. HBase “HBase is more a datastore than a database” • It lacks many of the features of RDBMS • Distributed and scalable big data store. • Regions model • Strong consistency
  • 11. HBase Built on top of Hadoop Distributed Filesystem (HDFS)
  • 12. HBase • The NameNode is responsible for maintaining the filesystem metadata. • The DataNodes are responsible for storing HDFS blocks.
  • 13. HBase • The NameNode is responsible for maintaining the filesystem metadata. • The DataNodes are responsible for storing HDFS blocks. Note: In our case study, we were only interested in the HDFS layer.
  • 14. HBase
  • 16. HBase • Data is stored in HBase tables. • Tables are made of rows and columns. • All columns belong to a particular column family. Important note: All column family members are stored together. • Queries against a single column family perform better because its members are stored together
  • 17. YCSB General View • Which is the best NoSQL DB? • How to compare? • Yahoo! Cloud Serving Benchmark (YCSB) • Benchmarking tool • Evaluates the performance of key-value and cloud DBs on a common set of workloads • Client: an extensible workload generator • Yahoo! Research • Brian F. Cooper - cooperb@yahoo-inc.com • Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears
  • 18. YCSB Details • How does it work? • YCSB Client: Workload Executor, Client Threads, Statistics, DB Interface Layer → Cloud Serving Store • Workload file: read/write mix, record size, popularity distribution, … • Command line: DB to use, workload to use, target throughput, number of threads, …
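The interaction between the "target throughput" and "number of threads" settings can be sketched with a little arithmetic. The sketch assumes the overall target is split evenly across client threads and that each thread spaces its operations at a fixed interval. This is a simplified model for illustration, not the YCSB client's code, and the function names are hypothetical.

```python
def per_thread_interval_ms(target_ops_per_sec: float, threads: int) -> float:
    """Milliseconds each thread waits between operations so that all threads
    together issue roughly `target_ops_per_sec` (assumed: even split)."""
    per_thread_target = target_ops_per_sec / threads
    return 1000.0 / per_thread_target


def start_times_ms(target_ops_per_sec: float, threads: int, ops_per_thread: int):
    """Planned start time of every operation, one list per thread."""
    interval = per_thread_interval_ms(target_ops_per_sec, threads)
    return [[i * interval for i in range(ops_per_thread)] for _ in range(threads)]


# Example: 10 threads sharing a target of 100 ops/sec.
print(per_thread_interval_ms(100, 10))
```

Under this model, 10 threads with a target of 100 ops/sec each issue one operation every 100 ms. If the store cannot answer within that interval, the achieved throughput falls below the target.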
  • 19. YCSB Details Benchmark Tiers • Performance • Measure latency/throughput curve • Increase throughput until saturation • Scalability • Scale up: increase hardware, data size and throughput proportionally • Elastic speedup: add servers while running a workload
  • 20. YCSB Details • Load phase: loads the database • $ ycsb load cassandra-10 -p hosts=127.0.0.1 -P workloadX • Transactions phase: executes the workload • $ ycsb run cassandra-10 -p hosts=127.0.0.1 -P workloadX • Random Load Distribution
  • 21. YCSB Details • # Yahoo! Cloud System Benchmark • # Workload A: Update heavy workload • # Application example: Session store recording recent actions • # • # Read/update ratio: 50/50 • # Default data size: 1 KB records (10 fields, 100 bytes each, plus key) • # Request distribution: zipfian • recordcount=1000 • operationcount=1000 • workload=com.yahoo.ycsb.workloads.CoreWorkload • readallfields=true • readproportion=0.5 • updateproportion=0.5 • scanproportion=0 • insertproportion=0 • requestdistribution=zipfian
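To make the role of the proportion settings concrete, the sketch below parses the properties shown on this slide and draws an operation mix from them. It is an illustration only: `parse_properties` and `choose_operations` are hypothetical helpers, not part of YCSB, and the sketch ignores `requestdistribution` (which keys are accessed) to focus on which operation types are issued.

```python
import random

WORKLOAD_A = """
# Workload A: Update heavy workload
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
requestdistribution=zipfian
"""


def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def choose_operations(props: dict, rng: random.Random) -> list:
    """Draw `operationcount` operation names weighted by the *proportion settings."""
    names = ["read", "update", "scan", "insert"]
    weights = [float(props.get(name + "proportion", 0)) for name in names]
    return rng.choices(names, weights=weights, k=int(props["operationcount"]))


props = parse_properties(WORKLOAD_A)
ops = choose_operations(props, random.Random(42))
print(ops.count("read"), ops.count("update"))
```

With a 50/50 read/update mix, the two counts land close to 500 each, and scans and inserts never appear because their proportions are 0.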
  • 22. YCSB Details • Execution parameters • $ ./bin/ycsb run cassandra-10 -P workloads/workloada -s -threads 10 -target 100 > transactions.dat [OVERALL],RunTime(ms), 10110 [OVERALL],Throughput(ops/sec), 98.91196834817013 [UPDATE], Operations, 491 [UPDATE], AverageLatency(ms), 0.054989816700611 [UPDATE], MinLatency(ms), 0 [UPDATE], MaxLatency(ms), 1 [UPDATE], 95thPercentileLatency(ms), 1 [UPDATE], 99thPercentileLatency(ms), 1 [UPDATE], Return=0, 491 [UPDATE], 0, 464 [UPDATE], 1, 27 [UPDATE], 2, 0 [UPDATE], 3, 0 [UPDATE], 4, 0 ...
  • 23. YCSB Details • $ ./bin/ycsb run basic -P workloads/workloada -P large.dat -s -threads 10 -target 100 -p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat [OVERALL],RunTime(ms), 10077 [OVERALL],Throughput(ops/sec), 9923.58836955443 [UPDATE], Operations, 50396 [UPDATE], AverageLatency(ms), 0.04339630129375347 [UPDATE], MinLatency(ms), 0 [UPDATE], MaxLatency(ms), 338 [UPDATE], Return=0, 50396 [UPDATE], 0, 0.10264765784114054 [UPDATE], 2000, 0.026989343690867442 [UPDATE], 4000, 0.0352882703777336 [UPDATE], 6000, 0.004238958990536277 [UPDATE], 8000, 0.052813085033008175 [UPDATE], 10000, 0.0 [READ], Operations, 49604 [READ], AverageLatency(ms), 0.038242883638416256 [READ], MinLatency(ms), 0 [READ], MaxLatency(ms), 230 [READ], Return=0, 49604 [READ], 0, 0.08997245741099663 [READ], 2000, 0.02207505518763797 [READ], 4000, 0.03188493260913297 [READ], 6000, 0.004869141813755326 [READ], 8000, 0.04355329949238579 [READ], 10000, 0.005405405405405406
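The relationship between the reported run time and throughput can be checked with a short sketch. It parses a few of the output lines from this slide and recomputes the throughput as operations divided by run time. `parse_ycsb_output` is a hypothetical helper, and the total of 1,000 operations is an assumption taken from `operationcount=1000` in the workload file on the previous slide.

```python
SAMPLE = """[OVERALL],RunTime(ms), 10110
[OVERALL],Throughput(ops/sec), 98.91196834817013
[UPDATE], Operations, 491
[UPDATE], AverageLatency(ms), 0.054989816700611
"""


def parse_ycsb_output(text: str) -> dict:
    """Turn '[SECTION], Metric, value' lines into {(section, metric): float}."""
    metrics = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue  # skip anything that is not a three-field result line
        try:
            metrics[(parts[0].strip("[]"), parts[1])] = float(parts[2])
        except ValueError:
            continue  # skip lines whose last field is not numeric
    return metrics


metrics = parse_ycsb_output(SAMPLE)
total_operations = 1000  # assumed from operationcount=1000 in the workload file
runtime_ms = metrics[("OVERALL", "RunTime(ms)")]
recomputed = total_operations * 1000 / runtime_ms  # operations per second
print(recomputed, metrics[("OVERALL", "Throughput(ops/sec)")])
```

The recomputed value agrees with the reported throughput of about 98.9 ops/sec, just under the requested target of 100.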
  • 25. Amazon EC2 Configuration • Large Instance • 7.5 GB memory • 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) • 850 GB instance storage • 64-bit platform • I/O Performance: High • API name: m1.large • Experiment set-up • Cassandra cluster: 3 nodes + 1 node (Elasticity) • HBase cluster: 3 nodes
  • 26. Amazon EC2 Usage • Cassandra • Load phase: 60,000,000 records of 1 KB
  • 27. Amazon EC2 Usage • HBase • Load phase: 60,000,000 records of 1 KB
  • 28. Amazon EC2 Usage • Load phase: 60,000,000 records of 1 KB • Cassandra • HBase
  • 29. Amazon EC2 Usage • Load phase: 60,000,000 records of 1 KB • Cassandra • HBase
  • 30. Amazon EC2 Usage • Cassandra • Transaction phase: 10,000 records, 1,000,000 operations, 250 threads
  • 31. YCSB Cassandra Results • Update Heavy Workload (50/50) • [Figure: two panels, Update and Read; x-axis: Throughput (ops/sec), y-axis: AverageLatency (ms)]
  • 32. YCSB HBase Results • [Figure: two panels, Update and Read, HBase 0.90.5; x-axis: Throughput (ops/sec), y-axis: AverageLatency (ms); throughput points shown: 471.15, 485, 492.38, 507.17, 562.33, 620.04, 634.82, 734.32, 845.15]
  • 33. YCSB Cassandra Results • [Figure: Elasticity, Cassandra 1.0; x-axis: Time (milliseconds), y-axis: Latency (ms)]
  • 34. YCSB Cassandra Results • [Figure: Elasticity, Cassandra 1.0; x-axis: Time (milliseconds), y-axis: Latency (ms)]
  • 35. YCSB Future • Provide statistics for: Availability, Replication • Additional Distributed Databases • Currently supported: Cassandra, Mapkeeper, MongoDB, Redis, Voldemort, VMware vFabric GemFire, HBase
  • 36. Conclusions • YCSB provides a common ground for benchmarking cloud DB services • Good for learning and experimenting with different distributed databases • Open source, extensible for new databases • Laboratory with Amazon EC2 provided good insight into setting up cloud services • Challenges • Installation problems • Hard-to-follow documentation • Working in a distributed environment requires a lot of configuration
  • 37. References • YCSB (Yahoo! Cloud Serving Benchmark) • https://github.com/brianfrankcooper/YCSB/wiki • Yahoo! Research • http://research.yahoo.com/Web_Information_Management/YCSB • BigTable • http://en.wikipedia.org/wiki/BigTable • Cassandra • http://wiki.apache.org/cassandra/ • HBase • http://hbase.apache.org/