This is the summary material for the paper "Benchmarking Cloud Serving Systems with YCSB", prepared for the NoSQL summer reading group in Tokyo on September 15, 2010, at Gemini Mobile Technologies in Shibuya, Tokyo.
Summary of the "YCSB" paper for the NoSQL summer reading group in Tokyo on Sep 15, 2010
1. Benchmarking Cloud Serving Systems with YCSB, by Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R. Gemini Mobile Technologies, Inc. NOSQL Tokyo Reading Group (http://nosqlsummer.org/city/tokyo) September 15, 2010 Tags: #ycsb #nosql 10.9.11 Gemini Mobile Technologies, Inc.
2. Benchmarking Cloud Serving Systems with YCSB. Authors: Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R. Abstract: … We present the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. We also hope to foster the development of additional cloud benchmark suites that represent other classes of applications by making our benchmark tool available via open source. In this regard, a key feature of the YCSB framework/tool is that it is extensible---it supports easy definition of new workloads, in addition to making it easy to benchmark new systems. Appeared in: ACM Symposium on Cloud Computing, ACM, Indianapolis, IN, USA (2010) http://research.yahoo.com/files/ycsb.pdf
7. Package of DB interface layers for Cassandra, HBase, MongoDB.
8. Extensible. Add new workloads. Add new DBs.
9. 2.1 Cloud Serving System Characteristics. Scale-out: to add capacity, add servers; the goal is constant performance per node. Elasticity: load is redistributed by adding a server to a running system, with a temporary performance decrease while data is redistributed. High availability: the system remains available in the face of failures.
10. 2.2 Classifications of Systems and Tradeoffs. Read vs. write performance: write-optimized, log-structured systems append updates to a commit log; reads may need to merge update information. Latency vs. durability: disk sync writes. Synchronous vs. asynchronous replication. Data partitioning: row-based storage (a row’s data is stored contiguously on disk) vs. column storage (different columns can be stored separately).
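The read vs. write tradeoff for log-structured systems can be illustrated with a toy store. This is only a minimal sketch under stated assumptions, not how Cassandra or HBase is implemented: the class `LogStructuredStore` and its methods are hypothetical names, the log lives in memory, and there is no compaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy store: every update is appended to a log,
// and a read merges all logged updates for the requested key.
class LogStructuredStore {
    // One appended update: a key plus the fields that changed.
    private static final class Entry {
        final String key;
        final Map<String, String> fields;

        Entry(String key, Map<String, String> fields) {
            this.key = key;
            this.fields = new HashMap<>(fields); // defensive copy
        }
    }

    private final List<Entry> log = new ArrayList<>();

    // Write path: a plain append, with no lookup of the existing record.
    void update(String key, Map<String, String> changedFields) {
        log.add(new Entry(key, changedFields));
    }

    // Read path: scan the log in order and merge every update for the key.
    // Later entries overwrite earlier ones, so the newest value wins.
    Map<String, String> read(String key) {
        Map<String, String> merged = new HashMap<>();
        for (Entry e : log) {
            if (e.key.equals(key)) {
                merged.putAll(e.fields);
            }
        }
        return merged;
    }

    // Number of appended updates, to show that writes never rewrite in place.
    int logSize() {
        return log.size();
    }
}
```

In this sketch a write costs one append regardless of how much data exists, while a read has to visit every logged update, which is the asymmetry the slide describes.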
13. 4.2 Core Workloads
14. 5.1 YCSB Client Architecture. Workload Executor: traffic generation for both “load” and “transaction” phases. DB Interface Layer: custom for each DB.
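One way a workload executor might pick the next operation during the transaction phase is a weighted random choice over the configured operation mix. The sketch below is an assumption for illustration and is not taken from the YCSB source: `OperationChooser`, its method names, and the example weights are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical operation chooser: picks operations in proportion to their weights.
class OperationChooser {
    private final Map<String, Double> weights = new LinkedHashMap<>();
    private final Random random;
    private double total = 0.0;

    // A fixed seed makes a run repeatable.
    OperationChooser(long seed) {
        this.random = new Random(seed);
    }

    // Register an operation with its relative weight in the mix.
    void add(String operation, double weight) {
        if (weight < 0) {
            throw new IllegalArgumentException("weight must be non-negative");
        }
        weights.merge(operation, weight, Double::sum);
        total += weight;
    }

    // Pick the next operation with probability weight / total.
    String next() {
        if (total <= 0) {
            throw new IllegalStateException("no operations with positive weight");
        }
        double r = random.nextDouble() * total;
        String lastPositive = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            if (e.getValue() > 0) {
                lastPositive = e.getKey();
            }
            r -= e.getValue();
            if (r < 0) {
                return e.getKey();
            }
        }
        return lastPositive; // only reached through floating-point rounding
    }
}
```

For example, calling `add("read", 0.5)` and `add("update", 0.5)` would yield roughly equal numbers of reads and updates over many calls to `next()`; the 0.5 values are illustrative only.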
15. 5.2 Extensibility. YCSB package is open-source Java code. Workload Executor: modify configuration (e.g., operation mix, distribution, data size) or write a custom Java class to define a workload. DB Interface Layer: implement the interface (read, update, insert, delete, scan) for the DB.
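To make the DB Interface Layer idea concrete, here is a minimal sketch of a five-operation interface with an in-memory implementation. It is an illustration under assumptions: `KeyValueDb` and `InMemoryDb` are hypothetical names, and the method signatures are simplified and do not reproduce the actual YCSB Java API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface mirroring the five operations named on the slide.
interface KeyValueDb {
    Map<String, String> read(String key);                        // null if absent
    boolean update(String key, Map<String, String> fields);      // false if absent
    boolean insert(String key, Map<String, String> fields);      // false if present
    boolean delete(String key);                                  // false if absent
    List<Map<String, String>> scan(String startKey, int recordCount);
}

// Toy binding backed by a sorted map, so scans return keys in order.
class InMemoryDb implements KeyValueDb {
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    public Map<String, String> read(String key) {
        Map<String, String> row = rows.get(key);
        return row == null ? null : new HashMap<>(row);
    }

    public boolean update(String key, Map<String, String> fields) {
        Map<String, String> row = rows.get(key);
        if (row == null) {
            return false;
        }
        row.putAll(fields); // overwrite only the supplied fields
        return true;
    }

    public boolean insert(String key, Map<String, String> fields) {
        if (rows.containsKey(key)) {
            return false;
        }
        rows.put(key, new HashMap<>(fields));
        return true;
    }

    public boolean delete(String key) {
        return rows.remove(key) != null;
    }

    // Return up to recordCount rows, starting at startKey (inclusive).
    public List<Map<String, String>> scan(String startKey, int recordCount) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows.tailMap(startKey, true).values()) {
            if (out.size() >= recordCount) {
                break;
            }
            out.add(new HashMap<>(row));
        }
        return out;
    }
}
```

A binding for a real system would replace the `TreeMap` calls with that system's client calls while keeping the same five entry points.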
16. 6. Results: Setup. Tested 4 DBs: Cassandra 0.5.0, HBase 0.20.3, PNUTS (MySQL 5.1.24), sharded MySQL (5.1.32). 6 servers: dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8GB RAM, 6-disk RAID-10 array, gigabit Ethernet. YCSB Client on a separate 8-core server, up to 500 threads; the client was not the bottleneck. No replication. Data is 120M 1KB records (total size: 120GB), so each server stored 20GB of data. Cassandra, PNUTS, and MySQL were configured to sync to disk; HBase did not sync to disk. Periodic compaction operations.
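The data-sizing arithmetic on the setup slide can be checked with a few lines of code. This sketch assumes decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes) and an even spread of records across servers; `DataSizing` and its methods are hypothetical helper names.

```java
// Hypothetical helper for the sizing arithmetic; assumes decimal units.
class DataSizing {
    static final long BYTES_PER_GB = 1_000_000_000L;

    // Total bytes = record count x record size; fails loudly on overflow.
    static long totalBytes(long recordCount, long recordBytes) {
        return Math.multiplyExact(recordCount, recordBytes);
    }

    // Bytes held by each server when data is spread evenly.
    static long bytesPerServer(long totalBytes, int servers) {
        if (servers <= 0) {
            throw new IllegalArgumentException("servers must be positive");
        }
        return totalBytes / servers;
    }

    public static void main(String[] args) {
        long total = totalBytes(120_000_000L, 1_000L); // 120M records of 1KB
        long perServer = bytesPerServer(total, 6);     // 6 servers
        System.out.println("total GB: " + total / BYTES_PER_GB);
        System.out.println("per-server GB: " + perServer / BYTES_PER_GB);
    }
}
```

Under these assumptions the totals match the slide: 120GB overall and 20GB per server.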
17. 6. Results: Read vs. Write Performance. Cassandra and HBase had better performance on write-heavy workload. PNUTS and MySQL had better performance on read-heavy workload.
18. 6. Results: Scalability. Vary number of servers from 2 to 12. Data size and request rate varied proportionally. HBase is erratic. Cassandra and PNUTS scale well.
19. 6. Results: Elasticity. Start with 2 servers with 120GB data, then add more servers up to 6. Cassandra, HBase, PNUTS were able to grow elastically. HBase does not repartition data until next compaction. PNUTS was best, with the most stable latency while elastically repartitioning data. Figure: go from 5 to 6 servers at the 10-minute mark.
20. 7. Future Work. Tier 3: Availability. Tier 4: Replication.
21. Further Study. Main Site: http://research.yahoo.com/Web_Information_Management/YCSB Source Code: http://github.com/brianfrankcooper/YCSB Mailing list: http://tech.groups.yahoo.com/group/ycsb-users/