2. What is Cloudian?
Cloudian = S3 Cloud Storage as Packaged Software
#cassandra12 (c) Copyright , Cloudian Inc. & KK, 2012, All rights reserved.
3. Cloudian Features
1. Full Amazon S3 API Compatibility, including error codes
2. Multi-datacenter, peer-to-peer architecture. No single point of failure.
3. Multi-tenant: QoS controls, billing, reporting by each User and each Group
4. Public and Private Clouds.
5. Elastic Capacity: small start and scale-out as needed
6. System, Group, and User management by Management Console or REST API
7. Easy to Use Packaged Software, backed by 24x7 carrier grade support.
4. Cloudian Objectives
1. S3 API full compatibility
• Use S3 ecosystem applications “as is”.
• API already designed.
2. Fully packaged software
• Easy to deploy on existing hardware/network.
• Hide NoSQL complexity
• Easy install/upgrade
• Flexible for different customer types.
• HyperStore: Best fit store
• Scalable. Start small and grow.
3. Complete service platform
• User/Group Provisioning
• Cluster Management
• Reporting
• Billing
• Turnkey system.
• Can choose integration points with existing systems.
5. Object vs. File vs. Block Storage
Abstraction level, from highest to lowest:
• Application level: HTTP. Unit of storage: OBJECTS
• OS user level: NAS (NFS, CIFS). Unit of storage: FILES
• OS kernel level: SAN (iSCSI). Unit of storage: BLOCKS
7. S3 Functions
• HTTP REST API. PUT, POST, GET, DELETE, HEAD.
• Objects organized into buckets.
• Security. Requests authenticated using keyed HMAC with symmetric keys.
Also, HTTPS option, client-side encryption, server-side encryption.
• Access control lists (ACLs) define access rights to bucket and object.
• Accounting of bytes inbound, outbound, stored and HTTP request counts.
Billing by tiered rating plans per accounting type, per-region.
• Multi-part uploads. Allows uploading large objects in multiple parts.
• Versioning. Multiple versions of same object.
• Location constraint. Buckets can be assigned to a specific region. Each
region has own domain.
• …
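The keyed-HMAC request authentication listed above can be sketched as follows. This is a minimal illustration of the Amazon S3 REST signing scheme of that period (Signature Version 2: HMAC-SHA1 over a canonical string, Base64-encoded). The function names, keys, and resource path are hypothetical placeholders, and the sketch assumes a request with no x-amz-* headers or sub-resources.

```python
import base64
import hashlib
import hmac


def sign_s3_v2(secret_key, method, resource, date, content_md5="", content_type=""):
    """Return the Base64 HMAC-SHA1 signature for a simplified S3 v2 request."""
    # String to sign: verb, Content-MD5, Content-Type, Date, canonical resource.
    # x-amz-* headers are deliberately left out of this sketch.
    string_to_sign = "\n".join([method, content_md5, content_type, date, resource])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key, signature):
    """Build the value of the HTTP Authorization header."""
    return "AWS {}:{}".format(access_key, signature)


# Hypothetical credentials and object path, for illustration only.
signature = sign_s3_v2(
    "example-secret-key", "GET", "/example-bucket/photo.jpg",
    "Tue, 27 Mar 2012 19:36:42 +0000",
)
header = authorization_header("EXAMPLEACCESSKEY", signature)
```

Because the key is symmetric, the server recomputes the same signature from its stored copy of the secret and compares it with the one in the header.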
8. Works with leading Cloud Compute Platforms
Cloudian-Citrix CloudStack
(May 9, 2012)
Cloudian-OpenStack
(October 21, 2011)
9. Cloudian Customers
Deployment types: Public, Hybrid, Private
Channel Partners: [logos not reproduced]
10. Why Cassandra?
Scalable
• Add capacity by adding nodes to running system.
• Distributed (P2P architecture), no single point of failure
Reliable
• Resilient to network or hardware failures.
• Multi-datacenter replication
• Tuneable data consistency level.
Features
• TTL, secondary indexes, counters, compression,
encryption, …
Fast
• Write path especially fast.
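As a small worked example of the tunable consistency level mentioned above: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The helper names below are illustrative, not part of any Cassandra client API.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every read set overlaps every acknowledged write set."""
    return r + w > n


n = 3
q = quorum(n)                                   # majority of 3 replicas
strong = reads_see_latest_write(n, w=q, r=q)    # QUORUM writes + QUORUM reads
weak = reads_see_latest_write(n, w=1, r=1)      # ONE writes + ONE reads
```

Lower levels such as ONE trade this overlap guarantee for lower latency and higher availability.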
11. Cassandra in Cloudian
• v1.0.7 in use (started at 0.7.x)
• Forked to add customizations
• Hector client
• Data stored includes:
• Object metadata
• Reports/logs
• Counters for rate control
•…
12. Cloudian: Logical Architecture
[Diagram: logical architecture]
Clients: Management Console (web UI: Login, Account profile / Security keys, Reports, Data Explorer); Applications.
Servers: Admin Server (HTTPS servlets); S3 Server (HTTP servlets); Data Servers (S3), reached by applications over HTTP or HTTPS.
Databases: Credentials DB; UserData DB (Cassandra); AccountInfo & QoS DB (Cassandra); Reports DB (Cassandra).
13. Minimum Redundant Configuration
[Diagram: minimum redundant configuration]
Browser requests for the UI (HTTPS, sticky sessions) and application requests for S3 (HTTP/HTTPS) go through a load balancer (LB) to two servers over HTTP/S.
Each server runs: Servlets, Credentials DB, Cassandra.
14. Multi-Datacenter Example
2 datacenters (DC1, DC2) / 4 nodes per datacenter
Each node runs: CMC, Redis, S3/Admin, Cassandra/HyperStore.
One node runs the Redis master (M); the other seven nodes run Redis slaves (S).
Storage objects, reports, and profiles are replicated across DCs by Cassandra.
The Credentials DB (Redis) has local slaves in each DC and a single global master.
15. Network Scaling Example
[Diagram: three regions connected over the network]
Region 1: DC 1-1, DC 1-2
Region 2: DC 2-1
Region 3: DC 3-1, DC 3-2
16. Cassandra for Object Store
Dynamically decide how to store each object
(Cassandra or file system).
Cassandra better for small objects.
Large objects split into multiple parts and chunks.
[Diagram: a Column Family under the Random Partitioner; each Row Key maps to a Column (Name, Value).]
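The per-object placement decision and the splitting of large objects can be sketched as below. This is a minimal illustration only: the size cutoff, the chunk size, and the function names are assumptions made for the example, not values or interfaces from Cloudian.

```python
# Hypothetical tuning values, chosen only for illustration.
SMALL_OBJECT_MAX_BYTES = 64 * 1024      # at or below this, store in Cassandra
CHUNK_BYTES = 1024 * 1024               # chunk size for large objects


def choose_store(size_bytes):
    """Pick a storage back end from the object size."""
    if size_bytes <= SMALL_OBJECT_MAX_BYTES:
        return "cassandra"
    return "file system"


def split_into_chunks(data, chunk_bytes=CHUNK_BYTES):
    """Split a byte string into consecutive chunks of at most chunk_bytes."""
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


small = b"x" * 1000
large = b"y" * (2 * CHUNK_BYTES + 10)

small_store = choose_store(len(small))
large_store = choose_store(len(large))
large_chunks = split_into_chunks(large)
```

Joining the chunks in order reproduces the original object, so a reader only needs the ordered chunk list (kept with the object metadata) to reassemble it.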
17. Cassandra for Object Metadata
Size, Etag, MD5, timestamp, ACL, part info, version, etc.
Old versions of metadata format supported.
[Diagram: a Column Family under the Random Partitioner; each Row Key maps to many Columns (Name, Value), sorted by Column Name.]
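A toy in-memory model of such a row may help show why sorting by column name matters: related columns (for example, the parts of one object) can be fetched as one contiguous slice. The class, the row key, and the column names here are hypothetical and only mimic the layout; they are not Cassandra or Hector APIs.

```python
from bisect import bisect_left, bisect_right, insort


class MetadataRow:
    """One row whose column names are kept in sorted order."""

    def __init__(self, row_key):
        self.row_key = row_key
        self._names = []      # sorted column names
        self._values = {}     # column name -> value

    def put(self, name, value):
        """Insert or overwrite a column."""
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def get(self, name):
        return self._values[name]

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical row key and column names.
row = MetadataRow("bucket1/photo.jpg")
row.put("size", 2048)
row.put("etag", "hypothetical-etag")
row.put("part:0002", "chunk-b")
row.put("part:0001", "chunk-a")

parts = row.slice("part:", "part:~")   # every column named "part:<digits>"
```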
18. Cassandra for Account Info
DATA MODEL
User
- ID, name, contact info, etc.
Group
- ID, name, contact info, etc.
Rating Plan
Security Credentials
QoS Counters
NOTES
“Static” data. Fixed number of columns.
Could be put in a Relational DB like MySQL, but no need to add another
component.
19. Quality of Service / SLA Management
• Configurable maximum limits per region at the per-user, per-group, and system level.
• Requests/minute
• Storage bytes
• Storage objects
• Data Bytes Inbound
• Data Bytes Outbound
• Once a limit is reached, requests are rejected.
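The limit check described above can be sketched as follows. The dictionary layout, the counter names, and the "accepted"/"rejected" return values are assumptions for illustration; the slides do not specify how Cloudian represents limits or what response a rejected request receives.

```python
LEVELS = ("user", "group", "system")


def exceeded_limits(usage, limits):
    """Return the (level, counter) pairs whose usage has reached its maximum."""
    hits = []
    for level in LEVELS:
        for counter, maximum in limits.get(level, {}).items():
            if usage.get(level, {}).get(counter, 0) >= maximum:
                hits.append((level, counter))
    return hits


def admit(usage, limits):
    """Reject the request if any configured limit has been reached."""
    return "rejected" if exceeded_limits(usage, limits) else "accepted"


# Hypothetical limits and current counters for one region.
limits = {
    "user": {"requests_per_minute": 100, "storage_bytes": 10_000},
    "group": {"requests_per_minute": 1_000},
}
usage_ok = {
    "user": {"requests_per_minute": 20, "storage_bytes": 500},
    "group": {"requests_per_minute": 300},
}
usage_over = {
    "user": {"requests_per_minute": 100, "storage_bytes": 500},
    "group": {"requests_per_minute": 300},
}
```

A level with no configured limits (here, "system") never causes a rejection in this sketch.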
20. Cassandra for Reports
DATA MODEL
“Raw” column family
- User, Group, System
- Transaction type (HTTP GET, PUT, DELETE) …
- Object path
- Size
- …
“Rollup” column families.
- RollupHour. Summarizes data for each hour using Raw data.
- RollupDay. Summarizes data for each day using RollupHour data.
- RollupMonth. Summarizes data for each month using RollupDay data.
NOTES
High write rate. Low read rate.
Rollup tables used for direct queries.
Automatic deletion using Cassandra TTL (time-to-live).
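The Raw → RollupHour → RollupDay → RollupMonth chain can be sketched as a repeated group-and-sum, where each level is computed from the level below it. The record fields (`user`, `time`, `bytes`, `requests`) and function names are assumptions for this example, and plain Python lists stand in for the column families.

```python
from collections import defaultdict
from datetime import datetime


def to_hour(ts):
    return ts.replace(minute=0, second=0, microsecond=0)


def to_day(ts):
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def to_month(ts):
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def rollup(records, bucket_of):
    """Sum request counts and bytes per (user, time bucket)."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for rec in records:
        key = (rec["user"], bucket_of(rec["time"]))
        # A raw record counts as one request; rollup records carry a count.
        totals[key]["requests"] += rec.get("requests", 1)
        totals[key]["bytes"] += rec["bytes"]
    return [
        {"user": user, "time": bucket, **sums}
        for (user, bucket), sums in sorted(totals.items())
    ]


raw = [
    {"user": "alice", "time": datetime(2012, 8, 1, 10, 5), "bytes": 100},
    {"user": "alice", "time": datetime(2012, 8, 1, 10, 50), "bytes": 200},
    {"user": "alice", "time": datetime(2012, 8, 2, 9, 0), "bytes": 50},
]

hourly = rollup(raw, to_hour)         # RollupHour from Raw
daily = rollup(hourly, to_day)        # RollupDay from RollupHour
monthly = rollup(daily, to_month)     # RollupMonth from RollupDay
```

Since queries read only the rollups, the raw records can expire on a short TTL without affecting reports.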
21. Cassandra: Wish List
1. Repair
• Slow, impact on performance, difficult to monitor progress, manual
operator action required.
2. Compaction
• Heavy performance impact. Hard to tune. Capacity planning difficult.
3. Schema changes
• Fixed in 1.1.
4. Large column slices.
5. Caches (row and key) not useful. Slower performance, large
memory use.
6. JMX too slow. Need to directly use and expose Java interfaces.
22. HyperStore™
HyperStore: Management policies tailored for different object types.
Object metadata is still stored in Cassandra.
Use Cassandra’s distributed systems methods for data partitioning, replication, node health detection.
Fork Cassandra source for customizations.
Benefits:
Better performance
More capacity per node
Higher disk utilization
Storage layer flexibility
[Diagram: Cloudian S3 Storage Server, with components Admin, NFS, Credentials, S3 REST API, Reporting (Cassandra), HyperStore Manager, Accounting (Cassandra), Data Store (Cassandra), Data Store (File System).]
23. HyperStore: Hybrid Storage Example
[Plot: U versus X, with curves for Storage 1, Storage 2, and the optimal choice.]
Optimal solution is to choose the storage method that minimizes latency.
Generally, you want to maximize/minimize U, a performance metric, based
on random variables X using a mixture of N storage layers.
In a simple case,
U : average latency
X = {object size}
N = {cassandra, ext4 fs}.
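For the simple case above, the selection rule is just an argmin over the storage layers. The sketch below assumes each layer has a latency model of the form "fixed overhead plus size divided by throughput"; the coefficients are invented to make the example run and are not measurements from Cloudian or Cassandra.

```python
# Hypothetical latency models: milliseconds as a function of object size in bytes.
LATENCY_MODELS = {
    "cassandra": lambda size: 2.0 + size / 20_000.0,    # low overhead, slower per byte
    "ext4 fs": lambda size: 10.0 + size / 100_000.0,    # higher overhead, faster per byte
}


def best_layer(size_bytes, models=LATENCY_MODELS):
    """Return the storage layer with the lowest modeled latency for this size."""
    return min(models, key=lambda name: models[name](size_bytes))


small_choice = best_layer(1_000)        # small object
large_choice = best_layer(2_000_000)    # large object
```

With these made-up coefficients the two curves cross at 200,000 bytes, so objects below that size go to Cassandra and larger ones go to the file system. With real measured curves the crossover point would be different, and more variables than object size could feed the models.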
25. HyperStore: Less Compaction
20 tps, 10 threads, 2MB data

No HyperStore     PUT      GET      LIST     DELETE
Operations        50478    1679     3642     422
Latency (msec)    149.78   314.80   41.60    34.50

With HyperStore   PUT      GET      LIST     DELETE
Operations        50559    9195     3575     2224
Latency (msec)    96.64    35.63    28.14    23.93

[Charts: iostat % utilization and io read/write (MB), without and with HyperStore.]
26. Finally
Cassandra and other enabling technologies have allowed
“leveling the playing field” for cloud storage
providers.
Info: www.cloudian.com
Download trial version.
Coming soon:
#1 best seller in “Database” category on amazon.co.jp.
Editor's notes
Add symbols for different files from motohashi ppt.