4. what is key value storage?
global name space
key, value with metadata
http accessible
sites on demand
unlimited scaling
Global namespace: think of a Gmail address; it separates you from
others and must be unique
Inter-site copying
Distribution publishing
5. key value storage concepts
GLOBAL NAME SPACE
KEY/ VALUE
azure scopes container names to accounts
nirvanix scopes based on application and then subaccount
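The concepts on this slide can be pictured as a map of maps. A minimal in-memory sketch, assuming a flat global namespace; the names (`KeyValueSketch`, `Blob`, `createContainer`, `put`, `get`) are hypothetical and do not mirror any vendor's API, and Azure and Nirvanix scope names differently, as noted above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a key-value store; all names are illustrative.
class KeyValueSketch {
    // A value plus its user metadata (for example Chef=Kawasaki).
    static final class Blob {
        final byte[] data;
        final Map<String, String> metadata;

        Blob(byte[] data, Map<String, String> metadata) {
            this.data = data.clone();
            this.metadata = new HashMap<>(metadata);
        }
    }

    // Global namespace: a container name can be claimed only once.
    private final Map<String, Map<String, Blob>> containers = new HashMap<>();

    // Returns false if the name is already taken, like an email address.
    boolean createContainer(String name) {
        return containers.putIfAbsent(name, new HashMap<>()) == null;
    }

    // Stores a value under container/key; a second put on the same key overwrites.
    void put(String container, String key, Blob blob) {
        Map<String, Blob> entries = containers.get(container);
        if (entries == null) {
            throw new IllegalArgumentException("no such container: " + container);
        }
        entries.put(key, blob);
    }

    // Returns the stored blob, or null if the container or key is unknown.
    Blob get(String container, String key) {
        Map<String, Blob> entries = containers.get(container);
        return entries == null ? null : entries.get(key);
    }
}
```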
6. THE VENDORS
Rackspace: Cloud Files
Amazon Web Services: S3
Windows Azure: Blob Service
Nirvanix: Storage Delivery Network
Mezeo: Cloud Storage Platform
limelight customers: Disney, MSNBC, NetFlix, Microsoft Xbox, and Amazon Video on Demand
azure containers can have metadata, rackspace doesn’t support acls
8. • FILE SIZE
• RESUMABILITY
nirvanix: max 256G; no support for range
mezeo: max is limited to filesystem (exabytes)
azure: max is account size (50GB)
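Resumability on services that do support range requests comes down to asking for byte ranges. A small sketch of how the HTTP `Range` header values could be computed; the class and method names are hypothetical and the part size is up to the caller:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the "bytes=start-end" header syntax is standard HTTP.
class RangeSketch {
    // Splits [0, totalBytes) into Range header values of at most partBytes each.
    static List<String> rangeHeaders(long totalBytes, long partBytes) {
        if (totalBytes < 0 || partBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and partBytes > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < totalBytes; start += partBytes) {
            // HTTP byte ranges are inclusive at both ends.
            long end = Math.min(start + partBytes, totalBytes) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    // To resume a broken download, ask for everything from the first missing byte.
    static String resumeHeader(long bytesAlreadyReceived) {
        return "bytes=" + bytesAlreadyReceived + "-";
    }
}
```

Without range support, a failed transfer of a large value has to start again from byte zero.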
9. CONTENT DELIVERY NETWORK
REPLICATION
SLA
nirvanix: 5 nodes under one namespace, 3 locations; west socal,
central, dallas, jersey, Frankfurt, Tokyo
cloudlayer/cloudfiles single location both with CDN 21 location
unsure azure/s3 eu/us write ( multiple namespace) 7 node CDN
cloudfront
nirvanix - 99.9 on single 99.99 99.999 <- first sla
10. Consistency Model
How long can the gaps be between updates and re-reads? How does
CDN affect this?
Nirvanix, azure, cloudlayer, rackspace <- immediate for local - delay
on remote
S3 eventual consistency
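With eventual consistency, a read right after a write may still return the old value, so callers that need the new value have to re-read. A sketch of a bounded re-read loop; the names are hypothetical, and the read is passed in as a `Supplier` so that no particular cloud API is assumed:

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical helper showing a bounded read-after-write check.
class ConsistencySketch {
    // Re-reads until the store returns the expected value or maxReads is used up.
    // Returns the number of reads it took, or -1 if the value never became visible.
    static <T> int readsUntilVisible(Supplier<T> read, T expected, int maxReads, long pauseMillis) {
        for (int attempt = 1; attempt <= maxReads; attempt++) {
            if (Objects.equals(read.get(), expected)) {
                return attempt;
            }
            if (attempt < maxReads) {
                try {
                    Thread.sleep(pauseMillis); // give replication time to catch up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

The length of the gap is what varies by vendor and by whether a CDN sits in front of the store, so `maxReads` and `pauseMillis` have to be tuned per service.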
11. AUTHORIZATION
Can you set up policies for who can access files? Delegated billing?
nirvanix storage and usage limits per subaccount, fine grained access within
application
12. api complexity
nirvanix - rest and soap - cloudnas nfs and cifs - physical cache on
cloudnas
mezeo - webdav and rest
cloudfiles - rest
s3 and azure - rest and soap
13. CODE AND SIGN THE HTTP REQUEST
S3:
PUT /sushi.avi HTTP/1.1
Host: adriansmovies.s3.amazonaws.com
Content-Length: 734859264
Date: Wed, 01 Mar 2006 12:00:00 GMT
Authorization: signature
x-amz-meta-Chef: Kawasaki

Azure:
PUT /adriansmovies/sushi.avi HTTP/1.1
Host: <account>.blob.core.windows.net
Content-Length: 734859264
Date: Wed, 01 Mar 2006 12:00:00 GMT
Authorization: SharedKey <app>:signature
x-ms-meta-Chef: Kawasaki

EMC Atmos (x-emc headers):
POST /namespace/adriansmovies/sushi.avi HTTP/1.1
Content-Length: 734859264
Date: Wed, 01 Mar 2006 12:00:00 GMT
x-emc-uid: <uid>
x-emc-signature: signature
x-emc-meta: Chef=Kawasaki

Cloud Files:
PUT /<api version>/<account>/adriansmovies/sushi.avi HTTP/1.1
Host: storage.clouddrive.com
Transfer-Encoding: chunked
X-Auth-Token: session-token
X-Object-Meta-Chef: Kawasaki
PUT for overwrites
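The `signature` placeholder in these requests is typically a keyed hash over selected parts of the request. A sketch in the spirit of S3's original HMAC-SHA1 scheme; the string-to-sign layout here is simplified and is an assumption, not any vendor's exact canonical form, and the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical signing helper; real services define stricter canonical strings.
class SigningSketch {
    // Simplified canonical string: verb, date, metadata header, resource path.
    static String stringToSign(String verb, String date, String metaHeader, String resource) {
        return verb + "\n" + date + "\n" + metaHeader + "\n" + resource;
    }

    // Base64(HMAC-SHA1(secretKey, stringToSign)); the result goes in the Authorization header.
    static String sign(String secretKey, String stringToSign) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 not available", e);
        }
    }
}
```

Because the date and headers are part of the signed string, every request has to be signed individually, and any change to a vendor's canonical form breaks hand-written clients.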
14. CODE AND SIGN THE HTTP REQUEST
GET /ws/IMFS/GetStorageNodeExtended.ashx?&fileOverwrite=true&ipRestricted=true&destFolderPath=adriansmovies&sizeBytes=734859264&firstByteExpiration=6000&lastByteExpiration=259200&sessionToken=session-token HTTP/1.1
POST /Upload.ashx?uploadToken=from_above&destFolderPath=adriansmovies HTTP/1.1
Host: from_above
Content-Length: 734859382
Content-Type: multipart/form-data; boundary=--jclouds--
Authorization: Basic GpjbG9=
----jclouds--
Content-Disposition: form-data; name="sushi.avi"; filename="sushi.avi"
Content-Type: application/octet-stream
...
PUT /ws/Metadata/SetMetadata.ashx?&path=Folders/adriansmovies/sushi.avi&sessionToken=session-token&metadata=Chef:Kawasaki HTTP/1.1
overwrite optional
15. CODE AND SIGN THE HTTP REQUEST
POST /<api version>/containers/id_of_adriansmovies/contents HTTP/1.1
Content-Length: 734859382
Content-Type: multipart/form-data; boundary=--jclouds--
Authorization: Basic GpjbG9=
----jclouds--
Content-Disposition: form-data; name="sushi.avi"; filename="sushi.avi"
Content-Type: application/octet-stream
...
PUT /<api version>/files/from_above/metadata/Chef HTTP/1.1
Content-Length: 8
Content-Type: text/plain
Authorization: Basic GpjbG9=
Kawasaki
118 bytes overhead for mezeo form
16. do you want to
• Deal with Errors
• Deal with Concurrency
• Deal with Cloud Complexity
Deal with Errors: How about retries? XML implies encoding problems...
Deal with Concurrency: Google app engine factor.
Deal with Cloud Complexity: Server outages/errors, Upgrades to Cloud APIs, Eventual Consistency, Changing endpoints,
Changing signing methods, Chunking large values
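Retries are the usual first answer to transient server errors. A sketch of retry with doubling delays; the names are hypothetical, failures are assumed to surface as runtime exceptions, and this is not how any particular library implements it:

```java
import java.util.function.Supplier;

// Hypothetical retry helper with exponential backoff.
class RetrySketch {
    // Runs the call up to maxAttempts times, doubling the pause after each failure.
    // Rethrows the last failure if every attempt fails.
    static <T> T withRetries(Supplier<T> call, int maxAttempts, long firstDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = firstDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // for example a 500 from the service
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break; // stop retrying if the thread is interrupted
                }
                delay *= 2;
            }
        }
        throw last;
    }
}
```

A real client would also need to decide which errors are safe to retry; blindly repeating a non-idempotent request can do harm.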
17. jclouds
open source
feels like java (and clojure)
portability between clouds
deal with web complexity
unit testability
thread-safe and scalable
works in google app engine
high performance
thread safe
enterprise ready
22. commons vfs
vfs > open blobstore://user:key@cloudfiles/mycontainer
Opened blobstore://cloudfiles/mycontainer/
Current folder is blobstore://cloudfiles/mycontainer/
vfs > ls
Contents of blobstore://cloudfiles/mycontainer/
README.txt
0 Folder(s), 1 File(s)
vfs > close
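The address in the `open` command packs identity, credential, provider and container into one URI. A sketch of how such an address breaks down using only `java.net.URI`; the class name and the returned layout are hypothetical, and this is not the plugin's actual parsing code:

```java
import java.net.URI;

// Hypothetical parser for blobstore://identity:credential@provider/container addresses.
class BlobStoreUriSketch {
    // Returns {identity, credential, provider, container}; missing parts are null.
    static String[] parse(String address) {
        URI uri = URI.create(address);
        if (!"blobstore".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not a blobstore URI: " + address);
        }
        String identity = null;
        String credential = null;
        String userInfo = uri.getUserInfo(); // "user:key", or null if absent
        if (userInfo != null) {
            int colon = userInfo.indexOf(':');
            identity = colon < 0 ? userInfo : userInfo.substring(0, colon);
            credential = colon < 0 ? null : userInfo.substring(colon + 1);
        }
        String path = uri.getPath();
        String container = path.startsWith("/") ? path.substring(1) : path;
        return new String[] {identity, credential, uri.getHost(), container};
    }
}
```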
25. The Good
provisioning (and re-provisioning) is cheap
APIs = automation
tools exist
$0.10/hr for a virtual machine, more for physical; billed for what you
use
APIs are ways developers can use and automate these services
most cloud vendors offer SDKs to their services
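Pay-per-use cuts both ways once machines are left running. A back-of-envelope sketch, assuming the hourly figure in the notes means 10 cents per hour; the class and method names are hypothetical and the rate is passed in, not fixed:

```java
// Hypothetical cost helper; amounts are kept in whole cents to avoid rounding issues.
class CostSketch {
    // Cost in cents of leaving `machines` VMs running around the clock for `days` days.
    static long costCents(long centsPerHour, int machines, int days) {
        return centsPerHour * machines * 24L * days;
    }
}
```

For example, `CostSketch.costCents(10, 5, 30)` gives the bill for five machines left on for thirty days.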
26. The Bad
forgetting to turn things off
licensing
erratic service quality
licensing is a challenge for cost control, and products you use
may not be cloud friendly
sometimes your VMs can disappear, and performance isn’t always
consistent
27. The Ugly
cloud apis are sometimes unreliable
apis are very different across clouds
features are very different across clouds
accidental complexity
cloud apis - can throw errors or have frequent upgrades
features - build or buy load balancers, volumes
complexity - needing to manually replicate images across regions,
image conversion, polling
28. Things to consider when provisioning
Can you create an image?
Can you push credentials or files?
Do you need to VPN in?
How is storage provisioned?
How close are your dependencies?
30. jclouds github jclouds/jclouds
// provider name plus credentials gives a portable compute API
service = new ComputeServiceContextFactory().createContext(
    "rimuhosting", user, password).getComputeService();
// no image id needed: take any match, with the biggest hardware
template = service.templateBuilder().any().biggest().build();
template.getOptions().installPrivateKey(privateRSA)
    .authorizePublicKey(publicRSA)
    .runScript(installGemsAndRunChef);
// start 5 nodes tagged "webserver"
nodes = service.runNodesWithTag("webserver", 5, template);
focused on semantic portability across clouds
I want an image running ubuntu and don’t want to know the id
absolute portability where possible, but expose vendor apis where
needed
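Semantic portability means asking for what you want instead of a provider-specific id. A sketch of that lookup over a made-up image catalog; this is not jclouds code, and the class, field names and ids are hypothetical:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical catalog lookup: choose an image by what it runs, not by its id.
class ImageMatchSketch {
    static final class Image {
        final String providerId; // opaque and different on every cloud
        final String osFamily;   // for example "ubuntu"

        Image(String providerId, String osFamily) {
            this.providerId = providerId;
            this.osFamily = osFamily;
        }
    }

    // Returns the first image of the requested OS family, if the catalog has one.
    static Optional<Image> findByOsFamily(List<Image> catalog, String osFamily) {
        return catalog.stream()
                .filter(image -> image.osFamily.equalsIgnoreCase(osFamily))
                .findFirst();
    }
}
```

The caller only ever names the OS family; the provider-specific id stays an implementation detail of whichever cloud the catalog came from.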
31. dasein sourceforge dasein-cloud
CloudProvider provider = providerClass.newInstance();
ProviderContext context = new ProviderContext();
context.setAccountNumber(accountNumber);
context.setAccessPublic(apiKeyBytes);
context.setAccessPrivate(privateKeyBytes);
provider.connect(context);
ServerServices services = provider.getServerServices();
server = services.launch(imageId, size,
dataCenterId, serverName,
keypairOrPassword, vlan,
analytics, firewalls);
focused on service portability
under the enstratus platform
richer service support
32. whirr github tomwhite/whirr
ServiceSpec serviceSpec = new ServiceSpec();
serviceSpec.setProvider("gogrid");
serviceSpec.setAccount(account);
serviceSpec.setKey(key);
serviceSpec.setSecretKeyFile(secretKeyFile);
serviceSpec.setClusterName(clusterName);
service = new HadoopService(serviceSpec);
ClusterSpec clusterSpec = new ClusterSpec(
new InstanceTemplate(1, HadoopService.MASTER_ROLE),
new InstanceTemplate(1, HadoopService.WORKER_ROLE));
cluster = service.launchCluster(clusterSpec);
proxy = new HadoopProxy(serviceSpec, cluster);
proxy.start();
multi-language service management
now has zookeeper and hadoop support
36. crane github clj-sys/crane
(def hadoop-config (conf "/path/to/conf.clj"))
(def compute (ec2 (creds "/path/to/creds.clj")))
(launch-hadoop-cluster compute hadoop-config)
focus on aws and hadoop, but in the future will focus on clusters and
multi-cloud
37. pallet github hugoduncan/pallet
(defnode webserver []
:bootstrap [(public-dns-if-no-nameserver)
(automated-admin-user)]
:configure [(chef)])
(with-compute-service [service]
(converge {webserver 3})
(cook webserver "/cookbooks/apache-chef-demo"))
serverless and agentless = uses ssh
removes bootstrap complexity, like sudo, gems, etc.
kick off processes like chef, or provision stacks such as hudson