2. Object Stores are the Future
[Chart: IDC Worldwide Server Sales in $ Millions vs. Billions of Objects in AWS S3, Oct-06 to Aug-13. Plotted values: S3 objects rise from 10 billion to 2,000 billion; server sales range from $9,924M to $15,700M.]
The Number of Objects in Amazon S3 is Growing Fast
Server Sales are basically flat
3. Manta is Joyent’s new Object Storage Service
Joyent Object Store
Manta
Put Data into Manta
Get Data from Manta
Via a RESTful API
An object is non-interpreted data of any size that you read and write to the store.
4. Manta is Live and Available Today
http://www.joyent.com/products/manta
5. A file is an example of an object
• The code below does the following:
1. Creates a file called /tmp/hello.txt that contains the words “Hello, Manta”
2. Puts the file into Manta as the object hello-foo
3. Gets the object back from Manta and outputs its contents
$ echo "Hello, Manta" > /tmp/hello.txt
$ mput -f /tmp/hello.txt /$MANTA_USER/stor/hello-foo
/$MANTA_USER/stor/hello-foo [====================>] 100% 13B
$ mget /$MANTA_USER/stor/hello-foo
Hello, Manta
6. Manta Partners support File Interfaces
Joyent Object Store
Manta
Partners offer NAS File Interfaces
that run in existing data centers but
back up to the Manta Object Store
The Panzura solution is available today. The other solutions are due to be available by the end of Q4 2013.
7. Manta adds Big Data to Object Storage
Joyent Object Store
Manta
Only 1 Step - Analyze or Process Data using Manta Jobs
Send in the Big Data Job
Manta acts like a Platform as a Service (PaaS) for Big Data Analytics
Manta is the only Object Storage System that brings Compute directly to the Data.
8. Big Data is easy on Manta vs complex on AWS
Cloud Object Store: S3
1 - Download Data
2 - Analyze or Process Data
3 - Upload Data Again
Netflix has open-sourced its Genie management tools for running Hadoop jobs with S3.
To analyze data in S3, the Netflix system requires coordinating nine pieces of software:
Hadoop, Hive, Pig, Karyon, Servo, Ribbon, Archaius, Eureka, and Genie
Big Data analytics on AWS/S3 requires 3 complex steps
vs 1 simple step on Manta.
9. S3 + EC2 also requires new Sysadmins
Admins are needed because “Genie is not an end-to-end resource
management tool - it doesn’t provision or launch clusters, and
neither does it scale clusters up and down based on their utilization”
End-users are the data-scientists who want
to analyze or process data stored in S3
10. Big Data Made Simple
• Single store of record for your data
• Do analysis without the learning curve of server administration
• Do big data analysis in any language
“There is no learning curve to run
Manta for us, since it runs on Unix.”
Konstantin Gredeskoul, CTO
11. Manta delivers Value
• Requests
• DELETE: free
• POST, PUT, LIST (“GET DIR”): $0.005/1000 requests
• GET, OPTIONS, HEAD: $0.004/10000 requests
• Bandwidth
• All bandwidth in: $0.000 (free)
• Bandwidth out after the first TB: $0.120/GB down to $0.050/GB
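As a rough illustration of the request pricing above, the sketch below adds up a bill for a given number of write-type and read-type requests. The function name `request_cost` and the sample request counts are hypothetical; the rates are the ones listed on the slide, taken at face value.

```shell
# request_cost WRITES READS
# WRITES: number of POST/PUT/LIST requests ($0.005 per 1000, from the slide)
# READS:  number of GET/OPTIONS/HEAD requests ($0.004 per 10000, from the slide)
# DELETE requests are free and therefore not counted.
request_cost() {
    LC_ALL=C awk -v writes="$1" -v reads="$2" 'BEGIN {
        printf "%.2f\n", writes * 0.005 / 1000 + reads * 0.004 / 10000
    }'
}

# Hypothetical month: 1 million writes and 10 million reads.
request_cost 1000000 10000000
```

With these sample numbers the writes contribute $5.00 and the reads $4.00.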
• Storage
Storage Tier | Per Individual Copy | Per 2 Copies (default)
First 1 TB/month | $0.043 per GB | $0.086 per GB
Next 49 TB/month | $0.036 per GB | $0.072 per GB
Next 450 TB/month | $0.032 per GB | $0.064 per GB
Next 500 TB/month | $0.029 per GB | $0.058 per GB
Next 4000 TB/month | $0.027 per GB | $0.054 per GB
Next 5000 TB/month | $0.025 per GB | $0.050 per GB
Default is 2 copies. When submitting an object to the service, you can specify the number of copies stored, from one (1) to six (6).
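The tiered storage prices can be turned into a monthly estimate with a short calculation. This is only a sketch under stated assumptions: 1 TB is taken as 1000 GB, tiers are applied to the amount stored per copy, and the total is multiplied by the number of copies (which matches the "Per 2 Copies" column). The function name `storage_cost` is hypothetical.

```shell
# storage_cost GB [COPIES]
# Prints the estimated monthly storage cost in dollars.
# COPIES defaults to 2, the service default named on the slide.
storage_cost() {
    LC_ALL=C awk -v gb="$1" -v copies="${2:-2}" 'BEGIN {
        # Tier sizes in GB (assumption: 1 TB = 1000 GB) and per-copy prices per GB.
        n = split("1000 49000 450000 500000 4000000 5000000", size, " ")
        split("0.043 0.036 0.032 0.029 0.027 0.025", price, " ")
        left = gb + 0
        total = 0
        for (i = 1; i <= n && left > 0; i++) {
            cap = size[i] + 0
            used = (left < cap) ? left : cap
            total += used * price[i]
            left -= used
        }
        # Anything beyond the last listed tier is not priced on the slide
        # and is ignored here.
        printf "%.2f\n", total * copies
    }'
}

storage_cost 500        # 500 GB at the default 2 copies
storage_cost 2000 1     # 2 TB stored as a single copy
```

The second call crosses a tier boundary: the first 1000 GB are billed at $0.043 and the next 1000 GB at $0.036.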
• Compute
• $0.00004 per GB of DRAM per second
• If you run 1000 parallel tasks on 1000 objects
and they each take a second (assuming 1 GB of DRAM per task),
then you've used 1000 GB-seconds and the cost for this job
would be 1000 × $0.00004 = $0.04.
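The compute arithmetic above can be checked with a small helper. This is a sketch, not an official calculator: the function name `job_cost` is hypothetical, and the per-task memory size is an assumption, since the slide's $0.04 figure only works out if each task uses 1 GB of DRAM.

```shell
# Price from the slide: $0.00004 per GB of DRAM per second.
RATE_PER_GB_SEC=0.00004

# job_cost TASKS SECONDS_PER_TASK GB_PER_TASK
# Prints the job cost in dollars.
# GB_PER_TASK is an assumption; the slide does not state a memory size.
job_cost() {
    LC_ALL=C awk -v tasks="$1" -v secs="$2" -v gb="$3" -v rate="$RATE_PER_GB_SEC" \
        'BEGIN { printf "%.2f\n", tasks * secs * gb * rate }'
}

# The slide's example: 1000 tasks, 1 second each, assumed 1 GB each.
job_cost 1000 1 1
```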
14. Technical Description of Manta
• Multi-datacenter Object Store
• Granular datacenter and copy policies
• No size limits
• In-kernel (clustered ZFS DMU)
• More akin to a NetApp MetroCluster
• S3: JVM on ext3 on Linux
• Strongly consistent and transactional data semantics
• Close to UNIX file-system semantics
15. Analytics Capability: Codename Marlin
• A facility for running compute jobs directly on Manta storage nodes
• Complete EC2-like batch compute environment
• A framework for distributing work to the right physical servers,
tracking which pieces are complete, capturing the output, and
repeating the whole process to facilitate multi-phase computation on
objects at rest
• Complete Unix environment without any ETL
• A non-interactive Unix shell environment for doing "work" on Manta
objects as local files
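Because each phase is described as an ordinary shell command that sees an object as a local file, the shape of a multi-phase job can be sketched on a laptop with standard tools. The example below does not call Manta at all; the temporary directory stands in for a set of stored objects, and the file names and contents are hypothetical.

```shell
# Local stand-in for a set of stored objects (hypothetical sample data).
workdir=$(mktemp -d)
printf 'hello manta\n' > "$workdir/a.txt"
printf 'compute comes to the data\n' > "$workdir/b.txt"

# "Map" phase: run one command per object, each seeing its object as a file.
# "Reduce" phase: combine the per-object outputs into a single result.
total_words=$(
    for obj in "$workdir"/*.txt; do
        wc -w < "$obj"                               # map: words in one object
    done | awk '{ sum += $1 } END { print sum }'     # reduce: add the counts
)

echo "total words: $total_words"
rm -rf "$workdir"
```

On the service itself, the loop would be replaced by the framework's own scheduling, which runs the map command on the storage nodes that hold each object and feeds the outputs to the reduce command.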
16. Why Marlin is Revolutionary
Customers can run queries, build data pipelines, perform transformations, and
run map-reduce on objects very quickly, without moving the data and without
the additional cost of spinning up instances
17. Big Data Use Case Examples - Part 1
• Log processing
• Clickstream analysis, map reduce on logs
• Image processing
• converting formats, generating thumbnails
• Video processing
• transcoding, extracting segments, resizing
• “Hardcore” data analysis
• NumPy, SciPy, R, machine learning, data mining
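The log-processing use case can be sketched with plain Unix tools, which is the style of work these slides describe. The example runs locally on made-up log files; the three-column log format (method, path, status) is an assumption chosen for brevity.

```shell
# Hypothetical log files standing in for stored log objects.
logdir=$(mktemp -d)
cat > "$logdir/web1.log" <<'EOF'
GET /index.html 200
GET /missing 404
EOF
cat > "$logdir/web2.log" <<'EOF'
POST /login 200
GET /index.html 200
EOF

status_counts=$(
    for log in "$logdir"/*.log; do
        awk '{ print $3 }' "$log"                        # map: emit the status code
    done | sort | uniq -c | awk '{ print $2 "=" $1 }'    # reduce: count per status
)

echo "$status_counts"
rm -rf "$logdir"
```

Each output line has the form status=count, one line per distinct status code.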
18. Big Data Use Case Examples - Part 2
• SQL-like queries over structured data
• Similar to what Hive provides for Hadoop
• Data pipelining
• MySQL, Postgres plus other clients
• Text processing
• e-discovery and internal search engines
• Backup and disaster recovery
• Encrypt and verify integrity without moving/downloading the data
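The idea of verifying integrity next to the data can be illustrated locally: a checksum is computed where the file lives, and only that short checksum line is compared. This sketch uses the POSIX `cksum` utility (a CRC, chosen for portability); a real backup check would more likely use a cryptographic hash. The file name and the `verify` helper are hypothetical.

```shell
store=$(mktemp -d)
printf 'backup payload\n' > "$store/backup.tar"

# Record a checksum at backup time.
recorded=$(cksum < "$store/backup.tar")

# verify FILE RECORDED_CHECKSUM
# Recomputes the checksum in place and prints "ok" or "corrupt".
verify() {
    if [ "$(cksum < "$1")" = "$2" ]; then echo ok; else echo corrupt; fi
}

first_check=$(verify "$store/backup.tar" "$recorded")

# Simulate corruption by appending a byte, then verify again.
printf 'x' >> "$store/backup.tar"
second_check=$(verify "$store/backup.tar" "$recorded")

echo "$first_check $second_check"
rm -rf "$store"
```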
19. Key Security & Sharing Example
• With rich access controls in Manta, it is possible to run compute on
other users' data that's been made available to you
• Without actually having access to it
• Without having to ship it
• Without being able to egress the dataset itself