Introduction to Manta
Rod Boothby
VP
415-819-9253
rod@joyent.com
August 12, 2013
Object Stores are the Future
[Chart: IDC Worldwide Server Sales in $ Millions vs. Billions of Objects in AWS S3, Oct-06 to Aug-13]
Server sales data labels ($ millions): 14,639; 12,597; 14,193; 13,228; 15,305; 11,812; 10,868; 10,432; 9,924; 13,147; 15,700; 15,200
S3 object-count data labels (billions): 10; 14; 18; 29; 40; 82; 102; 262; 449; 556; 762; 905; 1,000; 1,300; 2,000
The Number of Objects in Amazon S3 is Growing Fast
Server Sales are basically flat
Manta is Joyent’s new Object Storage Service
Joyent Object Store: Manta
Put data into Manta and get data from Manta via a RESTful API
An object is non-interpreted data of any size that you read and write to the store.
Manta is Live and Available Today
http://www.joyent.com/products/manta
A file is an example of an object
• The code below does the following:
1. Creates a file called /tmp/hello.txt that contains the words “Hello, Manta”
2. Puts the file into Manta as the object hello-foo
3. Gets the object back from Manta and outputs its contents
$ echo "Hello, Manta" > /tmp/hello.txt
$ mput -f /tmp/hello.txt /$MANTA_USER/stor/hello-foo
/$MANTA_USER/stor/hello-foo [====================>] 100% 13B
$ mget /$MANTA_USER/stor/hello-foo
Hello, Manta
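The 13B shown in the mput progress line is the size of the uploaded object in bytes. A quick check, written in Python purely for illustration (the deck itself only uses the shell), shows where that number comes from: echo appends a newline to the 12 characters of the greeting.

```python
# What `echo "Hello, Manta" > /tmp/hello.txt` writes: the text plus a trailing newline.
greeting = "Hello, Manta"
file_bytes = (greeting + "\n").encode("utf-8")

print(len(greeting))    # number of characters in the greeting
print(len(file_bytes))  # number of bytes written to the file, which is what mput uploads
```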
Manta Partners support File Interfaces
Partners offer NAS file interfaces that run in existing data centers but back up to the Manta Object Store.
The Panzura solution is available today. The other solutions are due to be available by the end of Q4 2013.
Manta adds Big Data to Object Storage
Only 1 Step - Analyze or Process Data using Manta Jobs
Send in the Big Data Job
Manta acts like a Platform as a Service (PaaS) for Big Data Analytics
Manta is the only Object Storage System that brings Compute directly to the Data.
Big Data is easy on Manta vs complex on AWS
Cloud Object Store (S3)
1 - Download Data
2 - Analyze or Process Data
3 - Upload Data Again
Netflix has open-sourced their Genie Management Tools for Running Hadoop Jobs with S3.
To Analyze Data in S3, the Netflix system requires coordinating 9 pieces of Software:
Hadoop, Hive, Pig, Karyon, Servo, Ribbon, Archaius, Eureka, and Genie
Big Data analytics on AWS/S3 requires 3 complex steps vs. 1 simple step on Manta.
S3 + EC2 also requires new Sysadmins
Admins are needed because “Genie is not an end-to-end resource management tool - it doesn’t provision or launch clusters, and neither does it scale clusters up and down based on their utilization”
End users are the data scientists who want to analyze or process data stored in S3
Big Data Made Simple
• Single store of record for your data
• Do analysis without the learning curve of server administration
• Do big data analysis in any language
“There is no learning curve to run Manta for us, since it runs on Unix.”
Konstantin Gredeskoul, CTO
Manta delivers Value
• Requests
  • DELETE: free
  • POST, PUT, LIST (“GET DIR”): $0.005/1000 requests
  • GET, OPTIONS, HEAD: $0.004/10000 requests
• Bandwidth
  • All bandwidth in: $0.000 (free)
  • Bandwidth out after 1st TB: $0.120/GB to $0.050/GB
• Storage
Storage Tier          Per Individual Copy   Per 2 Copies (default)
First 1 TB/month      $0.043 per GB         $0.086 per GB
Next 49 TB/month      $0.036 per GB         $0.072 per GB
Next 450 TB/month     $0.032 per GB         $0.064 per GB
Next 500 TB/month     $0.029 per GB         $0.058 per GB
Next 4000 TB/month    $0.027 per GB         $0.054 per GB
Next 5000 TB/month    $0.025 per GB         $0.050 per GB
Default is 2 copies.
When submitting an object to the service, you can specify the number of copies stored, from one (1) to six (6).
• Compute
  • $0.00004/GB DRAM•sec
  • If you run 1000 parallel tasks on 1000 objects and they each take a second, then you've used 1000 seconds of time, and the cost for this job would be $0.04 (1000 seconds × $0.00004, which assumes 1 GB of DRAM per task).
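The pricing above can be turned into a rough estimator. The following is a minimal sketch, not an official calculator: the function names are hypothetical, the tiers are assumed to apply marginally (each slice of data is billed at its own tier's rate), a TB is assumed to be 1000 GB, and the compute example assumes 1 GB of DRAM per task, none of which the slide states explicitly.

```python
from decimal import Decimal

# Per-copy storage tiers from the pricing slide: (tier size in TB, $ per GB per month).
STORAGE_TIERS = [
    (1, Decimal("0.043")),
    (49, Decimal("0.036")),
    (450, Decimal("0.032")),
    (500, Decimal("0.029")),
    (4000, Decimal("0.027")),
    (5000, Decimal("0.025")),
]
GB_PER_TB = 1000  # assumption: the slide does not say whether a TB is 1000 or 1024 GB
COMPUTE_RATE = Decimal("0.00004")  # $ per GB of DRAM per second, from the slide


def monthly_storage_cost(stored_gb, copies=2):
    """Charge each slice of data at its tier's per-copy rate, times the copy count."""
    if not 1 <= copies <= 6:
        raise ValueError("copies must be between 1 and 6")
    remaining = Decimal(stored_gb)
    total = Decimal(0)
    for tier_tb, rate in STORAGE_TIERS:
        in_tier = min(remaining, Decimal(tier_tb * GB_PER_TB))
        total += in_tier * rate * copies
        remaining -= in_tier
        if remaining <= 0:
            return total
    # The slide lists no price beyond the last tier, so refuse to guess one.
    raise ValueError("stored_gb exceeds the tiers listed on the slide")


def compute_cost(tasks, seconds_per_task, dram_gb_per_task=1):
    """Cost of a job: tasks x seconds x GB of DRAM x rate (1 GB per task is an assumption)."""
    return COMPUTE_RATE * tasks * seconds_per_task * dram_gb_per_task


print(monthly_storage_cost(500))            # 500 GB at the default 2 copies
print(monthly_storage_cost(2000, copies=1)) # 2 TB, single copy, spans two tiers
print(compute_cost(1000, 1))                # the slide's 1000-task example
```

Decimal is used instead of float so that small per-unit prices such as $0.00004 add up without rounding noise.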
Technical Appendix
Accessing Manta is Easy
• Manta REST API
• Manta CLI & Shell
• Manta Node.js SDK
• Manta Python SDK
• Manta Ruby SDK
• Manta Java SDK
Technical Description of Manta
• Multi-datacenter Object Store
• Granular datacenter and copy policies
• No size limits
• In-kernel (clustered ZFS DMU)
• More akin to a NetApp MetroCluster
• S3: JVM on ext3 on Linux
• Strongly consistent and transactional data semantics
• Close to UNIX file-system semantics
Analytics Capability: Codename Marlin
• A facility for running compute jobs directly on Manta storage nodes
• Complete EC2-like batch compute environment
• A framework for distributing work to the right physical servers, tracking which pieces are complete, capturing the output, and repeating the whole process to facilitate multi-phase computation on objects at rest
• Complete Unix environment without any ETL
• A non-interactive Unix shell environment for doing "work" on Manta objects as local files
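The multi-phase model described on this slide can be pictured with a small local simulation. This is only a sketch: it is plain Python running on one machine, not Manta's job API, and the object paths and the names `run_job`, `count_errors`, and `total` are hypothetical. Each map task sees the contents of one object, and the reduce phase sees the combined output of all map tasks.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for stored objects: path -> contents.
objects: Dict[str, str] = {
    "/user/stor/logs/a.log": "ok\nerror\nok\n",
    "/user/stor/logs/b.log": "error\nerror\n",
}


def count_errors(contents: str) -> str:
    """Map phase: runs once per object and emits one line of output."""
    errors = sum(1 for line in contents.splitlines() if line == "error")
    return f"{errors}\n"


def total(map_output: str) -> str:
    """Reduce phase: runs once over the combined output of every map task."""
    return f"{sum(int(line) for line in map_output.splitlines())}\n"


def run_job(inputs: Dict[str, str],
            map_phase: Callable[[str], str],
            reduce_phase: Callable[[str], str]) -> str:
    # Run one map task per object and capture each task's output separately.
    map_outputs: List[str] = [map_phase(inputs[path]) for path in sorted(inputs)]
    # Feed the captured outputs into the next phase.
    return reduce_phase("".join(map_outputs))


print(run_job(objects, count_errors, total), end="")
```

In the real service the map tasks would run on the storage nodes that hold each object, which is the point the slide makes about avoiding data movement; the simulation only mirrors the data flow between phases.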
Why Marlin is Revolutionary
Customers can run queries, build data pipelines, and perform transformations and map reduce on objects very quickly, without moving data and without the additional cost of spinning up instances.
Big Data Use Case Examples - Part 1
• Log processing
  • Clickstream analysis, map reduce on logs
• Image processing
  • Converting formats, generating thumbnails
• Video processing
  • Transcoding, extracting segments, resizing
• “Hardcore” data analysis
  • NumPy, SciPy, R, machine learning, data mining
Big Data Use Case Examples - Part 2
• SQL-like queries over structured data
  • Similar to what Hive provides for Hadoop
• Data pipelining
  • MySQL, Postgres plus other clients
• Text processing
  • E-discovery and internal search engines
• Backup and disaster recovery
  • Encrypt and verify integrity without moving/downloading the data
Key Security & Sharing Example
• With rich access controls in Manta, it is possible to run compute on other users' data that's been made available to you
  • Without actually having access to it
  • Without having to ship it
  • Without being able to egress the dataset itself
Thank You
