SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Scalable Object Storage with
Apache CloudStack and Apache
           Hadoop
              
          February 26 2013
                  
           Chiradeep Vittal
            @chiradeep
Agenda
•    What is CloudStack
•    Object Storage for IAAS
•    Current Architecture and Limitations
•    Requirements for Object Storage
•    Object Storage integrations in CloudStack
•    HDFS for Object Storage
•    Future directions
Apache CloudStack

                              •  History!
                                  •  Incubating in the Apache
                                     Software Foundation since
                                     April 2012!
                                  •  Open Source since May
  Build your cloud the way           2010!
the world’s most successful
       clouds are built!      •  In production since 2009!
                                 –  Turnkey platform for delivering
                                    IaaS clouds!
                                 –  Full featured GUI, end-user API
                                    and admin API!
How did Amazon build its cloud?

           Amazon eCommerce Platform


              AWS API (EC2, S3, …)


           Amazon Orchestration Software



            Open Source Xen Hypervisor


                    Commodity    Commodity
       Networking
                     Servers      Storage
How can YOU build a cloud?

       Amazon eCommerce Platform
            Optional Portal


          AWS API (EC2, S3, …)
          CloudStack or AWS API


     CloudStack Orchestration Software
      Amazon Orchestration Software


       Hypervisor (Xen/KVM/VMW/)
       Open Source Xen Hypervisor


     Networking    Servers    Storage
Zone Architecture
                                                                                    End	
  users	
  
     Admin/User	
  API	
  

                         CloudStack	
  
                                                                               DC	
  Edge	
  
                                  MySQL	
  

                                                                               L3/L2	
  core	
  	
  

         Access	
  Sw	
  

  Hypervisor	
  (Xen	
                                                                                                                Snapshot	
  


  /VMWare/KVM)	
                                                                                                                          Image	
  
                                                                                                                                            Image	
  

                                                                                                                            Secondary	
  Storage	
  
                                                                                                            VM	
  


                                                                 VM	
  




                                                                                                             Snapshot	
  
Primary	
  Storage	
                                                Disk	
                               Disk	
  

NFS/ISCSI/FC	
  

                            Pod	
         Pod	
     Pod	
     Pod	
                                    Pod	
  
Cloud-Style Workloads

•  Low cost
   –  Standardized, cookie cutter infrastructure
   –  Highly automated and efficient
•  Application owns availability
   –  At scale everything breaks
   –  Focus on MTTR instead of MTBF
Scale
“At scale, everything breaks”
     
     
    


                                    -­‐	
  Urs	
  Hölzle,	
  Google!

                                                                 Server failure comes from:!
                                                                       ᵒ  70% - hard disk!




      8%	
  
                                                                       ᵒ  6% - RAID controller!
                                                                       ᵒ  5% - memory!
                                                                       ᵒ  18% - other factors!
                                                                 Application can still fail for
   Annual	
  Failure	
  Rate	
  of	
  servers	
                  other reasons:!
                                                                       ᵒ  Network failure!
Kashi	
  Venkatesh	
  Vishwanath	
  and	
  
Nachiappan	
  Nagappan,	
  Characterizing	
  
                                                                       ᵒ  Software bugs!
Cloud	
  Compu3ng	
  Hardware	
  Reliability,	
                        ᵒ  Human admin error!
SoCC’10	
  
At scale…everything breaks




                                                                           DC	
  Edge	
  


                                                                           L3/L2	
  core	
  	
  

         Access	
  Sw	
  

  Hypervisor	
  (Xen	
                                                                                                            Snapshot	
  


  /VMWare/KVM)	
                                                                                                                      Image	
  
                                                                                                                                        Image	
  

                                                                                                                        Secondary	
  Storage	
  
                                                                                                        VM	
  


                                                             VM	
  




                                                                                                         Snapshot	
  
Primary	
  Storage	
                                            Disk	
                               Disk	
  

NFS/ISCSI/FC	
  

                            Pod	
     Pod	
     Pod	
     Pod	
                                    Pod	
  
Regions and zones


Region “West”
                        Zone “West-Beta”
   Zone “West-Alpha”

                                   Low Latency Backbone
                                   (e.g., SONET ring)


         Zone “West-Delta”


                            Zone “West-Gamma”
Region “West”
             Region “East”

                     Geographic 
                     separation




                       Internet
Low Latency 





                  Region “South”
Secondary Storage in CloudStack 4.0

•  NFS server default
   –  can be mounted by hypervisor
   –  Easy to obtain, set up and operate
•  Problems with NFS:
   –  Scale: max limits of file systems
      •  Solution: CloudStack can manage multiple NFS stores (+
         complexity)
   –  Performance
      •  N hypervisors : 1 storage CPU / 1 network link
   –  Wide area suitability for cross-region storage
      •  Chatty protocol
   –  Lack of replication
Object Storage in a region


Region “West”
                           Zone “West-Beta”
                                                 •    Replication
   Zone “West-Alpha”
                            •    Audit
                                                 •    Repair
                                                 •    Maintenance


           Object	
  Storage	
  Technology	
  
         Zone “West-Delta”


                                   Zone “West-Gamma”
Object Storage enables reliability


Region “West”
Object Storage also enables other
          applications

Region “West”

                                                              •  DropBox
                                                              •  Static Content
  Object	
  Store	
                                           •  Archival
  API	
  Servers	
  


                        Object	
  Storage	
  Technology	
  
Object Storage characteristics
•    Highly reliable and durable
      –  99.9 % availability for AWS S3
      –  99.999999999 % durability
•    Massive scale
      –  1.3 trillion objects stored across 7 AWS regions [Nov 2012 figures]
      –  Throughput: 830,000 requests per second
•    Immutable objects
      –  Objects cannot be modified, only deleted
•    Simple API
      –  PUT/POST objects, GET objects, DELETE objects
      –  No seek / no mutation / no POSIX API
•    Flat namespace
      –  Everything stored in buckets.
      –  Bucket names are unique
      –  Buckets can only contain objects, not other buckets
•    Cheap and getting cheaper
CloudStack S3 API Server


         S3	
  	
  
         API	
  Servers	
  


MySQL
                              Object	
  Storage	
  Technology	
  
CloudStack S3 API Server
•  Understands AWS S3 REST-style and SOAP API
•  Pluggable backend
  –  Backend storage needs to map simple calls to their
     API
     •  E.g., createContainer, saveObject, loadObject!
  –  Default backend is a POSIX filesystem
  –  Backend with Caringo Object Store (commercial
     vendor) available
  –  HDFS backend also available
•  MySQL storage
  –  Bucket -> object mapping
  –  ACLs, bucket policies
Object Store Integration into
            CloudStack 
•  For images and snapshots
•  Replacement for NFS secondary storage 
                       Or
  Augmentation for NFS secondary storage
•  Integrations available with
  –  Riak CS
  –  Openstack Swift
•  New in 4.2 (upcoming):
  –  Framework for integrating storage providers
What do we want to build ?
•    Open source, ASL licensed object storage
•    Scales to at least 1 billion objects
•    Reliability and durability on par with S3
•    S3 API (or similar, e.g., Google Storage)
•    Tooling around maintenance and
     operation, specific to object storage
The following slides are a design
           discussion
Architecture of Scalable Object
            Storage
               Auth	
  Servers	
  
                                                 Object	
  
                                                 Lookup	
  
                                                 Servers	
  



  API	
  Servers	
  
                                     Object	
  Servers	
       Replicators/Auditors	
  
Why HDFS
•  ASF Project (Apache Hadoop)
•  Immutable objects, replication
•  Reliability, scale and performance
  –  200 million objects in 1 cluster [Facebook]
  –  100 PB in 1 cluster [Facebook]
•  Simple operation
  –  Just add data nodes
HDFS-based Object Storage
                S3	
  Auth	
  Servers	
  
                                                                Namenode	
  
                                                                pair	
  




  S3	
  API	
  Servers	
  
                                  HDFS	
  API	
     Data	
  nodes	
  
BUT
•  Name Node Scalability
  –  150 bytes RAM / block
  –  GC issues
•  Name Node SPOF
  –  Being addressed in the community✔
•  Cross-zone replication
  –  Rack-awareness placement ✔
  –  What if the zones are spread a little further apart?
•  Storage for object metadata
  –  ACLs, policies, timers
Name Node scalability
•  1 billion objects = 3 billion blocks (chunks)
  –  Average of 5 MB/object = 5 PB (actual), 15
     PB (raw)
  –  450 GB of RAM per Name Node
     •  150b x 3 x 10^9 

  –  16 TB / node => 1000 Data nodes
•  Requires Name Node federation ?
•  Or an approach like HAR files
Name Node Federation




 Extension:	
  Federated	
  NameNodes	
  are	
  HA	
  pairs	
  
Federation issues
•  HA for name nodes
•  Namespace shards
  –  Map object -> name node
    •  Requires another scalable key-value store
       –  HBase?

•  Rebalancing between name nodes
Replication over lossy/slower links
A.  Asynchronous replication
  –  Use distcp to replicate between clusters
  –  6 copies vs. 3
  –  Master/Slave relationship
    •    Possibility of loss of data during failover
    •    Need coordination logic outside of HDFS
B.  Synchronous replication
  –  API server writes to 2 clusters and acks only
     when both writes are successful
  –  Availability compromised when one zone is
     down
CAP Theorem
Consistency or Availability during partition
             Many nuances
Storage for object metadata
A.  Store it in HDFS along with the object
  –  Reads are expensive (e.g., to check ACL)
  –  Mutable data, needs layer over HDFS
B.  Use another storage system (e.g. HBase)
  –  Name node federation also requires this.
C.  Modify Name Node to store metadata
  –  High performance
  –  Not extensible
Object store on HDFS Future
•  Viable for small-sized deployments
  –  Up to 100-200 million objects
  –  Datacenters close together
•  Larger deployments needs development
  –  No effort ongoing at this time
Conclusion
•  CloudStack needs object storage for
   “cloud-style” workloads
•  Object Storage is not easy
•  HDFS comes close but not close enough
•  Join the community!

Weitere ähnliche Inhalte

Was ist angesagt?

Building clouds with apache cloudstack apache roadshow 2018
Building clouds with apache cloudstack   apache roadshow 2018Building clouds with apache cloudstack   apache roadshow 2018
Building clouds with apache cloudstack apache roadshow 2018ShapeBlue
 
Giles Sirett: Introduction and CloudStack news
Giles Sirett: Introduction and CloudStack news   Giles Sirett: Introduction and CloudStack news
Giles Sirett: Introduction and CloudStack news ShapeBlue
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech dayArthur Berezin
 
New stuff in CloudStack!
New stuff in CloudStack!New stuff in CloudStack!
New stuff in CloudStack!ShapeBlue
 
Introduction to cloudstack 4.3 networking
Introduction to cloudstack 4.3 networking  Introduction to cloudstack 4.3 networking
Introduction to cloudstack 4.3 networking ShapeBlue
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittalbuildacloud
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackNitin Mehta
 
Ceph with CloudStack
Ceph with CloudStackCeph with CloudStack
Ceph with CloudStackShapeBlue
 
CloudStack vs OpenStack
CloudStack vs OpenStackCloudStack vs OpenStack
CloudStack vs OpenStackVictor Zhang
 
Cloud stack networking shapeblue technical deep dive
Cloud stack networking   shapeblue technical deep diveCloud stack networking   shapeblue technical deep dive
Cloud stack networking shapeblue technical deep diveShapeBlue
 
CloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief Comparison
CloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief ComparisonCloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief Comparison
CloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief Comparisonbizalgo
 
Cloudstack vs Openstack
Cloudstack vs OpenstackCloudstack vs Openstack
Cloudstack vs OpenstackHuzefa Husain
 
CloudStack - Top 5 Technical Issues and Troubleshooting
CloudStack - Top 5 Technical Issues and TroubleshootingCloudStack - Top 5 Technical Issues and Troubleshooting
CloudStack - Top 5 Technical Issues and TroubleshootingShapeBlue
 
Cloud stack overview
Cloud stack overviewCloud stack overview
Cloud stack overviewgavin_lee
 
Cloud stack overview
Cloud stack overviewCloud stack overview
Cloud stack overviewhowie YU
 
Paul Angus – Backup & Recovery in CloudStack
Paul Angus – Backup & Recovery in CloudStackPaul Angus – Backup & Recovery in CloudStack
Paul Angus – Backup & Recovery in CloudStackShapeBlue
 
Hypervisor Selection in Apache CloudStack 4.4
Hypervisor Selection in Apache CloudStack 4.4Hypervisor Selection in Apache CloudStack 4.4
Hypervisor Selection in Apache CloudStack 4.4Tim Mackey
 

Was ist angesagt? (20)

Building clouds with apache cloudstack apache roadshow 2018
Building clouds with apache cloudstack   apache roadshow 2018Building clouds with apache cloudstack   apache roadshow 2018
Building clouds with apache cloudstack apache roadshow 2018
 
Cloud stack for_beginners
Cloud stack for_beginnersCloud stack for_beginners
Cloud stack for_beginners
 
CloudStack vs Openstack
CloudStack vs OpenstackCloudStack vs Openstack
CloudStack vs Openstack
 
Giles Sirett: Introduction and CloudStack news
Giles Sirett: Introduction and CloudStack news   Giles Sirett: Introduction and CloudStack news
Giles Sirett: Introduction and CloudStack news
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech day
 
New stuff in CloudStack!
New stuff in CloudStack!New stuff in CloudStack!
New stuff in CloudStack!
 
Introduction to cloudstack 4.3 networking
Introduction to cloudstack 4.3 networking  Introduction to cloudstack 4.3 networking
Introduction to cloudstack 4.3 networking
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
Ceph with CloudStack
Ceph with CloudStackCeph with CloudStack
Ceph with CloudStack
 
CloudStack vs OpenStack
CloudStack vs OpenStackCloudStack vs OpenStack
CloudStack vs OpenStack
 
Cloud stack networking shapeblue technical deep dive
Cloud stack networking   shapeblue technical deep diveCloud stack networking   shapeblue technical deep dive
Cloud stack networking shapeblue technical deep dive
 
CloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief Comparison
CloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief ComparisonCloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief Comparison
CloudStack vs OpenStack vs Eucalyptus: IaaS Private Cloud Brief Comparison
 
Cloudstack vs Openstack
Cloudstack vs OpenstackCloudstack vs Openstack
Cloudstack vs Openstack
 
CloudStack - Top 5 Technical Issues and Troubleshooting
CloudStack - Top 5 Technical Issues and TroubleshootingCloudStack - Top 5 Technical Issues and Troubleshooting
CloudStack - Top 5 Technical Issues and Troubleshooting
 
Cloud stack overview
Cloud stack overviewCloud stack overview
Cloud stack overview
 
Cloud stack overview
Cloud stack overviewCloud stack overview
Cloud stack overview
 
Introduction to CloudStack
Introduction to CloudStack Introduction to CloudStack
Introduction to CloudStack
 
Paul Angus – Backup & Recovery in CloudStack
Paul Angus – Backup & Recovery in CloudStackPaul Angus – Backup & Recovery in CloudStack
Paul Angus – Backup & Recovery in CloudStack
 
Hypervisor Selection in Apache CloudStack 4.4
Hypervisor Selection in Apache CloudStack 4.4Hypervisor Selection in Apache CloudStack 4.4
Hypervisor Selection in Apache CloudStack 4.4
 

Andere mochten auch

Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Red_Hat_Storage
 
Keep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFSKeep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFSDataWorks Summit
 
Configuration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel ArchitectureConfiguration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel ArchitectureOdinot Stanislas
 
Using CloudStack With Clustered LVM
Using CloudStack With Clustered LVMUsing CloudStack With Clustered LVM
Using CloudStack With Clustered LVMMarcus L Sorensen
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackShapeBlue
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFSDataWorks Summit
 
Linux Cluster Concepts
Linux Cluster ConceptsLinux Cluster Concepts
Linux Cluster Conceptsnixsavy
 
Deploying OpenStack Object Storage (Swift)
Deploying OpenStack Object Storage (Swift)Deploying OpenStack Object Storage (Swift)
Deploying OpenStack Object Storage (Swift)Juan José Martínez
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 

Andere mochten auch (10)

CloudStack S3
CloudStack S3CloudStack S3
CloudStack S3
 
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
Software Defined Storage, Big Data and Ceph - What Is all the Fuss About?
 
Keep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFSKeep me in the Loop: INotify in HDFS
Keep me in the Loop: INotify in HDFS
 
Configuration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel ArchitectureConfiguration and deployment guide for SWIFT on Intel Architecture
Configuration and deployment guide for SWIFT on Intel Architecture
 
Using CloudStack With Clustered LVM
Using CloudStack With Clustered LVMUsing CloudStack With Clustered LVM
Using CloudStack With Clustered LVM
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStack
 
Ozone: An Object Store in HDFS
Ozone: An Object Store in HDFSOzone: An Object Store in HDFS
Ozone: An Object Store in HDFS
 
Linux Cluster Concepts
Linux Cluster ConceptsLinux Cluster Concepts
Linux Cluster Concepts
 
Deploying OpenStack Object Storage (Swift)
Deploying OpenStack Object Storage (Swift)Deploying OpenStack Object Storage (Swift)
Deploying OpenStack Object Storage (Swift)
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 

Ähnlich wie Scalable Object Storage with Apache CloudStack and Apache Hadoop

vFabric - Ideal Platform for SaaS Apps
vFabric - Ideal Platform for SaaS AppsvFabric - Ideal Platform for SaaS Apps
vFabric - Ideal Platform for SaaS AppsVMware vFabric
 
Virtual Data Centers with OpenStack Quantum
Virtual Data Centers with OpenStack QuantumVirtual Data Centers with OpenStack Quantum
Virtual Data Centers with OpenStack Quantumlaurabeckcahoon
 
Virtual data centers with OpenStack Quantum
Virtual data centers with OpenStack QuantumVirtual data centers with OpenStack Quantum
Virtual data centers with OpenStack QuantumLew Tucker
 
Windows Azure Interoperability
Windows Azure InteroperabilityWindows Azure Interoperability
Windows Azure InteroperabilityMihai Dan Nadas
 
Learn OpenStack from trystack.cn ——Folsom in practice
Learn OpenStack from trystack.cn  ——Folsom in practiceLearn OpenStack from trystack.cn  ——Folsom in practice
Learn OpenStack from trystack.cn ——Folsom in practiceOpenCity Community
 
Using Storage Manager 5.1 to Manage and Optimize your Environment
Using Storage Manager 5.1 to Manage and Optimize your Environment Using Storage Manager 5.1 to Manage and Optimize your Environment
Using Storage Manager 5.1 to Manage and Optimize your Environment SolarWinds
 
Virtualization Technology Overview
Virtualization Technology OverviewVirtualization Technology Overview
Virtualization Technology OverviewOpenCity Community
 
virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009ACMBangalore
 
VMUG ISRAEL November 2012, EMC session by Itzik Reich
VMUG ISRAEL November 2012, EMC session by Itzik ReichVMUG ISRAEL November 2012, EMC session by Itzik Reich
VMUG ISRAEL November 2012, EMC session by Itzik ReichItzik Reich
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesDataWorks Summit
 
Networking Israeli Day 2013 - Hecatonchire: Transparent Memory Aggregation
Networking Israeli Day 2013 - Hecatonchire: Transparent Memory AggregationNetworking Israeli Day 2013 - Hecatonchire: Transparent Memory Aggregation
Networking Israeli Day 2013 - Hecatonchire: Transparent Memory Aggregationaidanshribman
 
CloudStack for Java User Group
CloudStack for Java User GroupCloudStack for Java User Group
CloudStack for Java User GroupSebastien Goasguen
 
Windows 2008 R2 Virtualization
Windows 2008  R2  VirtualizationWindows 2008  R2  Virtualization
Windows 2008 R2 VirtualizationEduardo Castro
 
Transcending Computing Environment Boundaries: Seamless Computing Environmen...
Transcending  Computing Environment Boundaries: Seamless Computing Environmen...Transcending  Computing Environment Boundaries: Seamless Computing Environmen...
Transcending Computing Environment Boundaries: Seamless Computing Environmen...HCL Infosystems
 
Virtualization securityv2
Virtualization securityv2Virtualization securityv2
Virtualization securityv2vivekbhat
 
Virtually Secure: Uncovering the risks of virtualization
Virtually Secure: Uncovering the risks of virtualizationVirtually Secure: Uncovering the risks of virtualization
Virtually Secure: Uncovering the risks of virtualizationSeccuris Inc.
 

Ähnlich wie Scalable Object Storage with Apache CloudStack and Apache Hadoop (20)

Apache CloudStack AlpesJUG
Apache CloudStack AlpesJUGApache CloudStack AlpesJUG
Apache CloudStack AlpesJUG
 
vFabric - Ideal Platform for SaaS Apps
vFabric - Ideal Platform for SaaS AppsvFabric - Ideal Platform for SaaS Apps
vFabric - Ideal Platform for SaaS Apps
 
Virtual Data Centers with OpenStack Quantum
Virtual Data Centers with OpenStack QuantumVirtual Data Centers with OpenStack Quantum
Virtual Data Centers with OpenStack Quantum
 
Virtual data centers with OpenStack Quantum
Virtual data centers with OpenStack QuantumVirtual data centers with OpenStack Quantum
Virtual data centers with OpenStack Quantum
 
Windows Azure Interoperability
Windows Azure InteroperabilityWindows Azure Interoperability
Windows Azure Interoperability
 
Learn OpenStack from trystack.cn ——Folsom in practice
Learn OpenStack from trystack.cn  ——Folsom in practiceLearn OpenStack from trystack.cn  ——Folsom in practice
Learn OpenStack from trystack.cn ——Folsom in practice
 
Using Storage Manager 5.1 to Manage and Optimize your Environment
Using Storage Manager 5.1 to Manage and Optimize your Environment Using Storage Manager 5.1 to Manage and Optimize your Environment
Using Storage Manager 5.1 to Manage and Optimize your Environment
 
Virtualization Technology Overview
Virtualization Technology OverviewVirtualization Technology Overview
Virtualization Technology Overview
 
virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009virtualization tutorial at ACM bangalore Compute 2009
virtualization tutorial at ACM bangalore Compute 2009
 
VMUG ISRAEL November 2012, EMC session by Itzik Reich
VMUG ISRAEL November 2012, EMC session by Itzik ReichVMUG ISRAEL November 2012, EMC session by Itzik Reich
VMUG ISRAEL November 2012, EMC session by Itzik Reich
 
Cloud and Grids
Cloud and GridsCloud and Grids
Cloud and Grids
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
V fabric overview
V fabric overviewV fabric overview
V fabric overview
 
Networking Israeli Day 2013 - Hecatonchire: Transparent Memory Aggregation
Networking Israeli Day 2013 - Hecatonchire: Transparent Memory AggregationNetworking Israeli Day 2013 - Hecatonchire: Transparent Memory Aggregation
Networking Israeli Day 2013 - Hecatonchire: Transparent Memory Aggregation
 
CloudStack for Java User Group
CloudStack for Java User GroupCloudStack for Java User Group
CloudStack for Java User Group
 
Windows 2008 R2 Virtualization
Windows 2008  R2  VirtualizationWindows 2008  R2  Virtualization
Windows 2008 R2 Virtualization
 
Intro to Cloudstack
Intro to CloudstackIntro to Cloudstack
Intro to Cloudstack
 
Transcending Computing Environment Boundaries: Seamless Computing Environmen...
Transcending  Computing Environment Boundaries: Seamless Computing Environmen...Transcending  Computing Environment Boundaries: Seamless Computing Environmen...
Transcending Computing Environment Boundaries: Seamless Computing Environmen...
 
Virtualization securityv2
Virtualization securityv2Virtualization securityv2
Virtualization securityv2
 
Virtually Secure: Uncovering the risks of virtualization
Virtually Secure: Uncovering the risks of virtualizationVirtually Secure: Uncovering the risks of virtualization
Virtually Secure: Uncovering the risks of virtualization
 

Mehr von Chiradeep Vittal

Loadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesLoadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesChiradeep Vittal
 
Load Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native ArchitectureLoad Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native ArchitectureChiradeep Vittal
 
Load Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native ArchitectureLoad Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native ArchitectureChiradeep Vittal
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack NetworkingChiradeep Vittal
 
Private cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austinPrivate cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austinChiradeep Vittal
 
StackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStackStackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStackChiradeep Vittal
 
Network Functions Virtualization and CloudStack
Network Functions Virtualization and CloudStackNetwork Functions Virtualization and CloudStack
Network Functions Virtualization and CloudStackChiradeep Vittal
 
CloudStack Networking Deepdive CCCEU13
CloudStack Networking Deepdive CCCEU13CloudStack Networking Deepdive CCCEU13
CloudStack Networking Deepdive CCCEU13Chiradeep Vittal
 
StackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStackStackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStackChiradeep Vittal
 
SDN in Apache CloudStack (ApacheCon NA 2013)
SDN in Apache CloudStack (ApacheCon NA 2013)SDN in Apache CloudStack (ApacheCon NA 2013)
SDN in Apache CloudStack (ApacheCon NA 2013)Chiradeep Vittal
 
Networking in the Cloud Age (LISA 2012 Tutorial)
Networking in the Cloud Age (LISA 2012 Tutorial)Networking in the Cloud Age (LISA 2012 Tutorial)
Networking in the Cloud Age (LISA 2012 Tutorial)Chiradeep Vittal
 
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)Chiradeep Vittal
 
Evolution of CloudStack Architecture (Collab 2012)
Evolution of CloudStack Architecture (Collab 2012)Evolution of CloudStack Architecture (Collab 2012)
Evolution of CloudStack Architecture (Collab 2012)Chiradeep Vittal
 
Scalable networking in Apache CloudStack
Scalable networking in Apache CloudStackScalable networking in Apache CloudStack
Scalable networking in Apache CloudStackChiradeep Vittal
 

Mehr von Chiradeep Vittal (15)

Loadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro servicesLoadbalancers: The fabric for your micro services
Loadbalancers: The fabric for your micro services
 
Load Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native ArchitectureLoad Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native Architecture
 
Load Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native ArchitectureLoad Balancing for Containers and Cloud Native Architecture
Load Balancing for Containers and Cloud Native Architecture
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
 
Private cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austinPrivate cloud networking_cloudstack_days_austin
Private cloud networking_cloudstack_days_austin
 
StackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStackStackWatch: A prototype CloudWatch service for CloudStack
StackWatch: A prototype CloudWatch service for CloudStack
 
Network Functions Virtualization and CloudStack
Network Functions Virtualization and CloudStackNetwork Functions Virtualization and CloudStack
Network Functions Virtualization and CloudStack
 
CloudStack Networking Deepdive CCCEU13
CloudStack Networking Deepdive CCCEU13CloudStack Networking Deepdive CCCEU13
CloudStack Networking Deepdive CCCEU13
 
StackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStackStackMate - CloudFormation for CloudStack
StackMate - CloudFormation for CloudStack
 
SDN in Apache CloudStack (ApacheCon NA 2013)
SDN in Apache CloudStack (ApacheCon NA 2013)SDN in Apache CloudStack (ApacheCon NA 2013)
SDN in Apache CloudStack (ApacheCon NA 2013)
 
Networking in the Cloud Age (LISA 2012 Tutorial)
Networking in the Cloud Age (LISA 2012 Tutorial)Networking in the Cloud Age (LISA 2012 Tutorial)
Networking in the Cloud Age (LISA 2012 Tutorial)
 
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
The Future of Apache CloudStack (Not So Cloudy) (Collab 2012)
 
Evolution of CloudStack Architecture (Collab 2012)
Evolution of CloudStack Architecture (Collab 2012)Evolution of CloudStack Architecture (Collab 2012)
Evolution of CloudStack Architecture (Collab 2012)
 
Scalable networking in Apache CloudStack
Scalable networking in Apache CloudStackScalable networking in Apache CloudStack
Scalable networking in Apache CloudStack
 
CloudStack + SDN
CloudStack + SDNCloudStack + SDN
CloudStack + SDN
 

Kürzlich hochgeladen

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Kürzlich hochgeladen (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Scalable Object Storage with Apache CloudStack and Apache Hadoop

  • 1. Scalable Object Storage with Apache CloudStack and Apache Hadoop February 26 2013 Chiradeep Vittal @chiradeep
  • 2. Agenda •  What is CloudStack •  Object Storage for IAAS •  Current Architecture and Limitations •  Requirements for Object Storage •  Object Storage integrations in CloudStack •  HDFS for Object Storage •  Future directions
  • 3. Apache CloudStack •  History! •  Incubating in the Apache Software Foundation since April 2012! •  Open Source since May Build your cloud the way 2010! the world’s most successful clouds are built! •  In production since 2009! –  Turnkey platform for delivering IaaS clouds! –  Full featured GUI, end-user API and admin API!
  • 4. How did Amazon build its cloud? Amazon eCommerce Platform AWS API (EC2, S3, …) Amazon Orchestration Software Open Source Xen Hypervisor Commodity Commodity Networking Servers Storage
  • 5. How can YOU build a cloud? Amazon eCommerce Platform Optional Portal AWS API (EC2, S3, …) CloudStack or AWS API CloudStack Orchestration Software Amazon Orchestration Software Hypervisor (Xen/KVM/VMW/) Open Source Xen Hypervisor Networking Servers Storage
  • 6. Zone Architecture End  users   Admin/User  API   CloudStack   DC  Edge   MySQL   L3/L2  core     Access  Sw   Hypervisor  (Xen   Snapshot   /VMWare/KVM)   Image   Image   Secondary  Storage   VM   VM   Snapshot   Primary  Storage   Disk   Disk   NFS/ISCSI/FC   Pod   Pod   Pod   Pod   Pod  
  • 7. Cloud-Style Workloads •  Low cost –  Standardized, cookie cutter infrastructure –  Highly automated and efficient •  Application owns availability –  At scale everything breaks –  Focus on MTTR instead of MTBF
  • 8. Scale “At scale, everything breaks” -­‐  Urs  Hölzle,  Google! Server failure comes from:! ᵒ  70% - hard disk! 8%   ᵒ  6% - RAID controller! ᵒ  5% - memory! ᵒ  18% - other factors! Application can still fail for Annual  Failure  Rate  of  servers   other reasons:! ᵒ  Network failure! Kashi  Venkatesh  Vishwanath  and   Nachiappan  Nagappan,  Characterizing   ᵒ  Software bugs! Cloud  Compu3ng  Hardware  Reliability,   ᵒ  Human admin error! SoCC’10  
  • 9. At scale…everything breaks DC  Edge   L3/L2  core     Access  Sw   Hypervisor  (Xen   Snapshot   /VMWare/KVM)   Image   Image   Secondary  Storage   VM   VM   Snapshot   Primary  Storage   Disk   Disk   NFS/ISCSI/FC   Pod   Pod   Pod   Pod   Pod  
  • 10. Regions and zones Region “West” Zone “West-Beta” Zone “West-Alpha” Low Latency Backbone (e.g., SONET ring) Zone “West-Delta” Zone “West-Gamma”
  • 11. Region “West” Region “East” Geographic separation Internet Low Latency Region “South”
  • 12. Secondary Storage in CloudStack 4.0 •  NFS server default –  can be mounted by hypervisor –  Easy to obtain, set up and operate •  Problems with NFS: –  Scale: max limits of file systems •  Solution: CloudStack can manage multiple NFS stores (+ complexity) –  Performance •  N hypervisors : 1 storage CPU / 1 network link –  Wide area suitability for cross-region storage •  Chatty protocol –  Lack of replication
  • 13. Object Storage in a region Region “West” Zone “West-Beta” •  Replication Zone “West-Alpha” •  Audit •  Repair •  Maintenance Object  Storage  Technology   Zone “West-Delta” Zone “West-Gamma”
  • 14. Object Storage enables reliability Region “West”
  • 15. Object Storage also enables other applications Region “West” •  DropBox •  Static Content Object  Store   •  Archival API  Servers   Object  Storage  Technology  
  • 16. Object Storage characteristics •  Highly reliable and durable –  99.9 % availability for AWS S3 –  99.999999999 % durability •  Massive scale –  1.3 trillion objects stored across 7 AWS regions [Nov 2012 figures] –  Throughput: 830,000 requests per second •  Immutable objects –  Objects cannot be modified, only deleted •  Simple API –  PUT/POST objects, GET objects, DELETE objects –  No seek / no mutation / no POSIX API •  Flat namespace –  Everything stored in buckets. –  Bucket names are unique –  Buckets can only contain objects, not other buckets •  Cheap and getting cheaper
  • 17. CloudStack S3 API Server S3     API  Servers   MySQL Object  Storage  Technology  
  • 18. CloudStack S3 API Server •  Understands AWS S3 REST-style and SOAP API •  Pluggable backend –  Backend storage needs to map simple calls to their API •  E.g., createContainer, saveObject, loadObject! –  Default backend is a POSIX filesystem –  Backend with Caringo Object Store (commercial vendor) available –  HDFS backend also available •  MySQL storage –  Bucket -> object mapping –  ACLs, bucket policies
  • 19. Object Store Integration into CloudStack •  For images and snapshots •  Replacement for NFS secondary storage Or Augmentation for NFS secondary storage •  Integrations available with –  Riak CS –  Openstack Swift •  New in 4.2 (upcoming): –  Framework for integrating storage providers
  • 20. What do we want to build ? •  Open source, ASL licensed object storage •  Scales to at least 1 billion objects •  Reliability and durability on par with S3 •  S3 API (or similar, e.g., Google Storage) •  Tooling around maintenance and operation, specific to object storage
  • 21. The following slides are a design discussion
  • 22. Architecture of Scalable Object Storage Auth  Servers   Object   Lookup   Servers   API  Servers   Object  Servers   Replicators/Auditors  
  • 23. Why HDFS •  ASF Project (Apache Hadoop) •  Immutable objects, replication •  Reliability, scale and performance –  200 million objects in 1 cluster [Facebook] –  100 PB in 1 cluster [Facebook] •  Simple operation –  Just add data nodes
  • 24. HDFS-based Object Storage S3  Auth  Servers   Namenode   pair   S3  API  Servers   HDFS  API   Data  nodes  
  • 25. BUT •  Name Node Scalability –  150 bytes RAM / block –  GC issues •  Name Node SPOF –  Being addressed in the community✔ •  Cross-zone replication –  Rack-awareness placement ✔ –  What if the zones are spread a little further apart? •  Storage for object metadata –  ACLs, policies, timers
  • 26. Name Node scalability •  1 billion objects = 3 billion blocks (chunks) –  Average of 5 MB/object = 5 PB (actual), 15 PB (raw) –  450 GB of RAM per Name Node •  150b x 3 x 10^9 –  16 TB / node => 1000 Data nodes •  Requires Name Node federation ? •  Or an approach like HAR files
  • 27. Name Node Federation Extension:  Federated  NameNodes  are  HA  pairs  
  • 28. Federation issues •  HA for name nodes •  Namespace shards –  Map object -> name node •  Requires another scalable key-value store –  HBase? •  Rebalancing between name nodes
  • 29. Replication over lossy/slower links A.  Asynchronous replication –  Use distcp to replicate between clusters –  6 copies vs. 3 –  Master/Slave relationship •  Possibility of loss of data during failover •  Need coordination logic outside of HDFS B.  Synchronous replication –  API server writes to 2 clusters and acks only when both writes are successful –  Availability compromised when one zone is down
  • 30. CAP Theorem Consistency or Availability during partition Many nuances
  • 31. Storage for object metadata A.  Store it in HDFS along with the object –  Reads are expensive (e.g., to check ACL) –  Mutable data, needs layer over HDFS B.  Use another storage system (e.g. HBase) –  Name node federation also requires this. C.  Modify Name Node to store metadata –  High performance –  Not extensible
  • 32. Object store on HDFS Future •  Viable for small-sized deployments –  Up to 100-200 million objects –  Datacenters close together •  Larger deployments needs development –  No effort ongoing at this time
  • 33. Conclusion •  CloudStack needs object storage for “cloud-style” workloads •  Object Storage is not easy •  HDFS comes close but not close enough •  Join the community!