Survey of Distributed Storage System
frankey0207@gmail.com
Outline
- Background
- Storage Virtualization
- Object Storage
- Distributed File System
Background
- As more and more digital devices (e.g. PCs, laptops, iPads, and smartphones) connect to the Internet, massive amounts of new data are created on the web.
- There were 5 exabytes of data online in 2002, which had risen to 281 exabytes by 2009; online data is growing faster than Moore's Law.
- How, then, can such massive data be stored and managed effectively and efficiently?
- A natural approach: a distributed storage system!
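As a rough check of the growth claim, the two figures quoted above imply a compound annual growth rate that can be compared with Moore's Law. The sketch below assumes the common readings of Moore's Law as a doubling every 18 or 24 months; that comparison basis is an assumption, not something the slides state.

```python
# Figures quoted in the slide: 5 EB online in 2002, 281 EB in 2009.
start_eb, end_eb = 5.0, 281.0
years = 2009 - 2002

# Compound annual growth rate implied by the two data points.
data_growth = (end_eb / start_eb) ** (1 / years) - 1


def doubling_rate(period_years: float) -> float:
    """Annual growth rate of a quantity that doubles every `period_years`."""
    return 2 ** (1 / period_years) - 1


moore_24_months = doubling_rate(2.0)  # assumed reading: doubling every two years
moore_18_months = doubling_rate(1.5)  # assumed reading: doubling every 18 months

print(f"online data:       {data_growth:.0%} per year")
print(f"Moore (24 months): {moore_24_months:.0%} per year")
print(f"Moore (18 months): {moore_18_months:.0%} per year")
```

Under either reading, the implied data growth (roughly 78% per year) exceeds the Moore's Law rate.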
Traditional Storage Architecture
Direct Attached Storage (DAS)
- huge management burden
- limited number of connected hosts
- severely limited data sharing
Fabric Attached Storage (FAS)
- a central system serves data to connected hosts
- hosts and devices are interconnected through Ethernet or Fibre Channel
- implemented as NAS and SAN
FAS Implementations
Network Attached Storage (NAS)
- file-based storage architecture
- data sharing across platforms
- the file server can become the bottleneck
Storage Area Network (SAN)
- scalable performance, high capacity
- limited ability to share data
- unreliable security
Since the traditional storage architectures cannot satisfy the emerging requirements well, novel approaches are needed!
Storage Virtualization
Definitions of storage virtualization by SNIA:
- The act of abstracting, hiding, or isolating the internal functions of a storage (sub)system or service from applications, computer servers, or general network resources for the purpose of enabling application- and network-independent management of storage or data.
- The application of virtualization to storage services or devices for the purpose of aggregating, hiding complexity, or adding new capabilities to lower-level storage resources.
Simply put, storage virtualization aggregates storage components, such as disks, controllers, and storage networks, in a coordinated way to share them more efficiently among the applications it serves!
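One way to picture the aggregation described above is a virtual volume that concatenates several backing devices into a single block address space. This is a minimal sketch under that assumption; the class name `VirtualVolume` and the device names are hypothetical, and real products use richer mappings (striping, thin provisioning, and so on).

```python
from __future__ import annotations

from bisect import bisect_right
from itertools import accumulate


class VirtualVolume:
    """Presents several backing devices as one linear block address space."""

    def __init__(self, device_sizes: dict[str, int]):
        # device_sizes maps a (hypothetical) device name to its size in blocks.
        self.names = list(device_sizes)
        # ends[i] is the first virtual block that lies beyond device i.
        self.ends = list(accumulate(device_sizes.values()))

    @property
    def size(self) -> int:
        return self.ends[-1] if self.ends else 0

    def resolve(self, virtual_block: int) -> tuple[str, int]:
        """Translate a virtual block number into (device, physical block)."""
        if not 0 <= virtual_block < self.size:
            raise IndexError(f"block {virtual_block} is outside the volume")
        i = bisect_right(self.ends, virtual_block)
        start = self.ends[i - 1] if i else 0
        return self.names[i], virtual_block - start


volume = VirtualVolume({"disk_a": 100, "disk_b": 50, "lun_c": 200})
print(volume.size)          # 350
print(volume.resolve(0))    # ('disk_a', 0)
print(volume.resolve(120))  # ('disk_b', 20)
print(volume.resolve(349))  # ('lun_c', 199)
```

The application only ever sees block numbers 0 to 349; which physical device serves a block is hidden behind `resolve`.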
Characteristics of an ideal solution
A good storage virtualization solution should:
- Enhance the storage resources it is virtualizing through the aggregation of services, to increase the return on existing assets.
- Not add another level of complexity in configuration and management.
- Improve performance rather than act as a bottleneck, so that it remains scalable. Scalability is the capability of a system to maintain performance linearly as new resources (typically hardware) are added.
- Provide secure multi-tenancy, so that users and data can share virtual resources without exposure to other users' bad behavior or mistakes.
- Not be proprietary, but virtualize other vendors' storage in the same way as its own storage to make management seamless.
Types of Storage Virtualization
Modern storage virtualization technologies can be implemented in three layers of the infrastructure:
- In the server: some of the earliest forms of storage virtualization came from within the server's operating system.
- In the storage network: network-based storage virtualization embeds the intelligence for managing the storage resources in the network layer.
- In the storage controller: controller-based storage virtualization allows external storage to appear as if it were internal.
Server-based
Virtualization is implemented in the system software.
It does not require additional hardware in the storage infrastructure, and works with any devices that can be seen by the operating system.
Although it helps maximize the efficiency and resilience of storage resources, it’s optimized on a per-server basis only.
The task of mirroring, striping, and calculating parity requires additional processing, taking valuable CPU and memory resources away from the application.
Since every operating system implements file systems and volume management in different ways, organizations with multiple IT vendors need to maintain different skill sets and processes, with higher costs.
When it comes to the migration or replication of data (either locally or remotely), it becomes difficult to keep track of data protection across the entire environment.
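The processing cost of striping and parity mentioned above comes from touching every byte that is written. The following is a minimal sketch, assuming a single XOR parity chunk per stripe (in the style of RAID 4/5); the function names are illustrative and not taken from any volume manager.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: bytes, data_disks: int) -> list[bytes]:
    """Split `data` into `data_disks` equal chunks and append one parity chunk."""
    chunk = -(-len(data) // data_disks)             # ceiling division
    padded = data.ljust(chunk * data_disks, b"\0")  # zero-pad the last chunk
    chunks = [padded[i * chunk:(i + 1) * chunk] for i in range(data_disks)]
    return chunks + [xor_blocks(chunks)]


def rebuild(stripe: list[bytes | None]) -> bytes:
    """Recover the single missing chunk (marked None) from the survivors."""
    return xor_blocks([c for c in stripe if c is not None])


stripe = make_stripe(b"distributed storage", data_disks=4)
lost = stripe[2]
damaged = stripe[:2] + [None] + stripe[3:]
print(rebuild(damaged) == lost)  # the lost chunk is recovered from the others
```

Every write has to pass through `xor_blocks`, which is the CPU and memory work that a server-based implementation takes away from the application.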
Network-based
Perform replication between non-like devices.
Provide a single management interface.
Only the in-band approach can cache data for increased performance.
Both approaches (in-band and out-of-band) also suffer from a number of drawbacks:
The virtualization devices are typically servers running system software and requiring as much maintenance as a regular server.
I/O can suffer from latency because of the multiple steps required to complete a request, which hurts performance and scalability; throughput is also limited by the amount of memory and CPU available in the appliance nodes.
Decoupling the virtualization from the storage once it has been implemented is impossible because all the metadata resides in the appliance, thereby making it proprietary.
Solutions on the market exist only for Fibre Channel (FC) based SANs.
Controller-based
Complexity is reduced as it needs no additional hardware to extend the benefits of virtualization. In many cases the requirement for SAN hardware is greatly reduced.
Controller-based virtualization is typically cheaper than other approaches due to the ability to leverage existing SAN infrastructure and the opportunity to consolidate management, replication, and availability tools.
Heterogeneous data replication between non-like vendors or different storage classes reduces data protection costs.
Interoperability issues are reduced as the virtualized controller mimics a server connection to external storage.
Although a few downsides to controller-based virtualization exist, the advantages not only far outweigh them but also address most of the deficiencies found in server- and network-based approaches.
Motivation of Object Storage
Improved device and data sharing
- platform-dependent metadata moved to the device
Improved scalability and security
- devices directly handle client requests
- object security
Improved performance
- data types can be differentiated at the device
Improved storage management
- self-managed, policy-driven storage
- storage devices become more autonomous
Objects in Storage
- Root object: the OSD itself
- User object: created by SCSI commands from the application or client
- Collection object: a group of user objects, such as all .mp3 files
- Partition object: containers that share common security and space-management characteristics
[Figure: an OSD with one root object (one per device), partition objects P1 to P4, collection objects, and user objects (for user data, e.g. U1); a user object carries an object ID, user data, metadata, and attributes.]
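The four object types above can be sketched as plain data structures. This is only an illustration of the hierarchy; the class and field names (for example `quota_bytes` and `collect`) are hypothetical and do not correspond to the SCSI OSD command set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class UserObject:
    object_id: int
    data: bytes = b""
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Partition:
    partition_id: int
    quota_bytes: int  # shared space-management setting (hypothetical)
    user_objects: dict[int, UserObject] = field(default_factory=dict)
    collections: dict[str, set[int]] = field(default_factory=dict)

    def create(self, object_id: int, data: bytes, **attributes: str) -> UserObject:
        """Create a user object, enforcing the partition-wide quota."""
        used = sum(len(o.data) for o in self.user_objects.values())
        if used + len(data) > self.quota_bytes:
            raise OSError("partition quota exceeded")
        obj = UserObject(object_id, data, dict(attributes))
        self.user_objects[object_id] = obj
        return obj

    def collect(self, name: str, predicate: Callable[[UserObject], bool]) -> set[int]:
        """Define a collection object as the set of matching user object IDs."""
        ids = {oid for oid, o in self.user_objects.items() if predicate(o)}
        self.collections[name] = ids
        return ids


@dataclass
class RootObject:
    """One per device: represents the OSD itself."""
    partitions: dict[int, Partition] = field(default_factory=dict)


osd = RootObject()
p1 = osd.partitions.setdefault(1, Partition(partition_id=1, quota_bytes=1024))
p1.create(0x10, b"audio bytes", type="mp3")
p1.create(0x11, b"plain text", type="txt")
print(p1.collect("all_mp3", lambda o: o.attributes.get("type") == "mp3"))
```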
Object Storage Device
Two changes:
- Object-based storage offloads the storage component to the storage device
- The device interface changes from blocks to objects
[Figure: layer stacks of the two models, listed top to bottom]
Traditional model: Applications, System call interface, File system user component, File system storage component, Block interface, Block I/O manager, Storage device
OSD model: Applications, System call interface, File system user component, Object interface, File system storage component, Block I/O manager, Storage device
Object Storage Architecture
Summary of OSD Key Benefits
- Better data sharing: using objects means less metadata to keep coherent, which makes it possible to share the data across different platforms.
- Better security: unlike blocks, objects can protect themselves and authorize each I/O.
- More intelligence: object attributes help the storage devices learn about their users, applications, and workloads. This leads to a variety of improvements, such as better data management through caching. Active disks can be implemented on OSDs to run database filters. An intelligent OSD can also continuously reorganize the data, manage its own backups, and deal with failures.
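"Authorize each I/O" can be illustrated with a signed capability that the device verifies on every request. The sketch below assumes a secret shared between a security manager and the device, and an HMAC-SHA256 tag; the key, the message layout, and the function names are all hypothetical and are not the format of any OSD standard.

```python
import hashlib
import hmac

# Hypothetical secret shared by the security manager and the storage device.
SHARED_KEY = b"demo-key-known-to-manager-and-device"


def sign(object_id: int, rights: str, key: bytes = SHARED_KEY) -> bytes:
    """Compute the tag that binds a set of rights to one object."""
    message = f"{object_id}:{rights}".encode()
    return hmac.new(key, message, hashlib.sha256).digest()


def issue_capability(object_id: int, rights: str) -> dict:
    """Run by the (hypothetical) security manager; rights is e.g. 'r' or 'rw'."""
    return {"object_id": object_id, "rights": rights, "tag": sign(object_id, rights)}


def authorize(capability: dict, object_id: int, operation: str) -> bool:
    """Run by the storage device on every request, without asking the manager."""
    expected = sign(capability["object_id"], capability["rights"])
    return (
        hmac.compare_digest(expected, capability["tag"])
        and capability["object_id"] == object_id
        and operation in set(capability["rights"])
    )


cap = issue_capability(42, "r")
print(authorize(cap, 42, "r"))                     # read on object 42 is allowed
print(authorize(cap, 42, "w"))                     # write was never granted
print(authorize(dict(cap, rights="rw"), 42, "w"))  # tampered rights fail the tag check
```

Because the device can check the tag by itself, the client talks to storage directly and the manager stays out of the data path.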
Lustre
Lustre (Linux + Cluster)
- the first open-source system with object storage
- a massively parallel distributed file system
- consists of clients, an MDS, and OSTs
- used by fifteen of the top 30 supercomputers in the world
A single metadata server (MDS) with a single metadata target (MDT) per Lustre file system stores namespace metadata, such as filenames, directories, access permissions, and file layout.
Clients access and use the data; concurrent and coherent read and write access to the files is allowed.
One or more object storage servers (OSSes) store file data on one or more object storage targets (OSTs).
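The file layout kept by the metadata server tells a client which OST objects hold which part of a file. A minimal sketch of such a mapping follows, assuming a simple round-robin (RAID-0 style) layout; the parameter names `stripe_size` and `stripe_count` are used here for illustration only.

```python
def locate(offset: int, stripe_size: int, stripe_count: int) -> tuple:
    """Map a file offset to (stripe index, offset inside that stripe's object)."""
    stripe_number = offset // stripe_size        # which stripe-sized chunk of the file
    target = stripe_number % stripe_count        # round-robin over the objects
    full_rounds = stripe_number // stripe_count  # chunks already stored in that object
    return target, full_rounds * stripe_size + offset % stripe_size


MIB = 1024 * 1024
for offset in (0, MIB, 4 * MIB + 10, 5 * MIB + MIB // 2):
    print(offset, locate(offset, stripe_size=MIB, stripe_count=4))
```

With a layout like this, a client needs the metadata server only once per file; afterwards it can read and write the objects on several OSTs in parallel.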
Ceph
- Ceph is a distributed file system that provides excellent performance, reliability, and scalability based on object storage devices.
- The metadata server cluster manages the namespace and higher-level POSIX functions (such as open, close, and rename); the cluster map is maintained by monitors, and data placement is computed from it by the CRUSH algorithm.
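Data placement can be made computable, so that every client derives an object's location from the list of devices instead of looking it up in a table. The sketch below uses rendezvous (highest-random-weight) hashing as a deliberately simplified stand-in; it is not Ceph's own algorithm (CRUSH), and the device names are hypothetical.

```python
import hashlib


def score(object_name: str, device: str) -> int:
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_name}/{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, devices: list, replicas: int = 2) -> list:
    """Pick `replicas` devices for an object by rendezvous (highest-score) hashing."""
    ranked = sorted(devices, key=lambda d: score(object_name, d), reverse=True)
    return ranked[:replicas]


devices = [f"osd{i}" for i in range(6)]
chosen = place("file.0001", devices)
print(chosen)

# Removing a device that does not hold the object leaves its placement unchanged.
unused = next(d for d in devices if d not in chosen)
print(place("file.0001", [d for d in devices if d != unused]) == chosen)
```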
Panasas
Panasas (Panasas, Inc.)
- consists of OSDs, the Panasas File System, and an MDS
- claims to be the world's fastest HPC storage system

C store底层存储设计C store底层存储设计
C store底层存储设计
 
Storage Class Memory: Technology Overview & System Impacts
Storage Class Memory: Technology Overview & System ImpactsStorage Class Memory: Technology Overview & System Impacts
Storage Class Memory: Technology Overview & System Impacts
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redis
 
Memcached简介
Memcached简介Memcached简介
Memcached简介
 
Some key value stores using log-structure
Some key value stores using log-structureSome key value stores using log-structure
Some key value stores using log-structure
 
A novel method to extend flash memory lifetime in flash based dbms
A novel method to extend flash memory lifetime in flash based dbmsA novel method to extend flash memory lifetime in flash based dbms
A novel method to extend flash memory lifetime in flash based dbms
 
Sub join a query optimization algorithm for flash-based database
Sub join a query optimization algorithm for flash-based databaseSub join a query optimization algorithm for flash-based database
Sub join a query optimization algorithm for flash-based database
 
Hush…tell you something novel about flash memory
Hush…tell you something novel about flash memoryHush…tell you something novel about flash memory
Hush…tell you something novel about flash memory
 

Kürzlich hochgeladen

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 

Kürzlich hochgeladen (20)

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 

Survey of distributed storage system

  • 1. Survey of Distributed Storage System frankey0207@gmail.com
  • 2. Outline: Background; Storage Virtualization; Object Storage; Distributed File System
  • 4. Background. As more and more digital devices (e.g. PCs, laptops, iPads, and smartphones) connect to the Internet, massive amounts of new data are created on the web. There were 5 exabytes of data online in 2002, which had risen to 281 exabytes by 2009, and the growth rate of online data is outpacing Moore's Law. How, then, can this massive volume of data be stored and managed effectively and efficiently? A natural approach: a distributed storage system!
  • 5. Traditional Storage Architecture. Direct Attached Storage (DAS): huge management burden; limited number of connected hosts; severely limited data sharing. Fabric Attached Storage (FAS): a central system serves data to connected hosts; hosts and devices are interconnected through Ethernet or Fibre Channel; implemented as NAS and SAN.
  • 6. FAS Implementations. Network Attached Storage (NAS): file-based storage architecture; data sharing across platforms; the file server can become the bottleneck. Storage Area Network (SAN): scalable performance and high capacity; limited ability to share data; unreliable security. Since the traditional storage architectures cannot satisfy the emerging requirements well, novel approaches need to be proposed!
  • 8. Storage Virtualization. SNIA gives two definitions of storage virtualization: (1) the act of abstracting, hiding, or isolating the internal functions of a storage (sub)system or service from applications, computer servers, or general network resources for the purpose of enabling application- and network-independent management of storage or data; (2) the application of virtualization to storage services or devices for the purpose of aggregating, hiding complexity, or adding new capabilities to lower-level storage resources. Simply put, storage virtualization aggregates storage components, such as disks, controllers, and storage networks, in a coordinated way to share them more efficiently among the applications it serves.
  • 9. Characteristics of an ideal solution. A good storage virtualization solution should: (1) enhance the storage resources it is virtualizing through the aggregation of services, to increase the return on existing assets; (2) not add another level of complexity in configuration and management; (3) improve performance rather than act as a bottleneck, so that it is scalable (scalability is the capability of a system to maintain performance linearly as new resources, typically hardware, are added); (4) provide secure multi-tenancy so that users and data can share virtual resources without exposure to other users' bad behavior or mistakes; (5) not be proprietary, but virtualize other vendors' storage in the same way as its own storage, to make management seamless.
  • 10. Types of Storage Virtualization. Modern storage virtualization technologies can be implemented in three layers of the infrastructure. In the server: some of the earliest forms of storage virtualization came from within the server's operating system. In the storage network: network-based storage virtualization embeds the intelligence for managing storage resources in the network layer. In the storage controller: controller-based storage virtualization allows external storage to appear as if it were internal.
  • 12. Server-based virtualization does not require additional hardware in the storage infrastructure, and it works with any device that can be seen by the operating system.
  • 13. Although it helps maximize the efficiency and resilience of storage resources, it’s optimized on a per-server basis only.
  • 14. The tasks of mirroring, striping, and calculating parity require additional processing, taking valuable CPU and memory resources away from the application.
  • 15. Since every operating system implements file systems and volume management in different ways, organizations with multiple IT vendors need to maintain different skill sets and processes, with higher costs.
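The parity work that slide 14 attributes to the host CPU can be made concrete with a minimal sketch. It assumes simple byte-wise XOR parity over equally sized stripes (a RAID 5-style scheme, which the slides do not name); `xor_parity` and `rebuild_stripe` are hypothetical helper names, not part of any volume manager.

```python
from functools import reduce


def xor_parity(stripes: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized data stripes."""
    if not stripes:
        raise ValueError("need at least one stripe")
    size = len(stripes[0])
    if any(len(s) != size for s in stripes):
        raise ValueError("all stripes must have the same length")
    # zip(*stripes) yields one column of byte values per offset.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*stripes))


def rebuild_stripe(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover a single lost stripe from the survivors and the parity."""
    # XOR is its own inverse, so XOR-ing everything that is left
    # cancels the surviving stripes out of the parity.
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_parity(data)
    print(rebuild_stripe([data[0], data[2]], parity))
```

Every write touches each byte of every stripe, which is the per-server CPU and memory cost the slide refers to.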
  • 17. Network-based virtualization can perform replication between non-like devices.
  • 19. The virtualization devices are typically servers running system software and requiring as much maintenance as a regular server.
  • 20. I/O can suffer from latency because of the multiple steps required to complete a request, which hurts performance and scalability; throughput is also limited by the amount of memory and CPU available in the appliance nodes.
  • 21. Decoupling the virtualization from the storage once it has been implemented is impossible because all the meta-data resides in the appliance, thereby making it proprietary.
  • 23. With controller-based virtualization, complexity is reduced because no additional hardware is needed to extend the benefits of virtualization. In many cases the requirement for SAN hardware is greatly reduced.
  • 25. Heterogeneous data replication between non-like vendors or different storage classes reduces data protection costs.
  • 26. Interoperability issues are reduced because the virtualized controller mimics a server connection to external storage. Although controller-based virtualization has a few downsides, its advantages far outweigh them, and they also address most of the deficiencies found in the server-based and network-based approaches.
  • 28. Motivation of Object Storage. Improved device and data sharing: platform-dependent metadata is moved to the device. Improved scalability and security: devices directly handle client requests; security is enforced per object. Improved performance: data types can be differentiated at the device. Improved storage management: self-managed, policy-driven storage; storage devices become more autonomous.
  • 29. Objects in Storage. Root object: the OSD itself (one per device). User object: created by SCSI commands from the application or client; holds user data. Collection object: a group of user objects, such as all .mp3 files. Partition object: a container for objects that share common security and space management characteristics. [Figure: an OSD with one root object, partition objects P1 to P4, collection objects, and user objects; each user object carries an object ID, user data, metadata, and attributes.]
  • 30. Object Storage Device. Two changes: object-based storage offloads the file system's storage component to the storage device, and the device interface changes from blocks to objects. [Figure: traditional model versus OSD model. Traditional: applications, system call interface, file system user component, file system storage component, block interface, block I/O manager, storage device. OSD: applications, system call interface, file system user component, object interface, then the file system storage component and block I/O manager inside the storage device.]
  • 31. Object Storage Architecture: Summary of OSD Key Benefits. Better data sharing: using objects means less metadata to keep coherent, which makes it possible to share data across different platforms. Better security: unlike blocks, objects can protect themselves and authorize each I/O. More intelligence: object attributes help storage devices learn about their users, applications, and workloads. This leads to a variety of improvements, such as better data management through caching. Active disks can be implemented on OSDs to run database filters. An intelligent OSD can also continuously reorganize its data, manage its own backups, and deal with failures.
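The object hierarchy from slide 29 can be sketched as a small in-memory data model. This is only an illustration under stated assumptions: the class and field names (`RootObject`, `Partition`, `UserObject`, `quota_bytes`) are hypothetical and do not reproduce the SCSI OSD command set; the quota check stands in for the "common space management" that a partition provides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserObject:
    """User data plus the attributes the device can inspect."""
    object_id: int
    data: bytes = b""
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Partition:
    """Container whose objects share one space quota (assumed policy)."""
    partition_id: int
    quota_bytes: int
    objects: dict[int, UserObject] = field(default_factory=dict)
    collections: dict[str, set[int]] = field(default_factory=dict)

    def used_bytes(self) -> int:
        return sum(len(obj.data) for obj in self.objects.values())

    def create(self, object_id: int, data: bytes,
               collection: Optional[str] = None, **attributes: str) -> UserObject:
        if object_id in self.objects:
            raise KeyError(f"object {object_id:#x} already exists")
        if self.used_bytes() + len(data) > self.quota_bytes:
            raise ValueError("partition quota exceeded")
        obj = UserObject(object_id, data, dict(attributes))
        self.objects[object_id] = obj
        if collection is not None:
            # A collection is just a named group of user object IDs.
            self.collections.setdefault(collection, set()).add(object_id)
        return obj


@dataclass
class RootObject:
    """The OSD itself: one root object per device, holding partitions."""
    partitions: dict[int, Partition] = field(default_factory=dict)


if __name__ == "__main__":
    osd = RootObject()
    p1 = osd.partitions.setdefault(1, Partition(partition_id=1, quota_bytes=1024))
    p1.create(0x10, b"song bytes", collection="mp3", codec="mp3")
    print(p1.collections, p1.used_bytes())
```

Because attributes travel with each object, the device itself can apply per-object policy, which is the "more intelligence" benefit listed on slide 31.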
  • 32. Lustre (Linux + Cluster): the first open-source system with object storage; a massively parallel distributed file system; consists of clients, an MDS, and OSTs; used by fifteen of the top 30 supercomputers in the world. Components: a single metadata server (MDS) with a single metadata target (MDT) per Lustre file system, which stores namespace metadata such as filenames, directories, access permissions, and file layout; one or more object storage servers (OSSes) that store file data on one or more object storage targets (OSTs); and clients that access and use the data, with concurrent and coherent read and write access to files.
  • 33. Ceph is a distributed file system, built on object storage devices, that provides excellent performance, reliability, and scalability. The metadata cluster stores the cluster map and controls data placement, and it manages higher-level POSIX functions (such as open, close, and rename).
  • 34. Panasas (Panasas, Inc.): consists of OSDs, the Panasas File System, and an MDS; claims to be the world's fastest HPC storage system.
  • 36. Distributed File System. A distributed file system, or network file system, is any file system that allows access to files from multiple hosts sharing via a computer network (Wikipedia). History: 1st generation (1980s): NFS, AFS; 2nd generation (1990~1995): Tiger Shark, Slice File System; 3rd generation (1995~2000): Global File System, General Parallel File System, DiFFs, CXFS, HighRoad; 4th generation (2000~now): Lustre, GFS, GlusterFS, HDFS. [Figure labels: performance, scalability, reliability, availability, fault tolerance.]
  • 37. Google File System (GFS). GFS is a scalable distributed file system for large, distributed, data-intensive applications at Google. Design assumptions beyond the traditional choices: component failures are the norm; files are huge by traditional standards; files are mostly changed by appending new data rather than overwriting; the application and the file system API are co-designed. GFS interface: create, delete, open, close, read, write, plus snapshot and record append. The master maintains all file system metadata, such as the namespace, access control information, the mapping from files to chunks, and the locations of chunks. Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers. Files are divided into fixed-size (64 MB) chunks, and each chunk is identified by an immutable, globally unique 64-bit chunk handle. Chunkservers store chunks on local disks as Linux files. In addition, each chunk is replicated on multiple chunkservers; by default there are 3 replicas.
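Because chunks have a fixed size, a client can work out which chunk a byte offset falls in before it contacts the master. The sketch below shows only that arithmetic, using the 64 MB chunk size stated on the slide; `locate` and `chunks_for_range` are hypothetical names, not part of any GFS client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed chunk size from the slide


def locate(offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset within chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset: int, length: int) -> range:
    """Chunk indexes touched by an access of `length` bytes at `offset`."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if length <= 0:
        return range(0)
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return range(first, last + 1)


if __name__ == "__main__":
    # A 100 MB read starting at the 60 MB mark.
    print(list(chunks_for_range(60 * 1024 * 1024, 100 * 1024 * 1024)))
```

The client would then ask the master for the chunk handle and replica locations of each index, and read the data directly from a chunkserver.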
  • 38. Write Control and Data Flow. (1) The client asks the master which chunkserver holds the current lease for the chunk and where the other replicas are located. If no replica holds a lease, the master grants one to a replica it chooses. (2) The master replies with the identity of the primary and the locations of the other replicas. The client caches this information. (3) The client pushes the data to all replicas, in any order. (4) Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary. The primary assigns consecutive serial numbers to all the mutations it receives and applies each mutation to its own local state in serial number order. (5) The primary forwards the write request to the secondary replicas. (6) The secondaries all reply to the primary, indicating that they have completed the operation. (7) The primary replies to the client. Error cases: had the write failed at the primary, it would not have been assigned a serial number and forwarded; so a failed write has succeeded at the primary and at an arbitrary subset of the secondary replicas. The client code handles such errors by retrying the failed mutation.
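The separation of data flow from control flow in the write path can be illustrated with a toy, single-process simulation. It is a sketch under loud assumptions: there is no network, lease, or failure handling, each mutation is modeled as an append to the chunk, and `Replica`, `Primary`, `push`, and `write` are hypothetical names. It shows only that replicas may receive data in any order yet apply mutations in the serial order chosen by the primary.

```python
class Replica:
    """A chunk replica that buffers pushed data until told to apply it."""

    def __init__(self) -> None:
        self.buffered: dict[str, bytes] = {}  # data pushed but not yet applied
        self.chunk = bytearray()              # local chunk state
        self.applied: list[int] = []          # serial numbers, in apply order

    def push(self, data_id: str, data: bytes) -> None:
        # Data flow: the client pushes data to every replica, in any order.
        self.buffered[data_id] = data

    def apply(self, serial: int, data_id: str) -> None:
        # Control flow: apply the mutation under the primary's serial number.
        self.chunk += self.buffered.pop(data_id)
        self.applied.append(serial)


class Primary(Replica):
    """The lease holder, which picks one mutation order for all replicas."""

    def __init__(self, secondaries: list[Replica]) -> None:
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id: str) -> int:
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)            # apply locally first
        for secondary in self.secondaries:     # then forward the request
            secondary.apply(serial, data_id)
        return serial                          # reply to the client


if __name__ == "__main__":
    secondaries = [Replica(), Replica()]
    primary = Primary(secondaries)
    for replica in (secondaries[1], primary, secondaries[0]):
        replica.push("x", b"hello ")
    for replica in (primary, secondaries[0], secondaries[1]):
        replica.push("y", b"world")
    primary.write("x")
    primary.write("y")
    print(bytes(secondaries[0].chunk))
```

A fuller model would let a secondary's `apply` fail and have the client retry the mutation, as the error cases on the slide describe.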
  • 39. Hadoop Distributed File System (HDFS). HDFS is an open-source implementation of GFS. NameNode: a master server that manages the file system namespace and regulates access to files by clients. DataNodes: manage the storage attached to the nodes they run on. A file is split into one or more blocks, and these blocks are stored in a set of DataNodes.
  • 40. Taobao File System (TFS) is a distributed file system optimized for managing massive numbers of small files (1 MB), such as pictures and descriptions of commodities. Application/client: accesses the name server and data servers through TFSClient. Name server: stores metadata, monitors the data servers through heartbeat messages, controls I/O balance, and keeps data location information such as <block id, data server>. Data server: stores application data and handles load balancing and redundant backup.
  • 42. GlusterFS: effective distribution of data; file distribution is handled intelligently using an elastic hash algorithm.
  • 44. Sheepdog is a distributed storage system for QEMU/KVM: an Amazon EBS-like volume pool; highly scalable, available, and reliable; support for advanced volume management; not a general file system, since its API is designed specifically for QEMU. Zero configuration of cluster nodes: added nodes and removed nodes are detected automatically.
  • 45. Sheepdog. Volumes are divided into 4 MB objects; each object is identified by a globally unique 64-bit ID and replicated to multiple nodes. Consistent hashing is used to decide which nodes store each object. Each node is also placed on the ring. The addition or removal of nodes does not significantly change the mapping of objects.
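The consistent hashing described on slide 45 can be sketched as follows. This is a generic hash ring, not Sheepdog's actual code: the SHA-1 based hash, the 64 virtual nodes per physical node, and the `HashRing` class with its method names are all assumptions made for illustration.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a key to a 64-bit position on the ring (hash choice is assumed)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed on the ring at several virtual positions.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, object_id: int, copies: int = 1) -> list[str]:
        """Walk clockwise from the object's position, collecting distinct nodes."""
        if not self._ring:
            raise LookupError("ring is empty")
        start = bisect.bisect_left(self._ring, (_position(f"{object_id:016x}"),))
        found: list[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup(0x1234, copies=3))
```

When a node joins, the only objects whose first replica moves are those that now fall just before one of the new node's positions; every other object keeps its owner, which is the stability property the slide refers to.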
  • 46. References. [1] A. D. Luca and M. Bhide. Storage Virtualization for Dummies, Hitachi Data Systems Edition. Wiley Publishing, 2010. [2] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP 2003), Bolton Landing, NY, USA, October 19-22, 2003. [3] R. MacManus. The coming data explosion. Available: http://www.readwriteweb.com/archives/the_coming_data_explosion.php, 2010.
  • 47. References (cont.). [4] Intel white paper: Object-based storage, the next wave of storage technology and devices, 2003. [5] M. Mesnier, G. R. Ganger, and E. Riedel. Object-based storage. IEEE Communications Magazine, August 2003, 84-89. [6] Lustre. Available: http://wiki.lustre.org/index.php, 2010. [7] Panasas. Available: http://www.panasas.com/. [8] Hadoop. Available: http://hadoop.apache.org/. [9] TFS. Available: http://code.taobao.org/trac/tfs/wiki/intro. [10] GlusterFS. Available: http://www.gluster.org/.
  • 48. References (cont.). [11] Sheepdog. Available: http://www.osrg.net/sheepdog/. [12] Ceph. Available: http://ceph.newdream.net/. [13] S. A. Weil, S. A. Brandt, E. L. Miller, and D. D. E. Long. Ceph: A Scalable, High-Performance Distributed File System. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI '06), November 6-8, Seattle, WA, USA. [14] Gluster whitepaper: Gluster file system architecture.