CLOUD STORAGE
ABSTRACT
The explosive growth of data drives not only the need for information governance but also
the development of cloud technologies. Cloud computing and storage technologies, which
provide on-demand, shared, network access to data, are well positioned to address needs for
storage capacity, availability, performance, and services. Compared with traditional storage
systems, the benefits of the cloud are speed and agility, and cost competitiveness due to
economies of scale and pay-as-you-go pricing. According to an IDC report, by 2015 nearly
20% of all digital information would be "touched" by the cloud, and 10% would be maintained
in the cloud.
INTRODUCTION
Cloud storage is a model of data storage where the digital data is stored in logical pools, the
physical storage spans multiple servers (and often locations), and the physical environment is
typically owned and managed by a hosting company. These cloud storage providers are
responsible for keeping the data available and accessible, and the physical environment
protected and running. People and organizations buy or lease storage capacity from the
providers to store user, organization, or application data.
Cloud storage services may be accessed through a co-located cloud computer service, a web
service application programming interface (API) or by applications that utilize the API, such as
cloud desktop storage, a cloud storage gateway or Web-based content management systems.
Cloud storage typically refers to a hosted object storage service, but the term has broadened to
include other types of data storage that are now available as a service, like block storage.
Cloud storage is:
- Made up of many distributed resources that still act as one, often referred to as
  federated storage clouds
- Highly fault tolerant through redundancy and distribution of data
- Highly durable through the creation of versioned copies
- Typically eventually consistent with regard to data replicas
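As a toy sketch of the eventual-consistency property above (the store, replica layout, and sync step are entirely illustrative, not any real provider's API):

```python
# Toy model of eventually consistent replicas: a write is acknowledged
# after reaching one replica, and reaches the others only after a sync
# (anti-entropy) pass, so a read from another replica may briefly
# return stale data.

class ReplicatedStore:
    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        # The write is acknowledged after landing on a single replica.
        self.replicas[0][key] = value

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)

    def sync(self):
        # Anti-entropy pass: propagate replica 0's state everywhere.
        for r in self.replicas[1:]:
            r.update(self.replicas[0])

store = ReplicatedStore()
store.write("obj", "v1")
stale = store.read("obj", replica=2)   # None: replica 2 not yet synced
store.sync()
fresh = store.read("obj", replica=2)   # "v1" after convergence
```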
Advantages
Companies need pay only for the storage they actually use, typically averaged over a month.
This does not mean that cloud storage is less expensive, only that it incurs operating
expenses rather than capital expenses.
Organizations can choose between off-premises and on-premises cloud storage options, or a
mixture of the two, depending on decision criteria that are complementary to the initial
direct cost-savings potential; for instance, continuity of operations (COOP), disaster
recovery (DR), security (PII, HIPAA, SARBOX, IA/CND), and records retention laws,
regulations, and policies.
Storage availability and data protection are intrinsic to object storage architecture, so
depending on the application, the additional technology, effort, and cost to add availability
and protection can be eliminated.
Storage maintenance tasks, such as purchasing additional storage capacity, become the
responsibility of the service provider.
Cloud storage provides users with immediate access to a broad range of resources and
applications hosted in the infrastructure of another organization via a web service interface.
Cloud storage can be used for copying virtual machine images from the cloud to on-premises
locations or to import a virtual machine image from an on-premises location to the cloud image
library. In addition, cloud storage can be used to move virtual machine images between user
accounts or between data centers.
Cloud storage can serve as backup that is resilient to natural disasters, since providers
normally keep two or three backup copies on servers in different locations around the globe.
Findings
Performance for outsourced storage is likely to be lower than for local storage; how much
lower depends on how much a customer is willing to spend on WAN bandwidth.
Reliability and availability depend on wide-area network availability and on the level of
precautions taken by the service provider. Reliability rests both on the hardware and on the
redundancy algorithms (such as replication) the provider uses.
Conclusions
Security of stored data and data in transit may be a concern when storing sensitive data at a
cloud storage provider.
Users with specific record-keeping requirements, such as public agencies that must retain
electronic records according to statute, may encounter complications with using cloud
computing and storage. For instance, the U.S. Department of Defense designated the Defense
Information Systems Agency (DISA) to maintain a list of records management products that
meet all of the records retention, personally identifiable information (PII), and security
(Information Assurance, IA) requirements.
Cloud storage is a rich resource for both hackers and national security agencies.
Piracy and copyright infringement may be enabled by sites that permit file sharing. For example,
the Codex Cloud ebook storage site has faced litigation from the owners of the intellectual
property uploaded and shared there, as have Grooveshark and YouTube, the sites it has been
compared to.
The legal aspect, from a regulatory compliance standpoint, is of concern when storing files
domestically and especially internationally.
FILE SYSTEM
ABSTRACT
A file system for the cloud is a file system that allows many clients to access the same
data/files while providing the important operations (create, delete, modify, read, write).
Each file may be partitioned into several parts called chunks, each of which is stored on a
remote machine. Typically, data is stored in files in a hierarchical tree in which the nodes
represent directories. This facilitates the parallel execution of applications. There are
several ways to share files in a distributed architecture, and each solution must be suited
to a certain type of application, depending on how complex or simple the application is.
Meanwhile, the security of the system must be ensured: confidentiality, availability, and
integrity are the main keys to a secure system. Nowadays, users can share resources from any
computer or device, anywhere, through the internet, thanks to cloud computing, which is
typically characterized by scalable and elastic resources, such as physical servers,
applications, and services, that are virtualized and allocated dynamically.
Thus, synchronization is required to make sure that all devices are up to date. Distributed
file systems also enable many large, medium, and small enterprises to store and access their
remote data exactly as they do local data, facilitating the use of variable resources.
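The chunking scheme described in the abstract can be sketched in a few lines (a tiny chunk size is used here for illustration; GFS-style systems use sizes such as 64 MB):

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Partition a file's bytes into fixed-size chunks; the last chunk
    may be shorter. Each chunk would then be stored on a remote machine."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
# chunks == [b"abcd", b"efgh", b"ij"]
```

Reassembling the file is simply concatenating the chunks back in order.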
INTRODUCTION
Today, there are many implementations of distributed file systems. The first file servers were
developed by researchers in the 1970s, and Sun's Network File System (NFS) became available in
the early 1980s. Before that, people who wanted to share files used the sneakernet method,
physically carrying files on storage media between machines.
Once computer networks started to spread, it became obvious that the existing file systems
had many limitations and were unsuitable for multi-user environments. At first, many users
turned to FTP to share files; FTP first ran on the PDP-10 at the end of 1973. Even with FTP,
files had to be copied from the source computer onto a server and then from the server onto
the destination computer, and users had to know the physical addresses of all the computers
involved in the file sharing.
Supporting techniques
Cloud computing uses important techniques to boost the performance of the whole system.
Modern data centers provide a huge environment built on data center networking (DCN) and
consisting of large numbers of computers with differing storage capacities. The MapReduce
framework has demonstrated its performance with data-intensive computing applications in
parallel and distributed systems. Moreover, virtualization is employed to provide dynamic
resource allocation and to allow multiple operating systems to coexist on the same physical
server.
Applications
Because cloud computing provides large-scale computing, supplying the user with the needed
CPU and storage resources with complete transparency, it is well suited to applications that
require large-scale distributed processing. Such data-intensive computing needs a
high-performance file system that can share data between virtual machines (VMs).
File System Design
GFS and HDFS are built specifically for handling batch processing on very large data sets. For
that, the following hypotheses must be taken into account:
- High availability: the cluster can contain thousands of file servers, and some of them can
  be down at any time
- A server belongs to a rack, a room, a data center, a country, and a continent, in order to
  precisely identify its geographical location
- The size of a file can vary from many gigabytes to many terabytes, and the file system
  should be able to support a massive number of files
- Append operations must be supported, and file contents must remain visible even while a
  file is being written
- Communication among working machines is reliable: TCP/IP is used with a remote procedure
  call (RPC) communication abstraction, so a client knows almost immediately when there is a
  problem and can try to set up a new connection
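The rack/room/data-center hierarchy in the second hypothesis can be modeled as a simple location path; the class and function names below are hypothetical, chosen only for this sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerLocation:
    """Location label for a chunkserver, following the
    continent/country/data-center/room/rack hierarchy described above."""
    continent: str
    country: str
    datacenter: str
    room: str
    rack: str

    def path(self) -> tuple:
        return (self.continent, self.country, self.datacenter,
                self.room, self.rack)

def locality(a: ServerLocation, b: ServerLocation) -> int:
    """Number of leading hierarchy levels two servers share; a higher
    value means the servers are closer, so moving or replicating data
    between them is cheaper."""
    shared = 0
    for x, y in zip(a.path(), b.path()):
        if x != y:
            break
        shared += 1
    return shared

s1 = ServerLocation("EU", "FR", "dc1", "room2", "rack7")
s2 = ServerLocation("EU", "FR", "dc1", "room3", "rack1")
locality(s1, s2)  # 3: same continent, country, and data center
```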
Load balancing
Load balancing is essential for efficient operation in distributed environments. It means
distributing the work among different servers in order to get more work done in the same
amount of time and to serve clients faster. Consider a large-scale distributed file system
containing N chunkservers in a cloud (N can be 1,000, 10,000, or more), on which a certain
number of files are stored. Each file is split into several parts, or chunks, of fixed size
(for example, 64 megabytes), so the load on each chunkserver is proportional to the number of
chunks it hosts. In a load-balanced cloud, resources are used efficiently while the
performance of MapReduce-based applications is maximized.
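The chunk-count load measure above can be sketched as follows (server names and chunk counts are made up for illustration):

```python
def chunk_loads(chunks_per_server: dict) -> dict:
    """Load of each chunkserver relative to the ideal (uniform) share.
    A value of 1.0 is perfectly balanced; above 1.0 is overloaded."""
    total = sum(chunks_per_server.values())
    ideal = total / len(chunks_per_server)
    return {server: n / ideal for server, n in chunks_per_server.items()}

loads = chunk_loads({"cs1": 300, "cs2": 100, "cs3": 200})
# ideal is 200 chunks/server, so loads == {"cs1": 1.5, "cs2": 0.5, "cs3": 1.0}
```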
Load rebalancing
In a cloud computing environment, failure is the norm,[13][14] and chunkservers may be
upgraded, replaced, and added to the system. Files can also be dynamically created, deleted,
and appended. This leads to load imbalance in a distributed file system, meaning that the
file chunks are not distributed equitably among the nodes.
To manage large numbers of chunkservers working in collaboration and to solve the problem of
load balancing in distributed file systems, several approaches have been proposed, such as
reallocating file chunks so that they are distributed across the system as uniformly as
possible while reducing the movement cost as much as possible.
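A minimal sketch of such a reallocation, assuming load is measured purely as chunk count and every move costs the same (real systems also weigh network distance and replica placement constraints):

```python
def rebalance(chunks_per_server: dict) -> list:
    """Greedy rebalancing sketch: repeatedly move one chunk from the
    most loaded server to the least loaded one until the spread is at
    most one chunk. Returns the list of (src, dst) moves, so the
    movement cost is simply len(moves)."""
    counts = dict(chunks_per_server)
    moves = []
    while True:
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        if counts[src] - counts[dst] <= 1:
            return moves
        counts[src] -= 1
        counts[dst] += 1
        moves.append((src, dst))

moves = rebalance({"cs1": 6, "cs2": 2, "cs3": 4})
# two moves from cs1 to cs2 leave every server with 4 chunks
```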
Security
In cloud computing, the most important security concepts are confidentiality, availability and
integrity. In fact, confidentiality becomes indispensable in order to keep private data from being
disclosed and maintain privacy. In addition, integrity assures that data is not corrupted.
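One common way to check integrity, sketched here with a SHA-256 digest (the file names and contents are made up):

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 fingerprint recorded before upload; comparing it after
    download detects corruption or tampering in transit or at rest."""
    return hashlib.sha256(data).hexdigest()

original = b"payroll-2024.csv contents"
stored_digest = digest(original)          # kept by the client

retrieved = b"payroll-2024.csv contents"  # what the client reads back
digest(retrieved) == stored_digest        # True: integrity holds

tampered = b"payroll-2024.csv CONTENTS"
digest(tampered) == stored_digest         # False: corruption detected
```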
Confidentiality
Confidentiality means that data and computation tasks are confidential: neither the cloud
provider nor other clients can access the data. Much research has been done on
confidentiality because it is one of the crucial points that still presents challenges for
cloud computing. The lack of trust in cloud providers is a related issue.[41] The
infrastructure of the cloud must therefore give assurance that consumers' data will not be
accessed by any unauthorized person. The environment becomes insecure if the service provider:
- can locate consumers' data in the cloud
- has the privilege to access and retrieve consumers' data
- can understand the meaning of the data (the types of data, the functionalities and
  interfaces of the application, and the format of the data)
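One mitigation against the third point is client-side encryption, so the provider stores only ciphertext. The sketch below uses a deliberately toy XOR stream cipher built from SHA-256 (this is NOT a secure cipher; it only illustrates that stored data is unreadable without the client-held key):

```python
import hashlib
from itertools import count

def keystream(key: bytes):
    """Toy keystream derived from the key with SHA-256; illustration
    only, not suitable for real cryptographic use."""
    for block_index in count():
        block = hashlib.sha256(key + block_index.to_bytes(8, "big")).digest()
        yield from block

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key
    # recovers the original data.
    ks = keystream(key)
    return bytes(b ^ next(ks) for b in data)

key = b"client-held secret"
plaintext = b"confidential record"
ciphertext = xor_cipher(plaintext, key)   # what the provider stores
# The provider sees only ciphertext; only the client can decrypt:
# xor_cipher(ciphertext, key) == plaintext
```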
CONCLUSION
The cloud file system is an idea that takes into consideration current developments in the
cloud world related to data storage as a service.
It reflects how the infrastructure around cloud services and their management has changed.
This model, which will improve performance, will enable seamless transitions between
CDMI-compliant and non-compliant clouds for large enterprises with very little hassle.