Introduction to Object Storage and Hitachi Content Platform
The Fundamentals of Hitachi Content Platform
WHITEPAPER
By Hitachi Data Systems
May 2013
Contents
Executive Summary
Introduction
Main Concepts and Features
Object-Based Storage
Object Structure
Distributed Design
Open Architecture
Multitenancy
Object Versioning
Spin-Down and Storage Tiering
Search
Replication
Common Use Cases
Fixed-Content Archiving
Backup-Free Data Protection and Content Preservation
Cloud-Enabled Storage
E-Discovery, Compliance and Metadata Analysis
System Fundamentals
Hardware Overview
Software Overview
System Organization
Namespaces and Tenants
Main Concepts
User and Group Accounts
System and Tenant Management
Policies
Content Management Services
Conclusion
Executive Summary
One of IT's greatest challenges today is the explosive, uncontrolled growth of unstructured data. The continual growth of email and documents, video, Web pages, presentations, medical images, and so forth increases both complexity and
risk. This effect is seen particularly in distributed IT environments, such as cloud service providers and organizations
with branch or remote office sites. The vast quantity of data being created, difficulties in management and proper
handling of unstructured content, and complexity of supporting more users and applications pose challenges to IT
departments. Organizations often end up with sprawling storage silos for a multitude of applications and workloads,
with few resources available to manage, govern, protect, and search the data.
Hitachi Data Systems provides an alternative solution to these challenges through a single object storage platform
that can be divided into virtual storage systems, each configured for the desired level of service. The great scale and
rich features of this solution help IT organizations in both private enterprises and cloud service providers manage
distributed IT environments. It helps them to control the flood of storage requirements for unstructured content and
addresses a variety of workloads.
Introduction
Hitachi Content Platform (HCP) is a multipurpose distributed object-based storage system designed to support large-scale repositories of unstructured data. HCP enables IT organizations and cloud service providers to store, protect, preserve and retrieve unstructured content with a single storage platform. It supports multiple levels of service and readily evolves with technology and scale changes. With a vast array of data protection and content preservation technologies, the system can significantly reduce or even eliminate tape-based backups of the platform itself or backups of edge devices connected to the platform. HCP obviates the need for a siloed approach to storing unstructured content. Massive scale, multiple storage tiers, Hitachi reliability, nondisruptive hardware and software updates, multitenancy and configurable attributes for each tenant allow the platform to support a wide range of applications on a single physical HCP instance. By dividing the physical system into multiple, uniquely configured tenants, administrators create "virtual content platforms" that can be further subdivided into namespaces for organization of content, policies and access. With support for thousands of tenants, tens of thousands of namespaces, and petabytes of capacity in one system, HCP is truly cloud-ready.
Main Concepts and Features
Object-Based Storage
Hitachi Content Platform, as a general-purpose object store, allows unstructured data files to be stored as objects.
An object is essentially a container that includes both file data and associated metadata that describes the data.
The objects are stored in a repository. The metadata is used to define the structure and administration of the data.
HCP can also leverage object metadata to apply specific management functions, such as storage tiering, to each
object. The objects have intelligence that enables them to automatically take advantage of advanced storage and
data management features to ensure proper placement and distribution of content.
HCP architecture isolates stored data from the hardware layer. Internally, ingested files are represented as objects
that encapsulate both the data and metadata required to support applications. Externally, HCP presents each object
either as a set of files in a standard directory structure or as a uniform resource locator (URL) accessible by users and
applications via HTTP/HTTPS.
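As a rough illustration of URL-based access, the sketch below reads one object over HTTPS with Python. The hostname, tenant and namespace names, object path, the "/rest" prefix, and the use of HTTP basic authentication are all assumptions for illustration; the exact URL scheme and authentication headers are defined by the HCP REST API documentation for a given release.

# Minimal sketch of reading an object from an HCP namespace over HTTPS.
# Hostname, path, "/rest" prefix and basic-auth credentials are illustrative
# assumptions, not a confirmed HCP interface.
import requests

# Hypothetical namespace-specific hostname: <namespace>.<tenant>.<hcp-domain>
BASE_URL = "https://finance.europe.hcp.example.com/rest"

def get_object(path: str, user: str, password: str) -> bytes:
    """Retrieve the fixed-content data of one object by its URL path."""
    resp = requests.get(f"{BASE_URL}/{path}",
                        auth=(user, password),  # local HCP account (assumption)
                        verify=True)            # SSL is supported for HTTP data access
    resp.raise_for_status()
    return resp.content

if __name__ == "__main__":
    data = get_object("invoices/2013/inv-0001.pdf", "ingest_user", "secret")
    print(f"retrieved {len(data)} bytes")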
HCP stores objects in a repository. Data that is ingested and stored in the repository is permanently associated with
the information about that data, called metadata. Each data object encapsulates both object data and metadata, and
is treated within HCP as a single unit for all intents and purposes.
Object Structure
An HCP repository object is composed of file data and the associated metadata, which in turn consists of system
metadata and, optionally, custom metadata and an access control list (ACL). The structure of the object is shown in
Figure 1.
File data is an exact digital copy of the actual file contents at the time of its ingestion. If the object is under retention,
it cannot be deleted before the expiration of its retention period, except when using a special privileged operation.
If versioning is enabled, multiple versions of a file can be retained. If appendable objects are enabled, data can be
appended to an object (with the CIFS or NFS protocols) without modifying the original fixed-content data.
Figure 1. HCP Object
Metadata is system- or user-generated data that describes the fixed-content data of an object and defines the object's properties. System metadata, the system-managed properties of the object, includes HCP-specific metadata and POSIX metadata.
HCP-specific metadata includes the date and time the object was added to the namespace (ingest time), the
date and time the object was last changed (change time), the cryptographic hash value of the object along with the
namespace hash algorithm used to generate that value, and the protocol through which the object was ingested. It
also includes the object’s policy settings, such as data protection level (DPL), retention, shredding, indexing, and, for
HCP namespaces only, versioning.
POSIX metadata includes a user ID and group ID, a POSIX permissions value, and POSIX time attributes.
Custom metadata is optional, user-supplied descriptive information about a data object that is usually provided as
well-formed XML. It is typically intended for more detailed description of the object. This metadata can also be used
by future users and applications to understand and repurpose the object content. HCP supports multiple custom
metadata fields for each object.
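To make the idea concrete, the following hedged sketch attaches a small well-formed XML annotation to an existing object. The XML content, the "type=custom-metadata" query parameter, and the authentication shown are assumptions for illustration only; the authoritative request format is defined by the HCP REST API documentation.

# Hypothetical sketch of storing custom metadata alongside an object's data.
import requests

CUSTOM_METADATA_XML = """<?xml version="1.0" encoding="UTF-8"?>
<study>
  <patient_id>12345</patient_id>
  <modality>MRI</modality>
  <referring_physician>Dr. Example</referring_physician>
</study>
"""

def put_custom_metadata(object_url: str, xml: str, user: str, password: str) -> None:
    """Store one custom-metadata annotation for the object at object_url."""
    resp = requests.put(object_url,
                        params={"type": "custom-metadata"},  # assumed parameter name
                        data=xml.encode("utf-8"),
                        headers={"Content-Type": "application/xml"},
                        auth=(user, password))               # assumed auth scheme
    resp.raise_for_status()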
ACL is optional, user-provided metadata containing a set of permissions granted to users or user groups to perform
operations on an object. The ACLs are supported only in HCP namespaces.
The complete metadata structure, as supported in HCP namespaces, is shown in Figure 2. It includes all metafiles
supported by HCP for objects, which were generated for the sample data structure (assuming that custom metadata
and ACLs were added for each object).
Figure 2. HCP Namespace: Complete Metadata Structure
Distributed Design
An HCP system consists of both hardware and software and comprises many different components that are connected together to form a robust, scalable architecture for object-based storage. HCP runs on an array of servers, or nodes, that are networked together to form a single physical instance. Each node is a storage node that stores data objects. All runtime operations and physical storage, including data and metadata, are distributed among
the storage nodes. All objects in the repository are distributed across all available storage space but still presented as
files in a standard directory structure. Objects that are physically stored on any particular node are available from all
other nodes.
Open Architecture
HCP has an open architecture that insulates stored data from technology changes, as well as from changes in HCP
itself due to product enhancements. This open architecture ensures that users will have access to the data long after
it has been added to the repository. HCP acts as both a repository that can store customer data and an online
portal that enables access to that data by means of several industry-standard interfaces, as well as through an
integrated search facility, Hitachi Data Discovery Suite (HDDS). The HTTP or HTTPS, WebDAV, CIFS and NFS protocols support various operations. These operations include storing data, creating and viewing directories, viewing
and retrieving objects and their metadata, modifying object metadata, and deleting objects. Objects that were added
using any protocol are immediately accessible through any other supported protocol. These protocols can be used to
access the data with a Web browser, the HCP client tools, 3rd-party applications, Microsoft® Windows® Explorer, or native Windows or UNIX tools.
HCP allows special-purpose access to the repository through the SMTP protocol, which is used only for storing
email. For data backup and restore, HCP supports the NDMP protocol.
Multitenancy
Multitenancy support allows the repository in a single physical HCP instance to be partitioned into multiple namespaces. A namespace is a logical partition that contains a collection of objects particular to one or more applications.
Each namespace is a private object store that is represented by a separate directory structure and has a set of
independently configured attributes. Namespaces provide segregation of data, while tenants, or groupings of
namespaces, provide segregation of management. An HCP system can have up to 1,000 tenants. Each tenant and
its set of namespaces constitute a virtual HCP system that can be accessed and managed independently by users
and applications. This HCP feature is essential in enterprise, cloud and service-provider environments.
Data access to HCP namespaces can be either authenticated or nonauthenticated, depending on the type and
configuration of the access protocol. Authentication can be performed using HCP local accounts or Microsoft Active Directory® groups.
Object Versioning
HCP supports object versioning, which is the capability of a namespace to create, store and manage multiple
versions of objects in the HCP repository. This ability provides a history of how the data has changed over time.
Versioning facilitates storage and replication of evolving content, thereby creating new opportunities for HCP in
markets such as content depots and workflow applications.
Versioning is available in HCP namespaces and is configured at the namespace level. Versioning is only supported
with HTTP or REST. Other protocols cannot be enabled if versioning is enabled for the namespace. Versioning
applies only to objects, not to directories or symbolic links. A new version of an object is created when an object
with the same name and location as an existing object is added to the namespace. A special type of version, called
a deleted version, is created when an object is deleted. Updates to the object metadata affect only the current version of an object and do not create new versions.
Previous versions of objects that are older than a specified amount of time can be automatically deleted, or pruned.
It is not possible to delete specific historical versions of an object; however, a user or application with appropriate
permissions can purge the object to delete all its versions, including the current one.
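The pruning behavior described above can be modeled directly. The sketch below is a conceptual illustration only (not HCP code): previous versions older than a configured number of days are removed, while the current version is always kept.

# Illustrative model of version pruning for a single object.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class Version:
    ingest_time: datetime
    is_current: bool
    is_delete_marker: bool = False   # a "deleted version" is created when an object is deleted

def prune(versions: List[Version], keep_days: int, now: datetime) -> List[Version]:
    """Return the versions that survive pruning with the given retention window."""
    cutoff = now - timedelta(days=keep_days)
    survivors = []
    for v in versions:
        if v.is_current or v.ingest_time >= cutoff:
            survivors.append(v)      # the current version and recent history are kept
        # older previous versions (including delete markers) are pruned
    return survivors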
Spin-Down and Storage Tiering
HCP implements spin-down disk support as an early step toward the long-term goal of supporting information lifecycle management (ILM) and intelligent objects. In the near term, the goal of the HCP spin-down feature is to take advantage of the energy savings potential of spin-down technology.
HCP spin-down-capable storage is based on the power savings feature of Hitachi midrange storage systems and is a core element of the new storage tiering functionality, which is implemented as an HCP service. According to the storage tiering strategy specified by customers, the storage tiering service identifies objects that are eligible to reside on spin-down storage and moves them to and from spin-down storage as needed.
Tiering selected content to spin-down-enabled storage lowers overall cost by reducing energy consumption for large-scale unstructured data storage, such as deep archives and disaster recovery sites. Storage tiering can be used very effectively with customer-identified "dark data" (rarely accessed data) or data replicated for disaster recovery by moving that data to spin-down storage some time after ingestion or replication. Customer sites where data protection is critical can use storage tiering to move all redundant data copies to spin-down storage, which makes the cost of keeping data protection copies competitive with a tape solution.
Storage tiering also enables service providers to use a turnkey framework to offer differentiated object data management plans. This capability further enhances HCP as an attractive target for fixed content, especially for archive-oriented use cases where tape may be considered an alternative.
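The following is a conceptual sketch of the kind of rule such a strategy might encode: copies that have not been read for a configurable number of days ("dark data"), or copies held purely for disaster recovery, become eligible for spin-down storage. The field names and the rule are illustrative assumptions, not the HCP service plan syntax.

# Conceptual sketch of a spin-down eligibility rule.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ObjectRecord:
    name: str
    last_access: datetime
    is_replica_copy: bool

def eligible_for_spindown(obj: ObjectRecord, idle_days: int, now: datetime) -> bool:
    """Decide whether a copy should reside on spin-down rather than running storage."""
    idle_cutoff = now - timedelta(days=idle_days)
    return obj.is_replica_copy or obj.last_access < idle_cutoff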
Search
HCP provides the only integrated metadata query engine on the market. HCP includes comprehensive search capabilities that enable users to search for objects in namespaces, analyze namespace contents, and manipulate groups of objects. To satisfy government requirements, HCP supports e-discovery for audits and litigation.
The metadata query engine is always available in any HCP system, but the content search facility requires installation
of a separate HDS product, Hitachi Data Discovery Suite.
Replication
Replication, an add-on feature to HCP, is the process that keeps selected tenants and namespaces in 2 or more
HCP systems in sync with each other. The replication service copies one or more tenants or namespaces from one
HCP system to another, propagating object creations, object deletions, and metadata changes. HCP also replicates tenant and namespace configuration, tenant-level user accounts, compliance and tenant log messages, and retention classes.
The HCP system in which the objects are initially created is called the primary system. The 2nd system is called
the replica. Typically, the primary system and the replica are in separate geographic locations and connected by
a high-speed wide area network. HCP supports different replication topologies including many-to-one and chain
configurations.
Common Use Cases
Fixed-Content Archiving
Hitachi Content Platform is optimized for fixed-content data archiving. Fixed-content data is information that does
not change but must be kept available for future reference and be easily accessible when needed. A fixed-content
storage system is one in which the data cannot be modified. HCP uses “write-once, read-many” (WORM) storage
technology, and a variety of policies and services (such as retention, content verification and protection) to ensure the
integrity of data in the repository. The WORM storage means that data, once ingested into the repository, cannot be
updated or modified; that is, the data is guaranteed to remain unchanged from when it was originally stored. If the
versioning feature is enabled within the HCP system, different versions of the data can be stored and retrieved, in
which case each version is WORM.
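The WORM and retention semantics described above can be summarized in a small conceptual model: ingested data is never modified in place, deletion is refused while the retention period is in force, and only a privileged delete may override it. This sketch illustrates the semantics only; it is not an HCP interface.

# Conceptual model of WORM storage with retention.
from datetime import datetime
from typing import Dict, Optional

class WormStore:
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._retain_until: Dict[str, Optional[datetime]] = {}

    def ingest(self, name: str, content: bytes, retain_until: Optional[datetime]) -> None:
        if name in self._data:
            raise PermissionError("WORM: existing object data cannot be overwritten")
        self._data[name] = content
        self._retain_until[name] = retain_until

    def delete(self, name: str, now: datetime, privileged: bool = False) -> None:
        until = self._retain_until.get(name)
        if until is not None and now < until and not privileged:
            raise PermissionError("object is under retention")
        self._data.pop(name, None)
        self._retain_until.pop(name, None)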
Backup-Free Data Protection and Content Preservation
HCP is a true backup-free platform: it protects content without the need for backup. It uses sophisticated data preservation technologies, such as configurable data protection levels (DPL) and metadata protection levels (MDPL), object versioning and change tracking, multisite replication with seamless application failover, and many others. HCP includes a variety of
features designed to protect integrity, provide privacy, and ensure availability and security of stored data. Below is a
summary of the key HCP data protection features:
■■ Content immutability. This intrinsic feature of HCP WORM storage design protects the integrity of the data in the
repository.
■■ Content verification. The content verification service maintains data integrity and protects against data corruption or tampering by ensuring that the data of each object matches its cryptographic hash value. Any violation is repaired in a self-healing fashion.
■■ Scavenging. The scavenging service ensures that all objects in the repository have valid metadata. In case metadata is lost or corrupted, the service tries to reconstruct it by using the secondary, or scavenging, metadata (a copy of the metadata stored with each copy of the object data).
■■ Data encryption. HCP supports an encryption-at-rest capability that allows seamless encryption of data on the physical volumes of the repository. This ensures data privacy by preventing unauthorized access to the stored data. Encryption and decryption are handled automatically and transparently to users and applications.
■■ Versioning. HCP uses versioning to protect against accidental deletions and the storage of incorrect copies of objects.
■■ Data availability.
■■ RAID protection. RAID storage technology provides efficient protection from simple disk failures. SAN-based
HCP systems typically use RAID-6 erasure coding protection to guard against dual drive failures.
■■ Multipathing and zero-copy failover. These features provide data availability in SAN-attached array of independent nodes (SAIN) systems.
■■ Data protection level and protection service. In addition to using RAID and SAN technologies to provide data integrity and availability, HCP can use software mirroring to store the data for each object in multiple locations on different nodes. HCP groups storage nodes into protection sets with the same number of nodes in each set, and tries to store all the copies of the data for an object in a single protection set, where each copy is stored on a different node. The protection service enforces the required level of data redundancy by checking and repairing protection sets. In case of violation, it creates additional copies or deletes extra copies of an object to bring the object into compliance (see the sketch after this list). If replication is enabled, the protection service can use an object copy from a replica system if the copy on the primary system is unavailable.
■■ Metadata redundancy. In addition to the data redundancy as specified by DPL, HCP creates multiple copies
of the metadata for an object on different nodes. Metadata protection level or MDPL is a system-wide setting
that specifies the number of copies of the metadata that the HCP system must maintain (normally 2 copies,
MDPL2). Management of MDPL redundancy is independent of the management of data copies for DPL.
■■ Nondisruptive software and hardware upgrades. HCP employs a number of techniques that minimize or
eliminate any disruption of normal system functions during software and hardware upgrades. Nondisruptive
software upgrade (NDSU) is one of these techniques that includes greatly enhanced online upgrade support,
nondisruptive patch management, and online upgrade performance improvements. HCP supports media-free
and remote upgrades, HTTP or REST drain mode, and parallel operating system (OS) installation. It also supports automatic online upgrade commit, offline upgrade duration estimate, enhanced monitoring and email
alerts, and other features.
Storage nodes can be added to an HCP system without causing any downtime. HCP also supports nondisruptive storage upgrades that allow online storage addition to SAIN systems without any data outage.
■■ Seamless application failover. This feature is supported by HCP systems in a replicated topology. It includes a seamless failover routing capability that enables direct integration with customer-owned load balancers by allowing HTTP requests to be serviced by any HCP system in a replication topology. Seamless domain name system (DNS) failover is an HCP built-in multisite load-balancing and high-availability technology that is ideal for cost-efficient, best-effort customer environments.
■■ Replication. If enabled, this feature provides a multitude of mechanisms that ensure data availability. The replica system can be used both as a source for disaster recovery and to maintain data availability by providing good object copies for protection and content verification services. If an object cannot be read from the primary system, HCP can try to read the object from the replica if the read-from-replica feature is enabled.
■■ Data security.
■■ Authentication of management and data access.
■■ Granular, multilayer data access permission scheme.
■■ IP filtering technology and protocol-specific access or deny lists.
■■ Secure Sockets Layer (SSL) for HTTP or WebDAV data access, management access, and replication.
■■ Node login prevention.
■■ Shredding policy and service.
■■ Autonomic technology refresh. This feature, implemented as the HCP migration service, enables organizations to maintain continuously operating content stores that allow them to preserve their digital content assets for the long term.
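The sketch below illustrates the DPL check referenced in the data protection level item above: it counts the nodes within a protection set that hold a copy of an object and reports whether copies must be created or removed. The names and structures are illustrative, not HCP internals.

# Conceptual sketch of a DPL compliance check within one protection set.
from typing import Dict

def dpl_repair_action(copies_by_node: Dict[str, int], dpl: int) -> str:
    """Compare existing per-node copies of one object against the required DPL."""
    # Each copy should live on a different node, so count nodes holding a copy.
    nodes_with_copy = [n for n, c in copies_by_node.items() if c > 0]
    if len(nodes_with_copy) < dpl:
        return f"create {dpl - len(nodes_with_copy)} additional copy/copies on other nodes"
    if len(nodes_with_copy) > dpl:
        return f"delete {len(nodes_with_copy) - dpl} extra copy/copies"
    return "compliant"

# Example: a DPL2 object with a single surviving copy needs one more copy.
print(dpl_repair_action({"node01": 1, "node02": 0, "node03": 0}, dpl=2))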
Cloud-Enabled Storage
The powerful, industry-leading capabilities of HCP make it well suited to the cloud storage space. An HCP-based infrastructure solution is sufficiently flexible to accommodate any cloud deployment model (public, private or hybrid) and simplify the migration to the cloud for both service providers and subscribers. HCP provides edge-to-core, secure multitenancy and robust management capabilities, and a host of features to optimize cloud storage operations.
HCP, in its role as an online data repository, is truly ready for a cloud-enabled market. While numerous HCP features
were already discussed earlier in this paper, the purpose of this section is to summarize those that contribute the
most to HCP cloud capabilities. They include:
■■ Large-scale multitenancy.
■■ Management segregation. HCP supports up to 1,000 tenants, each of which can be uniquely configured for
use by a separate cloud service subscriber.
■■ Data segregation. HCP supports up to 10,000 namespaces, each of which can be uniquely configured for a
particular application or workload.
■■ Massive scale.
■■ Petabyte repository offers 40PB of storage, 80 nodes, 32 billion user objects, and 15 million files per directory,
all on a single physical system.
■■ Best node density in the object storage industry supports 500TB per node and 400+ million objects per node. With fewer nodes, HCP requires less power, less cooling, and less floor space.
■■ Unparalleled expandability that allows organizations to “start small” and expand according to demand.
■■ Nodes and/or storage can be added to expand an HCP system’s storage and throughput capacity, without
disruptions. Multiple storage systems are supported by a single HCP system.
■■ Easy tenant and storage provisioning.
■■ Geographical dispersal and global accessibility.
■■ WAN-friendly REST interface for namespace data access and replication.
■■ Replication of content across multiple sites using advanced, flexible replication topologies.
■■ WAN-optimized, high-throughput data transfer.
■■ High availability.
■■ Fully redundant hardware.
■■ Automatic routing of client requests around hardware failures.
■■ Load balancing across all available hardware.
■■ Multiple REST interfaces. These interfaces include the REST API for namespace data access, the management API, and the metadata query API. REST is the technology of choice for cloud enablers and consumers. Some of the reasons for its popularity include high efficiency and low overhead, caching at both the client and the server, and API uniformity. In addition, its stateless nature accommodates the latencies of Internet access and potentially complex firewall configurations.
■■ Secure, granular access to tenants, namespaces and objects, which is crucial in any cloud environment. This
access is facilitated by the HCP multilayer, flexible permission mechanism, including object-level ACLs.
■■ Usage metering. HCP has built-in chargeback capabilities, indispensable for cloud use, to facilitate provider and
subscriber transactions. HCP also provides tools for 3rd-party vendors and customers to write to the API for easy
integration with the HDS solution for billing and reporting.
■■ Low-touch system that is self-monitoring, self-managing and self-healing. HCP features advanced monitoring, audit and reporting capabilities. HCP services can automatically repair issues if they arise.
■■ Support for multiple levels of service. This support is provided through HCP policies, service plans and quotas that can be configured for each tenant to help enforce service-level agreements (SLAs). It allows the platform to accommodate a wide range of subscriber use cases and business models on a single physical system.
■■ Edge-to-core solution. HCP, working in tandem with Hitachi Data Ingestor (HDI), provides an integrated edge-to-core solution for cloud storage deployments. HCP serves as the "engine" at the core of the HDS cloud
architecture. HDI resides at the edge of the storage cloud (for instance, at a remote office or subscriber site) and
serves as the “on-ramp” for application data to enter the cloud infrastructure. HDI acts as a local storage cache
while migrating data into HCP and maintaining links to stored content for later retrieval. Users and applications
interact with HDI at the edge of the cloud but perceive bottomless, backup-free storage provided by HCP at the
core.
E-Discovery, Compliance and Metadata Analysis
Custom metadata enables organizations to build massive unstructured data stores. It provides the means for faster, more accurate access to content and gives storage managers the meaningful information they need to process data efficiently and intelligently and to apply the right object policies to meet business, compliance and protection requirements. Regulatory compliance features include namespace retention mode (compliance and enterprise), retention classes, retention hold, automated content disposition, and privileged delete and purge. HCP search capabilities include support for e-discovery for litigation or audit purposes. On HCP, open APIs allow direct 3rd-party integration.
HCP supports search facilities that provide an interactive interface. The search console offers a structured environment for creating and executing queries (sets of criteria that each object in the search results must satisfy). Users can apply various selection criteria, such as objects stored before a certain date or larger than a specified size. Queries return metadata for objects included in the search result. This metadata can be used to retrieve the object. From the search console, users can open objects, perform bulk operations on objects (hold, release, delete, purge, privileged delete and purge, change owner, set ACL), and export search results in standard file formats for use as input to other applications.
The metadata query engine (MQE) is integrated with HCP and is always available in the HCP system. It is also used by the metadata query API, a programmatic interface for querying namespaces. The MQE index resides on designated logical volumes on the HCP storage nodes, sharing or not sharing the space on these volumes with the object data, depending on the type of system and volume configuration.
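As a hedged illustration of the metadata query API mentioned above, the sketch below sends a simple query to a tenant endpoint. The endpoint path ("/query"), the request body shape, the query expression, and the authentication are assumptions for illustration only; the real query syntax and response structure are defined by the HCP metadata query API documentation for a given release.

# Hypothetical sketch of a metadata query API request.
import requests

TENANT_URL = "https://europe.hcp.example.com/query"   # hypothetical tenant hostname and path

def find_recent_objects(user: str, password: str) -> dict:
    """Ask the query engine for objects matching a simple criterion."""
    body = {
        "object": {
            "query": "ingestTimeMs > 1367366400000",  # assumed query expression syntax
            "count": 100,
        }
    }
    resp = requests.post(TENANT_URL, json=body, auth=(user, password))  # assumed auth scheme
    resp.raise_for_status()
    return resp.json()   # the structure of the result set is defined by the API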
Search is enabled at both the tenant and namespace levels. Indexing is enabled on a per-namespace basis. Settings
at the system and namespace levels determine whether custom metadata is indexed in addition to system metadata and ACLs. If indexing of custom metadata is disabled, the MQE indexes do not include custom metadata. If a
namespace is not indexed at all, searches do not return any results for objects in this namespace.
Each object has an index setting that determines what content the metadata query engine indexes. If indexing is enabled for a namespace, MQE always indexes system metadata and ACLs, regardless of the index setting for an object. If the object's index setting is true, MQE also indexes its custom metadata.
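These rules translate directly into a small decision function. The sketch below is a conceptual restatement of the indexing behavior described above, not HCP code.

# Which parts of an object the metadata query engine would index.
def indexed_content(namespace_indexed: bool, object_index_setting: bool) -> set:
    if not namespace_indexed:
        return set()                      # nothing from this namespace appears in results
    parts = {"system metadata", "ACL"}    # always indexed for indexed namespaces
    if object_index_setting:
        parts.add("custom metadata")      # opted in per object via the index setting
    return parts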
System Fundamentals
Hardware Overview
An individual physical HCP instance, or HCP system, is not a single device; it is a collection of devices that, combined with HCP software, can provide all the features of an online object repository while tolerating node, disk and
other component failures.
From a hardware perspective, each HCP system consists of the following categories of components:
■■ Nodes (servers).
■■ Internal or SAN-attached storage.
■■ Networking components (switches and cabling).
■■ Infrastructure components (racks and power distribution units).
Storage nodes are the vital part of HCP. They store and manage the objects that reside in the physical system storage. The nodes are conventional off-the-shelf servers. Each node can have multiple internal physical drives and/or
connect to external Fibre Channel storage (SAN). In addition to using RAID and SAN technologies and a host of other
features to protect the data, HCP uses software mirroring to store the data and metadata for each object in multiple
locations on different nodes. For data, this feature is managed by the namespace DPL setting, which specifies the
number of copies of each object HCP must maintain in the repository to ensure the required level of data protection.
For metadata, this feature is managed by the MDPL, which is a system-wide setting.
A storage node runs the complete HCP software and serves as both a repository for objects and a gateway to the
data and metadata they contain. All runtime operations are distributed among the storage nodes, ensuring reliability
and performance.
HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes
(SAIN). RAIN systems use the internal storage in each node. SAIN systems use the external SAN storage. HCP is
offered as 2 products: HCP 300 (based on RAIN configuration) and HCP 500 (based on SAIN configuration).
HCP RAIN (HCP 300)
The nodes in an HCP 300 system are Hitachi Compute Rack 220 (CR 220) servers. RAIN nodes contain internal
storage: RAID controller and disks. All nodes use hardware RAID-5 data protection. In an HCP RAIN system, the
physical disks in each node form a single RAID group, normally RAID-5 (5D+1P) (see Figure 3). This helps ensure the
integrity of the data stored on each node.
WHITE PAPER 13
An HCP 300 (RAIN) system must have a minimum of 4 storage nodes. Additional storage nodes are added in
4-node increments. An HCP 300 system can have a maximum of 20 nodes.
HCP 300 systems are normally configured with a DPL setting of 2 (DPL2), which, coupled with hardware RAID-5,
yields an effective RAID-5+1 total protection level.
Figure 3. HCP 300 Hardware Architecture
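As a rough capacity illustration of the protection layers just described, hardware RAID-5 (5D+1P) keeps 5/6 of raw capacity usable, and DPL2 then stores two copies of every object. The raw capacity figure below is hypothetical, and the calculation ignores metadata copies (MDPL), spares and file system overhead, so it is only an approximation.

# Rough arithmetic for RAID-5 (5D+1P) plus DPL2 overhead on an HCP 300 system.
raw_tb = 100.0                      # hypothetical raw disk capacity across all nodes
raid5_usable = raw_tb * 5 / 6       # 5 data disks per 6-disk RAID group
effective_user_data = raid5_usable / 2   # DPL2: two copies of each object

print(f"usable after RAID-5 : {raid5_usable:.1f} TB")
print(f"user data at DPL2   : {effective_user_data:.1f} TB "
      f"({effective_user_data / raw_tb:.0%} of raw)")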
HCP SAIN (HCP 500/500XL)
The nodes in an HCP 500 system are either Hitachi Compute Rack 220 (CR 220) servers or blades in Hitachi
Compute Blade 320 (CB 320) servers. The HCP 500 nodes contain Fibre Channel host bus adapters (HBAs) and
use external Fibre Channel SAN storage; they are diskless servers that boot from the SAN-attached storage.
The nodes in a SAIN system can have internal storage in addition to being connected to external storage. These
nodes are called HCP 500XL nodes. They are an alternative to the standard HCP 500 nodes and have the same
hardware configuration, except the addition of the RAID controller and internal hard disk drives. In HCP 500XL nodes,
the system metadata database resides on the local disks, which leads to more efficient and faster database operations. As a result, the system can better support larger capacity and higher object counts per node and address higher performance requirements.
A typical 500XL node internal storage configuration includes six 500GB 7200RPM SATA II drives in a single RAID-5
(5D+1P) RAID group, with 2 LUNs: 31GB (operating system) and 2.24TB (database). The HCP 500XL nodes are usually considered when the system configuration exceeds 4 standard nodes.
HCP 500 and 500XL (SAIN) systems are supported with a minimum of 4 storage nodes. With a SAIN system, additional storage nodes are added in pairs, so the system always has an even number of storage nodes. A SAIN system
can have a maximum of 80 nodes.
Both RAIN and SAIN systems can have a DPL as high as 4, which affords maximum data availability but greatly
sacrifices storage utilization. Typically, the external SAN-attached storage uses RAID-6. The best protection and highest availability of an HCP 500 system are achieved by giving each node its own RAID group or Hitachi Dynamic Provisioning (HDP) pool containing 1 RAID group.
Software Overview
HCP system software consists of an operating system (the appliance operating system) and core software. The core
software includes components that:
■■ Enable access to the object repository through the industry-standard HTTP or HTTPS, WebDAV, CIFS, NFS,
SMTP and NDMP protocols.
■■ Ingest fixed-content data, convert it into HCP objects, and manage the objects' data and metadata over time.
■■ Maintain the integrity, stability, availability and security of stored data by enforcing repository policies and executing
system services.
■■ Enable configuration, monitoring and management of the HCP system through a human-readable interface.
■■ Support searching the repository through an interactive Web interface (the search console) and a programmatic
interface (the metadata query API).
System Organization
HCP is a fully symmetric, distributed application that stores and manages objects (see Figure 4). An HCP object
encapsulates the raw fixed-content data that is written by a client application, and its associated system and
custom metadata. Each node in an HCP system is a Linux-based server that runs a complete HCP instance. The
HCP system can withstand multiple simultaneous node failures, and acts automatically to ensure that all object and
namespace policies are valid.
Figure 4. The High-Level Structure of an HCP System
External system communication is managed by the DNS manager, a distributed network component that balances
client requests across all nodes to ensure maximum system throughput and availability. The DNS manager works in
conjunction with a corporate DNS server to allow clients to access the system as a single entity, even though the
system is made up of multiple independent nodes.
The HCP system is configured as a subdomain of an existing corporate domain. Clients access the system using
predefined protocol-specific or namespace-specific names.
While not required, using DNS is important in ensuring balanced and problem-free client access to an HCP system,
especially for the HTTP or REST clients.
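The sketch below shows how a client might address the system through DNS, assuming a <namespace>.<tenant>.<hcp-system-domain> naming convention; the exact protocol-specific and namespace-specific names are defined when the HCP subdomain is configured, so the names here are placeholders.

# Sketch of DNS-based, namespace-specific access to an HCP system.
import socket

HCP_SYSTEM_DOMAIN = "hcp.example.com"    # subdomain delegated by the corporate DNS (placeholder)

def namespace_host(namespace: str, tenant: str) -> str:
    # Assumed naming convention: <namespace>.<tenant>.<hcp-system-domain>
    return f"{namespace}.{tenant}.{HCP_SYSTEM_DOMAIN}"

host = namespace_host("finance", "europe")
try:
    # The corporate DNS server, working with the HCP DNS manager, resolves this
    # single name so requests are balanced across the nodes of the system.
    print(host, "->", socket.gethostbyname(host))
except socket.gaierror:
    print(host, "does not resolve in this environment (placeholder domain)")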
Namespaces and Tenants
Main Concepts
An HCP repository is partitioned into namespaces. A namespace is a logical repository as viewed by an application. Each namespace consists of a distinct logical grouping of objects with its own directory structure, such that the
objects in one namespace are not visible in any other namespace. Access to one namespace does not grant a user
access to any other namespace. To the user of a namespace, the namespace is the repository. Namespaces are not
associated with any preallocated storage; they share the same underlying physical storage. Namespaces provide a
mechanism for separating the data stored for different applications, business units or customers. For example, there
may be one namespace for accounts receivable and another for accounts payable. While a single namespace can
host one or more applications, it typically hosts only one application. Namespaces also enable operations to work
against selected subsets of repository objects. For example, a search could target the accounts receivable and
accounts payable namespaces but not the employees namespace.
Figure 5 shows the logical structure of an HCP system with respect to its multitenancy features.
Figure 5. HCP System Logical Layout: Namespaces and Tenants
Namespaces are owned and managed by tenants. Tenants are administrative entities that provide segregation of management, while namespaces offer segregation of data. A tenant typically represents an actual organization,
such as a company or a department within a company that uses a portion of a repository. A tenant can also correspond to an individual person. Namespace administration is done at the owning tenant level.
Clients can access HCP namespaces through HTTP or HTTPS, WebDAV, CIFS, NFS and SMTP protocols. These
protocols can support authenticated and/or anonymous types of access (types of access and their combinations are
discussed in more detail later in this document). HCP namespaces are owned by HCP tenants. An HCP system can
have multiple HCP tenants, each of which can own multiple namespaces. The number of namespaces each HCP
tenant can own can be limited by an administrator.
User and Group Accounts
User and group accounts control access to various HCP interfaces and give users permission to perform administrative tasks and access namespace content.
An HCP user account is defined in HCP; it has a set of credentials, a username and password, which are stored locally in the system. The HCP system uses these credentials to authenticate a user, performing local authentication.
An HCP group account is a representation of an Active Directory (AD) group. To create group accounts, HCP must be configured to support Active Directory. A group account enables AD users in the corresponding AD group to access one or more of the HCP interfaces.
Like HCP user accounts, HCP group accounts are defined separately at the system and tenant levels. Different tenants have different user and group accounts. These accounts cannot be shared across tenants. Group membership
is different at the system and tenant levels.
HCP administrative roles can be associated with both system-level and tenant-level user and group accounts. Data access permissions can be associated with only tenant-level user and group accounts. Consequently, system-level local and AD users can only be administrative users, while tenant-level local and AD users can both be administrative users and have data access permissions. Tenant-level users can have only administrative roles without namespace data permissions, only namespace data permissions without administrative roles, or any combination of administrative roles and namespace data permissions.
System and Tenant Management
The implementation of segregation of management in the HCP system is illustrated in Figure 6.
An HCP system has both system-level and tenant-level administrators:
■■ System-level administrative accounts are used for configuring system-wide features, monitoring system hardware and software and overall repository usage, and managing system-level users. The system administrator user interface, the system management console, provides the functionality needed by the maintainer of the physical HCP system. For example, it allows the maintainer to shut down the system, see information about nodes, manage policies and services, and create HCP tenants. System administrators have a view of the system as a whole, including all of the HCP software and hardware that make up the system, and can perform all administration actions that have system scope.
■■ Tenant-level administrative accounts are used for creating HCP namespaces. They can configure individual tenants and namespaces, monitor namespace usage at the tenant and namespace level, manage tenant-level users, and control access to namespaces. The required functionality is provided by the tenant administrator user interface, the tenant management console. This interface is intended for use by the maintainer of the virtual HCP system (an individual tenant with the set of namespaces it owns). The tenant-level administration feature facilitates segregation of management, which is essential in cloud environments.
An HCP tenant can optionally grant system-level users administrative access to itself. In this case, system-level
users with the monitor, administrator, security or compliance role can log into the tenant management console or
use the HCP management API for that tenant. System-level users with the monitor or administrator role can also
access the tenant management console directly from the system management console. This effectively enables a
system administrator to function as a tenant administrator, as shown in Figure 6. System-level users can perform all
the activities allowed by the tenant-level roles that correspond to their system-level roles. An AD user may belong
to AD groups for which the corresponding HCP group accounts exist at both the system and tenant levels. This
user has the roles associated with both the applicable system-level group accounts and the applicable tenant-level
group accounts.
Policies
Objects in a namespace have a variety of properties, such as the retention setting or index setting. These properties are defined for each object by the object system metadata. Objects can also be affected by some namespace
properties, such as the default metadata settings that are inherited by new objects stored in the namespace, or the
versioning setting. Both the namespace-level settings and the properties that are part of the object metadata serve as
parameters for the HCP system’s transactions and services, and determine the object’s behavior during its life cycle
within the repository. These settings are called policies.
An HCP policy is one or more settings that influence how transactions and internal processes (services) affect
objects in a namespace. Policies ensure that objects behave in expected ways.
The HCP policies are described in Table 1.
Table 1. HITACHI CONTENT PLATFORM Policies
Policy Name | Policy Description and Components | Transactions and Services Influenced
DPL | System DPL setting, namespace DPL setting. | Object creation. Protection service.
Retention | Default retention setting, object retention setting, hold setting, system metadata and custom metadata options for objects under retention. | Object creation, object deletion, system and custom metadata handling. Disposition, garbage collection services.
Shredding | Default shred setting, object shred setting. | Object deletion. Shredding service.
Indexing | Default index setting, object index setting. | MQE.
Versioning | Versioning setting, pruning setting. | Object creation and deletion. Garbage collection service.
Custom Metadata Validation | XML syntax validation. | Add/replace custom metadata operations.
Each policy may consist of one or more settings that may have different scopes of application and methods of configuration. Policy settings are defined at the object and the namespace level. Note that the same policy setting may
be set at different levels depending on the namespace. The default retention, shred and index settings are set at the
namespace level in HCP namespaces.
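The relationship between an object-level setting and its namespace default can be summarized in one small function: the object metadata wins when present, otherwise the namespace default applies. This is a conceptual sketch of the behavior described above, not HCP's internal model.

# Effective policy setting: object-level value overrides the namespace default.
from typing import Optional

def effective_setting(object_setting: Optional[str], namespace_default: str) -> str:
    return object_setting if object_setting is not None else namespace_default

# Example: a namespace default retention offset, with one object overriding it.
print(effective_setting(None, "offset: +2 years"))                     # namespace default applies
print(effective_setting("fixed date: 2023-12-31", "offset: +2 years")) # object setting wins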
Table 2 lists all policy settings sorted according to their scope and method of configuration.
Table 2. HITACHI CONTENT PLATFORM Policy Settings: Scope and Configuration
Policy | Policy Setting | Scope/Level (HCP Namespaces) | Configured Via
Data Protection Level | System DPL: 1-4 | System | System UI
Data Protection Level | Namespace DPL: 1-4, dynamic | Namespace | Tenant UI, MAPI
Retention | Default retention setting: fixed date, offset, special value, retention class | Namespace | Tenant UI, MAPI
Retention | Retention setting: fixed date, offset, special value, retention class | Object | REST API, retention.txt
Retention | Hold setting: true or false | Object | REST API
Retention | Ownership and POSIX permission changes under retention: true or false | Namespace | Tenant UI, MAPI
Retention | Custom metadata operations allowed under retention | Namespace | Tenant UI, MAPI
Indexing | Index setting: true or false (1/0) | Object | REST API, index.txt
Indexing | Default index setting: true or false | Namespace | Tenant UI, MAPI
Shredding | Shred setting: true or false (1/0) | Object | REST API, shred.txt
Shredding | Default shred setting: true or false | Namespace | Tenant UI, MAPI
Custom Metadata Validation | XML validation: true or false | Namespace | Tenant UI, MAPI
Versioning | Versioning setting: true or false | Namespace | Tenant UI, MAPI
Versioning | Pruning setting: true/false and number of days for primary or replica | Namespace | Tenant UI, MAPI
Content Management Services
A Hitachi Content Platform service is a background process that performs a specific function that is targeted at
preserving and improving the overall health of the HCP system. In particular, services are responsible for optimizing
the use of system resources and maintaining the integrity and availability of the data stored in the HCP repository.
HCP implements 12 services: protection, content verification, scavenging, garbage collection, duplicate elimination,
shredding, disposition, compression, capacity balancing, storage tiering, migration and replication.
HCP services are briefly described in Table 3.
Table 3. HITACHI CONTENT PLATFORM Services
Service Description
Protection Enforces DPL policy compliance by ensuring that the proper number of copies of each object exists in the system, and that damaged or lost objects can be recovered. Any policy violation invokes a repair process. Offers both scheduled and event-driven service. Events trigger a full service run, even if the service is disabled, after a configurable amount of time: 90 minutes after node shutdown; 1 minute after logical volume failure; 10 minutes after node removal.
Content Verification Guarantees data integrity of repository objects by ensuring that the content of a file matches its digital signature.
Repairs the object if the hash does not match. Detects and repairs discrepancies between primary and secondary
metadata. SHA-256 hash algorithm is used by default. Checksums are computed on external and internal files.
Computationally intensive and time-consuming service. Runs according to the active service schedule.
Scavenging Ensures that all objects in the repository have valid metadata, and reconstructs metadata in case the metadata is
lost or corrupted, but data files exist. The service verifies that both the primary metadata for each data object and
the copies of the metadata stored with the object data (secondary metadata) are complete, valid and in sync with
each other. Computationally intensive and time-consuming service. Scheduled service.
Garbage Collection Reclaims storage space by purging hidden data and metadata for objects marked for deletion, or left behind by
incomplete transactions. It also deletes old versions of objects that are eligible for pruning. When applicable, the
deletion triggers the shredding service. Scheduled service, not event driven.
Duplicate Elimination Identifies and eliminates redundant objects in the repository, and merges duplicate data to free space. The hash
signature of external file representations is used to select objects as input to the service. These objects are then
checked in a byte for byte manner to ensure that the data contents are indeed identical. Scheduled service.
Shredding Overwrites storage locations where copies of the deleted object were stored in such a way that none of its data
or metadata can be reconstructed, for security reasons. Also called secure deletion. The default HCP shredding
algorithm uses 3 passes to overwrite an object and is DoD 5220.22-M standard compliant. The algorithm is
selected at install time. Event-driven only service, not scheduled. It is triggered by the deletion of an object marked
for shredding.
Disposition Automatic cleanup of expired objects. All HCP namespaces can be configured to automatically delete objects
after their retention period expires. Can be enabled or disabled both at the system and namespace level; enabling
disposition for a namespace has no effect if the service is disabled at the system level. Disposition service deletes
only current versions of versioned objects. Scheduled service.
Compression Compresses object data to make more efficient use of system storage space. The space reclaimed by
compression can be used for additional storage. A number of configurable parameters are provided via System
Management Console. Scheduled service.
Capacity Balancing Attempts to keep the usable storage capacity balanced (roughly equivalent) across all storage nodes in the
system. If storage utilization for the nodes differs by a wide margin, the service moves objects around to bring
the nodes closer to a balanced state. Runs only when started manually. Additions and deletions of objects do
not trigger the service. Typically, an authorized HCP service provider starts this service after adding new storage
nodes to the system. In addition, while not part of the service, during normal system operation new objects tend
to naturally spread among all storage nodes in the system in fairly even proportion. This is due to the nature of the
storage manager selection algorithm and resource monitoring of the administrative engine.
Storage Tiering Determines which storage tiering strategy applies to an object, evaluates where the copies of the object should
reside based on the rules in the applied service plan, and moves objects between running and spin-down storage
as needed. Active only in spindown-capable HCP SAIN systems. Scheduled service.
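The event-driven behavior of the protection service listed in Table 3 can be restated as a small lookup: each triggering event schedules a full service run after a fixed delay, even if the service is otherwise disabled. The delays below simply restate the table; the scheduling mechanism itself is a conceptual sketch.

# When a full protection service run would start after a given event (from Table 3).
from datetime import datetime, timedelta

PROTECTION_TRIGGER_DELAY = {
    "node shutdown": timedelta(minutes=90),
    "logical volume failure": timedelta(minutes=1),
    "node removal": timedelta(minutes=10),
}

def protection_run_time(event: str, occurred_at: datetime) -> datetime:
    return occurred_at + PROTECTION_TRIGGER_DELAY[event]

print(protection_run_time("logical volume failure", datetime(2013, 5, 1, 12, 0)))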
Conclusion
Hitachi Data Systems object storage solutions avoid the limitations of traditional file systems by intelligently storing
content in far larger quantities and in a much more efficient manner. These solutions provide for the new demands
imposed by the explosion of unstructured data and its growing importance to organizations, their partners, their
customers, their governments and their shareholders.
The Hitachi Data Systems object storage solutions treat file data, file metadata and custom metadata as a single object that is tracked and stored among a variety of storage tiers. With secure multitenancy and configurable attributes for each logical partition, the object store can be divided into a number of smaller virtual object stores that present configurable attributes to support different service levels. This allows the object store to support a wide range of workloads, such as content preservation, data protection, content distribution and even cloud, from a single physical infrastructure. One infrastructure is far easier to manage than disparate silos of technology for each application or set of users. By integrating many key technologies in a single storage platform, Hitachi Data Systems object storage solutions provide a path to short-term return on investment and significant long-term efficiency improvements. They help IT evolve to meet new challenges, stay agile over the long term and address future change and growth.
© Hitachi Data Systems Corporation 2013. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Microsoft, Windows and Active Directory are trademarks or
registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by
Hitachi Data Systems Corporation.
WP-425-B DG May 2013
Corporate Headquarters
2845 Lafayette Street
Santa Clara, CA 95050-2639 USA
www.HDS.com
Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hds.com
Asia Pacific: +852 3189 7900 or hds.marketing.apac@hds.com

Meet the unique challenges of dicom hl7 data access, data consolidation, data...Hitachi Vantara
 
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingHitachi Vantara
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfan
 
Nuxeo Fact Sheet
Nuxeo Fact SheetNuxeo Fact Sheet
Nuxeo Fact SheetNuxeo
 
Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)Vikas Deolaliker
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object Sandeep Patil
 
Hitachi Content Platform Datasheet
Hitachi Content Platform DatasheetHitachi Content Platform Datasheet
Hitachi Content Platform DatasheetHitachi Vantara
 
Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile Hitachi Vantara
 
Spectrum scale-external-unified-file object
Spectrum scale-external-unified-file objectSpectrum scale-external-unified-file object
Spectrum scale-external-unified-file objectSandeep Patil
 
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...dbpublications
 
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...janaskhoj
 
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET Journal
 
Hitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentalsHitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentalsilknurd
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed_Hat_Storage
 
hitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-reporthitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-reportIngrid Fernandez, PhD
 
Object storage
Object storageObject storage
Object storageronpoul
 

Similar to Introduction to Object Storage Solutions White Paper (20)

Hitachi content platform custom object metadata enhancement tool
Hitachi content platform custom object metadata enhancement toolHitachi content platform custom object metadata enhancement tool
Hitachi content platform custom object metadata enhancement tool
 
Meet the unique challenges of dicom hl7 data access, data consolidation, data...
Meet the unique challenges of dicom hl7 data access, data consolidation, data...Meet the unique challenges of dicom hl7 data access, data consolidation, data...
Meet the unique challenges of dicom hl7 data access, data consolidation, data...
 
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
 
paper
paperpaper
paper
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
 
Nuxeo Fact Sheet
Nuxeo Fact SheetNuxeo Fact Sheet
Nuxeo Fact Sheet
 
Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object
 
Hitachi Content Platform Datasheet
Hitachi Content Platform DatasheetHitachi Content Platform Datasheet
Hitachi Content Platform Datasheet
 
Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile
 
Spectrum scale-external-unified-file object
Spectrum scale-external-unified-file objectSpectrum scale-external-unified-file object
Spectrum scale-external-unified-file object
 
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
 
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
 
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication System
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
Hitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentalsHitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentals
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph Storage
 
hitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-reporthitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-report
 
Object storage
Object storageObject storage
Object storage
 

More from Hitachi Vantara

Webinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartWebinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartHitachi Vantara
 
Hyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHitachi Vantara
 
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsPowering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsHitachi Vantara
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Hitachi Vantara
 
Virtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationVirtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationHitachi Vantara
 
HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) Hitachi Vantara
 
Cloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicCloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicHitachi Vantara
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceHitachi Vantara
 
Economist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudEconomist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudHitachi Vantara
 
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...Hitachi Vantara
 
Information Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsInformation Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsHitachi Vantara
 
Redefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureRedefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureHitachi Vantara
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHitachi Vantara
 
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicDefine Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicHitachi Vantara
 
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi Vantara
 
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...Hitachi Vantara
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperHitachi Vantara
 
HitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitachi Vantara
 
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Hitachi Vantara
 
The Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperThe Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperHitachi Vantara
 

More from Hitachi Vantara (20)

Webinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartWebinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City Smart
 
Hyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital Transformation
 
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsPowering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
 
Virtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationVirtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview Presentation
 
HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol)
 
Cloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicCloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards Infographic
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud Experience
 
Economist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudEconomist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation Cloud
 
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
 
Information Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsInformation Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research Results
 
Redefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureRedefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud Infrastructure
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
 
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicDefine Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
 
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
 
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White Paper
 
HitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution Profile
 
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
 
The Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperThe Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White Paper
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Introduction to Object Storage Solutions White Paper

With support for thousands of tenants, tens of thousands of namespaces, and petabytes of capacity in one system, HCP is truly cloud-ready.

Main Concepts and Features

Object-Based Storage
Hitachi Content Platform, as a general-purpose object store, allows unstructured data files to be stored as objects. An object is essentially a container that includes both file data and associated metadata that describes the data. The objects are stored in a repository. The metadata is used to define the structure and administration of the data. HCP can also leverage object metadata to apply specific management functions, such as storage tiering, to each object. The objects have intelligence that enables them to automatically take advantage of advanced storage and data management features to ensure proper placement and distribution of content.

HCP architecture isolates stored data from the hardware layer. Internally, ingested files are represented as objects that encapsulate both the data and metadata required to support applications. Externally, HCP presents each object either as a set of files in a standard directory structure or as a uniform resource locator (URL) accessible by users and applications via HTTP/HTTPS.

HCP stores objects in a repository. Data that is ingested and stored in the repository is permanently associated with the information about that data, called metadata. Each data object encapsulates both object data and metadata, and is treated within HCP as a single unit for all intents and purposes.

Object Structure
An HCP repository object is composed of file data and the associated metadata, which in turn consists of system metadata and, optionally, custom metadata and an access control list (ACL). The structure of the object is shown in Figure 1.

File data is an exact digital copy of the actual file contents at the time of its ingestion. If the object is under retention, it cannot be deleted before the expiration of its retention period, except when using a special privileged operation. If versioning is enabled, multiple versions of a file can be retained. If appendable objects are enabled, data can be appended to an object (with the CIFS or NFS protocols) without modifying the original fixed-content data.
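To make the HTTP/HTTPS access model concrete, the following is a minimal Python sketch of storing a file as an object and reading it back through a REST-style namespace URL. The hostname, directory layout, credentials and authentication scheme are illustrative assumptions, not the documented HCP interface.

import requests

# Hypothetical namespace endpoint: namespace "ns1" under tenant "finance" on an
# HCP system reachable at hcp.example.com. URL layout and credentials are
# placeholders used only to illustrate the access pattern.
BASE = "https://ns1.finance.hcp.example.com/rest"
AUTH = ("svc_user", "svc_password")

# Ingest a local file as an object. Once stored, the object data is WORM:
# it cannot be modified in place, only versioned (if enabled) or deleted.
with open("invoice-2013-05.pdf", "rb") as f:
    resp = requests.put(f"{BASE}/invoices/2013/invoice-2013-05.pdf",
                        data=f, auth=AUTH)
resp.raise_for_status()

# Read the object back; any storage node in the system can serve the request.
obj = requests.get(f"{BASE}/invoices/2013/invoice-2013-05.pdf", auth=AUTH)
print(obj.status_code, len(obj.content))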
Figure 1. HCP Object

Metadata is system or user generated data that describes the fixed-content data of an object and defines the object's properties. System metadata, the system-managed properties of the object, includes HCP-specific metadata and POSIX metadata. HCP-specific metadata includes the date and time the object was added to the namespace (ingest time), the date and time the object was last changed (change time), the cryptographic hash value of the object along with the namespace hash algorithm used to generate that value, and the protocol through which the object was ingested. It also includes the object's policy settings, such as data protection level (DPL), retention, shredding, indexing, and, for HCP namespaces only, versioning. POSIX metadata includes a user ID and group ID, a POSIX permissions value, and POSIX time attributes.

Custom metadata is optional, user-supplied descriptive information about a data object that is usually provided as well-formed XML. It is typically intended for a more detailed description of the object. This metadata can also be used by future users and applications to understand and repurpose the object content. HCP supports multiple custom metadata fields for each object.

ACL is optional, user-provided metadata containing a set of permissions granted to users or user groups to perform operations on an object. ACLs are supported only in HCP namespaces.

The complete metadata structure, as supported in HCP namespaces, is shown in Figure 2. It includes all metafiles supported by HCP for objects, which were generated for the sample data structure (assuming that custom metadata and ACLs were added for each object).
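As a sketch of the custom metadata described above, the example below attaches a small well-formed XML annotation to an existing object. The element names, the endpoint, and the query parameter used to address the custom metadata are assumptions made for illustration.

import requests

# Invented XML annotation describing the object; any well-formed XML could be
# used, and the element names here are purely illustrative.
custom_md = """<?xml version="1.0" encoding="UTF-8"?>
<invoice>
  <customer>ACME Corp</customer>
  <region>EMEA</region>
  <scanned-by>branch-042</scanned-by>
</invoice>"""

# Hypothetical request attaching the annotation to an already stored object;
# the "type=custom-metadata" parameter is an assumed spelling.
resp = requests.put(
    "https://ns1.finance.hcp.example.com/rest/invoices/2013/invoice-2013-05.pdf",
    params={"type": "custom-metadata"},
    data=custom_md.encode("utf-8"),
    auth=("svc_user", "svc_password"),
)
print(resp.status_code)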
Figure 2. HCP Namespace: Complete Metadata Structure

Distributed Design
An HCP system consists of both hardware and software and comprises many different components that are connected together to form a robust, scalable architecture for object-based storage. HCP runs on an array of servers, or nodes, that are networked together to form a single physical instance. Each node is a storage node. Storage nodes store data objects. All runtime operations and physical storage, including data and metadata, are distributed among the storage nodes. All objects in the repository are distributed across all available storage space but still presented as files in a standard directory structure. Objects that are physically stored on any particular node are available from all other nodes.

Open Architecture
HCP has an open architecture that insulates stored data from technology changes, as well as from changes in HCP itself due to product enhancements. This open architecture ensures that users will have access to the data long after it has been added to the repository. HCP acts as both a repository that can store customer data and an online portal that enables access to that data by means of several industry-standard interfaces, as well as through an integrated search facility, Hitachi Data Discovery Suite (HDDS). The HTTP or HTTPS, WebDAV, CIFS and NFS protocols support various operations. These operations include storing data, creating and viewing directories, viewing and retrieving objects and their metadata, modifying object metadata, and deleting objects. Objects that were added using any protocol are immediately accessible through any other supported protocol. These protocols can be used to access the data with a Web browser, the HCP client tools, 3rd-party applications, Microsoft® Windows® Explorer, or native Windows or UNIX tools. HCP allows special-purpose access to the repository through the SMTP protocol, which is used only for storing email. For data backup and restore, HCP supports the NDMP protocol.
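Because objects added through any protocol are visible through every other supported protocol, a file ingested over HTTP in the earlier sketch could also be read with ordinary file I/O once the namespace is exported over NFS. The mount point and path in this short sketch are hypothetical.

# Assumed NFS mount of the same namespace at /mnt/hcp_ns1; the mount point and
# directory layout are illustrative only.
path = "/mnt/hcp_ns1/invoices/2013/invoice-2013-05.pdf"
with open(path, "rb") as f:
    header = f.read(4)
print(header == b"%PDF")  # the object ingested over HTTP reads back unchanged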
Multitenancy
Multitenancy support allows the repository in a single physical HCP instance to be partitioned into multiple namespaces. A namespace is a logical partition that contains a collection of objects particular to one or more applications. Each namespace is a private object store that is represented by a separate directory structure and has a set of independently configured attributes. Namespaces provide segregation of data, while tenants, or groupings of namespaces, provide segregation of management. An HCP system can have up to 1,000 tenants. Each tenant and its set of namespaces constitute a virtual HCP system that can be accessed and managed independently by users and applications. This HCP feature is essential in enterprise, cloud and service-provider environments. Data access to HCP namespaces can be either authenticated or nonauthenticated, depending on the type and configuration of the access protocol. Authentication can be performed using HCP local accounts or Microsoft Active Directory® groups.

Object Versioning
HCP supports object versioning, which is the capability of a namespace to create, store and manage multiple versions of objects in the HCP repository. This ability provides a history of how the data has changed over time. Versioning facilitates storage and replication of evolving content, thereby creating new opportunities for HCP in markets such as content depots and workflow applications. Versioning is available in HCP namespaces and is configured at the namespace level. Versioning is only supported with HTTP or REST. Other protocols cannot be enabled if versioning is enabled for the namespace. Versioning applies only to objects, not to directories or symbolic links. A new version of an object is created when an object with the same name and location as an existing object is added to the namespace. A special type of version, called a deleted version, is created when an object is deleted. Updates to the object metadata affect only the current version of an object and do not create new versions. Previous versions of objects that are older than a specified amount of time can be automatically deleted, or pruned. It is not possible to delete specific historical versions of an object; however, a user or application with appropriate permissions can purge the object to delete all its versions, including the current one.
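To make the versioning behavior concrete, the following sketch asks a namespace for the version history of a single object over HTTP. The endpoint, credentials and the version-listing query parameter are assumptions used only to illustrate the idea of per-object version listings.

import requests

# Hypothetical request for an object's version history; the "version=list"
# parameter is an assumed spelling, not quoted from HCP documentation.
resp = requests.get(
    "https://ns1.finance.hcp.example.com/rest/invoices/2013/invoice-2013-05.pdf",
    params={"version": "list"},
    auth=("svc_user", "svc_password"),
)
print(resp.status_code)
print(resp.text[:500])  # version identifiers and timestamps, if returned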
Spin-Down and Storage Tiering
HCP implements spin-down disk support as an early step towards the long-term goal of supporting information lifecycle management (ILM) and intelligent objects. In the near term, the goal of the HCP spin-down feature is to take advantage of the energy savings potential of the spin-down technology. HCP spindown-capable storage is based on the power savings feature of Hitachi midrange storage systems and is a core element of the new storage tiering functionality, which is implemented as an HCP service. According to the storage tiering strategy specified by customers, the storage tiering service identifies objects that are eligible to reside on spin-down storage and moves them to and from the spin-down storage as needed. Tiering selected content to spindown-enabled storage lowers overall cost by reducing energy consumption for large-scale unstructured data storage, such as deep archives and disaster recovery sites.

Storage tiering can be used very effectively with customer-identified "dark data" (rarely accessed data) or data replicated for disaster recovery by moving that data to spin-down storage some time after ingestion or replication. Customer sites where data protection is critical can use storage tiering to move all redundant data copies to spin-down storage, which makes the cost of keeping data protection copies competitive with a tape solution. Storage tiering also enables service providers to use a turnkey framework to offer differentiated object data management plans. This capability further enhances HCP as an attractive target for fixed content, especially for archive-oriented use cases where tape may be considered an alternative.
Search
HCP provides the only integrated metadata query engine on the market. HCP includes comprehensive search capabilities that enable users to search for objects in namespaces, analyze namespace contents, and manipulate groups of objects. To satisfy government requirements, HCP supports e-discovery for audits and litigation. The metadata query engine is always available in any HCP system, but the content search facility requires installation of a separate HDS product, Hitachi Data Discovery Suite.

Replication
Replication, an add-on feature to HCP, is the process that keeps selected tenants and namespaces in 2 or more HCP systems in sync with each other. The replication service copies one or more tenants or namespaces from one HCP system to another, propagating object creations, object deletions, and metadata changes. HCP also replicates tenant and namespace configuration, tenant-level user accounts, compliance and tenant log messages, and retention classes. The HCP system in which the objects are initially created is called the primary system. The 2nd system is called the replica. Typically, the primary system and the replica are in separate geographic locations and connected by a high-speed wide area network. HCP supports different replication topologies, including many-to-one and chain configurations.

Common Use Cases

Fixed-Content Archiving
Hitachi Content Platform is optimized for fixed-content data archiving. Fixed-content data is information that does not change but must be kept available for future reference and be easily accessible when needed. A fixed-content storage system is one in which the data cannot be modified. HCP uses "write-once, read-many" (WORM) storage technology, and a variety of policies and services (such as retention, content verification and protection) to ensure the integrity of data in the repository. WORM storage means that data, once ingested into the repository, cannot be updated or modified; that is, the data is guaranteed to remain unchanged from when it was originally stored. If the versioning feature is enabled within the HCP system, different versions of the data can be stored and retrieved, in which case each version is WORM.

Backup-Free Data Protection and Content Preservation
HCP is a true backup-free platform. HCP protects content without the need for backup. It uses sophisticated data preservation technologies, such as configurable data and metadata protection levels (MDPL), object versioning and change tracking, multisite replication with seamless application failover, and many others. HCP includes a variety of features designed to protect integrity, provide privacy, and ensure availability and security of stored data. Below is a summary of the key HCP data protection features:

■ Content immutability. This intrinsic feature of HCP WORM storage design protects the integrity of the data in the repository.
■ Content verification. The content verification service maintains data integrity and protects against data corruption or tampering by ensuring that the data of each object matches its cryptographic hash value. Any violation is repaired in a self-healing fashion.
■ Scavenging. The scavenging service ensures that all objects in the repository have valid metadata. In case metadata is lost or corrupted, the service tries to reconstruct it by using the secondary, or scavenging, metadata (a copy of the metadata stored with each copy of the object data).
■ Data encryption. HCP supports an encryption at rest capability that allows seamless encryption of data on the physical volumes of the repository. This ensures data privacy by preventing unauthorized access to the stored data. The encryption and decryption are handled automatically and transparently to users and applications.
■ Versioning. HCP uses versioning to protect against accidental deletes and storing wrong copies of objects.
■ Data availability.
■ RAID protection. RAID storage technology provides efficient protection from simple disk failures. SAN-based HCP systems typically use RAID-6 erasure coding protection to guard against dual drive failures.
■ Multipathing and zero-copy failover. These features provide data availability in SAN-attached array of independent nodes (SAIN) systems.
■ Data protection level and protection service. In addition to using RAID and SAN technologies to provide data integrity and availability, HCP can use software mirroring to store the data for each object in multiple locations on different nodes. HCP groups storage nodes into protection sets with the same number of nodes in each set, and tries to store all the copies of the data for an object in a single protection set where each copy is stored on a different node. The protection service enforces the required level of data redundancy by checking and repairing protection sets. In case of violation, it creates additional copies or deletes extra copies of an object to bring the object into compliance. If replication is enabled, the protection service can use an object copy from a replica system if the copy on the primary system is unavailable.
■ Metadata redundancy. In addition to the data redundancy specified by DPL, HCP creates multiple copies of the metadata for an object on different nodes. The metadata protection level, or MDPL, is a system-wide setting that specifies the number of copies of the metadata that the HCP system must maintain (normally 2 copies, MDPL2). Management of MDPL redundancy is independent of the management of data copies for DPL.
■ Nondisruptive software and hardware upgrades. HCP employs a number of techniques that minimize or eliminate any disruption of normal system functions during software and hardware upgrades. Nondisruptive software upgrade (NDSU) is one of these techniques; it includes greatly enhanced online upgrade support, nondisruptive patch management, and online upgrade performance improvements. HCP supports media-free and remote upgrades, HTTP or REST drain mode, and parallel operating system (OS) installation. It also supports automatic online upgrade commit, offline upgrade duration estimates, enhanced monitoring and email alerts, and other features. Storage nodes can be added to an HCP system without causing any downtime. HCP also supports nondisruptive storage upgrades that allow online storage addition to SAIN systems without any data outage.
■ Seamless application failover. This feature is supported by HCP systems in a replicated topology. It includes a seamless failover routing feature that enables direct integration with customer-owned load balancers by allowing HTTP requests to be serviced by any HCP system in a replication topology. Seamless domain name system (DNS) failover is an HCP built-in multisite load-balancing and high-availability technology that is ideal for cost-efficient, best-effort customer environments.
■ Replication. If enabled, this feature provides a multitude of mechanisms that ensure data availability. The replica system can be used both as a source for disaster recovery and to maintain data availability by providing good object copies for protection and content verification services. If an object cannot be read from the primary system, HCP can try to read the object from the replica if the read-from-replica feature is enabled.
■ Data security.
■ Authentication of management and data access.
■ Granular, multilayer data access permission scheme.
■ IP filtering technology and protocol-specific access or deny lists.
■ Secure Sockets Layer (SSL) for HTTP or WebDAV data access, management access, and replication.
■ Node login prevention.
■ Shredding policy and service.
■ Autonomic technology refresh feature, implemented as the HCP migration service, which enables organizations to maintain continuously operating content stores that allow them to preserve their digital content assets for the long term.

Cloud-Enabled Storage
The powerful, industry-leading capabilities of HCP make it well suited to the cloud storage space. An HCP-based infrastructure solution is sufficiently flexible to accommodate any cloud deployment model (public, private or hybrid) and simplify the migration to the cloud for both service providers and subscribers. HCP provides edge-to-core, secure multitenancy and robust management capabilities, and a host of features to optimize cloud storage operations. HCP, in its role as an online data repository, is truly ready for a cloud-enabled market. While numerous HCP features were already discussed earlier in this paper, the purpose of this section is to summarize those that contribute the most to HCP cloud capabilities. They include:

■ Large-scale multitenancy.
■ Management segregation. HCP supports up to 1,000 tenants, each of which can be uniquely configured for use by a separate cloud service subscriber.
■ Data segregation. HCP supports up to 10,000 namespaces, each of which can be uniquely configured for a particular application or workload.
■ Massive scale.
■ Petabyte repository offers 40PB of storage, 80 nodes, 32 billion user objects, and 15 million files per directory, all on a single physical system.
■ Best node density in the object storage industry supports 500TB per node and 400+M objects per node. With fewer nodes, HCP requires less power, less cooling, and less floor space.
■ Unparalleled expandability that allows organizations to "start small" and expand according to demand.
■ Nodes and/or storage can be added to expand an HCP system's storage and throughput capacity, without disruptions. Multiple storage systems are supported by a single HCP system.
■ Easy tenant and storage provisioning.
■ Geographical dispersal and global accessibility.
■ WAN-friendly REST interface for namespace data access and replication.
■ Replication of content across multiple sites using advanced, flexible replication topologies.
■ WAN-optimized, high-throughput data transfer.
  • 11. WHITE PAPER 11 ■■ High availability. ■■ Fully redundant hardware. ■■ Automatic routing of client requests around hardware failures. ■■ Load balancing across all available hardware. ■■ Multiple REST interfaces. These interfaces include the REST API for namespace data access, management API, and metadata query API. REST API is a technology of choice for cloud enablers and consumers. Some of the reasons for its popularity include high efficiency and low overhead, caching at both the client and the server and API uniformity. In addition, this technology offers a stateless nature that allows accommodation of the latencies of Internet access and potentially complex firewall configurations. ■■ Secure, granular access to tenants, namespaces and objects, which is crucial in any cloud environment. This access is facilitated by the HCP multilayer, flexible permission mechanism, including object-level ACLs. ■■ Usage metering. HCP has built-in chargeback capabilities, indispensable for cloud use, to facilitate provider and subscriber transactions. HCP also provides tools for 3rd-party vendors and customers to write to the API for easy integration with the HDS solution for billing and reporting. ■■ Low-touch system that is self-monitoring, self-managing and self-healing. HCP features advanced monitor- ing, audit and reporting capabilities. HCP services can automatically repair issues if they arise. ■■ Support for multiple levels of service. This support is provided through HCP policies, service plans and quotas that can be configured for each tenant helps enforce service-level agreements (SLAs). It allows the platform to accommodate a wide range of subscriber use cases and business models on a single physical system. ■■ Edge-to-core solution. HCP, working in tandem with Hitachi Data Ingestor (HDI), provides an integrated edge- to-core solution for cloud storage deployments. HCP serves as the “engine” at the core of the HDS cloud architecture. HDI resides at the edge of the storage cloud (for instance, at a remote office or subscriber site) and serves as the “on-ramp” for application data to enter the cloud infrastructure. HDI acts as a local storage cache while migrating data into HCP and maintaining links to stored content for later retrieval. Users and applications interact with HDI at the edge of the cloud but perceive bottomless, backup-free storage provided by HCP at the core. E-Discovery, Compliance and Metadata Analysis Custom metadata enables building massive unstructured data stores by providing means for faster and more accurate access of content and giving storage managers the meaningful information they need to efficiently and intelligently process data and apply the right object policies to meet all business, compliance and protection require- ments. Regulatory compliance features include namespace retention mode (compliance and enterprise), retention classes, retention hold, automated content disposition, and privileged delete and purge. HCP search capabili- ties include support for e-discovery for litigation or audit purposes. On HCP, open APIs allow direct 3rd-party integration. HCP supports search facilities that provide an interactive interface. The search console offers a structured environ- ment for creating and executing queries (sets of criteria that each object in the search results must satisfy). Users can apply various selection criteria, such as objects stored before a certain date or larger than a specified size. 
Queries return metadata for objects included in the search result. This metadata can be used to retrieve the object. From the search console, users can open objects, perform bulk operations on objects (hold, release, delete, purge, privileged delete and purge, change owner, set ACL), and export search results in standard file formats for use as input to other applications.
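In addition to the interactive search console, namespaces can be queried programmatically through the metadata query API mentioned above. The following is a minimal sketch of such a query, assuming Python with the third-party "requests" library; the host name, credential format, request body fields and response field names are illustrative assumptions, not taken from this paper, so the exact contract should be checked against the HCP metadata query API documentation.

    # Hedged sketch of a programmatic metadata query; endpoint, auth format and
    # field names are assumptions for illustration only.
    import json
    import requests

    QUERY_URL = "https://finance.acme.hcp.example.com/query"            # hypothetical namespace host
    HEADERS = {
        "Authorization": "HCP <base64-username>:<md5-password>",        # placeholder credentials
        "Content-Type": "application/json",
        "Accept": "application/json",
    }

    # A query is a set of criteria that every returned object must satisfy,
    # for example "stored before a certain date and larger than a given size".
    body = {
        "object": {
            "query": "ingestTimeMilliseconds:<1368000000000 AND size:>1048576",  # assumed field names
            "count": 100,
        }
    }

    resp = requests.post(QUERY_URL, headers=HEADERS, data=json.dumps(body), verify=False)
    resp.raise_for_status()

    # The response carries metadata for each matching object; the URL in that
    # metadata can then be used to retrieve the object over the data access API.
    for obj in resp.json().get("queryResult", {}).get("resultSet", []):
        print(obj.get("urlName"), obj.get("size"))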
The metadata query engine (MQE) is integrated with HCP and is always available in the HCP system. It also serves the metadata query API, a programmatic interface for querying namespaces. The MQE index resides on designated logical volumes on the HCP storage nodes; depending on the type of system and the volume configuration, the index may or may not share space on these volumes with object data.
Search is enabled at both the tenant and namespace levels. Indexing is enabled on a per-namespace basis. Settings at the system and namespace levels determine whether custom metadata is indexed in addition to system metadata and ACLs. If indexing of custom metadata is disabled, the MQE indexes do not include custom metadata. If a namespace is not indexed at all, searches return no results for objects in that namespace.
Each object also has an index setting that determines what the metadata query engine indexes for that object. If indexing is enabled for a namespace, MQE always indexes system metadata and ACLs regardless of the object's index setting. If the index setting is true, MQE indexes the object's custom metadata as well.

System Fundamentals
Hardware Overview
An individual physical HCP instance, or HCP system, is not a single device; it is a collection of devices that, combined with HCP software, provides all the features of an online object repository while tolerating node, disk and other component failures. From a hardware perspective, each HCP system consists of the following categories of components:
■■ Nodes (servers).
■■ Internal or SAN-attached storage.
■■ Networking components (switches and cabling).
■■ Infrastructure components (racks and power distribution units).
Storage nodes are the vital part of HCP. They store and manage the objects that reside in the physical system storage. The nodes are conventional off-the-shelf servers. Each node can have multiple internal physical drives and/or connect to external Fibre Channel (SAN) storage.
In addition to using RAID and SAN technologies and a host of other features to protect the data, HCP uses software mirroring to store the data and metadata for each object in multiple locations on different nodes. For data, this feature is governed by the namespace DPL (data protection level) setting, which specifies the number of copies of each object HCP must maintain in the repository to ensure the required level of data protection. For metadata, it is governed by the MDPL (metadata protection level), which is a system-wide setting.
A storage node runs the complete HCP software and serves as both a repository for objects and a gateway to the data and metadata they contain. All runtime operations are distributed among the storage nodes, ensuring reliability and performance.
HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes (SAIN). RAIN systems use the internal storage in each node; SAIN systems use external SAN storage. HCP is offered as 2 products: HCP 300 (based on the RAIN configuration) and HCP 500 (based on the SAIN configuration).

HCP RAIN (HCP 300)
The nodes in an HCP 300 system are Hitachi Compute Rack 220 (CR 220) servers. RAIN nodes contain internal storage: a RAID controller and disks. All nodes use hardware RAID-5 data protection. In an HCP RAIN system, the physical disks in each node form a single RAID group, normally RAID-5 (5D+1P) (see Figure 3). This helps ensure the integrity of the data stored on each node.
An HCP 300 (RAIN) system must have a minimum of 4 storage nodes. Additional storage nodes are added in 4-node increments, and an HCP 300 system can have a maximum of 20 nodes. HCP 300 systems are normally configured with a DPL setting of 2 (DPL2), which, coupled with hardware RAID-5, yields an effective RAID-5+1 total protection level.

Figure 3. HCP 300 Hardware Architecture

HCP SAIN (HCP 500/500XL)
The nodes in an HCP 500 system are either Hitachi Compute Rack 220 (CR 220) servers or blades in Hitachi Compute Blade 320 (CB 320) servers. The HCP 500 nodes contain Fibre Channel host bus adapters (HBAs) and use external Fibre Channel SAN storage; they are diskless servers that boot from the SAN-attached storage.
The nodes in a SAIN system can also have internal storage in addition to being connected to external storage. These nodes are called HCP 500XL nodes. They are an alternative to the standard HCP 500 nodes and have the same hardware configuration, except for the addition of a RAID controller and internal hard disk drives. In HCP 500XL nodes, the system metadata database resides on the local disks, which leads to more efficient and faster database operations. As a result, the system can better support larger capacity and higher object counts per node and address higher performance requirements.
A typical 500XL node internal storage configuration includes six 500GB 7,200RPM SATA II drives in a single RAID-5 (5D+1P) RAID group with 2 LUNs: 31GB (operating system) and 2.24TB (database). HCP 500XL nodes are usually considered when the system configuration exceeds 4 standard nodes.
HCP 500 and 500XL (SAIN) systems are supported with a minimum of 4 storage nodes. In a SAIN system, additional storage nodes are added in pairs, so the system always has an even number of storage nodes. A SAIN system can have a maximum of 80 nodes.
Both RAIN and SAIN systems can have a DPL as high as 4, which affords maximum data availability but greatly sacrifices storage utilization. Typically, the external SAN-attached storage uses RAID-6. The best protection and highest availability of an HCP 500 system are achieved by giving each node its own RAID group, or a Hitachi Dynamic Provisioning (HDP) pool containing 1 RAID group.

Software Overview
HCP system software consists of an operating system (the appliance operating system) and core software. The core software includes components that:
■■ Enable access to the object repository through the industry-standard HTTP or HTTPS, WebDAV, CIFS, NFS, SMTP and NDMP protocols.
■■ Ingest fixed-content data, convert it into HCP objects, and manage the objects' data and metadata over time.
■■ Maintain the integrity, stability, availability and security of stored data by enforcing repository policies and executing system services.
■■ Enable configuration, monitoring and management of the HCP system through a human-readable interface.
■■ Support searching the repository through an interactive Web interface (the search console) and a programmatic interface (the metadata query API).

System Organization
HCP is a fully symmetric, distributed application that stores and manages objects (see Figure 4). An HCP object encapsulates the raw fixed-content data written by a client application together with its associated system and custom metadata. Each node in an HCP system is a Linux-based server that runs a complete HCP instance. The HCP system can withstand multiple simultaneous node failures and acts automatically to ensure that all object and namespace policies remain satisfied.
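As an illustration of this object model and of the HTTP gateway listed above, the sketch below stores a small piece of fixed-content data as an object in a namespace and reads it back, assuming Python with the third-party "requests" library. The namespace-qualified host name, the /rest path prefix and the authorization header format are assumptions for illustration; check the HCP namespace access documentation for the conventions of your release.

    # Hedged sketch of object ingest and retrieval over the HTTP (REST) gateway;
    # host name, path prefix and auth format are illustrative assumptions.
    import requests

    BASE = "https://finance.acme.hcp.example.com/rest"                   # hypothetical namespace.tenant.system host
    HEADERS = {"Authorization": "HCP <base64-username>:<md5-password>"}  # placeholder credentials

    payload = b"2013-05-01,INV-0042,1499.00\n"

    # Ingest: the fixed-content data written here becomes an HCP object, and the
    # system attaches system metadata (content hash, ingest time, owner, ...) to it.
    put = requests.put(f"{BASE}/invoices/2013/INV-0042.csv",
                       data=payload, headers=HEADERS, verify=False)
    put.raise_for_status()

    # Retrieve: the same URL returns the object data; selected system metadata is
    # exposed in the response headers.
    get = requests.get(f"{BASE}/invoices/2013/INV-0042.csv", headers=HEADERS, verify=False)
    get.raise_for_status()
    assert get.content == payload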
Figure 4. The High-Level Structure of an HCP System

External system communication is managed by the DNS manager, a distributed network component that balances client requests across all nodes to ensure maximum system throughput and availability. The DNS manager works in conjunction with a corporate DNS server to allow clients to access the system as a single entity, even though the system is made up of multiple independent nodes. The HCP system is configured as a subdomain of an existing corporate domain, and clients access the system using predefined protocol-specific or namespace-specific names. While not strictly required, using DNS is important in ensuring balanced and problem-free client access to an HCP system, especially for HTTP or REST clients.

Namespaces and Tenants
Main Concepts
An HCP repository is partitioned into namespaces. A namespace is a logical repository as viewed by an application. Each namespace consists of a distinct logical grouping of objects with its own directory structure, such that the objects in one namespace are not visible in any other namespace. Access to one namespace does not grant a user access to any other namespace. To the user of a namespace, the namespace is the repository.
Namespaces are not associated with any preallocated storage; they share the same underlying physical storage. Namespaces provide a mechanism for separating the data stored for different applications, business units or customers. For example, there may be one namespace for accounts receivable and another for accounts payable. While a single namespace can host one or more applications, it typically hosts only one application.
Namespaces also enable operations to work against selected subsets of repository objects. For example, a search could target the accounts receivable and accounts payable namespaces but not the employees namespace. Figure 5 shows the logical structure of an HCP system with respect to its multitenancy features.

Figure 5. HCP System Logical Layout: Namespaces and Tenants

Namespaces are owned and managed by tenants. Tenants are administrative entities that provide segregation of management, while namespaces provide segregation of data. A tenant typically represents an actual organization,
such as a company or a department within a company that uses a portion of a repository. A tenant can also correspond to an individual person. Namespace administration is done at the level of the owning tenant.
Clients can access HCP namespaces through the HTTP or HTTPS, WebDAV, CIFS, NFS and SMTP protocols. These protocols can support authenticated and/or anonymous types of access (the types of access and their combinations are discussed in more detail later in this document). HCP namespaces are owned by HCP tenants. An HCP system can have multiple HCP tenants, each of which can own multiple namespaces. The number of namespaces each HCP tenant can own can be limited by an administrator.

User and Group Accounts
User and group accounts control access to the various HCP interfaces and give users permission to perform administrative tasks and access namespace content.
An HCP user account is defined in HCP; it has a set of credentials, username and password, which is stored locally in the system. The HCP system uses these credentials to authenticate a user, performing local authentication. An HCP group account is a representation of an Active Directory (AD) group. To create group accounts, HCP must be configured to support Active Directory. A group account enables the AD users in the corresponding AD group to access one or more HCP interfaces.
Like HCP user accounts, HCP group accounts are defined separately at the system and tenant levels. Different tenants have different user and group accounts; these accounts cannot be shared across tenants. Group membership is likewise distinct at the system and tenant levels.
HCP administrative roles can be associated with both system-level and tenant-level user and group accounts. Data access permissions can be associated only with tenant-level user and group accounts. Consequently, system-level local and AD users can only be administrative users, while tenant-level local and AD users can be administrative users, have data access permissions, or both. Tenant-level users can have only administrative roles without namespace data permissions, only namespace data permissions without administrative roles, or any combination of the two.

System and Tenant Management
The implementation of segregation of management in the HCP system is illustrated in Figure 6. An HCP system has both system-level and tenant-level administrators:
■■ System-level administrative accounts are used for configuring system-wide features, monitoring system hardware, software and overall repository usage, and managing system-level users. The system administrator user interface, the system management console, provides the functionality needed by the maintainer of the physical HCP system. For example, it allows the maintainer to shut down the system, see information about nodes, manage policies and services, and create HCP tenants. System administrators have a view of the system as a whole, including all of the HCP software and hardware that make up the system, and can perform all administration actions that have system scope.
■■ Tenant-level administrative accounts are used for creating HCP namespaces. Tenant administrators can configure individual tenants and namespaces, monitor namespace usage at the tenant and namespace levels, manage tenant-level users, and control access to namespaces. The required functionality is provided by the tenant administrator user interface, the tenant management console.
This interface is intended for use by the maintainer of the virtual HCP system (an individual tenant with the set of namespaces it owns). The tenant-level administration feature facilitates segregation of management, which is essential in cloud environments.
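The same tenant-scoped administration can also be driven programmatically through the HCP management API (MAPI). The sketch below illustrates the idea, assuming Python with the third-party "requests" library; the administrative port, the /mapi resource layout and the credential format are assumptions for illustration only, and the exact resource tree is defined in the HCP management API reference.

    # Hedged sketch of tenant-level administration via the management API (MAPI);
    # port, path layout and auth format are illustrative assumptions.
    import requests

    MAPI_BASE = "https://finance.hcp.example.com:9090/mapi"             # hypothetical tenant admin host and port
    HEADERS = {
        "Authorization": "HCP <base64-username>:<md5-password>",        # placeholder credentials
        "Accept": "application/xml",
    }

    # A tenant-level administrator can enumerate the namespaces the tenant owns...
    resp = requests.get(f"{MAPI_BASE}/tenants/finance/namespaces", headers=HEADERS, verify=False)
    resp.raise_for_status()
    print(resp.text)

    # ...and inspect per-namespace configuration (quotas, versioning, default
    # retention and so on) without any system-level involvement, which is the
    # segregation of management described above.
    resp = requests.get(f"{MAPI_BASE}/tenants/finance/namespaces/invoices", headers=HEADERS, verify=False)
    resp.raise_for_status()
    print(resp.text)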
An HCP tenant can optionally grant system-level users administrative access to itself. In this case, system-level users with the monitor, administrator, security or compliance role can log into the tenant management console or use the HCP management API for that tenant. System-level users with the monitor or administrator role can also access the tenant management console directly from the system management console. This effectively enables a system administrator to function as a tenant administrator, as shown in Figure 4. System-level users can perform all of the activities allowed by the tenant-level roles that correspond to their system-level roles.
An AD user may belong to AD groups for which corresponding HCP group accounts exist at both the system and tenant levels. Such a user has the roles associated with both the applicable system-level group accounts and the applicable tenant-level group accounts.

Policies
Objects in a namespace have a variety of properties, such as the retention setting or the index setting. These properties are defined for each object by the object's system metadata. Objects can also be affected by certain namespace properties, such as the default metadata settings that are inherited by new objects stored in the namespace, or the versioning setting. Both the namespace-level settings and the properties that are part of the object metadata serve as parameters for the HCP system's transactions and services, and they determine the object's behavior during its life cycle within the repository. These settings are called policies.
An HCP policy is one or more settings that influence how transactions and internal processes (services) affect objects in a namespace. Policies ensure that objects behave in expected ways. The HCP policies are described in Table 1.

Table 1. HITACHI CONTENT PLATFORM Policies

DPL
  Policy description and components: System DPL setting, namespace DPL setting.
  Transactions and services influenced: Object creation. Protection service.
Retention
  Policy description and components: Default retention setting, object retention setting, hold setting, system metadata and custom metadata options for objects under retention.
  Transactions and services influenced: Object creation, object deletion, system and custom metadata handling. Disposition and garbage collection services.
Shredding
  Policy description and components: Default shred setting, object shred setting.
  Transactions and services influenced: Object deletion. Shredding service.
Indexing
  Policy description and components: Default index setting, object index setting.
  Transactions and services influenced: MQE.
Versioning
  Policy description and components: Versioning setting, pruning setting.
  Transactions and services influenced: Object creation and deletion. Garbage collection service.
Custom Metadata Validation
  Policy description and components: XML syntax validation.
  Transactions and services influenced: Add/replace custom metadata operations.

Each policy may consist of one or more settings that may have different scopes of application and methods of configuration. Policy settings are defined at the object level and at the namespace level. Note that the same policy setting may be set at different levels depending on the namespace. The default retention, shred and index settings are set at the namespace level in HCP namespaces.
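For object-level settings such as retention, shredding and indexing, a client application can supply the desired values when it ingests the object. The sketch below illustrates the idea, assuming Python with the third-party "requests" library; Table 2 below states only that these settings are configurable per object via the REST API, so the query parameter names and the retention value syntax shown here are illustrative assumptions.

    # Hedged sketch of object-level policy settings supplied at ingest time;
    # parameter names and retention syntax are illustrative assumptions.
    import requests

    BASE = "https://finance.acme.hcp.example.com/rest"                   # hypothetical namespace host
    HEADERS = {"Authorization": "HCP <base64-username>:<md5-password>"}  # placeholder credentials

    resp = requests.put(
        f"{BASE}/contracts/2013/contract-0017.pdf",
        data=b"%PDF-1.4 ... contract body ...",                          # stand-in for the real document
        headers=HEADERS,
        params={
            "retention": "A+7y",   # assumed syntax: keep for 7 years after ingest
            "shred": "true",       # securely overwrite storage locations on eventual deletion
            "index": "true",       # let the metadata query engine index custom metadata too
        },
        verify=False,
    )
    resp.raise_for_status()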
Table 2 lists all policy settings sorted according to their scope and method of configuration.

Table 2. HITACHI CONTENT PLATFORM Policy Settings: Scope and Configuration

Data Protection Level
  System DPL: 1-4 (scope: system; configured via the system UI).
  Namespace DPL: 1-4 or dynamic (scope: namespace; configured via the tenant UI or MAPI).
Retention
  Default retention setting: fixed date, offset, special value or retention class (scope: namespace; configured via the tenant UI or MAPI).
  Retention setting: fixed date, offset, special value or retention class (scope: object; configured via the REST API or retention.txt).
  Hold setting: true or false (scope: object; configured via the REST API).
  Ownership and POSIX permission changes under retention: true or false (scope: namespace; configured via the tenant UI or MAPI).
  Custom metadata operations allowed under retention (scope: namespace; configured via the tenant UI or MAPI).
Indexing
  Index setting: true or false (1/0) (scope: object; configured via the REST API or index.txt).
  Default index setting: true or false (scope: namespace; configured via the tenant UI or MAPI).
Shredding
  Shred setting: true or false (1/0) (scope: object; configured via the REST API or shred.txt).
  Default shred setting: true or false (scope: namespace; configured via the tenant UI or MAPI).
Custom Metadata Validation
  XML validation: true or false (scope: namespace; configured via the tenant UI or MAPI).
Versioning
  Versioning setting: true or false (scope: namespace; configured via the tenant UI or MAPI).
  Pruning setting: true or false, with a number of days for the primary or replica system (scope: namespace; configured via the tenant UI or MAPI).

Content Management Services
A Hitachi Content Platform service is a background process that performs a specific function aimed at preserving and improving the overall health of the HCP system. In particular, services are responsible for optimizing the use of system resources and maintaining the integrity and availability of the data stored in the HCP repository. HCP implements 12 services: protection, content verification, scavenging, garbage collection, duplicate elimination, shredding, disposition, compression, capacity balancing, storage tiering, migration and replication. These services are briefly described in Table 3.
Table 3. HITACHI CONTENT PLATFORM Services

Protection
  Enforces DPL policy compliance by ensuring that the proper number of copies of each object exists in the system and that damaged or lost objects can be recovered. Any policy violation invokes a repair process. The service is both scheduled and event driven: events trigger a full service run, even if the service is disabled, after a configurable amount of time (90 minutes after a node shutdown, 1 minute after a logical volume failure, 10 minutes after a node removal).
Content Verification
  Guarantees the data integrity of repository objects by ensuring that the content of a file matches its digital signature, and repairs the object if the hash does not match. Also detects and repairs discrepancies between primary and secondary metadata. The SHA-256 hash algorithm is used by default, and checksums are computed on both external and internal files. A computationally intensive and time-consuming service; runs according to the active service schedule.
Scavenging
  Ensures that all objects in the repository have valid metadata and reconstructs metadata when it is lost or corrupted but the data files still exist. The service verifies that both the primary metadata for each data object and the copies of the metadata stored with the object data (secondary metadata) are complete, valid and in sync with each other. A computationally intensive and time-consuming scheduled service.
Garbage Collection
  Reclaims storage space by purging hidden data and metadata for objects marked for deletion or left behind by incomplete transactions. It also deletes old versions of objects that are eligible for pruning. When applicable, the deletion triggers the shredding service. A scheduled service, not event driven.
Duplicate Elimination
  Identifies and eliminates redundant objects in the repository and merges duplicate data to free space. The hash signatures of external file representations are used to select objects as input to the service; these objects are then compared byte for byte to ensure that the data contents are indeed identical. Scheduled service.
Shredding
  Overwrites the storage locations where copies of a deleted object were stored, in such a way that none of its data or metadata can be reconstructed, for security reasons. Also called secure deletion. The default HCP shredding algorithm uses 3 passes to overwrite an object and is compliant with the DoD 5220.22-M standard. The algorithm is selected at install time. An event-driven-only service, not scheduled; it is triggered by the deletion of an object marked for shredding.
Disposition
  Automatically cleans up expired objects. All HCP namespaces can be configured to automatically delete objects after their retention period expires. Disposition can be enabled or disabled at both the system and namespace levels; enabling disposition for a namespace has no effect if the service is disabled at the system level. The disposition service deletes only the current versions of versioned objects. Scheduled service.
Compression
  Compresses object data to make more efficient use of system storage space. The space reclaimed by compression can be used for additional storage. A number of configurable parameters are provided via the system management console. Scheduled service.
Capacity Balancing
  Attempts to keep the usable storage capacity balanced (roughly equivalent) across all storage nodes in the system.
  If storage utilization for the nodes differs by a wide margin, the service moves objects around to bring the nodes closer to a balanced state. It runs only when started manually; additions and deletions of objects do not trigger the service. Typically, an authorized HCP service provider starts this service after adding new storage nodes to the system. In addition, while not part of the service, during normal system operation new objects tend to spread naturally among all storage nodes in fairly even proportion, due to the nature of the storage manager selection algorithm and the resource monitoring of the administrative engine.
Storage Tiering
  Determines which storage tiering strategy applies to an object, evaluates where the copies of the object should reside based on the rules in the applied service plan, and moves objects between running and spin-down storage as needed. Active only in spin-down-capable HCP SAIN systems. Scheduled service.

Conclusion
Hitachi Data Systems object storage solutions avoid the limitations of traditional file systems by intelligently storing content in far larger quantities and in a much more efficient manner. These solutions address the new demands imposed by the explosion of unstructured data and its growing importance to organizations, their partners, their customers, their governments and their shareholders.
The Hitachi Data Systems object storage solutions treat file data, file metadata and custom metadata as a single object that is tracked and stored across a variety of storage tiers. With secure multitenancy and configurable attributes for each logical partition, the object store can be divided into a number of smaller virtual object stores, each presenting configurable attributes to support different service levels. This allows the object store to support a wide range of workloads, such as content preservation, data protection, content distribution and even cloud, from a single physical infrastructure. One infrastructure is far easier to manage than disparate silos of technology for each application or set of users.
By integrating many key technologies in a single storage platform, Hitachi Data Systems object storage solutions provide a path to short-term return on investment and significant long-term efficiency improvements. They help IT evolve to meet new challenges, stay agile over the long term, and address future change and growth.
© Hitachi Data Systems Corporation 2013. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Microsoft, Windows and Active Directory are trademarks or registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Data Systems Corporation. WP-425-B DG May 2013

Corporate Headquarters: 2845 Lafayette Street, Santa Clara, CA 95050-2639 USA, www.HDS.com
Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hds.com
Asia Pacific: +852 3189 7900 or hds.marketing.apac@hds.com