Introduction to Object Storage and Hitachi Content Platform
The Fundamentals of Hitachi Content Platform
WHITEPAPER
By Hitachi Data Systems
May 2013
Contents
Executive Summary
Introduction
Main Concepts and Features
Object-Based Storage
Object Structure
Distributed Design
Open Architecture
Multitenancy
Object Versioning
Spin-Down and Storage Tiering
Search
Replication
Common Use Cases
Fixed-Content Archiving
Backup-Free Data Protection and Content Preservation
Cloud-Enabled Storage
E-Discovery, Compliance and Metadata Analysis
System Fundamentals
Hardware Overview
Software Overview
System Organization
Namespaces and Tenants
Main Concepts
User and Group Accounts
System and Tenant Management
Policies
Content Management Services
Conclusion
Executive Summary
One of IT's greatest challenges today is the explosive, uncontrolled growth of unstructured data. The continual growth of email and documents, video, Web pages, presentations, medical images, and so forth increases both complexity and
risk. This effect is seen particularly in distributed IT environments, such as cloud service providers and organizations
with branch or remote office sites. The vast quantity of data being created, difficulties in management and proper
handling of unstructured content, and complexity of supporting more users and applications pose challenges to IT
departments. Organizations often end up with sprawling storage silos for a multitude of applications and workloads,
with few resources available to manage, govern, protect, and search the data.
Hitachi Data Systems provides an alternative solution to these challenges through a single object storage platform
that can be divided into virtual storage systems, each configured for the desired level of service. The great scale and
rich features of this solution help IT organizations in both private enterprises and cloud service providers manage
distributed IT environments. It helps them to control the flood of storage requirements for unstructured content and
addresses a variety of workloads.
Introduction
Hitachi Content Platform (HCP) is a multipurpose distributed object-based storage system designed to support large-scale repositories of unstructured data. HCP enables IT organizations and cloud service providers to store, protect, preserve and retrieve unstructured content with a single storage platform. It supports multiple levels of service and readily evolves with technology and scale changes. With a vast array of data protection and content preservation technologies, the system can significantly reduce or even eliminate tape-based backups of the platform itself or backups of edge devices connected to the platform. HCP obviates the need for a siloed approach to storing unstructured content. Massive scale, multiple storage tiers, Hitachi reliability, nondisruptive hardware and software updates, multitenancy and configurable attributes for each tenant allow the platform to support a wide range of applications on a single physical HCP instance. By dividing the physical system into multiple, uniquely configured tenants, administrators create "virtual content platforms" that can be further subdivided into namespaces for organization of content, policies and access. With support for thousands of tenants, tens of thousands of namespaces, and petabytes of capacity in one system, HCP is truly cloud-ready.
Main Concepts and Features
Object-Based Storage
Hitachi Content Platform, as a general-purpose object store, allows unstructured data files to be stored as objects.
An object is essentially a container that includes both file data and associated metadata that describes the data.
The objects are stored in a repository. The metadata is used to define the structure and administration of the data.
HCP can also leverage object metadata to apply specific management functions, such as storage tiering, to each
object. The objects have intelligence that enables them to automatically take advantage of advanced storage and
data management features to ensure proper placement and distribution of content.
HCP architecture isolates stored data from the hardware layer. Internally, ingested files are represented as objects
that encapsulate both the data and metadata required to support applications. Externally, HCP presents each object
either as a set of files in a standard directory structure or as a uniform resource locator (URL) accessible by users and
applications via HTTP/HTTPS.
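As a rough illustration of URL-based access, the sketch below reads one object over HTTPS with Python. The hostname, tenant and namespace names, object path, the "/rest" prefix, and the use of HTTP basic authentication are all assumptions for illustration; the exact URL scheme and authentication headers are defined by the HCP REST API documentation for a given release.

# Minimal sketch of reading an object from an HCP namespace over HTTPS.
# Hostname, path, "/rest" prefix and basic-auth credentials are illustrative
# assumptions, not a confirmed HCP interface.
import requests

# Hypothetical namespace-specific hostname: <namespace>.<tenant>.<hcp-domain>
BASE_URL = "https://finance.europe.hcp.example.com/rest"

def get_object(path: str, user: str, password: str) -> bytes:
    """Retrieve the fixed-content data of one object by its URL path."""
    resp = requests.get(f"{BASE_URL}/{path}",
                        auth=(user, password),  # local HCP account (assumption)
                        verify=True)            # SSL is supported for HTTP data access
    resp.raise_for_status()
    return resp.content

if __name__ == "__main__":
    data = get_object("invoices/2013/inv-0001.pdf", "ingest_user", "secret")
    print(f"retrieved {len(data)} bytes")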
HCP stores objects in a repository. Data that is ingested and stored in the repository is permanently associated with
the information about that data, called metadata. Each data object encapsulates both object data and metadata, and
is treated within HCP as a single unit for all intents and purposes.
Object Structure
An HCP repository object is composed of file data and the associated metadata, which in turn consists of system
metadata and, optionally, custom metadata and an access control list (ACL). The structure of the object is shown in
Figure 1.
File data is an exact digital copy of the actual file contents at the time of its ingestion. If the object is under retention,
it cannot be deleted before the expiration of its retention period, except when using a special privileged operation.
If versioning is enabled, multiple versions of a file can be retained. If appendable objects are enabled, data can be
appended to an object (with the CIFS or NFS protocols) without modifying the original fixed-content data.
Figure 1. HCP Object
Metadata is system- or user-generated data that describes the fixed-content data of an object and defines the object's properties. System metadata, the system-managed properties of the object, includes HCP-specific metadata and POSIX metadata.
HCP-specific metadata includes the date and time the object was added to the namespace (ingest time), the
date and time the object was last changed (change time), the cryptographic hash value of the object along with the
namespace hash algorithm used to generate that value, and the protocol through which the object was ingested. It
also includes the object’s policy settings, such as data protection level (DPL), retention, shredding, indexing, and, for
HCP namespaces only, versioning.
POSIX metadata includes a user ID and group ID, a POSIX permissions value, and POSIX time attributes.
Custom metadata is optional, user-supplied descriptive information about a data object that is usually provided as
well-formed XML. It is typically intended for more detailed description of the object. This metadata can also be used
by future users and applications to understand and repurpose the object content. HCP supports multiple custom
metadata fields for each object.
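To make the idea concrete, the following hedged sketch attaches a small well-formed XML annotation to an existing object. The XML content, the "type=custom-metadata" query parameter, and the authentication shown are assumptions for illustration only; the authoritative request format is defined by the HCP REST API documentation.

# Hypothetical sketch of storing custom metadata alongside an object's data.
import requests

CUSTOM_METADATA_XML = """<?xml version="1.0" encoding="UTF-8"?>
<study>
  <patient_id>12345</patient_id>
  <modality>MRI</modality>
  <referring_physician>Dr. Example</referring_physician>
</study>
"""

def put_custom_metadata(object_url: str, xml: str, user: str, password: str) -> None:
    """Store one custom-metadata annotation for the object at object_url."""
    resp = requests.put(object_url,
                        params={"type": "custom-metadata"},  # assumed parameter name
                        data=xml.encode("utf-8"),
                        headers={"Content-Type": "application/xml"},
                        auth=(user, password))               # assumed auth scheme
    resp.raise_for_status()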
ACL is optional, user-provided metadata containing a set of permissions granted to users or user groups to perform
operations on an object. The ACLs are supported only in HCP namespaces.
The complete metadata structure, as supported in HCP namespaces, is shown in Figure 2. It includes all metafiles
supported by HCP for objects, which were generated for the sample data structure (assuming that custom metadata
and ACLs were added for each object).
Figure 2. HCP Namespace: Complete Metadata Structure
Distributed Design
An HCP system consists of both hardware and software and comprises many different components that are connected together to form a robust, scalable architecture for object-based storage. HCP runs on an array of servers, or nodes, that are networked together to form a single physical instance. Each node is a storage node that stores data objects. All runtime operations and physical storage, including data and metadata, are distributed among
the storage nodes. All objects in the repository are distributed across all available storage space but still presented as
files in a standard directory structure. Objects that are physically stored on any particular node are available from all
other nodes.
Open Architecture
HCP has an open architecture that insulates stored data from technology changes, as well as from changes in HCP
itself due to product enhancements. This open architecture ensures that users will have access to the data long after
it has been added to the repository. HCP acts as both a repository that can store customer data and an online
portal that enables access to that data by means of several industry-standard interfaces, as well as through an
integrated search facility, Hitachi Data Discovery Suite (HDDS). The HTTP or HTTPS, WebDAV, CIFS and NFS protocols support various operations. These operations include storing data, creating and viewing directories, viewing
and retrieving objects and their metadata, modifying object metadata, and deleting objects. Objects that were added
using any protocol are immediately accessible through any other supported protocol. These protocols can be used to
access the data with a Web browser, the HCP client tools, 3rd-party applications, Microsoft® Windows® Explorer, or native Windows or UNIX tools.
HCP allows special-purpose access to the repository through the SMTP protocol, which is used only for storing
email. For data backup and restore, HCP supports the NDMP protocol.
Multitenancy
Multitenancy support allows the repository in a single physical HCP instance to be partitioned into multiple namespaces. A namespace is a logical partition that contains a collection of objects particular to one or more applications.
Each namespace is a private object store that is represented by a separate directory structure and has a set of
independently configured attributes. Namespaces provide segregation of data, while tenants, or groupings of
namespaces, provide segregation of management. An HCP system can have up to 1,000 tenants. Each tenant and
its set of namespaces constitute a virtual HCP system that can be accessed and managed independently by users
and applications. This HCP feature is essential in enterprise, cloud and service-provider environments.
Data access to HCP namespaces can be either authenticated or nonauthenticated, depending on the type and
configuration of the access protocol. Authentication can be performed using HCP local accounts or Microsoft Active Directory® groups.
Object Versioning
HCP supports object versioning, which is the capability of a namespace to create, store and manage multiple
versions of objects in the HCP repository. This ability provides a history of how the data has changed over time.
Versioning facilitates storage and replication of evolving content, thereby creating new opportunities for HCP in
markets such as content depots and workflow applications.
Versioning is available in HCP namespaces and is configured at the namespace level. Versioning is only supported
with HTTP or REST. Other protocols cannot be enabled if versioning is enabled for the namespace. Versioning
applies only to objects, not to directories or symbolic links. A new version of an object is created when an object
with the same name and location as an existing object is added to the namespace. A special type of version, called
a deleted version, is created when an object is deleted. Updates to the object metadata affect only the current version of an object and do not create new versions.
Previous versions of objects that are older than a specified amount of time can be automatically deleted, or pruned.
It is not possible to delete specific historical versions of an object; however, a user or application with appropriate
permissions can purge the object to delete all its versions, including the current one.
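The pruning behavior described above can be modeled directly. The sketch below is a conceptual illustration only (not HCP code): previous versions older than a configured number of days are removed, while the current version is always kept.

# Illustrative model of version pruning for a single object.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class Version:
    ingest_time: datetime
    is_current: bool
    is_delete_marker: bool = False   # a "deleted version" is created when an object is deleted

def prune(versions: List[Version], keep_days: int, now: datetime) -> List[Version]:
    """Return the versions that survive pruning with the given retention window."""
    cutoff = now - timedelta(days=keep_days)
    survivors = []
    for v in versions:
        if v.is_current or v.ingest_time >= cutoff:
            survivors.append(v)      # the current version and recent history are kept
        # older previous versions (including delete markers) are pruned
    return survivors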
Spin-Down and Storage Tiering
HCP implements spin-down disk support as an early step toward the long-term goal of supporting information lifecycle management (ILM) and intelligent objects. In the near term, the goal of the HCP spin-down feature is to take advantage of the energy savings potential of spin-down technology.
HCP spin-down-capable storage is based on the power savings feature of Hitachi midrange storage systems and is a core element of the new storage tiering functionality, which is implemented as an HCP service. According to the storage tiering strategy specified by customers, the storage tiering service identifies objects that are eligible to reside on spin-down storage and moves them to and from spin-down storage as needed.
Tiering selected content to spin-down-enabled storage lowers overall cost by reducing energy consumption for large-scale unstructured data storage, such as deep archives and disaster recovery sites. Storage tiering can be used very effectively with customer-identified "dark data" (rarely accessed data) or data replicated for disaster recovery by moving that data to spin-down storage some time after ingestion or replication. Customer sites where data protection is critical can use storage tiering to move all redundant data copies to spin-down storage, which makes the cost of keeping data protection copies competitive with a tape solution.
Storage tiering also enables service providers to use a turnkey framework to offer differentiated object data management plans. This capability further enhances HCP as an attractive target for fixed content, especially for archive-oriented use cases where tape may be considered an alternative.
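The following is a conceptual sketch of the kind of rule such a strategy might encode: copies that have not been read for a configurable number of days ("dark data"), or copies held purely for disaster recovery, become eligible for spin-down storage. The field names and the rule are illustrative assumptions, not the HCP service plan syntax.

# Conceptual sketch of a spin-down eligibility rule.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ObjectRecord:
    name: str
    last_access: datetime
    is_replica_copy: bool

def eligible_for_spindown(obj: ObjectRecord, idle_days: int, now: datetime) -> bool:
    """Decide whether a copy should reside on spin-down rather than running storage."""
    idle_cutoff = now - timedelta(days=idle_days)
    return obj.is_replica_copy or obj.last_access < idle_cutoff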
Search
HCP provides the only integrated metadata query engine on the market. HCP includes comprehensive search capabilities that enable users to search for objects in namespaces, analyze namespace contents, and manipulate groups of objects. To satisfy government requirements, HCP supports e-discovery for audits and litigation.
The metadata query engine is always available in any HCP system, but the content search facility requires installation
of a separate HDS product, Hitachi Data Discovery Suite.
Replication
Replication, an add-on feature to HCP, is the process that keeps selected tenants and namespaces in 2 or more
HCP systems in sync with each other. The replication service copies one or more tenants or namespaces from one
HCP system to another, propagating object creations, object deletions, and metadata changes. HCP also replicates tenant and namespace configuration, tenant-level user accounts, compliance and tenant log messages, and retention classes.
The HCP system in which the objects are initially created is called the primary system. The 2nd system is called
the replica. Typically, the primary system and the replica are in separate geographic locations and connected by
a high-speed wide area network. HCP supports different replication topologies including many-to-one and chain
configurations.
Common Use Cases
Fixed-Content Archiving
Hitachi Content Platform is optimized for fixed-content data archiving. Fixed-content data is information that does
not change but must be kept available for future reference and be easily accessible when needed. A fixed-content
storage system is one in which the data cannot be modified. HCP uses “write-once, read-many” (WORM) storage
technology, and a variety of policies and services (such as retention, content verification and protection) to ensure the
integrity of data in the repository. The WORM storage means that data, once ingested into the repository, cannot be
updated or modified; that is, the data is guaranteed to remain unchanged from when it was originally stored. If the
versioning feature is enabled within the HCP system, different versions of the data can be stored and retrieved, in
which case each version is WORM.
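The WORM and retention semantics described above can be summarized in a small conceptual model: ingested data is never modified in place, deletion is refused while the retention period is in force, and only a privileged delete may override it. This sketch illustrates the semantics only; it is not an HCP interface.

# Conceptual model of WORM storage with retention.
from datetime import datetime
from typing import Dict, Optional

class WormStore:
    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._retain_until: Dict[str, Optional[datetime]] = {}

    def ingest(self, name: str, content: bytes, retain_until: Optional[datetime]) -> None:
        if name in self._data:
            raise PermissionError("WORM: existing object data cannot be overwritten")
        self._data[name] = content
        self._retain_until[name] = retain_until

    def delete(self, name: str, now: datetime, privileged: bool = False) -> None:
        until = self._retain_until.get(name)
        if until is not None and now < until and not privileged:
            raise PermissionError("object is under retention")
        self._data.pop(name, None)
        self._retain_until.pop(name, None)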
Backup-Free Data Protection and Content Preservation
HCP is a true backup-free platform: it protects content without the need for backup. It uses sophisticated data preservation technologies, such as configurable data protection levels (DPL) and metadata protection levels (MDPL), object versioning and change tracking, multisite replication with seamless application failover, and many others. HCP includes a variety of
features designed to protect integrity, provide privacy, and ensure availability and security of stored data. Below is a
summary of the key HCP data protection features:
■■ Content immutability. This intrinsic feature of HCP WORM storage design protects the integrity of the data in the
repository.
■■ Content verification. The content verification service maintains data integrity and protects against data corruption or tampering by ensuring that the data of each object matches its cryptographic hash value. Any violation is repaired in a self-healing fashion.
■■ Scavenging. The scavenging service ensures that all objects in the repository have valid metadata. In case metadata is lost or corrupted, the service tries to reconstruct it by using the secondary, or scavenging, metadata (a copy of the metadata stored with each copy of the object data).
■■ Data encryption. HCP supports an encryption-at-rest capability that allows seamless encryption of data on the physical volumes of the repository. This ensures data privacy by preventing unauthorized access to the stored data. Encryption and decryption are handled automatically and transparently to users and applications.
■■ Versioning. HCP uses versioning to protect against accidental deletions and the storage of incorrect copies of objects.
■■ Data availability.
■■ RAID protection. RAID storage technology provides efficient protection from simple disk failures. SAN-based
HCP systems typically use RAID-6 erasure coding protection to guard against dual drive failures.
■■ Multipathing and zero-copy failover. These features provide data availability in SAN-attached array of independent nodes (SAIN) systems.
■■ Data protection level and protection service. In addition to using RAID and SAN technologies to provide data integrity and availability, HCP can use software mirroring to store the data for each object in multiple locations on different nodes. HCP groups storage nodes into protection sets with the same number of nodes in each set, and tries to store all the copies of the data for an object in a single protection set, where each copy is stored on a different node. The protection service enforces the required level of data redundancy by checking and repairing protection sets. In case of violation, it creates additional copies or deletes extra copies of an object to bring the object into compliance (see the sketch after this list). If replication is enabled, the protection service can use an object copy from a replica system if the copy on the primary system is unavailable.
■■ Metadata redundancy. In addition to the data redundancy as specified by DPL, HCP creates multiple copies
of the metadata for an object on different nodes. Metadata protection level or MDPL is a system-wide setting
that specifies the number of copies of the metadata that the HCP system must maintain (normally 2 copies,
MDPL2). Management of MDPL redundancy is independent of the management of data copies for DPL.
■■ Nondisruptive software and hardware upgrades. HCP employs a number of techniques that minimize or
eliminate any disruption of normal system functions during software and hardware upgrades. Nondisruptive
software upgrade (NDSU) is one of these techniques that includes greatly enhanced online upgrade support,
nondisruptive patch management, and online upgrade performance improvements. HCP supports media-free
and remote upgrades, HTTP or REST drain mode, and parallel operating system (OS) installation. It also supports automatic online upgrade commit, offline upgrade duration estimate, enhanced monitoring and email
alerts, and other features.
Storage nodes can be added to an HCP system without causing any downtime. HCP also supports nondisruptive storage upgrades that allow online storage addition to SAIN systems without any data outage.
■■ Seamless application failover. This feature is supported by HCP systems in a replicated topology. It includes a seamless failover routing capability that enables direct integration with customer-owned load balancers by allowing HTTP requests to be serviced by any HCP system in a replication topology. Seamless domain name system (DNS) failover is an HCP built-in multisite load-balancing and high-availability technology that is ideal for cost-efficient, best-effort customer environments.
■■ Replication. If enabled, this feature provides a multitude of mechanisms that ensure data availability. The replica system can be used both as a source for disaster recovery and to maintain data availability by providing good object copies for protection and content verification services. If an object cannot be read from the primary system, HCP can try to read the object from the replica if the read-from-replica feature is enabled.
■■ Data security.
■■ Authentication of management and data access.
■■ Granular, multilayer data access permission scheme.
■■ IP filtering technology and protocol-specific access or deny lists.
■■ Secure Sockets Layer (SSL) for HTTP or WebDAV data access, management access, and replication.
■■ Node login prevention.
■■ Shredding policy and service.
■■ Autonomic technology refresh. This feature, implemented as the HCP migration service, enables organizations to maintain continuously operating content stores that allow them to preserve their digital content assets for the long term.
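The sketch below illustrates the DPL check referenced in the data protection level item above: it counts the nodes within a protection set that hold a copy of an object and reports whether copies must be created or removed. The names and structures are illustrative, not HCP internals.

# Conceptual sketch of a DPL compliance check within one protection set.
from typing import Dict

def dpl_repair_action(copies_by_node: Dict[str, int], dpl: int) -> str:
    """Compare existing per-node copies of one object against the required DPL."""
    # Each copy should live on a different node, so count nodes holding a copy.
    nodes_with_copy = [n for n, c in copies_by_node.items() if c > 0]
    if len(nodes_with_copy) < dpl:
        return f"create {dpl - len(nodes_with_copy)} additional copy/copies on other nodes"
    if len(nodes_with_copy) > dpl:
        return f"delete {len(nodes_with_copy) - dpl} extra copy/copies"
    return "compliant"

# Example: a DPL2 object with a single surviving copy needs one more copy.
print(dpl_repair_action({"node01": 1, "node02": 0, "node03": 0}, dpl=2))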
Cloud-Enabled Storage
The powerful, industry-leading capabilities of HCP make it well suited to the cloud storage space. An HCP-based infrastructure solution is sufficiently flexible to accommodate any cloud deployment model (public, private or hybrid) and simplify the migration to the cloud for both service providers and subscribers. HCP provides edge-to-core, secure multitenancy and robust management capabilities, and a host of features to optimize cloud storage operations.
HCP, in its role as an online data repository, is truly ready for a cloud-enabled market. While numerous HCP features
were already discussed earlier in this paper, the purpose of this section is to summarize those that contribute the
most to HCP cloud capabilities. They include:
■■ Large-scale multitenancy.
■■ Management segregation. HCP supports up to 1,000 tenants, each of which can be uniquely configured for
use by a separate cloud service subscriber.
■■ Data segregation. HCP supports up to 10,000 namespaces, each of which can be uniquely configured for a
particular application or workload.
■■ Massive scale.
■■ Petabyte repository offers 40PB of storage, 80 nodes, 32 billion user objects, and 15 million files per directory,
all on a single physical system.
■■ Best node density in the object storage industry supports 500TB per node and 400+ million objects per node. With fewer nodes, HCP requires less power, less cooling, and less floor space.
■■ Unparalleled expandability that allows organizations to “start small” and expand according to demand.
■■ Nodes and/or storage can be added to expand an HCP system’s storage and throughput capacity, without
disruptions. Multiple storage systems are supported by a single HCP system.
■■ Easy tenant and storage provisioning.
■■ Geographical dispersal and global accessibility.
■■ WAN-friendly REST interface for namespace data access and replication.
■■ Replication of content across multiple sites using advanced, flexible replication topologies.
■■ WAN-optimized, high-throughput data transfer.
■■ High availability.
■■ Fully redundant hardware.
■■ Automatic routing of client requests around hardware failures.
■■ Load balancing across all available hardware.
■■ Multiple REST interfaces. These interfaces include the REST API for namespace data access, the management API, and the metadata query API. REST is the technology of choice for cloud enablers and consumers. Some of the reasons for its popularity include high efficiency and low overhead, caching at both the client and the server, and API uniformity. In addition, its stateless nature accommodates the latencies of Internet access and potentially complex firewall configurations.
■■ Secure, granular access to tenants, namespaces and objects, which is crucial in any cloud environment. This
access is facilitated by the HCP multilayer, flexible permission mechanism, including object-level ACLs.
■■ Usage metering. HCP has built-in chargeback capabilities, indispensable for cloud use, to facilitate provider and
subscriber transactions. HCP also provides tools for 3rd-party vendors and customers to write to the API for easy
integration with the HDS solution for billing and reporting.
■■ Low-touch system that is self-monitoring, self-managing and self-healing. HCP features advanced monitoring, audit and reporting capabilities. HCP services can automatically repair issues if they arise.
■■ Support for multiple levels of service. This support is provided through HCP policies, service plans and quotas that can be configured for each tenant to help enforce service-level agreements (SLAs). It allows the platform to accommodate a wide range of subscriber use cases and business models on a single physical system.
■■ Edge-to-core solution. HCP, working in tandem with Hitachi Data Ingestor (HDI), provides an integrated edge-to-core solution for cloud storage deployments. HCP serves as the "engine" at the core of the HDS cloud
architecture. HDI resides at the edge of the storage cloud (for instance, at a remote office or subscriber site) and
serves as the “on-ramp” for application data to enter the cloud infrastructure. HDI acts as a local storage cache
while migrating data into HCP and maintaining links to stored content for later retrieval. Users and applications
interact with HDI at the edge of the cloud but perceive bottomless, backup-free storage provided by HCP at the
core.
E-Discovery, Compliance and Metadata Analysis
Custom metadata enables organizations to build massive unstructured data stores. It provides the means for faster, more accurate access to content and gives storage managers the meaningful information they need to process data efficiently and intelligently and to apply the right object policies to meet business, compliance and protection requirements. Regulatory compliance features include namespace retention mode (compliance and enterprise), retention classes, retention hold, automated content disposition, and privileged delete and purge. HCP search capabilities include support for e-discovery for litigation or audit purposes. On HCP, open APIs allow direct 3rd-party integration.
HCP supports search facilities that provide an interactive interface. The search console offers a structured environment for creating and executing queries (sets of criteria that each object in the search results must satisfy). Users can apply various selection criteria, such as objects stored before a certain date or larger than a specified size. Queries return metadata for objects included in the search result. This metadata can be used to retrieve the object. From the search console, users can open objects, perform bulk operations on objects (hold, release, delete, purge, privileged delete and purge, change owner, set ACL), and export search results in standard file formats for use as input to other applications.
The metadata query engine (MQE) is integrated with HCP and is always available in the HCP system. It is also used by the metadata query API, a programmatic interface for querying namespaces. The MQE index resides on designated logical volumes on the HCP storage nodes, sharing or not sharing the space on these volumes with the object data, depending on the type of system and volume configuration.
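As a hedged illustration of the metadata query API mentioned above, the sketch below sends a simple query to a tenant endpoint. The endpoint path ("/query"), the request body shape, the query expression, and the authentication are assumptions for illustration only; the real query syntax and response structure are defined by the HCP metadata query API documentation for a given release.

# Hypothetical sketch of a metadata query API request.
import requests

TENANT_URL = "https://europe.hcp.example.com/query"   # hypothetical tenant hostname and path

def find_recent_objects(user: str, password: str) -> dict:
    """Ask the query engine for objects matching a simple criterion."""
    body = {
        "object": {
            "query": "ingestTimeMs > 1367366400000",  # assumed query expression syntax
            "count": 100,
        }
    }
    resp = requests.post(TENANT_URL, json=body, auth=(user, password))  # assumed auth scheme
    resp.raise_for_status()
    return resp.json()   # the structure of the result set is defined by the API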
Search is enabled at both the tenant and namespace levels. Indexing is enabled on a per-namespace basis. Settings
at the system and namespace levels determine whether custom metadata is indexed in addition to system metadata and ACLs. If indexing of custom metadata is disabled, the MQE indexes do not include custom metadata. If a
namespace is not indexed at all, searches do not return any results for objects in this namespace.
Each object has an index setting that determines what content the metadata query engine indexes. If indexing is enabled for a namespace, MQE always indexes system metadata and ACLs, regardless of the index setting for an object. If the object's index setting is true, MQE also indexes its custom metadata.
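These rules translate directly into a small decision function. The sketch below is a conceptual restatement of the indexing behavior described above, not HCP code.

# Which parts of an object the metadata query engine would index.
def indexed_content(namespace_indexed: bool, object_index_setting: bool) -> set:
    if not namespace_indexed:
        return set()                      # nothing from this namespace appears in results
    parts = {"system metadata", "ACL"}    # always indexed for indexed namespaces
    if object_index_setting:
        parts.add("custom metadata")      # opted in per object via the index setting
    return parts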
System Fundamentals
Hardware Overview
An individual physical HCP instance, or HCP system, is not a single device; it is a collection of devices that, combined with HCP software, can provide all the features of an online object repository while tolerating node, disk and
other component failures.
From a hardware perspective, each HCP system consists of the following categories of components:
■■ Nodes (servers).
■■ Internal or SAN-attached storage.
■■ Networking components (switches and cabling).
■■ Infrastructure components (racks and power distribution units).
Storage nodes are the vital part of HCP. They store and manage the objects that reside in the physical system storage. The nodes are conventional off-the-shelf servers. Each node can have multiple internal physical drives and/or
connect to external Fibre Channel storage (SAN). In addition to using RAID and SAN technologies and a host of other
features to protect the data, HCP uses software mirroring to store the data and metadata for each object in multiple
locations on different nodes. For data, this feature is managed by the namespace DPL setting, which specifies the
number of copies of each object HCP must maintain in the repository to ensure the required level of data protection.
For metadata, this feature is managed by the MDPL, which is a system-wide setting.
A storage node runs the complete HCP software and serves as both a repository for objects and a gateway to the
data and metadata they contain. All runtime operations are distributed among the storage nodes, ensuring reliability
and performance.
HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes
(SAIN). RAIN systems use the internal storage in each node. SAIN systems use the external SAN storage. HCP is
offered as 2 products: HCP 300 (based on RAIN configuration) and HCP 500 (based on SAIN configuration).
HCP RAIN (HCP 300)
The nodes in an HCP 300 system are Hitachi Compute Rack 220 (CR 220) servers. RAIN nodes contain internal
storage: RAID controller and disks. All nodes use hardware RAID-5 data protection. In an HCP RAIN system, the
physical disks in each node form a single RAID group, normally RAID-5 (5D+1P) (see Figure 3). This helps ensure the
integrity of the data stored on each node.
WHITE PAPER 13
An HCP 300 (RAIN) system must have a minimum of 4 storage nodes. Additional storage nodes are added in
4-node increments. An HCP 300 system can have a maximum of 20 nodes.
HCP 300 systems are normally configured with a DPL setting of 2 (DPL2), which, coupled with hardware RAID-5,
yields an effective RAID-5+1 total protection level.
Figure 3. HCP 300 Hardware Architecture
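As a rough capacity illustration of the protection layers just described, hardware RAID-5 (5D+1P) keeps 5/6 of raw capacity usable, and DPL2 then stores two copies of every object. The raw capacity figure below is hypothetical, and the calculation ignores metadata copies (MDPL), spares and file system overhead, so it is only an approximation.

# Rough arithmetic for RAID-5 (5D+1P) plus DPL2 overhead on an HCP 300 system.
raw_tb = 100.0                      # hypothetical raw disk capacity across all nodes
raid5_usable = raw_tb * 5 / 6       # 5 data disks per 6-disk RAID group
effective_user_data = raid5_usable / 2   # DPL2: two copies of each object

print(f"usable after RAID-5 : {raid5_usable:.1f} TB")
print(f"user data at DPL2   : {effective_user_data:.1f} TB "
      f"({effective_user_data / raw_tb:.0%} of raw)")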
HCP SAIN (HCP 500/500XL)
The nodes in an HCP 500 system are either Hitachi Compute Rack 220 (CR 220) servers or blades in Hitachi
Compute Blade 320 (CB 320) servers. The HCP 500 nodes contain Fibre Channel host bus adapters (HBAs) and
use external Fibre Channel SAN storage; they are diskless servers that boot from the SAN-attached storage.
The nodes in a SAIN system can have internal storage in addition to being connected to external storage. These
nodes are called HCP 500XL nodes. They are an alternative to the standard HCP 500 nodes and have the same
hardware configuration, except the addition of the RAID controller and internal hard disk drives. In HCP 500XL nodes,
the system metadata database resides on the local disks, which leads to more efficient and faster database operations. As a result, the system can better support larger capacity and higher object counts per node and address higher performance requirements.
A typical 500XL node internal storage configuration includes six 500GB 7200RPM SATA II drives in a single RAID-5
(5D+1P) RAID group, with 2 LUNs: 31GB (operating system) and 2.24TB (database). The HCP 500XL nodes are usually considered when the system configuration exceeds 4 standard nodes.
HCP 500 and 500XL (SAIN) systems are supported with a minimum of 4 storage nodes. With a SAIN system, additional storage nodes are added in pairs, so the system always has an even number of storage nodes. A SAIN system
can have a maximum of 80 nodes.
Both RAIN and SAIN systems can have a DPL as high as 4, which affords maximum data availability but greatly
sacrifices storage utilization. Typically, the external SAN-attached storage uses RAID-6. The best protection and highest availability of an HCP 500 system are achieved by giving each node its own RAID group or Hitachi Dynamic Provisioning (HDP) pool containing 1 RAID group.
Software Overview
HCP system software consists of an operating system (the appliance operating system) and core software. The core
software includes components that:
■■ Enable access to the object repository through the industry-standard HTTP or HTTPS, WebDAV, CIFS, NFS,
SMTP and NDMP protocols.
■■ Ingest fixed-content data, convert it into HCP objects, and manage the objects' data and metadata over time.
■■ Maintain the integrity, stability, availability and security of stored data by enforcing repository policies and executing
system services.
■■ Enable configuration, monitoring and management of the HCP system through a human-readable interface.
■■ Support searching the repository through an interactive Web interface (the search console) and a programmatic
interface (the metadata query API).
System Organization
HCP is a fully symmetric, distributed application that stores and manages objects (see Figure 4). An HCP object
encapsulates the raw fixed-content data that is written by a client application, and its associated system and
custom metadata. Each node in an HCP system is a Linux-based server that runs a complete HCP instance. The
HCP system can withstand multiple simultaneous node failures, and acts automatically to ensure that all object and
namespace policies are valid.
Figure 4. The High-Level Structure of an HCP System
External system communication is managed by the DNS manager, a distributed network component that balances
client requests across all nodes to ensure maximum system throughput and availability. The DNS manager works in
conjunction with a corporate DNS server to allow clients to access the system as a single entity, even though the
system is made up of multiple independent nodes.
The HCP system is configured as a subdomain of an existing corporate domain. Clients access the system using
predefined protocol-specific or namespace-specific names.
While not required, using DNS is important in ensuring balanced and problem-free client access to an HCP system,
especially for the HTTP or REST clients.
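The sketch below shows how a client might address the system through DNS, assuming a <namespace>.<tenant>.<hcp-system-domain> naming convention; the exact protocol-specific and namespace-specific names are defined when the HCP subdomain is configured, so the names here are placeholders.

# Sketch of DNS-based, namespace-specific access to an HCP system.
import socket

HCP_SYSTEM_DOMAIN = "hcp.example.com"    # subdomain delegated by the corporate DNS (placeholder)

def namespace_host(namespace: str, tenant: str) -> str:
    # Assumed naming convention: <namespace>.<tenant>.<hcp-system-domain>
    return f"{namespace}.{tenant}.{HCP_SYSTEM_DOMAIN}"

host = namespace_host("finance", "europe")
try:
    # The corporate DNS server, working with the HCP DNS manager, resolves this
    # single name so requests are balanced across the nodes of the system.
    print(host, "->", socket.gethostbyname(host))
except socket.gaierror:
    print(host, "does not resolve in this environment (placeholder domain)")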
Namespaces and Tenants
Main Concepts
An HCP repository is partitioned into namespaces. A namespace is a logical repository as viewed by an application. Each namespace consists of a distinct logical grouping of objects with its own directory structure, such that the
objects in one namespace are not visible in any other namespace. Access to one namespace does not grant a user
access to any other namespace. To the user of a namespace, the namespace is the repository. Namespaces are not
associated with any preallocated storage; they share the same underlying physical storage. Namespaces provide a
mechanism for separating the data stored for different applications, business units or customers. For example, there
may be one namespace for accounts receivable and another for accounts payable. While a single namespace can
host one or more applications, it typically hosts only one application. Namespaces also enable operations to work
against selected subsets of repository objects. For example, a search could target the accounts receivable and
accounts payable namespaces but not the employees namespace.
Figure 5 shows the logical structure of an HCP system with respect to its multitenancy features.
Figure 5. HCP System Logical Layout: Namespaces and Tenants
Namespaces are owned and managed by tenants. Tenants are administrative entities that provide segregation of management, while namespaces offer segregation of data. A tenant typically represents an actual organization,
such as a company or a department within a company that uses a portion of a repository. A tenant can also correspond to an individual person. Namespace administration is done at the owning tenant level.
Clients can access HCP namespaces through HTTP or HTTPS, WebDAV, CIFS, NFS and SMTP protocols. These
protocols can support authenticated and/or anonymous types of access (types of access and their combinations are
discussed in more detail later in this document). HCP namespaces are owned by HCP tenants. An HCP system can
have multiple HCP tenants, each of which can own multiple namespaces. The number of namespaces each HCP
tenant can own can be limited by an administrator.
User and Group Accounts
User and group accounts control access to various HCP interfaces and give users permission to perform administrative tasks and access namespace content.
An HCP user account is defined in HCP; it has a set of credentials, a username and password, which are stored locally in the system. The HCP system uses these credentials to authenticate a user, performing local authentication.
An HCP group account is a representation of an Active Directory (AD) group. To create group accounts, HCP must be configured to support Active Directory. A group account enables AD users in the corresponding AD group to access one or more of the HCP interfaces.
Like HCP user accounts, HCP group accounts are defined separately at the system and tenant levels. Different tenants have different user and group accounts. These accounts cannot be shared across tenants. Group membership
is different at the system and tenant levels.
HCP administrative roles can be associated with both system-level and tenant-level user and group accounts. Data access permissions can be associated with only tenant-level user and group accounts. Consequently, system-level local and AD users can only be administrative users, while tenant-level local and AD users can both be administrative users and have data access permissions. Tenant-level users can have only administrative roles without namespace data permissions, only namespace data permissions without administrative roles, or any combination of administrative roles and namespace data permissions.
System and Tenant Management
The implementation of segregation of management in the HCP system is illustrated in Figure 6.
An HCP system has both system-level and tenant-level administrators:
■■ System-level administrative accounts are used for configuring system-wide features, monitoring system hardware and software and overall repository usage, and managing system-level users. The system administrator user interface, the system management console, provides the functionality needed by the maintainer of the physical HCP system. For example, it allows the maintainer to shut down the system, see information about nodes, manage policies and services, and create HCP tenants. System administrators have a view of the system as a whole, including all of the HCP software and hardware that make up the system, and can perform all administration actions that have system scope.
■■ Tenant-level administrative accounts are used for creating HCP namespaces. They can configure individual tenants and namespaces, monitor namespace usage at the tenant and namespace level, manage tenant-level users, and control access to namespaces. The required functionality is provided by the tenant administrator user interface, the tenant management console. This interface is intended for use by the maintainer of the virtual HCP system (an individual tenant with the set of namespaces it owns). The tenant-level administration feature facilitates segregation of management, which is essential in cloud environments.
An HCP tenant can optionally grant system-level users administrative access to itself. In this case, system-level
users with the monitor, administrator, security or compliance role can log into the tenant management console or
use the HCP management API for that tenant. System-level users with the monitor or administrator role can also
access the tenant management console directly from the system management console. This effectively enables a
system administrator to function as a tenant administrator, as shown in Figure 6. System-level users can perform all
the activities allowed by the tenant-level roles that correspond to their system-level roles. An AD user may belong
to AD groups for which the corresponding HCP group accounts exist at both the system and tenant levels. This
user has the roles associated with both the applicable system-level group accounts and the applicable tenant-level
group accounts.
Policies
Objects in a namespace have a variety of properties, such as the retention setting or index setting. These properties are defined for each object by the object system metadata. Objects can also be affected by some namespace
properties, such as the default metadata settings that are inherited by new objects stored in the namespace, or the
versioning setting. Both the namespace-level settings and the properties that are part of the object metadata serve as
parameters for the HCP system’s transactions and services, and determine the object’s behavior during its life cycle
within the repository. These settings are called policies.
An HCP policy is one or more settings that influence how transactions and internal processes (services) affect
objects in a namespace. Policies ensure that objects behave in expected ways.
The HCP policies are described in Table 1.
Table 1. HITACHI CONTENT PLATFORM Policies
Policy Name | Policy Description and Components | Transactions and Services Influenced
DPL | System DPL setting, namespace DPL setting. | Object creation. Protection service.
Retention | Default retention setting, object retention setting, hold setting, system metadata and custom metadata options for objects under retention. | Object creation, object deletion, system and custom metadata handling. Disposition, garbage collection services.
Shredding | Default shred setting, object shred setting. | Object deletion. Shredding service.
Indexing | Default index setting, object index setting. | MQE.
Versioning | Versioning setting, pruning setting. | Object creation and deletion. Garbage collection service.
Custom Metadata Validation | XML syntax validation. | Add/replace custom metadata operations.
Each policy may consist of one or more settings that may have different scopes of application and methods of configuration. Policy settings are defined at the object and the namespace level. Note that the same policy setting may
be set at different levels depending on the namespace. The default retention, shred and index settings are set at the
namespace level in HCP namespaces.
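The relationship between an object-level setting and its namespace default can be summarized in one small function: the object metadata wins when present, otherwise the namespace default applies. This is a conceptual sketch of the behavior described above, not HCP's internal model.

# Effective policy setting: object-level value overrides the namespace default.
from typing import Optional

def effective_setting(object_setting: Optional[str], namespace_default: str) -> str:
    return object_setting if object_setting is not None else namespace_default

# Example: a namespace default retention offset, with one object overriding it.
print(effective_setting(None, "offset: +2 years"))                     # namespace default applies
print(effective_setting("fixed date: 2023-12-31", "offset: +2 years")) # object setting wins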
Table 2 lists all policy settings sorted according to their scope and method of configuration.
Table 2. HITACHI CONTENT PLATFORM Policy Settings: Scope and Configuration
Policy | Policy Setting | Scope/Level (HCP Namespaces) | Configured Via
Data Protection Level | System DPL: 1-4 | System | System UI
Data Protection Level | Namespace DPL: 1-4, dynamic | Namespace | Tenant UI, MAPI
Retention | Default retention setting: fixed date, offset, special value, retention class | Namespace | Tenant UI, MAPI
Retention | Retention setting: fixed date, offset, special value, retention class | Object | REST API, retention.txt
Retention | Hold setting: true or false | Object | REST API
Retention | Ownership and POSIX permission changes under retention: true or false | Namespace | Tenant UI, MAPI
Retention | Custom metadata operations allowed under retention | Namespace | Tenant UI, MAPI
Indexing | Index setting: true or false (1/0) | Object | REST API, index.txt
Indexing | Default index setting: true or false | Namespace | Tenant UI, MAPI
Shredding | Shred setting: true or false (1/0) | Object | REST API, shred.txt
Shredding | Default shred setting: true or false | Namespace | Tenant UI, MAPI
Custom Metadata Validation | XML validation: true or false | Namespace | Tenant UI, MAPI
Versioning | Versioning setting: true or false | Namespace | Tenant UI, MAPI
Versioning | Pruning setting: true/false and number of days for primary or replica | Namespace | Tenant UI, MAPI
Content Management Services
A Hitachi Content Platform service is a background process that performs a specific function that is targeted at
preserving and improving the overall health of the HCP system. In particular, services are responsible for optimizing
the use of system resources and maintaining the integrity and availability of the data stored in the HCP repository.
HCP implements 12 services: protection, content verification, scavenging, garbage collection, duplicate elimination,
shredding, disposition, compression, capacity balancing, storage tiering, migration and replication.
HCP services are briefly described in Table 3.
Table 3. HITACHI CONTENT PLATFORM Services
Service Description
Protection Enforces DPL policy compliance by ensuring that the proper number of copies of each object exists in the system, and that damaged or lost objects can be recovered. Any policy violation invokes a repair process. Offers both scheduled and event-driven service. Events trigger a full service run, even if the service is disabled, after a configurable amount of time: 90 minutes after node shutdown; 1 minute after logical volume failure; 10 minutes after node removal.
Content Verification Guarantees data integrity of repository objects by ensuring that the content of a file matches its digital signature.
Repairs the object if the hash does not match. Detects and repairs discrepancies between primary and secondary
metadata. SHA-256 hash algorithm is used by default. Checksums are computed on external and internal files.
Computationally intensive and time-consuming service. Runs according to the active service schedule.
Scavenging Ensures that all objects in the repository have valid metadata, and reconstructs metadata in case the metadata is
lost or corrupted, but data files exist. The service verifies that both the primary metadata for each data object and
the copies of the metadata stored with the object data (secondary metadata) are complete, valid and in sync with
each other. Computationally intensive and time-consuming service. Scheduled service.
Garbage Collection Reclaims storage space by purging hidden data and metadata for objects marked for deletion, or left behind by
incomplete transactions. It also deletes old versions of objects that are eligible for pruning. When applicable, the
deletion triggers the shredding service. Scheduled service, not event driven.
Duplicate Elimination Identifies and eliminates redundant objects in the repository, and merges duplicate data to free space. The hash
signature of external file representations is used to select objects as input to the service. These objects are then
checked in a byte for byte manner to ensure that the data contents are indeed identical. Scheduled service.
Shredding Overwrites storage locations where copies of the deleted object were stored in such a way that none of its data
or metadata can be reconstructed, for security reasons. Also called secure deletion. The default HCP shredding
algorithm uses 3 passes to overwrite an object and is DoD 5220.22-M standard compliant. The algorithm is
selected at install time. Event-driven only service, not scheduled. It is triggered by the deletion of an object marked
for shredding.
Disposition Automatic cleanup of expired objects. All HCP namespaces can be configured to automatically delete objects
after their retention period expires. Can be enabled or disabled both at the system and namespace level; enabling
disposition for a namespace has no effect if the service is disabled at the system level. Disposition service deletes
only current versions of versioned objects. Scheduled service.
Compression Compresses object data to make more efficient use of system storage space. The space reclaimed by
compression can be used for additional storage. A number of configurable parameters are provided via System
Management Console. Scheduled service.
Capacity Balancing Attempts to keep the usable storage capacity balanced (roughly equivalent) across all storage nodes in the
system. If storage utilization for the nodes differs by a wide margin, the service moves objects around to bring
the nodes closer to a balanced state. Runs only when started manually. Additions and deletions of objects do
not trigger the service. Typically, an authorized HCP service provider starts this service after adding new storage
nodes to the system. In addition, while not part of the service, during normal system operation new objects tend
to naturally spread among all storage nodes in the system in fairly even proportion. This is due to the nature of the
storage manager selection algorithm and resource monitoring of the administrative engine.
Storage Tiering Determines which storage tiering strategy applies to an object, evaluates where the copies of the object should
reside based on the rules in the applied service plan, and moves objects between running and spin-down storage
as needed. Active only in spindown-capable HCP SAIN systems. Scheduled service.
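The event-driven behavior of the protection service listed in Table 3 can be restated as a small lookup: each triggering event schedules a full service run after a fixed delay, even if the service is otherwise disabled. The delays below simply restate the table; the scheduling mechanism itself is a conceptual sketch.

# When a full protection service run would start after a given event (from Table 3).
from datetime import datetime, timedelta

PROTECTION_TRIGGER_DELAY = {
    "node shutdown": timedelta(minutes=90),
    "logical volume failure": timedelta(minutes=1),
    "node removal": timedelta(minutes=10),
}

def protection_run_time(event: str, occurred_at: datetime) -> datetime:
    return occurred_at + PROTECTION_TRIGGER_DELAY[event]

print(protection_run_time("logical volume failure", datetime(2013, 5, 1, 12, 0)))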
Conclusion
Hitachi Data Systems object storage solutions avoid the limitations of traditional file systems by intelligently storing
content in far larger quantities and in a much more efficient manner. These solutions provide for the new demands
imposed by the explosion of unstructured data and its growing importance to organizations, their partners, their
customers, their governments and their shareholders.
The Hitachi Data Systems object storage solutions treat file data, file metadata and custom metadata as a single object that is tracked and stored among a variety of storage tiers. With secure multitenancy and configurable attributes for each logical partition, the object store can be divided into a number of smaller virtual object stores that present configurable attributes to support different service levels. This allows the object store to support a wide range of workloads, such as content preservation, data protection, content distribution and even cloud, from a single physical infrastructure. One infrastructure is far easier to manage than disparate silos of technology for each application or set of users. By integrating many key technologies in a single storage platform, Hitachi Data Systems object storage solutions provide a path to short-term return on investment and significant long-term efficiency improvements. They help IT evolve to meet new challenges, stay agile over the long term and address future change and growth.
© Hitachi Data Systems Corporation 2013. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Microsoft, Windows and Active Directory are trademarks or
registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by
Hitachi Data Systems Corporation.
WP-425-B DG May 2013
Corporate Headquarters
2845 Lafayette Street
Santa Clara, CA 95050-2639 USA
www.HDS.com
Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hds.com
Asia Pacific: +852 3189 7900 or hds.marketing.apac@hds.com

Meet the unique challenges of dicom hl7 data access, data consolidation, data...Hitachi Vantara
 
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingHitachi Vantara
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfan
 
Nuxeo Fact Sheet
Nuxeo Fact SheetNuxeo Fact Sheet
Nuxeo Fact SheetNuxeo
 
Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)Vikas Deolaliker
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object Sandeep Patil
 
Hitachi Content Platform Datasheet
Hitachi Content Platform DatasheetHitachi Content Platform Datasheet
Hitachi Content Platform DatasheetHitachi Vantara
 
Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile Hitachi Vantara
 
Spectrum scale-external-unified-file object
Spectrum scale-external-unified-file objectSpectrum scale-external-unified-file object
Spectrum scale-external-unified-file objectSandeep Patil
 
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...dbpublications
 
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...janaskhoj
 
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET Journal
 
Hitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentalsHitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentalsilknurd
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceChris Nauroth
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed_Hat_Storage
 
hitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-reporthitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-reportIngrid Fernandez, PhD
 
Object storage
Object storageObject storage
Object storageronpoul
 

Similar to Introduction to Object Storage Solutions White Paper (20)

Hitachi content platform custom object metadata enhancement tool
Hitachi content platform custom object metadata enhancement toolHitachi content platform custom object metadata enhancement tool
Hitachi content platform custom object metadata enhancement tool
 
Meet the unique challenges of dicom hl7 data access, data consolidation, data...
Meet the unique challenges of dicom hl7 data access, data consolidation, data...Meet the unique challenges of dicom hl7 data access, data consolidation, data...
Meet the unique challenges of dicom hl7 data access, data consolidation, data...
 
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File SharingESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
ESG - HDS HCP Anywhere Easy, Secure, On-Premises File Sharing
 
paper
paperpaper
paper
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -
 
Nuxeo Fact Sheet
Nuxeo Fact SheetNuxeo Fact Sheet
Nuxeo Fact Sheet
 
Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)Storage 2.0 (Unstructured Data)
Storage 2.0 (Unstructured Data)
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object
 
Hitachi Content Platform Datasheet
Hitachi Content Platform DatasheetHitachi Content Platform Datasheet
Hitachi Content Platform Datasheet
 
Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile Hitachi Cloud Solutions Profile
Hitachi Cloud Solutions Profile
 
Spectrum scale-external-unified-file object
Spectrum scale-external-unified-file objectSpectrum scale-external-unified-file object
Spectrum scale-external-unified-file object
 
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
BFC: High-Performance Distributed Big-File Cloud Storage Based On Key-Value S...
 
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
Archiving as a Service - A Model for the Provision of Shared Archiving Servic...
 
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication System
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
Hitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentalsHitachi content-platform-architecture-fundamentals
Hitachi content-platform-architecture-fundamentals
 
Storage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduceStorage and-compute-hdfs-map reduce
Storage and-compute-hdfs-map reduce
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph Storage
 
hitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-reporthitachi-content-platform-portfolio-esg-validation-report
hitachi-content-platform-portfolio-esg-validation-report
 
Object storage
Object storageObject storage
Object storage
 

More from Hitachi Vantara

Webinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartWebinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartHitachi Vantara
 
Hyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHitachi Vantara
 
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsPowering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsHitachi Vantara
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Hitachi Vantara
 
Virtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationVirtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationHitachi Vantara
 
HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) Hitachi Vantara
 
Cloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicCloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicHitachi Vantara
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceHitachi Vantara
 
Economist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudEconomist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudHitachi Vantara
 
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...Hitachi Vantara
 
Information Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsInformation Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsHitachi Vantara
 
Redefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureRedefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureHitachi Vantara
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHitachi Vantara
 
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicDefine Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicHitachi Vantara
 
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi Vantara
 
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...Hitachi Vantara
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperHitachi Vantara
 
HitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitachi Vantara
 
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Hitachi Vantara
 
The Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperThe Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperHitachi Vantara
 

More from Hitachi Vantara (20)

Webinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City SmartWebinar: What Makes a Smart City Smart
Webinar: What Makes a Smart City Smart
 
Hyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital TransformationHyperconverged Systems for Digital Transformation
Hyperconverged Systems for Digital Transformation
 
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data SystemsPowering the Enterprise Cloud with CSC and Hitachi Data Systems
Powering the Enterprise Cloud with CSC and Hitachi Data Systems
 
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
Virtualizing SAP HANA with Hitachi Unified Compute Platform Solutions: Bring...
 
Virtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview PresentationVirtual Infrastructure Integrator Overview Presentation
Virtual Infrastructure Integrator Overview Presentation
 
HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol) HDS and VMware vSphere Virtual Volumes (VVol)
HDS and VMware vSphere Virtual Volumes (VVol)
 
Cloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards InfographicCloud Adoption, Risks and Rewards Infographic
Cloud Adoption, Risks and Rewards Infographic
 
Five Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud ExperienceFive Best Practices for Improving the Cloud Experience
Five Best Practices for Improving the Cloud Experience
 
Economist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation CloudEconomist Intelligence Unit: Preparing for Next-Generation Cloud
Economist Intelligence Unit: Preparing for Next-Generation Cloud
 
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...HDS Influencer Summit 2014: Innovating with Information to Address Business N...
HDS Influencer Summit 2014: Innovating with Information to Address Business N...
 
Information Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research ResultsInformation Innovation Index 2014 UK Research Results
Information Innovation Index 2014 UK Research Results
 
Redefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud InfrastructureRedefine Your IT Future With Continuous Cloud Infrastructure
Redefine Your IT Future With Continuous Cloud Infrastructure
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
 
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist InfographicDefine Your Future with Continuous Cloud Infrastructure Checklist Infographic
Define Your Future with Continuous Cloud Infrastructure Checklist Infographic
 
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platformHitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
Hitachi white-paper-future-proof-your-datacenter-with-the-right-nas-platform
 
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
IDC Analyst Connection: Flash, Cloud, and Software-Defined Storage: Trends Di...
 
Solve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White PaperSolve the Top 6 Enterprise Storage Issues White Paper
Solve the Top 6 Enterprise Storage Issues White Paper
 
HitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution ProfileHitVirtualized Tiered Storage Solution Profile
HitVirtualized Tiered Storage Solution Profile
 
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
Use Case: Large Biotech Firm Expands Data Center and Reduces Overheating with...
 
The Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White PaperThe Next Evolution in Storage Virtualization Management White Paper
The Next Evolution in Storage Virtualization Management White Paper
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Introduction to Object Storage Solutions White Paper

With support for thousands of tenants, tens of thousands of namespaces, and petabytes of capacity in one system, HCP is truly cloud-ready.

Main Concepts and Features

Object-Based Storage
Hitachi Content Platform, as a general-purpose object store, allows unstructured data files to be stored as objects. An object is essentially a container that includes both file data and associated metadata that describes the data. The objects are stored in a repository. The metadata is used to define the structure and administration of the data. HCP can also leverage object metadata to apply specific management functions, such as storage tiering, to each object. The objects have intelligence that enables them to automatically take advantage of advanced storage and data management features to ensure proper placement and distribution of content.

HCP architecture isolates stored data from the hardware layer. Internally, ingested files are represented as objects that encapsulate both the data and metadata required to support applications. Externally, HCP presents each object either as a set of files in a standard directory structure or as a uniform resource locator (URL) accessible by users and applications via HTTP/HTTPS.

HCP stores objects in a repository. Data that is ingested and stored in the repository is permanently associated with the information about that data, called metadata. Each data object encapsulates both object data and metadata, and is treated within HCP as a single unit for all intents and purposes.

Object Structure
An HCP repository object is composed of file data and the associated metadata, which in turn consists of system metadata and, optionally, custom metadata and an access control list (ACL). The structure of the object is shown in Figure 1.

File data is an exact digital copy of the actual file contents at the time of its ingestion. If the object is under retention, it cannot be deleted before the expiration of its retention period, except when using a special privileged operation. If versioning is enabled, multiple versions of a file can be retained. If appendable objects are enabled, data can be appended to an object (with the CIFS or NFS protocols) without modifying the original fixed-content data.
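To make the HTTP/HTTPS access model concrete, the following is a minimal Python sketch of storing a file as an object and reading it back through a REST-style namespace URL. The hostname, directory layout, credentials and authentication scheme are illustrative assumptions, not the documented HCP interface.

import requests

# Hypothetical namespace endpoint: namespace "ns1" under tenant "finance" on an
# HCP system reachable at hcp.example.com. URL layout and credentials are
# placeholders used only to illustrate the access pattern.
BASE = "https://ns1.finance.hcp.example.com/rest"
AUTH = ("svc_user", "svc_password")

# Ingest a local file as an object. Once stored, the object data is WORM:
# it cannot be modified in place, only versioned (if enabled) or deleted.
with open("invoice-2013-05.pdf", "rb") as f:
    resp = requests.put(f"{BASE}/invoices/2013/invoice-2013-05.pdf",
                        data=f, auth=AUTH)
resp.raise_for_status()

# Read the object back; any storage node in the system can serve the request.
obj = requests.get(f"{BASE}/invoices/2013/invoice-2013-05.pdf", auth=AUTH)
print(obj.status_code, len(obj.content))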
Figure 1. HCP Object

Metadata is system or user generated data that describes the fixed-content data of an object and defines the object's properties. System metadata, the system-managed properties of the object, includes HCP-specific metadata and POSIX metadata. HCP-specific metadata includes the date and time the object was added to the namespace (ingest time), the date and time the object was last changed (change time), the cryptographic hash value of the object along with the namespace hash algorithm used to generate that value, and the protocol through which the object was ingested. It also includes the object's policy settings, such as data protection level (DPL), retention, shredding, indexing, and, for HCP namespaces only, versioning. POSIX metadata includes a user ID and group ID, a POSIX permissions value, and POSIX time attributes.

Custom metadata is optional, user-supplied descriptive information about a data object that is usually provided as well-formed XML. It is typically intended for a more detailed description of the object. This metadata can also be used by future users and applications to understand and repurpose the object content. HCP supports multiple custom metadata fields for each object.

ACL is optional, user-provided metadata containing a set of permissions granted to users or user groups to perform operations on an object. ACLs are supported only in HCP namespaces.

The complete metadata structure, as supported in HCP namespaces, is shown in Figure 2. It includes all metafiles supported by HCP for objects, which were generated for the sample data structure (assuming that custom metadata and ACLs were added for each object).
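As a sketch of the custom metadata described above, the example below attaches a small well-formed XML annotation to an existing object. The element names, the endpoint, and the query parameter used to address the custom metadata are assumptions made for illustration.

import requests

# Invented XML annotation describing the object; any well-formed XML could be
# used, and the element names here are purely illustrative.
custom_md = """<?xml version="1.0" encoding="UTF-8"?>
<invoice>
  <customer>ACME Corp</customer>
  <region>EMEA</region>
  <scanned-by>branch-042</scanned-by>
</invoice>"""

# Hypothetical request attaching the annotation to an already stored object;
# the "type=custom-metadata" parameter is an assumed spelling.
resp = requests.put(
    "https://ns1.finance.hcp.example.com/rest/invoices/2013/invoice-2013-05.pdf",
    params={"type": "custom-metadata"},
    data=custom_md.encode("utf-8"),
    auth=("svc_user", "svc_password"),
)
print(resp.status_code)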
Figure 2. HCP Namespace: Complete Metadata Structure

Distributed Design
An HCP system consists of both hardware and software and comprises many different components that are connected together to form a robust, scalable architecture for object-based storage. HCP runs on an array of servers, or nodes, that are networked together to form a single physical instance. Each node is a storage node. Storage nodes store data objects. All runtime operations and physical storage, including data and metadata, are distributed among the storage nodes. All objects in the repository are distributed across all available storage space but still presented as files in a standard directory structure. Objects that are physically stored on any particular node are available from all other nodes.

Open Architecture
HCP has an open architecture that insulates stored data from technology changes, as well as from changes in HCP itself due to product enhancements. This open architecture ensures that users will have access to the data long after it has been added to the repository. HCP acts as both a repository that can store customer data and an online portal that enables access to that data by means of several industry-standard interfaces, as well as through an integrated search facility, Hitachi Data Discovery Suite (HDDS). The HTTP or HTTPS, WebDAV, CIFS and NFS protocols support various operations. These operations include storing data, creating and viewing directories, viewing and retrieving objects and their metadata, modifying object metadata, and deleting objects. Objects that were added using any protocol are immediately accessible through any other supported protocol. These protocols can be used to access the data with a Web browser, the HCP client tools, 3rd-party applications, Microsoft® Windows® Explorer, or native Windows or UNIX tools. HCP allows special-purpose access to the repository through the SMTP protocol, which is used only for storing email. For data backup and restore, HCP supports the NDMP protocol.
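Because objects added through any protocol are visible through every other supported protocol, a file ingested over HTTP in the earlier sketch could also be read with ordinary file I/O once the namespace is exported over NFS. The mount point and path in this short sketch are hypothetical.

# Assumed NFS mount of the same namespace at /mnt/hcp_ns1; the mount point and
# directory layout are illustrative only.
path = "/mnt/hcp_ns1/invoices/2013/invoice-2013-05.pdf"
with open(path, "rb") as f:
    header = f.read(4)
print(header == b"%PDF")  # the object ingested over HTTP reads back unchanged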
Multitenancy
Multitenancy support allows the repository in a single physical HCP instance to be partitioned into multiple namespaces. A namespace is a logical partition that contains a collection of objects particular to one or more applications. Each namespace is a private object store that is represented by a separate directory structure and has a set of independently configured attributes. Namespaces provide segregation of data, while tenants, or groupings of namespaces, provide segregation of management. An HCP system can have up to 1,000 tenants. Each tenant and its set of namespaces constitute a virtual HCP system that can be accessed and managed independently by users and applications. This HCP feature is essential in enterprise, cloud and service-provider environments. Data access to HCP namespaces can be either authenticated or nonauthenticated, depending on the type and configuration of the access protocol. Authentication can be performed using HCP local accounts or Microsoft Active Directory® groups.

Object Versioning
HCP supports object versioning, which is the capability of a namespace to create, store and manage multiple versions of objects in the HCP repository. This ability provides a history of how the data has changed over time. Versioning facilitates storage and replication of evolving content, thereby creating new opportunities for HCP in markets such as content depots and workflow applications. Versioning is available in HCP namespaces and is configured at the namespace level. Versioning is only supported with HTTP or REST. Other protocols cannot be enabled if versioning is enabled for the namespace. Versioning applies only to objects, not to directories or symbolic links. A new version of an object is created when an object with the same name and location as an existing object is added to the namespace. A special type of version, called a deleted version, is created when an object is deleted. Updates to the object metadata affect only the current version of an object and do not create new versions. Previous versions of objects that are older than a specified amount of time can be automatically deleted, or pruned. It is not possible to delete specific historical versions of an object; however, a user or application with appropriate permissions can purge the object to delete all its versions, including the current one.
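To make the versioning behavior concrete, the following sketch asks a namespace for the version history of a single object over HTTP. The endpoint, credentials and the version-listing query parameter are assumptions used only to illustrate the idea of per-object version listings.

import requests

# Hypothetical request for an object's version history; the "version=list"
# parameter is an assumed spelling, not quoted from HCP documentation.
resp = requests.get(
    "https://ns1.finance.hcp.example.com/rest/invoices/2013/invoice-2013-05.pdf",
    params={"version": "list"},
    auth=("svc_user", "svc_password"),
)
print(resp.status_code)
print(resp.text[:500])  # version identifiers and timestamps, if returned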
Spin-Down and Storage Tiering
HCP implements spin-down disk support as an early step towards the long-term goal of supporting information lifecycle management (ILM) and intelligent objects. In the near term, the goal of the HCP spin-down feature is to take advantage of the energy savings potential of the spin-down technology. HCP spindown-capable storage is based on the power savings feature of Hitachi midrange storage systems and is a core element of the new storage tiering functionality, which is implemented as an HCP service. According to the storage tiering strategy specified by customers, the storage tiering service identifies objects that are eligible to reside on spin-down storage and moves them to and from the spin-down storage as needed. Tiering selected content to spindown-enabled storage lowers overall cost by reducing energy consumption for large-scale unstructured data storage, such as deep archives and disaster recovery sites.

Storage tiering can be used very effectively with customer-identified "dark data" (rarely accessed data) or data replicated for disaster recovery by moving that data to spin-down storage some time after ingestion or replication. Customer sites where data protection is critical can use storage tiering to move all redundant data copies to spin-down storage, which makes the cost of keeping data protection copies competitive with a tape solution. Storage tiering also enables service providers to use a turnkey framework to offer differentiated object data management plans. This capability further enhances HCP as an attractive target for fixed content, especially for archive-oriented use cases where tape may be considered an alternative.
Search
HCP provides the only integrated metadata query engine on the market. HCP includes comprehensive search capabilities that enable users to search for objects in namespaces, analyze namespace contents, and manipulate groups of objects. To satisfy government requirements, HCP supports e-discovery for audits and litigation. The metadata query engine is always available in any HCP system, but the content search facility requires installation of a separate HDS product, Hitachi Data Discovery Suite.

Replication
Replication, an add-on feature to HCP, is the process that keeps selected tenants and namespaces in 2 or more HCP systems in sync with each other. The replication service copies one or more tenants or namespaces from one HCP system to another, propagating object creations, object deletions, and metadata changes. HCP also replicates tenant and namespace configuration, tenant-level user accounts, compliance and tenant log messages, and retention classes. The HCP system in which the objects are initially created is called the primary system. The 2nd system is called the replica. Typically, the primary system and the replica are in separate geographic locations and connected by a high-speed wide area network. HCP supports different replication topologies, including many-to-one and chain configurations.

Common Use Cases

Fixed-Content Archiving
Hitachi Content Platform is optimized for fixed-content data archiving. Fixed-content data is information that does not change but must be kept available for future reference and be easily accessible when needed. A fixed-content storage system is one in which the data cannot be modified. HCP uses "write-once, read-many" (WORM) storage technology, and a variety of policies and services (such as retention, content verification and protection) to ensure the integrity of data in the repository. WORM storage means that data, once ingested into the repository, cannot be updated or modified; that is, the data is guaranteed to remain unchanged from when it was originally stored. If the versioning feature is enabled within the HCP system, different versions of the data can be stored and retrieved, in which case each version is WORM.

Backup-Free Data Protection and Content Preservation
HCP is a true backup-free platform. HCP protects content without the need for backup. It uses sophisticated data preservation technologies, such as configurable data and metadata protection levels (MDPL), object versioning and change tracking, multisite replication with seamless application failover, and many others. HCP includes a variety of features designed to protect integrity, provide privacy, and ensure availability and security of stored data. Below is a summary of the key HCP data protection features:

■ Content immutability. This intrinsic feature of HCP WORM storage design protects the integrity of the data in the repository.
■ Content verification. The content verification service maintains data integrity and protects against data corruption or tampering by ensuring that the data of each object matches its cryptographic hash value. Any violation is repaired in a self-healing fashion.
■ Scavenging. The scavenging service ensures that all objects in the repository have valid metadata. In case metadata is lost or corrupted, the service tries to reconstruct it by using the secondary, or scavenging, metadata (a copy of the metadata stored with each copy of the object data).
■ Data encryption. HCP supports an encryption at rest capability that allows seamless encryption of data on the physical volumes of the repository. This ensures data privacy by preventing unauthorized access to the stored data. The encryption and decryption are handled automatically and transparently to users and applications.
■ Versioning. HCP uses versioning to protect against accidental deletes and storing wrong copies of objects.
■ Data availability.
■ RAID protection. RAID storage technology provides efficient protection from simple disk failures. SAN-based HCP systems typically use RAID-6 erasure coding protection to guard against dual drive failures.
■ Multipathing and zero-copy failover. These features provide data availability in SAN-attached array of independent nodes (SAIN) systems.
■ Data protection level and protection service. In addition to using RAID and SAN technologies to provide data integrity and availability, HCP can use software mirroring to store the data for each object in multiple locations on different nodes. HCP groups storage nodes into protection sets with the same number of nodes in each set, and tries to store all the copies of the data for an object in a single protection set where each copy is stored on a different node. The protection service enforces the required level of data redundancy by checking and repairing protection sets. In case of violation, it creates additional copies or deletes extra copies of an object to bring the object into compliance. If replication is enabled, the protection service can use an object copy from a replica system if the copy on the primary system is unavailable.
■ Metadata redundancy. In addition to the data redundancy specified by DPL, HCP creates multiple copies of the metadata for an object on different nodes. The metadata protection level, or MDPL, is a system-wide setting that specifies the number of copies of the metadata that the HCP system must maintain (normally 2 copies, MDPL2). Management of MDPL redundancy is independent of the management of data copies for DPL.
■ Nondisruptive software and hardware upgrades. HCP employs a number of techniques that minimize or eliminate any disruption of normal system functions during software and hardware upgrades. Nondisruptive software upgrade (NDSU) is one of these techniques; it includes greatly enhanced online upgrade support, nondisruptive patch management, and online upgrade performance improvements. HCP supports media-free and remote upgrades, HTTP or REST drain mode, and parallel operating system (OS) installation. It also supports automatic online upgrade commit, offline upgrade duration estimates, enhanced monitoring and email alerts, and other features. Storage nodes can be added to an HCP system without causing any downtime. HCP also supports nondisruptive storage upgrades that allow online storage addition to SAIN systems without any data outage.
■ Seamless application failover. This feature is supported by HCP systems in a replicated topology. It includes a seamless failover routing feature that enables direct integration with customer-owned load balancers by allowing HTTP requests to be serviced by any HCP system in a replication topology. Seamless domain name system (DNS) failover is an HCP built-in multisite load-balancing and high-availability technology that is ideal for cost-efficient, best-effort customer environments.
■ Replication. If enabled, this feature provides a multitude of mechanisms that ensure data availability. The replica system can be used both as a source for disaster recovery and to maintain data availability by providing good object copies for protection and content verification services. If an object cannot be read from the primary system, HCP can try to read the object from the replica if the read-from-replica feature is enabled.
■ Data security.
■ Authentication of management and data access.
■ Granular, multilayer data access permission scheme.
■ IP filtering technology and protocol-specific access or deny lists.
■ Secure Sockets Layer (SSL) for HTTP or WebDAV data access, management access, and replication.
■ Node login prevention.
■ Shredding policy and service.
■ Autonomic technology refresh feature, implemented as the HCP migration service, which enables organizations to maintain continuously operating content stores that allow them to preserve their digital content assets for the long term.

Cloud-Enabled Storage
The powerful, industry-leading capabilities of HCP make it well suited to the cloud storage space. An HCP-based infrastructure solution is sufficiently flexible to accommodate any cloud deployment model (public, private or hybrid) and simplify the migration to the cloud for both service providers and subscribers. HCP provides edge-to-core, secure multitenancy and robust management capabilities, and a host of features to optimize cloud storage operations. HCP, in its role as an online data repository, is truly ready for a cloud-enabled market. While numerous HCP features were already discussed earlier in this paper, the purpose of this section is to summarize those that contribute the most to HCP cloud capabilities. They include:

■ Large-scale multitenancy.
■ Management segregation. HCP supports up to 1,000 tenants, each of which can be uniquely configured for use by a separate cloud service subscriber.
■ Data segregation. HCP supports up to 10,000 namespaces, each of which can be uniquely configured for a particular application or workload.
■ Massive scale.
■ Petabyte repository offers 40PB of storage, 80 nodes, 32 billion user objects, and 15 million files per directory, all on a single physical system.
■ Best node density in the object storage industry supports 500TB per node and 400+M objects per node. With fewer nodes, HCP requires less power, less cooling, and less floor space.
■ Unparalleled expandability that allows organizations to "start small" and expand according to demand.
■ Nodes and/or storage can be added to expand an HCP system's storage and throughput capacity, without disruptions. Multiple storage systems are supported by a single HCP system.
■ Easy tenant and storage provisioning.
■ Geographical dispersal and global accessibility.
■ WAN-friendly REST interface for namespace data access and replication.
■ Replication of content across multiple sites using advanced, flexible replication topologies.
■ WAN-optimized, high-throughput data transfer.
  • 11. WHITE PAPER 11 ■■ High availability. ■■ Fully redundant hardware. ■■ Automatic routing of client requests around hardware failures. ■■ Load balancing across all available hardware. ■■ Multiple REST interfaces. These interfaces include the REST API for namespace data access, management API, and metadata query API. REST API is a technology of choice for cloud enablers and consumers. Some of the reasons for its popularity include high efficiency and low overhead, caching at both the client and the server and API uniformity. In addition, this technology offers a stateless nature that allows accommodation of the latencies of Internet access and potentially complex firewall configurations. ■■ Secure, granular access to tenants, namespaces and objects, which is crucial in any cloud environment. This access is facilitated by the HCP multilayer, flexible permission mechanism, including object-level ACLs. ■■ Usage metering. HCP has built-in chargeback capabilities, indispensable for cloud use, to facilitate provider and subscriber transactions. HCP also provides tools for 3rd-party vendors and customers to write to the API for easy integration with the HDS solution for billing and reporting. ■■ Low-touch system that is self-monitoring, self-managing and self-healing. HCP features advanced monitor- ing, audit and reporting capabilities. HCP services can automatically repair issues if they arise. ■■ Support for multiple levels of service. This support is provided through HCP policies, service plans and quotas that can be configured for each tenant helps enforce service-level agreements (SLAs). It allows the platform to accommodate a wide range of subscriber use cases and business models on a single physical system. ■■ Edge-to-core solution. HCP, working in tandem with Hitachi Data Ingestor (HDI), provides an integrated edge- to-core solution for cloud storage deployments. HCP serves as the “engine” at the core of the HDS cloud architecture. HDI resides at the edge of the storage cloud (for instance, at a remote office or subscriber site) and serves as the “on-ramp” for application data to enter the cloud infrastructure. HDI acts as a local storage cache while migrating data into HCP and maintaining links to stored content for later retrieval. Users and applications interact with HDI at the edge of the cloud but perceive bottomless, backup-free storage provided by HCP at the core. E-Discovery, Compliance and Metadata Analysis Custom metadata enables building massive unstructured data stores by providing means for faster and more accurate access of content and giving storage managers the meaningful information they need to efficiently and intelligently process data and apply the right object policies to meet all business, compliance and protection require- ments. Regulatory compliance features include namespace retention mode (compliance and enterprise), retention classes, retention hold, automated content disposition, and privileged delete and purge. HCP search capabili- ties include support for e-discovery for litigation or audit purposes. On HCP, open APIs allow direct 3rd-party integration. HCP supports search facilities that provide an interactive interface. The search console offers a structured environ- ment for creating and executing queries (sets of criteria that each object in the search results must satisfy). Users can apply various selection criteria, such as objects stored before a certain date or larger than a specified size. 
Queries return metadata for objects included in the search result. This metadata can be used to retrieve the object. From the search console, users can open objects, perform bulk operations on objects (hold, release, delete, purge, privileged delete and purge, change owner, set ACL), and export search results in standard file formats for use as input to other applications.
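In addition to the interactive search console, namespaces can be queried programmatically through the metadata query API mentioned above. The following is a minimal sketch of such a query, assuming Python with the third-party "requests" library; the host name, credential format, request body fields and response field names are illustrative assumptions, not taken from this paper, so the exact contract should be checked against the HCP metadata query API documentation.

    # Hedged sketch of a programmatic metadata query; endpoint, auth format and
    # field names are assumptions for illustration only.
    import json
    import requests

    QUERY_URL = "https://finance.acme.hcp.example.com/query"            # hypothetical namespace host
    HEADERS = {
        "Authorization": "HCP <base64-username>:<md5-password>",        # placeholder credentials
        "Content-Type": "application/json",
        "Accept": "application/json",
    }

    # A query is a set of criteria that every returned object must satisfy,
    # for example "stored before a certain date and larger than a given size".
    body = {
        "object": {
            "query": "ingestTimeMilliseconds:<1368000000000 AND size:>1048576",  # assumed field names
            "count": 100,
        }
    }

    resp = requests.post(QUERY_URL, headers=HEADERS, data=json.dumps(body), verify=False)
    resp.raise_for_status()

    # The response carries metadata for each matching object; the URL in that
    # metadata can then be used to retrieve the object over the data access API.
    for obj in resp.json().get("queryResult", {}).get("resultSet", []):
        print(obj.get("urlName"), obj.get("size"))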
The metadata query engine (MQE) is integrated with HCP and is always available in the HCP system. It also serves the metadata query API, a programmatic interface for querying namespaces. The MQE index resides on designated logical volumes on the HCP storage nodes; depending on the type of system and the volume configuration, the index may or may not share space on these volumes with object data.
Search is enabled at both the tenant and namespace levels. Indexing is enabled on a per-namespace basis. Settings at the system and namespace levels determine whether custom metadata is indexed in addition to system metadata and ACLs. If indexing of custom metadata is disabled, the MQE indexes do not include custom metadata. If a namespace is not indexed at all, searches return no results for objects in that namespace.
Each object also has an index setting that determines what the metadata query engine indexes for that object. If indexing is enabled for a namespace, MQE always indexes system metadata and ACLs regardless of the object's index setting. If the index setting is true, MQE indexes the object's custom metadata as well.

System Fundamentals
Hardware Overview
An individual physical HCP instance, or HCP system, is not a single device; it is a collection of devices that, combined with HCP software, provides all the features of an online object repository while tolerating node, disk and other component failures. From a hardware perspective, each HCP system consists of the following categories of components:
■■ Nodes (servers).
■■ Internal or SAN-attached storage.
■■ Networking components (switches and cabling).
■■ Infrastructure components (racks and power distribution units).
Storage nodes are the vital part of HCP. They store and manage the objects that reside in the physical system storage. The nodes are conventional off-the-shelf servers. Each node can have multiple internal physical drives and/or connect to external Fibre Channel (SAN) storage.
In addition to using RAID and SAN technologies and a host of other features to protect the data, HCP uses software mirroring to store the data and metadata for each object in multiple locations on different nodes. For data, this feature is governed by the namespace DPL (data protection level) setting, which specifies the number of copies of each object HCP must maintain in the repository to ensure the required level of data protection. For metadata, it is governed by the MDPL (metadata protection level), which is a system-wide setting.
A storage node runs the complete HCP software and serves as both a repository for objects and a gateway to the data and metadata they contain. All runtime operations are distributed among the storage nodes, ensuring reliability and performance.
HCP runs on a redundant array of independent nodes (RAIN) or a SAN-attached array of independent nodes (SAIN). RAIN systems use the internal storage in each node; SAIN systems use external SAN storage. HCP is offered as 2 products: HCP 300 (based on the RAIN configuration) and HCP 500 (based on the SAIN configuration).

HCP RAIN (HCP 300)
The nodes in an HCP 300 system are Hitachi Compute Rack 220 (CR 220) servers. RAIN nodes contain internal storage: a RAID controller and disks. All nodes use hardware RAID-5 data protection. In an HCP RAIN system, the physical disks in each node form a single RAID group, normally RAID-5 (5D+1P) (see Figure 3). This helps ensure the integrity of the data stored on each node.
An HCP 300 (RAIN) system must have a minimum of 4 storage nodes. Additional storage nodes are added in 4-node increments, and an HCP 300 system can have a maximum of 20 nodes. HCP 300 systems are normally configured with a DPL setting of 2 (DPL2), which, coupled with hardware RAID-5, yields an effective RAID-5+1 total protection level.

Figure 3. HCP 300 Hardware Architecture

HCP SAIN (HCP 500/500XL)
The nodes in an HCP 500 system are either Hitachi Compute Rack 220 (CR 220) servers or blades in Hitachi Compute Blade 320 (CB 320) servers. The HCP 500 nodes contain Fibre Channel host bus adapters (HBAs) and use external Fibre Channel SAN storage; they are diskless servers that boot from the SAN-attached storage.
The nodes in a SAIN system can also have internal storage in addition to being connected to external storage. These nodes are called HCP 500XL nodes. They are an alternative to the standard HCP 500 nodes and have the same hardware configuration, except for the addition of a RAID controller and internal hard disk drives. In HCP 500XL nodes, the system metadata database resides on the local disks, which leads to more efficient and faster database operations. As a result, the system can better support larger capacity and higher object counts per node and address higher performance requirements.
A typical 500XL node internal storage configuration includes six 500GB 7,200RPM SATA II drives in a single RAID-5 (5D+1P) RAID group with 2 LUNs: 31GB (operating system) and 2.24TB (database). HCP 500XL nodes are usually considered when the system configuration exceeds 4 standard nodes.
HCP 500 and 500XL (SAIN) systems are supported with a minimum of 4 storage nodes. In a SAIN system, additional storage nodes are added in pairs, so the system always has an even number of storage nodes. A SAIN system can have a maximum of 80 nodes.
Both RAIN and SAIN systems can have a DPL as high as 4, which affords maximum data availability but greatly sacrifices storage utilization. Typically, the external SAN-attached storage uses RAID-6. The best protection and highest availability of an HCP 500 system are achieved by giving each node its own RAID group, or a Hitachi Dynamic Provisioning (HDP) pool containing 1 RAID group.

Software Overview
HCP system software consists of an operating system (the appliance operating system) and core software. The core software includes components that:
■■ Enable access to the object repository through the industry-standard HTTP or HTTPS, WebDAV, CIFS, NFS, SMTP and NDMP protocols.
■■ Ingest fixed-content data, convert it into HCP objects, and manage the objects' data and metadata over time.
■■ Maintain the integrity, stability, availability and security of stored data by enforcing repository policies and executing system services.
■■ Enable configuration, monitoring and management of the HCP system through a human-readable interface.
■■ Support searching the repository through an interactive Web interface (the search console) and a programmatic interface (the metadata query API).

System Organization
HCP is a fully symmetric, distributed application that stores and manages objects (see Figure 4). An HCP object encapsulates the raw fixed-content data written by a client application together with its associated system and custom metadata. Each node in an HCP system is a Linux-based server that runs a complete HCP instance. The HCP system can withstand multiple simultaneous node failures and acts automatically to ensure that all object and namespace policies remain satisfied.
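As an illustration of this object model and of the HTTP gateway listed above, the sketch below stores a small piece of fixed-content data as an object in a namespace and reads it back, assuming Python with the third-party "requests" library. The namespace-qualified host name, the /rest path prefix and the authorization header format are assumptions for illustration; check the HCP namespace access documentation for the conventions of your release.

    # Hedged sketch of object ingest and retrieval over the HTTP (REST) gateway;
    # host name, path prefix and auth format are illustrative assumptions.
    import requests

    BASE = "https://finance.acme.hcp.example.com/rest"                   # hypothetical namespace.tenant.system host
    HEADERS = {"Authorization": "HCP <base64-username>:<md5-password>"}  # placeholder credentials

    payload = b"2013-05-01,INV-0042,1499.00\n"

    # Ingest: the fixed-content data written here becomes an HCP object, and the
    # system attaches system metadata (content hash, ingest time, owner, ...) to it.
    put = requests.put(f"{BASE}/invoices/2013/INV-0042.csv",
                       data=payload, headers=HEADERS, verify=False)
    put.raise_for_status()

    # Retrieve: the same URL returns the object data; selected system metadata is
    # exposed in the response headers.
    get = requests.get(f"{BASE}/invoices/2013/INV-0042.csv", headers=HEADERS, verify=False)
    get.raise_for_status()
    assert get.content == payload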
Figure 4. The High-Level Structure of an HCP System

External system communication is managed by the DNS manager, a distributed network component that balances client requests across all nodes to ensure maximum system throughput and availability. The DNS manager works in conjunction with a corporate DNS server to allow clients to access the system as a single entity, even though the system is made up of multiple independent nodes. The HCP system is configured as a subdomain of an existing corporate domain, and clients access the system using predefined protocol-specific or namespace-specific names. While not strictly required, using DNS is important in ensuring balanced and problem-free client access to an HCP system, especially for HTTP or REST clients.

Namespaces and Tenants
Main Concepts
An HCP repository is partitioned into namespaces. A namespace is a logical repository as viewed by an application. Each namespace consists of a distinct logical grouping of objects with its own directory structure, such that the objects in one namespace are not visible in any other namespace. Access to one namespace does not grant a user access to any other namespace. To the user of a namespace, the namespace is the repository.
Namespaces are not associated with any preallocated storage; they share the same underlying physical storage. Namespaces provide a mechanism for separating the data stored for different applications, business units or customers. For example, there may be one namespace for accounts receivable and another for accounts payable. While a single namespace can host one or more applications, it typically hosts only one application.
Namespaces also enable operations to work against selected subsets of repository objects. For example, a search could target the accounts receivable and accounts payable namespaces but not the employees namespace. Figure 5 shows the logical structure of an HCP system with respect to its multitenancy features.

Figure 5. HCP System Logical Layout: Namespaces and Tenants

Namespaces are owned and managed by tenants. Tenants are administrative entities that provide segregation of management, while namespaces provide segregation of data. A tenant typically represents an actual organization,
such as a company or a department within a company that uses a portion of a repository. A tenant can also correspond to an individual person. Namespace administration is done at the level of the owning tenant.
Clients can access HCP namespaces through the HTTP or HTTPS, WebDAV, CIFS, NFS and SMTP protocols. These protocols can support authenticated and/or anonymous types of access (the types of access and their combinations are discussed in more detail later in this document). HCP namespaces are owned by HCP tenants. An HCP system can have multiple HCP tenants, each of which can own multiple namespaces. The number of namespaces each HCP tenant can own can be limited by an administrator.

User and Group Accounts
User and group accounts control access to the various HCP interfaces and give users permission to perform administrative tasks and access namespace content.
An HCP user account is defined in HCP; it has a set of credentials, username and password, which is stored locally in the system. The HCP system uses these credentials to authenticate a user, performing local authentication. An HCP group account is a representation of an Active Directory (AD) group. To create group accounts, HCP must be configured to support Active Directory. A group account enables the AD users in the corresponding AD group to access one or more HCP interfaces.
Like HCP user accounts, HCP group accounts are defined separately at the system and tenant levels. Different tenants have different user and group accounts; these accounts cannot be shared across tenants. Group membership is likewise distinct at the system and tenant levels.
HCP administrative roles can be associated with both system-level and tenant-level user and group accounts. Data access permissions can be associated only with tenant-level user and group accounts. Consequently, system-level local and AD users can only be administrative users, while tenant-level local and AD users can be administrative users, have data access permissions, or both. Tenant-level users can have only administrative roles without namespace data permissions, only namespace data permissions without administrative roles, or any combination of the two.

System and Tenant Management
The implementation of segregation of management in the HCP system is illustrated in Figure 6. An HCP system has both system-level and tenant-level administrators:
■■ System-level administrative accounts are used for configuring system-wide features, monitoring system hardware, software and overall repository usage, and managing system-level users. The system administrator user interface, the system management console, provides the functionality needed by the maintainer of the physical HCP system. For example, it allows the maintainer to shut down the system, see information about nodes, manage policies and services, and create HCP tenants. System administrators have a view of the system as a whole, including all of the HCP software and hardware that make up the system, and can perform all administration actions that have system scope.
■■ Tenant-level administrative accounts are used for creating HCP namespaces. Tenant administrators can configure individual tenants and namespaces, monitor namespace usage at the tenant and namespace levels, manage tenant-level users, and control access to namespaces. The required functionality is provided by the tenant administrator user interface, the tenant management console.
This interface is intended for use by the maintainer of the virtual HCP system (an individual tenant with the set of namespaces it owns). The tenant-level administration feature facilitates segregation of management, which is essential in cloud environments.
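The same tenant-scoped administration can also be driven programmatically through the HCP management API (MAPI). The sketch below illustrates the idea, assuming Python with the third-party "requests" library; the administrative port, the /mapi resource layout and the credential format are assumptions for illustration only, and the exact resource tree is defined in the HCP management API reference.

    # Hedged sketch of tenant-level administration via the management API (MAPI);
    # port, path layout and auth format are illustrative assumptions.
    import requests

    MAPI_BASE = "https://finance.hcp.example.com:9090/mapi"             # hypothetical tenant admin host and port
    HEADERS = {
        "Authorization": "HCP <base64-username>:<md5-password>",        # placeholder credentials
        "Accept": "application/xml",
    }

    # A tenant-level administrator can enumerate the namespaces the tenant owns...
    resp = requests.get(f"{MAPI_BASE}/tenants/finance/namespaces", headers=HEADERS, verify=False)
    resp.raise_for_status()
    print(resp.text)

    # ...and inspect per-namespace configuration (quotas, versioning, default
    # retention and so on) without any system-level involvement, which is the
    # segregation of management described above.
    resp = requests.get(f"{MAPI_BASE}/tenants/finance/namespaces/invoices", headers=HEADERS, verify=False)
    resp.raise_for_status()
    print(resp.text)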
An HCP tenant can optionally grant system-level users administrative access to itself. In this case, system-level users with the monitor, administrator, security or compliance role can log into the tenant management console or use the HCP management API for that tenant. System-level users with the monitor or administrator role can also access the tenant management console directly from the system management console. This effectively enables a system administrator to function as a tenant administrator, as shown in Figure 4. System-level users can perform all of the activities allowed by the tenant-level roles that correspond to their system-level roles.
An AD user may belong to AD groups for which corresponding HCP group accounts exist at both the system and tenant levels. Such a user has the roles associated with both the applicable system-level group accounts and the applicable tenant-level group accounts.

Policies
Objects in a namespace have a variety of properties, such as the retention setting or the index setting. These properties are defined for each object by the object's system metadata. Objects can also be affected by certain namespace properties, such as the default metadata settings that are inherited by new objects stored in the namespace, or the versioning setting. Both the namespace-level settings and the properties that are part of the object metadata serve as parameters for the HCP system's transactions and services, and they determine the object's behavior during its life cycle within the repository. These settings are called policies.
An HCP policy is one or more settings that influence how transactions and internal processes (services) affect objects in a namespace. Policies ensure that objects behave in expected ways. The HCP policies are described in Table 1.

Table 1. HITACHI CONTENT PLATFORM Policies

DPL
  Policy description and components: System DPL setting, namespace DPL setting.
  Transactions and services influenced: Object creation. Protection service.
Retention
  Policy description and components: Default retention setting, object retention setting, hold setting, system metadata and custom metadata options for objects under retention.
  Transactions and services influenced: Object creation, object deletion, system and custom metadata handling. Disposition and garbage collection services.
Shredding
  Policy description and components: Default shred setting, object shred setting.
  Transactions and services influenced: Object deletion. Shredding service.
Indexing
  Policy description and components: Default index setting, object index setting.
  Transactions and services influenced: MQE.
Versioning
  Policy description and components: Versioning setting, pruning setting.
  Transactions and services influenced: Object creation and deletion. Garbage collection service.
Custom Metadata Validation
  Policy description and components: XML syntax validation.
  Transactions and services influenced: Add/replace custom metadata operations.

Each policy may consist of one or more settings that may have different scopes of application and methods of configuration. Policy settings are defined at the object level and at the namespace level. Note that the same policy setting may be set at different levels depending on the namespace. The default retention, shred and index settings are set at the namespace level in HCP namespaces.
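For object-level settings such as retention, shredding and indexing, a client application can supply the desired values when it ingests the object. The sketch below illustrates the idea, assuming Python with the third-party "requests" library; Table 2 below states only that these settings are configurable per object via the REST API, so the query parameter names and the retention value syntax shown here are illustrative assumptions.

    # Hedged sketch of object-level policy settings supplied at ingest time;
    # parameter names and retention syntax are illustrative assumptions.
    import requests

    BASE = "https://finance.acme.hcp.example.com/rest"                   # hypothetical namespace host
    HEADERS = {"Authorization": "HCP <base64-username>:<md5-password>"}  # placeholder credentials

    resp = requests.put(
        f"{BASE}/contracts/2013/contract-0017.pdf",
        data=b"%PDF-1.4 ... contract body ...",                          # stand-in for the real document
        headers=HEADERS,
        params={
            "retention": "A+7y",   # assumed syntax: keep for 7 years after ingest
            "shred": "true",       # securely overwrite storage locations on eventual deletion
            "index": "true",       # let the metadata query engine index custom metadata too
        },
        verify=False,
    )
    resp.raise_for_status()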
Table 2 lists all policy settings sorted according to their scope and method of configuration.

Table 2. HITACHI CONTENT PLATFORM Policy Settings: Scope and Configuration

Data Protection Level
  System DPL: 1-4 (scope: system; configured via the system UI).
  Namespace DPL: 1-4 or dynamic (scope: namespace; configured via the tenant UI or MAPI).
Retention
  Default retention setting: fixed date, offset, special value or retention class (scope: namespace; configured via the tenant UI or MAPI).
  Retention setting: fixed date, offset, special value or retention class (scope: object; configured via the REST API or retention.txt).
  Hold setting: true or false (scope: object; configured via the REST API).
  Ownership and POSIX permission changes under retention: true or false (scope: namespace; configured via the tenant UI or MAPI).
  Custom metadata operations allowed under retention (scope: namespace; configured via the tenant UI or MAPI).
Indexing
  Index setting: true or false (1/0) (scope: object; configured via the REST API or index.txt).
  Default index setting: true or false (scope: namespace; configured via the tenant UI or MAPI).
Shredding
  Shred setting: true or false (1/0) (scope: object; configured via the REST API or shred.txt).
  Default shred setting: true or false (scope: namespace; configured via the tenant UI or MAPI).
Custom Metadata Validation
  XML validation: true or false (scope: namespace; configured via the tenant UI or MAPI).
Versioning
  Versioning setting: true or false (scope: namespace; configured via the tenant UI or MAPI).
  Pruning setting: true or false, with a number of days for the primary or replica system (scope: namespace; configured via the tenant UI or MAPI).

Content Management Services
A Hitachi Content Platform service is a background process that performs a specific function aimed at preserving and improving the overall health of the HCP system. In particular, services are responsible for optimizing the use of system resources and maintaining the integrity and availability of the data stored in the HCP repository. HCP implements 12 services: protection, content verification, scavenging, garbage collection, duplicate elimination, shredding, disposition, compression, capacity balancing, storage tiering, migration and replication. These services are briefly described in Table 3.
Table 3. HITACHI CONTENT PLATFORM Services

Protection
  Enforces DPL policy compliance by ensuring that the proper number of copies of each object exists in the system and that damaged or lost objects can be recovered. Any policy violation invokes a repair process. The service is both scheduled and event driven: events trigger a full service run, even if the service is disabled, after a configurable amount of time (90 minutes after a node shutdown, 1 minute after a logical volume failure, 10 minutes after a node removal).
Content Verification
  Guarantees the data integrity of repository objects by ensuring that the content of a file matches its digital signature, and repairs the object if the hash does not match. Also detects and repairs discrepancies between primary and secondary metadata. The SHA-256 hash algorithm is used by default, and checksums are computed on both external and internal files. A computationally intensive and time-consuming service; runs according to the active service schedule.
Scavenging
  Ensures that all objects in the repository have valid metadata and reconstructs metadata when it is lost or corrupted but the data files still exist. The service verifies that both the primary metadata for each data object and the copies of the metadata stored with the object data (secondary metadata) are complete, valid and in sync with each other. A computationally intensive and time-consuming scheduled service.
Garbage Collection
  Reclaims storage space by purging hidden data and metadata for objects marked for deletion or left behind by incomplete transactions. It also deletes old versions of objects that are eligible for pruning. When applicable, the deletion triggers the shredding service. A scheduled service, not event driven.
Duplicate Elimination
  Identifies and eliminates redundant objects in the repository and merges duplicate data to free space. The hash signatures of external file representations are used to select objects as input to the service; these objects are then compared byte for byte to ensure that the data contents are indeed identical. Scheduled service.
Shredding
  Overwrites the storage locations where copies of a deleted object were stored, in such a way that none of its data or metadata can be reconstructed, for security reasons. Also called secure deletion. The default HCP shredding algorithm uses 3 passes to overwrite an object and is compliant with the DoD 5220.22-M standard. The algorithm is selected at install time. An event-driven-only service, not scheduled; it is triggered by the deletion of an object marked for shredding.
Disposition
  Automatically cleans up expired objects. All HCP namespaces can be configured to automatically delete objects after their retention period expires. Disposition can be enabled or disabled at both the system and namespace levels; enabling disposition for a namespace has no effect if the service is disabled at the system level. The disposition service deletes only the current versions of versioned objects. Scheduled service.
Compression
  Compresses object data to make more efficient use of system storage space. The space reclaimed by compression can be used for additional storage. A number of configurable parameters are provided via the system management console. Scheduled service.
Capacity Balancing
  Attempts to keep the usable storage capacity balanced (roughly equivalent) across all storage nodes in the system.
  If storage utilization for the nodes differs by a wide margin, the service moves objects around to bring the nodes closer to a balanced state. It runs only when started manually; additions and deletions of objects do not trigger the service. Typically, an authorized HCP service provider starts this service after adding new storage nodes to the system. In addition, while not part of the service, during normal system operation new objects tend to spread naturally among all storage nodes in fairly even proportion, due to the nature of the storage manager selection algorithm and the resource monitoring of the administrative engine.
Storage Tiering
  Determines which storage tiering strategy applies to an object, evaluates where the copies of the object should reside based on the rules in the applied service plan, and moves objects between running and spin-down storage as needed. Active only in spin-down-capable HCP SAIN systems. Scheduled service.

Conclusion
Hitachi Data Systems object storage solutions avoid the limitations of traditional file systems by intelligently storing content in far larger quantities and in a much more efficient manner. These solutions address the new demands imposed by the explosion of unstructured data and its growing importance to organizations, their partners, their customers, their governments and their shareholders.
The Hitachi Data Systems object storage solutions treat file data, file metadata and custom metadata as a single object that is tracked and stored across a variety of storage tiers. With secure multitenancy and configurable attributes for each logical partition, the object store can be divided into a number of smaller virtual object stores, each presenting configurable attributes to support different service levels. This allows the object store to support a wide range of workloads, such as content preservation, data protection, content distribution and even cloud, from a single physical infrastructure. One infrastructure is far easier to manage than disparate silos of technology for each application or set of users.
By integrating many key technologies in a single storage platform, Hitachi Data Systems object storage solutions provide a path to short-term return on investment and significant long-term efficiency improvements. They help IT evolve to meet new challenges, stay agile over the long term, and address future change and growth.
© Hitachi Data Systems Corporation 2013. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. Microsoft, Windows and Active Directory are trademarks or registered trademarks of Microsoft Corporation. All other trademarks, service marks, and company names are properties of their respective owners.
Notice: This document is for informational purposes only, and does not set forth any warranty, expressed or implied, concerning any equipment or service offered or to be offered by Hitachi Data Systems Corporation. WP-425-B DG May 2013

Corporate Headquarters: 2845 Lafayette Street, Santa Clara, CA 95050-2639 USA, www.HDS.com
Regional Contact Information
Americas: +1 408 970 1000 or info@hds.com
Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hds.com
Asia Pacific: +852 3189 7900 or hds.marketing.apac@hds.com