HPC DAY 2017 | HPE Storage and Data Management for Big Data
1. HPE Storage and Data Management for Big Data
Volodymyr Saviak, HPE HPC Sales Manager
2. Agenda
1. Different HPC storage requirements and solutions. Intro – 5 minutes
2. HPC Lustre-based storage – 10 minutes
3. NVMe low-latency storage – 5 minutes
4. DMF as data management – 10 minutes
3. Mission Critical Storage and HPC Storage Differences
Mission Critical Storage
• Critical buying factors: Robustness; Features
• Willing to compromise on: Price, price/performance; Throughput; Capacity
HPC Storage
• Critical buying factors: Price, price/performance; Throughput; Capacity
• Willing to compromise on: Robustness; Features
Summary: Need different products for different markets
4. Different kinds of solutions we deliver
Feature                                   Parallel Performance   Low Latency Storage   Active Archive
Capacity                                  ++ (100PB)             + (250TB)             +++ (500PB)
Parallel file/object access (nodes)       +++ (many thousands)   ++ (128/512)          +
High bandwidth                            +++                    ++                    ++
Bandwidth per node                        +++                    +++                   +
High IOPS                                 ++                     +++                   +
Low latency                               +                      ++                    -
Disaster tolerance                        -                      -                     +++
Heterogeneous access                      -/+                    -/+                   +++
Multiprotocol access                      -                      -                     ++
5. Recent requests coming from the field
Typical technical requirements (extract):
– HPC customer wants to build 1 PByte storage for genome sequencing with 30 GByte/s write performance
– Bank wants to build 100 TByte storage with near-memory performance for an Oracle data warehouse
– Telco wants to build 3 PByte storage with 80 GByte/s write throughput
– Government organization wants to build a 10 PByte archive with 99.9999% data availability
Scale-out storage helps to grow storage capacity without pain
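As a quick sanity check on that last requirement, a short calculation shows the downtime budget implied by an availability target (the requirements above are from the slide; the arithmetic is ours):

```python
# Downtime budget implied by an availability target.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def downtime_budget(availability: float) -> float:
    """Return allowed downtime in seconds per year."""
    return (1.0 - availability) * SECONDS_PER_YEAR

for target in (0.999, 0.99999, 0.999999):
    print(f"{target:.6%} availability -> {downtime_budget(target):10.1f} s/year")

# 99.9999% ("six nines") leaves roughly 31.6 seconds of downtime per year,
# which is why the archive request below drives an HA, multi-copy design.
```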
7. Accelerate Your Workload With HPE NVMe SSDs
• Reliability – for continuous performance with less downtime
• Efficiency – for lower TCO
• Performance – for faster business results
8. Difference between SATA and NVMe
– NVMe is actually not an interface but a language.
– NVMe is a different language that is optimized to reduce overhead when making requests to the SSD.
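One way to see that protocol overhead directly is to time small random reads against a block device and compare an NVMe device with a SATA one. A minimal sketch, assuming a Linux host, Python 3.7+, root access, and a hypothetical device path /dev/nvme0n1 (run the same against e.g. /dev/sda to compare):

```python
import mmap
import os
import random
import statistics
import time

DEV = "/dev/nvme0n1"   # hypothetical device path -- adjust for your system
BS = 4096              # 4 KiB reads
N = 1000

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)  # O_DIRECT bypasses the page cache
buf = mmap.mmap(-1, BS)                       # page-aligned buffer, required by O_DIRECT
size = os.lseek(fd, 0, os.SEEK_END)           # device size in bytes

samples = []
for _ in range(N):
    offset = random.randrange(size // BS) * BS   # block-aligned random offset
    t0 = time.perf_counter_ns()
    os.preadv(fd, [buf], offset)
    samples.append(time.perf_counter_ns() - t0)
os.close(fd)

print(f"median 4 KiB random-read latency: {statistics.median(samples) / 1000:.1f} us")
```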
9. NVMe Deployment Challenges
All Flash Arrays – applications can have access to large pools of flash, but with limitations:
• Array software that does not take full advantage of flash characteristics
• Network and fabric latencies
• I/O stack bottlenecks
• Capacity challenges: provisioning optimization is difficult
Server-side Flash – applications benefit from maximum flash performance, but without shared data:
• Creates data locality issues
• No centralized management
• Low utilization rates
10. Local NVMe Devices
Benefits
– Very low latency
– High IOPS
– High throughput
– Commodity pricing
The Reality
– DAS (direct-attached storage)
– No logical volumes
– No data protection
– No high availability
– No application movement
– Excess (wasted) IOPS
11. How NVMe Scale-out Storage Works
(Diagram) Centralized management (GUI, RESTful HTTP) forms the control path. The effective data path runs from applications, through an intelligent client block driver on unmodified NVMe client(s), across a high-speed network (R-NICs on both ends), to the NVMe target module and its NVMe drive(s).
12. Converged, Disaggregated or Mixed
Converged – local storage in the application server:
• Storage is unified into one pool
• NVMesh Target Module & Intelligent Client Block Driver run on all nodes
• NVMesh bypasses the server CPU
• Linearly scalable
Disaggregated – storage is centralized:
• Storage is unified into one pool
• NVMesh Target Module runs on storage nodes
• Intelligent Client Block Driver runs on server nodes
• Applications get the performance of local storage
13. Performance
Ubiquitous access, highly optimized:
• Remote IOPS = local IOPS
• Remote bandwidth = local bandwidth
• Remote latency = local latency + ~5 µs
• 2RU server with 24 NVMe drives: >4.9M 4KB IOPS, >24 GB/s
Scalability
• 20 servers, shared data, >99% efficiency
• 128 servers @ NASA: >130 GB/s writing through a shared file system
Converged-ready
• Using RDDA, 0% target CPU usage
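A quick arithmetic sketch shows those headline figures are mutually consistent (the per-drive number is derived here, not stated on the slide):

```python
# Sanity-check the 2RU figures: 24 drives delivering >4.9M 4 KiB IOPS
# implies roughly 205K IOPS per drive -- a plausible random-read rate
# for an enterprise NVMe SSD of that era.
drives = 24
total_iops = 4.9e6
block = 4096                      # bytes per I/O

per_drive_iops = total_iops / drives
bandwidth_gbs = total_iops * block / 1e9

print(f"per-drive IOPS:    {per_drive_iops:,.0f}")    # ~204,167
print(f"implied bandwidth: {bandwidth_gbs:.1f} GB/s")  # ~20 GB/s at 4 KiB
# The quoted >24 GB/s peak comes from larger sequential transfers.
```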
14. Customer Successes
Pooling NVMe enables new science use cases at NASA
Use Case
• Large-scale modeling, simulation, analysis and visualization
• Visualizes supercomputer simulation data on 128 monitors from a 128-node cluster
Problem
• Interactive work is generally small I/Os
• Introducing high-performance local NVMe SSDs creates a data locality problem
Solution
• NVMesh enables NASA to create a petabyte-scale unified pool of distributed high-performance flash, retaining the speeds and latencies of directly attached media
17. Data Management | Lustre Designed to Scale Out
(Diagram) A Lustre file system separates management, metadata and data services: a Management Server (MGS) with its Management Target (MGT), a Metadata Server (MDS) with its Metadata Target (MDT), and Object Storage Servers (OSS) with Object Storage Targets (OSTs). Storage servers are grouped into failover pairs. Lustre clients reach the servers over the data network (LNET) on InfiniBand, Ethernet or Omni-Path; a separate management network handles administration.
Scaling is additive: each added OSS pair contributes + XX GB/sec of bandwidth, and each added MDS/MDT contributes + XX,000 metadata ops.
Example: a customer goal of 60 GB/sec is met by four OSS pairs at 17 GB/sec each (17 + 17 + 17 + 17 = 68 GB/sec).
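The additive scaling model lends itself to simple capacity planning. A minimal sizing sketch, assuming the 17 GB/sec-per-OSS-pair building block from the example above:

```python
import math

def oss_pairs_needed(target_gbs: float, per_pair_gbs: float = 17.0) -> int:
    """Number of OSS failover pairs needed to hit a bandwidth target."""
    return math.ceil(target_gbs / per_pair_gbs)

goal = 60.0   # GB/s, the customer goal from the slide
pairs = oss_pairs_needed(goal)
print(f"{pairs} OSS pairs -> {pairs * 17.0:.0f} GB/s aggregate")   # 4 pairs -> 68 GB/s
```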
18. Data Management | Lustre Update: Shift to Community Lustre
– Intel initiated a process to consolidate its Lustre efforts around a single version of Lustre that will be available from the community as open source
– All proprietary elements of Intel Enterprise Edition for Lustre were contributed by Intel to the community
– HPE will deliver an updated Apollo 4520 Lustre solution based on Community Lustre in late 2H2017
ORIGINAL: Community Lustre; Intel Enterprise Edition Lustre; Intel Foundation Edition Lustre; Intel Cloud Edition Lustre
NEW: Community Lustre only
                            ORIGINAL                                  NEW
Integrated on HPE hardware? Yes                                       Yes
Lustre version              Intel Enterprise Edition for Lustre 3.1   Community Lustre 2.10
L1 support                  HPE                                       HPE
L2 support                  HPE                                       HPE
L3 support                  Intel                                     Intel
19. Data Management | Lustre Roadmap and Relevance
Key Features
• Multi-Rail LNET for data pipeline scalability
• Progressive File Layouts for performance and more efficient/balanced file storage
• Data on MDT for direct small-file storage on the MDT (flash)
20. Data Management | Lustre Multi-Rail LNET
(Diagram) The same scale-out architecture as above – MGS/MGT, MDS/MDTs, and OSS/OSTs in failover pairs, with clients on LNET over InfiniBand/Ethernet/Omni-Path – with one addition: servers and clients use multiple fabric adapters/connections, so a single node is no longer limited to the bandwidth of one link. Each added OSS pair still contributes + XX GB/sec and each added MDS/MDT + XX,000 metadata ops.
21. Data Management | Lustre Roadmap and Approach
(Diagram) Data on MDT changes the small-file I/O model. In the current model, all client data I/O goes to the OSTs through the Object Storage Servers. In the new model, small writes go directly to the (flash-backed) MDT while large writes continue to go to the OSTs. Storage servers remain grouped into failover pairs, with storage monitoring alongside the MGS and MDS.
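The routing rule itself is simple. A hypothetical sketch of the decision (the 1 MiB cutoff is an assumption for illustration; in real Lustre the Data-on-MDT region is set per file layout):

```python
DOM_THRESHOLD = 1 << 20   # 1 MiB -- assumed cutoff for this illustration

def route_write(size_bytes: int) -> str:
    """Decide where a file's data lands under a Data-on-MDT layout."""
    return "MDT (flash)" if size_bytes <= DOM_THRESHOLD else "OST"

for size in (4 * 1024, 512 * 1024, 16 * 1024 * 1024):
    print(f"{size:>10} bytes -> {route_write(size)}")
```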
22. HPE Apollo 4520 Scalable Storage with Lustre
Designed for petabyte-scale data sets
Density-Optimized Design for Scale
• Dense storage design translates to lower $/GB
• Linear performance and capacity scaling
ZFS for File Protection and Performance
• The ZFS file system provides advanced data protection
• ZFS RAID provides snapshots, compression and error correction
High-Performance Storage Solution
• Meets demanding I/O requirements
• Up to 51 GB/sec per rack using a balanced architecture based on the 4520 Lustre server with D6020 JBODs
Services and Support
• Factory tested and validated; deployment services for installation
• 24/7 support services
(Diagram: Apollo 4520 controller)
26. Data Management | Lustre HSM Data Management Guidelines
Data always lives longer than the hardware on which it is stored. Forward migration to new technology should never adversely impact the users.
27. Data Management | HPC Storage Landscape: New Model
Key Takeaways
• Disaggregate and scale the high-performance storage tier independently from the capacity tier
• Co-locate the performance tier with compute and fabric
• Implement tiered data management for capacity scaling and data protection
(Diagram) Compute sits alongside a high-performance storage tier, with capacity storage, protection & data management behind it.
Tiered data movement and management are a key requirement – and HPE Data Management Framework (DMF) meets that need
29. Data Management | DMF Advanced Tape Storage Integration
• DMF is certified with libraries from Spectra Logic, Oracle (StorageTek), IBM and the HPE portfolio of tape libraries
• Support for the latest LTO and enterprise-class drive technology
• Advanced feature support for accelerated retrieval and automated library management
• A certification guide for libraries and drives is available – and updated regularly
30. Data Management | DMF Object Storage Support
(Diagram) HPC/HPDA applications sit on a high-performance file system (NFS, CIFS, XFS, CXFS, Lustre on RAID or flash-based storage). Beneath it, the DMF policy and migration engine feeds the DMF data management layer, which spans cloud & object storage, offsite data replication, DMF Zero Watt Storage, onsite tape storage and secure offsite tape.
Object storage systems in a DMF architecture:
• Standards-based integration
  – Use of the S3 interface enables compatibility with Scality, HGST Active Archive, Amazon S3, Ceph, NetApp StorageGRID, DDN WOS and open-source alternatives
• Accessibility
  – High resilience and data integrity for a variety of use cases
• Scalability & throughput
  – Scalable DMF connections to the object storage environment
  – DMF Parallel Data Mover architecture with high availability and failover
• Flexibility
  – Ability to blend object storage with alternative storage options, including Zero Watt Storage (performance) or tape (off-site disaster recovery)
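Because the integration point is the standard S3 interface, any S3 client can illustrate the shape of a migration-target call. A hedged sketch using boto3 – the endpoint, bucket and credentials are placeholders, and this is our illustration, not DMF's internal code:

```python
import boto3

# Placeholder endpoint and bucket -- any S3-compatible system named on the
# slide (Scality, Ceph, Amazon S3, ...) would be addressed the same way.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example.com",  # hypothetical
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Push a migrated file into the archive tier, and read it back on recall.
s3.upload_file("/scratch/results/run42.dat", "dmf-archive", "run42.dat")
s3.download_file("dmf-archive", "run42.dat", "/scratch/results/run42.dat")
```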
31. Data Management | DMF Zero Watt Storage
High Performance & Density
• 70 x 3.5" SAS drives in a 5U enclosure
• Supports >600 TB of usable storage per enclosure with 10 TB drives
• 4+ PB of usable capacity per rack
• High performance: >10 GB/sec streaming retrieval per enclosure – an excellent DMF cache complement to tape, object or cloud storage
Zero Watt Storage Advanced Software Features
• Open-standard access – no user application changes required
• Flexible deployment – no interruption to the DMF production environment during ZWS deployment
• Tuneable data movement policies – to maximize use of ZWS and other storage hardware
• Granular drive management, including automated spin-down of inactive individual disks
• Maximum power savings while increasing disk lifespan
• Automated data recoverability – silent data corruption prevention and 'in place' data recovery
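The density claims line up, as a quick back-of-the-envelope check shows (drive counts and capacities are from the slide; fitting eight 5U enclosures in a 42U rack is our assumption):

```python
drives_per_enclosure = 70
drive_tb = 10
enclosure_u = 5
rack_u = 42                                 # assumed standard rack height

raw_tb = drives_per_enclosure * drive_tb    # 700 TB raw per enclosure
enclosures = rack_u // enclosure_u          # 8 enclosures fit in 42U
usable_pb = enclosures * 600 / 1000         # slide quotes >600 TB usable each

print(f"raw per enclosure: {raw_tb} TB")
print(f"{enclosures} enclosures/rack -> ~{usable_pb:.1f} PB usable (slide: 4+ PB)")
```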
32. Data Management | Lustre HSM with DMF Core Concepts
(Diagram) Data migrates from primary storage (POSIX; online, high-performance disk) to a nearline fast-mount cache (high-capacity, low-cost, power-managed disk) and on to deep storage (object store, public cloud, tape). Migration is driven e.g. by time, type, etc.; recall happens on access or by schedule.
• The entire namespace stays in the filesystem
• File data is migrated transparently (with invisible I/O), leaving the inodes in place
• Files are recalled on access or by schedule
• The filesystem IS the metadata database
• Transparency makes it easy – data catalog and access live in the same place
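To make the migrate/recall idea concrete, here is a deliberately simplified policy sweep – illustrative only, not DMF's actual mechanism: it copies cold files to an archive path and truncates the original to a stub, so the inode (name, ownership, timestamps) stays in the namespace. The paths and the 90-day threshold are assumptions.

```python
import os
import shutil
import time

SCRATCH = "/mnt/lustre/project"   # hypothetical managed filesystem
ARCHIVE = "/mnt/archive/project"  # hypothetical capacity tier
COLD_AFTER = 90 * 24 * 3600       # migrate files untouched for 90 days

def sweep() -> None:
    now = time.time()
    for entry in os.scandir(SCRATCH):
        if not entry.is_file():
            continue
        st = entry.stat()
        if st.st_size > 0 and now - st.st_atime > COLD_AFTER:
            # Copy data (with metadata) to the archive tier...
            shutil.copy2(entry.path, os.path.join(ARCHIVE, entry.name))
            # ...then leave a zero-length stub; the inode stays in place.
            os.truncate(entry.path, 0)
            print(f"migrated {entry.name} ({st.st_size} bytes)")

if __name__ == "__main__":
    sweep()
```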
33. Data Management | DMF Data Protection Strategy
3 copies – a performance copy (1), a secure copy (2) and a disaster recovery copy (3).
Advantages of 3 copies of all data:
• Optimized use of storage hardware
• High availability
• Elimination of backup
2 media types – e.g. RAID, flash, disk, tape, object & ZWS for one copy; tape or cloud object for the other.
Advantages of keeping data on two different media types:
• Fast data access
• Data retention
• Archive resilience
1 copy offsite – from the primary data center to offsite or cloud storage.
Advantages of keeping one copy offsite:
• Lower power consumption
• Base for compliance
• Disaster recovery
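This is the classic 3-2-1 rule, and it is mechanical enough to verify automatically. A small sketch of such a check (the copy records and field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Copy:
    role: str        # "performance", "secure", "disaster-recovery"
    media: str       # "disk", "tape", "object", ...
    offsite: bool

def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if the copy set meets: 3 copies, 2 media types, 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

dataset = [
    Copy("performance", "disk", offsite=False),
    Copy("secure", "tape", offsite=False),
    Copy("disaster-recovery", "object", offsite=True),
]
print(satisfies_3_2_1(dataset))   # True
```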
34. Data Management | DMF Core Concepts
Proven in production use for over 20 years
• All-in-One – data management, archive, integrated backup, validation and repair
• Transparent – all data appears online all the time
• Policy-driven – policies leverage file attributes and define multiple copies on different media
DMF as a Scalable Data Management Fabric
• Policy-based data migration & HSM
• Parallel architecture for high throughput
• Active data validation and repair
• Minimizes storage administrator workload
Target tiers:
• Tape library storage – lowest cost per GB of data with extremely high levels of data durability
• Zero Watt Storage™ – high-performance access with very low storage & operating costs
• Public/private cloud – highly scalable and resilient for availability and disaster recovery
36. Key Differentiators
– High-performance data migrations
– DMF Direct Archiving
– MAID storage target: DMF Zero Watt Storage
– Elegant archive storage migration over time
– Multi-petabyte data migration with no user impact
– Trusted data protection
– Over 25 years preserving data
– Active user community: DMF User Group (Feb 2017), http://hpc.csiro.au/users/dmfug/
38. Data Management | Summary
• HPC presents unique storage challenges
• HPE has a robust and flexible set of HPC file systems
• DMF data management ensures long-term availability
• The HPC Business Unit can assist with sizing and design