IME - Unlocking the Potential of NVMe
2. DDN Confidential | DDN Storage | ©2018 DataDirect Networks, Inc.
DDN Direction | Building on 20 Years of Innovation
[Figure: timeline of technology and demand, 1998 to 2020. Years marked: 1998, 2000, 2003, 2013, 2016, 2017, 2020.]
Product generations: S2A Systems; SFA – Scale-Out Systems; IME – Software Defined Elastic Data Services
Milestones: first S2A shipped to NASA; first large Lustre system; ORNL Spider 2 (1 TB/sec HDD); SFA12Kxi (4.5 M IOPS); JCAHPC (>1TB/sec NVMe)
Technology areas:
• Data Integrity, Declustering, Erasure Coding – innovative data protection schemes, geo-distribution, new orders of scaling
• Storage Orchestration – Open APIs, Kubernetes, Docker, OpenStack, Ansible, Puppet…
• New Hierarchies – automated data placement, IOPs and Bulk IO Engines
• Flash and NVRAM – performance, lifetime, memory-class
• HW Fabrics and Interconnects – NVMe (oF), IB, OPA, Datacentre Ethernet, Gen-Z
• Virtualisation | Containers | Tenancy – secure tenants at scale, SCSI VM stack
• Disaggregated, Shared Nothing – KV stores, node-local, edge server, SW/cloud deployment
• Distributed Storage: GPFS (Spectrum Scale) and Lustre, NFS/SMB, POSIX and POSIX-lite, Object
Elastic Data Services for Performance and Scale
4. WHAT IS IME?
IME’s Active I/O Tier is inserted directly between compute and the parallel file system (PFS).
IME software virtualizes disparate NVMe SSDs into a single pool of shared memory that accelerates I/O, the PFS and applications.
► Scale-out flash cache layer using NVMe SSDs, inserted between the compute cluster and the Parallel File System (PFS)
• IME is configured as a cluster of multiple NVMe servers
• All compute nodes can access cached data on IME
► Accelerates difficult IO patterns (small, random, shared-file, high-concurrency) thanks to a thin software IO management layer
► Configured as a massive scale-out cache layer with high IO bandwidth and IOPs
5. IME INTERNALS
[Figure: file data keys (File1 DFCD3455, File3 46042D43, File4 52ED789E, File6 DC355CE) pass through a hash function and are distributed across network peers, each holding data.]
[Figure: log over time. New data is added at the log head, space is reclaimed at the log tail, and the log wraps into empty space.]
DHT provides foundation for
• Network parallelism
• Node-level fault tolerance
• Distributed metadata
• Self-Optimising for Noisy Fabrics
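The hash-based placement listed above can be illustrated with a minimal sketch. It assumes a plain modulo mapping from a hashed data key to a peer; the slides do not describe IME's actual hash function or peer selection, and `key_for`, `peer_for`, `placement` and the peer names are hypothetical.

```python
import hashlib

# Hypothetical peer names; the slides do not name real servers.
PEERS = ["ime-server0", "ime-server1", "ime-server2", "ime-server3"]

def key_for(path: str, fragment: int) -> int:
    """Derive a 32-bit data key from a file path and a fragment index."""
    digest = hashlib.sha256(f"{path}:{fragment}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def peer_for(key: int, peers=PEERS) -> str:
    """Map a data key to one peer (plain modulo placement, an assumption)."""
    return peers[key % len(peers)]

def placement(path: str, fragments: int, peers=PEERS) -> dict:
    """Return {fragment index: peer} for every fragment of one file."""
    return {i: peer_for(key_for(path, i), peers) for i in range(fragments)}
```

Because every client computes the same key for the same fragment, no central lookup is needed to find the peer that holds it, which is what allows the fragments of one file to be spread over many servers in parallel.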
Log Structured Filesystem at the storage device level
• High performance device throughput (NAND Flash)
• Maximises device lifetime
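The log behaviour (new data at the head, space reclaimed at the tail, wrap-around) can be sketched as follows. This is a simplified model under stated assumptions, not IME's on-device format: `CircularLog` is a hypothetical name, sizes are abstract units, and a record that straddles the end of the device is not split.

```python
class CircularLog:
    """Fixed-size log: append at the head, reclaim from the tail, wrap around."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.head = 0  # total units ever appended (monotonic counter)
        self.tail = 0  # total units ever reclaimed (monotonic counter)

    def free(self) -> int:
        """Units currently available for new data."""
        return self.capacity - (self.head - self.tail)

    def append(self, length: int) -> int:
        """Reserve `length` units at the head; return the starting device offset."""
        if length > self.free():
            raise RuntimeError("log full: reclaim space at the tail first")
        offset = self.head % self.capacity  # wrap into the device range
        self.head += length
        return offset

    def reclaim(self, length: int) -> None:
        """Release up to `length` units at the tail, e.g. after data is flushed."""
        self.tail += min(length, self.head - self.tail)
```

Writes only ever land at the head, so the device sees sequential writes to new block ranges regardless of the logical offsets the application used.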
FABRIC-AWARE
FLASH-NATIVE
7. IME140 Front and Rear View
► The IME140 is a 1U Intel-based Server with up to 10 NVMe drives.
► 9 of the NVMe drives serve application data; the 10th is reserved for internal use by the IME software (commit log)
► 135 TB per 1U, 18 GB/s Read & Write bandwidth with 2 OPA/EDR links
[Figure callouts:]
• Dual Intel 4108 CPU (8C, 1.8GHz)
• Ten NVMe drives
• 8 system fans
• One 128GB SATA DoM (SATA DoM delivers much higher boot performance at lower power)
• Six 16GB DIMMs
8. IME140 PERFORMANCE
► The IME140 demonstrates around 17GB/s and 300K IOPs per node
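As a rough sizing aid, the per-node figures above can be turned into a node count. This is a back-of-envelope sketch that assumes throughput and IOPs scale linearly with node count, which the slides do not state; `nodes_needed` is a hypothetical helper.

```python
import math

NODE_GBPS = 17.0     # per-node throughput quoted on the slide (GB/s)
NODE_IOPS = 300_000  # per-node IOPs quoted on the slide

def nodes_needed(target_gbps: float = 0.0, target_iops: int = 0) -> int:
    """Smallest node count meeting both targets, assuming linear scaling."""
    by_bandwidth = math.ceil(target_gbps / NODE_GBPS)
    by_iops = math.ceil(target_iops / NODE_IOPS)
    return max(by_bandwidth, by_iops, 1)
```

For example, `nodes_needed(target_gbps=100)` sizes for bandwidth only, while passing both arguments sizes for whichever target is harder to meet.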
10. IME | IME240 Performance
>600K IOPs | 20GB/s | 2 Rack Units
[Figure: IME240 sequential throughput (MB/s, 0 to 25000) versus IO size (4k to 4M); read in dark red, write in light red.]
[Figure: IOPs (1000s, 0 to 1400) and throughput (MB/s, 0 to 25000) versus IO size (4k to 128k) for random write IO on a single IME240.]
► File performance at over 20GB/s in 2RU, 1M write IOPs and 600K read IOPs
11. DDN | IME140 SCALE-OUT NVME
IME140 SPECIFICATIONS
Enclosure: 1RU
Disk Slots: Up to 9 front-accessible 2.5” NVMe drives
PSU/Cooling: Redundant Power/Cooling
Network Connectivity: EDR InfiniBand, OPA, Ethernet
Performance: 17GB/s per 1U server, 300K IOPs
* Cached
EXTRACT MORE FROM YOUR APPLICATIONS
12. APPLICATION EFFICIENCY FOR THE REAL WORLD
► IME’s datapath is designed to deliver the full potential of flash to the application
► Other burst buffers use a conventional filesystem, which severely limits their ability to deliver flash performance
► The IO500 uses “Easy” and “Hard” IOR benchmarks
• IOR easy. You can set the parameters to be whatever you would like. You can use any of the modules such as HDF5 or MPI-IO. Typically people maximize performance by doing file-per-process and large aligned IO.
• IOR hard. We enforce a particular set of parameters. Specifically, the IOs are 47008 bytes each interspersed in a single shared file. Your only control is to specify how many writes each thread does.
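To see why the hard pattern is difficult, the following sketch computes where each 47008-byte IO lands. It assumes the interleaved layout in which write i of rank r starts at (i * nranks + r) * 47008; that layout and the helper names `hard_offset` and `is_aligned` are assumptions for illustration, not taken from the slides.

```python
XFER = 47008  # bytes per IO, fixed by the IOR hard rules

def hard_offset(rank: int, write_index: int, nranks: int) -> int:
    """Byte offset in the shared file of one IO (assumed interleaved layout)."""
    return (write_index * nranks + rank) * XFER

def is_aligned(offset: int, boundary: int = 4096) -> bool:
    """True if the offset falls exactly on a `boundary`-byte page boundary."""
    return offset % boundary == 0
```

Since 47008 is not a multiple of 4096, almost every IO starts and ends in the middle of a page, and neighbouring ranks write into the same pages of the same file.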
► Anyone can get good performance on the easy benchmark with enough equipment. Good performance on the hard benchmark requires a new approach
[Figure: IO500 IOR results (GiB/s, 0 to 1200) for Oakforest-PACS at JCAHPC (IME) and Shaheen at KAUST (DataWarp); series: easy write, hard write, easy read, hard read. Source: https://www.vi4io.org/std/io500/start]
Annotated change from easy to hard: DataWarp -98% and -98%; IME -20% and -40%
13. APPLICATION EFFICIENCY FOR THE REAL WORLD
► Results extracted from the IO500 list for systems with 100 or more client nodes
► Filesystem options show huge degradation when the IO pattern is tough
► Only IME is able to present flash to applications efficiently
[Figure: IO500 results, ratio of Easy:Hard (0% to 90%) for systems with 100 clients or more: Oakforest-PACS at JCAHPC (IME), Shaheen at KAUST (DataWarp), Mistral at DKRZ (Lustre), EMSL Cascade at PNNL (Lustre); series: write ratio, read ratio.]
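The two metrics used on these charts can be written down explicitly. This sketch assumes the plotted ratio is the hard result divided by the easy result, and that the annotated percentages are the relative change from easy to hard; both readings are assumptions, and the function names are hypothetical.

```python
def hard_to_easy_ratio(easy_gibs: float, hard_gibs: float) -> float:
    """Fraction of the easy-benchmark bandwidth retained on the hard benchmark."""
    return hard_gibs / easy_gibs

def percent_change(easy_gibs: float, hard_gibs: float) -> float:
    """Relative change from easy to hard, in percent (negative means a drop)."""
    return (hard_gibs - easy_gibs) / easy_gibs * 100.0
```

With made-up inputs of 100 GiB/s easy and 2 GiB/s hard, the ratio is 2% and the change is a 98% drop; these inputs are illustrative only and are not IO500 results.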
14. APPLICATION EFFICIENCY FOR THE REAL WORLD
[Figure: three radar charts (scale -5000 to 25000) for IME single server, Lustre NVMe and GPFS/NVMe. Axes: Large, Medium and Small IO, each as FPP Sequential, FPP Random, Shared Sequential and Shared Random.]
► IME I/O characteristics demonstrate clear benefits in comparison with traditional parallel filesystems
► Particularly strong performance for small IO, writes and shared file operations
16. IME1.2 FAULT TOLERANCE
► 4xIME240 with parity=2+1 dhtcopy=3
► Device/server failures are transparent to the application
► Automatic data rebuild with no service interruption
► Native de-clustered distributed erasure coding ensures fast rebuild
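The idea behind a 2+1 scheme (two data fragments plus one parity fragment) can be shown with simple XOR parity. The slides do not say which code IME uses, so this is an illustrative stand-in; `encode_2p1` and `rebuild` are hypothetical names.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_2p1(d0: bytes, d1: bytes) -> list:
    """Return [d0, d1, parity] for two equal-length data fragments."""
    if len(d0) != len(d1):
        raise ValueError("fragments must have equal length")
    return [d0, d1, xor_bytes(d0, d1)]

def rebuild(stripe: list) -> list:
    """Rebuild at most one missing fragment (marked None) from the other two."""
    missing = [i for i, frag in enumerate(stripe) if frag is None]
    if len(missing) > 1:
        raise ValueError("2+1 tolerates only one lost fragment")
    result = list(stripe)
    if missing:
        a, b = [frag for frag in stripe if frag is not None]
        result[missing[0]] = xor_bytes(a, b)
    return result
```

If the three fragments of each stripe live on different servers, losing any one server leaves enough information on the survivors to recompute what was lost.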
17. IME1.2 FAULT TOLERANCE
[Figure: timeline with four numbered phases. Labels: I/O write-intensive job startup; Server 3 fails with 1TB data; Data Rebuild Zone; Normal Service Resumed; ~3 mins; Continued Production.]
18. IME1.2 FAULT TOLERANCE
Continued Production
► Even after a single node failure, the rebuilt data is still protected against further failures:
• 3 failing devices on surviving servers
• 2nd node failing
19. IME1.2 MONITORING WITH DDN INSIGHT
► IME monitoring integrated in DDN Insight
► Aggregated views for clients, servers and devices
► Performance and status data collection
► Event monitoring and alerts
► Live and historical data analysis
20. ROADMAP ITEM: NFS AS A BFS
[Figure: stack of NFS, IME and COMPUTE]
► Brings scale-out, flash-native performance to NFS access
► Shields the NFS server from "tough" IO
► Increases IO throughput from NFS hardware
► Zero application changes: replace the NFS mount with an IME mount
21. IME1.2 | TRUE DIAL-IN ERASURE CODING
[Figure: IME Server0, Server1, Server2 through ServerN share a FILE CACHE holding files with different erasure coding schemes: 8+3, 8+1, 6+0, 8+2, 6+1, 4+1, 4+1, 4+1, 8+0, 1+1]
▶ IME1.1 supports multiple resilience levels through flexible, adaptive erasure coding
▶ System-wide default up to 15+3
▶ Applications can override defaults and select a specific erasure coding scheme
DEFAULT: 8+3
Erasure coding options:
1+1 1+2 1+3
2+1 2+2 2+3
3+1 3+2 3+3
... ... ...
15+1 15+2 15+3
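The trade-off behind choosing a k+m scheme can be made concrete with a small calculation. This sketch assumes k data fragments and m parity fragments per stripe, with limits (k up to 15, m up to 3, m=0 allowed) read from the slide; `ec_profile` is a hypothetical helper, not an IME interface.

```python
def ec_profile(scheme: str) -> dict:
    """Summarise a 'k+m' erasure coding scheme such as '8+3'."""
    k, m = (int(part) for part in scheme.split("+"))
    if not (1 <= k <= 15 and 0 <= m <= 3):
        raise ValueError(f"unsupported scheme: {scheme}")
    return {
        "data_fragments": k,
        "parity_fragments": m,
        "tolerated_failures": m,            # fragments that may be lost per stripe
        "usable_fraction": k / (k + m),     # share of raw capacity holding user data
        "raw_per_user_byte": (k + m) / k,   # raw bytes written per user byte
    }
```

Under these assumptions the default 8+3 keeps 8/11 of raw capacity for user data and survives three lost fragments per stripe, while 1+1 behaves like mirroring at half the raw capacity.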
22. ddn.com | ©2018 DataDirect Networks, Inc. *Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.
Thank You!
Keep in touch with us.
9351 Deering Avenue
Chatsworth, CA 91311
1.800.837.2298
1.818.700.4000
company/datadirect-networks
@ddn_limitless
sales@ddn.com
23. IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
[Figure: shared file IO passing through the FILESYSTEM]
► Parallel file systems can exhibit extremely poor performance for shared-file IO because they manage files in large lock units, which leads to internal lock contention
► IME eliminates contention by managing IO fragments directly and coalescing IOs prior to flushing to the parallel file system
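Coalescing can be pictured as merging small fragments into larger contiguous extents before they are flushed. The sketch below is a generic interval merge under that assumption, not IME's flush algorithm; `coalesce` is a hypothetical name and fragments are (offset, length) pairs in bytes.

```python
def coalesce(fragments):
    """Merge (offset, length) fragments that touch or overlap into extents."""
    extents = []
    for off, length in sorted(fragments):
        if extents and off <= extents[-1][0] + extents[-1][1]:
            # Fragment starts inside or right at the end of the last extent.
            last_off, last_len = extents[-1]
            extents[-1] = (last_off, max(last_len, off + length - last_off))
        else:
            extents.append((off, length))
    return extents
```

Two adjacent 47008-byte writes from different processes become a single 94016-byte extent, so the parallel file system sees fewer, larger IOs.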
[Figure labels: performance barrier; file; file]
24. IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
[Figure: FILESYSTEM LAYERS]
► Thick filesystem software layers and traditional data layouts severely restrict performance for tough workloads
► IME’s lean, write-anywhere, fully parallel IO removes the barriers that prevent your application from seeing full performance
25. SHARED FILE I/O OPTIMIZATION
[Figure: Process 0, Process 1 and Process 2 write to one file; filesystem lock management is triggered when IOs cross page boundaries]
► Parallel file systems can exhibit extremely poor performance for shared-file IO because they manage files in large lock units, which leads to internal lock contention
► IME eliminates contention by managing IO fragments directly and coalescing IOs prior to flushing to the parallel file system
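The contention described above can be made visible by counting which lock units are touched by more than one process. This is a simplified model: the lock unit size is a free parameter here (real file systems differ), and `lock_units` and `contended_units` are hypothetical helpers.

```python
from collections import defaultdict

def lock_units(offset: int, length: int, unit: int) -> range:
    """Indices of the lock units touched by one IO."""
    first = offset // unit
    last = (offset + length - 1) // unit
    return range(first, last + 1)

def contended_units(ios, unit: int) -> dict:
    """ios: iterable of (process, offset, length).

    Returns {unit index: set of processes} for units touched by 2+ processes.
    """
    owners = defaultdict(set)
    for proc, off, length in ios:
        for u in lock_units(off, length, unit):
            owners[u].add(proc)
    return {u: procs for u, procs in owners.items() if len(procs) > 1}
```

With three processes each writing one 47008-byte IO back to back and an assumed 1 MiB lock unit, all three fall into unit 0 and must serialise on its lock, even though their byte ranges never overlap.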
26. FLASH-NATIVE - MAXIMISE FLASH PERFORMANCE AND LIFETIME
[Figure: flash block usage, PFS compared with IME. Labels:]
• Valid sectors with user data
• Random writes arrive with LBA offsets corresponding to existing data
• Log Structured Filesystem: the SSD sees writes to new block ranges
• Free, unused blocks
• If IME reaches its threshold, data is flushed or purged in large chunks
• Standard storage systems cannot manage flash without incurring costly garbage collection, reducing performance and SSD lifetime
• IME's Flash-Native approach maximises flash lifetime by issuing IO in 128K chunks and unmapping in large chunks
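Issuing IO in 128K chunks can be pictured as a buffer that gathers small writes and only submits full chunks to the device. This is a minimal sketch of that batching idea, not IME's write path; `ChunkWriter` is a hypothetical class and the device submission is stubbed out.

```python
CHUNK = 128 * 1024  # chunk size taken from the slide (128K)

class ChunkWriter:
    """Gather small writes and issue them to the device in fixed-size chunks."""

    def __init__(self, chunk: int = CHUNK):
        self.chunk = chunk
        self.buffer = bytearray()
        self.issued = []  # sizes of the device writes issued so far

    def write(self, data: bytes) -> None:
        self.buffer += data
        while len(self.buffer) >= self.chunk:
            self._issue(self.chunk)

    def flush(self) -> None:
        """Issue whatever is left, even if it is smaller than one chunk."""
        if self.buffer:
            self._issue(len(self.buffer))

    def _issue(self, size: int) -> None:
        # A real system would submit buffer[:size] to the SSD here.
        self.issued.append(size)
        del self.buffer[:size]
```

Sixty-four 4 KiB application writes result in two 128 KiB device writes, which is the kind of large sequential IO that flash handles with less garbage collection.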
[Figure labels, IME and PFS:]
• New writes invalidate corresponding sectors
• Blocks with a large invalid-sector count undergo garbage collection
27. DDN | IME IS UNIQUE!
Designed for Scalability
Patented DDN Algorithms
Scale-Out Data Protection
Distributed Erasure Coding
Integrated With File Systems
Designed to Accelerate Lustre*, GPFS
No Code Modification Needed
Fully POSIX & HPC Compatible
No Application Modifications
Writes Fast; Reads Fast Too
No other system offers both at scale.
Intelligent, Adaptive System
On-the-Fly Data Placement