DDN Confidential
DDN Storage | ©2018 DataDirect Networks, Inc.
IME: Unlocking the Potential of NVMe
DDN Direction | Building on 20 Years of Innovation
JCAHPC
>1TB/sec NVMe
S2A Systems
1998 2000 2016 2017
SFA – Scale-Out Systems
IME – Software Defined Elastic Data Services
Data Integrity, Declustering, Erasure Coding – innovative data protection schemes, geo distribution, new orders of scaling
Storage Orchestration – Open APIs, Kubernetes, Docker, Openstack, Ansible, Puppet…
New Hierarchies – automated data placement, IOPs and Bulk IO Engines
Flash and NVRAM – performance, lifetime, memory-class
HW Fabrics and Interconnects – NVMe (oF), IB, OPA, Datacentre Ethernet, Gen-Z
Virtualisation | Containers | Tenancy – secure tenants at scale, SCSI VM stack
Disaggregated, Shared Nothing - KV stores, node-local, edge server, SW/cloud deployment
Distributed Storage: GPFS (Spectrum Scale) and Lustre, NFS/SMB, POSIX and POSIX-lite, Object
Technology and demand
2020
SFA12Kxi
4.5 M IOPS
ORNL Spider 2
1 TB/sec HDD
First Large
Lustre System
First S2A
Shipped to NASA
2013 2003
Elastic Data Services for Performance and Scale
DDN the First to Realize the Research Dream
WHAT IS IME?
IME’s Active I/O Tier is inserted directly between compute and the parallel file system.
IME software virtualizes disparate NVMe SSDs into a single pool of shared memory that accelerates I/O, the PFS, and applications.
► Scale-out flash cache layer using NVMe SSDs, inserted between the compute cluster and the Parallel File System (PFS)
• IME is configured as a cluster of multiple NVMe servers
• All compute nodes can access cached data on IME
► Accelerates difficult IO patterns (small, random, shared-file, high-concurrency) thanks to a thin software IO management layer
► Configured as a massive scale-out cache layer with high IO bandwidth and IOPs
IME INTERNALS
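[Diagram: distributed hash table. A hash function maps each data item to a key (File1 → DFCD3455, File4 → 52ED789E, File3 → 46042D43, File6 → DC355CE), and the keys are distributed across the network peers.]
[Diagram: log over time. New data is added at the log head, space is reclaimed at the log tail, and the log wraps around into the empty space.]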
DHT provides foundation for
• Network parallelism
• Node-level fault tolerance
• Distributed metadata
• Self-Optimising for Noisy Fabrics
Log-structured filesystem at the storage device level
• High-performance device throughput (NAND flash)
• Maximises device lifetime
FABRIC-AWARE
FLASH-NATIVE
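As a rough illustration of the DHT idea on this slide, the sketch below hashes a data name to a key and maps the key to a peer. The hash (SHA-256 truncated to 32 bits), the modulo placement, and the peer names are assumptions made for illustration only; the slides do not say which hash function or placement scheme IME uses.

```python
import hashlib

PEERS = ["server0", "server1", "server2", "server3"]  # hypothetical peer names


def data_key(name: str) -> int:
    """Hash a data name to a 32-bit key (SHA-256 truncated; illustrative choice)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(key: int, peers: list) -> str:
    """Map a key to the peer that stores it (plain modulo placement for brevity)."""
    return peers[key % len(peers)]


# Every client can compute the same placement locally, without a central lookup.
placement = {name: owner(data_key(name), PEERS)
             for name in ["File1", "File3", "File4", "File6"]}

for name, peer in placement.items():
    print(f"{name} -> key {data_key(name):08X} -> {peer}")
```

Computing the owner from the key alone is a general property of distributed hash tables; a production design would typically use a placement scheme that tolerates peers joining and leaving, which plain modulo does not.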
IME and the Plateau of Death
IME140 Front and Rear View
► The IME140 is a 1U Intel-based Server with up to 10 NVMe drives.
► Nine of the NVMe drives serve application data; the tenth is reserved for IME software internal use (commit log)
► 135 TB per 1U, 18 GB/s Read & Write bandwidth with 2 OPA/EDR links
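The headline figures can be broken down per drive and per link with simple arithmetic. This is a back-of-the-envelope sketch: it assumes the 135 TB counts only the nine data drives and uses the nominal 100 Gbit/s line rate of an EDR InfiniBand or OPA link, ignoring protocol overhead. Neither assumption is stated on the slide.

```python
DATA_DRIVES = 9          # drives serving application data (from the slide)
CAPACITY_TB = 135        # capacity per 1U (from the slide)
BANDWIDTH_GBS = 18       # read/write bandwidth per 1U in GB/s (from the slide)
LINKS = 2                # OPA/EDR links (from the slide)
LINK_RATE_GBITS = 100    # nominal line rate of one EDR or OPA link

tb_per_drive = CAPACITY_TB / DATA_DRIVES       # assumes capacity counts data drives only
gbs_per_link = BANDWIDTH_GBS / LINKS           # bandwidth each link must carry
link_peak_gbs = LINK_RATE_GBITS / 8            # Gbit/s to GB/s, no protocol overhead
link_utilisation = gbs_per_link / link_peak_gbs

print(f"{tb_per_drive:.0f} TB per data drive")
print(f"{gbs_per_link:.0f} GB/s per link, {link_utilisation:.0%} of nominal line rate")
```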
Callouts:
• Dual Intel 4108 CPUs (8 cores, 1.8 GHz)
• Ten NVMe drives
• 8 system fans
• One 128 GB SATA DoM (the SATA DoM delivers much higher boot performance at lower power)
• Six 16 GB DIMMs
IME140 PERFORMANCE
► IME140 demonstrates around 17 GB/s and 300K IOPs per node
IME240 Performance Scalability & R/W Parity
IME | IME240 Performance
>600K IOPs | 20GB/s | 2 Rack Units
[Chart: IME240 sequential throughput. X-axis: IO size (4k to 4M); y-axis: throughput (MB/s, 0 to 25,000); series: read (dark red) and write (light red).]
[Chart: IOPs and throughput for random write IO on a single IME240. X-axis: IO size (4k to 128k); y-axes: IOPs (1000s, 0 to 1,400) and throughput (MB/s, 0 to 25,000).]
► File performance of over 20 GB/s in 2RU, with 1M write IOPs and 600K read IOPs
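IOPs and throughput are linked by throughput = IOPs × IO size, which is why small IOs are limited by IOPs and large IOs by bandwidth. The sketch below applies a simple two-ceiling model to the figures quoted on this slide. The model itself is an assumption for illustration and is not taken from the measured curves.

```python
MAX_IOPS = 1_000_000        # write IOPs figure quoted on the slide
MAX_BW = 20e9               # 20 GB/s quoted on the slide, in bytes per second


def model(io_size: int):
    """Return (IOPs, bytes/s) assuming performance is capped by IOPs or bandwidth, whichever binds first."""
    iops = min(MAX_IOPS, MAX_BW / io_size)
    return iops, iops * io_size


for kib in (4, 8, 16, 32, 64, 128):
    iops, bw = model(kib * 1024)
    print(f"{kib:>3}k: {iops / 1e3:7.0f}K IOPs, {bw / 1e9:5.1f} GB/s")
```

Under these assumed ceilings the crossover sits near a 20 KB IO size; real devices show a smoother transition.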
DDN | IME140 SCALE-OUT NVME
IME140 SPECIFICATIONS
Enclosure: 1RU
Disk Slots: Up to 9 front-accessible 2.5” NVMe drives
PSU/Cooling: Redundant power and cooling
Network Connectivity: EDR InfiniBand, OPA, Ethernet
Performance: 17 GB/s per 1U server, 300K IOPs
* Cached
EXTRACT MORE FROM YOUR APPLICATIONS
APPLICATION EFFICIENCY FOR THE REAL WORLD
► IME’s datapath is designed to deliver the potential of flash to the application
► Other burst buffers use a conventional filesystem, which severely limits their ability to deliver flash performance
► The IO500 uses “Easy” and “Hard” IOR benchmarks
• IOR easy. You can set the parameters to be whatever you would like. You can use any of the modules such as HDF5 or MPI-IO. Typically people maximize performance by doing file-per-process and large aligned IO.
• IOR hard. We enforce a particular set of parameters. Specifically, the IOs are 47008 bytes each interspersed in a single shared file. Your only control is to specify how many writes each thread does.
► With enough equipment, anyone can get good performance on the easy benchmark. Good performance on the hard benchmark requires a new approach.
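To see why the hard pattern is difficult, the sketch below lays out interleaved 47008-byte writes in one shared file and counts how many ranks land in the same lock unit. The 1 MiB lock unit, the 64 ranks, and the rank-by-rank interleaving formula are illustrative assumptions; real filesystems use their own lock and stripe granularities.

```python
XFER = 47008            # bytes per IO, from the IO500 "IOR hard" definition
LOCK_UNIT = 1 << 20     # hypothetical 1 MiB lock unit, for illustration only


def offset(rank: int, i: int, nprocs: int) -> int:
    """Byte offset of the i-th write of `rank` when writes are interleaved rank by rank."""
    return (i * nprocs + rank) * XFER


def writers_per_unit(nprocs: int, writes_per_rank: int) -> dict:
    """Map each lock unit index to the set of ranks that write into it."""
    units = {}
    for rank in range(nprocs):
        for i in range(writes_per_rank):
            start = offset(rank, i, nprocs)
            end = start + XFER - 1
            for unit in range(start // LOCK_UNIT, end // LOCK_UNIT + 1):
                units.setdefault(unit, set()).add(rank)
    return units


units = writers_per_unit(nprocs=64, writes_per_rank=4)
print("IO size aligned to 4 KiB:", XFER % 4096 == 0)
print("ranks writing into lock unit 0:", len(units[0]))
```

Because 47008 is not a multiple of any common page or stripe size, many writes straddle unit boundaries and many ranks contend for each unit.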
[Chart: IO500 IOR Results (source: https://www.vi4io.org/std/io500/start). Y-axis: IOR result (GiB/s, 0 to 1,200); systems: Oakforest-PACS at JCAHPC (IME) and Shaheen at KAUST (DataWarp); series: easy write, hard write, easy read, hard read. Annotated drops from easy to hard: -98% and -98% for DataWarp, -20% and -40% for IME.]
APPLICATION EFFICIENCY FOR THE REAL WORLD
► Results extracted from the IO500 for systems with a client count of 100 nodes or more
► Filesystem options show huge degradation when the IO pattern is tough
► Only IME is able to present flash to the applications efficiently
[Chart: IO500 Results, ratio of Easy:Hard for systems with 100 clients or more. Y-axis: ratio of Easy:Hard (0% to 90%); systems: Oakforest-PACS at JCAHPC (IME), Shaheen at KAUST (DataWarp), Mistral at DKRZ (Lustre), EMSL Cascade at PNNL (Lustre); series: write ratio and read ratio.]
APPLICATION EFFICIENCY FOR THE REAL WORLD
[Charts: three radar plots (IME single server, Lustre NVMe, GPFS/NVMe), each on a scale of -5000 to 25000, over twelve IO patterns: large, medium, and small IO, each as FPP sequential, FPP random, shared sequential, and shared random.]
► IME I/O characteristics demonstrate clear benefits in comparison with traditional parallel filesystems
► Particularly strong performance for small IO, writes, and shared-file operations
#1 on the IO500
More than 78% higher than the #2 score!
IME1.2 FAULT TOLERANCE
► 4 x IME240 with parity=2+1 dhtcopy=3
► Device and server failures are transparent to the application
► Automatic data rebuild with no service interruption
► Native de-clustered distributed erasure coding ensures fast rebuild
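A 2+1 scheme can be illustrated with plain XOR parity: two data fragments plus one parity fragment, any one of which can be rebuilt from the other two. This is a minimal sketch that reads "parity=2+1" as two data plus one parity fragment; the slides do not describe the code IME actually uses, and the function names here are hypothetical.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2p1(d0: bytes, d1: bytes) -> List[bytes]:
    """Return the stripe [data0, data1, parity] for a 2+1 scheme."""
    return [d0, d1, xor_bytes(d0, d1)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover a stripe in which at most one fragment is missing (None)."""
    missing = [i for i, frag in enumerate(stripe) if frag is None]
    if len(missing) > 1:
        raise ValueError("2+1 parity can only recover one lost fragment")
    repaired = list(stripe)
    if missing:
        a, b = (frag for frag in stripe if frag is not None)
        repaired[missing[0]] = xor_bytes(a, b)
    return repaired


stripe = encode_2p1(b"abcd", b"wxyz")
for lost in range(3):
    damaged = list(stripe)
    damaged[lost] = None                     # simulate a failed device or server
    print(f"lost fragment {lost}: recovered = {rebuild(damaged) == stripe}")
```

In a de-clustered layout the fragments of different stripes sit on different sets of devices, so many devices can take part in a rebuild in parallel.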
IME1.2 FAULT TOLERANCE
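[Chart: timeline of a server failure during a write-intensive job. Annotations (numbered 1 to 4 in the original): I/O write-intensive job startup; Server 3 fails with 1 TB of data; data rebuild zone (~3 mins); normal service resumed. Label: Continued Production.]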
IME1.2 FAULT TOLERANCE
Continued Production
► Even after a single node failure, the rebuilt data is still protected against further failures:
• 3 failing devices on surviving servers
• A second node failing
IME1.2 MONITORING WITH DDN INSIGHT
► IME monitoring integrated in DDN Insight
► Aggregated views for clients, servers, devices
► Performance and status data collection
► Event monitoring and alerts
► Live and historical data analysis
ROADMAP ITEM: NFS AS a BFS
[Diagram: stack of NFS, IME, and COMPUTE, with IME between the other two]
► Brings scale-out, flash-native performance to NFS access
► Shields the NFS server from "tough" IO
► Increases IO throughput from NFS hardware
► Zero application changes: replace the NFS mount with an IME mount
IME1.2 | TRUE DIAL-IN ERASURE CODING
[Diagram: a file cache spread across IME Server0, Server1, Server2 ... ServerN, holding files protected with different schemes: 8+3, 8+1, 6+0, 8+2, 6+1, 4+1, 4+1, 4+1, 8+0, 1+1.]
▶ IME1.1 supports multiple resilience levels through flexible, adaptive erasure coding
▶ System-wide default of up to 15+3
▶ Applications can override the default and select a specific erasure coding scheme
DEFAULT: 8+3
Erasure coding options:
1+1 1+2 1+3
2+1 2+2 2+3
3+1 3+2 3+3
... ... ...
15+1 15+2 15+3
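The trade-off between these options can be tabulated directly. The sketch below uses the standard reading of a k+m scheme (k data fragments, m parity fragments, any m of which may be lost), which is an assumption here since the slides do not spell it out; the helper name is hypothetical.

```python
def efficiency(k: int, m: int) -> float:
    """Fraction of raw capacity that holds user data in a k+m scheme."""
    return k / (k + m)


# The grid of options listed on the slide: 1..15 data fragments, 1..3 parity fragments.
schemes = [(k, m) for k in range(1, 16) for m in range(1, 4)]

for k, m in [(1, 1), (4, 1), (8, 3), (15, 3)]:
    print(f"{k}+{m}: tolerates {m} lost fragment(s), "
          f"{efficiency(k, m):.0%} usable, {(k + m) / k:.2f}x raw per byte stored")
```

Wider stripes cost less capacity for the same protection level but need more servers per stripe, which is one reason an application might override the default.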
ddn.com ©2018 DataDirect Networks, Inc. *Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.
Thank You!
Keep in touch with us.
9351 Deering Avenue
Chatsworth, CA 91311
1.800.837.2298
1.818.700.4000
company/datadirect-networks
@ddn_limitless
sales@ddn.com
IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
[Diagram labels: FILESYSTEM; Shared File; Performance barrier; file]
► Parallel file systems can exhibit extremely poor performance for shared-file IO because of internal lock management that results from managing files in large lock units
► IME eliminates contention by managing IO fragments directly and coalescing IOs before flushing to the parallel file system
IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
[Diagram labels: FILESYSTEM; FILESYSTEM LAYERS]
► Thick file system software layers and traditional data layout severely restrict performance for tough workloads
► IME’s lean, write-anywhere, fully parallel IO removes the barriers that prevent your application from seeing full performance
SHARED FILE I/O OPTIMIZATION
[Diagram: Process 0, Process 1, and Process 2 write to one file; filesystem lock management is triggered when IOs cross page boundaries]
► Parallel file systems can exhibit extremely poor performance for shared-file IO because of internal lock management that results from managing files in large lock units
► IME eliminates contention by managing IO fragments directly and coalescing IOs before flushing to the parallel file system
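Coalescing can be pictured as merging small fragments from many processes into large contiguous extents before they reach the parallel file system. The sketch below is a minimal illustration of that idea; the (offset, length) fragment representation and the function name are assumptions, not IME's internal data structures.

```python
def coalesce(fragments):
    """Merge (offset, length) fragments that touch or overlap into contiguous extents."""
    extents = []
    for off, length in sorted(fragments):
        end = off + length
        if extents and off <= extents[-1][1]:
            # Fragment touches or overlaps the previous extent: extend it.
            extents[-1][1] = max(extents[-1][1], end)
        else:
            extents.append([off, end])
    return [(start, end - start) for start, end in extents]


# Three processes each wrote one 47008-byte record, arriving out of order.
arrived = [(2 * 47008, 47008), (0, 47008), (47008, 47008)]
print(coalesce(arrived))
```

The three small writes leave the cache as a single large, sequential extent, which is the kind of IO a parallel file system handles well.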
FLASH-NATIVE - MAXIMISE FLASH PERFORMANCE AND LIFETIME
[Diagram, PFS: random writes arrive with LBA offsets corresponding to existing data (valid sectors with user data); new writes invalidate the corresponding sectors, and blocks with a large invalid sector count undergo garbage collection.]
[Diagram, IME: with a log-structured filesystem the SSD sees writes to new block ranges (free, unused blocks); if IME reaches a threshold, data is flushed or purged in large chunks.]
• Standard storage systems cannot manage flash without incurring costly garbage collection, which reduces performance and SSD lifetime
• IME's flash-native approach maximises flash lifetime by issuing IO in 128K chunks and unmapping in large chunks
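The log-structured idea can be sketched as a ring in which every write, whatever its logical address, is appended at the head, and space is reclaimed in batches at the tail. This toy model is an assumption-laden illustration (class name, slot granularity, and reclaim policy are all hypothetical) and does not model the 128K chunking.

```python
class AppendLog:
    """Toy log-structured store: all writes land at the log head of a ring."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * capacity   # physical slots, used as a ring
        self.head = 0                    # next physical slot to write
        self.tail = 0                    # oldest slot not yet reclaimed
        self.used = 0
        self.index = {}                  # logical block address -> physical slot

    def write(self, lba, data):
        """Append data for `lba` at the head and return the physical slot used."""
        if self.used == self.capacity:
            raise RuntimeError("log full: flush or purge before writing")
        phys = self.head
        self.slots[phys] = (lba, data)
        self.index[lba] = phys           # any older copy of lba is now stale
        self.head = (self.head + 1) % self.capacity
        self.used += 1
        return phys

    def read(self, lba):
        return self.slots[self.index[lba]][1]

    def reclaim(self, count):
        """Free up to `count` slots at the tail; return live entries that need flushing."""
        live = []
        for _ in range(min(count, self.used)):
            lba, data = self.slots[self.tail]
            if self.index.get(lba) == self.tail:   # still the newest copy
                live.append((lba, data))
                del self.index[lba]
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % self.capacity
            self.used -= 1
        return live


log = AppendLog(capacity=8)
# Random logical addresses, including an overwrite of block 42.
physical = [log.write(lba, f"v{n}") for n, lba in enumerate([42, 7, 42, 99])]
latest = log.read(42)
flushed = log.reclaim(2)
print(physical, latest, flushed)
```

The device only ever sees sequential writes to fresh slots, and stale copies are dropped at reclaim time instead of being rewritten in place.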
DDN | IME IS UNIQUE!
• Designed for Scalability: patented DDN algorithms
• Scale-Out Data Protection: distributed erasure coding
• Integrated With File Systems: designed to accelerate Lustre*, GPFS; no code modification needed
• Fully POSIX & HPC Compatible: no application modifications
• Writes Fast; Reads Fast Too: no other system offers both at scale
• Intelligent, Adaptive System: on-the-fly data placement

IME - Unlocking the Potential of NVMe

  • 1. DDN Confidential DDN Storage | ©2018 DataDirect Networks, Inc. IME: Unlocking the Potential of NVMe
  • 2. DDN ConfidentialDDN Storage | ©2018 DataDirect Networks, Inc. DDN Direction| Building on 20 Years of Innovation JCAHPC >1TB/sec NVMe S2A Systems 1998 2000 2016 2017 SFA – Scale-Out Systems IME – Software Defined Elastic Data Services Data Integrity, Declustering, Erasure Coding – innovative data protect schemas, geo distribution, new orders of scaling Storage Orchestration – Open APIs, Kubernetes, Docker, Openstack, Ansible, Puppet… New Hierarchies – automated data placement, IOPs and Bulk IO Engines, Flash and NVRAM – performance, lifetime, memory-class HW Fabrics and Interconnects – NVMe (oF), IB, OPA, Datacentre Ethernet, Gen-Z Virtualisation | Containers | Tenancy – secure tenants at scale, SCSI VM stack Disaggregated, Shared Nothing - KV stores, node-local, edge server, SW/cloud deployment Distributed Storage: GPFS (Spectrum Scale) and Lustre, NFS/SMB, POSIX and POSIX-lite, Object Technologyanddemand 2020 SFA12Kxi 4.5 M IOPS ONRL Spider 2 1 TB/sec HDD First Large Lustre System First S2A Shipped to NASA 20132003 Elastic Data Services for Performance and Scale
  • 3. DDN ConfidentialDDN Storage | ©2018 DataDirect Networks, Inc. DDN the First to Realize the Research Dream
  • 4. DDN ConfidentialDDN Storage | ©2018 DataDirect Networks, Inc. WHAT IS IME? IME’s Active I/O Tier, is inserted right between Compute and the parallel file system IME software intelligently virtualizes disparate NVMe SSDs into a single pool of shared memory that accelerates I/O, PFS & Applications ► Scale-Out Flash Cache Layer using NVMe SSDs inserted between compute cluster and Parallel File System (PFS) • IME is configured as CLUSTER with multiple NVMe servers • All compute nodes can access cache data on IME ► Accelerates difficult IO patterns: small/random/shared file/high concurrency due to thin SW IO management layer ► configured as scale-out massive cache layer with huge IO bandwidth and IOPs
  • 5. IME INTERNALS
      FABRIC-AWARE: a distributed hash table (DHT) provides the foundation for
        • Network parallelism
        • Node-level fault tolerance
        • Distributed metadata
        • Self-optimising for noisy fabrics
      [Diagram: data items are mapped to keys by a distributed network hash function and placed on peers. Example data/key pairs: File1 → DFCD3455, File4 → 52ED789E, File3 → 46042D43, File6 → DC355CE.]
      FLASH-NATIVE: log-structured filesystem at the storage device level
        • High-performance device throughput (NAND flash)
        • Maximises device lifetime
      [Diagram: a log laid out over time. New data is added at the log head, space is reclaimed at the log tail, and the log wraps around into empty space.]
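The key-to-peer mapping on this slide can be illustrated with a minimal sketch. The slide does not say which hash function or placement scheme IME uses, so SHA-1 and simple modulo placement are assumptions here, and `key_for`, `peer_for_key`, and the peer names are hypothetical.

```python
import hashlib

def key_for(name: str) -> str:
    """Derive a short hex key from a data item name (SHA-1 is an assumption)."""
    return hashlib.sha1(name.encode("utf-8")).hexdigest()[:8].upper()

def peer_for_key(key: str, peers: list[str]) -> str:
    """Map a hex key to one peer by modulo placement (illustrative only)."""
    return peers[int(key, 16) % len(peers)]

peers = ["peer0", "peer1", "peer2", "peer3"]  # hypothetical IME servers
names = ["File1", "File3", "File4", "File6"]

# Every client computes the same placement locally, with no central lookup.
placement = {n: peer_for_key(key_for(n), peers) for n in names}
for name, peer in placement.items():
    print(name, key_for(name), "->", peer)
```

Because the placement is a pure function of the key, any client can locate data without consulting a metadata server, which is the property the slide attributes to the DHT.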
  • 6. IME and the Plateau of Death
  • 7. IME140 Front and Rear View
      ► The IME140 is a 1U Intel-based server with up to 10 NVMe drives.
      ► Nine of the NVMe drives serve application data; the tenth is reserved for IME software internal use (commit log).
      ► 135 TB per 1U; 18 GB/s read and write bandwidth with 2 OPA/EDR links.
      [Figure callouts: dual Intel 4108 CPUs (8 cores, 1.8 GHz); ten NVMe drives; eight system fans; six 16 GB DIMMs; one 128 GB SATA DoM, which delivers much higher boot performance at lower power.]
  • 8. IME140 PERFORMANCE
      ► The IME140 demonstrates around 17 GB/s and 300K IOPs per node
  • 9. IME240 Performance Scalability & R/W Parity
  • 10. IME | IME240 Performance: >600K IOPs | 20GB/s | 2 Rack Units
      [Chart: IME240 sequential throughput, read (dark red) and write (light red). IO sizes from 4k to 4M; throughput axis in MB/s, 0 to 25,000.]
      [Chart: IOPs and throughput for random write IO on a single IME240. IO sizes from 4k to 128k; throughput axis in MB/s, 0 to 25,000; IOPs axis in thousands, 0 to 1,400.]
      ► File performance at over 20GB/s in 2RU, with 1M write IOPs and 600K read IOPs
  • 11. DDN | IME140 SCALE-OUT NVME: EXTRACT MORE FROM YOUR APPLICATIONS
      IME140 SPECIFICATIONS
        Enclosure: 1RU
        Disk Slots: Up to 9 front-accessible 2.5” NVMe drives
        PSU/Cooling: Redundant power/cooling
        Network Connectivity: EDR InfiniBand, OPA, Ethernet
        Performance: 17GB/s per 1U server, 300K IOPs*
      * Cached
  • 12. APPLICATION EFFICIENCY FOR THE REAL WORLD
      ► IME’s datapath is designed to deliver the potential of flash to the application
      ► Other burst buffers use a conventional filesystem, which severely limits their ability to deliver flash performance
      ► The IO500 uses “Easy” and “Hard” IOR benchmarks
        • IOR easy. You can set the parameters to be whatever you would like. You can use any of the modules such as HDF5 or MPI-IO. Typically people maximize performance by doing file-per-process and large aligned IO.
        • IOR hard. We enforce a particular set of parameters. Specifically, the IOs are 47008 bytes each interspersed in a single shared file. Your only control is to specify how many writes each thread does.
      ► Anyone can get good performance from the easy benchmark with enough equipment. Good performance on the hard benchmark requires a new approach
      [Chart: IO500 IOR results in GiB/s (0 to 1,200) for Oakforest-PACS at JCAHPC (IME) and Shaheen at KAUST (DataWarp). Series: easy write, hard write, easy read, hard read. Annotated drops from easy to hard: -98% and -98% for DataWarp; -20% and -40% for IME. Source: https://www.vi4io.org/std/io500/start]
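To see why the hard pattern is difficult, the following sketch computes where 47008-byte writes land in a shared file. The interleaving formula is a simplified model of a segmented shared-file layout rather than IOR's own code, and the 4096-byte alignment unit, the process count, and the `offset` helper are assumptions.

```python
XFER = 47008   # bytes per IO, fixed by the IOR hard rules quoted above
ALIGN = 4096   # assumed page/block size used for the alignment check

def offset(rank: int, i: int, nprocs: int) -> int:
    """Byte offset of the i-th write of `rank` in an interleaved shared file."""
    return (i * nprocs + rank) * XFER

nprocs = 4  # hypothetical number of writer processes
offsets = [offset(r, i, nprocs) for i in range(3) for r in range(nprocs)]
aligned = [o for o in offsets if o % ALIGN == 0]

print(offsets[:5])
print(f"{len(aligned)} of {len(offsets)} offsets are {ALIGN}-byte aligned")
```

Since 47008 is not a multiple of 4096, almost every write starts and ends in the middle of a block that a neighbouring process also touches, which is the opposite of the large aligned file-per-process IO used in the easy case.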
  • 13. APPLICATION EFFICIENCY FOR THE REAL WORLD
      ► Results extracted from the IO500 for systems with a client count of 100 nodes or more
      ► Filesystem-based options show huge degradation when the IO pattern is tough
      ► Only IME is able to present flash to the applications efficiently
      [Chart: IO500 results, ratio of Easy:Hard (systems with 100 clients or more), axis from 0% to 90%. Write ratio and read ratio are shown for Oakforest-PACS at JCAHPC (IME), Shaheen at KAUST (DataWarp), Mistral at DKRZ (Lustre), and EMSL Cascade at PNNL (Lustre).]
  • 14. APPLICATION EFFICIENCY FOR THE REAL WORLD
      [Three charts on a common scale (-5000 to 25000): IME single server, Lustre NVMe, GPFS/NVMe. Each chart covers large, medium, and small IO as FPP sequential, FPP random, shared sequential, and shared random.]
      ► IME I/O characteristics demonstrate clear benefits in comparison with traditional parallel filesystems
      ► Particularly strong performance for small IO, writes, and shared-file operations
  • 15. #1 on the IO500: More than 78% higher than the #2 score!
  • 16. IME1.2 FAULT TOLERANCE
      ► 4 x IME240 with parity=2+1 dhtcopy=3
      ► Device and server failures are transparent to the application
      ► Automatic data rebuild with no service interruption
      ► Native declustered distributed erasure coding ensures fast rebuild
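A 2+1 layout stores two data fragments plus one parity fragment, so any single lost fragment can be recomputed from the other two. The sketch below uses plain XOR parity as a stand-in; the slides do not state which code IME uses internally, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_2p1(d0: bytes, d1: bytes) -> list[bytes]:
    """Build a 2+1 stripe: two data fragments and one XOR parity fragment."""
    return [d0, d1, xor_bytes(d0, d1)]

def rebuild(stripe: list[bytes], lost: int) -> bytes:
    """Recompute the fragment at index `lost` from the two survivors."""
    survivors = [frag for i, frag in enumerate(stripe) if i != lost]
    return xor_bytes(survivors[0], survivors[1])

stripe = encode_2p1(b"IME-DATA", b"NVMeSSD!")

# Losing any one of the three fragments is recoverable.
for lost in range(3):
    assert rebuild(stripe, lost) == stripe[lost]
print("any single fragment of the 2+1 stripe can be rebuilt")
```

In a declustered layout the fragments of different stripes are spread over different device combinations, so a rebuild reads from many survivors in parallel rather than from one fixed partner set.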
  • 17. IME1.2 FAULT TOLERANCE
      [Annotated timeline: (1) I/O write-intensive job startup; (2) Server 3 fails with 1TB data; (3) data rebuild zone, ~3 mins; (4) normal service resumed, continued production.]
  • 18. IME1.2 FAULT TOLERANCE: Continued Production
      ► Even after a single node failure, the rebuilt data are still protected against further failures:
        • 3 failing devices on surviving servers
        • a 2nd node failing
  • 19. IME1.2 MONITORING WITH DDN INSIGHT
      ► IME monitoring integrated in DDN Insight
      ► Aggregated views for clients, servers, devices
      ► Performance and status data collection
      ► Event monitoring and alerts
      ► Live and historical data analysis
  • 20. ROADMAP ITEM: NFS AS A BFS
      [Diagram labels: NFS, IME, COMPUTE]
      ► Brings scale-out flash-native performance to NFS access
      ► Shields the NFS server from "tough" IO
      ► Increases IO throughput from NFS hardware
      ► Zero application changes: replace the NFS mount with an IME mount
  • 21. IME1.2 | TRUE DIAL-IN ERASURE CODING
      ▶ IME1.1 supports multiple resilience levels through flexible, adaptive erasure coding
      ▶ System-wide default up to 15+3
      ▶ Applications can override defaults and select a specific erasure coding scheme
      Erasure coding options: 1+1 1+2 1+3 2+1 2+2 2+3 3+1 3+2 3+3 ... 15+1 15+2 15+3 (DEFAULT: 8+3)
      [Diagram: a file cache striped across IME Server0, IME Server1, IME Server2, ... IME ServerN, with a different scheme per file: 8+3, 8+1, 6+0, 8+2, 6+1, 4+1, 4+1, 4+1, 8+0, 1+1.]
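The trade-off behind choosing a scheme can be made concrete with a little arithmetic: a k+m scheme stores k data fragments and m parity fragments, so the usable fraction of raw capacity is k/(k+m). The sketch below enumerates the options listed on the slide; it assumes an ideal code in which any m lost fragments are recoverable, and the helper names are hypothetical.

```python
from fractions import Fraction

def schemes(max_data: int = 15, max_parity: int = 3) -> list[tuple[int, int]]:
    """All k+m combinations from 1+1 up to 15+3, as listed on the slide."""
    return [(k, m) for k in range(1, max_data + 1) for m in range(1, max_parity + 1)]

def efficiency(k: int, m: int) -> Fraction:
    """Fraction of raw capacity that holds user data in a k+m scheme."""
    return Fraction(k, k + m)

print(len(schemes()), "schemes available")
for k, m in [(1, 1), (4, 1), (8, 3), (15, 3)]:
    e = efficiency(k, m)
    # Assumes an ideal (MDS) code: any m missing fragments can be rebuilt.
    print(f"{k}+{m}: {float(e):.1%} usable, survives {m} lost fragment(s)")
```

Wider stripes such as 15+3 keep more of the raw capacity usable at the same parity count, while narrow schemes such as 1+1 behave like mirroring.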
  • 22. Thank You! Keep in touch with us: ddn.com | sales@ddn.com | @ddn_limitless | company/datadirect-networks | 9351 Deering Avenue, Chatsworth, CA 91311 | 1.800.837.2298 | 1.818.700.4000. ©2018 DataDirect Networks, Inc. *Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.
  • 23. IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
      [Diagram: shared-file IO hitting a performance barrier in the filesystem.]
      ► Parallel file systems can exhibit extremely poor performance for shared-file IO because they manage files in large lock units, which causes internal lock contention
      ► IME eliminates contention by managing IO fragments directly and coalescing IOs prior to flushing to the parallel file system
  • 24. IME ENABLES NEW LEVELS OF FILESYSTEM PERFORMANCE
      [Diagram: filesystem layers.]
      ► Thick filesystem software layers and traditional data layouts severely restrict performance for tough workloads
      ► IME’s lean, write-anywhere, fully parallel IO removes the barriers that prevent your application from seeing full performance
  • 25. SHARED FILE I/O OPTIMIZATION
      [Diagram: Process 0, Process 1, and Process 2 writing to one file; filesystem lock management is triggered when IOs cross page boundaries.]
      ► Parallel file systems can exhibit extremely poor performance for shared-file IO because they manage files in large lock units, which causes internal lock contention
      ► IME eliminates contention by managing IO fragments directly and coalescing IOs prior to flushing to the parallel file system
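The coalescing step can be sketched as an interval merge: small fragments written by different processes are kept as (offset, length) pairs and merged into large contiguous extents before being flushed. This is an illustration of the general idea only, not IME's actual algorithm; `coalesce` and the three-process workload are hypothetical, and the 47008-byte size is borrowed from the IOR hard description earlier in the deck.

```python
def coalesce(fragments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge (offset, length) fragments that touch or overlap into sorted extents."""
    merged: list[list[int]] = []
    for off, length in sorted(fragments):
        end = off + length
        if merged and off <= merged[-1][1]:
            # Fragment touches or overlaps the previous extent: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([off, end])
    return [(start, end - start) for start, end in merged]

XFER = 47008  # small unaligned IO size
# Three processes interleave four writes each into one shared file.
frags = [((i * 3 + rank) * XFER, XFER) for rank in range(3) for i in range(4)]

extents = coalesce(frags)
print(f"{len(frags)} fragments -> {len(extents)} extent(s): {extents}")
```

Twelve small interleaved writes collapse into one large sequential extent, so the parallel file system sees a single well-formed IO instead of many lock-contending ones.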
  • 26. FLASH-NATIVE - MAXIMISE FLASH PERFORMANCE AND LIFETIME
      • Standard storage systems cannot manage flash without incurring costly garbage collection, which reduces performance and SSD lifetime
      • IME's flash-native approach maximises flash lifetime by issuing IO in 128K chunks and unmapping in large chunks
      [Diagram, PFS: random writes arrive with LBA offsets corresponding to existing data; new writes invalidate the corresponding sectors; blocks with a large invalid sector count undergo garbage collection. Legend: valid sectors with user data; free, unused blocks.]
      [Diagram, IME: with a log-structured filesystem the SSD sees writes to new block ranges; if IME reaches a threshold, data is flushed or purged in large chunks.]
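The effect of a log-structured layout can be shown with a toy model: random logical overwrites are appended at the log head, so the device only ever sees increasing offsets. This is a simplified sketch, not IME's implementation; the 4096-byte write size, the `LogStore` class, and the workload are assumptions, while the 128K chunk size comes from the slide.

```python
import random

SECTOR = 4096        # assumed size of each logical write
CHUNK = 128 * 1024   # device IO size quoted on the slide

class LogStore:
    def __init__(self) -> None:
        self.head = 0                          # next append position, in bytes
        self.index: dict[int, int] = {}        # LBA -> offset of the newest copy
        self.device_offsets: list[int] = []    # order in which the device is written

    def write(self, lba: int) -> None:
        """Append the new copy at the head; any older copy becomes stale."""
        self.index[lba] = self.head
        self.device_offsets.append(self.head)
        self.head += SECTOR

    def chunks_filled(self) -> int:
        """Number of 128K chunks covered by the appended data (rounded up)."""
        return -(-self.head // CHUNK)

random.seed(0)
log = LogStore()
for _ in range(256):
    log.write(random.randrange(64))  # random overwrites across 64 logical blocks

print("device writes strictly sequential:",
      log.device_offsets == sorted(log.device_offsets))
print("128K chunks filled:", log.chunks_filled())
```

Stale copies accumulate behind the head and can be reclaimed in large contiguous ranges, which is what lets a design like this unmap in large chunks instead of forcing the SSD to garbage-collect scattered sectors.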
  • 27. DDN | IME IS UNIQUE!
      • Designed for Scalability: patented DDN algorithms
      • Scale-Out Data Protection: distributed erasure coding
      • Integrated With File Systems: designed to accelerate Lustre*, GPFS
      • No Code Modification Needed: fully POSIX & HPC compatible, no application modifications
      • Writes Fast; Reads Fast Too: no other system offers both at scale
      • Intelligent, Adaptive System: on-the-fly data placement