SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Downloaden Sie, um offline zu lesen
DAOS in Lenovo’s
HPC Innovation Center
Michael Hennecke | DUG’20, 19-Nov-2020
2020 Lenovo. All rights reserved.
DAOS in Lenovo‘s HPC Innovation Center Stuttgart
12x DAOS Servers in Lenox (ThinkSystem SR630)
Each with 8x NVMe 1.6 TB, 12x PMem 128 GB, 2x EDR
• Total raw capacity: 150+18 TB (NVMe+PMem)
• Peak bandwidth: 240 GB/s read (180 GB/s raw write)
Running DAOS 1.1.1-1; daemons managed with systemd
2020 Lenovo. All rights reserved.
DAOS Server Architecture: Lenovo ThinkSystem SR630
CPU 2
CascadeLake-EP
6238 Gold
( 22C 140W 2.1GHz )
1
0
0
1
0
0
100GbE /
InfiniBand
slot 3 slot 2
1
0
0
1
0
0
slot4
10xU.2drivebays
gen3
16x
4x
4x
4x
4x
DDR
2666
DDR
2666
DDR
2666
DDR
2666
C D E F
DDR
2666
DDR
2666
A B
CPU 1
CascadeLake-EP
6238 Gold
( 22C 140W 2.1GHz )
DDR
2666
DDR
2666
DDR
2666
DDR
2666
I J K L
DDR
2666
DDR
2666
G H
UPI
UPI
gen3
16x
gen3
8x
PCH
10xU.2drivebays
4x
4x
4x
4x
810-4PNVMe
SwitchAdapter
NVMe 0
NVMe 1
NVMe 2
NVMe 3
NVMe 4
NVMe 6
NVMe 5
NVMe 7
gen3 8x
810-4PNVMe
SwitchAdapter
slot1
gen3 8x
M.2 kit
sda
sdb
gen3
16x
AEP
2666
AEP
2666
AEP
2666
AEP
2666
AEP
2666
AEP
2666
AEP
2666
AEP
2666
iMC0 iMC1iMC0 iMC1
AEP
2666
AEP
2666
AEP
2666
AEP
2666
100GbE /
InfiniBand
2x P4610
(1.6 TB)
4x P4610
(1.6 TB)
6x Optane PMem
(128 GB)
6x Optane PMem
(128 GB)
2x P4610
(1.6 TB)
2020 Lenovo. All rights reserved.
Lenovo‘s Slurm Integration: Ephemeral DAOS Containers
1. In slurm.conf, add: Licenses=daos_mlx:1,daos_dev1_mlx:1,daos_dev2_mlx:1
– Define multiple DAOS clusters as Slurm „Licenses“, to manage dedicated batch job access
2. User jobs request a DAOS cluster, e.g. #SBATCH --licenses=daos_mlx
3. Slurm Prolog and Epilog call daos_mounts.sh {prolog|epilog}
4. Lenovo‘s daos_mounts.sh prolog:
– Creates YML config files for the requested Slurm license, and starts daos_agent
– On 1st node, provisions a DAOS pool and DAOS POSIX container for the user
– On 1st node, saves pool and container UUIDs in per-job dotfiles (in user‘s $HOME/.daos/)
– Dfuse-mounts the POSIX container for the user
5. Lenovo‘s daos_mounts.sh epilog:
– Umounts the POSIX container
– On 1st node, destroys the pool, including its container(s)
– On 1st node, removes the per-job dotfiles (from $HOME/.daos/), and stops daos_agent
2020 Lenovo. All rights reserved.
Lenovo‘s Slurm Integration: daos_mounts.sh prolog
# On the first node in the job (Slurm BatchHost):
dmg pool create -s 768g -n 6390g -u $D_USER -g $D_GROUP 
> /tmp/dmg-pool-create.$D_USER.log 2>&1
LINE=`grep $UUID_MATCH /tmp/dmg-pool-create.$D_USER.log`
[[ $LINE =~ $CREATE_MATCH ]]
D_POOL=${BASH_REMATCH[1]}
D_SVC=${BASH_REMATCH[2]}
dmg pool update-acl --pool=$D_POOL -e 'A::root@:rw' # also -e 'A::daos@:rw‘
D_CONT=`uuidgen`
su - $D_USER 
-c "daos cont create --pool=$D_POOL --svc=$D_SVC --cont=$D_CONT --type=POSIX“
# On all nodes in the job:
su - $D_USER 
-c "dfuse --pool $D_POOL --svc=$D_SVC --container $D_CONT -m /daos/$D_USER"
2020 Lenovo. All rights reserved.
DAOS Unified Namespace with Spectrum Scale (1/2)
• DAOS „Unified Namespace“ Concept:
1. Store DAOS pool UUID and container UUID as
extended attributes (XATTR‘s) of the „mount point“ directory
2. When this „mount point“ is in a global parallel filesystem,
dfuse can use this instead of --pool and --container
• IBM Spectrum Scale supports Extended Attributes (XATTR‘s),
both for internal features and for user metadata
– Stored in inode if small, or in „overflow“ EA block (≤64 kiB)
$ daos cont create --pool=$D_POOL --svc=$D_SVC --cont=$D_CONT 
--type=POSIX --path /home/mhennecke/daos_tmp
$ mmlsattr --dump-attr /home/mhennecke/daos_tmp
file name: /home/mhennecke/daos_tmp
user.daos
$ mmlsattr --get-attr user.daos /home/mhennecke/daos_tmp
file name: /home/mhennecke/daos_tmp
user.daos: "DAOS.POSIX://c0c99a8c-5453-4950-9bbd-1d9d784b51c0/7b6ff2f2-b52d-4a25-8565-285006572c96?“
2020 Lenovo. All rights reserved.
?
DAOS Unified Namespace with Spectrum Scale (2/2)
• daos can query the Spectrum Scale mountpoint directory‘s XATTR‘s
on each node where the „containing“ Spectrum Scale filesystem is mounted:
$ daos cont query --path /home/mhennecke/daos_tmp --svc 0
Pool UUID: c0c99a8c-5453-4950-9bbd-1d9d784b51c0
Container UUID: 7b6ff2f2-b52d-4a25-8565-285006572c96
Number of snapshots: 0
Latest Persistent Snapshot: 0
Highest Aggregated Epoch: 1605783539794720768
DAOS Unified Namespace Attributes on path /home/mhennecke/daos_tmp:
Container Type: POSIX
Object Class: SX
Chunk Size: 1048576
• The dfuse mount command can use the path without --pool and --cont:
$ dfuse -m /home/mhennecke/daos_tmp --svc 0
$ df|grep daos
dfuse 13980468750 727 13980468024 1% /gpfs/gss1/home/mhennecke/daos_tmp
2020 Lenovo. All rights reserved.
DAOS Unified Namespace with Spectrum Scale (2/2)
• daos can query the Spectrum Scale mountpoint directory‘s XATTR‘s
on each node where the „containing“ Spectrum Scale filesystem is mounted:
$ daos cont query --path /home/mhennecke/daos_tmp --svc 0
Pool UUID: c0c99a8c-5453-4950-9bbd-1d9d784b51c0
Container UUID: 7b6ff2f2-b52d-4a25-8565-285006572c96
Number of snapshots: 0
Latest Persistent Snapshot: 0
Highest Aggregated Epoch: 1605783539794720768
DAOS Unified Namespace Attributes on path /home/mhennecke/daos_tmp:
Container Type: POSIX
Object Class: SX
Chunk Size: 1048576
• The dfuse mount command can use the path without --pool and --cont:
$ dfuse -m /home/mhennecke/daos_tmp --svc 0
$ df|grep daos
dfuse 13980468750 727 13980468024 1% /gpfs/gss1/home/mhennecke/daos_tmp
2020 Lenovo. All rights reserved.
The fusermount3 -u
(unmount) fails if the
Spectrum Scale file
system is mounted with
root-squash...
Three Ways of POSIX Filesystem Support in DAOS
Application / Framework
DAOS library (libdaos)
DAOS File System (libdfs)
Interception Library (libioil)
dfuse
Single process address space
DAOS Storage Engine
RPC RDMA
End-to-end userspace
No system calls
2020 Lenovo. All rights reserved.
DAOS ior-easy – 1 Server (P4610 3.2TB), 1 Client
2020 Lenovo. All rights reserved.
4MiB sequential transfers.
DAOS ior-easy – 1 Server (P4610 3.2TB), N Clients
API=DFS. 4MiB sequential transfers. Scaling the number of clients, 40 tasks/node.
2020 Lenovo. All rights reserved.
DAOS IO500 – 1 Server (P4610 3.2TB), 10 Clients
IO500 with API=DFS, and Mohamad‘s DFS-enabled find from mpifileutils:
IO500 version io500-sc20_v3
[RESULT] ior-easy-write 20.754597 GiB/s : time 315.288 seconds
[RESULT] mdtest-easy-write 586.050492 kIOPS : time 308.214 seconds
[RESULT] ior-hard-write 8.015282 GiB/s : time 316.094 seconds
[RESULT] mdtest-hard-write 120.679218 kIOPS : time 320.813 seconds
[RESULT] find 328.553089 kIOPS : time 657.437 seconds
[RESULT] ior-easy-read 21.865060 GiB/s : time 298.510 seconds
[RESULT] mdtest-easy-stat 919.974294 kIOPS : time 192.983 seconds
[RESULT] ior-hard-read 9.767739 GiB/s : time 258.846 seconds
[RESULT] mdtest-hard-stat 532.842517 kIOPS : time 73.066 seconds
[RESULT] mdtest-easy-delete 389.111423 kIOPS : time 467.045 seconds
[RESULT] mdtest-hard-read 186.604589 kIOPS : time 207.250 seconds
[RESULT] mdtest-hard-delete 370.730738 kIOPS : time 192.598 seconds
[SCORE] Bandwidth 13.729175 GiB/s : IOPS 363.768691 kiops : TOTAL 70.669966
2020 Lenovo. All rights reserved. ior-hard used --dfs.chunk_size=470080 (10x the transferSize)
DAOS IO500 – 1 Server (P4610 1.6TB), 10 Clients
IO500 with API=DFS, and Mohamad‘s DFS-enabled find from mpifileutils:
IO500 version io500-sc20_v3
[RESULT] ior-easy-write 14.385031 GiB/s : time 323.882 seconds
[RESULT] mdtest-easy-write 583.580500 kIOPS : time 307.465 seconds
[RESULT] ior-hard-write 7.995398 GiB/s : time 315.677 seconds
[RESULT] mdtest-hard-write 120.932885 kIOPS : time 319.958 seconds
[RESULT] find 326.808035 kIOPS : time 660.728 seconds
[RESULT] ior-easy-read 21.764214 GiB/s : time 213.669 seconds
[RESULT] mdtest-easy-stat 866.138808 kIOPS : time 204.900 seconds
[RESULT] ior-hard-read 7.026412 GiB/s : time 358.190 seconds
[RESULT] mdtest-hard-stat 511.331836 kIOPS : time 76.019 seconds
[RESULT] mdtest-easy-delete 356.626439 kIOPS : time 510.617 seconds
[RESULT] mdtest-hard-read 187.117240 kIOPS : time 207.295 seconds
[RESULT] mdtest-hard-delete 367.907065 kIOPS : time 193.228 seconds
[SCORE] Bandwidth 11.516139 GiB/s : IOPS 354.741268 kiops : TOTAL 63.915957
2020 Lenovo. All rights reserved. ior-hard used --dfs.chunk_size=470080 (10x the transferSize)
DAOS IO500 – 9 Servers (P4610 1.6TB), 10 Clients
IO500 with API=DFS, and Mohamad‘s DFS-enabled find from mpifileutils:
IO500 version io500-sc20_v3
[RESULT] ior-easy-write 92.534946 GiB/s : time 308.122 seconds
[RESULT] mdtest-easy-write 3135.992682 kIOPS : time 314.497 seconds
[RESULT] ior-hard-write 50.643138 GiB/s : time 340.693 seconds
[RESULT] mdtest-hard-write 690.243412 kIOPS : time 325.546 seconds
[RESULT] find 1349.765412 kIOPS : time 881.429 seconds
[RESULT] ior-easy-read 86.753938 GiB/s : time 328.958 seconds
[RESULT] mdtest-easy-stat 2011.338604 kIOPS : time 482.063 seconds
[RESULT] ior-hard-read 33.106979 GiB/s : time 520.042 seconds
[RESULT] mdtest-hard-stat 1792.537325 kIOPS : time 126.983 seconds
[RESULT] mdtest-easy-delete 1927.080799 kIOPS : time 515.165 seconds
[RESULT] mdtest-hard-read 892.651111 kIOPS : time 252.147 seconds
[RESULT] mdtest-hard-delete 1816.380945 kIOPS : time 235.734 seconds
[SCORE] Bandwidth 60.570169 GiB/s : IOPS 1547.648039 kiops : TOTAL 306.172016
2020 Lenovo. All rights reserved. ior-hard used --dfs.chunk_size=470080 (10x the transferSize)
mhennecke @ lenovo.com

Weitere ähnliche Inhalte

Was ist angesagt?

High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatChris Barber
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Nag Arvind Gudiseva
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFBrendan Gregg
 
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....NETWAYS
 
Running hadoop on ubuntu linux
Running hadoop on ubuntu linuxRunning hadoop on ubuntu linux
Running hadoop on ubuntu linuxTRCK
 
LISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
LISA18: Hidden Linux Metrics with Prometheus eBPF ExporterLISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
LISA18: Hidden Linux Metrics with Prometheus eBPF ExporterIvan Babrou
 
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Amazon Web Services
 
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...Equnix Business Solutions
 
Apache Hadoop Shell Rewrite
Apache Hadoop Shell RewriteApache Hadoop Shell Rewrite
Apache Hadoop Shell RewriteAllen Wittenauer
 
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...PROIDEA
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Community
 
Logical volume manager xfs
Logical volume manager xfsLogical volume manager xfs
Logical volume manager xfsSarwar Javaid
 
Hadoop Cluster - Basic OS Setup Insights
Hadoop Cluster - Basic OS Setup InsightsHadoop Cluster - Basic OS Setup Insights
Hadoop Cluster - Basic OS Setup InsightsSruthi Kumar Annamnidu
 
Automating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLAutomating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLNina Kaufman
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...InfluxData
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningMilind Koyande
 

Was ist angesagt? (20)

GTC Japan 2014
GTC Japan 2014GTC Japan 2014
GTC Japan 2014
 
High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & Heartbeat
 
Ceph issue 해결 사례
Ceph issue 해결 사례Ceph issue 해결 사례
Ceph issue 해결 사례
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Upgrade & ndmp
Upgrade & ndmpUpgrade & ndmp
Upgrade & ndmp
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
 
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....
 
Running hadoop on ubuntu linux
Running hadoop on ubuntu linuxRunning hadoop on ubuntu linux
Running hadoop on ubuntu linux
 
LISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
LISA18: Hidden Linux Metrics with Prometheus eBPF ExporterLISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
LISA18: Hidden Linux Metrics with Prometheus eBPF Exporter
 
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
 
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
PGConf.ASIA 2019 Bali - Mission Critical Production High Availability Postgre...
 
Apache Hadoop Shell Rewrite
Apache Hadoop Shell RewriteApache Hadoop Shell Rewrite
Apache Hadoop Shell Rewrite
 
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
DevOpsDays Warsaw 2015: Running High Performance And Fault Tolerant Elasticse...
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore
 
Logical volume manager xfs
Logical volume manager xfsLogical volume manager xfs
Logical volume manager xfs
 
Hadoop Cluster - Basic OS Setup Insights
Hadoop Cluster - Basic OS Setup InsightsHadoop Cluster - Basic OS Setup Insights
Hadoop Cluster - Basic OS Setup Insights
 
Automating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLAutomating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQL
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
 
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
 
Extreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and TuningExtreme Linux Performance Monitoring and Tuning
Extreme Linux Performance Monitoring and Tuning
 

Ähnlich wie DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center

Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Praguetomasbart
 
Build an affordable Cloud Stroage
Build an affordable Cloud StroageBuild an affordable Cloud Stroage
Build an affordable Cloud StroageAlex Lau
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby SystemsEngine Yard
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderGregSmith458515
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Ontico
 
PFIセミナー資料 H27.10.22
PFIセミナー資料 H27.10.22PFIセミナー資料 H27.10.22
PFIセミナー資料 H27.10.22Yuya Takei
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...confluent
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging RubyAman Gupta
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in productionParis Data Engineers !
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Anne Nicolas
 
Scaling IO-bound microservices
Scaling IO-bound microservicesScaling IO-bound microservices
Scaling IO-bound microservicesSalo Shp
 
How to admin
How to adminHow to admin
How to adminyalegko
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kazuhito Ohkawa
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceBrendan Gregg
 
SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingMarian Marinov
 
Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH )  Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH ) Alex Lau
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemCyber Security Alliance
 
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014Amazon Web Services
 
Doing QoS Before Ceph Cluster QoS is available - David Byte, Alex Lau
Doing QoS Before Ceph Cluster QoS is available - David Byte, Alex LauDoing QoS Before Ceph Cluster QoS is available - David Byte, Alex Lau
Doing QoS Before Ceph Cluster QoS is available - David Byte, Alex LauCeph Community
 

Ähnlich wie DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center (20)

Docker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in PragueDocker and friends at Linux Days 2014 in Prague
Docker and friends at Linux Days 2014 in Prague
 
Build an affordable Cloud Stroage
Build an affordable Cloud StroageBuild an affordable Cloud Stroage
Build an affordable Cloud Stroage
 
Multipath
MultipathMultipath
Multipath
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 
Speedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql LoaderSpeedrunning the Open Street Map osm2pgsql Loader
Speedrunning the Open Street Map osm2pgsql Loader
 
Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)Performance tweaks and tools for Linux (Joe Damato)
Performance tweaks and tools for Linux (Joe Damato)
 
PFIセミナー資料 H27.10.22
PFIセミナー資料 H27.10.22PFIセミナー資料 H27.10.22
PFIセミナー資料 H27.10.22
 
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
Kafka Summit SF 2017 - One Day, One Data Hub, 100 Billion Messages: Kafka at ...
 
Debugging Ruby
Debugging RubyDebugging Ruby
Debugging Ruby
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
 
Scaling IO-bound microservices
Scaling IO-bound microservicesScaling IO-bound microservices
Scaling IO-bound microservices
 
How to admin
How to adminHow to admin
How to admin
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例
 
YOW2020 Linux Systems Performance
YOW2020 Linux Systems PerformanceYOW2020 Linux Systems Performance
YOW2020 Linux Systems Performance
 
SiteGround Tech TeamBuilding
SiteGround Tech TeamBuildingSiteGround Tech TeamBuilding
SiteGround Tech TeamBuilding
 
Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH )  Cloud Storage Introduction ( CEPH )
Cloud Storage Introduction ( CEPH )
 
Reverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande ModemReverse engineering Swisscom's Centro Grande Modem
Reverse engineering Swisscom's Centro Grande Modem
 
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
(SDD402) Amazon ElastiCache Deep Dive | AWS re:Invent 2014
 
Doing QoS Before Ceph Cluster QoS is available - David Byte, Alex Lau
Doing QoS Before Ceph Cluster QoS is available - David Byte, Alex LauDoing QoS Before Ceph Cluster QoS is available - David Byte, Alex Lau
Doing QoS Before Ceph Cluster QoS is available - David Byte, Alex Lau
 

Mehr von Andrey Kudryavtsev

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansAndrey Kudryavtsev
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...Andrey Kudryavtsev
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesAndrey Kudryavtsev
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateAndrey Kudryavtsev
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingAndrey Kudryavtsev
 
DUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSDUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSAndrey Kudryavtsev
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabAndrey Kudryavtsev
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedAndrey Kudryavtsev
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateAndrey Kudryavtsev
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSAndrey Kudryavtsev
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraAndrey Kudryavtsev
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateAndrey Kudryavtsev
 

Mehr von Andrey Kudryavtsev (13)

DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware Update
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY Mapping
 
DUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSDUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOS
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN Openlab
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature Update
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOS
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS Update
 
DAOS Middleware overview
DAOS Middleware overviewDAOS Middleware overview
DAOS Middleware overview
 

Kürzlich hochgeladen

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center

  • 1. DAOS in Lenovo’s HPC Innovation Center Michael Hennecke | DUG’20, 19-Nov-2020 2020 Lenovo. All rights reserved.
  • 2. DAOS in Lenovo‘s HPC Innovation Center Stuttgart 12x DAOS Servers in Lenox (ThinkSystem SR630) Each with 8x NVMe 1.6 TB, 12x PMem 128 GB, 2x EDR • Total raw capacity: 150+18 TB (NVMe+PMem) • Peak bandwidth: 240 GB/s read (180 GB/s raw write) Running DAOS 1.1.1-1; daemons managed with systemd 2020 Lenovo. All rights reserved.
  • 3. DAOS Server Architecture: Lenovo ThinkSystem SR630 CPU 2 CascadeLake-EP 6238 Gold ( 22C 140W 2.1GHz ) 1 0 0 1 0 0 100GbE / InfiniBand slot 3 slot 2 1 0 0 1 0 0 slot4 10xU.2drivebays gen3 16x 4x 4x 4x 4x DDR 2666 DDR 2666 DDR 2666 DDR 2666 C D E F DDR 2666 DDR 2666 A B CPU 1 CascadeLake-EP 6238 Gold ( 22C 140W 2.1GHz ) DDR 2666 DDR 2666 DDR 2666 DDR 2666 I J K L DDR 2666 DDR 2666 G H UPI UPI gen3 16x gen3 8x PCH 10xU.2drivebays 4x 4x 4x 4x 810-4PNVMe SwitchAdapter NVMe 0 NVMe 1 NVMe 2 NVMe 3 NVMe 4 NVMe 6 NVMe 5 NVMe 7 gen3 8x 810-4PNVMe SwitchAdapter slot1 gen3 8x M.2 kit sda sdb gen3 16x AEP 2666 AEP 2666 AEP 2666 AEP 2666 AEP 2666 AEP 2666 AEP 2666 AEP 2666 iMC0 iMC1iMC0 iMC1 AEP 2666 AEP 2666 AEP 2666 AEP 2666 100GbE / InfiniBand 2x P4610 (1.6 TB) 4x P4610 (1.6 TB) 6x Optane PMem (128 GB) 6x Optane PMem (128 GB) 2x P4610 (1.6 TB) 2020 Lenovo. All rights reserved.
  • 4. Lenovo‘s Slurm Integration: Ephemeral DAOS Containers 1. In slurm.conf, add: Licenses=daos_mlx:1,daos_dev1_mlx:1,daos_dev2_mlx:1 – Define multiple DAOS clusters as Slurm „Licenses“, to manage dedicated batch job access 2. User jobs request a DAOS cluster, e.g. #SBATCH --licenses=daos_mlx 3. Slurm Prolog and Epilog call daos_mounts.sh {prolog|epilog} 4. Lenovo‘s daos_mounts.sh prolog: – Creates YML config files for the requested Slurm license, and starts daos_agent – On 1st node, provisions a DAOS pool and DAOS POSIX container for the user – On 1st node, saves pool and container UUIDs in per-job dotfiles (in user‘s $HOME/.daos/) – Dfuse-mounts the POSIX container for the user 5. Lenovo‘s daos_mounts.sh epilog: – Umounts the POSIX container – On 1st node, destroys the pool, including its container(s) – On 1st node, removes the per-job dotfiles (from $HOME/.daos/), and stops daos_agent 2020 Lenovo. All rights reserved.
  • 5. Lenovo‘s Slurm Integration: daos_mounts.sh prolog # On the first node in the job (Slurm BatchHost): dmg pool create -s 768g -n 6390g -u $D_USER -g $D_GROUP > /tmp/dmg-pool-create.$D_USER.log 2>&1 LINE=`grep $UUID_MATCH /tmp/dmg-pool-create.$D_USER.log` [[ $LINE =~ $CREATE_MATCH ]] D_POOL=${BASH_REMATCH[1]} D_SVC=${BASH_REMATCH[2]} dmg pool update-acl --pool=$D_POOL -e 'A::root@:rw' # also -e 'A::daos@:rw‘ D_CONT=`uuidgen` su - $D_USER -c "daos cont create --pool=$D_POOL --svc=$D_SVC --cont=$D_CONT --type=POSIX“ # On all nodes in the job: su - $D_USER -c "dfuse --pool $D_POOL --svc=$D_SVC --container $D_CONT -m /daos/$D_USER" 2020 Lenovo. All rights reserved.
  • 6. DAOS Unified Namespace with Spectrum Scale (1/2) • DAOS „Unified Namespace“ Concept: 1. Store DAOS pool UUID and container UUID as extended attributes (XATTR‘s) of the „mount point“ directory 2. When this „mount point“ is in a global parallel filesystem, dfuse can use this instead of --pool and --container • IBM Spectrum Scale supports Extended Attributes (XATTR‘s), both for internal features and for user metadata – Stored in inode if small, or in „overflow“ EA block (≤64 kiB) $ daos cont create --pool=$D_POOL --svc=$D_SVC --cont=$D_CONT --type=POSIX --path /home/mhennecke/daos_tmp $ mmlsattr --dump-attr /home/mhennecke/daos_tmp file name: /home/mhennecke/daos_tmp user.daos $ mmlsattr --get-attr user.daos /home/mhennecke/daos_tmp file name: /home/mhennecke/daos_tmp user.daos: "DAOS.POSIX://c0c99a8c-5453-4950-9bbd-1d9d784b51c0/7b6ff2f2-b52d-4a25-8565-285006572c96?“ 2020 Lenovo. All rights reserved. ?
  • 7. DAOS Unified Namespace with Spectrum Scale (2/2) • daos can query the Spectrum Scale mountpoint directory‘s XATTR‘s on each node where the „containing“ Spectrum Scale filesystem is mounted: $ daos cont query --path /home/mhennecke/daos_tmp --svc 0 Pool UUID: c0c99a8c-5453-4950-9bbd-1d9d784b51c0 Container UUID: 7b6ff2f2-b52d-4a25-8565-285006572c96 Number of snapshots: 0 Latest Persistent Snapshot: 0 Highest Aggregated Epoch: 1605783539794720768 DAOS Unified Namespace Attributes on path /home/mhennecke/daos_tmp: Container Type: POSIX Object Class: SX Chunk Size: 1048576 • The dfuse mount command can use the path without --pool and --cont: $ dfuse -m /home/mhennecke/daos_tmp --svc 0 $ df|grep daos dfuse 13980468750 727 13980468024 1% /gpfs/gss1/home/mhennecke/daos_tmp 2020 Lenovo. All rights reserved.
  • 8. DAOS Unified Namespace with Spectrum Scale (2/2) • daos can query the Spectrum Scale mountpoint directory‘s XATTR‘s on each node where the „containing“ Spectrum Scale filesystem is mounted: $ daos cont query --path /home/mhennecke/daos_tmp --svc 0 Pool UUID: c0c99a8c-5453-4950-9bbd-1d9d784b51c0 Container UUID: 7b6ff2f2-b52d-4a25-8565-285006572c96 Number of snapshots: 0 Latest Persistent Snapshot: 0 Highest Aggregated Epoch: 1605783539794720768 DAOS Unified Namespace Attributes on path /home/mhennecke/daos_tmp: Container Type: POSIX Object Class: SX Chunk Size: 1048576 • The dfuse mount command can use the path without --pool and --cont: $ dfuse -m /home/mhennecke/daos_tmp --svc 0 $ df|grep daos dfuse 13980468750 727 13980468024 1% /gpfs/gss1/home/mhennecke/daos_tmp 2020 Lenovo. All rights reserved. The fusermount3 -u (unmount) fails if the Spectrum Scale file system is mounted with root-squash...
  • 9. Three Ways of POSIX Filesystem Support in DAOS Application / Framework DAOS library (libdaos) DAOS File System (libdfs) Interception Library (libioil) dfuse Single process address space DAOS Storage Engine RPC RDMA End-to-end userspace No system calls 2020 Lenovo. All rights reserved.
  • 10. DAOS ior-easy – 1 Server (P4610 3.2TB), 1 Client 2020 Lenovo. All rights reserved. 4MiB sequential transfers.
  • 11. DAOS ior-easy – 1 Server (P4610 3.2TB), N Clients API=DFS. 4MiB sequential transfers. Scaling the number of clients, 40 tasks/node. 2020 Lenovo. All rights reserved.
  • 12. DAOS IO500 – 1 Server (P4610 3.2TB), 10 Clients IO500 with API=DFS, and Mohamad‘s DFS-enabled find from mpifileutils: IO500 version io500-sc20_v3 [RESULT] ior-easy-write 20.754597 GiB/s : time 315.288 seconds [RESULT] mdtest-easy-write 586.050492 kIOPS : time 308.214 seconds [RESULT] ior-hard-write 8.015282 GiB/s : time 316.094 seconds [RESULT] mdtest-hard-write 120.679218 kIOPS : time 320.813 seconds [RESULT] find 328.553089 kIOPS : time 657.437 seconds [RESULT] ior-easy-read 21.865060 GiB/s : time 298.510 seconds [RESULT] mdtest-easy-stat 919.974294 kIOPS : time 192.983 seconds [RESULT] ior-hard-read 9.767739 GiB/s : time 258.846 seconds [RESULT] mdtest-hard-stat 532.842517 kIOPS : time 73.066 seconds [RESULT] mdtest-easy-delete 389.111423 kIOPS : time 467.045 seconds [RESULT] mdtest-hard-read 186.604589 kIOPS : time 207.250 seconds [RESULT] mdtest-hard-delete 370.730738 kIOPS : time 192.598 seconds [SCORE] Bandwidth 13.729175 GiB/s : IOPS 363.768691 kiops : TOTAL 70.669966 2020 Lenovo. All rights reserved. ior-hard used --dfs.chunk_size=470080 (10x the transferSize)
  • 13. DAOS IO500 – 1 Server (P4610 1.6TB), 10 Clients IO500 with API=DFS, and Mohamad‘s DFS-enabled find from mpifileutils: IO500 version io500-sc20_v3 [RESULT] ior-easy-write 14.385031 GiB/s : time 323.882 seconds [RESULT] mdtest-easy-write 583.580500 kIOPS : time 307.465 seconds [RESULT] ior-hard-write 7.995398 GiB/s : time 315.677 seconds [RESULT] mdtest-hard-write 120.932885 kIOPS : time 319.958 seconds [RESULT] find 326.808035 kIOPS : time 660.728 seconds [RESULT] ior-easy-read 21.764214 GiB/s : time 213.669 seconds [RESULT] mdtest-easy-stat 866.138808 kIOPS : time 204.900 seconds [RESULT] ior-hard-read 7.026412 GiB/s : time 358.190 seconds [RESULT] mdtest-hard-stat 511.331836 kIOPS : time 76.019 seconds [RESULT] mdtest-easy-delete 356.626439 kIOPS : time 510.617 seconds [RESULT] mdtest-hard-read 187.117240 kIOPS : time 207.295 seconds [RESULT] mdtest-hard-delete 367.907065 kIOPS : time 193.228 seconds [SCORE] Bandwidth 11.516139 GiB/s : IOPS 354.741268 kiops : TOTAL 63.915957 2020 Lenovo. All rights reserved. ior-hard used --dfs.chunk_size=470080 (10x the transferSize)
  • 14. DAOS IO500 – 9 Servers (P4610 1.6TB), 10 Clients IO500 with API=DFS, and Mohamad‘s DFS-enabled find from mpifileutils: IO500 version io500-sc20_v3 [RESULT] ior-easy-write 92.534946 GiB/s : time 308.122 seconds [RESULT] mdtest-easy-write 3135.992682 kIOPS : time 314.497 seconds [RESULT] ior-hard-write 50.643138 GiB/s : time 340.693 seconds [RESULT] mdtest-hard-write 690.243412 kIOPS : time 325.546 seconds [RESULT] find 1349.765412 kIOPS : time 881.429 seconds [RESULT] ior-easy-read 86.753938 GiB/s : time 328.958 seconds [RESULT] mdtest-easy-stat 2011.338604 kIOPS : time 482.063 seconds [RESULT] ior-hard-read 33.106979 GiB/s : time 520.042 seconds [RESULT] mdtest-hard-stat 1792.537325 kIOPS : time 126.983 seconds [RESULT] mdtest-easy-delete 1927.080799 kIOPS : time 515.165 seconds [RESULT] mdtest-hard-read 892.651111 kIOPS : time 252.147 seconds [RESULT] mdtest-hard-delete 1816.380945 kIOPS : time 235.734 seconds [SCORE] Bandwidth 60.570169 GiB/s : IOPS 1547.648039 kiops : TOTAL 306.172016 2020 Lenovo. All rights reserved. ior-hard used --dfs.chunk_size=470080 (10x the transferSize)