DAOS For Applications
Mohamad Chaarawi
Extreme Scale Architecture & Development
Intel Corporation, Cloud & Enterprise Solutions Group
DAOS Storage Architecture

Compute Nodes
AI/Analytics/Simulation Workflow
POSIX I/O, HDF5, MPI-I/O, Python, Spark, …
DAOS library
• DAOS library directly linked with the applications
• No need for dedicated cores
• Low memory/CPU footprint
• End-to-end OS bypass
• Non-blocking, lockless, snapshot support, …

Libfabric
• Low-latency & high-message-rate communications
• Native support for RDMA & scalable collective operations
• Support for iWARP, RoCE, InfiniBand, OPA, Slingshot, …

Storage Nodes
DAOS Storage Engine
PMDK memory interface: Intel® Optane Memory (metadata, low-latency I/Os & indexing/query)
SPDK NVMe interface: 3D-NAND/Optane SSD (bulk data)
HDD
• Fine-grained I/O with media selection strategy
• Only application data on SSD to maximize throughput
• Small I/Os aggregated in pmem & migrated to SSD in large chunks
• Full userspace model with no system calls on I/O path
• Built-in storage management infrastructure (control plane)
• NFSv4-like ACL

Delivers high-IOPS, high-bandwidth and low-latency storage with advanced features in a single tier
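
The media selection idea on this slide (small I/Os land in persistent memory, then move to SSD in large chunks) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name, the 4 KiB threshold and the 1 MiB migration size are hypothetical and are not taken from the DAOS implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TieredStore:
    """Toy model of a two-media store (hypothetical, not DAOS code)."""
    small_io_threshold: int = 4096       # assumed cutoff for "small" I/O, in bytes
    migration_chunk: int = 1 << 20       # assumed size of one pmem-to-SSD migration
    pmem_pending: List[bytes] = field(default_factory=list)
    ssd_chunks: List[bytes] = field(default_factory=list)

    def write(self, data: bytes) -> str:
        """Store data and return the media that received it."""
        if len(data) >= self.small_io_threshold:
            # Large writes go straight to SSD.
            self.ssd_chunks.append(data)
            return "ssd"
        # Small writes are absorbed by pmem first.
        self.pmem_pending.append(data)
        if sum(len(b) for b in self.pmem_pending) >= self.migration_chunk:
            self.migrate()
        return "pmem"

    def migrate(self) -> None:
        """Aggregate pending small writes into one large SSD chunk."""
        if self.pmem_pending:
            self.ssd_chunks.append(b"".join(self.pmem_pending))
            self.pmem_pending.clear()
```

The point of the sketch is the shape of the policy: the SSD only ever sees large writes, either because the application issued them or because small ones were coalesced first.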
Storage Containers

Aggregate related datasets into manageable and coherent entities
• Distributed consistency & automated recovery
• Full versioning
• Simplified data management
• Snapshot
• Cross-tier migration
• Indexing

[Figure: example DAOS containers, one per data model: encapsulated POSIX namespace (root, directories, files), file-per-process, HDF5 « File » (groups, datasets), key-value store, graph (nodes), columnar database (keys, values).]
DAOS Objects

▪ Native support for structured, semi-structured & unstructured data models
• Built on top of DCPMM
• Unconstrained by POSIX serialization
• Custom attributes
• Data access time orders of magnitude faster (µs)
• Scalable concurrent updates & high IOPS
• Non-blocking
• Enable in-storage computing

[Figure: applications use a data model library/framework on top of the DAOS Storage Engine (open source, Apache 2.0 license) and NVMe SSD. Object types shown: array, KV store, multi-level KV store, with example entries key1/val1, key2/val2, key3/val3.]
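
To make the "multi-level KV store" object type on this slide more concrete, here is a minimal in-memory sketch of a two-level key space. It is an illustration only: the class and method names are hypothetical and this is not the DAOS API. The DAOS documentation refers to the two key levels as the distribution key (dkey) and the attribute key (akey); that naming is borrowed here as an assumption.

```python
from typing import Dict, List, Optional


class MultiLevelKV:
    """Two-level key-value store held in memory (hypothetical sketch)."""

    def __init__(self) -> None:
        # First-level key -> (second-level key -> value)
        self._data: Dict[str, Dict[str, bytes]] = {}

    def put(self, dkey: str, akey: str, value: bytes) -> None:
        """Store value under the (dkey, akey) pair."""
        self._data.setdefault(dkey, {})[akey] = value

    def get(self, dkey: str, akey: str) -> Optional[bytes]:
        """Return the value for (dkey, akey), or None if absent."""
        return self._data.get(dkey, {}).get(akey)

    def list_dkeys(self) -> List[str]:
        """Enumerate first-level keys."""
        return sorted(self._data)

    def list_akeys(self, dkey: str) -> List[str]:
        """Enumerate second-level keys under one first-level key."""
        return sorted(self._data.get(dkey, {}))
```

A flat KV store is the special case where every first-level key holds a single second-level entry.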
DAOS Tools

Tool               dmg                               daos
Target             Administrators                    Users
Lustre equivalent  lctl/mkfs/mount/IML               lfs
Functionality      • Storage provisioning            • Pool query
                   • Burn-in                         • Container mgmt
                   • Firmware update                 • Unified namespace mgmt
                   • Data plane mgmt & monitoring    • Container user attributes
                   • Configure/monitor scrubbing     • Snapshots
                   • Pool mgmt                       • Object debugging
                   • Telemetry                       • POSIX container configuration
Application Interface

[Figure: HPC apps and analytics/AI apps reach the DAOS Storage Engine (open source, Apache 2.0 license) through these interfaces: POSIX I/O, HDF5, MPI-IO, Python, Apache Spark, Apache Arrow, TensorFlow, SEGY. A dataset mover connects DAOS to the capacity tier (PFS, S3, HSM, …).]
POSIX I/O Support

▪ The DAOS API is new and very flexible
• Multi-level keys
• Different value types supported
• Can build all data models / I/O middleware on top of it
▪ Most applications are still based on POSIX
• Need a smooth migration path with little to no application changes
• Quick path to realize performance of DCPMM & DAOS
▪ POSIX is implemented as middleware instead of being the building block of all data models.

[Figure labels: Application / Framework; Interception Library; dfuse; DAOS File System (libdfs); DAOS library (libdaos); single process address space; RPC and RDMA to the DAOS Storage Engine; end-to-end user space, no system calls; Intel® QLC 3D NAND SSD.]
MPI-IO Driver for DAOS

The DAOS MPI-IO driver is implemented within ROMIO, the MPI-IO library in MPICH.
• Added as an ADIO driver
• Portable to Open MPI, Intel MPI, etc.
• https://github.com/pmodels/mpich
MPI files use the same DFS mapping to the DAOS object model
• MPI files can be accessed through the DFS API
• MPI files can be accessed through regular POSIX with a dfuse mount over the container.
An application works seamlessly once it selects the driver by prefixing the file path with "daos:".
MPI-IO ROMIO driver: https://github.com/pmodels/mpich/tree/master/src/mpi/romio/adio/ad_daos
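
As a small illustration of the path convention described on this slide, the helper below adds the "daos:" driver prefix to a file path if it is not already there. The helper name and the example path are hypothetical. The commented usage assumes mpi4py running on top of an MPICH build that includes the DAOS ADIO driver, which this sketch does not verify.

```python
def daos_mpiio_path(path: str, prefix: str = "daos:") -> str:
    """Return path carrying the driver prefix, adding it only when missing."""
    return path if path.startswith(prefix) else prefix + path


# Usage sketch (not executed here; requires mpi4py and a DAOS-enabled MPICH):
#
#   from mpi4py import MPI
#   name = daos_mpiio_path("/mnt/project1/userA/NS1/out.dat")  # hypothetical path
#   fh = MPI.File.Open(MPI.COMM_WORLD, name,
#                      MPI.MODE_CREATE | MPI.MODE_WRONLY)
#   fh.Close()
```

Because the driver is chosen from the file name, the rest of the MPI-IO code stays unchanged.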
A POSIX / MPI-IO file maps to a DAOS byte array object, a special DAOS object with:
▪ 1-level key
▪ 1-byte records
▪ Configurable chunk size
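
One way to picture a byte array with a configurable chunk size is to split a file byte range into per-chunk pieces. The sketch below assumes that the chunk index plays the role of the single-level key and that each record is one byte; the function name is hypothetical and the code is not taken from libdfs.

```python
from typing import List, Tuple


def split_into_chunks(offset: int, length: int,
                      chunk_size: int) -> List[Tuple[int, int, int]]:
    """Map a byte range onto (chunk_index, offset_in_chunk, nbytes) pieces."""
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset/length must be >= 0 and chunk_size > 0")
    pieces = []
    end = offset + length
    while offset < end:
        # Which chunk holds this byte, and where inside the chunk it sits.
        index, within = divmod(offset, chunk_size)
        # Take bytes up to the end of the chunk or the end of the range.
        nbytes = min(chunk_size - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces


# A 10-byte access at offset 5 with 8-byte chunks touches two chunks.
print(split_into_chunks(5, 10, 8))  # [(0, 5, 3), (1, 0, 7)]
```

A larger chunk size means fewer pieces per access, while a smaller one spreads a file across more keys.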
HDF5 VOL Architecture

▪ Three main components:
• HDF5 Library
• DAOS VOL Connector
• (External) HDF5 Test Suite

[Figure: HDF5 stack. HDF5 API, HDF5 tools and test suite on top of the VOL layer; native VOL over the VFD layer (SEC2, MPIO, …) using the POSIX API or MPI I/O to reach the file system; DAOS VOL using the DAOS API to reach DAOS. Legend: native component (core HDF5 library), enhanced component, new component (external VOL connector, external test suite).]
HDF5 DAOS VOL Connector – Current Status

▪ No longer requires a separate version of HDF5
• Compatible with the main develop branch of HDF5
• Compatible with the 1.12.x release series of HDF5 with VOL support
▪ Currently supported features
• New H5M map API to expose a K/V interface to HDF5 users
• Variable-length datatypes are now also supported
• Chunking is the recommended storage layout to get the most out of DAOS performance
• Tools support (h5dump, h5ls, h5repack, etc.)
▪ Coming by end of the year
• Independent metadata writes (= independent object creation)
• Asynchronous I/O
▪ Available from: https://git.hdfgroup.org/projects/HDF5VOL/repos/daos-vol/
• See the user's guide for a more detailed list of supported features
Unified Namespace

DAOS Object Store:
▪ Addresses pools and containers with UUIDs; objects with 128-bit IDs.
▪ Applications and users are accustomed to accessing files and directories in a traditional namespace.
Unified Namespace allows users to create links between a file/dir in a system namespace and DAOS pools & containers:
▪ daos container create --path=/mnt/project1/userA/NS1 --pool=uuid --type=POSIX/HDF5/etc.
▪ The path created above becomes a special file or directory (depending on container type) with an extended attribute holding the pool and container information.
▪ Accessing that path from DAOS-aware middleware resolves the link on the fly through the DAOS UNS library.
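
The link resolution described on this slide can be modeled in a few lines. This is a sketch under assumptions: the real link lives in an extended attribute on the path, which is replaced here by an in-memory dictionary, and the names ContainerLink and resolve are hypothetical rather than part of the DAOS UNS library.

```python
from pathlib import PurePosixPath
from typing import Dict, NamedTuple, Optional, Tuple


class ContainerLink(NamedTuple):
    """Information a UNS entry point carries (modeled, not the on-disk format)."""
    pool: str
    container: str
    ctype: str


def resolve(path: str,
            links: Dict[str, ContainerLink]) -> Optional[Tuple[ContainerLink, str]]:
    """Find the nearest linked ancestor of path.

    Returns the link and the remainder of the path inside the container,
    or None when no ancestor is a UNS entry point.
    """
    p = PurePosixPath(path)
    for candidate in [p, *p.parents]:
        link = links.get(str(candidate))
        if link is not None:
            return link, str(p.relative_to(candidate))
    return None


# Stand-in for the extended attributes set by "daos container create --path=...".
links = {"/mnt/project1/userA/NS1": ContainerLink("pool-uuid", "cont-uuid", "POSIX")}
print(resolve("/mnt/project1/userA/NS1/run1/out.h5", links))
```

Middleware that resolves paths this way can accept ordinary file system paths from users while still opening the right pool and container.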
Spark Input/Output Support

[Figure: Spark I/O paths, with a legend distinguishing current Spark, implemented and planned components. Labels: Hadoop FileSystem abstract class; HDFS on disks; DAOS FileSystem (Java wrapper over the DAOS POSIX API) on DAOS storage; DAOS/Arrow wrapper (Apache Arrow data source); DAOS API wrapper (DAOS native data source; Parquet, ORC, etc.); Scan/Parse; Native Scan/Parse (CSV, Parquet, ORC, etc.); Project, Join, ML, etc.]
DAOS library: DPDK/PMDK/SPDK used, kernel bypass, zero copy
Python Support

▪ Pythonic bindings called pydaos
• Export key-value store objects
• Integrated with Python dictionaries
• Support Python iterators, direct assignments, …
• Scalable & performant
• Bulk insert/retrieve
• Core written in C
• Python 2.7 & 3 support
▪ Python integration
• dbm
• pyprob
▪ TODO
• Expose snapshots
• Integration with NumPy, …
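
To show what "integrated with Python dictionaries" with bulk insert/retrieve can look like from the caller's side, here is a stand-in class backed by a plain dict. It is not the pydaos API: the class name and the bulk_put/bulk_get method names are hypothetical, and a real binding would talk to a DAOS container instead of local memory.

```python
from collections.abc import MutableMapping
from typing import Dict, Iterable, Iterator, Optional


class KVObject(MutableMapping):
    """Dictionary-style key-value object (in-memory stand-in, not pydaos)."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    # The five methods below are what MutableMapping needs to provide
    # direct assignment, iteration, len(), "in", update(), items(), etc.
    def __getitem__(self, key: str) -> bytes:
        return self._store[key]

    def __setitem__(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def __delitem__(self, key: str) -> None:
        del self._store[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)

    def bulk_put(self, items: Dict[str, bytes]) -> None:
        """Insert many pairs in one call (hypothetical name)."""
        self._store.update(items)

    def bulk_get(self, keys: Iterable[str]) -> Dict[str, Optional[bytes]]:
        """Fetch many keys in one call; missing keys map to None."""
        return {k: self._store.get(k) for k in keys}


kv = KVObject()
kv["key1"] = b"val1"                       # direct assignment
kv.bulk_put({"key2": b"val2", "key3": b"val3"})
for key in kv:                             # iterator support
    print(key, kv[key])
```

Batching matters more for a remote store than for this local stand-in, since one bulk call can replace many round trips.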
Resources
▪ Source code on GitHub
• https://github.com/daos-stack/daos
▪ Documentation
• http://daos.io
▪ Community mailing list
• https://daos.groups.io
▪ DAOS solution brief
• https://www.intel.com/content/www/us/en/high-performance-computing/
15

Weitere ähnliche Inhalte

Was ist angesagt?

[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...OpenStack Korea Community
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanCeph Community
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScyllaDB
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...HostedbyConfluent
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Databricks
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Performance optimization for all flash based on aarch64 v2.0
Performance optimization for all flash based on aarch64 v2.0Performance optimization for all flash based on aarch64 v2.0
Performance optimization for all flash based on aarch64 v2.0Ceph Community
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Flink Forward
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Christo kutrovsky oracle, memory & linux
Christo kutrovsky   oracle, memory & linuxChristo kutrovsky   oracle, memory & linux
Christo kutrovsky oracle, memory & linuxKyle Hailey
 
Getting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterGetting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterScyllaDB
 

Was ist angesagt? (20)

[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
[OpenInfra Days Korea 2018] Day 2 - CEPH 운영자를 위한 Object Storage Performance T...
 
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li XiaoyanPerformance tuning in BlueStore & RocksDB - Li Xiaoyan
Performance tuning in BlueStore & RocksDB - Li Xiaoyan
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
Spark SQL Catalyst Code Optimization using Function Outlining with Kavana Bha...
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Ceph on arm64 upload
Ceph on arm64   uploadCeph on arm64   upload
Ceph on arm64 upload
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Performance optimization for all flash based on aarch64 v2.0
Performance optimization for all flash based on aarch64 v2.0Performance optimization for all flash based on aarch64 v2.0
Performance optimization for all flash based on aarch64 v2.0
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Christo kutrovsky oracle, memory & linux
Christo kutrovsky   oracle, memory & linuxChristo kutrovsky   oracle, memory & linux
Christo kutrovsky oracle, memory & linux
 
Getting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterGetting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers Faster
 

Ähnlich wie DAOS Middleware overview

VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld
 
DELLEMC_Portfolio_hyperlinks_Complete
DELLEMC_Portfolio_hyperlinks_CompleteDELLEMC_Portfolio_hyperlinks_Complete
DELLEMC_Portfolio_hyperlinks_CompleteDELLEMC Technologies
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Hortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleHortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleAbhishek Sood
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewMarcel Hergaarden
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansAndrey Kudryavtsev
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataGreat Wide Open
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance Red_Hat_Storage
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredOSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredNETWAYS
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Redis Labs
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSandeep Patil
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Trishali Nayar
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container EcosystemVinay Rao
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic StoragePatrick Bouillaud
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenmaharajothip1
 

Ähnlich wie DAOS Middleware overview (20)

VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
VMworld 2015: The Future of Software- Defined Storage- What Does it Look Like...
 
DELLEMC_Portfolio_hyperlinks_Complete
DELLEMC_Portfolio_hyperlinks_CompleteDELLEMC_Portfolio_hyperlinks_Complete
DELLEMC_Portfolio_hyperlinks_Complete
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Hortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum ScaleHortonworks Data Platform with IBM Spectrum Scale
Hortonworks Data Platform with IBM Spectrum Scale
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) Overview
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
DUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution PlansDUG'20: 13 - HPE’s DAOS Solution Plans
DUG'20: 13 - HPE’s DAOS Solution Plans
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredOSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
 
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
Moving Beyond Cache by Yiftach Shoolman Redis Labs - Redis Day Seattle 2020
 
Spectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN CachingSpectrum Scale Unified File and Object with WAN Caching
Spectrum Scale Unified File and Object with WAN Caching
 
Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...Software Defined Analytics with File and Object Access Plus Geographically Di...
Software Defined Analytics with File and Object Access Plus Geographically Di...
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
IBM Platform Computing Elastic Storage
IBM Platform Computing  Elastic StorageIBM Platform Computing  Elastic Storage
IBM Platform Computing Elastic Storage
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
 
{code} and containers
{code} and containers{code} and containers
{code} and containers
 

Mehr von Andrey Kudryavtsev

DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterAndrey Kudryavtsev
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...Andrey Kudryavtsev
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesAndrey Kudryavtsev
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateAndrey Kudryavtsev
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingAndrey Kudryavtsev
 
DUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSDUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSAndrey Kudryavtsev
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabAndrey Kudryavtsev
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedAndrey Kudryavtsev
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateAndrey Kudryavtsev
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSAndrey Kudryavtsev
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraAndrey Kudryavtsev
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateAndrey Kudryavtsev
 

Mehr von Andrey Kudryavtsev (12)

DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation CenterDUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
DUG'20: 12 - DAOS in Lenovo’s HPC Innovation Center
 
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
DUG'20: 11 - Platform Performance Evolution from bring-up to reaching link sa...
 
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage ArchitecturesDUG'20: 10 - Storage Orchestration for Composable Storage Architectures
DUG'20: 10 - Storage Orchestration for Composable Storage Architectures
 
DUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware UpdateDUG'20: 09 - DAOS Middleware Update
DUG'20: 09 - DAOS Middleware Update
 
DUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY MappingDUG'20: 08 - DAOS-SEGY Mapping
DUG'20: 08 - DAOS-SEGY Mapping
 
DUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOSDUG'20: 07 - Storing High-Energy Physics data in DAOS
DUG'20: 07 - Storing High-Energy Physics data in DAOS
 
DUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN OpenlabDUG'20: 06 - DAOS Adventures at CERN Openlab
DUG'20: 06 - DAOS Adventures at CERN Openlab
 
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS TestbedDUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
DUG'20: 05 - Very Early Experiences with a 0.5 PByte DAOS Testbed
 
DUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature UpdateDUG'20: 04 - DAOS Feature Update
DUG'20: 04 - DAOS Feature Update
 
DUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOSDUG'20: 03 - Online compression with QAT in DAOS
DUG'20: 03 - Online compression with QAT in DAOS
 
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on AuroraDUG'20: 02 - Accelerating apache spark with DAOS on Aurora
DUG'20: 02 - Accelerating apache spark with DAOS on Aurora
 
DUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS UpdateDUG'20: 01 - Welcome & DAOS Update
DUG'20: 01 - Welcome & DAOS Update
 


DAOS Middleware overview

  • 1. DAOS For Applications Mohamad Chaarawi Extreme Scale Architecture & Development
  • 2. DAOS Storage Architecture
      Intel Corporation, Cloud & Enterprise Solutions Group
      Delivers high-IOPS, high-bandwidth and low-latency storage with advanced features in a single tier.
      Compute nodes: AI/Analytics/Simulation workflow using POSIX I/O, HDF5, Spark, MPI-I/O, Python, … over the DAOS library
      • DAOS library directly linked with the applications
      • No need for dedicated cores
      • Low memory/CPU footprint
      • End-to-end OS bypass
      • Non-blocking, lockless, snapshot support, …
      Libfabric
      • Low-latency & high-message-rate communications
      • Native support for RDMA & scalable collective operations
      • Support for iWarp, RoCE, Infiniband, OPA, Slingshot, …
      Storage nodes: DAOS Storage Engine
      • Fine-grained I/O with media selection strategy
      • Only application data on SSD to maximize throughput
      • Small I/Os aggregated in pmem & migrated to SSD in large chunks
      • Full userspace model with no system calls on I/O path
      • Built-in storage management infrastructure (control plane)
      • NFSv4-like ACL
      Storage media
      • Intel® Optane Memory, PMDK, memory interface: metadata, low-latency I/Os & indexing/query
      • 3D-NAND/Optane SSD, SPDK, NVMe interface: bulk data
      • HDD
  • 3. Storage Containers
      Aggregate related datasets into manageable and coherent entities
      • Distributed consistency & automated recovery
      • Full versioning
      • Simplified data management
      • Snapshot
      • Cross-tier migration
      • Indexing
      Example DAOS containers (diagram):
      • Encapsulated POSIX namespace (root, dirs, files, data)
      • File-per-process (files, data)
      • HDF5 « File » (groups, datasets, data)
      • Key-value store (keys, values)
      • Graph (nodes)
      • Columnar database (keys, values)
  • 4. DAOS Objects
      Native support for structured, semi-structured & unstructured data models
      • Built on top of DCPMM
      • Unconstrained by POSIX serialization
      • Custom attributes
      • Data access time orders of magnitude faster (µs)
      • Scalable concurrent updates & high IOPS
      • Non-blocking
      • Enable in-storage computing
      Diagram: applications use a data model library/framework (array, KV store, multi-level KV store) on top of the DAOS Storage Engine (open source, Apache 2.0 license) and NVMe SSD.
  • 5. DAOS Tools
      dmg
      • Target: administrators
      • Lustre equivalent: lctl/mkfs/mount/IML
      • Functionality: storage provisioning; burn-in; firmware update; data plane mgmt & monitoring; configure/monitor scrubbing; pool mgmt; telemetry
      daos
      • Target: users
      • Lustre equivalent: lfs
      • Functionality: pool query; container mgmt; unified namespace mgmt; container user attributes; snapshots; object debugging; POSIX container configuration
  • 6. Application Interface
      • HPC apps and Analytics/AI apps
      • Interfaces: POSIX I/O, HDF5, MPI-IO, Python, Apache Spark, Apache Arrow, TensorFlow, SEGY
      • DAOS Storage Engine (open source, Apache 2.0 license)
      • Dataset mover to the capacity tier (PFS, S3, HSM, …)
  • 7. POSIX I/O Support
      The DAOS API is new and very flexible
      • Multi-level keys
      • Different value types supported
      • Can build all data models / I/O middleware on top of it
      Most applications are still based on POSIX
      • Need a smooth migration path with little to no application changes
      • Quick path to realize performance of DCPMM & DAOS
      POSIX is implemented as a middleware instead of being the building block of all data models.
      Diagram: Application / Framework, dfuse, Interception Library, DAOS File System (libdfs) and DAOS library (libdaos) in a single process address space; RPC and RDMA to the DAOS Storage Engine; end-to-end user space, no system calls; Intel® QLC 3D NAND SSD.
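The "little to no application changes" point can be illustrated with ordinary file I/O. This is a minimal sketch, assuming a POSIX container has already been mounted through dfuse; the mount point is hypothetical, and the demo line uses a temporary directory so the snippet runs anywhere. The function name `roundtrip` is made up for this example.

```python
import os
import tempfile


def roundtrip(mount_point, name, payload):
    """Write a file and read it back using only standard file calls.

    Nothing here is DAOS-specific: if mount_point is a dfuse mount of a
    POSIX container (an assumption of this sketch), the same code would
    be served by DAOS instead of a local file system.
    """
    path = os.path.join(mount_point, name)
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        return f.read()


# Demo on a temporary directory; replace it with the dfuse mount point
# of a container to exercise DAOS (that path is site-specific).
demo_dir = tempfile.mkdtemp()
data = roundtrip(demo_dir, "hello.bin", b"unchanged POSIX code")
```

The application code stays the same in both cases; only the directory it points at changes.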
  • 8. MPI-IO Driver for DAOS
      The DAOS MPI-IO driver is implemented within the I/O library in MPICH (ROMIO).
      • Added as an ADIO driver
      • Portable to Open-MPI, Intel MPI, etc.
      • https://github.com/pmodels/mpich
      MPI files use the same DFS mapping to the DAOS object model
      • MPI files can be accessed through the DFS API
      • MPI files can be accessed through regular POSIX with a dfuse mount over the container
      The application works seamlessly: it selects the driver by prefixing the path with “daos:”.
      MPI-IO ROMIO driver: https://github.com/pmodels/mpich/tree/master/src/mpi/romio/adio/ad_daos
      A POSIX / MPI-IO file maps to a DAOS byte array object, a special DAOS object with:
      • 1-level key
      • 1-byte records
      • Configurable chunk size
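To make the byte-array mapping concrete, here is a small sketch of how a file extent could be split along chunk boundaries. It assumes, for illustration only, that the single-level key identifies a chunk and that records inside a chunk are addressed by byte offset; the function name `split_extent` and this layout are not taken from the DAOS source.

```python
def split_extent(offset, length, chunk_size):
    """Split a byte range into (chunk_index, offset_in_chunk, nbytes) pieces.

    Illustrative model of a chunked byte array with 1-byte records;
    the real DAOS array layout may differ.
    """
    if offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset/length must be >= 0 and chunk_size > 0")
    pieces = []
    end = offset + length
    while offset < end:
        chunk_index, within = divmod(offset, chunk_size)
        # Stop at the chunk boundary or at the end of the request.
        nbytes = min(chunk_size - within, end - offset)
        pieces.append((chunk_index, within, nbytes))
        offset += nbytes
    return pieces


# A 200-byte write at offset 1000 with 1024-byte chunks spans two chunks.
pieces = split_extent(1000, 200, 1024)
```

A larger chunk size means fewer pieces per request; a smaller one spreads a file over more keys.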
  • 9. HDF5 VOL Architecture
      The information on this page is subject to the use and disclosure restrictions provided on the second page to this document.
      Three main components:
      • HDF5 Library
      • DAOS VOL Connector
      • (External) HDF5 Test Suite
      Diagram: HDF5 tools and the test suite call the HDF5 API; the VOL layer dispatches to the native VOL or the DAOS VOL; the native VOL goes through the VFD layer (SEC2, MPIO, …) to the file system via the POSIX API or through MPI I/O; the DAOS VOL goes to DAOS via the DAOS API.
      Legend: core HDF5 library, external test suite, external VOL connector; new component, enhanced component, native component.
  • 10. HDF5 DAOS VOL Connector: Current Status
      No longer requires a separate version of HDF5
      • Compatible with main develop branch of HDF5
      • Compatible with 1.12.x release series of HDF5 with VOL support
      Currently supported features
      • New H5M MAP API to expose K/V interface to HDF5 users
      • Variable-length datatypes are now also supported
      • Chunking is the recommended storage layout to get the most out of DAOS performance
      • Tools support (h5dump, h5ls, h5repack, etc.)
      Coming by end of the year
      • Independent metadata writes (= independent object creation)
      • Asynchronous I/O
      Available from: https://git.hdfgroup.org/projects/HDF5VOL/repos/daos-vol/
      • See user’s guide for a more detailed list of supported features
  • 11. Unified Namespace
      DAOS object store:
      • Addresses pools and containers with UUIDs; objects with 128-bit IDs.
      • Applications/users are used to accessing files / directories in a traditional namespace.
      The Unified Namespace allows users to create links between a file/dir in a system namespace and DAOS pools & containers:
      • daos container create --path=/mnt/project1/userA/NS1 --pool=uuid --type=POSIX/HDF5/etc.
      • The path created above becomes a special file or directory (depending on container type) with an extended attribute holding the pool and container information.
      • Accessing that path from DAOS-aware middleware makes the link on the fly with the DAOS UNS library.
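The link resolution described above can be sketched as parsing the pool and container information out of an attribute value. The value layout `DAOS.<TYPE>://<pool-uuid>/<container-uuid>` is an assumption made for this example, as are the names `UnsLink` and `parse_uns_value`; the slide only says the attribute carries pool and container information.

```python
import uuid
from typing import NamedTuple


class UnsLink(NamedTuple):
    """Pool/container reference recovered from a namespace entry."""
    container_type: str
    pool: uuid.UUID
    container: uuid.UUID


def parse_uns_value(value):
    """Parse 'DAOS.<TYPE>://<pool-uuid>/<container-uuid>'.

    The string layout is hypothetical and used for illustration only.
    """
    prefix, sep, rest = value.partition("://")
    if not sep or not prefix.startswith("DAOS."):
        raise ValueError("not a DAOS link value: %r" % value)
    container_type = prefix[len("DAOS."):]
    pool_text, sep, cont_text = rest.partition("/")
    if not sep:
        raise ValueError("missing container UUID in %r" % value)
    return UnsLink(container_type, uuid.UUID(pool_text), uuid.UUID(cont_text))


link = parse_uns_value(
    "DAOS.POSIX://12345678-1234-5678-1234-567812345678/"
    "87654321-4321-8765-4321-876543218765"
)
```

Middleware that understands such a value can open the right pool and container without the user ever typing a UUID.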
  • 12. Spark Input/Output Support
      Diagram labels:
      • Spark processing: scan/parse; project, join, ML, etc.; native scan/parse (CSV, Parquet, ORC, etc.)
      • Hadoop FileSystem abstract class: HDFS (disks) and DAOS FileSystem
      • DAOS FileSystem: Java wrapper over the DAOS POSIX API
      • Apache Arrow data source: DAOS/Arrow wrapper
      • DAOS native data source (Parquet, ORC, etc.): DAOS API wrapper
      • DAOS storage: DAOS library, DPDK/PMDK/SPDK used, kernel bypass, zero copy
      Legend: current Spark, implemented, planned
  • 13. Python Support
      Pythonic bindings called pydaos
      • Export key-value store objects
      • Integrated with Python dictionaries
      • Support Python iterator, direct assignments, …
      • Scalable & performant
      • Bulk insert/retrieve
      • Core written in C
      • Python 2.7 & 3 support
      Python integration
      • dbm
      • pyprob
      TODO
      • Expose snapshots
      • Integration with NumPy, …
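Because the bindings are described as dictionary-like, code written against Python's mapping protocol should carry over. The sketch below deliberately avoids calling pydaos itself, whose exact API is not shown on the slide; it uses a plain `dict` as a stand-in, and the helper name `load_results` is hypothetical.

```python
from collections.abc import MutableMapping


def load_results(kv, results):
    """Bulk-insert results, then read everything back by iteration.

    Works with any MutableMapping. A pydaos key-value object is assumed
    (not verified here) to support the same operations.
    """
    if not isinstance(kv, MutableMapping):
        raise TypeError("kv must behave like a dictionary")
    kv.update(results)                      # bulk insert
    kv["status"] = b"done"                  # direct assignment
    return {key: kv[key] for key in kv}     # iterator + lookup


store = {}  # stand-in for a DAOS-backed dictionary
snapshot = load_results(store, {"run1": b"0.93", "run2": b"0.95"})
```

Writing against the mapping protocol keeps the application code testable without a DAOS system.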
  • 14. Resources
      • Source code on GitHub: https://github.com/daos-stack/daos
      • Documentation: http://daos.io
      • Community mailing list: https://daos.groups.io
      • DAOS solution brief: https://www.intel.com/content/www/us/en/high-performance-computing/

Editor's notes

  1. Overall DAOS architecture: a distributed object store built from the ground up for NVM technology. It uses two types of storage. The first is SSDs with different media (3D-NAND, Optane), available with an NVMe interface and providing fast block-based storage; Intel is developing a new user-space NVMe driver, SPDK, for them. The second is persistent memory (Optane pmem): fine-grained storage that is directly memory mapped, so software can do direct load/store to the storage with no block abstraction; Intel is developing a new storage stack, PMDK, to manage transactional updates to pmem. No existing object store leverages these technologies: Lustre and GPFS are built into the kernel, and Ceph still uses kernel I/O by going through the Linux block layer. Here everything is in user space. DAOS is a new object store built on top of PMDK and SPDK, delivering high IOPS and high bandwidth, with a more advanced storage API (not just byte arrays, but also KV, etc.). Communication also has to be considered: DAOS reuses the same communication middleware as MPI (libfabric), building on several decades of effort to optimize latency and bandwidth for MPI. The client side is very lightweight, with support for many middleware layers. All metadata is stored in pmem. Small I/Os land in pmem and are later aggregated into SSD in the background to free space in pmem. Large application data goes to the SSD.
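The media selection described in this note (small I/Os staged in pmem, large data sent to SSD, background aggregation) can be modeled with a toy in-memory class. This is a sketch for intuition only: the class name `TieredStore`, the size threshold and the aggregation trigger are all hypothetical and do not reflect the actual DAOS policy.

```python
class TieredStore:
    """Toy model of two media tiers, using dictionaries as 'devices'."""

    def __init__(self, small_io_limit=4096):
        # Hypothetical threshold: writes below it count as small I/Os.
        self.small_io_limit = small_io_limit
        self.pmem = {}
        self.ssd = {}

    def write(self, key, data):
        """Place small writes in pmem and large writes on SSD."""
        if len(data) < self.small_io_limit:
            target, other = self.pmem, self.ssd
        else:
            target, other = self.ssd, self.pmem
        other.pop(key, None)  # drop any stale copy on the other tier
        target[key] = data

    def aggregate(self):
        """Move all staged small records to SSD; return bytes moved."""
        moved = sum(len(value) for value in self.pmem.values())
        self.ssd.update(self.pmem)
        self.pmem.clear()
        return moved

    def read(self, key):
        """Look in pmem first, then fall back to SSD."""
        if key in self.pmem:
            return self.pmem[key]
        return self.ssd[key]


store = TieredStore(small_io_limit=8)
store.write("small", b"1234")
store.write("large", b"x" * 16)
moved = store.aggregate()
```

After `aggregate()` the pmem tier is empty again, which is the space-freeing effect the note describes.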
  2. The baseline API is based on the storage container: an object address space inside a pool. A container can be an encapsulated POSIX namespace or just a flat one. HDF5 has its own representation for a single HDF5 file. A KV store can be one, as can a database or an ACG 1 graph. The container provides a baseline API with objects. The unit of storage management is the container, not the file. With millions of files, data migration has to be done on every one of them; we simplified that. There are fewer containers than files, so we can index them, and data migration works on an entire container. The container is also the unit of snapshot. Snapshots were not typically available to users in the past (they used to be at the file system level), but they are now at the container level and exposed to the user.
  3. POSIX is not the foundation any more. DAOS provides a new key-array store API with advanced features (ad-hoc concurrency control, snapshot support, asynchronous API, ...). The DAOS API is meant to be integrated directly with rich data models for better performance and extended capabilities. POSIX (i.e. file) support is built as a middleware over the DAOS API. DAOS has a native KV API, which means that the wire protocol is KV fetch/lookup. On the server, the tree structure is maintained in DCPMM, and values are stored either in DCPMM or on SSD depending on their size. No file serialization is required. A client can directly look up or insert a key/value pair over the wire, and this can happen from many clients concurrently. Keys can be ordered, so range queries are supported. KV stores are partitioned/sharded/replicated/erasure coded over multiple servers for performance & resilience. DAOS can also support complex queries, since it understands the key-value structure. The native indexing can also be augmented with an ad-hoc application-driven index.
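Two ideas from this note, sharding a KV store over several servers and range queries over ordered keys, can be sketched in memory as follows. Hash-based placement is a stand-in chosen for simplicity, not the DAOS placement algorithm, and the class name `ShardedKV` is hypothetical.

```python
import bisect
import hashlib


class ShardedKV:
    """Toy key-value store spread over several in-memory 'servers'."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def _shard(self, key):
        # Stable hash so a key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard(key)[key] = value

    def get(self, key):
        return self._shard(key)[key]

    def range(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        hits = []
        for shard in self.shards:
            keys = sorted(shard)
            start = bisect.bisect_left(keys, lo)
            stop = bisect.bisect_left(keys, hi)
            hits.extend((k, shard[k]) for k in keys[start:stop])
        return sorted(hits, key=lambda pair: pair[0])


kv = ShardedKV(n_shards=3)
for i, name in enumerate(["a", "b", "c", "d", "e"]):
    kv.put(name, i)
window = kv.range("b", "d")
```

Each shard answers the range query for its own keys and the client merges the partial results, which is one simple way ordered keys and sharding can coexist.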
  4. Spark integration with DAOS is done by a different team at Intel. They have written a Hadoop filesystem abstraction class for DAOS, which is integrated in the DAOS source code. With this Hadoop connector, DAOS is used transparently in user space through libdfs. More value and acceleration for Spark can come from integrating DAOS as a data source through Parquet / Apache Arrow, or through the native DAOS API.
  5. Native Python bindings (pydaos) integrate KV store objects directly with Python dictionaries. The API is very pythonic: it offers iterators, direct assignments, and bulk insertion/retrieval. This lets people who are not very familiar with C use a high-level language. There is some integration with dbm and pyprob, and more work to do.