SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Downloaden Sie, um offline zu lesen
Webinar:
EfïŹcient Data Loading for
Model Training on AWS
Greg Palmer
greg.palmer@alluxio.com
October 3rd, 2023
Adoption of ArtiïŹcial Intelligence (AI)
● 49% of CIOs are using or plan to use AI[1]
● Recent boom of generative AI is accelerating adoption
2
● Successful AI projects require access to data
● As AI use cases grow more complex

○ Understanding data access patterns becomes more
important
[1] Gartner, “2023 Gartner CIO survey”
Barriers to Implementing AI[2]
3
[2] Gartner, “2021 Gartner AI in Organizations survey”
How Access to Data Hinders the
Success of AI
4
● High-quality AI models require access to massive datasets
● Data access is slow and costly[3]
● Increasing size of models slows down application performance
● Limited availability of GPUs necessitates remote data transfer[4]
● GPUs waiting for data, results in underutilized GPUs
[3] AI and compute, https://openai.com/research/ai-and-compute [4] Amazon EC2 P4 Instances, https://aws.amazon.com/ec2/instance-types/p4
Data Access Patterns
5
Data Access Patterns in the ML Pipeline
6
Data Access Patterns in Model Training
7
Cloud Data Access Patterns:
Training on Unstructured Datasets
8
Cloud Data Access Patterns:
Training on Structured Datasets
9
Cloud Data Access Patterns:
Multi-cloud/Multi-region Data Access
10
Data Access Solutions Should Support:
11
● High performance and throughput for ML workloads
● Dataset management, including load/unload/update of data from the data
lake
● Cloud-native capabilities, such as multi-tenancy, scalability, and elasticity
● Eliminate data redundancy to avoid managing multiple copies of data
● Reduced dependency on specialized networking hardware
● Flexibility to place compute anywhere, regardless of the location of the data
● Agnostic to cloud service providers to avoid vendor lock-in
● Future-proofing to adapt to advancements in storage and computation
technologies
● Security, including consistent authentication and authorization
Alluxio-powered Data Access Across
the ML Pipeline
12
Data Access for Model
Training on AWS[5]
[5] Amazon AWS Docs: https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html
Data Access for Model Training on AWS
S3 File Mode
14
14
Instance Filesystem:
/opt/ml/input/data/training-channel
Training Instance
Training Script Process
(train.py)
Copy dataset ahead of time
Read
Data Access for Model Training on AWS
S3 FastFile Mode
15
15
Instance Filesystem:
/opt/ml/input/data/training-channel
Training Script Process
(train.py)
Stream in real-time
Read
FUSE Process
mount
Training Instance
Data Access for Model Training on AWS
S3 Pipe Mode
16
16
Instance Filesystem:
/opt/ml/input/data/training-channel
Training Script Process
(train.py)
Stream in real-time
Read
Training Instance
Data Access for Model Training on AWS
Amazon Fsx for Lustre
17
17
Instance Filesystem:
/opt/ml/input/data/training-channel
Training Script Process
(train.py)
Read through
Read
Fsx for Lustre
mount
Training Instance
Data Access for Model Training on AWS
Amazon EFS Filesystem
18
18
Instance Filesystem:
/opt/ml/input/data/training-channel
Training Script Process
(train.py)
Read
EFS
mount
Training Instance
Alluxio for Data Access in
Model Training on AWS
Model Training
Alluxio on AWS - Reference Architecture
Model Serving
Inference cluster
Models
Training Data
Models
1
2
3
4
5
Alluxio
Training cluster
Training Data
2
20
Alluxio
Alluxio on AWS Provides:
21
● Automatically load / unload / update data from your existing data lake
● Faster access to training data informed by data access patterns
● Maintain optimal data access with high data throughput to keep the GPU fully
utilized
● Deploy models faster and provides high concurrency model serving to inference
nodes
● Increase the productivity of the data engineering team by eliminating the need to
manage data copies
● Reduce cloud storage API and egress costs, such as the cost of S3 GET requests,
data transfer costs, etc.
Model Training
Alluxio on AWS - Reference Architecture
Model Serving
Inference cluster
Models
Training Data
Models
1
2
3
4
5
Alluxio
Training cluster
Training Data
2
22
Alluxio
GCP
Alluxio Model Training
Demonstration
Alluxio Demo Environment
24
Local Folder / Dataset
GPU Training
Storage
Kubernetes
Interactive
Notebook
Alluxio
Operator
Visualization
Dashboard
Alluxio
Alluxio Demo 

25
Alluxio AWS Model Training Demo - Recording
Alluxio FUSE vs AWS S3 FUSE Demo - Recording
Alluxio APIs Demo - Recording
26
Training Directly from Storage
- > 80% of total time is spent in DataLoader
- Result in Low GPU Utilization Rate (<20%)
Visualization Dashboard Results (w/o Alluxio)
27
Visualization Dashboard Results (with Alluxio)
Training with Alluxio
- Reduced DataLoader Rate from 82% to 1% (82X)
- Increase GPU Utilization Rate from 17% to 93% (5X)
Q&A
twitter.com/alluxio slackin.alluxio.io
linkedin.com/alluxio
www.alluxio.io
JOIN THE CONVERSATION
ON SLACK
ALLUXIO.IO/SLACK

Weitere Àhnliche Inhalte

Ähnlich wie Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS

AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
Deconstructing a Machine Learning Pipeline with Virtual Data Lake
Deconstructing a Machine Learning Pipeline with Virtual Data LakeDeconstructing a Machine Learning Pipeline with Virtual Data Lake
Deconstructing a Machine Learning Pipeline with Virtual Data LakeAlluxio, Inc.
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionFabian Hadiji
 
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Codemotion
 
Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...
Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...
Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...Trivadis
 
Train, predict, serve: How to go into production your machine learning model
Train, predict, serve: How to go into production your machine learning modelTrain, predict, serve: How to go into production your machine learning model
Train, predict, serve: How to go into production your machine learning modelCloudera Japan
 
File Repository on GAE
File Repository on GAEFile Repository on GAE
File Repository on GAElynneblue
 
Accelerating Cloud Training With Alluxio
Accelerating Cloud Training With AlluxioAccelerating Cloud Training With Alluxio
Accelerating Cloud Training With AlluxioAlluxio, Inc.
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOSQAware GmbH
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Simplify Machine Learning with the Deep Learning AMI | AWS Floor28
Simplify Machine Learning with the Deep Learning AMI | AWS Floor28Simplify Machine Learning with the Deep Learning AMI | AWS Floor28
Simplify Machine Learning with the Deep Learning AMI | AWS Floor28Amazon Web Services
 
AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics Ruben Pertusa Lopez
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at ScaleDatabricks
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway Tu Pham
 
Migrating Large Scale Datasets
Migrating Large Scale DatasetsMigrating Large Scale Datasets
Migrating Large Scale DatasetsAmazon Web Services
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
Keepler | IoT Analytics & AI on Edge Computing
Keepler | IoT Analytics & AI on Edge ComputingKeepler | IoT Analytics & AI on Edge Computing
Keepler | IoT Analytics & AI on Edge ComputingKeepler Data Tech
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceAWS Germany
 

Ähnlich wie Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS (20)

AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
Deconstructing a Machine Learning Pipeline with Virtual Data Lake
Deconstructing a Machine Learning Pipeline with Virtual Data LakeDeconstructing a Machine Learning Pipeline with Virtual Data Lake
Deconstructing a Machine Learning Pipeline with Virtual Data Lake
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
 
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
Denis Jannot - Towards Data Science Engineering Principles - Codemotion Milan...
 
Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...
Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...
Azure Days 2019: Wie bringt man eine Data Analytics Plattform in die Cloud? (...
 
Train, predict, serve: How to go into production your machine learning model
Train, predict, serve: How to go into production your machine learning modelTrain, predict, serve: How to go into production your machine learning model
Train, predict, serve: How to go into production your machine learning model
 
File Repository on GAE
File Repository on GAEFile Repository on GAE
File Repository on GAE
 
Accelerating Cloud Training With Alluxio
Accelerating Cloud Training With AlluxioAccelerating Cloud Training With Alluxio
Accelerating Cloud Training With Alluxio
 
contentDM
contentDMcontentDM
contentDM
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Simplify Machine Learning with the Deep Learning AMI | AWS Floor28
Simplify Machine Learning with the Deep Learning AMI | AWS Floor28Simplify Machine Learning with the Deep Learning AMI | AWS Floor28
Simplify Machine Learning with the Deep Learning AMI | AWS Floor28
 
AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at Scale
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway
 
Migrating Large Scale Datasets
Migrating Large Scale DatasetsMigrating Large Scale Datasets
Migrating Large Scale Datasets
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
Keepler | IoT Analytics & AI on Edge Computing
Keepler | IoT Analytics & AI on Edge ComputingKeepler | IoT Analytics & AI on Edge Computing
Keepler | IoT Analytics & AI on Edge Computing
 
Query your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performanceQuery your data in S3 with SQL and optimize for cost and performance
Query your data in S3 with SQL and optimize for cost and performance
 

Mehr von Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio, Inc.
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio, Inc.
 

Mehr von Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AI
 

KĂŒrzlich hochgeladen

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...
(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...
(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...gurkirankumar98700
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto GonzĂĄlez Trastoy
 

KĂŒrzlich hochgeladen (20)

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Call Girls In Mukherjee Nagar đŸ“± 9999965857 đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar đŸ“±  9999965857  đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar đŸ“±  9999965857  đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar đŸ“± 9999965857 đŸ€© Delhi đŸ«Š HOT AND SEXY VVIP 🍎 SE...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...
(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...
(Genuine) Escort Service Lucknow | Starting â‚č,5K To @25k with A/C đŸ§‘đŸœâ€â€ïžâ€đŸ§‘đŸ» 89...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS

  • 1. Webinar: EfïŹcient Data Loading for Model Training on AWS Greg Palmer greg.palmer@alluxio.com October 3rd, 2023
  • 2. Adoption of ArtiïŹcial Intelligence (AI) ● 49% of CIOs are using or plan to use AI[1] ● Recent boom of generative AI is accelerating adoption 2 ● Successful AI projects require access to data ● As AI use cases grow more complex
 ○ Understanding data access patterns becomes more important [1] Gartner, “2023 Gartner CIO survey”
  • 3. Barriers to Implementing AI[2] 3 [2] Gartner, “2021 Gartner AI in Organizations survey”
  • 4. How Access to Data Hinders the Success of AI 4 ● High-quality AI models require access to massive datasets ● Data access is slow and costly[3] ● Increasing size of models slows down application performance ● Limited availability of GPUs necessitates remote data transfer[4] ● GPUs waiting for data, results in underutilized GPUs [3] AI and compute, https://openai.com/research/ai-and-compute [4] Amazon EC2 P4 Instances, https://aws.amazon.com/ec2/instance-types/p4
  • 6. Data Access Patterns in the ML Pipeline 6
  • 7. Data Access Patterns in Model Training 7
  • 8. Cloud Data Access Patterns: Training on Unstructured Datasets 8
  • 9. Cloud Data Access Patterns: Training on Structured Datasets 9
  • 10. Cloud Data Access Patterns: Multi-cloud/Multi-region Data Access 10
  • 11. Data Access Solutions Should Support: 11 ● High performance and throughput for ML workloads ● Dataset management, including load/unload/update of data from the data lake ● Cloud-native capabilities, such as multi-tenancy, scalability, and elasticity ● Eliminate data redundancy to avoid managing multiple copies of data ● Reduced dependency on specialized networking hardware ● Flexibility to place compute anywhere, regardless of the location of the data ● Agnostic to cloud service providers to avoid vendor lock-in ● Future-proofing to adapt to advancements in storage and computation technologies ● Security, including consistent authentication and authorization
  • 12. Alluxio-powered Data Access Across the ML Pipeline 12
  • 13. Data Access for Model Training on AWS[5] [5] Amazon AWS Docs: https://docs.aws.amazon.com/sagemaker/latest/dg/model-access-training-data.html
  • 14. Data Access for Model Training on AWS S3 File Mode 14 14 Instance Filesystem: /opt/ml/input/data/training-channel Training Instance Training Script Process (train.py) Copy dataset ahead of time Read
  • 15. Data Access for Model Training on AWS S3 FastFile Mode 15 15 Instance Filesystem: /opt/ml/input/data/training-channel Training Script Process (train.py) Stream in real-time Read FUSE Process mount Training Instance
  • 16. Data Access for Model Training on AWS S3 Pipe Mode 16 16 Instance Filesystem: /opt/ml/input/data/training-channel Training Script Process (train.py) Stream in real-time Read Training Instance
  • 17. Data Access for Model Training on AWS Amazon Fsx for Lustre 17 17 Instance Filesystem: /opt/ml/input/data/training-channel Training Script Process (train.py) Read through Read Fsx for Lustre mount Training Instance
  • 18. Data Access for Model Training on AWS Amazon EFS Filesystem 18 18 Instance Filesystem: /opt/ml/input/data/training-channel Training Script Process (train.py) Read EFS mount Training Instance
  • 19. Alluxio for Data Access in Model Training on AWS
  • 20. Model Training Alluxio on AWS - Reference Architecture Model Serving Inference cluster Models Training Data Models 1 2 3 4 5 Alluxio Training cluster Training Data 2 20 Alluxio
  • 21. Alluxio on AWS Provides: 21 ● Automatically load / unload / update data from your existing data lake ● Faster access to training data informed by data access patterns ● Maintain optimal data access with high data throughput to keep the GPU fully utilized ● Deploy models faster and provides high concurrency model serving to inference nodes ● Increase the productivity of the data engineering team by eliminating the need to manage data copies ● Reduce cloud storage API and egress costs, such as the cost of S3 GET requests, data transfer costs, etc.
  • 22. Model Training Alluxio on AWS - Reference Architecture Model Serving Inference cluster Models Training Data Models 1 2 3 4 5 Alluxio Training cluster Training Data 2 22 Alluxio GCP
  • 24. Alluxio Demo Environment 24 Local Folder / Dataset GPU Training Storage Kubernetes Interactive Notebook Alluxio Operator Visualization Dashboard Alluxio
  • 25. Alluxio Demo 
 25 Alluxio AWS Model Training Demo - Recording Alluxio FUSE vs AWS S3 FUSE Demo - Recording Alluxio APIs Demo - Recording
  • 26. 26 Training Directly from Storage - > 80% of total time is spent in DataLoader - Result in Low GPU Utilization Rate (<20%) Visualization Dashboard Results (w/o Alluxio)
  • 27. 27 Visualization Dashboard Results (with Alluxio) Training with Alluxio - Reduced DataLoader Rate from 82% to 1% (82X) - Increase GPU Utilization Rate from 17% to 93% (5X)