1. Hydra: Efficient Training for Larger-than-Memory Deep Learning Models
Kabir Nagrecha, Arun Kumar
2. Deep Learning and Scale: A Natural Pairing
GPT-3: 175B parameters
BERT-Large: 345M parameters
Megatron-LM: 1T parameters
3. But how do we train them?
No GPU in the world can train a trillion-parameter model…
But perhaps 1000 GPUs together?
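A quick back-of-envelope calculation (an illustrative sketch, not a figure from the talk) makes the problem concrete, assuming fp32 weights and an Adam-style optimizer that keeps gradients plus two state tensors per parameter:

    # Rough memory estimate for training a 1-trillion-parameter model.
    # Assumptions (illustrative, not from the talk): fp32 weights, Adam-style
    # optimizer state; activations are ignored entirely.
    params = 1e12                 # 1 trillion parameters
    bytes_per_param = 4           # fp32
    multiplier = 4                # weights + gradients + 2 optimizer states
    total_tb = params * bytes_per_param * multiplier / 1e12
    print(f"~{total_tb:.0f} TB needed, vs. 0.08 TB on an 80 GB GPU")

Even under these generous assumptions, training state alone runs to roughly 16 TB, hundreds of times more than any single accelerator offers, which is what motivates pooling devices together.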
Model Parallelism – Combining GPUs into a “super-device”
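As a minimal sketch of the idea (a generic PyTorch example, not Hydra's actual implementation; the device names and layer sizes are hypothetical), model parallelism splits a network's layers across devices so that no single GPU has to hold the full parameter set:

    import torch
    import torch.nn as nn

    class TwoDeviceMLP(nn.Module):
        """Toy model-parallel network: layers split across two GPUs."""
        def __init__(self):
            super().__init__()
            # The first half of the network lives on GPU 0...
            self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
            # ...and the second half on GPU 1, so neither device
            # holds all of the parameters at once.
            self.part2 = nn.Linear(4096, 10).to("cuda:1")

        def forward(self, x):
            x = self.part1(x.to("cuda:0"))
            # Activations are copied between devices at the split point.
            return self.part2(x.to("cuda:1"))

    model = TwoDeviceMLP()
    logits = model(torch.randn(8, 1024))  # output tensor lives on cuda:1

Together, the two GPUs behave like one larger “super-device”, at the cost of transferring activations across devices at each split point.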