SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Hacking GPUs for Deep Learning
MLConf New York
Jeff Johnson
Facebook AI Research
jhj@fb.com
Deep (convolutional) Neural Networks
Revolution in machine learning
Convolution: since 1980s. Deep: flops since 2000s
Avoid feature engineering
▪ With enough data, let network discover feature
representations
▪ Can work even for NLP. No word segmentation, use
raw character data.
2D Convolutional Nets (images)
LeCun, Bottou,
Bengio and Haffner,
1998
Krizhevsky,
Sutskever and
Hinton, 2012
2D Convolutional Nets
Progress towards smaller kernels and deeper nets
Network architecture ImageNet 1000 class top-5
error
AlexNet ~15%
OverFeat ~13%
ZeilerNet ~11%
Oxford-VGG ~7%
GoogLeNet ~6%, ~4.5%
PReLU (MSR) ~4.9%
Human performance 3-5%
3D Convolutional Nets (videos)
C3D (Tran et al., 2014)
DeepVideo (Karpathy et al., 2014)
1D Convolutional Nets (text, sequences)
Collobert et al., 2011
Zhang and LeCun, 2015
RNNs and LSTMs (text, sequences)
Graves, Mohamed and Hinton, 2013
Mikolov, 2014
Deep Neural Networks
Supervised learning. Unsupervised ???
Train with back-propagation/SGD variants
Strong scaling is unsolved
▪ Distributed parameter space exploration (e.g.,
Hogwild!; Niu et al. 2011)
▪ Distributed hyperparameter space exploration
(e.g., Bayesian optimization; Snoek et al. 2012)
Characteristics
Deep nets are flop eaters
Convolutions are expensive
Pointwise calcuations (log/exp, ReLU, */+, ...)
Neighborhood reductions (pooling, convolution)
Scaling network parameters
 increased learning capacity; overfitting
 more training data (real or synthetic),
regularization required
Deep nets are bandwidth eaters
More parameters = more memory, data to exchange
Barrier to cross-machine parallelism
▪ periodic exchanges, compression, quantization
Increase reuse of memory while local?
▪ interspersed reductions are resistant to fusion of
computations
▪ generalized programming language problem
Deep nets are latency sensitive
Serial dependency of training
fprop => bprop => fprop => ...
Serial dependency of multi-layer networks
layer 1 => layer 2 => layer 3 => ...
Multiple path dependent networks (RNNs, multi-layer
LSTMs)
Deep nets are also small?
Deeper = smaller feature planes, more of them
input Rm => expand to Rn => non-lin => reduce to
Rk
Problems are tiny in HPC terms
4096×4096 FFT, FE/PDE on massive grids, ...
NLP tasks can be sparse
Setup/kernel launch latency on GPU can dominate
compute
The tools
Vector processors
SIMD: Single Instruction,
Multiple Data
Serial processor with ability to
operate on more than one
piece of data concurrently
Cray-1 (1976)
Vector processors
Hard to use: instructions only operate on 4, 8, 16, ...
pieces of data at a time. Boundary/alignment
effects. Great if your vectors are large, but...
float* a = ...; // is this aligned (a % 16 == 0)?
float* b = ...; // is this aligned (b % 16 == 0)?
for (i = 0; i < 18; ++i) { // how to handle [16, 17]?
b[i] += a[i]; // SIMD this?!? masking/loop epilogue
}
“Vector cores”?
SIMD variant: NVIDIA calls
“SIMT”
Lots of simple cores (CM)
Hide latency through many
threads + switching (Tera)
“Pixel/vertex shaders” in 2000s
GPUs => GPGPU
CM-1 (1983)
Tera MTA (1995)
GPU versus CPU
GPUs represent a different form of vector
programming (“vector cores”)
▪ 32-wide vector of threads (“warp”)
Sufficiently optimized CPU code can be on par with
GPU perf (Tflop range with AVX2/512, exploit multi-
level caches, deep pipelines, prefetch, ...)
Vector programming: easier with GPUs than CPUs
Sweetspot is different from GPU codes
Parallelization + vectorization
Serial nature of commonly used CPU programming
languages sometimes hides opportunities
Auto-vectorizing/parallelizing compilers + DSLs can’t
yet compete with expert hand-rolled
▪ DSLs like Halide (Ragan-Kelley et al. 2013) show
promise but need a few more generations
Sprinkle in (OpenMP) doesn’t cut it
Who wins
CPU GPU
flops ✔
(vectorize: AVX2/512 gives
Tflop range)
✔
Tesla K40: 2880 fp32 ALU
pipelines
main memory b/w ✖
(Xeon Phi improves)
✔
latency ✔
(high clock, reordering;
caches are large and work if
you obey them)
✖
(threads slow, non-smem
caches irrelevant, CPU ->
GPU control overhead)
boundary effects,
small/irregular sizes
✔✖
(branches easy, vectorization
hard)
✖
(warp divergence, load
imbalance)
parallel programming model ✖
(vectorization hard, perf
black box)
✔✖
(CUDA is very different,
domain knowledge)
Tool + problem = solution?
Dive into 2D Convolutional Nets
Somewhat computationally expensive
O(b × f × f’ × n2 × k2)
1st layer AlexNet:
▪ 13.493 Gflop (1 flop here = fp32 multiply-add)
▪ 77.2 Mbyte in, 63.7 Mbyte out (fp32)
▪ Perfect caching + reuse, 175 flop/byte in
▪ No caching + reuse, 0.125 flop/byte in
The problem
Programmable caches (shared memory, registers, ...)
not large enough for perfect reuse
Space of all possible square 2D convolution
problems is 5/6-dimensional
Parameter Size
minibatch size (b) 128
input feature maps (f) 3
output feature maps (f’) 96
input feature size (n x n) 224
convolution kernel size (k x k) 11
convolution kernel stride (SxS) (optional) 4
Converting
Space of all possible matrix multiplications = 3
dimensional (ANxMBMxP = CNxP)
NVIDIA, Intel, others have put lots of effort into
optimizing many parts of this space
▪ Rephrase convolution as a matrix multiplication!
▪ NVIDIA’s cuDNN
But:
Sgemm originally optimized for large problems
13x13 * 3x3 is a small convolution. Unrolling it 192
times it might be enough to feed GPU
Large convolutions are intractable?
Small feature maps/
convolutions = boundary
effects bad for GPUs
Facebook AI Research work
2D convolution via FFT
Fast convolutional nets with fbfft: A GPU Performance
Evaluation (Vasilache, Johnson et al., 2015 ICLR
conference track oral)
Convolution => pointwise × in Fourier basis
Choice of basis is wide open! 2i is great perf
O(b f f’ n2 k2) => O(b f f’ n2 + (b f + f f’ + bf’) n2 log n)
▪ >= 5x5 kernels, faster than cuDNN
fbfft
cuFFT optimized for large FFT sizes
fbfft: smaller data, fit in registers, focus on warp
Data layout
Different problem sizes => different data layout
▪ cudaconv: DHWB (optimal for large b)
▪ deeper layers: HWBD/BHWD (many feature maps)
▪ b=1 faster convergence?
▪ b=128 better compute utilization
Smaller problems, exploit different layout/batching
▪ fbcunn 1D convolution
Latency hiding: what holds you back?
▪ Compute bound? (math)
▪ Memory b/w bound? (streaming)
▪ Memory latency bound? (sparse)
Almost all “deep learning” algorithms are b/w bound
on GPU. Low math intensity!
cuBLAS: Sgemm b/w bound. Dgemm compute
bound
Kernel fusion: CPU vs GPU
Reduces memory b/w pressure
Exploits cache locality and register reuse
CPU: fusion not necessary
Kernel tiling + interleaving works due to caches
GPU: fusion necessary
Tiling + interleaving doesn’t work: smem not
persistent, caches too small/irrelevant
Kernel fusion
CUDA kernel = hard optimization boundary on GPU
Loop interchange, lifting, better fusion on CPU
CUDA: parallelization layer not visible to optimizer.
Auto-tuning desired. HW specific non-linear tradeoffs
Scripting languages are further barrier to fusion on
both CPU and GPU (Torch)
Kernel fusion
Torch: transposition is common operation
▪ size (80, 40) stride (40, 1) => size (40, 80) stride
(1, 40)
▪ Old approach: transpose in memory, perform work,
copy back
▪ New approach: rewrite kernel to handle
transpositions. Optimize if non-transposed
Runtime fusion (CUDA 7.0, Theano)
Exploiting parallelism
end

Weitere ähnliche Inhalte

Was ist angesagt?

Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksSang Jun Lee
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksTaegyun Jeon
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)RCCSRENKEI
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Grigory Sapunov
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Ganesan Narayanasamy
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
FPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAFPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAHiroki Nakahara
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceBen Ball
 
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Universitat Politècnica de Catalunya
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SBrandon Liu
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
 
SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...Kazuki Fujikawa
 
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...Hiroki Nakahara
 

Was ist angesagt? (20)

Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
SPAA11
SPAA11SPAA11
SPAA11
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
 
第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)第11回 配信講義 計算科学技術特論A(2021)
第11回 配信講義 計算科学技術特論A(2021)
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Naist2015 dec ver1
Naist2015 dec ver1Naist2015 dec ver1
Naist2015 dec ver1
 
LSTM
LSTMLSTM
LSTM
 
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...Fine grained asynchronism for pseudo-spectral codes - with application to tur...
Fine grained asynchronism for pseudo-spectral codes - with application to tur...
 
Rnn & Lstm
Rnn & LstmRnn & Lstm
Rnn & Lstm
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
FPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGAFPL15 talk: Deep Convolutional Neural Network on FPGA
FPL15 talk: Deep Convolutional Neural Network on FPGA
 
Deep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for FinanceDeep Learning in Python with Tensorflow for Finance
Deep Learning in Python with Tensorflow for Finance
 
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
Video Analysis with Recurrent Neural Networks (Master Computer Vision Barcelo...
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
 
SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...SchNet: A continuous-filter convolutional neural network for modeling quantum...
SchNet: A continuous-filter convolutional neural network for modeling quantum...
 
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
ISCAS'18: A Deep Neural Network on the Nested RNS (NRNS) on an FPGA: Applied ...
 

Andere mochten auch

Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Universitat Politècnica de Catalunya
 
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...MLconf
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016MLconf
 
Quoc le, slides MLconf 11/15/13
Quoc le, slides  MLconf 11/15/13Quoc le, slides  MLconf 11/15/13
Quoc le, slides MLconf 11/15/13MLconf
 
Visualizing inference
Visualizing inferenceVisualizing inference
Visualizing inferenceAlex Morrise
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksJeremy Nixon
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsMathias Niepert
 
Convolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classificationConvolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classificationYunchao He
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamWithTheBest
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksMachine Learning Prague
 
The basics and design of lua table
The basics and design of lua tableThe basics and design of lua table
The basics and design of lua tableShuai Yuan
 
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Keunwoo Choi
 
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016MLconf
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewKeunwoo Choi
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Universitat Politècnica de Catalunya
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection PipelineAbhinav Dadhich
 
101: Convolutional Neural Networks
101: Convolutional Neural Networks 101: Convolutional Neural Networks
101: Convolutional Neural Networks Mad Scientists
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksMark Chang
 
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...Vincenzo Lomonaco
 

Andere mochten auch (20)

Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
 
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
Soumith Chintala, Artificial Intelligence Research Engineer, Facebook at MLco...
 
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
Hussein Mehanna, Engineering Director, ML Core - Facebook at MLconf ATL 2016
 
Quoc le, slides MLconf 11/15/13
Quoc le, slides  MLconf 11/15/13Quoc le, slides  MLconf 11/15/13
Quoc le, slides MLconf 11/15/13
 
Visualizing inference
Visualizing inferenceVisualizing inference
Visualizing inference
 
Understanding Convolutional Neural Networks
Understanding Convolutional Neural NetworksUnderstanding Convolutional Neural Networks
Understanding Convolutional Neural Networks
 
Learning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for GraphsLearning Convolutional Neural Networks for Graphs
Learning Convolutional Neural Networks for Graphs
 
Convolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classificationConvolutional neural networks for sentiment classification
Convolutional neural networks for sentiment classification
 
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani WithanawasamScene classification using Convolutional Neural Networks - Jayani Withanawasam
Scene classification using Convolutional Neural Networks - Jayani Withanawasam
 
Lukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural NetworksLukáš Vrábel - Deep Convolutional Neural Networks
Lukáš Vrábel - Deep Convolutional Neural Networks
 
The basics and design of lua table
The basics and design of lua tableThe basics and design of lua table
The basics and design of lua table
 
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
Automatic Tagging using Deep Convolutional Neural Networks - ISMIR 2016
 
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
Teresa Larsen, Founder & Director, ScientificLiteracy.org at MLconf ATL 2016
 
Deep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - OverviewDeep Convolutional Neural Networks - Overview
Deep Convolutional Neural Networks - Overview
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
Recurrent Neural Networks I (D2L2 Deep Learning for Speech and Language UPC 2...
 
Image Object Detection Pipeline
Image Object Detection PipelineImage Object Detection Pipeline
Image Object Detection Pipeline
 
101: Convolutional Neural Networks
101: Convolutional Neural Networks 101: Convolutional Neural Networks
101: Convolutional Neural Networks
 
Applied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural NetworksApplied Deep Learning 11/03 Convolutional Neural Networks
Applied Deep Learning 11/03 Convolutional Neural Networks
 
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...Deep Learning for Computer Vision: A comparision between Convolutional Neural...
Deep Learning for Computer Vision: A comparision between Convolutional Neural...
 

Ähnlich wie Jeff Johnson, Research Engineer, Facebook at MLconf NYC

Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wallugur candan
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningRenaldas Zioma
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
 
Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT Ceph Community
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a FoeHaim Yadid
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learningAmer Ather
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Red Hat Developers
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Slide_N
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiectureHaris456
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Sparksamthemonad
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFShapeBlue
 

Ähnlich wie Jeff Johnson, Research Engineer, Facebook at MLconf NYC (20)

Gpu and The Brick Wall
Gpu and The Brick WallGpu and The Brick Wall
Gpu and The Brick Wall
 
Trip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine LearningTrip down the GPU lane with Machine Learning
Trip down the GPU lane with Machine Learning
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT Ceph Day Netherlands - Ceph @ BIT
Ceph Day Netherlands - Ceph @ BIT
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Ndp Slides
Ndp SlidesNdp Slides
Ndp Slides
 
Inferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on SparkInferno Scalable Deep Learning on Spark
Inferno Scalable Deep Learning on Spark
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
Shenandoah GC: Java Without The Garbage Collection Hiccups (Christine Flood)
 
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
Multiple Cores, Multiple Pipes, Multiple Threads – Do we have more Parallelis...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Graphics processing uni computer archiecture
Graphics processing uni computer archiectureGraphics processing uni computer archiecture
Graphics processing uni computer archiecture
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Disaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoFDisaggregating Ceph using NVMeoF
Disaggregating Ceph using NVMeoF
 
Xen in Linux (aka PVOPS update)
Xen in Linux (aka PVOPS update)Xen in Linux (aka PVOPS update)
Xen in Linux (aka PVOPS update)
 

Mehr von MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Mehr von MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Kürzlich hochgeladen

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Kürzlich hochgeladen (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Jeff Johnson, Research Engineer, Facebook at MLconf NYC

  • 1. Hacking GPUs for Deep Learning MLConf New York Jeff Johnson Facebook AI Research jhj@fb.com
  • 2. Deep (convolutional) Neural Networks Revolution in machine learning Convolution: since 1980s. Deep: flops since 2000s Avoid feature engineering ▪ With enough data, let network discover feature representations ▪ Can work even for NLP. No word segmentation, use raw character data.
  • 3. 2D Convolutional Nets (images) LeCun, Bottou, Bengio and Haffner, 1998 Krizhevsky, Sutskever and Hinton, 2012
  • 4. 2D Convolutional Nets Progress towards smaller kernels and deeper nets Network architecture ImageNet 1000 class top-5 error AlexNet ~15% OverFeat ~13% ZeilerNet ~11% Oxford-VGG ~7% GoogLeNet ~6%, ~4.5% PReLU (MSR) ~4.9% Human performance 3-5%
  • 5. 3D Convolutional Nets (videos) C3D (Tran et al., 2014) DeepVideo (Karpathy et al., 2014)
  • 6. 1D Convolutional Nets (text, sequences) Collobert et al., 2011 Zhang and LeCun, 2015
  • 7. RNNs and LSTMs (text, sequences) Graves, Mohamed and Hinton, 2013 Mikolov, 2014
  • 8. Deep Neural Networks Supervised learning. Unsupervised ??? Train with back-propagation/SGD variants Strong scaling is unsolved ▪ Distributed parameter space exploration (e.g., Hogwild!; Niu et al. 2011) ▪ Distributed hyperparameter space exploration (e.g., Bayesian optimization; Snoek et al. 2012)
  • 10. Deep nets are flop eaters Convolutions are expensive Pointwise calcuations (log/exp, ReLU, */+, ...) Neighborhood reductions (pooling, convolution) Scaling network parameters  increased learning capacity; overfitting  more training data (real or synthetic), regularization required
  • 11. Deep nets are bandwidth eaters More parameters = more memory, data to exchange Barrier to cross-machine parallelism ▪ periodic exchanges, compression, quantization Increase reuse of memory while local? ▪ interspersed reductions are resistant to fusion of computations ▪ generalized programming language problem
  • 12. Deep nets are latency sensitive Serial dependency of training fprop => bprop => fprop => ... Serial dependency of multi-layer networks layer 1 => layer 2 => layer 3 => ... Multiple path dependent networks (RNNs, multi-layer LSTMs)
  • 13. Deep nets are also small? Deeper = smaller feature planes, more of them input Rm => expand to Rn => non-lin => reduce to Rk Problems are tiny in HPC terms 4096×4096 FFT, FE/PDE on massive grids, ... NLP tasks can be sparse Setup/kernel launch latency on GPU can dominate compute
  • 15. Vector processors SIMD: Single Instruction, Multiple Data Serial processor with ability to operate on more than one piece of data concurrently Cray-1 (1976)
  • 16. Vector processors Hard to use: instructions only operate on 4, 8, 16, ... pieces of data at a time. Boundary/alignment effects. Great if your vectors are large, but... float* a = ...; // is this aligned (a % 16 == 0)? float* b = ...; // is this aligned (b % 16 == 0)? for (i = 0; i < 18; ++i) { // how to handle [16, 17]? b[i] += a[i]; // SIMD this?!? masking/loop epilogue }
  • 17. “Vector cores”? SIMD variant: NVIDIA calls “SIMT” Lots of simple cores (CM) Hide latency through many threads + switching (Tera) “Pixel/vertex shaders” in 2000s GPUs => GPGPU CM-1 (1983) Tera MTA (1995)
  • 18. GPU versus CPU GPUs represent a different form of vector programming (“vector cores”) ▪ 32-wide vector of threads (“warp”) Sufficiently optimized CPU code can be on par with GPU perf (Tflop range with AVX2/512, exploit multi- level caches, deep pipelines, prefetch, ...) Vector programming: easier with GPUs than CPUs Sweetspot is different from GPU codes
  • 19. Parallelization + vectorization Serial nature of commonly used CPU programming languages sometimes hides opportunities Auto-vectorizing/parallelizing compilers + DSLs can’t yet compete with expert hand-rolled ▪ DSLs like Halide (Ragan-Kelley et al. 2013) show promise but need a few more generations Sprinkle in (OpenMP) doesn’t cut it
  • 20. Who wins CPU GPU flops ✔ (vectorize: AVX2/512 gives Tflop range) ✔ Tesla K40: 2880 fp32 ALU pipelines main memory b/w ✖ (Xeon Phi improves) ✔ latency ✔ (high clock, reordering; caches are large and work if you obey them) ✖ (threads slow, non-smem caches irrelevant, CPU -> GPU control overhead) boundary effects, small/irregular sizes ✔✖ (branches easy, vectorization hard) ✖ (warp divergence, load imbalance) parallel programming model ✖ (vectorization hard, perf black box) ✔✖ (CUDA is very different, domain knowledge)
  • 21. Tool + problem = solution?
  • 22. Dive into 2D Convolutional Nets Somewhat computationally expensive O(b × f × f’ × n2 × k2) 1st layer AlexNet: ▪ 13.493 Gflop (1 flop here = fp32 multiply-add) ▪ 77.2 Mbyte in, 63.7 Mbyte out (fp32) ▪ Perfect caching + reuse, 175 flop/byte in ▪ No caching + reuse, 0.125 flop/byte in
  • 23. The problem Programmable caches (shared memory, registers, ...) not large enough for perfect reuse Space of all possible square 2D convolution problems is 5/6-dimensional Parameter Size minibatch size (b) 128 input feature maps (f) 3 output feature maps (f’) 96 input feature size (n x n) 224 convolution kernel size (k x k) 11 convolution kernel stride (SxS) (optional) 4
  • 24. Converting Space of all possible matrix multiplications = 3 dimensional (ANxMBMxP = CNxP) NVIDIA, Intel, others have put lots of effort into optimizing many parts of this space ▪ Rephrase convolution as a matrix multiplication! ▪ NVIDIA’s cuDNN
  • 25. But: Sgemm originally optimized for large problems 13x13 * 3x3 is a small convolution. Unrolling it 192 times it might be enough to feed GPU Large convolutions are intractable? Small feature maps/ convolutions = boundary effects bad for GPUs
  • 26. Facebook AI Research work 2D convolution via FFT Fast convolutional nets with fbfft: A GPU Performance Evaluation (Vasilache, Johnson et al., 2015 ICLR conference track oral) Convolution => pointwise × in Fourier basis Choice of basis is wide open! 2i is great perf O(b f f’ n2 k2) => O(b f f’ n2 + (b f + f f’ + bf’) n2 log n) ▪ >= 5x5 kernels, faster than cuDNN
  • 27. fbfft cuFFT optimized for large FFT sizes fbfft: smaller data, fit in registers, focus on warp
  • 28. Data layout Different problem sizes => different data layout ▪ cudaconv: DHWB (optimal for large b) ▪ deeper layers: HWBD/BHWD (many feature maps) ▪ b=1 faster convergence? ▪ b=128 better compute utilization Smaller problems, exploit different layout/batching ▪ fbcunn 1D convolution
  • 29. Latency hiding: what holds you back? ▪ Compute bound? (math) ▪ Memory b/w bound? (streaming) ▪ Memory latency bound? (sparse) Almost all “deep learning” algorithms are b/w bound on GPU. Low math intensity! cuBLAS: Sgemm b/w bound. Dgemm compute bound
  • 30. Kernel fusion: CPU vs GPU Reduces memory b/w pressure Exploits cache locality and register reuse CPU: fusion not necessary Kernel tiling + interleaving works due to caches GPU: fusion necessary Tiling + interleaving doesn’t work: smem not persistent, caches too small/irrelevant
  • 31. Kernel fusion CUDA kernel = hard optimization boundary on GPU Loop interchange, lifting, better fusion on CPU CUDA: parallelization layer not visible to optimizer. Auto-tuning desired. HW specific non-linear tradeoffs Scripting languages are further barrier to fusion on both CPU and GPU (Torch)
  • 32. Kernel fusion Torch: transposition is common operation ▪ size (80, 40) stride (40, 1) => size (40, 80) stride (1, 40) ▪ Old approach: transpose in memory, perform work, copy back ▪ New approach: rewrite kernel to handle transpositions. Optimize if non-transposed Runtime fusion (CUDA 7.0, Theano)
  • 34. end

Hinweis der Redaktion

  1. Let’s go over the components of a shard.
  2. Let’s go over the components of a shard.
  3. GPUs excel at stream processing; CPUs excel at sparsity, branching, pointer chasing (strong word)
  4. Let’s go over the components of a shard.
  5. Let’s go over the components of a shard.