Industrial Technology Research Institute (ITRI) confidential document. Do not copy, reproduce, or distribute.
3. Blob
• Blob already implements several helper functions, such as:
▪ asum_diff(), sumsq_data(), Update(), asum_data()…
• Math functions have both CPU and GPU implementations
▪ CPU: for example, caffe_axpy
a. Uses the CBLAS library
▪ GPU: for example, caffe_gpu_axpy
a. Uses the cuBLAS library
• The SyncedMemory class synchronizes data between the CPU and the GPU
▪ Always use {cpu,gpu}_data() or mutable_{cpu,gpu}_data() to get a data pointer
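The accessor rule above can be illustrated with a toy model. This is a minimal sketch, not Caffe's code: `ToySyncedBuffer`, `toy_axpy`, `toy_asum`, and `toy_update` are hypothetical names, and two host vectors stand in for the CPU and GPU copies. It assumes the const accessors copy lazily when the other side holds the newer data, and that the mutable accessors mark their side as the newest. The update step is written as an axpy with alpha = -1, which is how a "data minus diff" update is commonly expressed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference axpy: y = alpha * x + y, the operation that CBLAS and cuBLAS
// axpy routines compute. Assumes x and y have the same length.
void toy_axpy(float alpha, const std::vector<float>& x, std::vector<float>& y) {
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}

// Sum of absolute values, the quantity an asum-style helper reports.
float toy_asum(const std::vector<float>& x) {
  float total = 0.f;
  for (float v : x) total += std::fabs(v);
  return total;
}

// Hypothetical stand-in for a synced buffer: two host vectors play the
// roles of the CPU and GPU copies, and head_ records which copy is newest.
class ToySyncedBuffer {
 public:
  enum Head { AT_CPU, AT_GPU, SYNCED };

  explicit ToySyncedBuffer(std::size_t n)
      : cpu_(n, 0.f), gpu_(n, 0.f), head_(SYNCED), copies_(0) {}

  // Read-only access: copy only if the other side is newer.
  const std::vector<float>& cpu_data() { to_cpu(); return cpu_; }
  const std::vector<float>& gpu_data() { to_gpu(); return gpu_; }

  // Writable access: sync first, then mark this side as the newest.
  std::vector<float>& mutable_cpu_data() { to_cpu(); head_ = AT_CPU; return cpu_; }
  std::vector<float>& mutable_gpu_data() { to_gpu(); head_ = AT_GPU; return gpu_; }

  int copies() const { return copies_; }  // number of simulated transfers

 private:
  void to_cpu() {
    if (head_ == AT_GPU) { cpu_ = gpu_; ++copies_; head_ = SYNCED; }
  }
  void to_gpu() {
    if (head_ == AT_CPU) { gpu_ = cpu_; ++copies_; head_ = SYNCED; }
  }

  std::vector<float> cpu_, gpu_;
  Head head_;
  int copies_;
};

// data = data - diff, expressed as axpy with alpha = -1 on the CPU side.
void toy_update(ToySyncedBuffer& data, ToySyncedBuffer& diff) {
  toy_axpy(-1.f, diff.cpu_data(), data.mutable_cpu_data());
}
```

Going through the accessors is what lets the buffer know when a transfer is needed. Code that kept a raw pointer across a write on the other device would bypass this bookkeeping and could read stale data.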
4. Layer
• caffe::Layer is the base class
• All of the following layer types inherit from caffe::Layer
▪ Data, Vision, Recurrent, Common, Normalization, Activation, Loss layers, and so on.
▪ http://caffe.berkeleyvision.org/tutorial/layers.html
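The base-class relationship can be sketched as follows. This is an illustrative sketch only: `ToyLayer`, `ToyReLULayer`, `ToyScaleLayer`, and `toy_run` are hypothetical names, and the interface is far smaller than that of caffe::Layer. It shows why a common base class is useful: a network can hold heterogeneous layers and call them through one interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, much-reduced base class: one virtual forward pass.
class ToyLayer {
 public:
  virtual ~ToyLayer() {}
  virtual const char* type() const = 0;
  virtual std::vector<float> Forward(const std::vector<float>& bottom) const = 0;
};

// Activation-style layer: clamps negative inputs to zero.
class ToyReLULayer : public ToyLayer {
 public:
  const char* type() const override { return "ReLU"; }
  std::vector<float> Forward(const std::vector<float>& bottom) const override {
    std::vector<float> top(bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i)
      top[i] = bottom[i] > 0.f ? bottom[i] : 0.f;
    return top;
  }
};

// Common-style layer: multiplies every input by a fixed factor.
class ToyScaleLayer : public ToyLayer {
 public:
  explicit ToyScaleLayer(float factor) : factor_(factor) {}
  const char* type() const override { return "Scale"; }
  std::vector<float> Forward(const std::vector<float>& bottom) const override {
    std::vector<float> top(bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i) top[i] = factor_ * bottom[i];
    return top;
  }
 private:
  float factor_;
};

// Runs the layers in order through the shared base-class interface.
std::vector<float> toy_run(const std::vector<std::unique_ptr<ToyLayer>>& layers,
                           std::vector<float> blob) {
  for (const auto& layer : layers) blob = layer->Forward(blob);
  return blob;
}
```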
5. Layers with GPU implementations
• src/caffe/layers/
▪ *_layer.cu
▪ cudnn_*_layer.cu
• src/caffe/util/
▪ math_functions.cu
▪ im2col.cu
• include/caffe/util/
▪ device_alternate.hpp: CUDA macro definitions
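CUDA helper macros of this kind typically wrap two patterns: a ceiling division that picks the number of blocks, and a grid-stride loop that lets a fixed grid cover any element count. The sketch below simulates both on the CPU in plain C++. It is an assumption-based illustration, not the contents of device_alternate.hpp; `kToyThreadsPerBlock`, `toy_get_blocks`, and `toy_visit_counts` are hypothetical names, and the thread count is an arbitrary small value chosen for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed value for illustration; real code uses a much larger block size.
const int kToyThreadsPerBlock = 4;

// Ceiling division: smallest block count with blocks * threads >= n.
inline int toy_get_blocks(int n) {
  return (n + kToyThreadsPerBlock - 1) / kToyThreadsPerBlock;
}

// CPU simulation of a grid-stride loop. Each simulated (block, thread) pair
// starts at its global index and advances by the total number of threads,
// which is the pattern a CUDA kernel-loop macro expands to.
std::vector<int> toy_visit_counts(int n, int grid_dim, int block_dim) {
  std::vector<int> visits(n, 0);
  for (int block = 0; block < grid_dim; ++block) {
    for (int thread = 0; thread < block_dim; ++thread) {
      for (int i = block * block_dim + thread; i < n; i += block_dim * grid_dim) {
        ++visits[i];  // in a real kernel this would be the per-element work
      }
    }
  }
  return visits;
}
```

The stride makes the loop correct even when the grid has fewer threads than elements: each index is still handled exactly once.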
6. Layer
• SetUp()
▪ Initializes the layer
• Forward()
▪ Takes the bottom blobs' data as input and writes the computed output/loss to the top blobs' data.
• Backward()
▪ Takes the top blobs' diff as input and writes the computed diff/gradient to the bottom blobs' diff.
▪ The diff/gradient follows the chain rule:
bottom_diff = top_diff · ∂top_data/∂bottom_data
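The Forward/Backward contract can be made concrete with a sigmoid, chosen here only because its derivative can be written in terms of the output. This is a minimal sketch with hypothetical names (`ToyBlob`, `toy_sigmoid_forward`, `toy_sigmoid_backward`), not Caffe's sigmoid layer. Forward reads bottom data and writes top data; Backward reads top diff and writes bottom diff by multiplying with the local derivative top_data · (1 − top_data).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical blob: values in data, gradients in diff.
struct ToyBlob {
  std::vector<float> data;
  std::vector<float> diff;
  explicit ToyBlob(std::size_t n) : data(n, 0.f), diff(n, 0.f) {}
};

// Forward: top.data = sigmoid(bottom.data). Assumes equal sizes.
void toy_sigmoid_forward(const ToyBlob& bottom, ToyBlob& top) {
  for (std::size_t i = 0; i < bottom.data.size(); ++i)
    top.data[i] = 1.f / (1.f + std::exp(-bottom.data[i]));
}

// Backward: chain rule, bottom.diff = top.diff * d(top.data)/d(bottom.data),
// where the sigmoid derivative is top.data * (1 - top.data).
void toy_sigmoid_backward(const ToyBlob& top, ToyBlob& bottom) {
  for (std::size_t i = 0; i < top.data.size(); ++i)
    bottom.diff[i] = top.diff[i] * top.data[i] * (1.f - top.data[i]);
}
```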
8. NCCL::Run()
• boost::barrier
▪ A synchronization point between multiple threads.
• Worker
▪ class Worker : public InternalThread
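The role of a barrier among worker threads can be sketched with the C++ standard library instead of Boost. This is an illustration under stated assumptions: `ToyBarrier` and `toy_run_workers` are hypothetical names, the barrier is a hand-rolled mutex and condition-variable version, and the worker body is invented for the example. Each worker publishes a value, waits at the barrier, and only then reads the values of all the others.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical reusable barrier: wait() blocks until `count` threads arrive.
class ToyBarrier {
 public:
  explicit ToyBarrier(int count) : threshold_(count), waiting_(0), generation_(0) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mu_);
    const int gen = generation_;
    if (++waiting_ == threshold_) {
      // Last thread to arrive: open the barrier for this generation.
      ++generation_;
      waiting_ = 0;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  const int threshold_;
  int waiting_;
  int generation_;
};

// Starts n workers. Worker id publishes id + 1, waits at the barrier, then
// sums everything that was published. Returns the sum seen by each worker.
std::vector<int> toy_run_workers(int n) {
  ToyBarrier barrier(n);
  std::vector<int> published(n, 0), seen(n, 0);
  std::vector<std::thread> threads;
  for (int id = 0; id < n; ++id) {
    threads.emplace_back([&, id] {
      published[id] = id + 1;  // phase 1: write own slot
      barrier.wait();          // nobody proceeds until all slots are written
      seen[id] = std::accumulate(published.begin(), published.end(), 0);
    });
  }
  for (auto& t : threads) t.join();
  return seen;
}
```

Without the barrier, a fast worker could sum the slots before a slow worker has written its own, so the results would depend on timing.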
10. Broadcast and All-Reduce in Caffe
• Each Worker is an internal thread that serves a GPU
• Figure: the broadcast and all-reduce operations
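The semantics of the two collectives can be sketched without any GPU library. The code below is an illustration only: it makes no NCCL calls, `toy_broadcast` and `toy_all_reduce_sum` are hypothetical names, and each inner vector stands for the buffer held by one worker. Broadcast copies the root's buffer to every worker; all-reduce combines all buffers element-wise (a sum here) and leaves the same result on every worker. In data-parallel training these are typically applied to weights and gradients respectively, which is an assumption about what the figure depicts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<float> Buffer;

// Broadcast: every rank ends up with a copy of the root's buffer.
void toy_broadcast(std::vector<Buffer>& ranks, std::size_t root) {
  for (std::size_t r = 0; r < ranks.size(); ++r)
    if (r != root) ranks[r] = ranks[root];
}

// All-reduce (sum): every rank ends up with the element-wise sum over all
// ranks. Assumes at least one rank and equal buffer lengths.
void toy_all_reduce_sum(std::vector<Buffer>& ranks) {
  Buffer total(ranks[0].size(), 0.f);
  for (std::size_t r = 0; r < ranks.size(); ++r)
    for (std::size_t i = 0; i < total.size(); ++i) total[i] += ranks[r][i];
  for (std::size_t r = 0; r < ranks.size(); ++r) ranks[r] = total;
}
```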
14. LeNet
LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [LeNet]
• LeNet-5
▪ https://world4jason.gitbooks.io/research-log/content/deepLearning/CNN/Model%20&%20ImgNet/lenet.html
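The alternation of convolution and subsampling can be followed through the feature-map sizes. The sketch below is an assumption-labeled illustration: the slide gives no layer parameters, so the 5x5 unpadded stride-1 convolutions and 2x2 stride-2 subsampling are taken from the classic LeNet-5 description with a 32x32 input, and `toy_conv_out`, `toy_pool_out`, and `toy_lenet5_sizes` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Output width of a convolution: (in + 2 * pad - kernel) / stride + 1.
inline int toy_conv_out(int in, int kernel, int stride, int pad) {
  return (in + 2 * pad - kernel) / stride + 1;
}

// Output width of unpadded subsampling (floor division).
inline int toy_pool_out(int in, int kernel, int stride) {
  return (in - kernel) / stride + 1;
}

// Spatial width after each stage of the convolutional part, assuming the
// classic LeNet-5 parameters described in the lead-in.
std::vector<int> toy_lenet5_sizes(int input) {
  std::vector<int> sizes;
  sizes.push_back(input);
  sizes.push_back(toy_conv_out(sizes.back(), 5, 1, 0));  // convolution
  sizes.push_back(toy_pool_out(sizes.back(), 2, 2));     // subsampling
  sizes.push_back(toy_conv_out(sizes.back(), 5, 1, 0));  // convolution
  sizes.push_back(toy_pool_out(sizes.back(), 2, 2));     // subsampling
  return sizes;
}
```

The remaining stages (the holistic representation and the classifier) are fully connected and operate on the flattened result of the last subsampling stage.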