
Caffe Study 2017

This slide deck is based on Caffe and introduces its implementation in detail. It covers Blob, Net, Layer, data flow, and NCCL.


  1. Caffe Study. 資訊與通訊研究所 (Information and Communications Research Laboratories, ITRI), 劉得彥 Danny Liu
  2. Concept of view in Blob
  3. Blob
     • Implements utility functions such as asum_data(), asum_diff(), sumsq_data(), Update(), ...
     • Math functions have both CPU and GPU implementations:
       ▪ CPU: for example caffe_axpy(), using the CBLAS library
       ▪ GPU: for example caffe_gpu_axpy(), using the cuBLAS library
     • Uses the SyncedMemory class to synchronize data between CPU and GPU
       ▪ Always use {cpu,gpu}_data() or mutable_{cpu,gpu}_data() to get the data pointer (see the sketch after this item)
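      As a concrete illustration of the data-access rule above, here is a minimal, hedged C++ sketch (shape and values are arbitrary, not from the slides) that fills a Blob through mutable_cpu_data() and applies caffe_axpy(), letting SyncedMemory track which copy is current:

        #include <vector>
        #include "caffe/blob.hpp"
        #include "caffe/util/math_functions.hpp"

        int main() {
          // A 4-D blob with shape (num=1, channels=3, height=4, width=5).
          caffe::Blob<float> blob(1, 3, 4, 5);

          // mutable_cpu_data() marks the CPU copy as the up-to-date one;
          // SyncedMemory copies it to the GPU lazily when gpu_data() is called.
          float* data = blob.mutable_cpu_data();
          for (int i = 0; i < blob.count(); ++i) {
            data[i] = 1.0f;
          }

          // caffe_axpy computes Y = alpha * X + Y on the CPU via CBLAS;
          // here the blob's diff buffer is used as Y.
          caffe::caffe_axpy<float>(blob.count(), 0.5f, blob.cpu_data(),
                                   blob.mutable_cpu_diff());

          // Read-only access through cpu_data()/asum_data() never triggers
          // an unnecessary device-to-host copy.
          float sum = blob.asum_data();
          return sum > 0 ? 0 : 1;
        }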
  4. Layer
     • caffe::Layer is a base class
     • All layer types inherit from caffe::Layer:
       ▪ Data, Vision, Recurrent, Common, Normalization, Activation, and Loss layers, and so on.
       ▪ http://caffe.berkeleyvision.org/tutorial/layers.html
  5. Layers with GPU-implemented code
     • src/caffe/layers/
       ▪ *_layer.cu
       ▪ cudnn_*_layer.cu
     • src/caffe/util/
       ▪ math_functions.cu
       ▪ im2col.cu
     • include/caffe/util/
       ▪ device_alternate.hpp: CUDA macro definitions
  6. Layer
     • Setup(): initializes the layer
     • Forward(): uses the bottom blob's data as the layer's input and computes the output/loss into the top blob's data
     • Backward(): uses the top blob's diff as the layer's input and computes the diff/gradient into the bottom blob's diff
       ▪ For the diff/gradient calculation, bottom_diff is derived from top_diff · top_data (the chain rule applied through the layer); a skeleton layer is sketched below
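      To make the Setup()/Forward()/Backward() contract concrete, here is a hedged skeleton of a hypothetical element-wise layer (TimesTwoLayer is invented for illustration and is not part of Caffe). It shows how a subclass of caffe::Layer reads bottom data / top diff and writes top data / bottom diff:

        #include <vector>
        #include "caffe/layer.hpp"

        namespace caffe {

        // Hypothetical element-wise layer: top = 2 * bottom.
        template <typename Dtype>
        class TimesTwoLayer : public Layer<Dtype> {
         public:
          explicit TimesTwoLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
          virtual inline const char* type() const { return "TimesTwo"; }

          virtual void Reshape(const std::vector<Blob<Dtype>*>& bottom,
                               const std::vector<Blob<Dtype>*>& top) {
            top[0]->ReshapeLike(*bottom[0]);  // output has the same shape as the input
          }

         protected:
          // Forward: read bottom data, write top data.
          virtual void Forward_cpu(const std::vector<Blob<Dtype>*>& bottom,
                                   const std::vector<Blob<Dtype>*>& top) {
            const Dtype* in = bottom[0]->cpu_data();
            Dtype* out = top[0]->mutable_cpu_data();
            for (int i = 0; i < bottom[0]->count(); ++i) out[i] = Dtype(2) * in[i];
          }

          // Backward: read top diff, write bottom diff (chain rule: d(2x)/dx = 2).
          virtual void Backward_cpu(const std::vector<Blob<Dtype>*>& top,
                                    const std::vector<bool>& propagate_down,
                                    const std::vector<Blob<Dtype>*>& bottom) {
            if (!propagate_down[0]) return;
            const Dtype* top_diff = top[0]->cpu_diff();
            Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
            for (int i = 0; i < bottom[0]->count(); ++i)
              bottom_diff[i] = Dtype(2) * top_diff[i];
          }
        };

        }  // namespace caffe

      Registering the class with REGISTER_LAYER_CLASS(TimesTwo) would additionally be needed before the layer could be named in a prototxt.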
  7. Solver using NCCL
  8. NCCL::Run()
     • boost::barrier
       ▪ A synchronization point between multiple threads (a barrier sketch follows this item)
     • Worker
       ▪ class Worker : public InternalThread
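      The following is a minimal, self-contained sketch of the boost::barrier pattern mentioned above; the worker count and messages are arbitrary, and this is not Caffe's actual Worker class:

        #include <boost/thread/barrier.hpp>
        #include <cstdio>
        #include <thread>
        #include <vector>

        // Every worker blocks in wait() until all of them have arrived,
        // which keeps per-GPU worker threads in lock-step.
        int main() {
          const int kNumWorkers = 4;        // e.g. one worker thread per GPU
          boost::barrier bar(kNumWorkers);

          std::vector<std::thread> workers;
          for (int id = 0; id < kNumWorkers; ++id) {
            workers.emplace_back([&bar, id]() {
              std::printf("worker %d: local step done\n", id);
              bar.wait();                   // synchronization point between threads
              std::printf("worker %d: all workers reached the barrier\n", id);
            });
          }
          for (auto& t : workers) t.join();
          return 0;
        }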
  9. Caffe training with NCCL
  10. Broadcast and All-Reduce in Caffe
      • Each Worker is an internal thread serving one GPU
      • The picture introduces the broadcast and all-reduce operations (a hedged NCCL sketch follows this item)
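      Below is a hedged sketch of the gradient all-reduce itself, assuming NCCL 2.x in a single process with one communicator per GPU; the buffer size is arbitrary and error checking is omitted. The slides describe Caffe driving the same collective on the diff/gradient buffers from its Worker threads:

        #include <cuda_runtime.h>
        #include <nccl.h>
        #include <vector>

        int main() {
          int ndev = 0;
          cudaGetDeviceCount(&ndev);
          const size_t count = 1 << 20;      // floats per "gradient" buffer (arbitrary)

          // One NCCL communicator per GPU in this process.
          std::vector<int> devs(ndev);
          for (int i = 0; i < ndev; ++i) devs[i] = i;
          std::vector<ncclComm_t> comms(ndev);
          ncclCommInitAll(comms.data(), ndev, devs.data());

          std::vector<float*> grads(ndev);
          std::vector<cudaStream_t> streams(ndev);
          for (int i = 0; i < ndev; ++i) {
            cudaSetDevice(i);
            cudaMalloc(&grads[i], count * sizeof(float));
            cudaStreamCreate(&streams[i]);
          }

          // In-place all-reduce: afterwards every GPU holds the summed gradients.
          ncclGroupStart();
          for (int i = 0; i < ndev; ++i) {
            ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                          comms[i], streams[i]);
          }
          ncclGroupEnd();

          for (int i = 0; i < ndev; ++i) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            cudaFree(grads[i]);
            ncclCommDestroy(comms[i]);
          }
          return 0;
        }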
  11. The Data Layer
  12. Prototxt: Define Net
  13. Prototxt: Define Net
  14. LeNet
      • LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [LeNet]
      • LeNet-5
        ▪ https://world4jason.gitbooks.io/research-log/content/deepLearning/CNN/Model%20&%20ImgNet/lenet.html
  15. lenet.prototxt
      name: "LeNet"
      layer { name: "data" type: "Input" top: "data" input_param { shape: { dim: 64 dim: 1 dim: 28 dim: 28 } } }
      layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 20 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
      layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
      layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
      layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } }
      layer { name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
      layer { name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" }
      layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } }
      layer { name: "prob" type: "Softmax" bottom: "ip2" top: "prob" }
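      As a usage note, a net like the one above can be loaded and run from C++ roughly as follows. This is a hedged sketch: "lenet_iter_10000.caffemodel" is a hypothetical snapshot name, the input is a dummy all-zero batch, and error handling is omitted.

        #include "caffe/caffe.hpp"

        int main() {
          caffe::Caffe::set_mode(caffe::Caffe::CPU);

          caffe::Net<float> net("lenet.prototxt", caffe::TEST);
          net.CopyTrainedLayersFrom("lenet_iter_10000.caffemodel");

          // The "data" input blob has shape 64x1x28x28, as declared in input_param.
          caffe::Blob<float>* input = net.input_blobs()[0];
          float* in = input->mutable_cpu_data();
          for (int i = 0; i < input->count(); ++i) in[i] = 0.0f;  // dummy batch

          net.Forward();  // runs every layer of the net in order

          // "prob" holds the Softmax output: 64 rows of 10 class probabilities.
          const float* probs = net.blob_by_name("prob")->cpu_data();
          return probs[0] >= 0.0f ? 0 : 1;
        }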
  16. Caffe data flow (都在GPU上做: all of this runs on the GPU)
      name: "LogReg"
      layer { name: "mnist" type: "Data" top: "data" top: "label" data_param { source: "input_leveldb" batch_size: 64 } }
      layer { name: "ip" type: "InnerProduct" bottom: "data" top: "ip" inner_product_param { num_output: 2 } }
      layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip" bottom: "label" top: "loss" }
      • Diagram annotations for one training step (a caller-side sketch follows this item):
        ▪ Data layer: Load_batch() prefetches data and label, then copies the data to the GPU
        ▪ Forward() per layer: compute output, compute output, compute loss
        ▪ Backward() per layer: compute gradient, compute gradient
        ▪ Step() calls ForwardBackward(); on_gradients_ready() triggers the NCCL all-reduce for diff/gradient; ApplyUpdate() then runs Normalize(param_id), Regularize(param_id), and this->net_->Update(), which applies cblas_saxpy over the (data, diff) of the learnable params
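      From the caller's side, the whole loop above is driven through the Solver. A hedged sketch (the solver prototxt name is hypothetical; the internal order of forward, backward, gradient all-reduce, and update happens inside Step() as listed in the annotations):

        #include "caffe/caffe.hpp"
        #include "caffe/util/upgrade_proto.hpp"

        int main() {
          caffe::SolverParameter solver_param;
          caffe::ReadSolverParamsFromTextFileOrDie("lenet_solver.prototxt", &solver_param);

          boost::shared_ptr<caffe::Solver<float> > solver(
              caffe::SolverRegistry<float>::CreateSolver(solver_param));

          solver->Step(100);   // 100 iterations: each does ForwardBackward() + ApplyUpdate()
          solver->Snapshot();  // write out the learned parameters
          return 0;
        }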
  17. Waiting for data: blocking_queue.cpp:49
      43 template<typename T>
      44 T BlockingQueue<T>::pop(const string& log_on_wait) {
      45   boost::mutex::scoped_lock lock(sync_->mutex_);
      46
      47   while (queue_.empty()) {
      48     if (!log_on_wait.empty()) {
      49       LOG_EVERY_N(INFO, 1000) << log_on_wait;
      50     }
      51     sync_->condition_.wait(lock);
      52   }
      53
      54   T t = queue_.front();
      55   queue_.pop();
      56   return t;
      57 }
  18. Batch Data Prefetching
      • An InternalThread first loads the images into CPU memory; cudaMemcpyAsync then copies the batch to GPU memory (a hedged sketch follows)
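      Here is a minimal, hedged sketch of that two-stage prefetch pattern using the CUDA runtime directly: a loader thread fills a pinned host buffer, then cudaMemcpyAsync copies it to the GPU on a dedicated stream so the copy can overlap with compute. Buffer sizes are arbitrary and error checking is omitted; in Caffe itself the pattern is wrapped in an InternalThread plus a blocking queue, as in the previous slide.

        #include <cuda_runtime.h>
        #include <cstring>
        #include <thread>

        int main() {
          const size_t batch_bytes = 64 * 1 * 28 * 28 * sizeof(float);  // one LeNet-sized batch

          float* host_batch = nullptr;
          cudaMallocHost(&host_batch, batch_bytes);   // pinned memory: required for true async copies
          float* gpu_batch = nullptr;
          cudaMalloc(&gpu_batch, batch_bytes);

          cudaStream_t copy_stream;
          cudaStreamCreate(&copy_stream);

          // Stage 1: the "InternalThread" fills the CPU buffer (here: dummy zero images).
          std::thread loader([&]() { std::memset(host_batch, 0, batch_bytes); });
          loader.join();

          // Stage 2: asynchronous host-to-device copy of the prefetched batch.
          cudaMemcpyAsync(gpu_batch, host_batch, batch_bytes,
                          cudaMemcpyHostToDevice, copy_stream);
          cudaStreamSynchronize(copy_stream);         // wait before the batch is consumed

          cudaStreamDestroy(copy_stream);
          cudaFree(gpu_batch);
          cudaFreeHost(host_batch);
          return 0;
        }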
