
TensorFlow Study Part I

This slide deck introduces TensorFlow concepts based on a source-code study: tensors, operations, the computation graph, and execution.

Published in: Software


  1. ITRI CONFIDENTIAL DOCUMENT. DO NOT COPY OR DISTRIBUTE.
     TensorFlow Study (Part I)
     劉得彥 (Danny Liu), Information and Communications Research Laboratories (ICL)
  2. Tensor
     • A tensor is an n-dimensional structure:
       ▪ n=0: a single value
       ▪ n=1: a list of values
       ▪ n=2: a matrix of values
       ▪ n=3: a cube of values
       ▪ …
     • The Tensor class:
       ▪ The data lives in a TensorBuffer as an Eigen::Tensor
       ▪ The shape definition is in TensorShape
       ▪ Reference counting is in RefCounted
     https://www.slideshare.net/EdurekaIN/introduction-to-tensorflow-deep-learning-using-tensorflow-tensorflow-tutorial-edureka
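The rank hierarchy above can be illustrated without TensorFlow at all. The following pure-Python sketch (the helper name `rank` is ours; nested lists stand in for tensor storage) shows how dimensionality grows from a single value to a cube of values:

```python
def rank(t):
    """Return the rank (number of dimensions) of a nested-list 'tensor'."""
    r = 0
    while isinstance(t, list):
        r += 1
        t = t[0]  # descend into the first element of each dimension
    return r

scalar = 3.0                           # n=0: single value
vector = [1.0, 2.0, 3.0]               # n=1: list of values
matrix = [[1.0, 0.0], [0.0, 1.0]]      # n=2: matrix of values
cube   = [[[1.0]], [[2.0]]]            # n=3: cube of values
```

In the real Tensor class, the shape (one size per dimension) is what TensorShape records, while the flat data buffer lives in TensorBuffer.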
  3. Operation
     • A customized sigmoid op example:
       ▪ Input tensor: x = tf.constant([[1.0, 0.0], [0.0, -1.0]])
       ▪ Output tensor: y = cpp_con_sigmoid.cpp_con_sigmoid(x)
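The custom C++ op above computes the standard logistic function elementwise. As a reference for what `cpp_con_sigmoid` should produce for that input, here is a plain-Python version (math-only, no TensorFlow; the helper name `sigmoid` is ours):

```python
import math

def sigmoid(v):
    """Standard logistic function: 1 / (1 + e^-v)."""
    return 1.0 / (1.0 + math.exp(-v))

x = [[1.0, 0.0], [0.0, -1.0]]
y = [[sigmoid(v) for v in row] for row in x]
# sigmoid(0.0) is exactly 0.5; sigmoid(1.0) ≈ 0.731; sigmoid(-1.0) ≈ 0.269
```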
  4. Operation
     • Register the op definition with the kernel builder: class CppConSigmoid
  5. Expressing: Ops
     http://public.kevinrobinsonblog.com/docs/A%20tour%20through%20the%20TensorFlow%20codebase%20-%20v4.pdf
  6. Expressing: Ops
  7. Build a Computation Graph
     • Tensors flow between operations (Tensor → Operator → Computation Graph)
     https://machinelearningblogs.com/2017/09/07/tensorflow-tutorial-part-1-introduction/
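"Tensors flow between operations" can be modeled in a few lines: a graph is a set of named nodes with input edges, and running a fetch node recursively evaluates its inputs first, so each op consumes the outputs of its predecessors. A toy sketch (all names are ours; no TensorFlow involved) computing (a + b) * c:

```python
# Each node maps a name to (op_function, list of input node names).
graph = {
    "a":   (lambda: 2.0, []),
    "b":   (lambda: 3.0, []),
    "c":   (lambda: 4.0, []),
    "add": (lambda x, y: x + y, ["a", "b"]),
    "mul": (lambda x, y: x * y, ["add", "c"]),
}

def run(graph, fetch, cache=None):
    """Evaluate node `fetch`, recursively evaluating its inputs first."""
    if cache is None:
        cache = {}
    if fetch not in cache:
        fn, inputs = graph[fetch]
        cache[fetch] = fn(*(run(graph, i, cache) for i in inputs))
    return cache[fetch]
```

Calling `run(graph, "mul")` evaluates the whole chain; the cache plays the role a session plays in TensorFlow, ensuring each node is computed once per run.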
  8. TensorFlow Framework
     • Python APIs wrap the C++ APIs through SWIG and the C APIs (tensorflow/c/c_api.h)
     • High-level Python layers on top: Layers, Estimator, Keras, Canned Estimator
     • TensorFlow Core libraries (C++): core, runtime, graph, grappler, ops, kernels, …
     • Applications can be written in Python, in C++, or against the C API (C, Golang, …)
     • The C++ application path currently has limited functionality: do inference (ongoing)
  9. DirectSession: Build a Computation Graph
     • The TF Graph is the computation graph in a session
     • Python APIs (e.g. tf.MatMul(a, b)) go through SWIG to the C APIs (e.g. _create_c_op(graph, node_def, inputs, control_inputs)); C++ APIs use MatMul(scope, a, b); both produce a GraphDef
     • GraphDef is protobuf text, used for serialization/deserialization in distributed training
     • Convert a Graph to a GraphDef with Graph::ToGraphDef(); convert a GraphDef back to a Graph (nodes and edges) with ConvertGraphDefToGraph()
  10. Computation Graph in details
     • tf.get_default_graph() can generate the protobuf message of the graph (GraphDef)
     • A distributed session uses this to prune the graph and send it to another device via gRPC
     • Example GraphDef, encoding mse = Mean(Square(Sub(predictions, y))):
       node { name: "mse" op: "Mean" input: "Square" input: "Const" attr { key: "T" value { type: DT_FLOAT } } attr { key: "Tidx" value { type: DT_INT32 } } attr { key: "keep_dims" value { b: false } } }
       node { name: "Const" op: "Const" attr { key: "dtype" value { type: DT_INT32 } } attr { key: "value" value { tensor { dtype: DT_INT32 tensor_shape { dim { size: 2 } } tensor_content: "000000000000001 000000000" } } } }
       node { name: "Square" op: "Square" input: "sub" attr { key: "T" value { type: DT_FLOAT } } }
       node { name: "sub" op: "Sub" input: "predictions" input: "y" attr { key: "T" value { type: DT_FLOAT } } }
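The GraphDef above chains Sub → Square → Mean to compute a mean squared error. A plain-Python reference of that node chain (function and variable names are ours) makes the dataflow concrete:

```python
def mse(predictions, y):
    """Mean squared error, mirroring the Sub -> Square -> Mean node chain."""
    sub = [p - t for p, t in zip(predictions, y)]  # node "sub":    Sub
    square = [d * d for d in sub]                  # node "Square": Square
    return sum(square) / len(square)               # node "mse":    Mean

# mse([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]) == (0 + 1 + 4) / 3
```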
  11. Computation Graph in details
     • We can use the graph editor to manipulate our graph,
       ▪ for instance: swap a targeted output tensor out, and back in to a gradient op.
     • from tensorflow.contrib import graph_editor as ge
       ▪ ge.add_control_inputs()
       ▪ ge.connect()
       ▪ ge.sgv()
       ▪ ge.remap_inputs()
  12. Re-use models
     • Original way: define the model/network in Python → train it in Python → save the model (*.ckpt: model.data, model.index, and model.meta) → load the model/network in C++ for inferencing or transfer learning
     • The saved graph is protobuf: pb (binary) or pb_txt
     • Another way to use the model in C++: run freeze_graph to produce a single protobuf binary (pb) that also contains the weights data
  13. Compile C++ TensorFlow apps/ops
     • An example of using CMake instead of a Bazel BUILD file: it is more convenient, and the resulting binary is much smaller.
  14. Gradient Calculation
     • TensorFlow uses reverse-mode autodiff
       ▪ It computes all the partial derivatives of the outputs with regard to all the inputs in just n_outputs + 1 graph traversals.
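Reverse-mode autodiff can be sketched in a few lines: the forward pass records, for each node, its parents and the local partial derivative toward each parent; then a single backward traversal per output propagates d(output)/d(node) down to every input. A minimal tape-style sketch (class and method names are ours, not TensorFlow's; scalar values only):

```python
class Var:
    """A scalar that records how it was computed, for reverse-mode autodiff."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        """Accumulate d(output)/d(self), then push it toward every ancestor."""
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x1 = Var(3.0)
x2 = Var(4.0)
y = x1 * x2 + x1 * x1   # y = x1*x2 + x1^2 = 21
y.backward()            # one backward traversal fills in all input gradients
# x1.grad == x2 + 2*x1 == 10.0, x2.grad == x1 == 3.0
```

One `backward()` call per output is why all partial derivatives cost n_outputs + 1 traversals (one forward pass plus one backward pass per output).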
  15. Where are the Gradient Ops?
     • Gradient op registrations can live in either Python or C++.
  16. Computation Graph Execution
     • A simple example that illustrates graph construction and execution using the C++ API: https://www.tensorflow.org/api_guides/cc/guide
     • How to see what happened during execution:
       1. export TF_CPP_MIN_VLOG_LEVEL=2, or
       2. import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'
  17. Log information
     • The environment is 1 GPU and 32 CPU cores
       ▪ Decide the session factory type (direct session)
         a. Inter-op parallelism threads: 32
       ▪ Build the executor
         a. Find and add visible GPU devices
         b. Create TF devices mapped to the physical GPU device
         c. Build 4 kinds of streams (StreamGroupFactory):
            » CUDA stream
            » Host_to_Device stream
            » Device_to_Host stream
            » Device_to_Device stream
       ▪ PoolAllocator for the ProcessState CPU allocator
       ▪ BFCAllocator
         a. Create bins …
       ▪ Grappler (computation graph optimization)
         a. Do something …
       ▪ Op kernel
         a. Instantiate the kernel for a node
         b. Process/compute the node
            » Allocate and deallocate tensors with allocators (cpu, or gpu: cuda_host_bfc)
       ▪ PollEvents (singleton)
  18. Log: Op kernel processing
     • 2018-02-23 11:19:04.563051: I tensorflow/core/common_runtime/executor.cc:1561] Process node: 96 step 2 output/output/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc1/fc1/Relu, output/kernel/read) is dead: 0
     • 2018-02-23 11:19:04.563053: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
     • 2018-02-23 11:19:04.563059: I tensorflow/core/platform/default/device_tracer.cc:307] PushAnnotation output/output/MatMul:MatMul
     • 2018-02-23 11:19:04.563065: I tensorflow/core/common_runtime/gpu/gpu_device.cc:445] GpuDevice::Compute output/output/MatMul op MatMul on GPU0 stream[0]
     • 2018-02-23 11:19:04.563090: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorAllocation { step_id: 2 kernel_name: "output/output/MatMul" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 10 } } allocation_description { requested_bytes: 40960 allocated_bytes: 68608 allocator_name: "GPU_0_bfc" allocation_id: 85 has_single_reference: true ptr: 1108332340992 } } }
     • 2018-02-23 11:19:04.563118: I tensorflow/stream_executor/stream.cc:3521] Called Stream::ThenBlasGemm(transa=NoTranspose, transb=NoTranspose, m=10, n=1024, k=128, alpha=1, a=0x1020dc15300, lda=10, b=0x1020dc62e00, ldb=128, beta=0, c=0x1020dc16700, ldc=10) stream=0x6006d80
     • 2018-02-23 11:19:04.563129: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
     • 2018-02-23 11:19:04.563130: I tensorflow/stream_executor/cuda/cuda_blas.cc:1881] doing cuBLAS SGEMM: at=0 bt=0 m=10 n=1024 k=128 alpha=1.000000 a=0x1020dc15300 lda=10 b=0x1020dc62e00 ldb=128 beta=0.000000 c=0x1020dc16700 ldc=10
     • 2018-02-23 11:19:04.563156: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:307 func: cuLaunchKernel
     • 2018-02-23 11:19:04.563164: I tensorflow/core/platform/default/device_tracer.cc:497] LAUNCH stream 0x5fd4490 correllation 1217 kernel sgemm_32x32x32_NN
     • 2018-02-23 11:19:04.563168: I tensorflow/core/platform/default/device_tracer.cc:471] 1217 : output/output/MatMul:MatMul
     • 2018-02-23 11:19:04.563198: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:307 func: cuLaunchKernel
     • 2018-02-23 11:19:04.563210: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 1
     • 2018-02-23 11:19:04.563222: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: "output/output/MatMul" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 10 } } allocation_description { requested_bytes: 40960 allocated_bytes: 68608 allocator_name: "GPU_0_bfc" allocation_id: 85 has_single_reference: true ptr: 1108332340992 } } }
     • 2018-02-23 11:19:04.563236: I tensorflow/core/common_runtime/executor.cc:1673] Synchronous kernel done: 96 step 2 output/output/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](fc1/fc1/Relu, output/kernel/read) is dead: 0
     • 2018-02-23 11:19:04.563244: I tensorflow/core/common_runtime/step_stats_collector.cc:264] Save dev /job:localhost/replica:0/task:0/device:GPU:0 nt 0x7fa0a7786830
  19. Log: Tensor Allocation / Deallocation
     • 2018-02-23 11:19:04.430051: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorOutput { step_id: 2 kernel_name: "pool3/dropout/cond/Merge" tensor { dtype: DT_FLOAT shape { dim { size: 1024 } dim { size: 12544 } } allocation_description { requested_bytes: 51380224 allocated_bytes: 82837504 allocator_name: "GPU_0_bfc" allocation_id: 81 ptr: 1108467515392 } } }
     • …
     • 2018-02-23 11:19:04.564922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:445] GpuDevice::Compute train/gradients/train/Mean_grad/Prod/_16 op _Send on GPU0 stream[0].
     • …
     • 2018-02-23 11:19:04.566499: I tensorflow/core/framework/log_memory.cc:35] __LOG_MEMORY__ MemoryLogTensorDeallocation { allocation_id: 81 allocator_name: "GPU_0_bfc" } ← this tensor (allocation_id 81) is being deallocated
  20. Log: Event Manager
     • 2018-02-23 11:19:04.565108: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 1 used_events_ 2
     • 2018-02-23 11:19:04.565123: I tensorflow/core/platform/default/device_tracer.cc:483] ApiCallback 1:279 func: cuMemcpyDtoHAsync_v2
     • 2018-02-23 11:19:04.565132: I tensorflow/core/platform/default/device_tracer.cc:471] 1286 : edge_69_train/gradients/train/Mean_grad/Prod
     • 2018-02-23 11:19:04.565138: I tensorflow/stream_executor/cuda/cuda_driver.cc:1215] successfully enqueued async memcpy d2h of 4 bytes from 0x1020f310000 to 0x1020de00a00 on stream 0x7fa18db423b0
     • 2018-02-23 11:19:04.565144: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:151] QueueInUse free_events_ 1 used_events_ 2
     • 2018-02-23 11:19:04.565152: I tensorflow/stream_executor/stream.cc:302] Called Stream::ThenRecordEvent(event=0x7fa0a778b3f0) stream=0x7fa18db1b7f0
     • 2018-02-23 11:19:04.565159: I tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:190] PollEvents free_events_ 0 used_events_ 3
  21. IntraProcessRendezvous
     • 2018-02-23 11:19:04.577400: I tensorflow/core/common_runtime/rendezvous_mgr.cc:42] IntraProcessRendezvous Send 0x7fa18d9bab20 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0
     • 2018-02-23 11:19:03.141265: I tensorflow/core/common_runtime/rendezvous_mgr.cc:119] IntraProcessRendezvous Recv 0x7fa18d9bab20 /job:localhost/replica:0/task:0/device:CPU:0;0000000000000001;/job:localhost/replica:0/task:0/device:GPU:0;edge_185_Conv2_SwapIn;0:0
  22. TensorFlow Graph Execution
     • Session level: global control; executors get created for each subgraph
     • Executor level: run the graph asynchronously; get the next node (operation) from the "ready" queue
     • Op level: compute the forward and gradient ops; call into Stream, which contains the stream_executor
     • Device level: memory management
