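This session reviews and supplements machine learning and deep learning, and explains the problems that object detection sets out to solve.
It then examines Strategy 1, the One-Shot Solution, using YOLO as the example with a hands-on exercise, and discusses related algorithms and their development. Next it examines Strategy 2, Divide-and-Conquer, using Faster R-CNN as the example with practice on the Tensorflow Object Detection API, and again discusses related algorithms and their development.
Finally, it discusses how to improve training results and how the algorithms have evolved, and introduces machine learning inference and applications, along with bringing machine learning into industry.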
First, we reviewed the basics of machine learning, introduced what object detection is, and described the problems it aims to solve (both localization and classification).
Second, we introduced two types of algorithms that represent two different ideas: a One-Shot solution and a divide-and-conquer approach. The representative algorithm for the one-shot solution is "YOLO", and for divide-and-conquer it is "Faster R-CNN". We implemented the whole YOLO training and inference process from scratch in Tensorflow 2.0, and showed how to use the Tensorflow Object Detection API to implement the whole Faster R-CNN training and inference process.
Third, we briefly introduced the evolution of several well-known object detection algorithms and how to improve training performance and results.
Finally, we discussed the gap between AI in research and AI in industrial practice.
21. YOLO open-source code walkthrough: Jupyter NB / Colab, key functions only
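Hyperparameters
● Adjust the various hyperparameters, including learning rate, number of training epochs, and batch size.
Data Preprocessing
● VOC2012_DataGenerator: batch data generator, covering image augmentation preprocessing and image label conversion.
Model (Training / Evaluation)
● Model Building: builds the deep neural network model, including yolo and tiny-yolo.
● Loss Function: implements the loss function, which defines how the network parameters are updated.
● Metrics Function: objective metric functions used to evaluate the training process.
● Pretrained Model: speeds up learning through transfer learning.
● Training: starts model training and fine-tunes the training process through a number of callback functions.
● Saving: saves the model at checkpoints such as the end of a training stage, when a metric target is reached, or when training ends.
Inference (Evaluation)
● Non-Maximum Suppression: a function that consolidates multiple highly overlapping (by IoU) predicted objects.
● New Metric: mAP: an objective metric function for fully evaluating training results and comparing them with other algorithms.
● Detection: an inference example showing how to apply this object detection model.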
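The Non-Maximum Suppression step listed above can be sketched as follows. This is a minimal illustration of the common greedy variant (keep the highest-scoring box, drop any box that overlaps it too much), not the notebook's actual code. The function names, the (x_min, y_min, x_max, y_max) box layout, and the 0.5 IoU threshold are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this toy input the first two boxes overlap heavily, so only the higher-scoring one survives, while the third box is far away and is kept.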
27. Start training the model: learning rate, overfitting, and cross-validation
let C be N (N-fold cross validation)
let b be the batch size of each training unit
let T be the number of iterations (iterations = all training images / b)
let parameters be hyperparameters for training a model
set images to all images in array
set labels to all labels in array
macro training be a function calculating the loss value and updating weights
macro metrics be functions calculating inspection values
macro update_parameters be a function updating parameters
macro saved_model be a function saving the model when a condition is met
macro testing_metrics be a function used for evaluating the testing dataset
macro preprocess_images be a function used for processing images and transforming them into an array
macro preprocess_labels be a function used for processing labels

sub data_generator(b): array, array is float
    return preprocess_images(images, b), preprocess_labels(labels, b)

sub fit_generator(images, labels, parameters, metrics): loss, metrics is float
    set loss, metrics_value to training(images, labels, parameters, metrics)
    return loss, metrics_value

for c (1 to C) do
    for t (1 to T) do
        set images, labels to data_generator(b)
        set loss, metrics_value to fit_generator(images, labels, parameters, metrics)
        set parameters to update_parameters(parameters, loss)
        saved_model(loss, metrics_value)
    end for
end for
set testing_loss, testing_metrics_value to testing_metrics(images, labels, "testing")
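● Image preprocessing includes normalization, resizing, and so on; labels are preprocessed or converted as well.
● Adjust hyperparameters according to conditions, including the learning rate and restoring parameters.
● Save the model weights conditionally. (The lowest loss is not necessarily the best model.)
● The training process may be repeated several times as the hyperparameters change.
● Evaluate the model on the test data.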
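The training flow on this slide (N folds, mini-batches, conditional saving) can be sketched in plain Python. This is a toy illustration under stated assumptions: a one-parameter linear model y = w * x stands in for the detector, "saving" just means remembering the best weight, and all helper names (k_fold_splits, batches, train_one_fold) are hypothetical, not functions from the notebook.

```python
import random
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def k_fold_splits(n_samples: int, n_folds: int):
    """Yield (train_indices, val_indices) for each of the N folds."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        val = indices[fold::n_folds]
        val_set = set(val)
        train = [i for i in indices if i not in val_set]
        yield train, val


def batches(indices: List[int], batch_size: int):
    """Yield successive mini-batches of indices."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


def mse(weight: float, data: List[Sample], indices: List[int]) -> float:
    """Mean squared error of the toy model on the given samples."""
    return sum((weight * data[i][0] - data[i][1]) ** 2 for i in indices) / len(indices)


def train_one_fold(data, train_idx, val_idx, lr=0.01, epochs=20, batch_size=4):
    """Train y = w * x with mini-batch SGD; keep the weight with the best validation loss."""
    weight, best_weight, best_val = 0.0, 0.0, float("inf")
    for _ in range(epochs):
        for batch in batches(train_idx, batch_size):
            # Gradient of the batch MSE with respect to the weight.
            grad = sum(2 * (weight * data[i][0] - data[i][1]) * data[i][0]
                       for i in batch) / len(batch)
            weight -= lr * grad
        val_loss = mse(weight, data, val_idx)
        if val_loss < best_val:  # "save" only when the validation metric improves
            best_val, best_weight = val_loss, weight
    return best_weight, best_val


random.seed(0)
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(1, 41)]]
results = [train_one_fold(data, tr, va) for tr, va in k_fold_splits(len(data), n_folds=4)]
mean_val_loss = sum(v for _, v in results) / len(results)
print(results, mean_val_loss)
```

Selecting the checkpoint by validation loss rather than training loss reflects the note above that the lowest loss is not necessarily the best model.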
28. Evaluating the trained model: introducing mAP, starting with AP (Average Precision)
AP is the area under the Precision-Recall curve (AUC)
- Tommy Huang (2019), Medium
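As a worked example of "area under the Precision-Recall curve", the sketch below computes AP for one class from ranked detections. It assumes each detection has already been marked as a true or false positive (for example by an IoU match against ground truth) and uses all-point interpolation, which is one common convention; the function name and inputs are hypothetical. mAP is then the mean of this value over all classes.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], is_true_positive: Sequence[bool],
                      num_ground_truth: int) -> float:
    """Area under the precision-recall curve (all-point interpolation)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls: List[float] = [0.0]
    precisions: List[float] = [1.0]
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision monotonically non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((recalls[k] - recalls[k - 1]) * precisions[k]
               for k in range(1, len(recalls)))


scores = [0.9, 0.8, 0.7, 0.6]
flags = [True, False, True, True]  # detections ranked by score: TP, FP, TP, TP
ap = average_precision(scores, flags, num_ground_truth=4)
print(ap)  # 0.625
```

Here three of the four ground-truth objects are found, so recall stops at 0.75 and the area beyond that point contributes nothing.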
38. SSD overview and intermission: Why not take a break?
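● SSD and YOLO are the core representatives of the One-Shot Solution.
● SSD extracts features from feature maps at multiple layers, which gives a more effective way to detect objects at different scales, including small ones.
● Although SSD can capture more complete object information, it produces a large number of redundant prediction boxes.
● DSSD, RSSD, and FSSD improve SSD's accuracy and precision by compensating for fine-grained information, integrating more processed data, and strengthening the signal from the convolutional backbone.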
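To give a sense of how many prediction boxes the multi-layer design produces, the short calculation below counts the default boxes of the SSD300 configuration described in the SSD paper (six prediction layers). The variable names are only illustrative.

```python
# (feature-map side length, default boxes per cell) for the six SSD300 prediction layers
ssd300_layers = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

boxes_per_layer = [side * side * k for side, k in ssd300_layers]
total_boxes = sum(boxes_per_layer)

print(boxes_per_layer)  # [5776, 2166, 600, 150, 36, 4]
print(total_boxes)      # 8732
```

Most of these boxes cover background or duplicate the same object, which is why a filtering step such as Non-Maximum Suppression is needed at inference time.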
39. Faster R-CNN architecture: Divide-and-Conquer, with the advantage of extension or auxiliary branches
[Figure: Faster R-CNN architecture. Input image → Conv. Layer → feature map; Region Proposal Network → Proposals; feature map + Proposals → ROIPooling Layer → ROI Feature Vector → FCs (for each ROI) → Softmax and BBox Regressor.]
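The role of the ROIPooling Layer, turning proposals of different sizes into a fixed-size ROI feature, can be sketched on a single-channel feature map. This is a simplified illustration with integer bin edges and max pooling; the function name, the end-exclusive (x0, y0, x1, y1) ROI convention, and the 2 x 2 output size are assumptions for the example, not the Tensorflow Object Detection API.

```python
from typing import List, Tuple


def roi_max_pool(feature_map: List[List[float]], roi: Tuple[int, int, int, int],
                 output_size: int = 2) -> List[List[float]]:
    """Max-pool the region roi = (x0, y0, x1, y1), end-exclusive, into a fixed grid."""
    x0, y0, x1, y1 = roi
    height, width = y1 - y0, x1 - x0
    pooled = []
    for by in range(output_size):
        # Integer bin edges; every bin covers at least one cell.
        ys = y0 + (by * height) // output_size
        ye = max(ys + 1, y0 + ((by + 1) * height) // output_size)
        row = []
        for bx in range(output_size):
            xs = x0 + (bx * width) // output_size
            xe = max(xs + 1, x0 + ((bx + 1) * width) // output_size)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 6 x 6 toy feature map whose value at (row r, column c) is 6 * r + c.
feature_map = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
small = roi_max_pool(feature_map, (0, 0, 4, 4))  # 4 x 4 proposal
large = roi_max_pool(feature_map, (1, 1, 6, 6))  # 5 x 5 proposal
print(small)  # [[7.0, 9.0], [19.0, 21.0]]
print(large)  # [[14.0, 17.0], [32.0, 35.0]]
```

Both proposals yield a 2 x 2 output, which is what lets the fully connected layers that follow accept every ROI with the same input shape.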
61. Inference and application scenarios: hardware is a major challenge, and hybrid architectures are the current mainstream
[Figure: inference and deployment options. Labels: Tensorflow Extended (Distributed / Scalable APIs; Data Pipeline, Trainer, Serving, Pusher); Framework APIs (MXNet); Customized Service APIs; Tensorflow Lite (Optimizer, Quantization, Model); Arch-based (OpenVINO, NCNN) Model; High-Performance Frames; Operators.]
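The Quantization label in the figure refers to shrinking a model for constrained hardware. As a rough illustration of the general idea only, the sketch below applies a generic affine 8-bit scheme (a scale and a zero point) to a few float weights. It is not the Tensorflow Lite implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize(values: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to unsigned 8-bit integers with an affine scale and zero point."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must contain 0
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all values are zero
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the 8-bit representation."""
    return [(x - zero_point) * scale for x in q]


weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each weight now takes one byte instead of four, at the cost of a rounding error bounded by roughly one quantization step.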
64. References
1. Joseph Redmon, et al. (2015) You Only Look Once: Unified, Real-Time Object Detection. arXiv.
2. Visual Object Classes Challenge 2012 (VOC2012), http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
3. Ross Girshick (2015) Fast R-CNN. arXiv.
4. Shaoqing Ren, Kaiming He, et al. (2016) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv.
5. RoIPooling、RoIAlign笔记 (2018) https://www.cnblogs.com/wangyong/p/8523814.html
6. Tensorflow Object Detection API (2019) https://github.com/tensorflow/models/tree/master/research/object_detection
7. Use Siri on all your Apple devices (2019) https://support.apple.com/en-us/HT204389
8. Android 7 Nougat – Google Assistant (2019) https://mcmw.abilitynet.org.uk/android-7-nougat-google-assistant
9. Wei Liu, Dragomir Anguelov, et al. (2016) SSD: Single Shot MultiBox Detector. arXiv.
10. Cheng-Yang Fu, et al. (2017) DSSD : Deconvolutional Single Shot Detector. arXiv.
11. Joseph Redmon et al. (2016) YOLO9000: Better, Faster, Stronger. arXiv.
12. Joseph Redmon et al. (2018) YOLOv3: An Incremental Improvement. arXiv.
13. Jisoo Jeong et al. (2017) Enhancement of SSD by concatenating feature maps for object detection. arXiv.
14. Zuo-Xin Li et al. (2018) FSSD: Feature Fusion Single Shot Multibox Detector. arXiv.
15. Kaiming He et al. (2018) Mask R-CNN. arXiv.
16. Jifeng Dai et al. (2016) R-FCN: Object Detection via Region-based Fully Convolutional Networks. arXiv.