- How to tackle an object detection competition
- Schwert's 6th-place solution on Open Images Challenge 2019
- presented at the lunch workshop of the 26th Symposium on Sensing via Image Information (2020).
Tackling Open Images Challenge (2019)
1. Mobility Technologies Co., Ltd.
Tackling Open Images Challenge
- presented at the 26th Symposium on Sensing via Image Information
June 12, 2020
Hiroto Honda, Mobility Technologies Co., Ltd.
About Me
Hiroto Honda
https://hirotomusiker.github.io/
Kaggle name: Schwert
‘Schwert’ = sword in German
R&D of imaging devices at a Japanese electronics company
→ DeNA computer vision team → Mobility Technologies
Check out my Blog Series!
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
Digging into Detectron 2 (object detection)
How to Try Kaggle
[Figure: Train Data, Val Data, and Test data (→ public leaderboard, → private leaderboard)]
How can you maximize your model’s score on the HIDDEN test data?
Evaluation metrics are described in the ‘Evaluation’ section: mean average precision, Dice coefficient, and so on. Sometimes non-standard metrics are employed and discussed in the ‘Discussion’ threads.
Cross Validation and Test data
[Figure: cross-validation folds, each split into Train Data and Val Data]
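The cross-validation figure can be sketched as a K-fold split over image IDs. This is a minimal sketch, assuming a plain random split at the image level (the slides do not say how the folds were built); `kfold_splits` is a hypothetical helper name. Splitting by image rather than by box keeps all annotations of one image inside the same fold, so the local val score stays an honest proxy for the hidden test score.

```python
import random
from typing import List, Sequence, Tuple


def kfold_splits(image_ids: Sequence[str], k: int = 5,
                 seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return k (train_ids, val_ids) pairs; every image lands in exactly one val fold."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    folds = [ids[i::k] for i in range(k)]     # round-robin assignment to k folds
    splits = []
    for i in range(k):
        train = [img for j, fold in enumerate(folds) if j != i for img in fold]
        splits.append((train, folds[i]))
    return splits


splits = kfold_splits([f"img{i}" for i in range(10)], k=5)
```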
Open Images Dataset (v5):
・~9 million images collected from Flickr
・16M bounding box annotations of 600 classes on 1.9M images
・Segmentation polygons for instances of 350 classes
・329 inter-object relationships
Open Images Challenge
https://storage.googleapis.com/openimages/web/challenge.html
https://www.kaggle.com/c/open-images-2019-object-detection/
How Huge Is the Open Images Dataset?
1GB of bounding box data!! (for 500GB of image data)
What an Object Detector Looks Like
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
[Figure: Backbone Network → Region Proposal Network → ROI Head]
The accuracy reported in papers is achieved by managing more than 100 config parameters.
How Hard It Was to Reproduce YOLOv3 in PyTorch
It took months to perfectly reproduce the original repo’s accuracy.
Implementation details such as weight init, loss definition, and lr schedule are critical.
https://github.com/DeNA/PyTorch_YOLOv3
blog: https://medium.com/@hirotoschwert/reproducing-training-performance-of-yolov3-in-pytorch-part-0-a792e15ac90d
You Should Care About Tiny Accuracy Differences
Model | AP | Venue
A: Faster R-CNN Res50 | 34.8 | NIPS’15
B: Faster R-CNN Res50 + Feature Pyramid Network | 36.7 | CVPR’17
C: RetinaNet (single-shot) Res50 Feature Pyramid Network + Focal Loss | 35.7 | ICCV’17
Model B from a non-official repo with AP = 33.0 is less accurate than the official model A (AP = 34.8).
Popular and Reliable Detection Frameworks
MMDetection (CUHK)
https://github.com/open-mmlab/mmdetection
Detectron 2 (Facebook)
https://github.com/facebookresearch/detectron2
automl/efficientdet (Google)
https://github.com/google/automl/tree/master/efficientdet
tpu/models (Google)
https://github.com/tensorflow/tpu/tree/master/models/official
R. Wightman repos (TF → PyTorch ports, non-official)
https://github.com/rwightman
Authors’ official repos are generally recommended.
Schwert used maskrcnn-benchmark for the competition.
How to Choose Approaches for a Large-scale Detection Competition
It takes 1 GPU-month to train one model!
One attempt is very costly...
How to Choose Approaches for a Large-scale Detection Competition
Good resources to find an “Exclusive Feature that Apparently Contributes to the score” (EFAC):
1: Last year’s solutions
2: Detection papers (CVPR, ICCV…)
3: Benchmark websites such as Papers with Code
Example of a Bad Experiment
“Looks like ResNet50 works...”
“OK, let’s try ResNeXt101.”
“...and why not add random cropping?”
[Figure: model 1 (baseline) + new feature A + new feature B → model 2]
It is important to add / remove one exclusive feature at a time!
Results of the Open Images Competition (2019)
Schwert’s ranks:
Detection Track: 6th / 558 (Gold) [1] [2]
Segmentation Track: 11th / 193 (Silver) [3]
Relationship Track: 30th / 201 (Silver)
Detection Track leaderboard (top 8):
# | Team Name | # of members | Score
1 | MMfruit | 5 | 0.65887
2 | imagesearch | 7 | 0.65337
3 | Prisms | 6 | 0.64214
4 | PFDet | 6 | 0.62221
5 | Omni-Detection | 3 | 0.60406
6 | Schwert | 1 (solo) | 0.60231
7 | Team 5 | 5 | 0.60210
8 | pudae | 1 (solo) | 0.59727
Got a solo gold medal in my first Kaggle competition!
“An Exclusive Feature that Apparently Contributes to the score” (EFAC)
EFAC examples from the solution writeups of Open Images 2018 [4][5][6]:
・class balancing (3rd, 5 pts ↑)
・ensemble (1st / 3rd, 5 pts ↑)
・voting NMS (1st / 3rd)
・long cosine annealing (2nd)
・parent class expansion
・ResNeXt152 + SE (1st, 2nd, 3rd)
Class balancing and model ensembling are essential.
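“Long cosine annealing” refers to decaying the learning rate along a half cosine over a long training schedule. A minimal sketch of a single-cycle cosine decay, assuming no warm-up and no restarts; the exact schedule in the 2018 writeups may differ, and `cosine_annealing_lr` as well as the example values are hypothetical:

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, base_lr: float,
                        min_lr: float = 0.0) -> float:
    """Learning rate after `step` of `total_steps`, decaying from base_lr to min_lr."""
    step = min(max(step, 0), total_steps)   # clamp: stay at min_lr after the end
    progress = step / total_steps           # 0.0 at the start, 1.0 at the end
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical 90k-iteration run starting at lr = 0.02
schedule = [cosine_annealing_lr(s, 90_000, 0.02) for s in (0, 45_000, 90_000)]
```

Making the schedule “long” only means choosing a large `total_steps`, so the learning rate stays high for longer before it decays.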
Evaluation Metrics
Mean Average Precision (mAP) at IoU > 0.5, averaged over 500 classes
1: EVERY class counts equally, even if it’s extremely rare.
e.g., images containing ‘person’ instances: 250,000 vs. ‘torch’ instances: 18
2: Strict localization is not required; classification matters...
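Both points can be seen in a simplified version of the metric. This is a minimal sketch, assuming plain greedy matching of score-sorted detections at IoU > 0.5 and the area under the interpolated precision-recall curve; the official Open Images evaluation additionally handles group-of boxes, image-level labels, and the class hierarchy, which are omitted here. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, float, Box]        # (image_id, score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Detection], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """AP for ONE class: greedy matching in score order, then area under the P-R curve."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    hits = []
    for img, _, box in sorted(dets, key=lambda d: d[1], reverse=True):
        best_iou, best_j = iou_thr, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou and not matched[img][j]:   # IoU must exceed iou_thr
                best_iou, best_j = overlap, j
        if best_j >= 0:
            matched[img][best_j] = True
        hits.append(best_j >= 0)          # True = true positive, False = false positive
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    for i in range(len(precisions) - 2, -1, -1):   # make precision non-increasing
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(dets_by_class: Dict[str, List[Detection]],
                           gts_by_class: Dict[str, Dict[str, List[Box]]]) -> float:
    """Unweighted mean over classes: 18 'torch' boxes weigh as much as 250,000 'person' images."""
    aps = [average_precision(dets_by_class.get(c, []), gts) for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps)
```

Because the mean is unweighted, missing a rare class entirely costs as much as missing a very common one, which is why the class-balancing trick below matters.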
Method 1: Class Balancing [1]
- The model encounters every class with equal probability.
- Rare classes: increase the sampling rate.
- Non-rare classes: limit the number of images.
- Total number of images: 4k x 500 (2M) → efficient training
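A minimal sketch of the class balancing idea, assuming a precomputed mapping from each class to the images that contain it; `images_per_class` and `class_balanced_image_list` are hypothetical names, and the exact sampling procedure in [1] may differ. Every class contributes exactly `per_class` samples, so rare classes are oversampled with replacement and frequent classes are capped.

```python
import random
from typing import Dict, List


def class_balanced_image_list(images_per_class: Dict[str, List[str]],
                              per_class: int = 4000, seed: int = 0) -> List[str]:
    """Draw exactly `per_class` image IDs for every class, then shuffle them together."""
    rng = random.Random(seed)
    samples: List[str] = []
    for cls in sorted(images_per_class):                    # sorted for reproducibility
        imgs = images_per_class[cls]
        if not imgs:
            continue                                        # no images: nothing to draw
        if len(imgs) >= per_class:
            samples.extend(rng.sample(imgs, per_class))     # frequent class: cap, no repeats
        else:
            samples.extend(rng.choices(imgs, k=per_class))  # rare class: oversample with replacement
    rng.shuffle(samples)
    return samples
```

With 500 non-empty classes and the default `per_class=4000`, the list has 4,000 x 500 = 2,000,000 entries, matching the 2M figure on the slide.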
Method 2: Ensembling Pipeline of Multiple Models [1]
・Baseline model: ResNeXt152 [7] + Deformable ConvNets v2 [8] + Feature Pyramid Network [9]
・Train different types of models on the training data with different random seeds
・8 models are ensembled
Ablation Study
Contribution of each exclusive feature to val and leaderboard accuracy
Backbone | Deformable Convolutions | Parent Expansion | Data Size | val AP | private LB
ResNeXt101 | None | Inference Time | 4k per class | 69.8 | 54.0
ResNeXt101 | DCN v2 | Inference Time | 4k per class | 72.2 (+2.4) | -
ResNeXt152 | None | Inference Time | 4k per class | 72.2 (+2.4) | -
ResNeXt152 | None | Inference Time | 16k per class | 72.4 (+2.6) | -
ResNeXt152 | DCN v2 | Inference Time | 4k per class | 73.2 (+3.4) | 56.4 (best single model)
ResNeXt152 | None | Training Time | 4k per class | 72.4 (+2.6)* | -
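The “Parent Expansion” column refers to the Open Images class hierarchy: a box of a child class (e.g. a man) also counts as its parent class (a person). A minimal sketch of inference-time expansion, assuming the hierarchy is given as a child → parent dictionary; the helper names and the example hierarchy fragment are hypothetical and use readable names instead of Open Images label IDs. Training-time expansion would apply the same idea to the ground-truth labels instead of the detections.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[str, float, Box]   # (class label, score, box)


def ancestors(label: str, parent_of: Dict[str, str]) -> List[str]:
    """All ancestors of `label`, nearest first; classes without an entry are roots."""
    chain = []
    while label in parent_of:
        label = parent_of[label]
        chain.append(label)
    return chain


def expand_to_parents(dets: List[Detection], parent_of: Dict[str, str]) -> List[Detection]:
    """Keep every detection and add a copy (same score and box) for each ancestor class."""
    expanded = list(dets)
    for label, score, box in dets:
        expanded.extend((parent, score, box) for parent in ancestors(label, parent_of))
    return expanded


# Hypothetical fragment of a class hierarchy (child -> parent)
PARENT_OF = {"Man": "Person", "Taxi": "Car", "Car": "Land vehicle", "Land vehicle": "Vehicle"}
out = expand_to_parents([("Taxi", 0.8, (0.1, 0.2, 0.5, 0.6))], PARENT_OF)
```

If the model also predicts a parent class directly, the expanded copies can duplicate those detections; running NMS per class afterwards is one way to clean them up.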
Method 3: Enhanced (Voting) NMS [6]
Non-Maximum Suppression for Model Ensembling
When multiple boxes from different models overlap, the resulting box receives added confidence from the overlapping boxes.
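A minimal sketch of voting NMS for one class, run on the concatenated detections of all ensembled models. Assumptions: boxes overlapping the top-scoring box with IoU > `iou_thr` form a cluster; the kept box is the score-weighted average of the cluster (box voting), and its score is raised by `vote_weight` times the summed scores of the other cluster members. The exact rule in [6] and the values tuned for the solution may differ; `voting_nms` and both parameters are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def voting_nms(boxes: List[Box], scores: List[float],
               iou_thr: float = 0.5, vote_weight: float = 0.1) -> List[Tuple[Box, float]]:
    """Greedy NMS in which suppressed boxes vote for the box that suppressed them."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        cluster = [top] + [i for i in rest if iou(boxes[top], boxes[i]) > iou_thr]
        members = set(cluster)
        remaining = [i for i in rest if i not in members]   # still sorted by score
        total = sum(scores[i] for i in cluster)
        if total > 0:   # coordinates: score-weighted average of the cluster
            merged = tuple(sum(scores[i] * boxes[i][k] for i in cluster) / total for k in range(4))
        else:
            merged = boxes[top]
        # score: the top score plus a bonus from every other box that agreed with it
        kept.append((merged, scores[top] + vote_weight * (total - scores[top])))
    return kept


# Two models roughly agree on one object; a third box is elsewhere.
result = voting_nms([(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)], [0.9, 0.8, 0.7])
```

Since AP depends only on how scores rank detections, the boosted score is deliberately not clipped to 1: boxes confirmed by several models simply move up the ranking.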
Result of 8-Model Ensembling
Backbone | Deformable Convolutions | Parent Expansion | Data Size | val AP | private LB
ResNeXt152 | DCN v2 | Inference Time | 4k per class | 73.2 (+3.4) | 56.4 (best single model, ~13th place)
Ensemble of 8 models + NMS tuned | - | - | - | - | 60.23 (6th place!)
Take-Home Messages
・Kaggle is a wonderful platform where you can learn cutting-edge computer vision methods and implementations. Discussions with great Kagglers are always fun.
・Like research, developing (or surpassing) state-of-the-art methods is a tough but fun job.
・Choosing a reliable framework is a must for object detection competitions.
・Understand past solutions and pick an Exclusive Feature that Apparently Contributes to the score (EFAC).
References
[1] Hiroto Honda, “The 6th Place Solution for the Open Images 2019 Object Detection Track,” presented at ICCVW 2019, https://hirotomusiker.github.io/files/schwert_open_images_6th_solution_v1.pdf
[2] Hiroto Honda, “6th place solution,” discussion in Open Images 2019 Object Detection Track, https://www.kaggle.com/c/open-images-2019-object-detection/discussion/110953
[3] Hiroto Honda, “11th place solution,” discussion in Open Images 2019 Instance Segmentation Track, https://www.kaggle.com/c/open-images-2019-instance-segmentation/discussion/111351
[4] kivajok, “1st place writeup,” https://storage.googleapis.com/openimages/web/challenge.html
[5] Takuya Akiba et al., “PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track,” arXiv:1809.00778
[6] Yuan Gao et al., “Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance,” arXiv:1810.06208
[7] Saining Xie et al., “Aggregated Residual Transformations for Deep Neural Networks,” CVPR 2017
[8] Xizhou Zhu et al., “Deformable ConvNets v2: More Deformable, Better Results,” CVPR 2019
[9] Tsung-Yi Lin et al., “Feature Pyramid Networks for Object Detection,” CVPR 2017
* All the photos used in this presentation were taken by Hiroto Honda.