Qualcomm research-imagenet2015
1. 1
NeoNet: Object centric training
for image recognition
Daniel Fontijne, Koen E. A. van de Sande, Eren Gölge,
R. Blythe Towal, Anthony Sarah, Cees G. M. Snoek
Qualcomm Technologies, Inc., December 17, 2015
Presented by:
Daniel Fontijne
Senior Staff Engineer
4. 4
The base network for all our submissions is the Inception network, as
introduced in the batch normalization paper by Ioffe & Szegedy.
Foundation: Batch-normalized inception
Ioffe & Szegedy ICML 2015
5. 5
Network in an inception module
Note: the 5x5 path is not used.
Lin et al. ICLR 2014
12. 12
Random crop selection might miss the object of interest.
Network tries to remember ‘butterfly’ when presented with leaves.
Solution: use the provided boxes to ensure the crop contains the object.
− For images without box annotation, use best box predicted by localization system.
Object preserving crops
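The crop constraint above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the pixel-coordinate convention (x0, y0, x1, y1), and the fallback of centring the crop when the box is larger than the crop are all assumptions.

```python
import random

def object_preserving_crop(img_w, img_h, box, crop_size, rng=random):
    """Pick the (left, top) of a square crop that contains `box` when possible.

    box is (x0, y0, x1, y1) in integer pixels (assumed convention).
    """
    x0, y0, x1, y1 = box

    def pick(near, far, extent):
        max_start = extent - crop_size       # last valid crop origin
        if max_start < 0:
            raise ValueError("crop is larger than the image")
        lo = max(0, far - crop_size)         # crop must reach the box's far edge
        hi = min(max_start, near)            # crop must start at or before the near edge
        if lo > hi:
            # Box does not fit in the crop: centre the crop on the box (assumption).
            centre = (near + far) // 2 - crop_size // 2
            return min(max(centre, 0), max_start)
        return rng.randint(lo, hi)           # uniform over all valid origins

    return pick(x0, x1, img_w), pick(y0, y1, img_h)
```

For a 256x256 image, a 224x224 crop, and a box of (100, 120, 180, 200), every sampled origin lies in [0, 32] on both axes, so the box is always inside the crop.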
16. 16
Foundations.
− Generate box proposals using fast selective search.
− Train box-classification networks on crops.
Object centric training.
− Object pre-training network.
− Object localization network.
− Object alignment network.
Localization overview
Girshick et al. PAMI 2016
Uijlings et al. IJCV 2013
17. 17
Use the bounding box annotations for pre-training.
Increase the number of classes from N to 2*N+1:
− N classes for the object, well-framed.
− N classes for partially framed objects.
− 1 class for ‘background’, i.e., object not visible.
1% – 1.5% improvement compared to standard pre-training.
Object centric pre-training
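One way to picture the 2*N+1 label space is the mapping below. The slides do not say how "well-framed" and "partially framed" are decided, so the visible-fraction criterion, both thresholds, and the function name are hypothetical choices made only for illustration.

```python
def object_centric_label(class_id, visible_fraction, num_classes=1000,
                         full_thresh=0.9, partial_thresh=0.2):
    """Map a training crop to one of 2*N+1 labels.

    class_id: original label in [0, N).
    visible_fraction: share of the annotated box that lies inside the crop.
    Both thresholds are illustrative assumptions, not values from the slides.
    """
    if not 0 <= class_id < num_classes:
        raise ValueError("class_id out of range")
    if visible_fraction >= full_thresh:
        return class_id                   # labels 0 .. N-1: well-framed object
    if visible_fraction >= partial_thresh:
        return num_classes + class_id     # labels N .. 2N-1: partially framed object
    return 2 * num_classes                # label 2N: background, object not visible
```

With N = 1,000 this gives 2,001 distinct labels in total.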
18. 18
Dual-head network to account for missing bounding boxes.
− One with 1000 outputs.
− One with 2001 outputs. No error gradient when box annotation is missing.
Object centric pre-training
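A minimal sketch of the dual-head cost, assuming both heads use softmax cross entropy and are weighted equally (neither detail is stated on the slide). Dropping the second term when the box is missing means that head contributes no error gradient for that image. All names are hypothetical.

```python
import math

def softmax_cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def dual_head_loss(logits_1000, label_1000, logits_2001, label_2001):
    """Sum the two head losses; label_2001 is None when no box annotation exists."""
    loss = softmax_cross_entropy(logits_1000, label_1000)
    if label_2001 is not None:
        # Only images with a box annotation train the 2001-way head.
        loss += softmax_cross_entropy(logits_2001, label_2001)
    return loss
```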
19. 19
Fully connected layer on top of Inception 4e and 5b.
Re-train Inception 5b and new head.
Then fine-tune entire network.
Object localization network
21. 21
A 40% border worked best.
− Such that in 7x7 resolution of Inception 5b there is a 1 pixel border.
Bordering the object
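One reading of the numbers above: if the window is 40% larger than the object in total (20% per side), the object covers 7 / 1.4 = 5 of the 7 cells along each axis of the Inception 5b map, which leaves a 1-cell border on every side. The sketch below encodes that reading; the half-per-side convention and the function names are assumptions.

```python
def add_border(box, border=0.4):
    """Grow a box so its width and height increase by `border` in total,
    split evenly between the two sides (assumed convention)."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * border / 2
    dy = (y1 - y0) * border / 2
    return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)

def object_cells(grid=7, border=0.4):
    """Number of feature-map cells the object spans along one axis."""
    return grid / (1 + border)
```

Under this convention a 100-pixel-wide box becomes a window of about 140 pixels.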
22. 22
Extra head for object box alignment.
Classification head is also used, but with cross entropy cost.
Object alignment network
23. 23
Object box alignment moves corners up to 50% of the width and height.
100% border allows network to ‘see’ full range of possible alignments.
~2% gain.
Object alignment border
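The relation between the two numbers can be sketched as follows: if each corner may move by up to 50% of the width and height, the box can grow by at most 50% on every side, that is 100% in total, which is the border the network needs to see. The offset parameterisation (one fraction per box edge) and the clamping are assumptions for illustration, not the network's documented output format.

```python
def align_box(box, offsets, max_shift=0.5):
    """Shift box edges by offsets given as fractions of box width and height.

    offsets = (dx0, dy0, dx1, dy1), a hypothetical alignment-head output.
    Each offset is clamped to [-max_shift, max_shift].
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    dx0, dy0, dx1, dy1 = (max(-max_shift, min(max_shift, o)) for o in offsets)
    return (x0 + dx0 * w, y0 + dy0 * h, x1 + dx1 * w, y1 + dy1 * h)
```

The largest outward move turns a 100x100 box into a 200x200 one, matching a 100% border.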
24. 24
Component breakdown
Configuration                                  Top-5 localization error
First attempt                                  24.0%
40% border, FC on top of inception 5b          22.5%
FC on top of inception 5b+4e                   21.8%
Object centric pre-training                    20.3%
Ensemble of 8                                  17.5%
Object alignment                               15.5%
Final result with ILSVRC blacklist applied     14.5%
27. 27
Improved selective search
                        Fast     Improved
Color spaces            2        3
Segmentations           2        4
Similarity functions    2        4
Average boxes           1,600    5,000
MABO                    77.5     82.6
Time (s)                0.8      2.4
mAP                     41.2     44.0
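MABO (mean average best overlap) is the proposal-quality measure from the selective search paper: for each ground-truth box take the best intersection-over-union with any proposal in the same image, average per class, then average over classes. A minimal sketch follows; the data layout (dictionaries keyed by class and image id) is an assumption, and the result is a fraction, whereas the table reports percentages.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def mabo(ground_truth, proposals):
    """ground_truth: {class: [(image_id, box), ...]}; proposals: {image_id: [box, ...]}."""
    abo_per_class = []
    for items in ground_truth.values():
        if not items:
            continue                      # skip classes without annotations
        best = [max((iou(box, p) for p in proposals.get(img, [])), default=0.0)
                for img, box in items]
        abo_per_class.append(sum(best) / len(best))   # average best overlap
    return sum(abo_per_class) / len(abo_per_class)    # mean over classes
```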
28. 28
Five inception-style networks for feature extraction
− Two trained on 1,000 object classes, no input border, fine-tuning on detection boxes
− Three trained on 1,000 object windows with input border, no fine-tuning
Object detection network
29. 29
Component breakdown
Configuration                    mAP on validation set
Best object class network        44.6
Best object centric network      47.7
Ensemble of 5                    51.9
30. 30
Component breakdown
Configuration                    mAP on validation set
Best object class network        44.6
Best object centric network      47.7
Ensemble of 5                    51.9
+ context                        53.2
Context: four classification networks fine-tuned with 200 detection class labels.
31. 31
Component breakdown
Configuration                    mAP on validation set
Best object class network        44.6
Best object centric network      47.7
Ensemble of 5                    51.9
+ context                        53.2
+ object alignment               54.6
34. 34
Our best submission: an ensemble of two inception nets.
− Reduce fully connected layer from 1,000 to 401 outputs.
− Use pre-trained weights from ImageNet 1,000 (~325 epochs).
− Train Inception 5b and fully connected layer for two epochs.
− Fine-tune entire network for eight epochs.
Adding other networks to the ensemble reduced accuracy.
Places 2 overview
35. 35
Component breakdown (top-5 error)
Configuration                              Single view   Multi view
~325 epochs pre-training                   17.9%         16.8%
First attempt, 112 epochs pre-training     19.1%         17.9%
512 channel 5b, Alex-style FC head         20.0%         18.4%
32 images / batch                          18.7%         17.6%
Randomized ReLU                            18.2%         17.5%
Ensemble of 7                              -             16.7%
Ensemble of 2                              -             16.5%
36. 36
Final Places 2 results
Top-5 classification error on test set (%)
HiVision           20
MERL               19.4
ntu_rose           19.3
Trimps-Soushen     18.0
NeoNet             17.6
SIAT_MMLAB         17.4
WM                 16.9
NeoNet is competitive on scene classification