【arXiv】Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection


arXiv:1509.07627 http://arxiv.org/abs/1509.07627 In this paper, we evaluate convolutional neural network (CNN) features using the AlexNet architecture developed by [9] and very deep convolutional network (VGGNet) architecture developed by [16]. To date, most CNN researchers have employed the last layers before output, which were extracted from the fully connected feature layers. However, since it is unlikely that feature representation effectiveness is dependent on the problem, this study evaluates additional convolutional layers that are adjacent to fully connected layers, in addition to executing simple tuning for feature concatenation (e.g., layer 3 + layer 5 + layer 7) and transformation, using tools such as principal component analysis. In our experiments, we carried out detection and classification tasks using the Caltech 101 and Daimler Pedestrian Benchmark Datasets.


Feature Evaluation of Deep Convolutional Neural
Networks for Object Recognition and Detection
Hirokatsu KATAOKA, Kenji Iwata, Yutaka SATOH
National Institute of Advanced Industrial Science and Technology (AIST)
http://www.hirokatsukataoka.net/
arXiv preprint arXiv:1509.07627
http://arxiv.org/abs/1509.07627

Feature Evaluation
•  Significant task in computer vision
–  Based on DeCAF [Donahue+, ICML2014], we evaluate several CNN features with an SVM classifier
–  Representative architectures: AlexNet [Krizhevsky+, NIPS2012] & VGGNet [Simonyan+, ICLR2015]
–  Basic Idea 1: Which layer of a CNN architecture gives the better feature?
–  Basic Idea 2: Mid- & high-level CNN features should be concatenated (e.g. Layer 3 + Layer 5 + Layer 7)
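The "CNN features + SVM classifier" setup can be illustrated with a small sketch. This is not the authors' code: it assumes a linear SVM trained by plain subgradient descent on the regularized hinge loss, and the function names, the hyperparameters (`lam`, `lr`, `epochs`) and the toy data are all hypothetical. In the deck the inputs would be CNN feature vectors and the parameters follow DeCAF.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by minimizing lam/2 * ||w||^2 + mean(hinge loss); y is in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples that violate the margin
        # Subgradient of the objective; only margin violators contribute to the hinge term.
        grad_w = lam * w - X[active].T @ y[active] / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Return +1 or -1 for each row of X."""
    return np.where(X @ w + b >= 0, 1, -1)

# Toy stand-in for extracted CNN features (hypothetical values).
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                  [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y_toy = np.array([1, 1, 1, -1, -1, -1])
w_toy, b_toy = train_linear_svm(X_toy, y_toy)
```

For a multi-class task such as Caltech 101, one such binary classifier per class (one-vs-rest) would be a common choice; the slides do not say which scheme was used.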

CNN Architecture & Feature Extraction
•  AlexNet & VGGNet
–  AlexNet: 8-layer architecture
–  VGGNet: 16-layer architecture (each pooling layer and the last 2 FC layers are used as feature vectors)
[Figure: layer diagrams of AlexNet and VGGNet. Legend: image input, convolutional layer, max-pooling layer, fully-connected layer, softmax layer. The feature-extraction points are labeled Layer1 to Layer7 in each network.]
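The VGGNet numbering described above (each pooling layer plus the last two FC layers) can be written down as a small lookup. This is a sketch under assumptions: the mapping of Layers 1 to 5 onto the five pooling stages and Layers 6 and 7 onto the FC layers is inferred from the slide text, and the names `pool1` ... `fc7` follow common Caffe naming rather than anything stated in the deck. The AlexNet mapping is not spelled out in the slides, so it is left out.

```python
# Hypothetical layer numbering for VGGNet, inferred from the slide text:
# five pooling layers followed by the last two fully-connected layers.
VGG_FEATURE_LAYERS = {
    1: "pool1",
    2: "pool2",
    3: "pool3",
    4: "pool4",
    5: "pool5",
    6: "fc6",
    7: "fc7",
}

def layers_for(combination):
    """Translate a combination label such as "567" into a list of layer names."""
    return [VGG_FEATURE_LAYERS[int(digit)] for digit in combination]
```

With this numbering, `layers_for("567")` names the 5th pooling layer and the last two FC layers.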

Experiment
•  Settings
–  Layers: 3 – 7 (middle and deeper layers)
•  Conv., pooling and fully-connected layers
–  Concatenation and transformation
•  Layer 345, 456, 567, 357
•  Principal component analysis (PCA): 1,500 dims
–  Classifier
•  Support vector machine (SVM)
•  The parameters are based on DeCAF [Donahue+, ICML2014]
•  Datasets
–  Daimler pedestrian benchmark dataset (pedestrian detection) [Munder+, TPAMI2006]
–  Caltech 101 dataset (object classification) [Fei-Fei+, CVPRW2004]
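The PCA transformation in the settings can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes PCA is computed from the SVD of the mean-centered feature matrix, and `pca_reduce`, the random placeholder features and the small sizes are all hypothetical (the deck reduces each layer's features to 1,500 dimensions).

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project row-vector features onto their top principal components."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Placeholder data: 20 samples with 50-dimensional features, reduced to 5 dims.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(20, 50))
toy_reduced = pca_reduce(toy_features, 5)
```

The number of components that can be kept is at most the smaller of the sample count and the feature dimension, so reaching 1,500 dimensions requires at least that many training samples and feature dimensions.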

Results on the Daimler dataset
•  Daimler pedestrian benchmark dataset
–  VGGNet Layer 5 (original vector) achieves the best rate (99.35%)
–  In AlexNet, Layer 3 with PCA achieves the best rate (98.71%)
Mid-level layers tend to give better rates on the pedestrian detection data

Results on the Caltech 101 dataset
•  Caltech 101 dataset
–  VGGNet Layer 5 (original vector) achieves the best rate (91.80%)
–  In AlexNet, Layer 5 with PCA achieves the best rate (78.37%)
The layer just before the FC layers gives a good rate in object classification

Feature Concatenation
•  Three-layer connection with PCA
–  Layer 345, 456, 567, 357
–  4,500 dimensions (1,500 dims for each vector)
–  Left: Daimler
–  Right: Caltech 101
[Figure: results for the concatenated features; left panel: Daimler, right panel: Caltech 101]
VGGNet Layer 567 is the most effective combination
Pedestrian detection: mid-level feature
Object classification: high-level feature
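The three-layer connection amounts to placing the PCA-reduced vectors of the chosen layers side by side, which gives 3 × 1,500 = 4,500 dimensions. A minimal sketch, assuming the reduced features are held in a dictionary keyed by layer number; `concatenate_layers`, the `reduced` dictionary and its zero-filled placeholder arrays are hypothetical.

```python
import numpy as np

def concatenate_layers(reduced, combination):
    """Join the PCA-reduced features of the layers named in e.g. "567" side by side."""
    return np.concatenate([reduced[int(digit)] for digit in combination], axis=1)

# Placeholder data: 4 samples, 1,500 dims per layer after PCA (Layers 3 to 7).
reduced = {layer: np.zeros((4, 1500)) for layer in range(3, 8)}

# Every combination from the slide yields a 4,500-dimensional vector per sample.
for combination in ("345", "456", "567", "357"):
    joined = concatenate_layers(reduced, combination)
    assert joined.shape == (4, 4500)
```

The concatenated matrix would then be passed to the SVM in place of a single-layer feature.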

Conclusion
•  Feature evaluation with AlexNet & VGGNet
–  VGGNet performs better than AlexNet
–  Mid-level features are good for pedestrian detection, and high-level features are good for the object classification task
–  Concatenating VGGNet's 5th pooling layer and last 2 FC layers is the best setting on the Daimler pedestrian benchmark and Caltech 101 datasets
–  PCA is an effective transformation for CNN features
