Digest of Human Detection from CVPR2015

Jan. 27th, 2016, Daichi SUZUO

Features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
Dataset / Benchmark
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.

Fundamentals of Human Detection
• Machine learning based bi-class classifier
• Sliding window search
[Figure, training: positive class / negative class images → convert to image feature → train classifier]
[Figure, detection: crop window → feature extraction → classifier → Human? / Not human?]
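The crop → feature → classifier loop above can be sketched as follows. This is a minimal illustration only: the window size, the stride, and the toy `extract_feature` / `classify` stand-ins are assumptions, not taken from any of the papers.

```python
import numpy as np

def sliding_windows(image, window=(128, 64), stride=8):
    """Yield (top, left, crop) for every window position in the image."""
    win_h, win_w = window
    img_h, img_w = image.shape[:2]
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield top, left, image[top:top + win_h, left:left + win_w]

def detect(image, extract_feature, classify, window=(128, 64), stride=8):
    """Return the (top, left) corners of windows the classifier labels as human."""
    hits = []
    for top, left, crop in sliding_windows(image, window, stride):
        if classify(extract_feature(crop)):
            hits.append((top, left))
    return hits

# Toy stand-ins (hypothetical): mean intensity as the "feature",
# a fixed threshold as the "classifier".
image = np.zeros((160, 96))
image[16:144, 16:80] = 1.0   # a bright 128x64 "person"
hits = detect(image, extract_feature=np.mean, classify=lambda f: f > 0.99)
```

A real detector would also scan several image scales and merge overlapping hits (non-maximum suppression); both are left out here.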

Image features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.

1. Combination Features and Models for Human Detection - Y. Jiang et al.
• Popular HOG feature [Dalal05]
[Figure: input image → edge extraction → edge image; for each “cell”, a histogram over orientation θ is built from the pixel-wise gradients, weighted by gradient power]

• Popular HOG feature [Dalal05]: 1st order feature
[Figure: input image → differentiate → 1st derivative; for each “cell”, a histogram over orientation θ is built from the pixel-wise gradients, weighted by gradient power]
Idea: How about extending to 0th/2nd order?
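A rough sketch of the per-cell histogram described above, assuming 9 unsigned orientation bins and an 8x8 cell (both illustrative choices). Block normalization and bin interpolation from the full HOG pipeline are omitted.

```python
import numpy as np

def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell, weighted by gradient magnitude."""
    gy, gx = np.gradient(cell.astype(float))   # 1st derivatives along rows, columns
    power = np.hypot(gx, gy)                    # gradient magnitude ("power")
    theta = np.arctan2(gy, gx) % np.pi          # unsigned orientation in [0, pi)
    bins = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)

# A horizontal intensity ramp: every gradient points along x (theta = 0).
cell = np.tile(np.arange(8.0), (8, 1))
hist = cell_histogram(cell)
```

For this ramp all of the weight falls into the first orientation bin.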

• 2nd order: HOB – “bar” shape
• Same as HOG, just using the 2nd derivative
• 0th order: HOC – color feature
• Using HSI color space: H as θ, S as power, I is ignored
[Figure: R, G, B channels → convert to HSI]
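The HOC idea (hue plays the role of θ, saturation the role of power) might look like the sketch below. The hue and saturation formulas are one common HSI definition, and the bin count is an assumption; the paper's exact conversion may differ.

```python
import numpy as np

def hoc_cell_histogram(rgb_cell, n_bins=9):
    """Hue histogram of one cell, weighted by saturation (intensity is ignored)."""
    rgb = rgb_cell.astype(float).reshape(-1, 3)
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    intensity = (r + g + b) / 3.0
    # HSI saturation: 1 - min(R, G, B) / I, defined as 0 for black pixels.
    safe_i = np.where(intensity > 0, intensity, 1.0)
    saturation = np.where(intensity > 0, 1.0 - rgb.min(axis=1) / safe_i, 0.0)
    # Hue as an angle in [0, 2*pi), with red at 0.
    hue = np.arctan2(np.sqrt(3.0) * (g - b), 2.0 * r - g - b) % (2.0 * np.pi)
    bins = np.minimum((hue / (2.0 * np.pi) * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins, weights=saturation, minlength=n_bins)

red_cell = np.zeros((8, 8, 3))
red_cell[..., 0] = 1.0                     # pure red: hue 0, saturation 1
red_hist = hoc_cell_histogram(red_cell)

gray_cell = np.full((8, 8, 3), 0.5)        # gray: saturation 0, so no votes
gray_hist = hoc_cell_histogram(gray_cell)
```

Weighting by saturation means colorless pixels contribute nothing, much as flat regions contribute nothing to HOG.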

• Combine them into one vector: HOG-III feature

• Train different classifiers from the same HOG-IIIs
• Detect individually, and fuse into one result
[Figure: input image → HOG-III features → detection by Grammar model [Girshick11] and detection by Poselet model [Bourdev10] → fusion → final result]
(This is one of the key processes of the method.
Please refer to the original paper for more details.)

Effect of HOG-III
Feature                  AP
HOG                      45.8%
HOC+HOG+HOB              50.1%
HOG-III                  51.3%

Effect of Fusion
Classifier               AP
Single use of Grammar    45.8%
Single use of Poselet    47.0%
Fusion                   52.3%

Combining HOG-III and Fusion performs best

2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
• Extension of “Integral Channel Features” (ChnFtrs) [Dollár09]
• ChnFtrs: Extension of “Viola-Jones method” [Viola02]
(Viola-Jones method)
[Figure: input image → extract “Haar-like” features (scalar) → learn decision tree by AdaBoost]
※ Haar-like feature: difference between the sums of the white and black regions

(Integral Channel Features)
[Figure: input image → transform → “channel” → extract sum of rectangle (※ unlike Haar-like) → learn decision tree by AdaBoost]
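The “sum of rectangle” step is usually computed with an integral image, so that any rectangle costs four lookups. A minimal sketch, with illustrative function names:

```python
import numpy as np

def integral_image(channel):
    """Cumulative sum table with a zero row and column prepended."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of channel[top:top+height, left:left+width] from four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

channel = np.arange(16.0).reshape(4, 4)
ii = integral_image(channel)
feature = rect_sum(ii, 1, 1, 2, 2)   # channel[1:3, 1:3] = 5 + 6 + 9 + 10
```

A Haar-like feature in the Viola-Jones sense would be the difference of two such rectangle sums.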

(Filtered Channel Features)
[Figure: “channel” → apply various filters (convolution) → pick up pixel value as a feature → learn decision tree by AdaBoost]
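A sketch of the filter-then-pick-pixels step, assuming a tiny hypothetical 2x2 filter bank (the paper's actual filter families are not reproduced here). It relies on `sliding_window_view`, available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_channel(channel, kernel):
    """'Valid' 2-D cross-correlation of one channel with one filter."""
    windows = sliding_window_view(channel, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def filtered_channel_features(channel, filter_bank):
    """Apply every filter, then use each response pixel as one scalar feature."""
    return np.concatenate([filter_channel(channel, k).ravel() for k in filter_bank])

# Hypothetical 2x2 filter bank: uniform, horizontal step, vertical step.
filter_bank = [
    np.array([[1.0, 1.0], [1.0, 1.0]]),
    np.array([[1.0, -1.0], [1.0, -1.0]]),
    np.array([[1.0, 1.0], [-1.0, -1.0]]),
]
channel = np.arange(16.0).reshape(4, 4)
features = filtered_channel_features(channel, filter_bank)
```

The resulting feature vector would then be fed to the boosted decision trees, which select the most useful response pixels.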

Using 50 filters performs best
Achieved the highest accuracy

Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.

• Train detectors on CG-based training datasets
[Figure: real background (static image) → annotate; CG-based human → composite → simulated scene]

• Not only scene-specific, but also location-specific!
[Figure: grid with overlap (102~105 patches) → training images (~103 pos, ~103 neg for each patch) → one classifier per patch → joint classifier ensemble training → scene-specific, location-specific detectors]
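To illustrate the “grid with overlap” idea, the sketch below lays overlapping square cells over a scene and looks up which cells (and hence which location-specific detectors) cover a given point. The image size, patch size, and stride are hypothetical values chosen for the example, not the paper's settings.

```python
def overlapping_grid(img_w, img_h, patch, stride):
    """Top-left corners of patch x patch cells placed every `stride` pixels."""
    xs = range(0, img_w - patch + 1, stride)
    ys = range(0, img_h - patch + 1, stride)
    return [(x, y) for y in ys for x in xs]

def patches_containing(cells, patch, x, y):
    """Indices of the grid cells (and hence detectors) that cover point (x, y)."""
    return [i for i, (cx, cy) in enumerate(cells)
            if cx <= x < cx + patch and cy <= y < cy + patch]

# A stride of half the patch size makes neighboring cells overlap by 50%.
cells = overlapping_grid(img_w=64, img_h=48, patch=16, stride=8)
covering = patches_containing(cells, 16, x=20, y=20)
```

With this overlap, an interior point is covered by four cells, so several detectors can vote on the same location.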

Effect of location-specific detection
Patch size    # detectors    Avg. Precision
8x8           371            .802
16x16         102            .798
32x32         30             .764
[Figures: Example of the detection result; Comparison]

“Convnets still underperform the state of the art”
…Really?
Build up know-how for convnet-based detectors
• Small network (CifarNet) / Big network (AlexNet)
• Window size
• How to collect training images
• Fine-tuning
• Number and type of layers
• …

A convnet with the best configuration outperforms!
Interesting points:
• The ratio of pos/neg does not affect the accuracy much
• Data augmentation is effective
• Network size should be chosen according to the number of training samples
• ...
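As one illustration of data augmentation, the sketch below expands a single grayscale training crop into mirrored and slightly shifted copies. The choice of transformations and the shift size are assumptions for the example; which augmentations the paper actually uses is not stated here.

```python
import numpy as np

def augment(crop, max_shift=2):
    """Return shifted copies of a 2-D crop and of its horizontal mirror image."""
    h, w = crop.shape
    out = []
    for img in (crop, crop[:, ::-1]):                 # original, then mirrored
        padded = np.pad(img, max_shift, mode="edge")  # repeat border pixels
        for dy in (0, max_shift, 2 * max_shift):
            for dx in (0, max_shift, 2 * max_shift):
                out.append(padded[dy:dy + h, dx:dx + w])
    return out

crop = np.arange(128 * 64, dtype=float).reshape(128, 64)
samples = augment(crop)   # 2 mirror states x 3 x 3 shifts = 18 samples
```

The centered, unshifted copies reproduce the original crop and its mirror image exactly; the others are small translations of them.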

Binary classification is sometimes insufficient…
[Figure: Human vs. Not human (hard negatives)]
It is necessary to use semantic information jointly

Classify pedestrians and recognize semantics at once!

Also recognizes the semantics of the current scene
• Pedestrian attributes (e.g. wearing a backpack)
• Background attributes (e.g. road, sky, …)

Difficult to collect various (annotated) negatives from one dataset…
Transfer from other annotated datasets by TA-CNN
(Please refer to the original and related papers for more details about TA-CNN…)

Comparison with CNN-based methods
Example of detection results

Benchmark / Dataset

• Dataset of visible-light and thermal images
Contributions:
• Color and thermal images
• Both test/training data
• Temporally corresponding tags
• Large enough
• …

Takeaways
• Human detection is still challenging
• Deep learning does not necessarily solve every problem at this moment
• There are several insights that might be helpful for your research/hobby/…

References / Supplemental materials

See also
2. Filtered Channel Features for Pedestrian Detection
4. Taking a Deeper Look at Pedestrians
• Author's website: http://rodrigob.github.io/
3. Learning Scene-Specific Pedestrian Detectors without Real Data
• Project: http://vishnu.boddeti.net/projects/detection-by-synthesis.html
• YouTube: https://youtu.be/2Jf7faozHUs
5. Pedestrian Detection aided by Deep Learning Semantic Tasks
• Project: http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
• Lab: http://rcv.kaist.ac.kr/v2/
All CVPR 2015 papers are available at cv-foundation.org
