Digest of Human Detection from CVPR2015
1. Digest of Human Detection
from CVPR 2015
Jan. 27th, 2016, Daichi SUZUO
2. Digest of Human Detection from CVPR2015
Features
1. Combination Features and Models for Human Detection - Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection - S. Zhang et al.
Training
3. Learning Scene-Specific Pedestrian Detectors without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
5. Pedestrian Detection aided by Deep Learning Semantic Tasks - Y. Tian et al.
Dataset / Benchmark
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.
3. Fundamentals of Human Detection
• Machine-learning-based binary (two-class) classifier
• Sliding window search
Figure (training): positive-class and negative-class images → convert to image features → train classifier
Figure (detection): crop window → feature extraction → classifier → human or not human?
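The crop → feature extraction → classifier loop above can be sketched as follows. This is only a minimal illustration: the mean-intensity feature, the fixed threshold, and the names `sliding_windows`, `extract_feature`, `classify`, and `detect` are assumptions made for this sketch, not something taken from the papers.

```python
import numpy as np

def sliding_windows(image, win_h, win_w, stride):
    """Yield (top, left, crop) for every window position inside the image."""
    rows, cols = image.shape[:2]
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            yield top, left, image[top:top + win_h, left:left + win_w]

def extract_feature(crop):
    # Placeholder feature (assumption): mean intensity of the crop.
    # A real detector would use HOG, channel features, a convnet, etc.
    return np.array([crop.mean()])

def classify(feature, threshold=0.5):
    # Placeholder two-class decision (assumption): a fixed threshold
    # stands in for a trained classifier.
    return bool(feature[0] > threshold)

def detect(image, win_h=4, win_w=2, stride=2):
    """Return the (top, left) corner of every window classified as positive."""
    return [(top, left)
            for top, left, crop in sliding_windows(image, win_h, win_w, stride)
            if classify(extract_feature(crop))]

# Toy image: a bright 4x2 "person" on a dark background.
image = np.zeros((8, 8))
image[2:6, 4:6] = 1.0
detections = detect(image)  # [(2, 4)]
```

In practice the search is also repeated over several image scales, which this sketch leaves out.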
4. Image features
1. Combination Features and Models for Human Detection
- Y. Jiang et al.
2. Filtered Channel Features for Pedestrian Detection
- S. Zhang et al.
5. 1. Combination Features and Models
for Human Detection - Y. Jiang et al.
• Popular HOG feature [Dalal05]
Figure: input image → edge extraction → edge image; for each “cell”, a histogram of the pixel-wise gradient over orientation θ, with gradient power on the vertical axis
6. 1. Combination Features and Models
for Human Detection - Y. Jiang et al.
• Popular HOG feature [Dalal05]: 1st-order feature
Figure: input image → differentiate → 1st derivative; for each “cell”, a histogram of the pixel-wise gradient over orientation θ, with gradient power on the vertical axis
Idea: how about extending to 0th/2nd order?
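A per-cell orientation histogram of this kind can be sketched as below. This is a simplified illustration, not the full HOG of [Dalal05]: block normalization and bin interpolation are omitted, and the 9-bin unsigned-orientation layout and the name `cell_orientation_histogram` are assumptions.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of gradient orientation (theta), weighted by gradient power."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    gy, gx = np.gradient(cell.astype(float))
    power = np.hypot(gx, gy)                          # gradient magnitude
    theta = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    bin_width = 180.0 / n_bins
    bins = np.minimum((theta / bin_width).astype(int), n_bins - 1)
    # Each pixel votes for its orientation bin with a weight equal to its power.
    return np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)

# Toy 8x8 cell: a horizontal intensity ramp, so every gradient points along x.
cell = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_orientation_histogram(cell)  # all 64 votes fall into bin 0
```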
7. 1. Combination Features and Models
for Human Detection - Y. Jiang et al.
• 2nd order: HOB – “bar” shape
• Same as HOG, just using 2nd derivative
• 0th order: HOC – color feature
• Using HSI color space: H as θ, S as power; I is ignored
Figure: color values converted to HSI
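The “H as θ, S as power” idea can be sketched in the same histogram form. The conversion below uses the common textbook HSI formulas, and the 6-bin layout and function names are assumptions; the paper's exact HOC definition may differ.

```python
import numpy as np

def hue_saturation_hsi(rgb):
    """Return (hue in degrees [0, 360), saturation in [0, 1]); intensity is ignored."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    hue = np.rad2deg(np.arctan2(np.sqrt(3.0) * (g - b), 2.0 * r - g - b)) % 360.0
    total = r + g + b
    safe_total = np.where(total > 0, total, 1.0)      # avoid division by zero on black pixels
    min_rgb = np.minimum(np.minimum(r, g), b)
    sat = np.where(total > 0, 1.0 - 3.0 * min_rgb / safe_total, 0.0)
    return hue, sat

def cell_color_histogram(rgb_cell, n_bins=6):
    """Histogram over hue (playing the role of theta), weighted by saturation (power)."""
    hue, sat = hue_saturation_hsi(rgb_cell.astype(float))
    bins = np.minimum((hue / (360.0 / n_bins)).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=sat.ravel(), minlength=n_bins)

# Toy 4x4 cell of pure red: hue 0 degrees, saturation 1 for every pixel.
cell = np.zeros((4, 4, 3))
cell[..., 0] = 1.0
hist = cell_color_histogram(cell)  # all 16 votes fall into bin 0
```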
8. 1. Combination Features and Models
for Human Detection - Y. Jiang et al.
• Combine them into one vector: HOG-III feature
9. 1. Combination Features and Models
for Human Detection - Y. Jiang et al.
• Train different classifiers from the same HOG-IIIs
• Detect individually, and fuse into one result
Figure: input image → HOG-III features → detection by Grammar model [Girshick11] and detection by Poselet model [Bourdev10] → fusion → final result
(This is one of the key processes of the method.
Please refer to the original paper for more details.)
10. 1. Combination Features and Models
for Human Detection - Y. Jiang et al.
Effect of HOG-III
Feature        AP
HOG            45.8%
HOC+HOG+HOB    50.1%
HOG-III        51.3%
Effect of Fusion
Classifier     AP
Grammar only   45.8%
Poselet only   47.0%
Fusion         52.3%
Combining HOG-III and Fusion performs best
11. 2. Filtered Channel Features
for Pedestrian Detection - S. Zhang et al.
• Extension of “Integral Channel Features” [Dollár09]
• ChnFtrs: Extension of “Viola-Jones method” [Viola02]
(Viola-Jones method)
Figure: input image → extract “Haar-like” features (scalar) → learn decision trees by AdaBoost
※ A Haar-like feature is the difference between the pixel sums of the white and black regions
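A Haar-like feature of this kind can be sketched with an integral image, which makes every rectangle sum a four-lookup operation. The two-rectangle layout (white on the left, black on the right) and the function names are assumptions chosen for illustration.

```python
import numpy as np

def integral_image(img):
    """Integral image with a zero row and column prepended for easy lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of img[top:top+height, left:left+width] using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """White (left half) sum minus black (right half) sum; a single scalar."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return white - black

# Toy 4x4 image: bright left half, dark right half.
img = np.zeros((4, 4))
img[:, :2] = 1.0
ii = integral_image(img)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)  # 8.0
```

The same `rect_sum` lookup is what makes the plain rectangle sums of Integral Channel Features cheap to compute.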
12. 2. Filtered Channel Features
for Pedestrian Detection - S. Zhang et al.
• Extension of “Integral Channel Features” [Dollár09]
• ChnFtrs: Extension of “Viola-Jones method” [Viola02]
(Integral Channel Features)
Figure: input image → transform into “channels” → extract sums over rectangles (※ unlike Haar-like features) → learn decision trees by AdaBoost
13. 2. Filtered Channel Features
for Pedestrian Detection - S. Zhang et al.
• Extension of “Integral Channel Features” [Dollár09]
• ChnFtrs: Extension of “Viola-Jones method” [Viola02]
(Filtered Channel Features)
Figure: “channel” → apply various filters (convolution) → pick up pixel values as features → learn decision trees by AdaBoost
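The filter-then-pick-pixels step can be sketched as below. The two small filters are arbitrary examples rather than the filter banks studied in the paper, the helper names are assumptions, and the filtering is written as a plain cross-correlation (no kernel flip) for simplicity.

```python
import numpy as np

def correlate2d_valid(channel, kernel):
    """'Valid' 2-D cross-correlation, written out explicitly with NumPy slices."""
    kh, kw = kernel.shape
    out_h = channel.shape[0] - kh + 1
    out_w = channel.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * channel[dy:dy + out_h, dx:dx + out_w]
    return out

def filtered_channel_features(channel, filters):
    """Filter the channel with every kernel; each response pixel is one feature."""
    return np.concatenate([correlate2d_valid(channel, f).ravel() for f in filters])

# Toy 4x4 channel and two example filters (assumptions): a box filter
# and a horizontal-difference filter.
channel = np.arange(16, dtype=float).reshape(4, 4)
filters = [np.ones((2, 2)), np.array([[1.0, -1.0], [1.0, -1.0]])]
features = filtered_channel_features(channel, filters)  # 2 filters x 9 pixels = 18 values
```

The resulting feature vector would then be fed to the boosted decision trees.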
14. 2. Filtered Channel Features
for Pedestrian Detection - S. Zhang et al.
Using 50 filters performs best
Achieved the highest accuracy
15. Training
3. Learning Scene-Specific Pedestrian Detectors
without Real Data - H. Hattori et al.
4. Taking a Deeper Look at Pedestrians
- J. Hosang et al.
5. Pedestrian Detection aided by
Deep Learning Semantic Tasks - Y. Tian et al.
16. 3. Learning Scene-Specific Pedestrian Detectors
without Real Data - H. Hattori et al.
• Train detectors using CG-based training datasets
Figure: real background (static image) → annotate → composite with CG-based human → simulated scene
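The compositing step can be sketched as simple alpha blending. This is only an illustration: grayscale arrays, the hypothetical `composite` helper, and the alpha mask stand in for the actual rendering pipeline of the paper. The useful point is that the bounding-box label comes for free, because the placement of the CG human is known.

```python
import numpy as np

def composite(background, sprite, alpha, top, left):
    """Paste a rendered human (sprite) onto a real background using an alpha mask.

    Returns the simulated scene and the bounding box (top, left, height, width),
    which is known exactly because we chose where to place the sprite.
    """
    scene = background.astype(float).copy()
    h, w = alpha.shape
    region = scene[top:top + h, left:left + w]
    scene[top:top + h, left:left + w] = alpha * sprite + (1.0 - alpha) * region
    return scene, (top, left, h, w)

# Toy data: uniform gray background, a bright 3x2 "human" with one transparent pixel.
background = np.full((6, 6), 0.2)
sprite = np.ones((3, 2))
alpha = np.ones((3, 2))
alpha[0, 0] = 0.0
scene, bbox = composite(background, sprite, alpha, top=1, left=2)
```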
17. 3. Learning Scene-Specific Pedestrian Detectors
without Real Data - H. Hattori et al.
• Not only scene-specific, but also location-specific!
Figure: grid with overlap (10^2 to 10^5 patches) → training images (~10^3 pos, ~10^3 neg for each patch) → one classifier per patch → joint classifier ensemble training → scene-specific, location-specific detectors
18. 3. Learning Scene-Specific Pedestrian Detectors
without Real Data - H. Hattori et al.
Effect of location-specific detection
Patch size   # detectors   Avg. Precision
8x8          371           .802
16x16        102           .798
32x32        30            .764
Figures: example of the detection result; comparison
19. 4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
“Convnets still underperform the state of the art” …Really?
Build up know-how for convnet-based detectors:
• Small network (CifarNet) / Big network (AlexNet)
• Window size
• How to collect training images
• Fine-tuning
• Number and Type of layers
• …
20. 4. Taking a Deeper Look at Pedestrians - J. Hosang et al.
Convnet with the best configuration outperforms!
Interesting points:
• The pos/neg ratio does not affect
the accuracy very much
• Data augmentation is effective
• Network size should be chosen
according to the number of training samples
• ...
21. 5. Pedestrian Detection aided by
Deep Learning Semantic Tasks - Y. Tian et al.
Binary-classification is sometimes insufficient…
Figure: human vs. not human (hard negatives)
It is necessary to use semantic information jointly
22. 5. Pedestrian Detection aided by
Deep Learning Semantic Tasks - Y. Tian et al.
Classify pedestrians and recognize semantics at once!
23. 5. Pedestrian Detection aided by
Deep Learning Semantic Tasks - Y. Tian et al.
Classify pedestrians and recognize semantics at once!
Also recognizes current scene semantics
• Pedestrian attribute (e.g. wearing backpack)
• Background attribute (e.g. road, sky, …)
24. 5. Pedestrian Detection aided by
Deep Learning Semantic Tasks - Y. Tian et al.
Classify pedestrians and recognize semantics at once!
Difficult to collect various (annotated) negatives from one dataset…
Transfer from other annotated datasets by TA-CNN
(Please refer to the original and related papers for more details about TA-CNN…)
25. 5. Pedestrian Detection aided by
Deep Learning Semantic Tasks - Y. Tian et al.
Comparison with CNN-based methods
Example of detection results
26. Benchmark / Dataset
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline - S. Hwang et al.
27. 6. Multispectral Pedestrian Detection:
Benchmark Dataset and Baseline - S. Hwang et al.
• Dataset of visible-light and thermal images
Contributions:
• Color and thermal images
• Both training and test data
• Temporally corresponding tags
• Large enough
• …
29. Takeaways
• Human detection is still challenging
• Deep learning does not necessarily solve
every problem at this moment
• There are several pieces of knowledge that might be helpful
for your research/hobby/…
31. See also
2. Filtered Channel Features for Pedestrian Detection
4. Taking a Deeper Look at Pedestrians
• Author's website: http://rodrigob.github.io/
3. Learning Scene-Specific Pedestrian Detectors without Real Data
• Project: http://vishnu.boddeti.net/projects/detection-by-synthesis.html
• YouTube: https://youtu.be/2Jf7faozHUs
5. Pedestrian Detection aided by Deep Learning Semantic Tasks
• Project: http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/
6. Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
• Lab: http://rcv.kaist.ac.kr/v2/
All CVPR 2015 papers are available at cv-foundation.org
Then, let us go to the main topics.
The first group is about image features; there are two papers.
Why are features important?
A good feature suppresses irrelevant changes in images, such as lighting conditions, while keeping enough information.
This is a kind of trade-off, and that is where the difficulty comes from.
The principle of a good detector is this:
if the training dataset can be collected from exactly the same scene as at detection time,
the detector is expected to outperform a general detector trained on common training data.
However, data annotation is a really heavy task, although it is necessary for creating a training dataset.
The basic idea of the paper is to generate training data by combining actual backgrounds with CG-generated people.
This approach is called “generative learning”.
• Datasets with color alone or thermal alone already exist, but this is the first to provide both at the same time, and so on.