Human Action Recognition Using 3D Joint Information and HOOFD Features
1. Human Action Recognition Using 3D Joint Information and Pyramidal HOOFD Features
MSc Thesis by
Barış Can Üstündağ
Thesis Advisor: Prof. Dr. Mustafa Ünel
2. • Introduction to Human Action Recognition
– Motivation, Applications
– Related Work
• Human Action Recognition Using 3D Joint Information and HOOFD Features
– Acquiring Depth Data
– Feature Extraction
• 3D Joints
• HOOFD
– Feature Representation
– Classification
• Experiments
– Datasets
• MSR Action 3D Dataset
• MSR Action Pairs Dataset
• MSRC-12 Gesture Dataset
• Conclusions & Future Work
Outline
4. • Motion Perception
– Gunnar Johansson [1971]
• Sequence of images for human motion analysis
• ‘Moving Light Displays’ enable identification of people and their gender
• Motion Capture [2014]
– Dawn of the Planet of the Apes
Motivation
5. • Vast amount of data
YouTube
• More than 34K hours of video uploaded every day
Surveillance Cameras
• ~30M cameras in the US
• ~700K hours of video every day
Motivation
8. • Video Categorization
– How many human pixels are there?
Movies: 35% | TV: 34% | YouTube: 40%
Motivation
9. • Rehabilitation
– 15M people suffer a stroke every year
– Automated systems
– Gamification
Motivation - Application
10. • Release of Low-cost Depth Cameras
– Kinect (2010)
– Google Tango (developers only, 2014)
– Leap Motion (2013)
• Effective and robust performance despite
– Complex background
– Challenging viewpoints
– Occlusions
Motivation – Why depth?
[Figure: Google Tango, Leap Motion]
11. Intensity Based
• Extraction of Cuboids
• Motion History Images, Motion Energy Images
Depth Map Based
• Depth Motion Maps
• Histogram of Oriented 4D Normals
Skeletal Data Based
• SMIJ – Sequence of Most Informative Joints
• HOJ3D – Histogram of 3D Joint Locations
Related Work
12. Related Work
• Extraction of Cuboids, Dollar et al. [CVPR, 2005]
• Motion History Images, Motion Energy Images, Gorelick et al. [PAMI, 2007]
Intensity Based
13. Related Work
• Histogram of Oriented 4D Normals (HON4D), Oreifej et al. [CVPR, 2013]
• Depth Motion Maps, Yang et al. [JRTIP, 2012]
Depth Map Based
14. Related Work
• Sequence of Most Informative Joints (SMIJ), Ofli et al. [CVIU, 2013]
• View Invariant Human Action Recognition Using Histogram of 3D Joints, Xia et al. [CVPR, 2012]
Skeletal Data Based
17. • Kinect
– Depth data are acquired with the ‘Light Coding’ method
• To process the depth data in any application, two issues must be addressed:
– Formation of shadows
– Elimination of noise
Acquiring Depth Data | Feature Extraction | Feature Representation | Classification
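18. • Shadows
– Generated by the foreground objects
• Noise
– Rough object boundaries cause gaps and holes in the depth data
• Bilateral Filter
$\hat{D}(p) = \frac{1}{W_p} \sum_{q \in S(p)} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert D(p) - D(q) \rvert)\, D(q)$
Space term: $G_{\sigma_s}(\lVert p - q \rVert)$; range term: $G_{\sigma_r}(\lvert D(p) - D(q) \rvert)$; $D$ is the depth map, $S(p)$ a window around pixel $p$, and $W_p$ the sum of the weights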
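A minimal sketch of bilateral filtering on a depth map, assuming NumPy and a brute-force window loop for readability (the slides do not specify an implementation). The window `radius`, `sigma_s` (pixels) and `sigma_r` (depth units) are hypothetical defaults; in practice a library routine such as OpenCV's `cv2.bilateralFilter` would normally be used instead.

```python
import numpy as np

def bilateral_filter(depth, radius=2, sigma_s=2.0, sigma_r=30.0):
    """Brute-force bilateral filter for a 2D depth map (illustrative sketch).

    radius, sigma_s and sigma_r are hypothetical defaults, not values from the thesis.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")
    # Space term: Gaussian on the pixel offset, identical for every window.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    space = np.exp(-(xs**2 + ys**2) / (2.0 * sigma_s**2))
    out = np.empty_like(depth)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range term: Gaussian on the depth difference to the centre pixel.
            rng = np.exp(-((window - depth[i, j]) ** 2) / (2.0 * sigma_r**2))
            weights = space * rng
            out[i, j] = np.sum(weights * window) / np.sum(weights)
    return out
```

The range term is what keeps object boundaries sharp: a neighbour on the other side of a large depth jump receives an almost-zero weight, while small noise on a flat surface is still averaged out.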
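19. • Joint Features
– 20 joints are provided by the Kinect SDK
– 10 joint angles $\theta$ and their derivatives are calculated:
$\dot{\theta}_k = \frac{\theta_k - \theta_{k-1}}{T}$, where $k$ is the frame index and $T$ the time between frames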
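A sketch of how joint angles and their derivatives could be computed from Kinect joint positions. The angle at a joint is taken between the two bones meeting there, and the derivative is approximated by a finite difference between consecutive frames. The joint triplets, the function names and the frame period `T = 1/30` s are assumptions for illustration; the slides do not list which 10 angles are used.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle (radians) at joint b between the bones b->a and b->c."""
    ba = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    bc = np.asarray(c, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards rounding error

def angle_features(frames, triplets, T=1.0 / 30.0):
    """Joint angles and their finite-difference derivatives.

    frames:   (K, 20, 3) array of joint positions over K frames.
    triplets: hypothetical list of (a, b, c) joint indices; the angle is measured at b.
    Returns angles with shape (K, n) and derivatives with shape (K - 1, n).
    """
    theta = np.array([[joint_angle(f[a], f[b], f[c]) for a, b, c in triplets]
                      for f in frames])
    dtheta = np.diff(theta, axis=0) / T  # (theta_k - theta_{k-1}) / T
    return theta, dtheta
```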
20. • Joint Features
– Mapped to spherical coordinates
– Origin aligned with the hip center
– Radius discarded, so only the two angles are kept
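A minimal sketch of the hip-centred spherical mapping, assuming joints are given as (x, y, z) rows. The axis convention (azimuth in the x–y plane, inclination measured from +z) and the name `spherical_angles` are assumptions; the slides state only that the origin is the hip center and that the radius is discarded.

```python
import numpy as np

def spherical_angles(joints, hip_index):
    """(azimuth, inclination) of every joint relative to the hip centre.

    joints: (J, 3) array of x, y, z positions. The radius is computed only to
    normalise z and is then discarded, so the features do not depend on the
    distance from the hip.
    """
    joints = np.asarray(joints, dtype=np.float64)
    rel = joints - joints[hip_index]          # move the origin to the hip centre
    x, y, z = rel[:, 0], rel[:, 1], rel[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                # assumed: angle in the x-y plane
    # The hip joint itself (r = 0) gets inclination 0 instead of a division by zero.
    cos_incl = np.divide(z, r, out=np.ones_like(r), where=r > 0)
    inclination = np.arccos(np.clip(cos_incl, -1.0, 1.0))  # angle from the +z axis
    return np.stack([azimuth, inclination], axis=1)
```

Because the radius is dropped, scaling all joint offsets from the hip leaves the features unchanged.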
21. • Histogram of Oriented Optical Flows from Depth (HOOFD)
- Optical Flow from Depth Data
• Depth data are mapped to an intensity image
• Depth values (z) are represented as intensities (I)
• The resulting optical flow field is invariant to sudden changes in brightness
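A sketch of the depth-to-intensity mapping, assuming a linear mapping to 8-bit gray levels (the slides do not say which mapping is used). `depth_to_intensity`, `z_min` and `z_max` are hypothetical names.

```python
import numpy as np

def depth_to_intensity(depth, z_min=None, z_max=None):
    """Linearly map depth values z to 8-bit intensities I in [0, 255].

    If z_min / z_max are omitted, the range of the given frame is used.
    """
    z = np.asarray(depth, dtype=np.float64)
    lo = z.min() if z_min is None else z_min
    hi = z.max() if z_max is None else z_max
    scaled = (np.clip(z, lo, hi) - lo) / max(hi - lo, 1e-12)
    return np.round(scaled * 255.0).astype(np.uint8)
```

A fixed `z_min`/`z_max` for the whole sequence is the safer choice for optical flow. With a per-frame range, a surface at constant depth could change intensity between frames, which would violate brightness constancy even though nothing moved.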
22. - Optical Flow
• 2D displacement of pixel patches on the image plane
• Brightness Constancy Equation:
$I(x, y, t) = I(x + \delta x,\ y + \delta y,\ t + \delta t)$
• Linearizing, assuming small (u, v), using a Taylor series expansion:
$I_x(x, y, t)\, u(x, y) + I_y(x, y, t)\, v(x, y) + I_t(x, y, t) = 0$, where $u(x, y) = \frac{\delta x}{\delta t}$ and $v(x, y) = \frac{\delta y}{\delta t}$
• Histogram of Oriented Optical Flows from Depth (HOOFD)
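A small numerical check of the linearized brightness constancy constraint, using a synthetic smooth pattern shifted by a known (u, v) = (1, 0) pixel per frame. Spatial derivatives are central differences averaged over both frames and the temporal derivative is the frame difference; this discretisation and the name `flow_derivatives` are assumptions made for the illustration.

```python
import numpy as np

def flow_derivatives(frame0, frame1):
    """Return (I_x, I_y, I_t) for two consecutive intensity frames.

    Spatial derivatives are central differences averaged over both frames;
    the temporal derivative is the frame difference (unit time step).
    """
    f0 = np.asarray(frame0, dtype=np.float64)
    f1 = np.asarray(frame1, dtype=np.float64)
    Iy0, Ix0 = np.gradient(f0)    # axis 0 = rows (y), axis 1 = columns (x)
    Iy1, Ix1 = np.gradient(f1)
    return 0.5 * (Ix0 + Ix1), 0.5 * (Iy0 + Iy1), f1 - f0

# Synthetic check: a smooth pattern translated by (u, v) = (1, 0) pixel per frame.
y, x = np.mgrid[0:40, 0:40].astype(np.float64)
frame0 = np.sin(0.2 * x) + np.cos(0.15 * y)
frame1 = np.sin(0.2 * (x - 1.0)) + np.cos(0.15 * y)
Ix, Iy, It = flow_derivatives(frame0, frame1)
u, v = 1.0, 0.0
residual = Ix * u + Iy * v + It   # linearized constraint: close to 0 for the true flow
inner = (slice(2, -2), slice(2, -2))
```

Away from the image border the residual is about 1% of the size of I_t. This shows that the small-motion linearization holds well for a smooth pattern.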
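23. • Optical Flow – Lucas-Kanade Method
• Apply the constraint within a local patch
• Minimize using the least-squares method:
$E(u, v) = \sum_{x, y} \left( I_x(x, y, t)\, u + I_y(x, y, t)\, v + I_t(x, y, t) \right)^2$
$\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$
$A\mathbf{u} = \mathbf{b} \quad\Rightarrow\quad \mathbf{u} = (A^T A)^{-1} A^T \mathbf{b}$, where each row of $A$ is $[I_x \;\; I_y]$ at one pixel of the patch and $\mathbf{b}$ stacks the corresponding $-I_t$ values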
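A minimal sketch of the Lucas-Kanade least-squares step for a single patch, assuming NumPy and the same derivative discretisation as a plain finite-difference scheme. It builds A from [I_x, I_y] rows and b from −I_t, then solves the normal equations. The synthetic frames and the function names are illustrative only.

```python
import numpy as np

def flow_derivatives(frame0, frame1):
    """(I_x, I_y, I_t): central differences averaged over both frames, and frame difference."""
    f0 = np.asarray(frame0, dtype=np.float64)
    f1 = np.asarray(frame1, dtype=np.float64)
    Iy0, Ix0 = np.gradient(f0)   # axis 0 = rows (y), axis 1 = columns (x)
    Iy1, Ix1 = np.gradient(f1)
    return 0.5 * (Ix0 + Ix1), 0.5 * (Iy0 + Iy1), f1 - f0

def lucas_kanade_patch(Ix, Iy, It):
    """Least-squares flow (u, v) for one patch: A u = b with A = [I_x I_y], b = -I_t."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # one row [I_x, I_y] per pixel
    b = -It.ravel()
    # Normal equations (A^T A) u = A^T b; solve() avoids forming the inverse explicitly.
    u, v = np.linalg.solve(A.T @ A, A.T @ b)
    return float(u), float(v)

# Synthetic frames: a smooth pattern moved by (0.5, 0.3) pixels.
y, x = np.mgrid[0:40, 0:40].astype(np.float64)
frame0 = np.sin(0.2 * x) + np.cos(0.15 * y)
frame1 = np.sin(0.2 * (x - 0.5)) + np.cos(0.15 * (y - 0.3))
Ix, Iy, It = flow_derivatives(frame0, frame1)
patch = (slice(10, 30), slice(10, 30))   # one 20 x 20 local patch
u, v = lucas_kanade_patch(Ix[patch], Iy[patch], It[patch])
```

Solving the 2×2 system directly is numerically preferable to computing (AᵀA)⁻¹. The system is only well posed when the patch contains gradients in more than one direction, which is why the method is applied to textured patches.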
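24. • Optical Flow – Horn-Schunck Method
• Assumption: global smoothness of the flow over the whole image
Smoothness error: $E_s = \iint_D \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) dx\, dy$
Error in the brightness constancy equation: $E_c = \iint_D \left( I_x u + I_y v + I_t \right)^2 dx\, dy$
Minimize: $E_s + \lambda E_c$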
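A sketch of the classic iterative Horn-Schunck update, assuming NumPy. Minimizing E_s + λE_c is equivalent to minimizing E_c + α²E_s with α² = 1/λ, which is the form in which the update is usually written. The 4-neighbour average is a simplification of Horn and Schunck's original weighted neighbourhood average. `alpha` and `n_iter` are hypothetical defaults.

```python
import numpy as np

def flow_derivatives(frame0, frame1):
    """(I_x, I_y, I_t): central differences averaged over both frames, and frame difference."""
    f0 = np.asarray(frame0, dtype=np.float64)
    f1 = np.asarray(frame1, dtype=np.float64)
    Iy0, Ix0 = np.gradient(f0)   # axis 0 = rows (y), axis 1 = columns (x)
    Iy1, Ix1 = np.gradient(f1)
    return 0.5 * (Ix0 + Ix1), 0.5 * (Iy0 + Iy1), f1 - f0

def local_mean(f):
    """Average of the 4 neighbours, replicating values at the image border."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(frame0, frame1, alpha=0.1, n_iter=500):
    """Dense flow (u, v) by iterative minimisation of E_c + alpha**2 * E_s."""
    Ix, Iy, It = flow_derivatives(frame0, frame1)
    u = np.zeros_like(Ix)
    v = np.zeros_like(Ix)
    denom = alpha**2 + Ix**2 + Iy**2
    for _ in range(n_iter):
        u_bar, v_bar = local_mean(u), local_mean(v)
        # Correct the smoothed flow along the image gradient so that it
        # better satisfies I_x u + I_y v + I_t = 0.
        step = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * step
        v = v_bar - Iy * step
    return u, v
```

The neighbourhood averaging spreads flow into pixels whose own gradient is weak or unreliable. This is the property the notes rely on for gaps and holes in depth data.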
25. • Histogram of Oriented Optical Flow from Depth
• Binning according to:
– Primary angle between the flow vector and the horizontal axis: $\theta = \tan^{-1}\left(\frac{v}{u}\right)$
– Magnitude of the flow vector: $M = \sqrt{u^2 + v^2}$
[Figure: Orientation & magnitude images; histogram binning example with bin size = 4]
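A sketch of a HOOFD histogram for one flow field, assuming NumPy. Here each flow vector votes into an orientation bin with a weight equal to its magnitude M, and the histogram is normalised to sum to 1. The magnitude weighting, the use of `arctan2` over the full circle instead of the primary angle tan⁻¹(v/u), and the normalisation are all assumptions, not details confirmed by the slides.

```python
import numpy as np

def hoofd_histogram(u, v, n_bins=4):
    """Magnitude-weighted, L1-normalised histogram of flow orientations (sketch)."""
    u = np.asarray(u, dtype=np.float64).ravel()
    v = np.asarray(v, dtype=np.float64).ravel()
    theta = np.arctan2(v, u)   # angle to the horizontal axis, in (-pi, pi]
    mag = np.hypot(u, v)       # M = sqrt(u^2 + v^2)
    hist, _ = np.histogram(theta, bins=n_bins, range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    # A patch with no motion yields an all-zero histogram instead of dividing by 0.
    return hist / total if total > 0 else hist
```

With `n_bins=4` the bin edges are −π, −π/2, 0, π/2 and π, so a rightward, slightly upward flow falls in the third bin.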
26. • Signal Warping
– Longer action instances: frames are discarded
– Shorter action instances: frames are replicated and inserted
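A sketch of signal warping, assuming that every action instance is brought to a common length by sampling frame indices uniformly and rounding. Longer sequences then drop frames and shorter ones repeat frames. Uniform sampling and the name `warp_to_length` are assumptions; the slides do not say which frames are discarded or replicated.

```python
import numpy as np

def warp_to_length(frames, target_len):
    """Resample a sequence of per-frame features to exactly target_len frames.

    Frame indices are sampled uniformly over the original sequence and rounded,
    so longer sequences drop frames and shorter sequences repeat frames.
    """
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need a non-empty sequence and a positive target length")
    idx = np.round(np.linspace(0, n - 1, target_len)).astype(int)
    return [frames[i] for i in idx]
```

For example, a 10-frame instance warped to 5 frames keeps frames 0, 2, 4, 7 and 9. A 3-frame instance warped to 5 frames becomes 0, 0, 1, 2, 2.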
28. • Pyramidal HOOFD Features
– Histogram of Oriented Optical Flow from Depth
After obtaining the optical flow:
1. Patches are extracted around each joint
2. HOOFDs are calculated in a pyramidal fashion
[Figure: pyramid levels 1, 2 and 3]
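A sketch of the two steps on this slide, under several assumptions. Joint positions are taken to be already projected to (row, col) pixel coordinates. Patches are square and clipped at the image border. Pyramid level l splits a patch into a 2^(l−1) × 2^(l−1) grid, so levels 1, 2 and 3 give 1, 4 and 16 cells. Each cell contributes a magnitude-weighted orientation histogram. All function names, the patch size and the grid layout are hypothetical; the slides do not specify them.

```python
import numpy as np

def hoofd_histogram(u, v, n_bins=4):
    """Magnitude-weighted, L1-normalised histogram of flow orientations."""
    u = np.asarray(u, dtype=np.float64).ravel()
    v = np.asarray(v, dtype=np.float64).ravel()
    hist, _ = np.histogram(np.arctan2(v, u), bins=n_bins,
                           range=(-np.pi, np.pi), weights=np.hypot(u, v))
    total = hist.sum()
    return hist / total if total > 0 else hist

def extract_patch(field, center_rc, half_size):
    """Square patch of a 2D field around a (row, col) joint position, clipped at borders."""
    r, c = center_rc
    h, w = field.shape
    return field[max(r - half_size, 0):min(r + half_size, h),
                 max(c - half_size, 0):min(c + half_size, w)]

def pyramidal_hoofd(u, v, levels=3, n_bins=4):
    """Concatenate HOOFD histograms over a spatial pyramid of one flow patch."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    feats = []
    for level in range(1, levels + 1):
        k = 2 ** (level - 1)  # assumed: k x k grid of cells at this level
        for ru, rv in zip(np.array_split(u, k, axis=0), np.array_split(v, k, axis=0)):
            for cu, cv in zip(np.array_split(ru, k, axis=1), np.array_split(rv, k, axis=1)):
                feats.append(hoofd_histogram(cu, cv, n_bins))
    return np.concatenate(feats)

def joint_hoofd_descriptor(u, v, joint_pixels, half_size=8, levels=3, n_bins=4):
    """Pyramidal HOOFD of the patch around every joint, concatenated joint by joint."""
    return np.concatenate([
        pyramidal_hoofd(extract_patch(u, jp, half_size),
                        extract_patch(v, jp, half_size), levels, n_bins)
        for jp in joint_pixels
    ])
```

With 3 levels and 4 bins each joint contributes (1 + 4 + 16) × 4 = 84 values. Coarse levels capture the overall motion around the joint, and finer levels capture where in the patch it happens.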
31. • Supervised learning methods
– Training examples are labeled with known classes
• Example application: spam filtering in an e-mail client
– Example methods: Naive Bayes, Support Vector Machines
32. • Naive Bayes Classifier
– Assumes independence between features
• For example, a car may be classified as a ‘Volkswagen’ because it is red and has 17-inch wheels; each of these features contributes independently to the probability that the car is a ‘Volkswagen’
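A minimal Gaussian Naive Bayes sketch in NumPy, showing how the independence assumption turns the class likelihood into a sum of per-feature log-likelihoods. The Gaussian per-feature model and the small variance floor are assumptions; the slides do not say which likelihood was used. scikit-learn provides a comparable estimator as `sklearn.naive_bayes.GaussianNB`.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian likelihood per class and feature."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small floor keeps the variance positive for constant features.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        diff = X[:, None, :] - self.mean_[None, :, :]          # (N, C, D)
        # Independence assumption: the joint log-likelihood is a sum over features.
        log_lik = -0.5 * (np.log(2.0 * np.pi * self.var_)[None, :, :]
                          + diff**2 / self.var_[None, :, :]).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_[None, :], axis=1)]
```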
33. • Support Vector Machines
– Find the optimal (maximum-margin) hyperplane that defines the decision boundary between two classes
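A minimal linear SVM sketch in NumPy. It minimizes the regularized hinge loss λ/2·‖w‖² + mean(max(0, 1 − y(w·x + b))) by batch subgradient descent, which is one standard way of finding a maximum-margin hyperplane. The solver, learning rate and λ are assumptions for illustration; a library such as scikit-learn (`sklearn.svm.SVC`, `sklearn.svm.LinearSVC`) would normally be used.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via batch subgradient descent on the regularized hinge loss.

    Labels y must be in {-1, +1}.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1   # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_svm(X, w, b):
    """Class +1 on the positive side of the hyperplane w.x + b = 0, else -1."""
    return np.where(np.asarray(X, dtype=np.float64) @ w + b >= 0, 1, -1)
```

This is a two-class classifier. For multi-class datasets such as those used in the experiments, several binary SVMs would have to be combined, for example one-vs-rest.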
39. Experiment - 2
Settings
• Feature: HOOFD Features
• Dataset: MSR Action 3D
• Ratio: 50% Training 50% Test
[Figure: Smash action and Forward Punch action]
42. Experiment - 3
Settings
• Feature: HOOFD Features
• Dataset: MSR Action Pairs
• Ratio: 50% Training 50% Test
43. Conclusion & Future Work
• We developed a novel human action recognition framework by fusing 3D joint information and HOOFD features
• We proposed a new feature called Histogram of Oriented Optical Flow from Depth (HOOFD)
• Several experiments were conducted on publicly available datasets to assess the performance of the proposed technique
• Comparison with state-of-the-art algorithms shows the success of our algorithm
• As future work,
– The potential of HOOFD will be explored more fully
– Other popular classification approaches will be employed (Bag of Words, Random Forest, Boosted Trees)
Brightness constancy: the brightness values of individual pixels in a local patch are preserved as the patch moves.
By linearizing the equation around I(x,y,t) using a Taylor series expansion, we obtain the second equation.
Although we assume that this expression equals 0, in practice it does not.
We then discretize the equation and apply it within a local patch, which gives this cost function.
Minimizing this function with least squares yields the optical flow vectors.
However, the literature also offers another method, proposed by Horn and Schunck, which introduces a global smoothness constraint over the whole image.
This method is useful for correcting errors caused by gaps and holes in the depth data.
Smoothness is introduced by minimizing the spatial variation of the velocities, i.e., the gradients of the optical flow vectors.
HON4D: to make the descriptors more discriminative, the authors quantized the 4D space using the vertices of a polychoron
Dictionary Learning – Group Sparsity Geometric Constraint with Temporal Pyramid Matching