Human Action Recognition Using 3D Joint Information and HOOFD Features

Human Action Recognition Using 3D Joint Information and Pyramidal HOOFD Features
MSc Thesis by Barış Can Üstündağ
Thesis Advisor: Prof. Dr. Mustafa Ünel

Outline
• Introduction to Human Action Recognition
  – Motivation, Applications
  – Related Work
• Human Action Recognition Using 3D Joint Information and HOOFD Features
  – Acquiring Depth Data
  – Feature Extraction
    • 3D Joints
    • HOOFD
  – Feature Representation
  – Classification
• Experiments
  – Datasets
    • MSR Action 3D Dataset
    • MSR Action Pairs Dataset
    • MSRC-12 Gesture Dataset
• Conclusions & Future Work

Motivation
• Motion Perception
  – Gunnar Johansson [1971]
    • Sequence of images for human motion analysis
    • 'Moving Light Displays' enable identification of people and their gender
• Motion Capture [2014]
  – Dawn of the Planet of the Apes

Motivation
• Vast amount of data
  – YouTube
    • More than 34K hours of video uploaded every day
  – Surveillance cameras
    • ~30M cameras in the US
    • ~700K video hours every day

Motivation
• Video Categorization
  – Movies, TV, YouTube

Motivation
• How many human-pixels are there?
  – Movies: 35%
  – TV: 34%
  – YouTube: 40%

Motivation – Application
• Rehabilitation
  – 15M people suffer a stroke every year
  – Automated systems
  – Gamification

Motivation – Why depth?
• Release of low-cost depth cameras
  – Kinect (2010)
  – Google Tango (developers only, 2014)
  – Leap Motion (2013)
• Effective and robust performance despite
  – Complex backgrounds
  – Challenging viewpoints
  – Occlusions

Related Work
• Intensity Based
  – Extraction of Cuboids
  – Motion History Images, Motion Energy Images
• Depth Map Based
  – Depth Motion Maps
  – Histogram of Oriented 4D Normals
• Skeletal Data Based
  – SMIJ – Sequence of Most Informative Joints
  – HOJ3D – Histogram of 3D Joint Locations

Related Work – Intensity Based
• Extraction of Cuboids, Dollár et al. [CVPR, 2005]
• Motion History Images, Motion Energy Images, Gorelick et al. [PAMI, 2007]

Related Work – Depth Map Based
• Histogram of Oriented 4D Normals (HON4D), Oreifej et al. [CVPR, 2013]
• Depth Motion Maps, Yang et al. [JRTIP, 2012]

Related Work – Skeletal Data Based
• Sequence of Most Informative Joints (SMIJ), Ofli et al. [CVIU, 2013]
• View Invariant Human Action Recognition Using Histograms of 3D Joints, Xia et al. [CVPR, 2012]

Human Action Recognition Using 3D Joint Information and HOOFD Features
• Acquiring Depth Data
  – Depth acquisition
  – Formation of shadows
  – Eliminating the noise
• Feature Extraction
  – 3D Joints
  – HOOFD
• Feature Representation
  – Signal Warping
  – Pyramidal HOOFD Features
• Classification
  – Naive Bayes
  – Support Vector Machines

Acquiring Depth Data
• Kinect
  – Depth data is acquired using the 'Light Coding' method
• Before the depth data can be processed in any application, two issues must be handled:
  – Formation of shadows
  – Eliminating the noise

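Acquiring Depth Data
• Shadows
  – Generated by foreground objects
• Noise
  – Appears as rough object boundaries and as gaps and holes in the depth data
• Bilateral Filter
  – Weights each neighbour q of pixel p by a space term (distance from p) and a range term (depth difference from p):
$$BF[I]_p = \frac{1}{W_p} \sum_{q \in S} \underbrace{G_{\sigma_s}(\lVert p - q \rVert)}_{\text{space term}} \, \underbrace{G_{\sigma_r}(\lvert I_p - I_q \rvert)}_{\text{range term}} \, I_q, \qquad W_p = \sum_{q \in S} G_{\sigma_s}(\lVert p - q \rVert)\, G_{\sigma_r}(\lvert I_p - I_q \rvert)$$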
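A minimal NumPy sketch of a standard bilateral filter applied to a depth map: each output pixel is a normalised average of its neighbours, weighted by a Gaussian space term on pixel distance and a Gaussian range term on depth difference. The window radius and the sigma values (sigma_r in depth units, e.g. millimetres) are illustrative assumptions, not parameters from the thesis, and missing-depth pixels (zeros) are not treated specially here.

```python
import numpy as np


def bilateral_filter(depth, radius=2, sigma_s=2.0, sigma_r=30.0):
    """Edge-preserving smoothing of a 2D depth map (naive loop over window offsets).

    Output at p = sum_q G_s(||p - q||) * G_r(|I_p - I_q|) * I_q / W_p, where q ranges
    over a (2*radius+1)^2 window and W_p is the sum of the weights.
    All default parameter values are illustrative.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    padded = np.pad(depth, radius, mode="edge")  # replicate border pixels
    acc = np.zeros_like(depth)
    w_sum = np.zeros_like(depth)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # neighbour value I_q at offset (dy, dx), for every pixel p at once
            nb = padded[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            g_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))  # space term
            g_range = np.exp(-((nb - depth) ** 2) / (2.0 * sigma_r ** 2))  # range term
            acc += g_space * g_range * nb
            w_sum += g_space * g_range
    return acc / w_sum  # the centre weight is 1, so W_p > 0
```

Because the range term collapses for large depth jumps, a person's silhouette against the background stays sharp while small depth noise on flat surfaces is averaged out. For full frames, OpenCV's `cv2.bilateralFilter` on a float32 image is a faster alternative to this loop.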

• Joint Features
  – 20 joints are provided by the Kinect SDK
  – 10 joint angles and their derivatives are calculated
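The slides do not list which 10 angles are used, so the sketch below takes the joint triplets as a hypothetical input. Each angle is measured at the middle joint of a triplet (e.g. the elbow between shoulder and wrist), and its derivative is approximated by a frame-to-frame difference; both choices are assumptions.

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle (radians) at joint b between segments b->a and b->c; inputs shaped (..., 3)."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.sum(ba * bc, axis=-1) / (
        np.linalg.norm(ba, axis=-1) * np.linalg.norm(bc, axis=-1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding outside [-1, 1]


def angle_features(seq, triplets):
    """seq: (T, 20, 3) joint positions; triplets: hypothetical list of (i, j, k) joint indices.

    Returns (T, 2 * len(triplets)): the angles, followed by their frame-to-frame
    derivatives (the derivative of the first frame is set to 0).
    """
    seq = np.asarray(seq, dtype=float)
    angles = np.stack(
        [joint_angle(seq[:, i], seq[:, j], seq[:, k]) for i, j, k in triplets], axis=1
    )
    deriv = np.diff(angles, axis=0, prepend=angles[:1])
    return np.hstack([angles, deriv])
```

With 10 triplets this yields a 20-dimensional vector per frame. A zero-length segment (two coincident joints) would give NaN, so tracked-joint dropouts would need handling first.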
Feature Extraction
• Joint Features
  – Mapped to spherical coordinates
  – Origin is aligned to the hip center
  – Radius parameter is discarded
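A minimal sketch of this mapping: positions are translated so that the hip center becomes the origin, converted to spherical coordinates, and the radius is dropped, leaving two angles per joint. The hip-center index (0, as in the Kinect SDK v1 joint ordering) and the choice of measuring azimuth in the x-y plane and elevation from it are assumptions; other axis conventions work the same way.

```python
import numpy as np


def to_spherical_angles(joints, hip_index=0):
    """joints: (T, J, 3) camera-space positions.

    Returns (T, J, 2) = (azimuth, elevation) of each joint relative to the hip
    center; the radius is discarded. The hip joint itself maps to (0, 0).
    """
    joints = np.asarray(joints, dtype=float)
    rel = joints - joints[:, hip_index:hip_index + 1, :]  # move origin to the hip center
    x, y, z = rel[..., 0], rel[..., 1], rel[..., 2]
    azimuth = np.arctan2(y, x)                 # angle within the x-y plane
    elevation = np.arctan2(z, np.hypot(x, y))  # angle out of the x-y plane
    return np.stack([azimuth, elevation], axis=-1)
```

Dropping the radius removes the dependence on limb lengths, so two subjects of different height performing the same pose produce similar angle pairs.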

Feature Extraction
• Histogram of Oriented Optical Flows from Depth (HOOFD)
  – Optical Flow from Depth Data
    • Depth data is mapped to an intensity image
    • Depth values (z) are represented as intensities (I)
    • The resulting optical flow field is invariant to sudden changes in brightness, since depth measurements are largely independent of the scene's lighting
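A minimal sketch of the depth-to-intensity mapping, assuming depth in millimetres is scaled linearly into 8-bit grey levels. The fixed range d_min = 500 mm, d_max = 4500 mm is a hypothetical choice. Using one range for all frames, rather than per-frame min/max normalisation, keeps a static surface at the same intensity over time, which is what the brightness constancy assumption of optical flow relies on.

```python
import numpy as np


def depth_to_intensity(depth, d_min=500.0, d_max=4500.0):
    """Linearly map depth z (mm) to 8-bit intensity I, using a range shared by all frames."""
    depth = np.asarray(depth, dtype=float)
    img = (depth - d_min) / (d_max - d_min) * 255.0
    img = np.clip(img, 0.0, 255.0)
    img[depth <= 0] = 0.0  # Kinect reports missing depth as 0; keep it at 0
    return np.round(img).astype(np.uint8)
```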

Histogram of Oriented Optical Flows from Depth (HOOFD)
- Optical Flow
• 2D displacement of pixel patches on the image plane
• Brightness Constancy Equation
$$I(x, y, t) = I(x + \delta x,\; y + \delta y,\; t + \delta t)$$
• Linearizing with a Taylor series expansion, assuming small (u, v):
$$I_x(x, y, t)\,u(x, y) + I_y(x, y, t)\,v(x, y) + I_t(x, y, t) = 0, \qquad u(x, y) = \frac{\delta x}{\delta t}, \quad v(x, y) = \frac{\delta y}{\delta t}$$

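• Optical Flow – Lucas–Kanade Method
  – Apply the constraint within a local patch, assuming (u, v) is constant over the patch
  – Minimize using the least-squares method:
$$E(u, v) = \sum_{x, y} \big( I_x(x, y, t)\,u + I_y(x, y, t)\,v + I_t(x, y, t) \big)^2$$
  – Stacking one constraint per patch pixel gives $A\mathbf{u} = \mathbf{b}$, where each row of $A$ is $[I_x \;\; I_y]$, $\mathbf{b}$ stacks the values $-I_t$, and $\mathbf{u} = [u \;\; v]^T$. The least-squares solution is $\mathbf{u} = (A^T A)^{-1} A^T \mathbf{b}$, i.e.
$$\begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$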
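A minimal NumPy sketch of Lucas–Kanade for a single patch: it estimates one (u, v) from two frames by solving the stacked constraints I_x u + I_y v = -I_t in the least-squares sense. Using central differences on the average of the two frames for I_x, I_y and a plain frame difference for I_t are assumptions of this sketch; practical implementations add Gaussian weighting, image pyramids, and iterative refinement.

```python
import numpy as np


def lucas_kanade_patch(I1, I2):
    """Estimate a single flow vector (u, v) for a patch observed in two frames."""
    I1 = np.asarray(I1, dtype=float)
    I2 = np.asarray(I2, dtype=float)
    Iy, Ix = np.gradient((I1 + I2) / 2.0)  # spatial derivatives (axis 0 = y, axis 1 = x)
    It = I2 - I1                           # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # n x 2, one row [I_x, I_y] per pixel
    b = -It.ravel()                                  # n entries of -I_t
    # least-squares solution u = (A^T A)^{-1} A^T b, computed without an explicit inverse
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv  # array([u, v])
```

For HOOFD, I1 and I2 would be consecutive depth-derived intensity images cropped to the same patch. If the patch has gradients in only one direction, A^T A is (near) singular and the estimate becomes unreliable.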

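• Optical Flow – Horn–Schunck Method
  – Assumption: the flow is globally smooth over the whole image
  – Smoothness error:
$$E_s = \iint_D \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) dx\, dy$$
  – Error in the brightness constancy equation:
$$E_c = \iint_D \left( I_x u + I_y v + I_t \right)^2 dx\, dy$$
  – Minimize: $E_s + \lambda E_c$, where $\lambda$ weights brightness constancy against smoothness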

• Histogram of Oriented Optical Flow from Depth
  – Binning according to:
    • Primary angle between the flow vector and the horizontal axis: $\theta = \tan^{-1}\left(\frac{v}{u}\right)$
    • Magnitude of the flow vector: $M = \sqrt{u^2 + v^2}$
  – Orientation & Magnitude images
[Figure: histogram binning example with bin size = 4]
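A minimal sketch of magnitude-weighted orientation binning for a flow field (u, v). It assumes the bins split the full circle [0, 2π) evenly, using atan2 (the quadrant-aware form of tan⁻¹(v/u)), and that each flow vector votes with its magnitude M. The exact HOOFD binning is not spelled out in the slides, and other schemes, such as bins symmetric about the vertical axis, are possible.

```python
import numpy as np


def hoofd_histogram(u, v, n_bins=4):
    """Magnitude-weighted histogram of flow orientations, normalised to sum to 1."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    theta = np.mod(np.arctan2(v, u), 2 * np.pi)  # orientation in [0, 2*pi)
    mag = np.hypot(u, v)                         # M = sqrt(u^2 + v^2)
    # bin index; the minimum guards against theta rounding up to exactly 2*pi
    bins = np.minimum((theta / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Normalising by the total magnitude makes the histogram describe the distribution of motion directions rather than the speed of the movement.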

Feature Representation
• Signal Warping: action instances are brought to a common length
  – Longer action instance -> discard frames
  – Shorter action instance -> replicate and insert frames
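A minimal sketch of signal warping, assuming frames are kept or repeated at uniformly spaced positions so that every sequence ends up with the same number of frames. Which frames the thesis discards or replicates, and the target length, are not specified in the slides, so both are assumptions here.

```python
import numpy as np


def warp_to_length(frames, target_len):
    """Resample a (T, ...) sequence to target_len frames by uniform index sampling.

    Longer sequences drop frames; shorter ones repeat frames. The first and last
    frames are always kept.
    """
    frames = np.asarray(frames)
    T = frames.shape[0]
    idx = np.round(np.linspace(0, T - 1, target_len)).astype(int)
    return frames[idx]
```

For example, warping a 3-frame sequence to 6 frames repeats each frame twice, while warping a 10-frame sequence to 5 frames keeps roughly every second frame.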

Feature Representation
• Pyramidal HOOFD Features
  – Histogram of Oriented Optical Flow from Depth
After the optical flow is computed:
1. Patches are extracted around each joint
2. HOOFDs are calculated in a pyramidal fashion
[Figure: pyramid levels 1, 2, and 3]
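A sketch of pyramidal HOOFD for one joint, under stated assumptions: the square patch size (half = 16 pixels), 8 orientation bins, and the split of pyramid level ℓ into a 2^(ℓ-1) × 2^(ℓ-1) grid of cells are hypothetical choices, not taken from the thesis. The joint position is assumed to be already projected to pixel coordinates (row, col), and flow outside the image is treated as zero.

```python
import numpy as np


def flow_histogram(u, v, n_bins=8):
    """Magnitude-weighted orientation histogram of one cell, normalised to sum to 1."""
    theta = np.mod(np.arctan2(v, u), 2 * np.pi).ravel()
    mag = np.hypot(u, v).ravel()
    bins = np.minimum((theta * n_bins / (2 * np.pi)).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=mag, minlength=n_bins)
    return hist / hist.sum() if hist.sum() > 0 else hist


def joint_patch(field, center, half=16):
    """Crop a (2*half x 2*half) window of a flow component centred on pixel (row, col)."""
    r, c = center
    padded = np.pad(field, half, mode="constant")  # zero flow outside the image
    return padded[r:r + 2 * half, c:c + 2 * half]


def pyramidal_hoofd(u, v, center, levels=3, n_bins=8, half=16):
    """Concatenate cell histograms over pyramid levels 1..levels for one joint patch."""
    pu, pv = joint_patch(u, center, half), joint_patch(v, center, half)
    feats = []
    for level in range(levels):
        cells = 2 ** level  # 1x1, 2x2, 4x4 grid of cells (assumed layout)
        size = pu.shape[0] // cells
        for i in range(cells):
            for j in range(cells):
                sl = (slice(i * size, (i + 1) * size), slice(j * size, (j + 1) * size))
                feats.append(flow_histogram(pu[sl], pv[sl], n_bins))
    return np.concatenate(feats)
```

With these settings each joint yields (1 + 4 + 16) × 8 = 168 values. A frame descriptor could then concatenate all joints, e.g. `np.concatenate([pyramidal_hoofd(u, v, rc) for rc in joint_pixels])`, where `joint_pixels` is a hypothetical list of projected joint positions.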

Classification
• Supervised learning methods
  – Training examples are labeled with known classes
    • e.g. spam filtering in an e-mail client
  – Examples: Naive Bayes, Support Vector Machines

Classification
• Naive Bayes Classifier
  – Assumes the features are independent of each other given the class
    • For example: a car's red color and its 17-inch wheels each contribute independently to the probability that the car is a 'Volkswagen'
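The slides do not state which likelihood model is used, so the sketch below assumes Gaussian class-conditional densities for each feature (Gaussian Naive Bayes). The independence assumption is what allows the per-feature log-likelihoods to be summed into a class score.

```python
import numpy as np


class GaussianNaiveBayes:
    """Naive Bayes with per-class, per-feature Gaussian likelihoods."""

    def fit(self, X, y, var_eps=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # small epsilon keeps the variance positive for constant features
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_eps
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)[:, None, :]  # (n, 1, d) against (C, d)
        # log p(x | c) = sum_d log N(x_d; mean_cd, var_cd): independence turns the product into a sum
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * self.var_) + (X - self.mean_) ** 2 / self.var_, axis=2
        )
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]
```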

Classification
• Support Vector Machines
  – Find the optimal hyperplane, i.e. the one with the maximum margin, as the decision boundary between two classes
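A minimal sketch of a linear soft-margin SVM trained with Pegasos-style stochastic sub-gradient descent in plain NumPy. This is one standard way to train an SVM, not necessarily the solver or kernel used in the thesis; lam and epochs are illustrative. For multi-class problems such as the 20 MSR Action 3D actions, binary SVMs are typically combined one-vs-rest or one-vs-one.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w . x_i)) with labels y_i in {-1, +1}.

    A constant 1 is appended to every x so the last weight acts as the bias
    (this also regularises the bias, a common simplification).
    """
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # decreasing Pegasos step size
            violated = y[i] * (X[i] @ w) < 1   # inside the margin or misclassified
            w *= 1.0 - eta * lam               # sub-gradient step for the regulariser
            if violated:
                w += eta * y[i] * X[i]         # sub-gradient step for the hinge loss
    return w[:-1], w[-1]


def predict_svm(X, w, b):
    """Sign of the decision function w . x + b."""
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)
```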


Experiments
• Datasets
  – MSR Action 3D
    • 10 subjects
    • 20 actions
  – MSR Action Pairs
    • 10 subjects
    • 12 actions
  – MSRC-12 Gesture
    • 30 subjects
    • 12 actions

Experiment - 1
Settings
• Dataset: MSRC-12 Gesture
• Feature: Joint Features
• Training/test splits:
  – Leave-one-subject-out cross-validation
  – 50% training, 50% test
  – 75% training, 25% test
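A minimal sketch of leave-one-subject-out splitting: each fold holds out every sample of one subject for testing and trains on all other subjects, so the classifier is always evaluated on a person it has not seen. The per-sample subject_ids array is an assumed input.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (subject, train_idx, test_idx) for each distinct subject."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == s)   # all samples of the held-out subject
        train = np.flatnonzero(subject_ids != s)  # all samples of the other subjects
        yield s, train, test
```

For MSRC-12 this gives 30 folds, one per subject; the reported accuracy would be the average over folds.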

Experiment - 2
Settings
• Feature: HOOFD Features
• Dataset: MSR Action 3D
• Ratio: 50% Training 50% Test

Experiment - 2
Settings
• Dataset: MSR Action 3D
[Figure: Smash action and Forward Punch action]

Experiment - 3
Settings
• Dataset: MSR Action Pairs

Conclusion & Future Work
• We developed a novel human action recognition framework that fuses 3D joint information and HOOFD features
• We proposed a new feature, the Histogram of Oriented Optical Flow from Depth (HOOFD)
• Several experiments on publicly available datasets were conducted to assess the performance of the proposed technique
• Comparisons with state-of-the-art algorithms show the effectiveness of our algorithm
• As future work,
  – The potential of HOOFD will be explored further
  – Other popular classification approaches will be employed (Bag of Words, Random Forests, Boosted Trees)
