2. Overview:
1. Video
2. Video structure
3. Video processing general block diagram
4. Features used for representation of video frame
5. Description of these features
6. Application: Video Surveillance- Fall Detection
7. Comparison of efficiency of these features
8. Conclusion and Future Work
9. References
3/31/2016
MKSSS's Cummins College of Engg. for Women (E&TC
Department), Pune
3. Video
•Video is a rich information source that includes:
i. Frames (individual images): each frame typically lasts 1/25 or 1/30 of a second.
ii. Shot: a sequence of similar frames covering a single event; the elementary video unit. Shots are linked by transitions (cuts, fades, dissolves, wipes).
iii. Clip / Scene: a sequence of shots that are consecutive in time, space, and action. It has its own changes in color, shape, and motion of both camera and objects, as well as in acquisition (shot angles, camera motion).
iv. Episode: consecutive scenes. Each type of video has its own characteristics depending on the application (commercials, news, weather, sports).
4. Video Structure
Figure 1. Video structure
5. Video processing
General block diagram of video processing includes:
i. Video input
ii. Pre-processing
iii. Feature extraction
iv. Event modeling
v. Classification
vi. Output or results
Video input → Pre-processing → Feature extraction → Event model → Classification → Output or result
Figure 2. Basic block diagram of video processing
6. Features used for representation of video frame
Almost all feature extraction algorithms reduce the large dimensionality of
the video domain by extracting a small number of features from one or
more regions of interest in each video frame. Such features include the
following:
1. Luminance/dominant color
2. Luminance/color histogram
3. Image edges
4. Features in transform domain
5. Image motion
i. Motion detection
ii. Area based
iii. Differential approach
iv. Optical flow
6. Spatial domain for feature extraction
i. Single pixel
ii. Rectangular block
iii. Arbitrarily shaped region
iv. Whole frame
7. Features used for representation of video frame (cont.)
7. Low-level
i. Edge detection
ii. Corner detection
iii. Blob detection
iv. Ridge detection
v. Scale-invariant feature transform
8. Curvature
i. Edge direction
ii. Changing intensity
iii. Autocorrelation
8. Features used for representation of video frame (cont.)
9. Shape based
i. Thresholding
ii. Blob extraction
iii. Template matching
iv. Hough transform
•Lines
•Circles/ellipses
•Arbitrary shapes (generalized Hough transform)
•Works with any parameterizable feature (class variables, cluster detection, etc.)
10. Flexible methods
i. Deformable
ii. parameterized shapes
iii. Active contours (snakes)
9. 1. Luminance/color
• Simplest feature
• Characterizes an image by its average grayscale luminance
• Susceptible to changes in illumination
• A more robust choice is to use one or more statistics of the
values in a suitable color space
2. Luminance/grayscale/color histogram
• Richer feature
• Discriminator
• Easy to compute
• Insensitive to translational, rotational, and zooming camera
motions
• Does not represent the spatial distribution of color in an image
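As a minimal sketch of the two features above, the following computes the average luminance and a coarse grayscale histogram of a tiny frame. The frame values, the bin count, and the function names are illustrative assumptions, not taken from the slides. Mirroring the frame leaves the histogram unchanged, while a brightness shift (standing in for an illumination change) alters both features.

```python
def mean_luminance(frame):
    """Average grayscale value of a frame given as a list of rows."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def gray_histogram(frame, bins=4, levels=256):
    """Count pixels per bin after splitting [0, levels) into equal bins."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            hist[p * bins // levels] += 1
    return hist


# Hypothetical 2x3 grayscale frame (values in 0..255).
frame = [
    [10, 10, 200],
    [10, 90, 200],
]
# Mirroring moves every pixel but keeps the same set of values.
mirrored = [row[::-1] for row in frame]
# A uniform brightness shift stands in for an illumination change.
brighter = [[min(p + 60, 255) for p in row] for row in frame]

print(gray_histogram(frame), gray_histogram(mirrored))
print(mean_luminance(frame), mean_luminance(brighter))
```

The histogram discards pixel positions entirely, which is why it tolerates camera motion but cannot describe the spatial distribution of color.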
10. 3. Image edges
• Sufficiently invariant to illumination changes and several types
of motion
• It is related to the human visual perception of a scene
• Disadvantages are computational cost, noise sensitivity, and, when not post-processed, high dimensionality
4. Features in transform domain
• Transformations lead to representations in lower dimensions
• Such as discrete Fourier transform, discrete cosine transform
and wavelets
• Disadvantages include high computational cost, effects of
blocking while computing the transform domain coefficients,
and loss of information caused by retaining only a few
coefficients
11. 5. Motion feature
• Used as a feature for detecting shot transitions
• But it is usually coupled with other features, since motion itself
can be highly discontinuous within a shot (when motion
changes abruptly)
• Not useful when there is no motion in the video
6. Spatial domain for feature extraction
• The size of the region from which individual features are
extracted plays an important role in the overall performance
• Small region tends to reduce detection invariance with respect
to motion
• Large region might lead to missed transitions between similar
shots
12. 6.i. Single pixel
• Derive a feature for each pixel such as luminance and edge
strength
• Feature vector of very large dimension
• Very sensitive to motion, unless motion compensation is
subsequently performed
6.ii. Rectangular block
• Segment each frame into equal-sized blocks and extract a set
of features
• Such as average color or orientation, color histogram
• Invariant to small motion of camera and object
• Adequate discriminator for shot boundary detection
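The rectangular-block idea can be sketched as follows: the frame is cut into equal-sized blocks and one feature (here the mean value, as an assumed stand-in for average color) is kept per block. The frame contents and the function name are hypothetical. Moving a pixel inside its block leaves the block features unchanged, which illustrates the invariance to small motion.

```python
def block_means(frame, block_h, block_w):
    """Return the mean value of each block, in row-major block order."""
    rows, cols = len(frame), len(frame[0])
    means = []
    for top in range(0, rows, block_h):
        for left in range(0, cols, block_w):
            block = [
                frame[r][c]
                for r in range(top, min(top + block_h, rows))
                for c in range(left, min(left + block_w, cols))
            ]
            means.append(sum(block) / len(block))
    return means


# Hypothetical 4x4 grayscale frame.
frame = [
    [0, 0, 8, 8],
    [0, 4, 8, 8],
    [2, 2, 6, 6],
    [2, 2, 6, 10],
]
# Small motion: the bright pixel moves within the top-left block.
shifted = [row[:] for row in frame]
shifted[0][0], shifted[1][1] = frame[1][1], frame[0][0]

print(block_means(frame, 2, 2))
print(block_means(shifted, 2, 2))
```

A per-pixel feature for the same frame would have 16 entries and would differ between the two frames; the block feature has 4 entries and does not.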
13. 6.iii. Arbitrarily shaped region
• Applied to arbitrarily shaped and sized regions in a frame,
derived by spatial segmentation
• Based on the most homogeneous regions, facilitates a better
detection of temporal discontinuities
• Disadvantage is the high computational complexity and
instability of region segmentation
6.iv. Whole frame
• Extract features (e.g., histograms) from the whole frame
• Advantage of being robust with respect to motion within a
shot
• But tend to have poor performance at detecting the change
between two similar shots
14. Low-Level Features
•The most common image features used in the literature are: color,
texture, and object shape (spatial layout)
•Their use can be managed easily with feature vectors and a similarity/distance measure
1. Color Feature[3]
• Inherently imprecise: the same semantic content can be described differently by different color quantizations and/or because of the uncertainty of human perception.
• Independent of image size and orientation.
• One of the most straightforward features used by humans for visual recognition and discrimination.
• Statistically, it denotes the joint probability of the intensities of the three color channels.
15. Color Feature Extraction Models
The extraction of the color features for each of the methods is performed
in the HSV (hue, saturation and value) perceptual color space, where
Euclidean distance corresponds to the human visual system’s notion of
distance or similarity between colors.
i. The Conventional Color Histogram (CCH)[3]
• Indicates the frequency of occurrence of every color in the image
• Equivalent to the probability mass function of the image intensities
• The CCH can be represented as $h_{A,B,C}(a,b,c) = N \cdot \mathrm{Prob}(A=a, B=b, C=c)$
• where A, B and C are the three color channels and N is the number of pixels in the image
• Computationally, it is constructed by counting the number of pixels of each color (in the quantized color space)
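A minimal sketch of the counting step for the conventional color histogram is given below. It assumes pixels are 3-channel tuples with values in 0..255 and quantizes each channel into two levels; the slides work in HSV, so the channel meaning, the quantization, and the function names here are illustrative assumptions only. The result is normalized by the number of pixels, so each entry is the fraction of pixels with that quantized color.

```python
from collections import Counter


def quantize(pixel, levels_per_channel=2, max_value=256):
    """Map a 3-channel pixel to a coarse level index per channel."""
    return tuple(v * levels_per_channel // max_value for v in pixel)


def conventional_color_histogram(image, levels_per_channel=2):
    """Return {quantized color: fraction of pixels with that color}."""
    pixels = [p for row in image for p in row]
    counts = Counter(quantize(p, levels_per_channel) for p in pixels)
    n = len(pixels)
    return {color: count / n for color, count in counts.items()}


# Hypothetical 2x2 image with three similar pixels and one different pixel.
image = [
    [(250, 10, 10), (240, 20, 5)],
    [(10, 10, 250), (200, 30, 30)],
]
cch = conventional_color_histogram(image)
print(cch)
```

Multiplying each fraction by the number of pixels recovers the raw counts. Note that nearby colors falling into different bins are treated as unrelated, which is the "no color similarity" limitation of this model.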
16. ii. The Color Correlogram (CC)[3]
• Expresses how the spatial correlation of pairs of colors changes with distance
• Defined as a table indexed by color pairs, where the d-th entry at location (i,j) is computed by counting the number of pixels of color j at a distance d from a pixel of color i in the image, divided by the total number of pixels in the image
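The definition above can be sketched directly as nested loops. The slides do not say which distance measure is used, so this example assumes the chessboard distance max(|dx|, |dy|); the image holds already-quantized color indices, and all names are hypothetical.

```python
from collections import defaultdict


def color_correlogram(image, distances):
    """Return {(i, j, d): count / total_pixels} for quantized color indices.

    count is the number of ordered pixel pairs (p, q) with color(p) == i,
    color(q) == j and chessboard distance between p and q equal to d
    (the distance measure is an assumption, not taken from the slides).
    """
    rows, cols = len(image), len(image[0])
    total = rows * cols
    counts = defaultdict(int)
    for r in range(rows):
        for c in range(cols):
            for d in distances:
                for dr in range(-d, d + 1):
                    for dc in range(-d, d + 1):
                        if max(abs(dr), abs(dc)) != d:
                            continue  # keep only the ring at distance d
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            counts[(image[r][c], image[rr][cc], d)] += 1
    return {key: n / total for key, n in counts.items()}


# Hypothetical 2x3 image with two quantized colors.
image = [
    [0, 0, 1],
    [0, 1, 1],
]
cc = color_correlogram(image, distances=[1])
print(cc)
```

Every pixel is compared with a whole ring of neighbors for every distance, which is why this model is much slower to compute than a plain histogram.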
17. iii. The Fuzzy Color Histogram (FCH)[3]
• Here a pixel belongs to all histogram bins, with a different degree of membership to each bin.
• Given a color space with K color bins, the FCH of an image I is defined as $F(I) = [f_1, f_2, \ldots, f_K]$, where $f_i = \frac{1}{N}\sum_{j=1}^{N} \mu_{ij}$, N is the number of pixels in the image and $\mu_{ij}$ is the membership value of the j-th pixel to the i-th color bin. The membership $\mu_{ij}$ is computed from $d_{ij}$, the Euclidean distance between the color of pixel j and the i-th color bin, and ς, the average distance between the colors in the quantized color space.
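The sketch below illustrates the fuzzy histogram: every pixel contributes to every bin, and each bin value is the mean membership over all pixels. The membership function used here, 1 / (1 + d/ς), is only an ASSUMED example that decreases with distance; it is not the expression from the cited paper. Bin centers, pixel values, and function names are likewise hypothetical.

```python
import math


def fuzzy_color_histogram(pixels, bin_centers):
    """Return [f_1, ..., f_K]: mean membership of all pixels in each bin."""
    k = len(bin_centers)
    # sigma: average distance between distinct bin colors (needs K >= 2).
    pair_dists = [
        math.dist(bin_centers[a], bin_centers[b])
        for a in range(k)
        for b in range(a + 1, k)
    ]
    sigma = sum(pair_dists) / len(pair_dists)
    hist = [0.0] * k
    for color in pixels:
        for i, center in enumerate(bin_centers):
            d = math.dist(color, center)
            # ASSUMED membership function; not taken from the slides.
            hist[i] += 1.0 / (1.0 + d / sigma)
    return [h / len(pixels) for h in hist]


# Two hypothetical bin colors and two single-pixel "images".
centers = [(0, 0, 0), (200, 200, 200)]
on_first_bin = fuzzy_color_histogram([(0, 0, 0)], centers)
halfway = fuzzy_color_histogram([(100, 100, 100)], centers)
print(on_first_bin, halfway)
```

A pixel halfway between two bin colors contributes equally to both bins, so a small shift in color changes the histogram gradually instead of moving the whole pixel to another bin.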
18. iv. The Color/Shape-Based Method (CSBM)[3]
• Here a quantized color image I′ is obtained from the original image I by quantizing its pixel colors.
• A connected region having pixels of identical color is regarded as
an object. The area of each object is encoded as the number of
pixels in the object.
• Further, the shape of an object is characterized by ‘perimeter
intercepted lengths’ (PILs), obtained by intercepting the object
perimeter with eight line segments having eight different
orientations and passing through the object center.
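The object and area steps of this method can be sketched with a breadth-first search over a quantized image. This example assumes 4-connectivity (the slides do not specify the connectivity) and stops at the area; the perimeter intercepted lengths are not computed. The image and function name are hypothetical.

```python
from collections import deque


def region_areas(quantized):
    """Label 4-connected regions of identical color; return [(color, area)]."""
    rows, cols = len(quantized), len(quantized[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color, area = quantized[r][c], 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1  # area is encoded as the number of pixels
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and quantized[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((color, area))
    return regions


# Hypothetical quantized image with three color indices.
quantized = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 0],
]
print(region_areas(quantized))
```

The isolated pixel of color 0 in the bottom-right corner becomes its own object, which hints at why the method is sensitive to clutter and to the choice of quantization thresholds.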
19. 2. Texture Feature
•Classified into two categories: structural and statistical.
•Structural methods, including morphological operator and
adjacency graph, describe texture by identifying structural
primitives and their placement rules.
•Most effective when applied to textures that are very regular.
•One can define texture as the visual patterns that have properties
of homogeneity that do not result from the presence of only a single
color or intensity.
•Texture determination is ideally suited for medical image
retrievals
20. Texture Feature Extraction Models
i. The Steerable pyramid[3]
• This pyramid recursively splits an image into a set of oriented sub-bands and a low-pass residual.
• The image is decomposed into one decimated low-pass sub-band and a set of undecimated directional sub-bands.
• Analytically, the band-pass filter in polar coordinates is composed of a radial part and an angular part.
• In its definition, L is the total number of orientations.
21. ii. The Contourlet Transform[3]
• This is a combination of a Laplacian pyramid (LP), which provides the multiscale decomposition, and a Directional Filter Bank (DFB), which provides the multidirectional decomposition.
• The LP decomposes the original image into a hierarchy of images such that each level corresponds to a different band of image frequencies. This is done by taking the difference between the original image and its Gaussian low-pass filtered version. The Gaussian low-pass kernel is defined in terms of the horizontal and vertical frequencies.
• The DFB divides the spectrum into wedge-shaped slices, as shown in Figure 3. The low-frequency components are separated from the directional components. After decimation, the decomposition is iterated using the same DFB.
Figure 3. DFB decomposition into wedges
22. iii. The Gabor Wavelet Transform[3]
• This transform dilates and rotates the two-dimensional Gabor function.
• The image is then convolved with each of the resulting Gabor functions.
• The Gabor function is given in the Fourier domain and is parameterized by the bandwidths of the filter.
• To obtain a Gabor filter bank with several orientations and scales, the Gabor function is rotated and dilated.
23. iv. The Complex Directional Filter Bank (CDFB)[3]
• Consists of a Laplacian pyramid and a pair of DFBs, designated as
primal and dual filter banks.
• The filters of these filter banks are designed to have special phase
functions, so that the overall filter is the Hilbert transform of the
primal filter bank.
• A multi-resolution representation is obtained by reiterating the decomposition in the low-pass branch.
• Figure 4 shows one level of the CDFB, including its low-pass filters.
Figure 4. One level of the CDFB
24. 3. Shape Feature[3]
• Used as another feature in image retrieval.
• Useful only in very restricted environments, which provide a good
basis for segmentation .
• Shape descriptors are diverse, e.g. turning angle functions,
deformable templates, algebraic moments, and Fourier coefficients.
4. Combinations of color, texture, and shape [2]
• Feature similarity is based on visual characteristics such as dominant colors, shapes and textures.
• Many systems make it possible to combine or select among several models.
• Some systems use a combination of color, texture and contour features.
• One approach extends the color histogram with textural information by weighting each pixel's contribution with its Laplacian.
• These combinations also offer several different techniques for information retrieval in video processing.
25. Comparison of the Color and Texture Features
•Color and texture feature models can be compared on the basis of parameters such as computational speed, dimensionality, similarity, number of orientations, sub-bands, and retrieval results[3].
Table 1. Pros and cons of the four color feature models
Color feature | Pros | Cons
Conventional Color Histogram | Simple; fast computation | High dimensionality; no color similarity; no spatial info
Color Correlogram | Encodes spatial info | Very slow computation; high dimensionality; does not encode color similarity
Fuzzy Color Histogram | Fast computation; encodes color similarity; robust to quantization noise; robust to change in contrast | High dimensionality; more computation; appropriate choice of membership weights needed
Color/Shape Method | Encodes spatial info; encodes area; encodes shape | More computation; sensitive to clutter; choice of appropriate color quantization thresholds needed
26. Table 2. Pros and cons of the four texture feature models[3]
Texture feature | Pros | Cons
Steerable Pyramid | Supports any number of orientations | Sub-bands undecimated, hence more computation and storage
Contourlet Transform | Lower sub-bands decimated | Number of orientations supported needs to be a power of 2
Gabor Wavelet Transform | Achieves highest retrieval results | Results in over-complete representation of image; computationally intensive
Complex Directional Filter Bank | Competitive retrieval results | Computationally intensive
27. Application: Video Surveillance- Fall Detection[1]
Feature selection varies with the application. For our application, fall detection, the relevant features have to be selected accordingly.
Falls are a common problem for elderly people and can have dangerous consequences, even death. Automatic tools for fall detection using camera vision can therefore be very useful for helping the elderly. These methods are based on analyzing extracted features. The various features include
i. Horizontal and vertical gradients of an object
ii. Motion history image(MHI)
iii. Human shape deformation
iv. Motion history and shape analysis
v. Posture
vi. Orientation angle[2]
vii. Change of center of mass width
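As an illustration of one feature from the list above, the following sketches a motion history image in its commonly used form: pixels where motion is detected are set to a duration tau, and all other pixels decay by one per frame. Motion is detected here by thresholded frame differencing. The frames, the threshold, tau, and the function names are hypothetical, and this is not necessarily the exact variant used in [1].

```python
def motion_mask(prev_frame, frame, threshold):
    """1 where the absolute frame difference exceeds the threshold, else 0."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(prev_row, row)]
        for prev_row, row in zip(prev_frame, frame)
    ]


def update_mhi(mhi, mask, tau):
    """Set moving pixels to tau; decay all other pixels by 1 (floor at 0)."""
    return [
        [tau if m else max(0, h - 1) for h, m in zip(mhi_row, mask_row)]
        for mhi_row, mask_row in zip(mhi, mask)
    ]


# Hypothetical 1x3 frames: a bright object grows from left to right.
frames = [
    [[0, 0, 0]],
    [[9, 0, 0]],
    [[9, 9, 0]],
    [[9, 9, 9]],
]
tau = 3
mhi = [[0, 0, 0]]
for prev_frame, frame in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, motion_mask(prev_frame, frame, threshold=5), tau)

print(mhi)
```

The most recent motion holds the largest value, so the final image encodes both where and how recently movement occurred.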
28. Comparison and evaluation of these features
We evaluate the performance of the proposed method by considering detection rate, false positive rate and misdetection rate.
The main assumptions made in this work are:
• The foreground in the video sequence contains only one person.
• The camera position is fixed throughout the video capture, so that frame subtraction can be performed[1].
A fall detection result is positive if the automatic method reports a fall, and negative if it does not. There are four possible scenarios:
• True positive (TP): a fall occurs, the system detects it;
• False positive (FP): the system announces a fall, but it did not occur;
• True negative (TN): a normal (no fall) movement is performed, the
system does not declare a fall;
• False negative (FN): a fall occurs but the system does not detect it.
29. •The comparison of the five methods uses sensitivity and specificity rates.
•They are calculated using the following equations:
Sensitivity (%) = TP / (TP + FN) × 100
Specificity (%) = TN / (TN + FP) × 100
•High sensitivity means that most fall incidents are correctly detected.
•High specificity means that most normal activities are not detected as fall events.
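The two rates can be computed from the four outcome counts as in the short sketch below. The counts are made-up numbers for illustration only and do not come from the cited study.

```python
def sensitivity(tp, fn):
    """Percentage of actual falls that the system detected."""
    return 100.0 * tp / (tp + fn)


def specificity(tn, fp):
    """Percentage of normal activities that were not flagged as falls."""
    return 100.0 * tn / (tn + fp)


# Hypothetical test set: 20 fall sequences and 20 normal sequences.
tp, fn = 18, 2   # falls detected / falls missed
tn, fp = 16, 4   # normal activities accepted / false alarms

print(sensitivity(tp, fn), specificity(tn, fp))
```

A method can score well on one rate and poorly on the other, so both are needed to compare the methods fairly.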
30. Comparison of efficiency of these features
Table 3 shows the results of human fall detection using different methods[1].
Table 3. Fall detection performance (%)
Method | Sensitivity (%) | Specificity (%)
Vertical and horizontal gradient | 92 | 89
Motion history image | 90 | 75
Shape deformation | 96 | 87
Shape deformation + motion history | 97 | 95
Posture | 92 | 90
31. Conclusion and Future Work
•A comparative study of feature selection for fall detection was presented.
•With the motion history and the vertical and horizontal gradient approaches, some sequences of sitting down and lying down are detected as falls.
•With shape deformation, sitting-down sequences are sometimes indicated as a fall event.
•The combination of motion history and shape deformation features gives the best results in Table 3 (97% sensitivity, 95% specificity).
Future work includes the construction of new automatic tools for predicting the risk of falls using different classifier networks. The new system will use the combination of shape deformation and motion history as features.
32. References
1. Mabrouka Hagui, Mohamed Ali Mahjoub, “Features selection in video fall detection”, IEEE IPAS’14: International Image Processing Applications and Systems Conference, 2014.
2. Hamid Rajabi, Manoochehr Nahvi, “An Intelligent Video Surveillance System for Fall and Anesthesia Detection For Elderly and Patients”, 2nd International Conference on Pattern Recognition and Image Analysis (IPRIA 2015), March 11-12, 2015.
3. Neetesh Gupta, Dr. Vijay Anant Athavale, “Comparative Study of Different Low Level Feature Extraction Techniques for Content based Image Retrieval”, International Journal of Computer Technology and Electronics Engineering (IJCTEE), Volume 1, Issue 1, August 2011.