International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 4, July-August (2013), pp. 257-266, © IAEME
QUERY CLIP GENRE RECOGNITION USING TREE PRUNING
TECHNIQUE FOR VIDEO RETRIEVAL
Vilas Naik1, Vishwanath Chikaraddi2, Prasanna Patil3
1, 2, 3 Department of CSE, Basaveshwar Engineering College, Bagalkot, India
ABSTRACT
Optimal efficiency of a retrieval technique depends on the search methodology used in the data retrieving system; an inappropriate search methodology can render the retrieval system ineffective. In recent years multimedia storage has grown and the cost of storing multimedia data has fallen, so video repositories now hold huge numbers of videos, and retrieving the videos relevant to a user's interest from such a large repository is difficult. Hence an effective recognition-based video retrieval system is essential for searching videos relevant to a user query in a huge collection. This paper presents an approach that retrieves videos from a repository by recognizing the genre of a user query clip. The method extracts regions of interest from every frame of the query clip based on motion descriptors. These regions of interest are treated as objects and compared, for object recognition, with similar objects from a knowledge base prepared from videos of various genres, and a tree pruning technique is employed to recognize the genre of the query clip. The method then retrieves videos of the same genre from the repository. The method is evaluated by experimentation over a data set containing three genres, i.e. sports, movie and news videos. Experimental results indicate that the proposed algorithm is effective in genre recognition and retrieval.
Keywords: Genre recognition, Motion detection, Video retrieval, Visual query, ROI, Tree pruning.
1. INTRODUCTION
During recent years, methods have been developed for retrieval of videos based on common visual features such as color, texture, shape and motion. These features are employed in finding the similarity between a query and the videos in a repository. Despite sustained efforts over the last years, the paramount challenge remains bridging the semantic gap: low level features are easily measured and computed, but the starting point of the retrieval process is typically the high
level query from a human. Translating the question posed by a human into the low level features seen by the computer illustrates the problem of bridging the semantic gap. Video retrieval is thus an important technology in the design of video search engines and in the extraction of a preliminary set of related videos from the database.
Content-based visual information retrieval (CBVIR) is the application of computer vision to video retrieval, that is, to the problem of searching for an intended video in large databases. "Content" in this context refers to color, shape, texture, or any other information that can be derived from the image or video itself. Without the ability to examine video content, searches must rely on metadata such as captions or keywords, which may be laborious or expensive to produce. Content-based video retrieval is a challenging and important problem of practical value: it helps users retrieve a desired video from a large video database efficiently, based on the video contents, through user interactions. A video retrieval system can be roughly divided into two major components: a module for extracting representative features from video frames, and one defining an appropriate similarity model to find similar video frames in the database. Many approaches use different kinds of features to represent a video frame, including histograms, shape information, texture and text analysis; a few integrate several features to improve the retrieval performance.
However, the frames of a video may not be globally homogeneous; each frame may contain different objects, and frames of videos of the same genre tend to contain similar sets of objects, possibly in different spatiotemporal combinations. Extracting objects from a video and annotating them over the video is therefore an important step towards finding similar videos. The proposed work is a method which segments the objects from frames, recognizes them, and employs tree pruning to identify the spatiotemporal combination of these objects in frames, and then in the video, in order to recognize the genre of the video.
The rest of this paper is organized in 4 sections as follows. Section 2 provides a literature
overview. Section 3 presents the proposed algorithm and the details about it. In Section 4
experimentation and results are discussed. Finally, in Section 5 conclusions are given.
2. RELATED WORK
The literature presents numerous algorithms and techniques for the retrieval of significant videos from databases, owing to the widespread interest in content-based video retrieval across a large number of applications. Some recent research related to content-based video retrieval is discussed in this section.
Traditional text based search suffers from the following drawbacks. Manual annotations are time consuming and costly to produce, and as the number of media items in a database increases, the complexity of finding the required information also increases. Manually annotating all attributes of the media content is a difficult task [1], and manual annotations fall short in handling differences of subjective perception; capturing all attributes of the content of any media is unachievable [2]. For this reason, a good search technique for a Content-Based Video Retrieval (CBVR) system is required. In other words, a content-based search examines the actual image contents, where content relates to colors, shapes, textures, or any other information that can be obtained from the image directly [3]. Recently, CBVR systems have been widely studied. In CBVR, vital information is extracted automatically by applying signal processing and pattern recognition techniques to the audio and video signals. A digital video system needs to efficiently index, store, and retrieve the visual information in a multimedia database. Video has both spatial and temporal dimensions, and a video index should capture the spatio-temporal contents of the scene. To achieve this, a framework typically works in three basic steps: shot segmentation, feature extraction, and finally similarity matching for effective retrieval of the query clip. This approach has established a general framework of video retrieval from a new perspective. The
query example may be an image, a shot or a clip. A shot is a sequence of frames that was
continuously captured by the same camera, while a clip is a series of shots describing a particular
event. Current techniques for content-based video retrieval can be broadly classified into two
categories.
The first type uses frame sequence matching. [4] proposed a scheme that matches videos based on similarity of temporal activity; it finds similar "actions" and, furthermore, provides precise temporal localization of the actions in the matched videos. Video sequences are represented as sequences of feature vectors called fingerprints, and the fingerprint of the query video is matched against the fingerprints of videos in a database using sequential matching.
In [5] the authors achieve a compact shot representation by integrating the color and spatial features of individual frames. In the video matching step, a shot similarity measure is defined to locate occurrences of similar video clips in the database. In [6] an original two-phase scheme for video similarity detection is proposed: for each video sequence, two kinds of signatures with different granularities, coarse and near coarse, are extracted, and in the second phase the query video example is compared with the results of the first phase according to the similarity measure of the near signature. They achieve better quality results than the conventional approaches. Many works [7], [8], [9] & [10] have designed their models using the concept of frame sequence matching.
The second type is key-frame based shot matching. In [11] an algorithm using key-frames at abrupt transitions is implemented: image features (color, texture and motion) are extracted around the key frames. For each key frame in the query, a similarity value is obtained with respect to the key frames in the database video, and consecutive database key frames that are highly similar to the query key frames are then used to generate the set of retrieved video clips.
In [12] an efficient algorithm for video sequence matching using the modified Hausdorff distance and the directed divergence of histograms between successive frames is proposed. To match the video sequences effectively with a low computational load, the authors use the key frames extracted by the cumulative directed divergence and compare the sets of key frames using the modified Hausdorff distance. The same approach of key-frame based shot matching is used by [13], [14], [15], [16] & [17].
The approach of frame sequence matching is derived from the sequential correlation matching widely used in the signal processing domain. These methods usually focus on frame-by-frame comparison between two clips in order to find sequences of frames that are consistently similar. Their common drawback is the heavy computational cost of the exhaustive search: although there exist techniques [18] to improve the linear scanning speed, the time complexity still remains at least linear in the size of the database. Additionally, these approaches are susceptible to alignment problems when comparing clips with different encoding rates. In the second category, each video shot is represented compactly by a key-frame, and video sequence matching is achieved by comparing the visual features of key-frames, which reduces the computational cost. The problem with these approaches is that they leave out the temporal variations and the correlation between key-frames within an individual shot; it is also not clear which image should be used as the key-frame for a shot. To strike a good balance between search accuracy and computational cost, this paper proposes an integrated approach for shot matching. In contrast to previous approaches, our approach analyzes all frames within a shot to extract more visual features for shot representation. Because no single visual feature best represents video content, we integrate several visual features to capture the spatio-temporal information more accurately.
The proposed method is a video retrieval mechanism based on a video clip query. The mechanism first identifies the genre of the query clip and then retrieves the videos of the same genre from the video library. For the genre recognition, a manually trained tree pruning technique is employed.
3. PROPOSED ALGORITHM FOR GENRE RECOGNITION AND RETRIEVAL
The proposed algorithm extracts sample objects as regions of interest (ROI) from every frame, based on the detection of significant motion in the video. The features of each ROI are then extracted and stored in a feature file; these features are used for matching. The frames of the user's query clip are extracted and their regions of interest are segmented out. Euclidean distance based matching is adopted to match these ROIs with those stored in the database, and identified ROIs are annotated over the video itself. Finally, the system runs a tree pruning based technique to recognize the genre of the query clip and retrieve the relevant videos from the archive.
The proposed algorithm is described in the following steps:
Step 1: One video at a time is read from the library and its frames are extracted. Each frame is compared with the previous one and motion is detected; frames with significant motion differences are separated as key frames.
Step 2: Regions of interest are detected using bounding boxes and are classified.
Step 3: The mean and standard deviation of the RGB and HSV color channels are extracted from the regions detected by the bounding boxes.
Step 4: The query video clip is read, its frames are extracted and motion detection is run; key frames and ROIs are extracted along with their features.
Step 5: Euclidean distance based matching is adopted to match the ROIs with those stored in the database, and identified object names are annotated over the objects in the video itself.
Step 6: The system then runs a tree pruning based technique to retrieve the relevant videos.
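The motion detection of Steps 1 and 4 can be sketched as a simple frame-difference test (an illustrative Python/NumPy sketch, not the authors' MATLAB code; the threshold values `thresh` and `frac` are assumptions made for the sketch):

```python
import numpy as np

def motion_mask(prev, cur, thresh=25):
    """Per-pixel motion detection between two consecutive grayscale
    frames: absolute difference thresholded into a binary mask."""
    diff = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
    return diff > thresh

def has_significant_motion(prev, cur, thresh=25, frac=0.01):
    """A frame is a key-frame candidate when more than `frac` of its
    pixels changed by more than `thresh` grey levels."""
    return bool(motion_mask(prev, cur, thresh).mean() > frac)
```

The binary mask produced here is also what the bounding-box step of the pipeline would operate on.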
Figure 4.1 Block diagram of the proposed model. (Both the archive videos and the query video pass through motion based keyframe identification and region segmentation with feature extraction; Euclidean distance based matching followed by tree pruning for genre confirmation produces the retrieval result.)
3.1 Key frame identification
The proposed method works on the group of frames extracted from a video. It takes a list of frame files in the order in which they were extracted, and relies on a predefined threshold that specifies whether two video frames are similar. Its main function is to choose a small number of representative key frames. Starting from the first frame in the sorted list of files, consecutive frames whose difference is within the threshold are considered similar; this is repeated while frames remain similar, all the similar frames are dropped, and the first one is kept as the key frame. The process then restarts with the next frame that falls outside the threshold and repeats for all frames of the video.
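The greedy selection described above can be sketched as follows (illustrative Python/NumPy; the normalised difference score and the default `threshold` value are assumptions made for the sketch):

```python
import numpy as np

def select_keyframes(frames, threshold=0.1):
    """Greedy key-frame selection by consecutive-frame difference.

    frames: list of HxW (or HxWx3) uint8 arrays in temporal order.
    threshold: mean absolute difference (normalised to [0, 1]) below
    which two frames are considered similar.
    Returns the indices of the selected key frames."""
    if not frames:
        return []
    keyframes = [0]                 # keep the 1st frame of each similar run
    anchor = frames[0].astype(np.float32) / 255.0
    for i in range(1, len(frames)):
        cur = frames[i].astype(np.float32) / 255.0
        diff = np.mean(np.abs(cur - anchor))   # frame-difference score
        if diff > threshold:                   # significant change: new run
            keyframes.append(i)
            anchor = cur
    return keyframes
```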
3.2 Identification of region of interest
Detecting the visually attentive regions in frames is done using a bounding box technique. The 'BoundingBox' measurement (as in MATLAB's regionprops) returns the smallest rectangle containing the region as a 1-by-2Q vector, where Q is the number of image dimensions: ndims(L), ndims(BW), or numel(CC.ImageSize). The first Q elements give the upper-left corner of the box and the remaining Q elements give its width along each dimension.
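For Q = 2 image dimensions, computing such a box around the True pixels of a binary motion mask can be sketched as follows (illustrative Python/NumPy, returning a MATLAB-style [x, y, width, height] vector; not the authors' implementation):

```python
import numpy as np

def bounding_box(mask):
    """Smallest axis-aligned rectangle around the True pixels of a
    binary mask, as [x, y, width, height] with (x, y) the upper-left
    corner. Returns None for an empty mask."""
    rows = np.any(mask, axis=1)     # which rows contain region pixels
    cols = np.any(mask, axis=0)     # which columns contain region pixels
    if not rows.any():
        return None
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return [int(c0), int(r0), int(c1 - c0 + 1), int(r1 - r0 + 1)]
```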
3.3 Feature extraction
Once the regions of interest in a video shot have been segmented and tracked, the features of each region of interest are computed and stored in the feature library. For each region of interest, the mean and standard deviation of the RGB and HSV channels are extracted.
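The resulting 12-dimensional descriptor (mean and standard deviation of the R, G, B and H, S, V channels) can be sketched as follows (illustrative Python/NumPy with a hand-rolled RGB-to-HSV conversion; not the authors' MATLAB implementation, and the feature ordering is an assumption):

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Vectorised RGB -> HSV for an (N, 3) float array in [0, 1]."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    mx, mn = rgb.max(axis=1), rgb.min(axis=1)
    diff = mx - mn
    h = np.zeros_like(mx)
    nz = diff > 0
    idx = nz & (mx == r)                       # red is the maximum channel
    h[idx] = ((g[idx] - b[idx]) / diff[idx]) % 6
    idx = nz & (mx == g) & (mx != r)           # green is the maximum channel
    h[idx] = (b[idx] - r[idx]) / diff[idx] + 2
    idx = nz & (mx == b) & (mx != r) & (mx != g)  # blue is the maximum channel
    h[idx] = (r[idx] - g[idx]) / diff[idx] + 4
    h = h / 6.0                                # hue normalised to [0, 1)
    s = np.where(mx > 0, diff / np.maximum(mx, 1e-12), 0.0)
    return np.stack([h, s, mx], axis=1)

def roi_features(roi):
    """12-dim descriptor: mean and std of R, G, B then H, S, V channels.
    roi: HxWx3 uint8 image patch."""
    rgb = roi.reshape(-1, 3).astype(np.float64) / 255.0
    hsv = rgb_to_hsv(rgb)
    feats = []
    for ch in (rgb, hsv):
        feats.extend(ch.mean(axis=0))
        feats.extend(ch.std(axis=0))
    return np.array(feats)
```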
3.4 Matching and retrieval
In the retrieval stage, the database video clips that are similar to the query clip are retrieved by measuring the similarity between the query clip and the database video clips. When a query clip is given to the proposed retrieval system, all the aforesaid features are extracted, just as for the database video clips. Then, with the aid of the Euclidean distance, similarity is measured between every database video clip and the query clip; finally, using this result, videos are retrieved with the help of tree pruning.
The distance metric serves as the similarity measure, which is the key component in content based retrieval. Here the Euclidean distance between the ROIs of the videos in the database and the ROI of the query video is calculated and used for ranking: the query video's ROI is more similar to a database video's ROI when the distance is smaller. If x and y are the feature vectors of a database ROI and the query ROI respectively, the Euclidean distance measure is defined as

d(x, y) = sqrt( Σ_i (x_i − y_i)² )
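A minimal sketch of the Euclidean distance d(x, y) = sqrt(Σ_i (x_i − y_i)²) and the resulting ranking of database ROIs (illustrative Python; `rank_rois` is a hypothetical helper, not a function of the described system):

```python
import numpy as np

def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def rank_rois(query_feat, db_feats):
    """Indices of database ROI feature vectors sorted by increasing
    distance to the query ROI (smaller distance = more similar)."""
    dists = [euclidean(query_feat, f) for f in db_feats]
    return sorted(range(len(db_feats)), key=lambda i: dists[i])
```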
Finally, the video search algorithm utilizes a tree-structured hierarchy and subtree pruning to reduce the search space while traversing the tree from root to leaf nodes for a given query video.
Figure 4.2 Illustration of the notion of NRC. (a) Tree-structured hierarchy with the NRC r_k associated with a node k, k = 1, 2, 3, 4. (b) The NRC r_x of a node x for the cluster C_x.
As illustrated in Fig. 4.2, each node x has a feature vector which represents its whole cluster C_x, i.e., its subtree, within the bound of what we call the NRC. The NRC, denoted by r_x, is defined as the maximum distance between the node x and its subordinate leaf nodes, i.e., those belonging to C_x, and is computed as

r_x = max_{y ∈ C_x} d(x, y)                                   (1)

where d(.) denotes a distance metric between two feature vectors. To retrieve all similar videos whose distance from the query q is within a threshold value ε, every node in the tree hierarchy needs to be visited, but an irrelevant cluster C_x can be pruned without degrading the recall rate of retrieval whenever the following condition, derived from the triangle inequality, holds:

d(q, x) − r_x > ε                                             (2)

The evaluation of d(q, x) in (2) involves distance computation between two high-dimensional feature vectors.
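This pruning rule can be sketched as a recursive search over such a hierarchy (illustrative Python; the `Node` class and its field names are assumptions made for the sketch, not the paper's data structures):

```python
import math

class Node:
    """Tree node: feat is the representative feature vector, radius the
    NRC r_x (max distance to any leaf in the subtree; 0 for a leaf),
    children the disjoint sub-clusters, video_id set on leaves only."""
    def __init__(self, feat, radius=0.0, children=None, video_id=None):
        self.feat = feat
        self.radius = radius
        self.children = children or []
        self.video_id = video_id

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def search(node, query, eps, results):
    """Collect leaf videos within eps of the query, pruning any cluster
    C_x with d(q, x) - r_x > eps (the triangle-inequality bound)."""
    d = dist(node.feat, query)
    if d - node.radius > eps:      # no leaf of this subtree can qualify
        return
    if not node.children:          # leaf: one video in the database
        results.append(node.video_id)
        return
    for child in node.children:
        search(child, query, eps, results)
```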
4. EXPERIMENTATION AND RESULTS
The proposed content-based video retrieval system is implemented in MATLAB (Matlab 2010b), and its performance is analyzed using evaluation metrics including the precision and recall measures. The experiments are conducted on a Windows XP based system with 3 GB RAM, on a data set containing 200 videos.
4.1 Datasets
We have performed experiments on a dataset of 200 videos obtained from the YouTube website (www.youtube.com). The collected videos belong to three categories: sports, news and movies. A sample snapshot of the input videos of the proposed system is given in Figure 5.1.
Figure 5.1: A sample snapshot for the input database
4.2 Identification of region of interest
This representation captures the significant visual contents in a shot by processing every frame in it. The IOROI for the SPORTS video is constructed as shown in Figure 5.2.
Figure 5.2 a) Snapshot of the query video
Figure 5.2 b) A sample snapshot of regions of interest
4.3 Feature extraction
For each region of interest, the mean and standard deviation of the RGB and HSV channels are extracted.
4.4 Video retrieval using tree pruning
The mean and standard deviation features of the RGB and HSV channels are extracted from the ROI of the query video and matched with the features of the ROIs stored in the library using the Euclidean distance. For retrieval, the proposed system finally uses the tree pruning technique, in which a tree-structured hierarchy is used: each node is associated with an ROI image or a feature vector which represents all of the images belonging to its subtree. Each child represents a disjoint subset of the images and thus partitions the subtree rooted at its parent node into smaller units, and each leaf node corresponds to a single video in the database. With the node radius for cluster (NRC), defined as the maximum distance between the node and its descendants, stored at each intermediate node, the triangle inequality is applied to reduce the search space by pruning irrelevant clusters. The computed matching score is used to retrieve the videos from the dataset, and the videos retrieved for the corresponding input videos are shown in Figure 5.3.
Figure 5.3 A sample snapshot of retrieved videos
4.5 Quantitative analysis
The performance of the proposed system is evaluated on the input dataset using the precision and recall measures; Graph 1 shows the Precision-Recall plot. Precision is the fraction of retrieved videos that are relevant, and recall is the fraction of relevant videos that are retrieved. For the quantitative analysis, videos from each category are given to the proposed system and the results are evaluated as reported in Table 5.1.
Table 5.1 Experimental results
VIDEO GENRE | RECALL | PRECISION
SPORTS      | 0.625  | 1.00
MOVIES      | 0.538  | 0.92
NEWS        | 0.625  | 1.00
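The two measures behind Table 5.1 can be computed as follows (illustrative Python; for example, retrieving 5 relevant videos out of 8 relevant ones with no false positives yields the sports row's precision 1.00 and recall 0.625):

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|;
       Recall    = |retrieved ∩ relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```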
5. CONCLUSION AND FUTURE WORK
An algorithm for content based video retrieval has been designed and tested on a sufficient number of videos of different genres. The algorithm is implemented in Matlab 2010b and executed on an Intel Core 2 Duo, 2.66 GHz processor with 3 GB of RAM. The algorithm first extracts regions of interest through motion estimation; the features of these ROIs are extracted and matched against the ROIs of the query video, and the retrieval of videos is finally carried out using tree pruning. The proposed method has been tested on different genres of videos, namely sports, movies and news clips. The performance of the retrieval algorithm is evaluated by the precision and recall measures. The experimental results on the video datasets reveal that the proposed model is robust and retrieves videos of different genres efficiently.
The system can be further extended with more sophisticated features, including both shape and texture descriptors such as wavelet moments.
REFERENCES
[1] Chia-Hung Wei and Chang-Tsun Li (2004), "Content-based multimedia retrieval - introduction, applications, design of content-based retrieval systems, feature extraction and representation", International Journal of Wireless and Microwave Technologies (IJWMT), ISSN 2076-1449.
[2] Milan Petkovic and Willem Jonker (2003), "Content-Based Video Retrieval", Kluwer Academic Publishers, Boston, Monograph, 168 p., Hardcover, ISBN 978-1-4020-7617-6.
[3] John Eakins and Margaret Graham (1999), University of Northumbria at Newcastle, "Content-based Image Retrieval", JISC Technology Applications Programme Report 39, 1999.
[4] Mohan, R.(1998), “Video sequence matching”, Proceedings of International Conference on
Acoustic, Speech and Signal Processing, pp. 3697–3700.
[5] Tan, Y., Kulkarni, S. & Ramadge, P. (1999), "A framework for measuring video similarity and its application to video query by example", International Conference on Image Processing, pp. 106–110.
[6] Naphade, M., Yeung, M. & Yeo, B. (2000), "A novel scheme for fast and efficient video sequence matching using compact signature", SPIE Conference on Storage and Retrieval for Media Database, pp. 564–572.
[7] Hoad, T. & Zobel, J. (2003), "Fast video matching with signature alignment", ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, CA, pp. 262–269.
[8] Ren, W. & Singh, S. (2004), “Video sequence matching with spatio-temporal
constraints”, International Conference on Pattern Recognition, pp. 834–837.
[9] Kim, C. & Vasudev, B. (2005), “Spatiotemporal sequence matching for efficient video
copy detection”, IEEE Transactions on Circuits and Systems for Video Technology 15(1),
127–132.
[10] Toguro, M., Suzuki, K., Hartono, P. & Hashimoto, S. (2005), “Video stream retrieval based
on temporal feature of frame difference”, Proceedings of International Conference on
Acoustic, Speech and Signal Processing, Volume 2, pp. 445–448.
[11] Jain, A., Vailaya, A. & Wei, X. (1999), “Query by video clip”, Multimedia Systems 7, 369–
384.
[12] Lienhart, R., Effelsberg, W. & Jain, R. (2000), “VisualGREP: A systematic method to
compare and retrieve video sequences”, Multimedia Tools and Applications 10(1), 47–72.
[13] Liu, X., Zhung, Y. & Pan, Y. (1999), “A new approach to retrieve video by example video
clip”, ACM International Conference on Multimedia, pp. 41–44.
[14] Kim, S. & Park, R. (2002), “An efficient algorithm for video sequence matching using the
modified Hausdorff distance and the directed divergence”, IEEE Transactions on Circuits
and Systems for Video Technology 12(7), 592–596.
[15] Diakopoulos, N. & Volmer, S. (2003), "Temporally tolerant video matching", ACM SIGIR Workshop on Multimedia Information Retrieval, Toronto, Canada.
[16] Peng, Y. & Ngo, C. (2004), “Clip-based similarity measure for hierarchical video
retrieval”, ACM SIGMM International Workshop on Multimedia Information Retrieval, pp.
53–60.
[17] Luo, H., Fan, J., Satoh, S. & Ribarsky, W. (2007),”Large scale news video database
browsing and retrieval via information visualization”, ACM symposium on applied
computing, Seoul, Korea, pp. 1086–1087.
[18] Kashino, K., Kurozumi, T. & Murase, H. (2003), “A quick search method for audio and
video signals based on histogram pruning”, IEEE Transactions on Multimedia 5(3),
348–357.
[19] Reeja S R and Dr. N. P Kavya, “Motion Detection for Video Denoising – The State of Art
and the Challenges”, International Journal of Computer Engineering & Technology
(IJCET), Volume 3, Issue 2, 2012, pp. 518 - 525, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
[20] Vilas Naik and Raghavendra Havin, “Entropy Features Trained Support Vector Machine
Based Logo Detection Method for Replay Detection and Extraction from Sports
Videos”, International Journal of Graphics and Multimedia (IJGM), Volume 4, Issue 1, 2013,
pp. 20 - 30, ISSN Print: 0976 – 6448, ISSN Online: 0976 –6456.