Miss.Dharmini Esther Thangam.P, Mrs.Akila Agnes.S / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 3, Issue 1, January-February 2013, pp. 1023-1028

The Stroke Filter Based Caption Extraction System

Miss. Dharmini Esther Thangam. P #1 (PG Scholar), Mrs. Akila Agnes. S #2 (Asst. Prof.)
Department of Computer Science and Engineering, Karunya University, Coimbatore, India


Abstract- Captions in videos provide rich semantic information: the contents of a video can be clearly understood with the help of its captions. Even without watching a video, a person can understand what it is about by reading its captions. For these reasons, caption extraction from videos has become an important prerequisite. In this paper we use a stroke filter. The stroke filter identifies the strokes in the video; caption regions usually contain strokes, so the identified strokes belong to captions, by which the captions are detected. Then the locations of the captions are localized. In the next step the caption pixels are separated from the background pixels. Finally, a post-processing step is performed to check whether the correct caption pixels have been extracted. In this paper a step-by-step sequential procedure to extract captions from videos is proposed.

Keywords- stroke filter, caption detection, caption extraction, caption pixels

I. INTRODUCTION
Video processing is a particular case of signal processing, which often employs video filters and where the input and output signals are video files or streams. The use of digital video is becoming widespread throughout the world. Captions in videos provide rich semantic information which is useful in video content analysis, indexing and retrieval, traffic monitoring, and computerized aids for the visually impaired. Caption extraction from videos is a difficult problem due to the unconstrained nature of general-purpose video. Text can have arbitrary color, size, and orientation. Backgrounds may be complex and changing. Previous caption extraction methods consider each video frame as an independent image and extract captions from every frame. However, captions occurring in video usually persist for several seconds. A movie played at 30 frames per second contains a large number of video frames, and if each frame is treated as an independent image, processing takes more time and the same captions may be extracted again and again [5]. To overcome this disadvantage, our method considers the temporal feature of videos: the time interval in which the same caption is present is found first, and then caption extraction is carried out. By considering this temporal feature, the computational load is reduced and more accurate captions are extracted.

The proposed caption extraction method consists of three important steps: caption detection, caption localization and caption segmentation. For caption detection we use a stroke-based detection method rather than edge-, corner- or texture-based detection. Sometimes the background has dense edges, so background edges may be mistaken for caption edges; in this situation edge-based detection does not work well. If the background has densely distributed corners, corner-based detection is unsuitable, and texture-based detection requires prior knowledge of the texture of the captions [4]. Therefore a stroke filter is used, because only captions have stroke-like edges. For caption localization, spatial-temporal localization is used, whereas previous methods used the DCT algorithm [7]. The drawback of the DCT algorithm is that it does not consider the time interval of the captions, so the same captions are found again and again. For caption segmentation we use spatial-temporal segmentation, whereas previous methods used only temporal segmentation [11]. The disadvantage of the previous method is that the many types of close-up and medium shots make captions difficult to distinguish accurately, it involves many complex steps, and it does not handle false positives due to camera operations. The proposed system aims to guarantee that each clip contains the same caption, so that the efficiency and accuracy of caption extraction are greatly improved.

II. PROPOSED APPROACH
The proposed approach is a step-by-step sequential procedure to extract captions from videos. The proposed system functions as follows.

Fig. 1 Steps in the proposed system: caption detection, caption localization, caption segmentation

A. Caption Detection
A stroke filter is used to extract captions from videos. The stroke filter identifies the stroke-like edges in the captions. Strokes of a pen or brush are the movements or marks made with it when writing or painting. The stroke filter, which detects the strokes in video images, is designed as follows. First, we define a local image region as a stroke-like structure if and only if, in terms of its intensities, (1) it is different from its lateral regions, (2) its lateral regions are similar to each other, and (3) it is nearly homogeneous. Using the stroke filter, the stroke-like edge pixels are denoted by 1 and the remaining pixels by 0 [17]. The stroke-like edge pixels are taken as contours. Sometimes edge pixels also come from a complex background, so to differentiate caption contours from background contours we apply two tests. (i) Strokes always have a limited range of height and width, which depends on the size of the characters, and the length and width of characters in captions may not be too large. Let w(ci) and h(ci) denote the width and height of the minimum enclosing rectangle (MER) of contour ci; we require

    w(ci) <= threshold1, h(ci) <= threshold2        (1)

where the thresholds are chosen based on the size of the characters to be detected. If condition (i) is not satisfied, the contour ci is removed as a non-stroke edge from the complex background. (ii) For each contour, the gradient direction Õk is found. The gradient Õk is quantized into four values based on the horizontal, vice-diagonal, vertical and main-diagonal directions. Each contour is then broken down into several sub-contours based on connectivity and gradient direction [1]. Each sub-contour must contain a parallel edge with the same gradient direction within a specified interval that depends on the width of the character strokes.

Fig. 2 Quantization of Õk into four values

Fig. 3 Sub-contours in the character K

If a contour does not satisfy the two conditions above, it is removed as a non-stroke edge that does not belong to the captions [12]. If a contour satisfies both conditions, it is considered to consist of stroke-like edge pixels, and thus the captions are detected.
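The paper does not give the stroke filter's exact formulation, so the following is only a rough numpy/OpenCV sketch of the three intensity conditions, the MER test of Eq. (1), and the four-direction quantization of Fig. 2. The band width w and all thresholds (diff_thr, sim_thr, homog_thr, max_w, max_h) are hypothetical choices, not values from the paper.

```python
import numpy as np
import cv2  # assumed available; used for blurring, contours and bounding boxes

def stroke_map(gray, w=3, diff_thr=30, sim_thr=20, homog_thr=10):
    """Label a pixel 1 when its band differs from both lateral bands (1),
    the lateral bands resemble each other (2), and the band itself is
    nearly homogeneous (3) -- the stroke-like structure conditions."""
    g = gray.astype(np.float32)
    left, right = np.roll(g, w, axis=1), np.roll(g, -w, axis=1)
    differs = (np.abs(g - left) > diff_thr) & (np.abs(g - right) > diff_thr)
    laterals_alike = np.abs(left - right) < sim_thr
    homogeneous = np.abs(g - cv2.blur(g, (w, 1))) < homog_thr
    return (differs & laterals_alike & homogeneous).astype(np.uint8)

def caption_contours(stroke_img, max_w=200, max_h=40):
    """Keep only contours whose minimum enclosing rectangle satisfies Eq. (1)."""
    contours, _ = cv2.findContours(stroke_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [c for c in contours
            if cv2.boundingRect(c)[2] <= max_w and cv2.boundingRect(c)[3] <= max_h]

def quantized_direction(gray):
    """Quantize the gradient orientation into the four bins of Fig. 2:
    horizontal, main diagonal, vertical, vice diagonal."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    angle = np.mod(np.arctan2(gy, gx), np.pi)   # orientation in [0, pi)
    return np.floor(angle / (np.pi / 4)).astype(np.int32) % 4
```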
B. Caption Localization
Two types of caption localization are performed: spatial localization and temporal localization. Spatial localization involves finding the location of the caption in the video frame. Temporal localization involves finding the time interval in which the caption is present in consecutive video frames.

1) Spatial localization - We use an SVM classifier to identify the location of the caption in the video frame. Compared with other classifiers such as neural networks and decision trees, the support vector machine (SVM) is easier to train and has better generalization ability. Therefore, we use an SVM classifier and choose the radial basis function (RBF) as its kernel function. The basic idea of the SVM is to implicitly project the input space onto a higher-dimensional space where the two classes are more linearly separable. Given a caption candidate produced by the caption detection module, we first normalize it to a height of 15 pixels, with the width resized correspondingly. Then we use a sliding window of 15x15 pixels to generate several samples [14]. We use the trained SVM classifier to classify each of these samples and fuse the results: the SVM output scores of these samples are averaged. If the average is larger than a predefined threshold, the whole text line is regarded as a text line; otherwise it is regarded as a false alarm. We set the threshold value to 0.3 based on the trade-off between the false acceptance rate (FAR) and the false rejection rate (FRR).
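As a hedged illustration of this verification step, the sketch below assumes a scikit-learn SVC with an RBF kernel already trained on 15x15 text/non-text patches (clf) and raw pixels as features; only the 15-pixel height, the 15x15 window and the 0.3 threshold come from the text, while the window step and the feature choice are assumptions.

```python
import numpy as np
import cv2
# clf is assumed to be a pre-trained sklearn.svm.SVC(kernel="rbf") instance.

def verify_text_line(candidate_gray, clf, thresh=0.3, h=15, step=5):
    """Normalize a caption candidate to 15 px height, slide a 15x15 window,
    classify every window with the SVM and average the decision scores."""
    scale = h / candidate_gray.shape[0]
    width = max(h, int(round(candidate_gray.shape[1] * scale)))
    strip = cv2.resize(candidate_gray, (width, h))      # cv2 takes (width, height)
    samples = np.array([strip[:, x:x + h].reshape(-1)
                        for x in range(0, width - h + 1, step)],
                       dtype=np.float32)
    score = clf.decision_function(samples).mean()       # fuse by averaging
    return score > thresh                               # text line vs. false alarm
```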


2) Temporal localization - Temporal localization involves identifying the time interval of the captions. It identifies the consecutive frames that contain the same caption and forms either a clip with a caption or a clip with no caption:

    St1,t2(x, y) = Σ(Et1(x, y) ∧ Et2(x, y))        (2)

The logical AND operation is performed to find out whether consecutive frames contain the same caption: if the result is 1, the corresponding edge pixels are the same; otherwise they are different. The logical AND is applied over all edge pixels of consecutive frames to check whether the frames contain the same captions. If frames containing the same caption are identified, they are combined to form a clip [16]. The more important question in temporal localization is whether a generated clip contains captions, since only clips containing captions are meaningful for the subsequent computation [8]. If consecutive frames do not contain captions, they form a clip with no caption. For example, if consecutive video frames It1(x, y), It2(x, y), It3(x, y), ..., It(k-1)(x, y) contain no captions but frame Itk(x, y) contains a caption, then frames It1(x, y) to It(k-1)(x, y) form a clip with no caption, and no further computation has to be carried out on a clip containing no caption.
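A minimal sketch of Eq. (2) and the clip-forming logic, assuming one binary stroke-edge map per frame; the agreement ratio used to decide "same caption" is a hypothetical parameter, since the paper only specifies the AND operation itself.

```python
import numpy as np

def same_caption(e1, e2, min_agreement=0.9):
    """Eq. (2): AND two binary edge maps; treat the frames as carrying the
    same caption when most edge pixels survive the AND."""
    surviving = np.logical_and(e1, e2).sum()
    return surviving >= min_agreement * max(e1.sum(), 1)

def clips_from_edges(edge_maps):
    """Group consecutive frames with matching edge maps into clips,
    recording (first_frame, last_frame) per clip."""
    clips, start = [], 0
    for i in range(1, len(edge_maps)):
        if not same_caption(edge_maps[i - 1], edge_maps[i]):
            clips.append((start, i - 1))
            start = i
    clips.append((start, len(edge_maps) - 1))
    return clips
```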
C. Caption Segmentation
Caption segmentation involves separating the caption pixels from the other pixels in the located caption region. A color-based approach is used to extract the caption pixels. It is based on the idea that caption pixels usually share the same color, that this color differs from the color of the background pixels, and that the color of the captions mostly stays the same across video frames. After obtaining the color models of the caption pixels, the probability of each pixel in the image being a caption pixel can be calculated [21]. According to this probability, most of the caption pixels can be segmented from the background. Another method is to partition the RGB color space into 64 cubes. We exploit the temporal homogeneity in the color of caption pixels to filter them out from the background pixels. Let Bt1(x, y), Bt2(x, y), ..., Btn(x, y) be the pixels of the video frames of a clip containing captions. By performing a logical AND over these pixels we can separate the caption pixels from the background, because the caption pixels are more numerous than the background pixels:

    Bt1(x, y) ∧ Bt2(x, y) ∧ ... ∧ Btn(x, y)        (3)

Thus the captions are extracted from the videos.
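A sketch of the 64-cube variant combined with the temporal AND of Eq. (3); mapping each 8-bit channel to its two high bits (4 levels per channel, 4x4x4 = 64 cubes) is an assumption about how the partition is realized.

```python
import numpy as np

def caption_mask(frames):
    """Keep pixels whose colour cube never changes across the clip (Eq. (3));
    frames is a list of HxWx3 uint8 images from one caption clip."""
    codes = []
    for f in frames:
        q = f >> 6                                       # 2 high bits per channel
        codes.append(q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2])  # cube id 0..63
    mask = np.ones(codes[0].shape, dtype=bool)
    for c in codes[1:]:
        mask &= (c == codes[0])                          # logical AND over the clip
    return mask                                          # True where colour is stable
```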
D. Refining
The extracted captions may still contain some background pixels, or some caption pixels may remain in the background, so the extracted captions must be refined to obtain accurate captions. A sliding window is used to refine the segmentation results; we choose a window size of 8 by 8 pixels [18]. The number of caption pixels inside the sliding window is checked against a threshold value. If the number of caption pixels exceeds the threshold, it implies that some background pixels are mixed in with the caption pixels, so segmentation is carried out again to obtain the accurate captions [20].
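A sketch of this refining pass: the 8x8 window follows the text, while the fill-ratio threshold (max_fill) is a hypothetical stand-in for the unspecified pixel-count threshold.

```python
import numpy as np

def blocks_to_resegment(mask, win=8, max_fill=0.9):
    """Slide an 8x8 window over the binary caption mask and flag windows whose
    caption-pixel count exceeds the threshold, i.e. where background pixels
    have probably been absorbed and segmentation should be repeated."""
    flagged = []
    for y in range(0, mask.shape[0] - win + 1, win):
        for x in range(0, mask.shape[1] - win + 1, win):
            if mask[y:y + win, x:x + win].sum() > max_fill * win * win:
                flagged.append((y, x))          # re-segment these blocks
    return flagged
```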
These sequential steps are performed to extract captions from video.

III. DATA FLOW DIAGRAMS
A. Context Level DFD
[Diagram: video containing captions → caption extraction → captions]

B. Level 1 DFD of Caption Detection
[Diagram: video → detect stroke-like edges → identify caption contour → caption contour]

C. Level 1 DFD of Caption Localization
[Diagram: caption contour → spatial localization → temporal localization → location of caption in video frame]

D. Level 1 DFD of Caption Segmentation
[Diagram: caption in video frame → identify temporal homogeneity of colour → pixels containing captions → segment the captions → captions]

These are the data flow diagrams of the proposed approach.
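The four DFDs describe a linear pipeline. The following hypothetical driver, reusing the sketch functions from Section II, shows how the stages compose; the grayscale conversion and the helper names are assumptions, and the SVM spatial check is omitted for brevity.

```python
import cv2

def extract_captions(frames):
    """Context-level flow: video frames in, per-clip caption masks out."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    edges = [stroke_map(g) for g in grays]            # caption detection
    results = []
    for start, end in clips_from_edges(edges):        # temporal localization
        clip = frames[start:end + 1]
        mask = caption_mask(clip)                     # caption segmentation
        blocks_to_resegment(mask)                     # refining pass
        results.append((start, end, mask))
    return results
```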
IV. EXPERIMENTAL EVALUATION
The datasets may be chosen from movies, news, sports, talk shows and entertainment shows. In our experiments we use movie and sports videos. We evaluate the performance of the proposed system on three factors: temporal localization, spatial localization and segmentation of captions. For the experiments we take 50 sample videos, extract the captions from these videos, and measure precision and recall.

A. Temporal Localization
Evaluating the performance of temporal localization is important because the subsequent computations are carried out after temporal localization. Two metrics are used for the evaluation, (1) precision and (2) recall:

    Precision Ptem = Nc / Nd
    Recall    Rtem = Nc / Ng

where Nc denotes the number of boundaries correctly detected, Nd the number of boundaries output by the system, and Ng the number of ground-truth boundaries. The ground-truth boundaries cover three cases: (i) from a caption to a different caption, (ii) from no caption to a caption, and (iii) from a caption to no caption. The comparison of the temporal localization methods on the dataset is given below.

    Method            Ng      Nd      Nc      Ptem     Rtem
    Texture based     1660    1730    1572    0.909    0.947
    Corner based      1660    1695    1543    0.910    0.930
    Proposed method   1660    1669    1660    0.995    1.000

Thus the stroke filter method outperforms the other two methods, with gains of 8.6% and 8.5% in Ptem and 5.3% and 7% in Rtem.

B. Spatial Localization
The performance of spatial localization is measured under the definition that a localization rectangle is counted as correct if and only if the intersection of a located text rectangle (LTR) and a ground-truth text rectangle (GTR) covers at least 90% of their union. Recall Rloc = Nrc/Ntc and precision Ploc = Nrc/Ngc, where Nrc is the number of text rectangles located correctly, Ngc the number of ground-truth rectangles, and Ntc the total number of rectangles located by the proposed system. The performance of the proposed system is:

    Location        Rloc = 40/50 = 0.8     Ploc = 48/50 = 0.96

C. Segmentation
Segmentation failures occur for two important reasons: (i) incorrect localization, where part of the caption lies outside the bounding box, i.e. the caption is not entirely inside the bounding box; (ii) a background of the same color as the caption pixels, in which case segmentation may partially fail. Two metrics are used to evaluate segmentation performance, Rseg and Pseg: recall Rseg = Nrp/Ngp and precision Pseg = Nrp/Ntp, where Nrp is the number of text pixels segmented correctly, Ntp the number of text pixels segmented by the system, and Ngp the number of ground-truth text pixels.

    Segmentation    Rseg = 42/50 = 0.84    Pseg = 46/50 = 0.92

The experimental results of our proposed method on a sports video are shown in the following figures.
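The reported figures follow directly from the definitions above; a quick arithmetic check with the values copied from the tables:

```python
# (Ng, Nd, Nc) per method, copied from the temporal-localization table.
methods = {"texture based": (1660, 1730, 1572),
           "corner based":  (1660, 1695, 1543),
           "proposed":      (1660, 1669, 1660)}

for name, (Ng, Nd, Nc) in methods.items():
    print(f"{name:14s}  Ptem = {Nc / Nd:.3f}  Rtem = {Nc / Ng:.3f}")
# texture based   Ptem = 0.909  Rtem = 0.947
# corner based    Ptem = 0.910  Rtem = 0.930
# proposed        Ptem = 0.995  Rtem = 1.000
# Gains of the proposed method: +0.086/+0.085 in Ptem, +0.053/+0.070 in Rtem.

print(f"Rloc = {40 / 50:.2f}, Ploc = {48 / 50:.2f}")   # spatial localization
print(f"Rseg = {42 / 50:.2f}, Pseg = {46 / 50:.2f}")   # segmentation
```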








Fig. 4 Input video

Fig. 5 Stroke edge detection

Fig. 6 Localization

Fig. 7 Extracted captions

V. CONCLUSION
Our proposed system performs a step-by-step sequential procedure to extract captions from videos. The method is simple, efficient and flexible. A stroke-like edge detector based on contours is used, which can effectively remove non-caption edges introduced by complex backgrounds and greatly benefits the follow-up spatial and temporal localization processes. This work can be further developed by adding an optical character recognition module to recognize each character in the captions. The method can also be applied to traffic monitoring: a video camera located at each checkpoint records all the vehicles crossing that checkpoint, so when a vehicle violates the traffic rules its number can be identified by extracting the caption (number plate) from the video recorded at the checkpoint.

ACKNOWLEDGMENTS
The authors would like to thank ALMIGHTY GOD, whose blessings have bestowed on me the will power and confidence to carry out my work.

REFERENCES
[1] Xiaoqian Liu and Weiqiang Wang, "Robustly extracting captions in videos based on stroke-like edges and spatio-temporal analysis," IEEE Transactions on Multimedia, vol. 14, no. 2, April 2012.
[2] M. Bertini, C. Colombo, and A. D. Bimbo, "Automatic caption localization in videos using salient points," in Proc. Int. Conf. Multimedia and Expo, Aug. 2001, pp. 68-71.

[3] J. Cho, S. Jeong, and B. Choi, "News video retrieval using automatic indexing of Korean closed-caption," Lecture Notes in Computer Science, vol. 2945, pp. 694-703, Aug. 2004.
[4] J. Fan, D. K. Y. Yau, A. K. Elmagarmid, and W. G. Aref, "Automatic image segmentation by integrating color-edge extraction and seeded region growing," IEEE Trans. on Image Processing, vol. 10, no. 10, pp. 1454-1466, 2001.
[5] Justin Miller, X. R. Chen, W. Y. Liu, and H. J. Zhong, "Automatic location of text in video frames," in Proc. ACM Workshop Multimedia: Information Retrieval, Ottawa, ON, Canada, Sep. 2001.
[6] Jagath Samarabandu and Xiaoqing Liu, "An edge-based text region extraction algorithm for indoor mobile robot navigation," IEEE Transactions on Knowledge and Data Engineering, pp. 273-280, 2007.
[7] K. C. K. Kim, H. R. Byun, Y. J. Song, Y. W. Choi, S. Y. Chi, K. K. Kim, and Y. K. Chung, "Scene text extraction in natural scene images using hierarchical feature combining and verification," in Proc. Int. Conf. Pattern Recognition, vol. 2, pp. 679-682, Aug. 2004.
[8] M. R. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 243-255, Feb. 2005.
[9] S. Liao, Max W. K. Law, and Albert C. S. Chung, "Dominant local binary patterns for texture classification," IEEE Transactions on Image Processing, vol. 18, no. 5, May 2009.
[10] Qixiang Ye, Wen Gao, Weiqiang Wang, and Wei Zeng, "A robust text detection algorithm in images and video frames," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 5, pp. 753-759, May 2009.
[11] K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1631-1639, 2003.
[12] C. Garcia and X. Apostolidis, "Text detection and segmentation in complex color images," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Istanbul, Turkey, 2000, pp. 2326-2329.
[13] A. K. Jain and S. Bhattacharjee, "Text segmentation using Gabor filters for automatic document processing," Mach. Vis. Appl., vol. 5, no. 3, pp. 169-184, 1992.
[14] M. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. CSVT, vol. 15, no. 2, pp. 243-255, 2005.
[15] F. Tupin, H. Maitre, J. F. Mangin, J. M. Nicolas, and E. Pechersky, "Detection of linear features in SAR images: application to road network extraction," IEEE Trans. GeoScience Remote Sensing, vol. 36, pp. 434-453, 1998.
[16] V. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995; Y. Wang, Y. Liu, and J. C. Huang, "Multimedia content analysis using both audio and visual clues," IEEE Signal Process. Mag., vol. 17, no. 6, pp. 12-36, 2000.
[17] V. Wu, R. Manmatha, and E. M. Riseman, "TextFinder: an automatic system to detect and recognize text in images," IEEE Trans. PAMI, vol. 21, no. 11, pp. 1224-1229, 1997.
[18] Q. Ye, Q. Huang, W. Gao, and D. Zhao, "Fast and robust text detection in images and video frames," Image Vision Comput., vol. 23, pp. 565-576, 2005.
[19] Y. Zheng, H. Li, and D. Doermann, "Machine printed text and handwriting identification in noisy document images," IEEE Trans. PAMI, vol. 26, no. 3, pp. 337-353, 2004.
[20] J. F. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 8, pp. 679-698, 1986.



Weitere ähnliche Inhalte

Was ist angesagt?

Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Jia-Bin Huang
 
Shadow Detection and Removal Techniques A Perspective View
Shadow Detection and Removal Techniques A Perspective ViewShadow Detection and Removal Techniques A Perspective View
Shadow Detection and Removal Techniques A Perspective Viewijtsrd
 
Recognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesRecognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesIJCSEA Journal
 
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...ijsrd.com
 
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...IDES Editor
 
CVGIP 2010 Part 3
CVGIP 2010 Part 3CVGIP 2010 Part 3
CVGIP 2010 Part 3Cody Liu
 
FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...
FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...
FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...TELKOMNIKA JOURNAL
 
Secure Image Transfer in The Domain Transform DFT
Secure Image Transfer in The Domain Transform DFTSecure Image Transfer in The Domain Transform DFT
Secure Image Transfer in The Domain Transform DFTijcisjournal
 
26.motion and feature based person tracking
26.motion and feature based person tracking26.motion and feature based person tracking
26.motion and feature based person trackingsajit1975
 
11.compression technique using dct fractal compression
11.compression technique using dct fractal compression11.compression technique using dct fractal compression
11.compression technique using dct fractal compressionAlexander Decker
 
Compression technique using dct fractal compression
Compression technique using dct fractal compressionCompression technique using dct fractal compression
Compression technique using dct fractal compressionAlexander Decker
 
IRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept DetectionIRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept DetectionIRJET Journal
 
Implementing kohonen's som with missing data in OTB
Implementing kohonen's som with missing data in OTBImplementing kohonen's som with missing data in OTB
Implementing kohonen's som with missing data in OTBmelaneum
 
Digital video watermarking scheme using discrete wavelet transform and standa...
Digital video watermarking scheme using discrete wavelet transform and standa...Digital video watermarking scheme using discrete wavelet transform and standa...
Digital video watermarking scheme using discrete wavelet transform and standa...eSAT Publishing House
 
Vocabulary length experiments for binary image classification using bov approach
Vocabulary length experiments for binary image classification using bov approachVocabulary length experiments for binary image classification using bov approach
Vocabulary length experiments for binary image classification using bov approachsipij
 

Was ist angesagt? (20)

Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Learning Moving Cast Shadows for Foreground Detection (VS 2008)
Learning Moving Cast Shadows for Foreground Detection (VS 2008)
 
E04552327
E04552327E04552327
E04552327
 
Shadow Detection and Removal Techniques A Perspective View
Shadow Detection and Removal Techniques A Perspective ViewShadow Detection and Removal Techniques A Perspective View
Shadow Detection and Removal Techniques A Perspective View
 
Recognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesRecognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenes
 
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...Shadow Detection and Removal in Still Images by using Hue Properties of Color...
Shadow Detection and Removal in Still Images by using Hue Properties of Color...
 
F0953235
F0953235F0953235
F0953235
 
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
Designing an Efficient Multimodal Biometric System using Palmprint and Speech...
 
G0523444
G0523444G0523444
G0523444
 
CVGIP 2010 Part 3
CVGIP 2010 Part 3CVGIP 2010 Part 3
CVGIP 2010 Part 3
 
FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...
FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...
FPGA Based Pattern Generation and Synchonization for High Speed Structured Li...
 
Secure Image Transfer in The Domain Transform DFT
Secure Image Transfer in The Domain Transform DFTSecure Image Transfer in The Domain Transform DFT
Secure Image Transfer in The Domain Transform DFT
 
26.motion and feature based person tracking
26.motion and feature based person tracking26.motion and feature based person tracking
26.motion and feature based person tracking
 
11.compression technique using dct fractal compression
11.compression technique using dct fractal compression11.compression technique using dct fractal compression
11.compression technique using dct fractal compression
 
Compression technique using dct fractal compression
Compression technique using dct fractal compressionCompression technique using dct fractal compression
Compression technique using dct fractal compression
 
IRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept DetectionIRJET- Study of SVM and CNN in Semantic Concept Detection
IRJET- Study of SVM and CNN in Semantic Concept Detection
 
Implementing kohonen's som with missing data in OTB
Implementing kohonen's som with missing data in OTBImplementing kohonen's som with missing data in OTB
Implementing kohonen's som with missing data in OTB
 
Pc Seminar Jordi
Pc Seminar JordiPc Seminar Jordi
Pc Seminar Jordi
 
Digital video watermarking scheme using discrete wavelet transform and standa...
Digital video watermarking scheme using discrete wavelet transform and standa...Digital video watermarking scheme using discrete wavelet transform and standa...
Digital video watermarking scheme using discrete wavelet transform and standa...
 
Vocabulary length experiments for binary image classification using bov approach
Vocabulary length experiments for binary image classification using bov approachVocabulary length experiments for binary image classification using bov approach
Vocabulary length experiments for binary image classification using bov approach
 
H3602056060
H3602056060H3602056060
H3602056060
 

Andere mochten auch

Most popular diwali gift picks for friends
Most popular diwali gift picks for friendsMost popular diwali gift picks for friends
Most popular diwali gift picks for friendsPrimogiftsindia
 
Estratégia de testes ágeis
Estratégia de testes ágeisEstratégia de testes ágeis
Estratégia de testes ágeisCristiano Caetano
 
D. afonso henriques
D. afonso henriquesD. afonso henriques
D. afonso henriquesaasf
 
Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...
Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...
Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...Nueva Cultura del Agua
 
FantasticphotosInsólito y raro
FantasticphotosInsólito y raroFantasticphotosInsólito y raro
FantasticphotosInsólito y raroindaianasan
 
Minha certificação core 360º treinamento funcional
Minha certificação core 360º treinamento funcionalMinha certificação core 360º treinamento funcional
Minha certificação core 360º treinamento funcionalF! Web Designer
 
Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...
Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...
Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...Nueva Cultura del Agua
 
Primera guerra mundial
Primera guerra mundialPrimera guerra mundial
Primera guerra mundialAnnRaq
 
Codigo de trabajo_de_guatemala_sept2011
Codigo de trabajo_de_guatemala_sept2011Codigo de trabajo_de_guatemala_sept2011
Codigo de trabajo_de_guatemala_sept2011Delfido Pirir
 
Presentacion curso "Foro Joven: Ríos para vivirlos"
Presentacion curso "Foro Joven: Ríos para vivirlos"Presentacion curso "Foro Joven: Ríos para vivirlos"
Presentacion curso "Foro Joven: Ríos para vivirlos"Nueva Cultura del Agua
 
Apostila de jogos legais para professores
Apostila de jogos legais para professoresApostila de jogos legais para professores
Apostila de jogos legais para professoresJanaina Fidalgo
 

Andere mochten auch (20)

Hz3115131516
Hz3115131516Hz3115131516
Hz3115131516
 
Most popular diwali gift picks for friends
Most popular diwali gift picks for friendsMost popular diwali gift picks for friends
Most popular diwali gift picks for friends
 
Estratégia de testes ágeis
Estratégia de testes ágeisEstratégia de testes ágeis
Estratégia de testes ágeis
 
D. afonso henriques
D. afonso henriquesD. afonso henriques
D. afonso henriques
 
Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...
Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...
Alberto Garrido y Bárbara Wilaarts y la huella hídrica en el VII Congreso Ibé...
 
Ana escoz urrutia
Ana escoz urrutiaAna escoz urrutia
Ana escoz urrutia
 
FantasticphotosInsólito y raro
FantasticphotosInsólito y raroFantasticphotosInsólito y raro
FantasticphotosInsólito y raro
 
Minha certificação core 360º treinamento funcional
Minha certificação core 360º treinamento funcionalMinha certificação core 360º treinamento funcional
Minha certificação core 360º treinamento funcional
 
Mapas Conceptuales
Mapas ConceptualesMapas Conceptuales
Mapas Conceptuales
 
Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...
Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...
Presentación Antonio Castillo en Jornadas participación Guadalquivir odo de c...
 
Primera guerra mundial
Primera guerra mundialPrimera guerra mundial
Primera guerra mundial
 
La empresa
La empresaLa empresa
La empresa
 
Trabalho de geografia 3 ano a
Trabalho de geografia 3 ano aTrabalho de geografia 3 ano a
Trabalho de geografia 3 ano a
 
Codigo de trabajo_de_guatemala_sept2011
Codigo de trabajo_de_guatemala_sept2011Codigo de trabajo_de_guatemala_sept2011
Codigo de trabajo_de_guatemala_sept2011
 
El sistema nervioso
El sistema nerviosoEl sistema nervioso
El sistema nervioso
 
Missa 22 05
Missa 22 05Missa 22 05
Missa 22 05
 
Agua, ríos y pueblos en Sevilla
Agua, ríos y pueblos en SevillaAgua, ríos y pueblos en Sevilla
Agua, ríos y pueblos en Sevilla
 
Presentacion curso "Foro Joven: Ríos para vivirlos"
Presentacion curso "Foro Joven: Ríos para vivirlos"Presentacion curso "Foro Joven: Ríos para vivirlos"
Presentacion curso "Foro Joven: Ríos para vivirlos"
 
Doctoradas
DoctoradasDoctoradas
Doctoradas
 
Apostila de jogos legais para professores
Apostila de jogos legais para professoresApostila de jogos legais para professores
Apostila de jogos legais para professores
 

Ähnlich wie Fb3110231028

Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Editor IJARCET
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Editor IJARCET
 
Video indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksVideo indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksIAEME Publication
 
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...IJECEIAES
 
Video Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyVideo Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyIRJET Journal
 
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORKTEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORKijscai
 
Design and Analysis of Quantization Based Low Bit Rate Encoding System
Design and Analysis of Quantization Based Low Bit Rate Encoding SystemDesign and Analysis of Quantization Based Low Bit Rate Encoding System
Design and Analysis of Quantization Based Low Bit Rate Encoding Systemijtsrd
 
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...IJCSEIT Journal
 
Key frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptorsKey frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptorseSAT Publishing House
 
Key frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptorsKey frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptorseSAT Journals
 
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...Best Jobs
 
Shot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion ActivitiesShot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion ActivitiesCSCJournals
 
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATIONVISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATIONcscpconf
 
24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)IAESIJEECS
 
24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)IAESIJEECS
 
Motion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classificationMotion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classificationacijjournal
 
An Efficient Block Matching Algorithm Using Logical Image
An Efficient Block Matching Algorithm Using Logical ImageAn Efficient Block Matching Algorithm Using Logical Image
An Efficient Block Matching Algorithm Using Logical ImageIJERA Editor
 
IRJET-Feature Extraction from Video Data for Indexing and Retrieval
IRJET-Feature Extraction from Video Data for Indexing and Retrieval IRJET-Feature Extraction from Video Data for Indexing and Retrieval
IRJET-Feature Extraction from Video Data for Indexing and Retrieval IRJET Journal
 

Ähnlich wie Fb3110231028 (20)

Cb35446450
Cb35446450Cb35446450
Cb35446450
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124
 
Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124Volume 2-issue-6-2119-2124
Volume 2-issue-6-2119-2124
 
Video indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracksVideo indexing using shot boundary detection approach and search tracks
Video indexing using shot boundary detection approach and search tracks
 
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...
Video Shot Boundary Detection Using The Scale Invariant Feature Transform and...
 
Gg3311121115
Gg3311121115Gg3311121115
Gg3311121115
 
Video Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: SurveyVideo Content Identification using Video Signature: Survey
Video Content Identification using Video Signature: Survey
 
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORKTEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK
TEXT DETECTION AND EXTRACTION FROM VIDEOS USING ANN BASED NETWORK
 
Design and Analysis of Quantization Based Low Bit Rate Encoding System
Design and Analysis of Quantization Based Low Bit Rate Encoding SystemDesign and Analysis of Quantization Based Low Bit Rate Encoding System
Design and Analysis of Quantization Based Low Bit Rate Encoding System
 
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
PERFORMANCE ANALYSIS OF FINGERPRINTING EXTRACTION ALGORITHM IN VIDEO COPY DET...
 
Key frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptorsKey frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptors
 
Key frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptorsKey frame extraction for video summarization using motion activity descriptors
Key frame extraction for video summarization using motion activity descriptors
 
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
A Segmentation Based Sequential Pattern Matching for Efficient Video Copy Det...
 
Shot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion ActivitiesShot Boundary Detection In Videos Sequences Using Motion Activities
Shot Boundary Detection In Videos Sequences Using Motion Activities
 
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATIONVISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION
 
24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)
 
24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)24 7912 9261-1-ed a meaningful (edit a)
24 7912 9261-1-ed a meaningful (edit a)
 
Motion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classificationMotion detection in compressed video using macroblock classification
Motion detection in compressed video using macroblock classification
 
An Efficient Block Matching Algorithm Using Logical Image
An Efficient Block Matching Algorithm Using Logical ImageAn Efficient Block Matching Algorithm Using Logical Image
An Efficient Block Matching Algorithm Using Logical Image
 
IRJET-Feature Extraction from Video Data for Indexing and Retrieval
IRJET-Feature Extraction from Video Data for Indexing and Retrieval IRJET-Feature Extraction from Video Data for Indexing and Retrieval
IRJET-Feature Extraction from Video Data for Indexing and Retrieval
 

Fb3110231028

  • 1. Miss.Dharmini Esther Thangam.P, Mrs.Akila Agnes.S / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.1023-1028 The Stroke Filter Based Caption Extraction System Miss.Dharmini Esther Thangam.P#1(PG scholar),Mrs.Akila Agnes.S#2(Asst.Prof) Department of Computer Science and Engineering, Karunya University Coimbatore, India Abstract- Captions in videos provide rich then caption extraction is carried out. By considering semantic information the contents of the video can temporal feature approach the computational load is be clearly understand with help of the captions. reduced and more accurate captions are extracted. Without even seeing the video the person can The proposed caption extraction method consists of understand about the video by knowing the three important steps. They are caption detection, captions in the video. By considering these reasons caption localization and caption segmentation. For captions extraction in videos become an important caption detection we use stroke based detection prerequisite. In this paper we use a stroke filter, method rather than edge, corner or texture based the stroke filter identifies the strokes in the video detection because sometimes background has dense and usually caption regions have strokes such that edges so background edges may have been mistaken the strokes identified belongs to captions by which as that of caption edges in this kind of situation edge the captions are detected. Then the locations of the based detection doesn’t works well, If the captions are localized. In the next Step the caption background has densely distributed corners corner pixels are separated from the background pixels. based detection doesn’t suits and in texture based At last a post processing step is performed to detection we must previously know about the texture check whether the correct caption pixels are of the captions [4]. Therefore stroke filter is used extracted. In this paper a step by step sequential because captions only Have stroke like edges. For procedure to extract captions from videos is caption localization spatial-temporal localization is proposed. used whereas the previous methods used DCT algorithm [7] .The drawback of DCT algorithm is Keywords-stroke filter, caption detection, caption that it doesn’t considers the time interval of the extraction,caption pixels captions so the same captions are found again and again. For caption segmentation we use spatial- I. INTRODUCTION temporal segmentation, the previous methods used Video processing is a particular case only temporal segmentation [11]. The disadvantage of signal processing, which often employs video of the previous method is that there are many types of filters and where the input and close up and medium shots make it difficult to output signals are video files or streams. The use of accurately distinguish them, it involves many digital video is becoming widespread all over the complex steps and does not handle false positives due place, extending throughout the world. Captions in to camera operations, whereas the proposed system videos provide rich semantic information which is aims to guarantee each clip contains same caption, so useful in video content analysis, indexing and that the efficiency and accuracy of caption extraction retrieval, traffic monitoring and computerized aid for is greatly improved. visually impaired. 
Caption extraction from videos is a difficult problem due to the unconstrained nature of II. PROPOSED APPROACH general-purpose video. Text can have arbitrary color, The proposed approach is a step by step size, and orientation. Backgrounds may be complex sequential procedure to extract captions from videos. and changing. The previous caption extraction The proposed system functions as follows method considers each video frame as an independent image and it extracts caption from each video frame. However, captions occurring in video usually persist for several seconds. Consider if a movie played by 30 frames per second contains large number of video frames and if each video frame is considered as an independent image it takes more time and the same captions may be extracted again and again [5].To overcome this disadvantage in our method we consider the temporal feature in videos such that the time interval in which same caption is present and 1023 | P a g e
  • 2. Miss.Dharmini Esther Thangam.P, Mrs.Akila Agnes.S / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 3, Issue 1, January -February 2013, pp.1023-1028 four values based on horizontal, vice-diagonal, Caption detection vertical, main diagonal directions. Then each contour is broken down into several sub contours based on their connectivity and gradient direction. Then each sub contour must contain a parallel edge with the same gradient direction in a specified interval depending on the width of character strokes. Caption localization Caption Segmentation Fig.2 quantize Õk into four values Fig .1Steps in proposed system A. Caption Detection A stroke filter is used to extract captions from videos. The stroke filter identifies the stroke like edges in the captions. Strokes of a pen or brush are the Fig. 3 subcontours in the character K movements or marks that we make with it when writing or painting. The stroke filter which can detect If a contour does not satisfy the above mentioned two the strokes in video images is designed as follows. conditions it is removed as non stroke edge and the First, we define a local image region as a stroke-like edge doesn’t belongs to captions [12]. If contour structure, if and only if, in terms of its intensities,(1) satisfies the above mentioned conditions it is it is different from its lateral regions(2) its lateral considered as stroke like edge pixels and thus the regions are similar to each other (3) it is nearly captions are detected. homogenous. By using stroke filter the stroke like B. Caption Localization edge pixels are denoted by 1 and the remaining pixels There are two types of caption localization is are denoted by 0[17].The stroke like edge pixels are performed. They are spatial localization and temporal taken as contours. Sometimes the edge pixels may localization Spatial localization involves finding the also come from complex background so to location of the caption in the video frame. Temporal differentiate the caption pixels from edge pixels we localization involves finding the time interval in use some techniques they are (i)the strokes always which the captions are found in consecutive video have a range of height and width which depend on frames. the size of the characters and length and width of 1)Spatial localization-We use SVM classifier to characters in captions may not be too long. Let w(c i) identify the location of the caption in the video and h(ci) denote the width and height of minimum frame. Compared with other classifiers such as neural enclosing rectangle(MER) of contour ci, if network and decision tree, support vector machine W(ci)<=threshold1,h(ci)<=threshold2 (SVM) is easier to train and has better generalization (1) ability. Therefore, we use the SVM classifier and Where the thresholds are chosen based on the size of choose radial basis function (RBF) as kernel function characters to be detected. If the above condition (i) is of this SVM classifier. The basic idea of SVM is to not satisfied the contour ci is removed as non stroke implicitly project the input space onto a higher edges in complex background.(ii)for each contour the dimensional space where the two classes are more gradient direction Õk has to found out. The gradient linearly separable. Given the caption candidate Õk is quantized into four values based on horizontal, produced by the caption detection module, we first vice-diagonal, vertical, main diagonal directions. 
B. Caption Localization
Two types of caption localization are performed: spatial localization and temporal localization. Spatial localization finds the position of the caption within a video frame; temporal localization finds the time interval over which the caption appears in consecutive video frames.

1) Spatial localization. We use an SVM classifier to identify the location of the caption in the video frame. Compared with other classifiers such as neural networks and decision trees, the support vector machine (SVM) is easier to train and has better generalization ability, so we use an SVM classifier with a radial basis function (RBF) kernel. The basic idea of the SVM is to implicitly project the input space onto a higher-dimensional space where the two classes are more linearly separable. Given a caption candidate produced by the caption detection module, we first normalize it to a height of 15 pixels with a correspondingly resized width. Then a sliding window of 15x15 pixels is used to generate several samples [14]. The trained SVM classifier is applied to each of these samples and the results are fused by averaging the SVM output scores. If the average is larger than a predefined threshold, the whole candidate is regarded as a text line; otherwise it is regarded as a false alarm. We set the threshold to 0.3 based on the trade-off between the false acceptance rate (FAR) and the false rejection rate (FRR).
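A sketch of this verification step, assuming scikit-learn's SVC with an RBF kernel. The 15-pixel normalization, 15x15 window, score averaging, and 0.3 threshold follow the text; the raw-pixel features, the window stride, and the function names are assumptions.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def verify_text_line(candidate_gray, clf, thresh=0.3, stride=5):
    """Accept or reject a caption candidate with a trained
    RBF-kernel SVM, fusing sliding-window scores by averaging."""
    h, w = candidate_gray.shape[:2]
    new_w = max(15, int(round(w * 15.0 / h)))
    norm = cv2.resize(candidate_gray, (new_w, 15))   # height normalized to 15 px
    scores = []
    for x in range(0, new_w - 15 + 1, stride):       # 15x15 sliding window
        win = norm[:, x:x + 15].astype(np.float32).reshape(1, -1)
        scores.append(clf.decision_function(win)[0])
    # Text line if the averaged score clears the 0.3 threshold.
    return len(scores) > 0 and float(np.mean(scores)) > thresh

# Training sketch: X holds flattened 15x15 caption / non-caption
# windows, y their labels.
# clf = SVC(kernel="rbf").fit(X, y)
```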
2) Temporal localization. Temporal localization identifies the time interval of the captions: it finds the consecutive frames that contain the same caption and groups them into a clip with a caption or a clip with no caption,

    S_t1,t2 = Σ_(x,y) ( E_t1(x,y) ∧ E_t2(x,y) )        (2)

where E_t(x,y) is the stroke edge map of frame t. The logical AND is performed to find out whether consecutive frames contain the same caption: if the result at a pixel is 1, the two edge pixels are the same; otherwise they differ. The AND is applied over all edge pixels of consecutive frames, and frames found to contain the same caption are combined into a clip [16]. The important question for temporal localization is whether a generated clip contains captions at all, since only clips containing captions are meaningful for the subsequent computation [8]. Consecutive frames without captions form a clip with no caption: if frames It1(x,y), It2(x,y), ..., It(k-1)(x,y) contain no captions but frame Itk(x,y) contains a caption, then frames It1(x,y) to It(k-1)(x,y) form a clip with no caption, and no further computation is carried out on that clip.
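The frame-pairing test of Eq. (2) can be sketched as follows; the 0.9 agreement ratio used to decide that two edge maps carry the same caption is an assumed threshold, not a value from the paper.

```python
import numpy as np

def same_caption(e1, e2, ratio=0.9):
    """Eq. (2): sum of E_t1(x,y) AND E_t2(x,y) over all pixels,
    compared against the edge population of the two frames."""
    overlap = np.logical_and(e1 > 0, e2 > 0).sum()
    reference = max((e1 > 0).sum(), (e2 > 0).sum())
    if reference == 0:
        return True          # both frames empty: a clip with no caption
    return overlap >= ratio * reference

def temporal_clips(edge_maps):
    """Cut the frame sequence into clips wherever neighbouring
    edge maps stop agreeing."""
    clips, start = [], 0
    for t in range(1, len(edge_maps)):
        if not same_caption(edge_maps[t - 1], edge_maps[t]):
            clips.append((start, t - 1))
            start = t
    clips.append((start, len(edge_maps) - 1))
    return clips
```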
C. Caption Segmentation
Caption segmentation separates the caption pixels from the background pixels in the located caption region. A color-based approach is used, based on the observation that caption pixels usually share one color, that this color differs from the background, and that it mostly stays the same across video frames. After building color models of the caption pixels, the probability that each pixel in the image is a caption pixel can be calculated [21], and according to this probability most of the caption pixels can be segmented from the background. Another method is to partition the RGB color space into 64 cubes and exploit the temporal homogeneity in color of the caption pixels to filter out background pixels. Let Bt1(x,y), Bt2(x,y), ..., Btn(x,y) be the pixels of the video frames of the clip containing captions. Performing a logical AND over these pixels,

    Bt1(x,y) ∧ Bt2(x,y) ∧ ... ∧ Btn(x,y)        (3)

we can identify the caption pixels against the background, because the caption pixels have a larger population than the background pixels. Thus the captions are extracted from the videos.

D. Refining
The extracted captions may still contain some background pixels, or some caption pixels may remain in the background, so the extracted captions must be refined. A sliding window is used to refine the segmentation results; we choose an 8x8 window [18]. The number of caption pixels inside the window is checked against a threshold value: if it exceeds the threshold, some background pixels are mixed with the caption pixels, and segmentation is carried out again to obtain accurate captions [20]. These sequential steps are performed to extract captions from video.
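A sketch of the 64-cube color segmentation of Eq. (3) together with the 8x8 refining pass, under the interpretation that the AND keeps pixels whose quantized color stays stable across the clip. The cube granularity follows the text; the stability test, the most-populous-cube rule, and the 0.9 refining fraction are assumptions.

```python
import numpy as np

def cube_index(frame_rgb):
    """Map every pixel to one of 4 x 4 x 4 = 64 RGB cubes."""
    q = frame_rgb.astype(np.int64) // 64
    return q[..., 0] * 16 + q[..., 1] * 4 + q[..., 2]

def segment_clip(frames_rgb):
    """Eq. (3): AND the quantized colors across the clip, then take
    the most populous stable cube as the caption color."""
    cubes = [cube_index(f) for f in frames_rgb]
    stable = np.ones(cubes[0].shape, dtype=bool)
    for c in cubes[1:]:
        stable &= (c == cubes[0])        # temporal homogeneity in color
    hist = np.bincount(cubes[0][stable], minlength=64)
    return stable & (cubes[0] == int(hist.argmax()))

def blocks_to_resegment(mask, win=8, frac=0.9):
    """8x8 refining pass: flag blocks whose caption-pixel count
    exceeds the threshold, i.e. background is likely mixed in."""
    flagged = []
    h, w = mask.shape
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            if mask[y:y + win, x:x + win].sum() > frac * win * win:
                flagged.append((y, x))
    return flagged
```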
III. DATA FLOW DIAGRAMS
A. Context-Level DFD
Video containing captions -> caption extraction -> captions.

B. Level 1 DFD of Caption Detection
Video -> detect stroke-like edges -> identify caption contours -> caption contours.

C. Level 1 DFD of Caption Localization
Caption contours -> spatial localization -> temporal localization -> location of the caption in the video frame.

D. Level 1 DFD of Caption Segmentation
Caption in video frame -> identify temporal homogeneity of colour -> pixels containing captions -> segment the captions -> captions.

These are the data flow diagrams of the proposed approach.

IV. EXPERIMENTAL EVALUATION
Candidate datasets include movies, news, sports, talk shows, and entertainment shows; in our experiments we use movie and sports videos. We take 50 sample videos, extract captions from them, and evaluate the performance of the proposed system on three factors: temporal localization, spatial localization, and segmentation of captions, measuring precision and recall for each.

A. Temporal Localization
Evaluating temporal localization is important because all subsequent computation is carried out after it. Two metrics are used: precision Ptem = Nc/Nd and recall Rtem = Nc/Ng, where Nc denotes the number of boundaries correctly detected, Nd the number of boundaries output by the system, and Ng the number of ground-truth boundaries. The ground-truth boundaries cover three cases: (i) from one caption to a different caption, (ii) from no caption to a caption, and (iii) from a caption to no caption. The comparison with two other temporal localization methods on this dataset is:

Method           Ng     Nd     Nc     Ptem    Rtem
Texture based    1660   1730   1572   0.909   0.947
Corner based     1660   1695   1543   0.910   0.930
Proposed method  1660   1669   1660   0.995   1.000

Thus the stroke filter method clearly outperforms the other two, with gains of 8.6% and 8.5% in Ptem and 5.3% and 7% in Rtem.

B. Spatial Localization
The performance of spatial localization is measured on the definition that a localization is counted as correct if and only if the intersection of the located text rectangle (LTR) and a ground-truth text rectangle (GTR) covers at least 90% of their union. Precision is Ploc = Nrc/Ntc and recall is Rloc = Nrc/Ngc, where Nrc is the number of text rectangles located correctly, Ngc the number of ground-truth rectangles, and Ntc the total number of rectangles located by the proposed system. The proposed system achieves Rloc = 40/50 = 0.8 and Ploc = 48/50 = 0.96.

C. Segmentation
Segmentation failures occur for two important reasons: (i) incorrect localization, where part of the caption lies outside the bounding box or the caption is not entirely inside it, and (ii) a background with the same color as the caption pixels, which can cause partial failure. Two metrics are used to evaluate segmentation performance: recall Rseg = Nrp/Ngp and precision Pseg = Nrp/Ntp, where Nrp is the number of text pixels segmented correctly, Ntp the total number of text pixels segmented by the system, and Ngp the number of ground-truth text pixels. The proposed system achieves Rseg = 42/50 = 0.84 and Pseg = 46/50 = 0.92. The experimental results of the proposed method on a sports video are shown in Figs. 4-7.
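As a quick check, the precision and recall values in the temporal localization table, and the improvements quoted below it, can be re-derived directly from Ng, Nd, and Nc:

```python
# Re-deriving the temporal localization metrics from the table.
table = {                     # method: (Ng, Nd, Nc)
    "texture based": (1660, 1730, 1572),
    "corner based":  (1660, 1695, 1543),
    "proposed":      (1660, 1669, 1660),
}
for method, (ng, nd, nc) in table.items():
    print(f"{method:14s} Ptem = {nc / nd:.3f}  Rtem = {nc / ng:.3f}")
# proposed: Ptem = 0.995, Rtem = 1.000, i.e. gains of 0.086/0.085 in
# Ptem and 0.053/0.070 in Rtem over the texture/corner baselines,
# matching the 8.6%/8.5% and 5.3%/7% improvements quoted above.
```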
Fig. 4 Input video
Fig. 5 Stroke edge detection
Fig. 6 Localization
Fig. 7 Extracted captions

V. CONCLUSION
The proposed system performs a step-by-step sequential procedure to extract captions from videos. The method is simple, efficient, and flexible. A stroke-like edge detector based on contours is used, which effectively removes non-caption edges introduced by complex backgrounds and greatly benefits the subsequent spatial and temporal localization steps. The work can be further developed with an optical character recognition module that identifies each character in the captions. The method can also be applied to traffic monitoring: a video camera at each checkpoint records all vehicles crossing it, and when a vehicle violates traffic rules its number can be identified by extracting the caption (the number plate) from the video recorded at the checkpoint.

ACKNOWLEDGMENTS
The authors would like to thank ALMIGHTY GOD, whose blessings have bestowed the will power and confidence to carry out this work.

REFERENCES
[1] X. Liu and W. Wang, "Robustly extracting captions in videos based on stroke-like edges and spatio-temporal analysis," IEEE Transactions on Multimedia, vol. 14, no. 2, Apr. 2012.
[2] M. Bertini, C. Colombo, and A. D. Bimbo, "Automatic caption localization in videos using salient points," in Proc. Int. Conf. Multimedia and Expo, Aug. 2001, pp. 68-71.
[3] J. Cho, S. Jeong, and B. Choi, "News video retrieval using automatic indexing of Korean closed-caption," Lecture Notes in Computer Science, vol. 2945, pp. 694-703, Aug. 2004.
[4] J. Fan, D. K. Y. Yau, A. K. Elmagarmid, and W. G. Aref, "Automatic image segmentation by integrating color-edge extraction and seeded region growing," IEEE Trans. on Image Processing, vol. 10, no. 10, pp. 1454-1466, 2001.
[5] J. Miller, X. R. Chen, W. Y. Liu, and H. J. Zhong, "Automatic location of text in video frames," in Proc. ACM Workshop on Multimedia: Information Retrieval, Ottawa, ON, Canada, Sep. 2001.
[6] J. Samarabandu and X. Liu, "An edge-based text region extraction algorithm for indoor mobile robot navigation," IEEE Transactions on Knowledge and Data Engineering, pp. 273-280, 2007.
[7] K. C. K. Kim, H. R. Byun, Y. J. Song, Y. W. Choi, S. Y. Chi, K. K. Kim, and Y. K. Chung, "Scene text extraction in natural scene images using hierarchical feature combining and verification," in Proc. Int. Conf. Pattern Recognition, vol. 2, Aug. 2004, pp. 679-682.
[8] M. R. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 243-255, Feb. 2005.
[9] S. Liao, M. W. K. Law, and A. C. S. Chung, "Dominant local binary patterns for texture classification," IEEE Transactions on Image Processing, vol. 18, no. 5, May 2009.
[10] Q. Ye, W. Gao, W. Wang, and W. Zeng, "A robust text detection algorithm in images and video frames," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 5, pp. 753-759, May 2009.
[11] K. I. Kim, K. Jung, and J. H. Kim, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1631-1639, 2003.
[12] C. Garcia and X. Apostolidis, "Text detection and segmentation in complex color images," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Istanbul, Turkey, 2000, pp. 2326-2329.
[13] A. K. Jain and S. Bhattacharjee, "Text segmentation using Gabor filters for automatic document processing," Mach. Vis. Appl., vol. 5, no. 3, pp. 169-184, 1992.
[14] M. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. CSVT, vol. 15, no. 2, pp. 243-255, 2005.
[15] F. Tupin, H. Maitre, J.-F. Mangin, J.-M. Nicolas, and E. Pechersky, "Detection of linear features in SAR images: Application to road network extraction," IEEE Trans. Geoscience Remote Sensing, vol. 36, pp. 434-453, 1998.
[16] V. Vapnik, The Nature of Statistical Learning Theory. Springer, 1995; Y. Wang, Y. Liu, and J. C. Huang, "Multimedia content analysis using both audio and visual clues," IEEE Signal Process. Mag., vol. 17, no. 6, pp. 12-36, 2000.
[17] V. Wu, R. Manmatha, and E. M. Riseman, "TextFinder: An automatic system to detect and recognize text in images," IEEE Trans. PAMI, vol. 21, no. 11, pp. 1224-1229, 1997.
[18] Q. Ye, Q. Huang, W. Gao, and D. Zhao, "Fast and robust text detection in images and video frames," Image Vision Comput., vol. 23, pp. 565-576, 2005.
[19] Y. Zheng, H. Li, and D. Doermann, "Machine printed text and handwriting identification in noisy document images," IEEE Trans. PAMI, vol. 26, no. 3, pp. 337-353, 2004.
[20] J. F. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 8, pp. 679-698, 1986.